Uptime Monitoring

Uptime monitoring answers two questions: "is the site reachable?" and "is the site actually working?" The second question is where most setups fail.

What to Monitor

Check                      | What it proves          | Notes
HTTPS (status)             | the site is reachable   | pair with a content check to reduce false positives
HTTPS (keyword)            | the right page rendered | catches "200 OK" with an error page or blank output
SSL expiry                 | cert is valid           | alert well before expiry
Health endpoint (optional) | app stack is alive      | useful when you need more than a homepage check
Caution: Avoid monitoring only the homepage on a heavily cached setup. A cached 200 can hide broken PHP, DB, or wp-admin paths.
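
To illustrate why the status and keyword checks belong together, here is a minimal bash sketch. It assumes curl and grep are available; the function names, verdict strings, and keyword are hypothetical and do not come from any particular monitoring product.

```shell
#!/usr/bin/env bash
# Sketch only: names and messages are illustrative assumptions.

# evaluate_response STATUS KEYWORD
# Reads the response body on stdin. Prints a verdict and returns
# 0 (ok), 1 (non-200 status), or 2 (status 200 but keyword missing).
evaluate_response() {
  local status="$1" keyword="$2"
  if [ "$status" != "200" ]; then
    echo "DOWN: HTTP $status"
    return 1
  fi
  # -F matches the keyword as a fixed string; -q suppresses output.
  if ! grep -qF -- "$keyword"; then
    echo "DEGRADED: keyword not found"
    return 2
  fi
  echo "OK"
}

# check_url URL KEYWORD
# Fetches URL with curl and hands the status code and body to evaluate_response.
check_url() {
  local url="$1" keyword="$2" body status rc=0
  body="$(mktemp)"
  # curl writes the body to the temp file and prints only the status code;
  # it reports 000 when no HTTP response was received.
  status="$(curl -s -o "$body" -w '%{http_code}' --max-time 10 "$url")" || status="000"
  evaluate_response "$status" "$keyword" < "$body" || rc=$?
  rm -f "$body"
  return "$rc"
}
```

A call might look like `check_url https://example.com/ "Some footer text"`, where the keyword is any string that only appears when the page rendered correctly. A cached error page that still returns 200 would come back as DEGRADED rather than OK.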

Monitor Design Rules

  • Use at least 2-3 regions (or a provider that re-checks a failure before declaring an outage) to reduce false alerts.
  • Prefer an HTTPS check plus a keyword check.
  • Choose at least one dynamic endpoint (REST, login, or a dedicated health check) if the site has uncached critical flows.
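
Hosted monitoring services normally handle multi-region confirmation themselves. The idea behind it can be sketched as below, assuming each region reports a simple "up" or "down" result. The function name and threshold parameter are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: confirm_outage is an illustrative helper, not a real tool.

# confirm_outage THRESHOLD RESULT...
# RESULT is "up" or "down", one per region. Prints "confirmed" when at
# least THRESHOLD regions report "down", otherwise "unconfirmed".
confirm_outage() {
  local threshold="$1" down=0 result
  shift
  for result in "$@"; do
    if [ "$result" = "down" ]; then
      down=$((down + 1))
    fi
  done
  if [ "$down" -ge "$threshold" ]; then
    echo "confirmed"
  else
    echo "unconfirmed"
  fi
}
```

With a threshold of 2, `confirm_outage 2 down up up` prints `unconfirmed`, so a single region's network blip does not count as an outage.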

Examples of endpoints to monitor:

  • https://example.com/ (cached reachability)
  • https://example.com/wp-login.php (auth surface)
  • https://example.com/wp-json/ (REST surface)
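
A quick manual pass over these endpoints could look like the sketch below. It assumes curl is installed and that each endpoint returns 200 when healthy, which may not hold if the login page is restricted or redirected. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative assumptions.

# http_status URL
# Prints the HTTP status code, or 000 if no response was received.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# check_endpoints URL...
# Prints one line per URL and returns non-zero if any URL is not 200.
check_endpoints() {
  local url status failures=0
  for url in "$@"; do
    status="$(http_status "$url")"
    if [ "$status" = "200" ]; then
      echo "ok   $status $url"
    else
      echo "FAIL $status $url"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

For example: `check_endpoints https://example.com/ https://example.com/wp-login.php https://example.com/wp-json/`. A healthy homepage alongside a failing REST or login line points at the uncached application path rather than the network.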

Alerting That People Actually Respond To

  • Critical: site down (confirmed) or sustained 5xx spike -> page/call.
  • Warning: intermittent 5xx or rising latency -> Slack/email.
  • Info: SSL expiring, minor spikes -> ticket/email.
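
The severity tiers above can be expressed as a small routing function. This is a sketch under stated assumptions: the event names and channel labels are hypothetical, and a real setup would map them to whatever the alerting tool supports.

```shell
#!/usr/bin/env bash
# Sketch only: event names and channel labels are illustrative assumptions.

# route_alert EVENT
# Prints "SEVERITY CHANNEL" for a known event; returns 1 for unknown events.
route_alert() {
  case "$1" in
    site_down_confirmed|sustained_5xx) echo "critical page" ;;
    intermittent_5xx|rising_latency)   echo "warning slack" ;;
    ssl_expiring|minor_spike)          echo "info ticket" ;;
    *)                                 echo "unknown none"; return 1 ;;
  esac
}
```

Keeping the mapping in one place makes it easy to see that only confirmed downtime or a sustained 5xx spike pages someone.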

First 5 Minutes: Triage

Origin triage commands
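# Load average, memory, and root filesystem usage
uptime
free -m
df -h /
# Service status; errors for units not installed on this host are suppressed
sudo systemctl status lsws nginx php8.2-fpm mariadb 2>/dev/null || true
# Last 50 lines of each error log; missing log files are skipped silently
sudo tail -n 50 /usr/local/lsws/logs/error.log 2>/dev/null || true
sudo tail -n 50 /var/log/nginx/error.log 2>/dev/null || true
sudo tail -n 50 /var/log/mysql/error.log 2>/dev/null || true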

Checklist

  • HTTPS status monitor exists.
  • Keyword/content check exists.
  • SSL expiry alerts exist.
  • Monitoring is multi-region or outage-confirmed.
  • Alerts route by severity (no paging for a single transient blip).
  • A minimal triage playbook exists.

What's Next