Uptime Monitoring
Uptime monitoring answers two questions: "Is the site reachable?" and "Is the site actually working?" The second question is where most setups fail.
What to Monitor
| Check | What it proves | Notes |
|---|---|---|
| HTTPS (status) | the site is reachable | pair with a content check to reduce false positives |
| HTTPS (keyword) | the right page rendered | catches "200 OK" with an error page or blank output |
| SSL expiry | cert is valid | alert well before expiry |
| Health endpoint (optional) | app stack is alive | useful when you need more than a homepage check |
Caution: Avoid monitoring only the homepage on a heavily cached setup. A cached 200 can hide broken PHP, DB, or wp-admin paths.
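For the SSL expiry row in the table above, a minimal sketch of a days-remaining check might look like the following. It assumes the `openssl` CLI and GNU `date` are available. The function names and the 21-day default threshold are illustrative assumptions, not values from this guide.

```bash
#!/usr/bin/env bash
# Sketch only: assumes GNU date and the openssl CLI. Function names are hypothetical.

# days_until DATE [NOW_EPOCH]
# Convert a certificate "notAfter" date into whole days remaining (truncated).
# NOW_EPOCH defaults to the current time.
days_until() {
  local expiry_epoch now_epoch
  expiry_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch=${2:-$(date -u +%s)}
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# cert_not_after HOST
# Print the notAfter date of the certificate served on HOST:443.
cert_not_after() {
  local host=$1
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null \
    | cut -d= -f2
}

# check_cert HOST [WARN_DAYS]
# Succeeds when more than WARN_DAYS remain (21 is an assumed default).
# Returns 2 when the certificate could not be read at all.
check_cert() {
  local host=$1 warn_days=${2:-21} not_after days
  not_after=$(cert_not_after "$host")
  [ -n "$not_after" ] || return 2
  days=$(days_until "$not_after") || return 2
  echo "${host}: ${days} days left"
  [ "$days" -gt "$warn_days" ]
}
```

A call such as `check_cert example.com 30 || echo "renew soon"` could run from cron. Most hosted monitoring providers offer the same check without any scripting.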
Monitor Design Rules
- Use at least 2-3 regions (or a provider that confirms outages) to reduce noise.
- Prefer an HTTPS check plus a keyword check.
- Choose at least one dynamic endpoint (REST, login, or a dedicated health check) if the site has uncached critical flows.
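The multi-region rule above amounts to a quorum: treat the site as down only when enough independent probes agree. A minimal sketch follows, assuming each probe result arrives as a `region:status` string. The function name, the input format, and the region names in the usage note are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: confirm an outage when at least MIN_FAILURES probes report "down".

# confirm_outage MIN_FAILURES RESULT...
# Each RESULT is "region:status", where status is "up" or "down".
confirm_outage() {
  local min_failures=$1 failures=0 result
  shift
  for result in "$@"; do
    # Strip everything up to the last colon to get the status.
    case "${result##*:}" in
      down) failures=$((failures + 1)) ;;
    esac
  done
  [ "$failures" -ge "$min_failures" ]
}
```

For example, `confirm_outage 2 us-east:down eu-west:up ap-south:down && echo "confirmed"` reports a confirmed outage, while a single failing region does not.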
Examples of endpoints to monitor:
- https://example.com/ (cached reachability)
- https://example.com/wp-login.php (auth surface)
- https://example.com/wp-json/ (REST surface)
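As a rough illustration of pairing a status check with a keyword check on these endpoints, the sketch below uses `curl` to capture the status code and body, then requires both a 200 and an expected string. The function names and the keywords in the usage note are placeholders to adjust per site. Redirects are deliberately not followed, which is an assumption about how the endpoints should behave.

```bash
#!/usr/bin/env bash
# Sketch only: HTTPS status check plus keyword check. Function names are hypothetical.

# evaluate_response STATUS BODY_FILE KEYWORD
# Pass only when the status is 200 and the body contains KEYWORD (fixed string).
evaluate_response() {
  local status=$1 body_file=$2 keyword=$3
  if [ "$status" != "200" ]; then
    echo "FAIL status=${status}"
    return 1
  fi
  if ! grep -qF -- "$keyword" "$body_file"; then
    echo "FAIL keyword missing"
    return 1
  fi
  echo "OK"
}

# check_url URL KEYWORD
# Fetch URL with a 10-second limit and evaluate the response.
check_url() {
  local url=$1 keyword=$2 body_file status rc
  body_file=$(mktemp) || return 2
  # On connection failure curl exits non-zero; record that as status 000.
  status=$(curl -sS --max-time 10 -o "$body_file" -w '%{http_code}' "$url" 2>/dev/null) || status=000
  evaluate_response "$status" "$body_file" "$keyword"
  rc=$?
  rm -f "$body_file"
  return "$rc"
}
```

Usage might look like `check_url "https://example.com/wp-json/" "namespaces"`, where the keyword is any string that appears only when the page rendered correctly.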
Alerting That People Actually Respond To
- Critical: site down (confirmed) or sustained 5xx spike -> page/call.
- Warning: intermittent 5xx or rising latency -> Slack/email.
- Info: SSL expiring, minor spikes -> ticket/email.
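The three tiers above can be sketched as a small classifier that maps measurements to a severity and a channel. The thresholds here (20% and 1% error rate, 2000 ms latency, 21 days of certificate validity) are illustrative assumptions, not recommendations from this guide. The error rate is assumed to be an integer percentage already averaged over a sustained window.

```bash
#!/usr/bin/env bash
# Sketch only: severity routing with assumed thresholds. Function names are hypothetical.

# classify_alert CONFIRMED_DOWN ERROR_RATE_PCT LATENCY_MS CERT_DAYS_LEFT
# CONFIRMED_DOWN is "yes" or "no"; the other arguments are integers.
classify_alert() {
  local confirmed_down=$1 error_rate=$2 latency_ms=$3 cert_days=$4
  if [ "$confirmed_down" = "yes" ] || [ "$error_rate" -ge 20 ]; then
    echo "critical"   # confirmed outage or sustained 5xx spike
  elif [ "$error_rate" -ge 1 ] || [ "$latency_ms" -ge 2000 ]; then
    echo "warning"    # intermittent 5xx or rising latency
  elif [ "$cert_days" -le 21 ]; then
    echo "info"       # certificate expiring soon
  else
    echo "ok"
  fi
}

# route_for SEVERITY
# Map a severity to a notification channel.
route_for() {
  case "$1" in
    critical) echo "page" ;;
    warning)  echo "slack" ;;
    info)     echo "ticket" ;;
    *)        echo "none" ;;
  esac
}
```

For example, `route_for "$(classify_alert no 5 300 90)"` prints `slack`. Keeping classification separate from routing lets thresholds change without touching the notification wiring.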
First 5 Minutes: Triage
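Origin triage commands (run on the origin server):

```bash
# Load average, memory, and root disk usage
uptime
free -m
df -h /
# Service state; units that are not installed on this stack are ignored
sudo systemctl status lsws nginx php8.2-fpm mariadb 2>/dev/null || true
# Last 50 lines of each error log; missing logs are skipped silently
sudo tail -50 /usr/local/lsws/logs/error.log 2>/dev/null || true
sudo tail -50 /var/log/nginx/error.log 2>/dev/null || true
sudo tail -50 /var/log/mysql/error.log 2>/dev/null || true
```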
Checklist
- HTTPS status monitor exists.
- Keyword/content check exists.
- SSL expiry alerts exist.
- Monitoring is multi-region or outage-confirmed.
- Alerts route by severity (no paging for a single transient blip).
- A minimal triage playbook exists.