Uptime Monitoring

Uptime monitoring answers two questions: "is the site reachable?" and "is the site actually working?" The second question is where most setups fail.

What to Monitor

Check	What it proves	Notes
HTTPS (status)	the site is reachable	pair with a content check to reduce false positives
HTTPS (keyword)	the right page rendered	catches "200 OK" with an error page or blank output
SSL expiry	cert is valid	alert well before expiry
Health endpoint (optional)	app stack is alive	useful when you need more than a homepage check
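
As a rough illustration of the SSL expiry row, the sketch below reads the certificate's end date and classifies it. It assumes OpenSSL and GNU date (`date -d`) are available; the 21-day `WARN_DAYS` threshold and the function names are hypothetical choices for illustration, not values taken from this guide.

```shell
# Hypothetical threshold: start alerting when fewer than this many days remain.
WARN_DAYS=21

# days_left EXPIRY_EPOCH NOW_EPOCH -> whole days remaining (truncated toward zero).
days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# cert_state EXPIRY_EPOCH NOW_EPOCH -> "expired", "expiring", or "ok".
cert_state() {
  left=$(( $1 - $2 ))
  if [ "$left" -le 0 ]; then
    echo expired
  elif [ "$left" -lt $(( WARN_DAYS * 86400 )) ]; then
    echo expiring
  else
    echo ok
  fi
}

# cert_expiry_epoch HOST -> notAfter of the served certificate, in epoch seconds.
# Assumes the site answers on port 443 and that GNU date can parse the OpenSSL date.
cert_expiry_epoch() {
  end=$(openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2)
  date -d "$end" +%s
}

# Usage (needs network access):
#   exp=$(cert_expiry_epoch example.com); now=$(date +%s)
#   echo "example.com: $(days_left "$exp" "$now") days left, $(cert_state "$exp" "$now")"
```

Keeping the date arithmetic separate from the network call makes the threshold logic easy to check without touching a live host.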

Caution: Avoid monitoring only the homepage on a heavily cached setup. A cached 200 can hide broken PHP, DB, or wp-admin paths.

Monitor Design Rules

Use at least 2-3 regions (or a provider that confirms an outage from a second location before alerting) to reduce noise.
Prefer an HTTPS check plus a keyword check.
Choose at least one dynamic endpoint (REST, login, or a dedicated health check) if the site has uncached critical flows.

Examples of endpoints to monitor:

https://example.com/ (cached reachability)
https://example.com/wp-login.php (auth surface)
https://example.com/wp-json/ (REST surface)
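
A status check paired with a keyword check can be sketched with curl as follows. This is a minimal sketch rather than a replacement for a hosted monitor: the function names, the 10-second timeout, and the keywords in the usage comments are assumptions chosen for illustration.

```shell
# evaluate_response STATUS BODY KEYWORD -> "ok", "bad-status", or "missing-keyword".
evaluate_response() {
  if [ "$1" != "200" ]; then
    echo bad-status
    return
  fi
  case "$2" in
    *"$3"*) echo ok ;;              # keyword found in the body
    *)      echo missing-keyword ;; # 200 OK, but not the page we expected
  esac
}

# check_url URL KEYWORD -> prints "URL STATUS RESULT" for one endpoint.
check_url() {
  body_file=$(mktemp)
  # -w '%{http_code}' prints the status code; 000 means the request itself failed.
  status=$(curl -sS -o "$body_file" -w '%{http_code}' --max-time 10 "$1") || status=000
  result=$(evaluate_response "$status" "$(cat "$body_file")" "$2")
  rm -f "$body_file"
  echo "$1 $status $result"
}

# Usage (needs network access; both keywords are hypothetical):
#   check_url https://example.com/ "Example Site Title"
#   check_url https://example.com/wp-json/ "namespaces"
```

Pick a keyword that only appears when the page renders correctly, so an error page served with a 200 status is reported as `missing-keyword`.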

Alerting That People Actually Respond To

Critical: site down (confirmed) or sustained 5xx spike -> page/call.
Warning: intermittent 5xx or rising latency -> Slack/email.
Info: SSL expiring, minor spikes -> ticket/email.
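
The three tiers above could be expressed as a small routing function. The sketch below is illustrative only: the 20% threshold for a "sustained" 5xx spike, the 2000 ms latency threshold (a fixed stand-in for "rising latency"), and the function names are all assumptions, and the script is meant to be called only when a monitor has already raised an event.

```shell
# Hypothetical thresholds; tune them to your own traffic.
SUSTAINED_5XX_PCT=20    # share of 5xx responses over the window that counts as sustained
LATENCY_WARN_MS=2000    # response time treated as a latency warning

# classify DOWN_CONFIRMED(0|1) ERR_PCT(0-100) LATENCY_MS -> critical | warning | info
classify() {
  if [ "$1" -eq 1 ] || [ "$2" -ge "$SUSTAINED_5XX_PCT" ]; then
    echo critical
  elif [ "$2" -gt 0 ] || [ "$3" -ge "$LATENCY_WARN_MS" ]; then
    echo warning
  else
    # Any other raised event, such as an expiring certificate or a minor spike.
    echo info
  fi
}

# route SEVERITY -> notification channel for that tier.
route() {
  case "$1" in
    critical) echo page ;;
    warning)  echo slack ;;
    *)        echo ticket ;;
  esac
}

# Usage:
#   route "$(classify 0 3 450)"
```

Separating classification from routing lets you change who gets notified without touching the thresholds.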

First 5 Minutes: Triage

Origin triage commands
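# Load average and time since boot
uptime
# Memory and swap usage in MiB
free -m
# Free space on the root filesystem
df -h /
# Service state; errors for units that are not installed are suppressed
sudo systemctl --no-pager status lsws nginx php8.2-fpm mariadb 2>/dev/null || true
# Last 50 lines of each error log; missing files are skipped
sudo tail -n 50 /usr/local/lsws/logs/error.log 2>/dev/null || true
sudo tail -n 50 /var/log/nginx/error.log 2>/dev/null || true
sudo tail -n 50 /var/log/mysql/error.log 2>/dev/null || true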

Checklist

HTTPS status monitor exists.
Keyword/content check exists.
SSL expiry alerts exist.
Monitoring is multi-region or outage-confirmed.
Alerts route by severity (no paging for a single transient blip).
A minimal triage playbook exists.
