DNS Failover Strategy

DNS failover reduces downtime by routing traffic away from a failing origin to a standby origin. It only works if the standby can serve correct content (and, for dynamic sites, correct state).

What Failover Does (and Doesn't)

Failover is routing. It does not magically replicate your database, uploads, sessions, or cache state.

Caution: If the backup does not have current content or state, failover trades downtime for incorrect behavior: stale content, broken logins, lost carts, or failing admin tasks.

Choose a Model

Model	How it behaves	Who it fits
Active-passive	one primary, one standby	most sites that want better uptime without full HA complexity
Active-active	both origins serve traffic	high-traffic sites that can keep state shared consistently

Option A: Cloudflare Load Balancing (If Available)

Cloudflare Load Balancing (LB) gives you health checks and automatic pool failover at the edge.

Recommended settings:

Setting	Recommendation	Why
Monitor type	HTTPS	tests the real request path
Monitor path	custom health endpoint	avoids cached-homepage false positives
Timeout	3-5s	fail fast during real outages
Retries	1-2	reduce flapping
Steering	failover	only use backup when needed
Session affinity	enable for logged-in flows	keeps each visitor on one origin so sessions and carts do not split between origins
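The timeout, retries, and failover-steering rows can be illustrated with a minimal local emulation in POSIX shell. This is an approximation, not Cloudflare's implementation. It probes each origin URL in priority order. A failed probe is retried `RETRIES` times, each attempt with a `TIMEOUT`-second limit, and traffic goes to the first origin that answers 200. The function names and the origin URLs in the usage line are hypothetical.

```shell
#!/bin/sh
# Local emulation of health-checked "failover" steering (NOT Cloudflare's code).
# Defaults are picked from the ranges suggested in the settings table.
TIMEOUT="${TIMEOUT:-5}"   # seconds allowed per probe attempt
RETRIES="${RETRIES:-1}"   # extra attempts before an origin counts as down

# is_healthy URL: succeed if any attempt returns HTTP 200 within TIMEOUT
is_healthy() {
  attempt=0
  while [ "$attempt" -le "$RETRIES" ]; do
    # curl prints 000 as the code when the connection fails or times out
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time "$TIMEOUT" "$1")"
    [ "$code" = "200" ] && return 0
    attempt=$((attempt + 1))
  done
  return 1
}

# pick_origin URL...: print the first healthy URL, in priority order
pick_origin() {
  for url in "$@"; do
    if is_healthy "$url"; then
      printf '%s\n' "$url"
      return 0
    fi
  done
  return 1   # every origin failed its checks
}
```

Usage, assuming each origin is reachable on its own hostname: `pick_origin 'https://primary.example.com/?health_check=1' 'https://backup.example.com/?health_check=1'`. Listing the primary first is what makes this "failover" rather than round-robin. The backup is only chosen when the primary fails every attempt.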

Tip: Avoid using / as the health check path on cached sites. A cached homepage can return 200 even when PHP or the database is down.

Health Check Endpoint (WordPress)

This must-use (MU) plugin exposes a lightweight endpoint that returns 200 when all critical checks pass and 503 when any of them fails.

wp-content/mu-plugins/health-check.php
<?php
/**
 * Load balancer health check.
 * Request: https://example.com/?health_check=1
 */
add_action('init', function () {
    if (!isset($_GET['health_check'])) {
        return;
    }

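    // Never let a page cache or CDN store this response.
    nocache_headers();
    header('Content-Type: application/json');

    $checks = [];

    // Database: a trivial query proves the connection works.
    global $wpdb;
    $checks['database'] = ((int) $wpdb->get_var('SELECT 1') === 1);

    // Object cache (optional): only meaningful with a persistent backend
    // (Redis, Memcached); core's in-memory fallback would always pass.
    if (wp_using_ext_object_cache()) {
        wp_cache_set('health_test', 'ok', '', 60);
        $checks['object_cache'] = (wp_cache_get('health_test') === 'ok');
    }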

    $healthy = !in_array(false, $checks, true);
    http_response_code($healthy ? 200 : 503);
    echo json_encode(['healthy' => $healthy, 'checks' => $checks]);
    exit;
});

Note: If you need a health check that still answers when PHP is down, serve a static file directly from the web server instead. That check only proves the web server is alive, not WordPress.
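A health endpoint only helps if its response really comes from PHP and not from a cache. The sketch below is a quick shell check for that. It assumes the cache indicators on your stack are Cloudflare's `cf-cache-status` header and LiteSpeed's `x-litespeed-cache` header, and `probe_health` is a hypothetical helper name.

```shell
#!/bin/sh
# probe_health URL: print "healthy", "unhealthy", or "cached".
# "cached" means a cache answered, so the status code proves nothing about PHP.
probe_health() {
  # -D - dumps response headers to stdout; the body is discarded
  headers="$(curl -s -D - -o /dev/null --max-time 5 "$1" | tr -d '\r')"
  status="$(printf '%s\n' "$headers" | awk 'NR == 1 {print $2}')"
  if printf '%s\n' "$headers" | grep -qiE '^(cf-cache-status|x-litespeed-cache): *hit'; then
    echo cached
    return 2
  fi
  if [ "$status" = "200" ]; then
    echo healthy
    return 0
  fi
  echo unhealthy   # 503 from the endpoint, another error, or no response at all
  return 1
}
```

Run it as `probe_health 'https://example.com/?health_check=1'`. A result of `cached` is the false positive described in the tip above and should be fixed before the endpoint is used as a monitor path.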

Keep the Backup Ready

At minimum, the backup origin needs:

the same web/PHP configuration as primary (or automation to recreate it)
current WordPress code and configuration
current uploads/media (prefer object storage for HA)
a plan for database continuity (managed DB, replication, or clear RPO/RTO)
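The easiest time to catch a stale backup is before a drill. One option, sketched here as an assumption rather than a prescribed tool, is an rsync dry run with checksums. `--dry-run --itemize-changes` lists what a sync would change without copying anything, so a count of 0 means the trees match. The `backup` SSH alias matches the rsync examples below, and `count_drift` is a hypothetical helper name.

```shell
#!/bin/sh
# count_drift SRC/ DEST/: print how many paths a sync would add, change, or delete.
# Nothing is copied; --checksum compares contents, not just size and mtime.
count_drift() {
  rsync -a --dry-run --checksum --itemize-changes --delete "$1" "$2" |
    awk 'END {print NR}'   # prints 0 when rsync reports no differences
}
```

For example, `count_drift /var/www/html/ backup:/var/www/html/` should print 0 right after a deploy. Directories that legitimately differ per origin, such as page caches, may need an `--exclude` to avoid noise.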

Common sync patterns (examples)

Rsync cron examples (files and uploads)
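# WordPress files (every 5 minutes); flock -n skips a run while the previous one is still going
*/5 * * * * flock -n /tmp/wp-files-sync.lock rsync -az --delete /var/www/html/ backup:/var/www/html/

# Uploads (every minute); a separate lock keeps slow runs from piling up
* * * * * flock -n /tmp/wp-uploads-sync.lock rsync -az /var/www/html/wp-content/uploads/ backup:/var/www/html/wp-content/uploads/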

Caution: Rsync-based failover is usually fine for mostly-static marketing sites, but it is not enough for transactional sites (WooCommerce, memberships) unless your database/session strategy is solid.
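If the database plan is MySQL replication to the backup, checking replica lag belongs in every drill. This sketch makes several assumptions. The replica runs MySQL 8.0.22+, which provides `SHOW REPLICA STATUS`; on older servers, swap in `SHOW SLAVE STATUS` (the pattern also accepts `Seconds_Behind_Master`). The `mysql` client must be able to log in without prompting, for example via `~/.my.cnf`. `replica_lag_ok` is a hypothetical helper name. A managed database would report lag through its own tooling instead.

```shell
#!/bin/sh
# replica_lag_ok MAX_SECONDS: succeed only if replication is running and the
# reported lag is at most MAX_SECONDS (a stand-in for your RPO).
replica_lag_ok() {
  lag="$(mysql -e 'SHOW REPLICA STATUS\G' |
    awk -F': ' '/Seconds_Behind_(Source|Master):/ {print $2; exit}')"
  case "$lag" in
    ''|NULL) return 1 ;;   # replication stopped, broken, or not configured
  esac
  [ "$lag" -le "$1" ]
}
```

Run on the backup, `replica_lag_ok 60 || echo "replica is behind or stopped"` turns the RPO into a pass/fail check. Note that a lag of 0 does not prove sessions or uploads are in sync.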

Test Failover and Failback

Run scheduled drills instead of waiting for a real outage (quarterly is a good baseline):

Verify both origins serve the same build/config.
Trigger an outage on primary (stop the web server or firewall off 443).
Confirm edge routing moves traffic to backup.
Test critical user flows (login, checkout, form submissions).
Restore primary and confirm failback behavior.

Check which origin responded (headers)
curl -sI https://example.com/ | grep -iE '^(server|cf-ray|x-litespeed-cache):'
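Behind Cloudflare, `Server` and `cf-ray` show that the edge answered, not which origin served the request. To time a switchover during a drill, a sketch like the following polls until an expected origin responds. It assumes each origin adds a hypothetical `X-Origin: primary` or `X-Origin: backup` response header in its web server config. `origin_of` and `wait_for_origin` are illustrative names.

```shell
#!/bin/sh
# origin_of URL: print the value of the (hypothetical) X-Origin response header
origin_of() {
  curl -s -D - -o /dev/null --max-time 5 "$1" | tr -d '\r' |
    awk -F': *' 'tolower($1) == "x-origin" {print $2; exit}'
}

# wait_for_origin URL EXPECTED [MAX_WAIT]: poll every 5 seconds and print how
# many seconds passed before EXPECTED answered; fail after MAX_WAIT seconds.
wait_for_origin() {
  url="$1"; expected="$2"; max_wait="${3:-300}"; waited=0
  while [ "$waited" -le "$max_wait" ]; do
    if [ "$(origin_of "$url")" = "$expected" ]; then
      echo "$waited"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  return 1
}
```

Start `wait_for_origin https://example.com/ backup` right after taking the primary down, then `wait_for_origin https://example.com/ primary` after restoring it. This times both failover and failback with the same tool, with 5-second granularity.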

Common Pitfalls

Pitfall	Symptom	Fix
health check hits cached content	LB says "healthy" during a real outage	use a real health endpoint (dynamic or purpose-built)
backup is stale	users see old content or broken assets	automate sync/deploy; prefer shared storage for uploads
no database plan	logins/carts/admin fail after failover	decide replication/managed DB strategy before calling it HA
never testing failback	"recovered" origin causes surprises	drill both failover and failback
