
What Is Uptime Monitoring

Learn what uptime monitoring is, why you need it before users notice downtime, and how to set up health checks and alerting for your web services.

uptime-monitoring · health-checks · alerting · devops · reliability

Your SaaS app went down at 3 PM on a Tuesday. You found out at 4:30 PM when a customer tweeted about it. In that 90-minute window, 340 API requests failed, 12 users tried to sign up and bounced, and your Stripe webhook queue backed up. All of this was preventable with a single HTTP check running every 60 seconds. Uptime monitoring is the practice of continuously testing whether your web services are accessible, responsive, and returning correct results. When a check fails, you get alerted immediately — not when a user complains on social media. This article explains exactly how uptime monitoring works, what to monitor, how to build effective health checks, and how to set up alerting that actually wakes you up when it matters.

How Uptime Monitoring Works



At its core, uptime monitoring is simple: an external service sends an HTTP request to your endpoint at regular intervals and checks the response. If the response meets your criteria (correct status code, response time under a threshold, body contains expected content), the check passes. If it fails, the monitor records the failure and notifies you.

The "external" part is critical. Monitoring from inside your own infrastructure tells you whether your service can reach itself, which is rarely useful. External monitoring tells you whether your users can reach your service, which is what actually matters.

A typical monitoring flow looks like this:

  • Monitor sends GET https://api.yourapp.com/health from multiple geographic regions
  • Your server responds with 200 OK and a JSON body
  • Monitor verifies: status is 200, response time is under 5 seconds, body contains "status": "ok"
  • If all checks pass: record as "up"
  • If any check fails: retry from a different region to avoid false positives
  • If retry also fails: mark as "down" and send alert


The multi-region retry is important. A single failed check could be a network blip between the monitoring service and your server. Confirming from a second location eliminates most false positives. The sketch below shows this check-and-confirm logic in miniature.
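To make the flow concrete, here is a minimal monitor-side sketch in Node.js (18+, for the global fetch and AbortSignal.timeout). The function names and thresholds are illustrative, not how PingCheck is actually implemented:

    // check.js — minimal check-and-confirm logic (illustrative sketch)
    async function runCheck(url, { timeoutMs = 5000 } = {}) {
      const start = Date.now();
      try {
        const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        const body = await res.json().catch(() => ({}));
        return {
          up: res.status === 200 && body.status === "ok",
          status: res.status,
          latencyMs: Date.now() - start,
        };
      } catch (err) {
        // Timeouts, DNS failures, TLS errors, and refused connections all count as down
        return { up: false, error: err.message, latencyMs: Date.now() - start };
      }
    }

    async function checkWithConfirmation(url) {
      const first = await runCheck(url);
      if (first.up) return { state: "up", first };
      // A real monitor would route this retry through a second region
      const second = await runCheck(url);
      return second.up ? { state: "up", first, second } : { state: "down", first, second };
    }

A production monitor runs this on a schedule from several regions and feeds the result into the alerting pipeline described later in this article.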

    What to Monitor



    Most developers start by monitoring their homepage and stop there. That covers one failure mode and misses dozens of others. Here is a comprehensive monitoring strategy:

    Primary Endpoints



  • Homepage / marketing site: basic availability
  • API health endpoint: confirms the application process is running
  • Authentication endpoint: confirms login works
  • Core business endpoint: the most important API route your customers use


Infrastructure Dependencies



  • Database connectivity: your health check should query the database
  • Cache connectivity: verify Redis/Memcached is responsive
  • External API dependencies: check that third-party services you depend on are reachable


SSL and Security



  • Certificate expiry: alert 14-30 days before your SSL certificate expires
  • Certificate chain validity: verify the full chain, not just the leaf certificate
  • HTTPS redirect: confirm HTTP requests redirect to HTTPS


Performance



  • Response time thresholds: alert when response time exceeds a baseline (e.g., 2x your normal P95)
  • Time to first byte (TTFB): measures server processing time independent of payload size
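Written down as configuration, the strategy above might look like the following. Every name, URL, and threshold here is hypothetical; the point is that each category becomes a concrete, checkable entry:

    // monitors.js — hypothetical monitor list covering each category above
    const monitors = [
      // Primary endpoints
      { name: "Homepage", url: "https://yourapp.com/", intervalSec: 60 },
      { name: "API health", url: "https://api.yourapp.com/health", intervalSec: 60 },
      { name: "Auth", url: "https://api.yourapp.com/auth/login", intervalSec: 300 },
      // Infrastructure dependencies, exercised via the deep health check described below
      { name: "Deep health", url: "https://api.yourapp.com/health/detailed", intervalSec: 120 },
      // SSL and security
      { name: "Cert expiry", url: "https://api.yourapp.com", type: "ssl", alertDaysBefore: 14 },
      // Performance: alert at roughly 2x the normal P95
      { name: "Core API latency", url: "https://api.yourapp.com/v1/orders", maxLatencyMs: 1200 },
    ];

    module.exports = monitors;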


Building Effective Health Check Endpoints



    A good health check endpoint does more than return 200 OK. It verifies that your application can actually serve requests by testing its critical dependencies.

    Basic Health Check



    // routes/health.js — basic health check
    app.get("/health", (req, res) => {
      res.status(200).json({
        status: "ok",
        timestamp: new Date().toISOString(),
        version: process.env.APP_VERSION || "unknown",
        uptime: process.uptime(),
      });
    });


    This confirms the Node.js process is running and can handle HTTP requests. But it does not tell you whether the application can actually do its job.

    Deep Health Check



    A deep health check tests every critical dependency:

    // routes/health.js — deep health check with dependency verification
    app.get("/health", async (req, res) => {
      const checks = {};
      const start = Date.now();

  // Check database connectivity
  try {
    const dbStart = Date.now();
    await db.query("SELECT 1");
    checks.database = { status: "ok", latencyMs: Date.now() - dbStart };
  } catch (err) {
    checks.database = { status: "error", message: err.message };
  }

  // Check Redis connectivity
  try {
    const redisStart = Date.now();
    await redis.ping();
    checks.redis = { status: "ok", latencyMs: Date.now() - redisStart };
  } catch (err) {
    checks.redis = { status: "error", message: err.message };
  }

  // Check external payment provider
  try {
    const stripeStart = Date.now();
    const stripeRes = await fetch("https://api.stripe.com/v1/", {
      headers: { Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}` },
    });
    checks.stripe = {
      status: stripeRes.ok ? "ok" : "degraded",
      latencyMs: Date.now() - stripeStart,
    };
  } catch (err) {
    checks.stripe = { status: "error", message: err.message };
  }

  // Check memory pressure (Node exposes memory, not disk, without extra dependencies)
  try {
    const mem = process.memoryUsage();
    checks.memory = {
      status: "ok",
      heapUsedMB: Math.round(mem.heapUsed / 1024 / 1024),
      rssMB: Math.round(mem.rss / 1024 / 1024),
    };
  } catch {
    checks.memory = { status: "unknown" };
  }

  // Determine overall status
  const hasError = Object.values(checks).some((c) => c.status === "error");
  const hasDegraded = Object.values(checks).some((c) => c.status === "degraded");
  const overallStatus = hasError ? "error" : hasDegraded ? "degraded" : "ok";
  const httpStatus = hasError ? 503 : 200;

  res.status(httpStatus).json({
    status: overallStatus,
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || "unknown",
    responseTimeMs: Date.now() - start,
    checks,
  });
});


    This endpoint returns a 503 if any critical dependency is down, which correctly signals to the uptime monitor that the service is unhealthy even though the process itself is running.

    Health Check Security



    Do not expose sensitive information in your health check response. Specifically:

  • Never include database connection strings
  • Never include API keys or secrets
  • Limit detailed health info to authenticated requests or internal networks
  • Consider having a public /health (basic) and a private /health/detailed (deep)


    app.get("/health/detailed", (req, res) => {
      const authHeader = req.headers.authorization;
      if (authHeader !== `Bearer ${process.env.HEALTH_CHECK_TOKEN}`) {
        return res.status(401).json({ error: "Unauthorized" });
      }
      // ... run deep health check
    });
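One optional hardening step (our suggestion, not part of the snippet above): the !== comparison leaks timing information, so a careful implementation can compare tokens in constant time with Node's crypto.timingSafeEqual:

    const crypto = require("crypto");

    // Constant-time token comparison; timingSafeEqual throws on unequal
    // lengths, so check lengths first
    function tokensMatch(provided, expected) {
      const a = Buffer.from(provided || "");
      const b = Buffer.from(expected || "");
      return a.length === b.length && crypto.timingSafeEqual(a, b);
    }

    // Usage: if (!tokensMatch(authHeader, `Bearer ${process.env.HEALTH_CHECK_TOKEN}`)) { ... }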


    Setting Up Uptime Monitoring with curl



    Before using a monitoring service, you can test your health check setup manually:

    # Basic availability check
    curl -o /dev/null -s -w "HTTP %{http_code} | Time: %{time_total}s | TTFB: %{time_starttransfer}s\n" \
      https://api.yourapp.com/health

    # Verbose SSL check
    curl -vI https://api.yourapp.com 2>&1 | grep -E "expire|subject|issuer"

    # Check with timeout (simulates monitoring threshold)
    curl --max-time 5 -s -w "\nTotal: %{time_total}s\n" \
      https://api.yourapp.com/health

    # Check response body contains expected content
    RESPONSE=$(curl -s https://api.yourapp.com/health)
    echo "$RESPONSE" | jq .
    if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null 2>&1; then
      echo "Health check PASSED"
    else
      echo "Health check FAILED"
    fi


    These commands are useful for debugging, but they are not a replacement for continuous monitoring. You need automated checks running 24/7 from external locations.

    Alerting That Works



    The monitoring check is only half the equation. The alert that fires when a check fails is equally important — and much easier to get wrong.

    Alert Fatigue



    The number one failure mode in uptime monitoring is alert fatigue. If your monitor alerts on every 500ms timeout spike, you will start ignoring alerts within a week. When a real outage happens, you will miss it because you have trained yourself to dismiss notifications.

    Effective alerting requires:

  • Confirmation checks: never alert on a single failure. Require 2-3 consecutive failures from different regions.
  • Appropriate thresholds: set response time alerts at 2-3x your normal P95, not at an arbitrary round number.
  • Severity levels: distinguish between "site is completely down" (page on-call) and "response time is elevated" (Slack notification).
  • Routing: critical alerts go to phone/PagerDuty, warnings go to Slack, informational goes to email.


Webhook Alert Example



    Most monitoring services can send alerts to a webhook URL. Here is how to receive and process alerts from PingCheck:

    // routes/webhooks/pingcheck.js
    const crypto = require("crypto");

    app.post("/webhooks/pingcheck", async (req, res) => {
      const { event, monitor, check } = req.body;

      // Verify webhook signature
      const signature = req.headers["x-pingcheck-signature"];
      const expected = crypto
        .createHmac("sha256", process.env.PINGCHECK_WEBHOOK_SECRET)
        .update(JSON.stringify(req.body))
        .digest("hex");

      if (signature !== expected) {
        return res.status(401).json({ error: "Invalid signature" });
      }

      switch (event) {
        case "monitor.down":
          // Critical: site is down
          await sendSlackAlert({
            channel: "#incidents",
            text: `🔴 ${monitor.name} is DOWN`,
            blocks: [
              {
                type: "section",
                text: {
                  type: "mrkdwn",
                  text: [
                    `*${monitor.name}* is unreachable`,
                    `URL: ${monitor.url}`,
                    `Status: ${check.statusCode || "timeout"}`,
                    `Region: ${check.region}`,
                    `Downtime started: ${check.timestamp}`,
                  ].join("\n"),
                },
              },
            ],
          });

          // Also page on-call via PagerDuty/Opsgenie
          await pageOnCall({
            summary: `${monitor.name} is DOWN`,
            severity: "critical",
            source: "pingcheck",
          });
          break;

        case "monitor.up":
          // Recovery
          await sendSlackAlert({
            channel: "#incidents",
            text: `🟢 ${monitor.name} is back UP (was down for ${check.downtimeDuration})`,
          });
          break;

        case "ssl.expiring":
          // SSL certificate expiring soon
          await sendSlackAlert({
            channel: "#engineering",
            text: `⚠️ SSL certificate for ${monitor.url} expires in ${check.daysUntilExpiry} days`,
          });
          break;
      }

      res.status(200).json({ received: true });
    });


    This webhook handler differentiates between severity levels: downtime gets a Slack alert AND a page, recovery gets only a Slack notification, and SSL expiry gets a warning in the engineering channel.

    Uptime Monitoring Metrics That Matter



    Uptime Percentage



    The classic metric. A 99.9% uptime target means 8.77 hours of allowed downtime per year, or 43.8 minutes per month. For context:

    | Uptime | Monthly Downtime | Annual Downtime |
    |---|---|---|
    | 99.0% | 7.31 hours | 3.65 days |
    | 99.5% | 3.65 hours | 1.83 days |
    | 99.9% | 43.8 minutes | 8.77 hours |
    | 99.95% | 21.9 minutes | 4.38 hours |
    | 99.99% | 4.38 minutes | 52.6 minutes |

    Most solo-developer SaaS products should aim for 99.9%. Achieving 99.99% requires redundant infrastructure, automated failover, and on-call rotations — overkill for a one-person team.
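The table values fall out of simple arithmetic. If you want to sanity-check a target yourself, a few lines of Node will do it (the 365.25-day year is an assumption that matches the rounding above):

    // Downtime budget for a given uptime target
    function downtimeBudget(uptimePct) {
      const fractionDown = 1 - uptimePct / 100;
      return {
        perYearHours: (fractionDown * 365.25 * 24).toFixed(2),
        perMonthMinutes: (fractionDown * (365.25 / 12) * 24 * 60).toFixed(1),
      };
    }

    console.log(downtimeBudget(99.9)); // { perYearHours: '8.77', perMonthMinutes: '43.8' }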

    Mean Time to Detection (MTTD)



    How quickly do you learn about an outage? Without monitoring, MTTD equals however long it takes a user to complain. With 60-second check intervals and 2-check confirmation, MTTD drops to approximately 2-3 minutes.

    Mean Time to Recovery (MTTR)



    How quickly do you fix the issue after learning about it? Monitoring cannot reduce MTTR directly, but good health checks that identify which dependency failed (database vs. Redis vs. external API) dramatically reduce diagnosis time.

    Choosing a Monitoring Tool



    The monitoring tool market is crowded. Here is what to evaluate:

  • Check frequency: how often can you run checks? 60 seconds is standard; 30 seconds or less is premium.
  • Check locations: how many geographic regions does the tool check from? More regions = fewer false positives.
  • Alert channels: Slack, email, SMS, PagerDuty, webhook support.
  • SSL monitoring: automatic certificate expiry alerts.
  • Status page integration: can downtime events automatically update a public status page?
  • Pricing: per-monitor, per-check, or flat-rate.


Luxkern PingCheck is included in every Luxkern plan (starting at €19/month for the Solo plan). It covers HTTP checks, SSL monitoring, multi-region verification, and webhook alerts — the essentials without the feature bloat.

    For a direct comparison with a popular alternative, read PingCheck vs BetterStack. For a hands-on setup guide covering HTTP checks, SSL monitoring, and alerting, see How to Monitor API Endpoints.

    Try Luxkern PingCheck free — no credit card required.