
What Is Uptime Monitoring

Learn what uptime monitoring is, why you need it before users notice downtime, and how to set up health checks and alerting for your web services.

uptime-monitoring · health-checks · alerting · devops · reliability

Your SaaS app went down at 3 PM on a Tuesday. You found out at 4:30 PM when a customer tweeted about it. In that 90-minute window, 340 API requests failed, 12 users tried to sign up and bounced, and your Stripe webhook queue backed up. All of this was preventable with a single HTTP check running every 60 seconds. Uptime monitoring is the practice of continuously testing whether your web services are accessible, responsive, and returning correct results. When a check fails, you get alerted immediately — not when a user complains on social media. This article explains exactly how uptime monitoring works, what to monitor, how to build effective health checks, and how to set up alerting that actually wakes you up when it matters.

How Uptime Monitoring Works



At its core, uptime monitoring is simple: an external service sends an HTTP request to your endpoint at regular intervals and checks the response. If the response meets your criteria (correct status code, response time under a threshold, body contains expected content), the check passes. If it fails, the monitor records the failure and notifies you.

The "external" part is critical. Monitoring from inside your own infrastructure tells you whether your service can reach itself, which is rarely useful. External monitoring tells you whether your users can reach your service, which is what actually matters.

A typical monitoring flow looks like this:

  • Monitor sends GET https://api.yourapp.com/health from multiple geographic regions
  • Your server responds with 200 OK and a JSON body
  • Monitor verifies: status is 200, response time is under 5 seconds, body contains "status": "ok"
  • If all checks pass: record as "up"
  • If any check fails: retry from a different region to avoid false positives
  • If retry also fails: mark as "down" and send alert


The multi-region retry is important. A single failed check could be a network blip between the monitoring service and your server. Confirming from a second location eliminates most false positives. The sketch below shows this check-and-confirm logic in miniature.
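To make the flow concrete, here is a minimal monitor-side sketch in Node.js (18+, for the global fetch and AbortSignal.timeout). The function names and thresholds are illustrative, not how PingCheck is actually implemented:

    // check.js — minimal check-and-confirm logic (illustrative sketch)
    async function runCheck(url, { timeoutMs = 5000 } = {}) {
      const start = Date.now();
      try {
        const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        const body = await res.json().catch(() => ({}));
        return {
          up: res.status === 200 && body.status === "ok",
          status: res.status,
          latencyMs: Date.now() - start,
        };
      } catch (err) {
        // Timeouts, DNS failures, TLS errors, and refused connections all count as down
        return { up: false, error: err.message, latencyMs: Date.now() - start };
      }
    }

    async function checkWithConfirmation(url) {
      const first = await runCheck(url);
      if (first.up) return { state: "up", first };
      // A real monitor would route this retry through a second region
      const second = await runCheck(url);
      return second.up ? { state: "up", first, second } : { state: "down", first, second };
    }

A production monitor runs this on a schedule from several regions and feeds the result into the alerting pipeline described later in this article.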

    What to Monitor



    Most developers start by monitoring their homepage and stop there. That covers one failure mode and misses dozens of others. Here is a comprehensive monitoring strategy:

    Primary Endpoints



  • Homepage / marketing site: basic availability
  • API health endpoint: confirms the application process is running
  • Authentication endpoint: confirms login works
  • Core business endpoint: the most important API route your customers use


Infrastructure Dependencies



  • Database connectivity: your health check should query the database
  • Cache connectivity: verify Redis/Memcached is responsive
  • External API dependencies: check that third-party services you depend on are reachable


SSL and Security



  • Certificate expiry: alert 14-30 days before your SSL certificate expires
  • Certificate chain validity: verify the full chain, not just the leaf certificate
  • HTTPS redirect: confirm HTTP requests redirect to HTTPS


Performance



  • Response time thresholds: alert when response time exceeds a baseline (e.g., 2x your normal P95)
  • Time to first byte (TTFB): measures server processing time independent of payload size
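Written down as configuration, the strategy above might look like the following. Every name, URL, and threshold here is hypothetical; the point is that each category becomes a concrete, checkable entry:

    // monitors.js — hypothetical monitor list covering each category above
    const monitors = [
      // Primary endpoints
      { name: "Homepage", url: "https://yourapp.com/", intervalSec: 60 },
      { name: "API health", url: "https://api.yourapp.com/health", intervalSec: 60 },
      { name: "Auth", url: "https://api.yourapp.com/auth/login", intervalSec: 300 },
      // Infrastructure dependencies, exercised via the deep health check described below
      { name: "Deep health", url: "https://api.yourapp.com/health/detailed", intervalSec: 120 },
      // SSL and security
      { name: "Cert expiry", url: "https://api.yourapp.com", type: "ssl", alertDaysBefore: 14 },
      // Performance: alert at roughly 2x the normal P95
      { name: "Core API latency", url: "https://api.yourapp.com/v1/orders", maxLatencyMs: 1200 },
    ];

    module.exports = monitors;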


Building Effective Health Check Endpoints



    A good health check endpoint does more than return 200 OK. It verifies that your application can actually serve requests by testing its critical dependencies.

    Basic Health Check



    // routes/health.js — basic health check
    app.get("/health", (req, res) => {
      res.status(200).json({
        status: "ok",
        timestamp: new Date().toISOString(),
        version: process.env.APP_VERSION || "unknown",
        uptime: process.uptime(),
      });
    });


    This confirms the Node.js process is running and can handle HTTP requests. But it does not tell you whether the application can actually do its job.

    Deep Health Check



    A deep health check tests every critical dependency:

    // routes/health.js — deep health check with dependency verification
    app.get("/health", async (req, res) => {
      const checks = {};
      const start = Date.now();

  // Check database connectivity
  try {
    const dbStart = Date.now();
    await db.query("SELECT 1");
    checks.database = { status: "ok", latencyMs: Date.now() - dbStart };
  } catch (err) {
    checks.database = { status: "error", message: err.message };
  }

  // Check Redis connectivity
  try {
    const redisStart = Date.now();
    await redis.ping();
    checks.redis = { status: "ok", latencyMs: Date.now() - redisStart };
  } catch (err) {
    checks.redis = { status: "error", message: err.message };
  }

  // Check external payment provider
  try {
    const stripeStart = Date.now();
    const stripeRes = await fetch("https://api.stripe.com/v1/", {
      headers: { Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}` },
    });
    checks.stripe = {
      status: stripeRes.ok ? "ok" : "degraded",
      latencyMs: Date.now() - stripeStart,
    };
  } catch (err) {
    checks.stripe = { status: "error", message: err.message };
  }

  // Check memory pressure (Node exposes memory, not disk, without extra dependencies)
  try {
    const mem = process.memoryUsage();
    checks.memory = {
      status: "ok",
      heapUsedMB: Math.round(mem.heapUsed / 1024 / 1024),
      rssMB: Math.round(mem.rss / 1024 / 1024),
    };
  } catch {
    checks.memory = { status: "unknown" };
  }

  // Determine overall status
  const hasError = Object.values(checks).some((c) => c.status === "error");
  const hasDegraded = Object.values(checks).some((c) => c.status === "degraded");
  const overallStatus = hasError ? "error" : hasDegraded ? "degraded" : "ok";
  const httpStatus = hasError ? 503 : 200;

  res.status(httpStatus).json({
    status: overallStatus,
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || "unknown",
    responseTimeMs: Date.now() - start,
    checks,
  });
});


    This endpoint returns a 503 if any critical dependency is down, which correctly signals to the uptime monitor that the service is unhealthy even though the process itself is running.

    Health Check Security



    Do not expose sensitive information in your health check response. Specifically:

  • Never include database connection strings
  • Never include API keys or secrets
  • Limit detailed health info to authenticated requests or internal networks
  • Consider having a public /health (basic) and a private /health/detailed (deep)


    app.get("/health/detailed", (req, res) => {
      const authHeader = req.headers.authorization;
      if (authHeader !== `Bearer ${process.env.HEALTH_CHECK_TOKEN}`) {
        return res.status(401).json({ error: "Unauthorized" });
      }
      // ... run deep health check
    });
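One optional hardening step (our suggestion, not part of the snippet above): the !== comparison leaks timing information, so a careful implementation can compare tokens in constant time with Node's crypto.timingSafeEqual:

    const crypto = require("crypto");

    // Constant-time token comparison; timingSafeEqual throws on unequal
    // lengths, so check lengths first
    function tokensMatch(provided, expected) {
      const a = Buffer.from(provided || "");
      const b = Buffer.from(expected || "");
      return a.length === b.length && crypto.timingSafeEqual(a, b);
    }

    // Usage: if (!tokensMatch(authHeader, `Bearer ${process.env.HEALTH_CHECK_TOKEN}`)) { ... }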


    Setting Up Uptime Monitoring with curl



    Before using a monitoring service, you can test your health check setup manually:

    # Basic availability check
    curl -o /dev/null -s -w "HTTP %{http_code} | Time: %{time_total}s | TTFB: %{time_starttransfer}s\n" \
      https://api.yourapp.com/health

    # Verbose SSL check
    curl -vI https://api.yourapp.com 2>&1 | grep -E "expire|subject|issuer"

    # Check with timeout (simulates monitoring threshold)
    curl --max-time 5 -s -w "\nTotal: %{time_total}s\n" \
      https://api.yourapp.com/health

    # Check response body contains expected content
    RESPONSE=$(curl -s https://api.yourapp.com/health)
    echo "$RESPONSE" | jq .
    if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null 2>&1; then
      echo "Health check PASSED"
    else
      echo "Health check FAILED"
    fi


    These commands are useful for debugging, but they are not a replacement for continuous monitoring. You need automated checks running 24/7 from external locations.

    Alerting That Works



    The monitoring check is only half the equation. The alert that fires when a check fails is equally important — and much easier to get wrong.

    Alert Fatigue



    The number one failure mode in uptime monitoring is alert fatigue. If your monitor alerts on every 500ms timeout spike, you will start ignoring alerts within a week. When a real outage happens, you will miss it because you have trained yourself to dismiss notifications.

    Effective alerting requires:

  • Confirmation checks: never alert on a single failure. Require 2-3 consecutive failures from different regions.
  • Appropriate thresholds: set response time alerts at 2-3x your normal P95, not at an arbitrary round number.
  • Severity levels: distinguish between "site is completely down" (page on-call) and "response time is elevated" (Slack notification).
  • Routing: critical alerts go to phone/PagerDuty, warnings go to Slack, informational goes to email.


Webhook Alert Example



    Most monitoring services can send alerts to a webhook URL. Here is how to receive and process alerts from PingCheck:

    // routes/webhooks/pingcheck.js
    const crypto = require("crypto");

    app.post("/webhooks/pingcheck", async (req, res) => {
      const { event, monitor, check } = req.body;

      // Verify webhook signature
      const signature = req.headers["x-pingcheck-signature"];
      const expected = crypto
        .createHmac("sha256", process.env.PINGCHECK_WEBHOOK_SECRET)
        .update(JSON.stringify(req.body))
        .digest("hex");

      if (signature !== expected) {
        return res.status(401).json({ error: "Invalid signature" });
      }

      switch (event) {
        case "monitor.down":
          // Critical: site is down
          await sendSlackAlert({
            channel: "#incidents",
            text: `🔴 ${monitor.name} is DOWN`,
            blocks: [
              {
                type: "section",
                text: {
                  type: "mrkdwn",
                  text: [
                    `*${monitor.name}* is unreachable`,
                    `URL: ${monitor.url}`,
                    `Status: ${check.statusCode || "timeout"}`,
                    `Region: ${check.region}`,
                    `Downtime started: ${check.timestamp}`,
                  ].join("\n"),
                },
              },
            ],
          });

          // Also page on-call via PagerDuty/Opsgenie
          await pageOnCall({
            summary: `${monitor.name} is DOWN`,
            severity: "critical",
            source: "pingcheck",
          });
          break;

        case "monitor.up":
          // Recovery
          await sendSlackAlert({
            channel: "#incidents",
            text: `🟢 ${monitor.name} is back UP (was down for ${check.downtimeDuration})`,
          });
          break;

        case "ssl.expiring":
          // SSL certificate expiring soon
          await sendSlackAlert({
            channel: "#engineering",
            text: `⚠️ SSL certificate for ${monitor.url} expires in ${check.daysUntilExpiry} days`,
          });
          break;
      }

      res.status(200).json({ received: true });
    });


    This webhook handler differentiates between severity levels: downtime gets a Slack alert AND a page, recovery gets only a Slack notification, and SSL expiry gets a warning in the engineering channel.

    Uptime Monitoring Metrics That Matter



    Uptime Percentage



    The classic metric. A 99.9% uptime target means 8.77 hours of allowed downtime per year, or 43.8 minutes per month. For context:

    | Uptime | Monthly Downtime | Annual Downtime |
    |---|---|---|
    | 99.0% | 7.31 hours | 3.65 days |
    | 99.5% | 3.65 hours | 1.83 days |
    | 99.9% | 43.8 minutes | 8.77 hours |
    | 99.95% | 21.9 minutes | 4.38 hours |
    | 99.99% | 4.38 minutes | 52.6 minutes |

    Most solo-developer SaaS products should aim for 99.9%. Achieving 99.99% requires redundant infrastructure, automated failover, and on-call rotations — overkill for a one-person team.
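The table values fall out of simple arithmetic. If you want to sanity-check a target yourself, a few lines of Node will do it (the 365.25-day year is an assumption that matches the rounding above):

    // Downtime budget for a given uptime target
    function downtimeBudget(uptimePct) {
      const fractionDown = 1 - uptimePct / 100;
      return {
        perYearHours: (fractionDown * 365.25 * 24).toFixed(2),
        perMonthMinutes: (fractionDown * (365.25 / 12) * 24 * 60).toFixed(1),
      };
    }

    console.log(downtimeBudget(99.9)); // { perYearHours: '8.77', perMonthMinutes: '43.8' }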

    Mean Time to Detection (MTTD)



    How quickly do you learn about an outage? Without monitoring, MTTD equals however long it takes a user to complain. With 60-second check intervals and 2-check confirmation, MTTD drops to approximately 2-3 minutes.

    Mean Time to Recovery (MTTR)



    How quickly do you fix the issue after learning about it? Monitoring cannot reduce MTTR directly, but good health checks that identify which dependency failed (database vs. Redis vs. external API) dramatically reduce diagnosis time.

    Choosing a Monitoring Tool



    The monitoring tool market is crowded. Here is what to evaluate:

  • Check frequency: how often can you run checks? 60 seconds is standard; 30 seconds or less is premium.
  • Check locations: how many geographic regions does the tool check from? More regions = fewer false positives.
  • Alert channels: Slack, email, SMS, PagerDuty, webhook support.
  • SSL monitoring: automatic certificate expiry alerts.
  • Status page integration: can downtime events automatically update a public status page?
  • Pricing: per-monitor, per-check, or flat-rate.


Luxkern PingCheck is included in every Luxkern plan (starting at €19/month for the Solo plan). It covers HTTP checks, SSL monitoring, multi-region verification, and webhook alerts — the essentials without the feature bloat.

    For a direct comparison with a popular alternative, read PingCheck vs BetterStack. For a hands-on setup guide covering HTTP checks, SSL monitoring, and alerting, see How to Monitor API Endpoints.

    Try Luxkern PingCheck free — no credit card required.