What Is Uptime Monitoring
Learn what uptime monitoring is, why you need it before users notice downtime, and how to set up health checks and alerting for your web services.
Your SaaS app went down at 3 PM on a Tuesday. You found out at 4:30 PM when a customer tweeted about it. In that 90-minute window, 340 API requests failed, 12 users tried to sign up and bounced, and your Stripe webhook queue backed up. All of this was preventable with a single HTTP check running every 60 seconds. Uptime monitoring is the practice of continuously testing whether your web services are accessible, responsive, and returning correct results. When a check fails, you get alerted immediately — not when a user complains on social media. This article explains exactly how uptime monitoring works, what to monitor, how to build effective health checks, and how to set up alerting that actually wakes you up when it matters.
How Uptime Monitoring Works
At its core, uptime monitoring is simple: an external service sends an HTTP request to your endpoint at regular intervals and checks the response. If the response meets your criteria (correct status code, response time under a threshold, body contains expected content), the check passes. If it fails, the monitor records the failure and notifies you.
The "external" part is critical. Monitoring from inside your own infrastructure tells you whether your service can reach itself, which is rarely useful. External monitoring tells you whether your users can reach your service, which is what actually matters.
A typical monitoring flow looks like this:
1. The monitor sends GET https://api.yourapp.com/health from multiple geographic regions.
2. It expects a 200 OK response and a JSON body containing "status": "ok".
3. On a failed check, it retries from a second region before alerting.

The multi-region retry is important. A single failed check could be a network blip between the monitoring service and your server. Confirming from a second location eliminates most false positives.
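The per-check logic can be sketched as a small function. This is a minimal sketch, not a real monitoring service: the thresholds and the injectable fetchFn parameter are illustrative assumptions, added so the check can be tested without a live endpoint.

```javascript
// Sketch of a single external uptime check: status code, latency, and body
// content are all verified. Thresholds here are illustrative defaults.
async function runCheck({ url, timeoutMs = 5000, maxLatencyMs = 2000, fetchFn = fetch }) {
  const start = Date.now();
  try {
    const res = await fetchFn(url, { signal: AbortSignal.timeout(timeoutMs) });
    const latencyMs = Date.now() - start;
    const body = await res.json().catch(() => ({}));
    const passed =
      res.status === 200 &&        // correct status code
      latencyMs <= maxLatencyMs && // response time under threshold
      body.status === "ok";        // body contains expected content
    return { passed, statusCode: res.status, latencyMs };
  } catch (err) {
    // Timeouts and network errors count as failed checks
    return { passed: false, error: err.message, latencyMs: Date.now() - start };
  }
}
```

A real monitor would run this on a timer from several regions and feed the results into the confirmation-and-alert logic described below.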
What to Monitor
Most developers start by monitoring their homepage and stop there. That covers one failure mode and misses dozens of others. Here is a comprehensive monitoring strategy:
Primary Endpoints
- The homepage and any marketing pages users land on first
- The API health endpoint (/health)
- Critical user flows such as login and signup

Infrastructure Dependencies
- Database and cache reachability (via a deep health check, covered below)
- DNS resolution for your domains

SSL and Security
- Certificate expiry, alerted well before the expiry date
- Unexpected redirects or certificate changes

Performance
- Response time thresholds, not just availability — a page that takes 20 seconds to load is effectively down
Building Effective Health Check Endpoints
A good health check endpoint does more than return 200 OK. It verifies that your application can actually serve requests by testing its critical dependencies.

Basic Health Check
// routes/health.js — basic health check
app.get("/health", (req, res) => {
res.status(200).json({
status: "ok",
timestamp: new Date().toISOString(),
version: process.env.APP_VERSION || "unknown",
uptime: process.uptime(),
});
});

This confirms the Node.js process is running and can handle HTTP requests. But it does not tell you whether the application can actually do its job.
Deep Health Check
A deep health check tests every critical dependency:
// routes/health.js — deep health check with dependency verification
app.get("/health", async (req, res) => {
const checks = {};
const start = Date.now();
// Check database
try {
const dbStart = Date.now();
await db.query("SELECT 1");
checks.database = {
status: "ok",
latencyMs: Date.now() - dbStart,
};
} catch (err) {
checks.database = {
status: "error",
message: err.message,
};
}
// Check Redis
try {
const redisStart = Date.now();
await redis.ping();
checks.redis = {
status: "ok",
latencyMs: Date.now() - redisStart,
};
} catch (err) {
checks.redis = {
status: "error",
message: err.message,
};
}
// Check external payment provider
try {
const stripeStart = Date.now();
const stripeRes = await fetch("https://api.stripe.com/v1/", {
headers: { Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}` },
});
checks.stripe = {
status: stripeRes.ok ? "ok" : "degraded",
latencyMs: Date.now() - stripeStart,
};
} catch (err) {
checks.stripe = {
status: "error",
message: err.message,
};
}
// Check memory usage (a proxy for resource-exhaustion problems)
try {
checks.memory = {
status: "ok",
heapUsedMB: Math.round(process.memoryUsage().heapUsed / 1024 / 1024),
rssMB: Math.round(process.memoryUsage().rss / 1024 / 1024),
};
} catch {
checks.memory = { status: "unknown" };
}
// Determine overall status
const hasError = Object.values(checks).some((c) => c.status === "error");
const hasDegraded = Object.values(checks).some((c) => c.status === "degraded");
const overallStatus = hasError ? "error" : hasDegraded ? "degraded" : "ok";
const httpStatus = hasError ? 503 : 200;
res.status(httpStatus).json({
status: overallStatus,
timestamp: new Date().toISOString(),
version: process.env.APP_VERSION || "unknown",
responseTimeMs: Date.now() - start,
checks,
});
});

This endpoint returns a 503 if any critical dependency is down, which correctly signals to the uptime monitor that the service is unhealthy even though the process itself is running.

Health Check Security
Do not expose sensitive information in your health check response. Specifically:
- Do not return raw error messages, stack traces, or internal hostnames in the public response
- Never include credentials or connection strings
- Expose a public /health (basic) and a private /health/detailed (deep)

app.get("/health/detailed", (req, res) => {
const authHeader = req.headers.authorization;
if (authHeader !== `Bearer ${process.env.HEALTH_CHECK_TOKEN}`) {
return res.status(401).json({ error: "Unauthorized" });
}
// ... run deep health check
});

Setting Up Uptime Monitoring with curl
Before using a monitoring service, you can test your health check setup manually:
# Basic availability check
curl -o /dev/null -s -w "HTTP %{http_code} | Time: %{time_total}s | TTFB: %{time_starttransfer}s\n" \
https://api.yourapp.com/health
# Verbose SSL check
curl -vI https://api.yourapp.com 2>&1 | grep -E "expire|subject|issuer"
# Check with timeout (simulates monitoring threshold)
curl --max-time 5 -s -w "\nTotal: %{time_total}s\n" \
https://api.yourapp.com/health
# Check response body contains expected content
RESPONSE=$(curl -s https://api.yourapp.com/health)
echo "$RESPONSE" | jq .
if echo "$RESPONSE" | jq -e '.status == "ok"' > /dev/null 2>&1; then
echo "Health check PASSED"
else
echo "Health check FAILED"
fi

These commands are useful for debugging, but they are not a replacement for continuous monitoring. You need automated checks running 24/7 from external locations.
Alerting That Works
The monitoring check is only half the equation. The alert that fires when a check fails is equally important — and much easier to get wrong.
Alert Fatigue
The number one failure mode in uptime monitoring is alert fatigue. If your monitor alerts on every 500ms timeout spike, you will start ignoring alerts within a week. When a real outage happens, you will miss it because you have trained yourself to dismiss notifications.
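A common mitigation is to require several consecutive failures before firing, and to notify recovery exactly once. A minimal sketch of that debounce logic, with an illustrative threshold and hypothetical callback names:

```javascript
// Sketch: only alert after N consecutive failed checks, and send a single
// recovery notification when checks pass again. Names are illustrative.
function createAlerter({ threshold = 2, onDown, onUp }) {
  let consecutiveFailures = 0;
  let alerting = false;
  return function record(checkPassed) {
    if (checkPassed) {
      consecutiveFailures = 0;
      if (alerting) {
        alerting = false;
        onUp(); // single recovery notification
      }
      return;
    }
    consecutiveFailures += 1;
    if (!alerting && consecutiveFailures >= threshold) {
      alerting = true;
      onDown(); // fire once, not on every subsequent failure
    }
  };
}
```

This filters out one-off blips while still catching sustained outages, at the cost of roughly one extra check interval of detection delay.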
Effective alerting requires:
- Confirmation from a second check or region before firing, to filter out blips
- Severity tiers: page someone for downtime, send informational events to chat only
- A recovery notification so you know when the incident is over
Webhook Alert Example
Most monitoring services can send alerts to a webhook URL. Here is how to receive and process alerts from PingCheck:
// routes/webhooks/pingcheck.js
const crypto = require("node:crypto");
app.post("/webhooks/pingcheck", async (req, res) => {
const { event, monitor, check } = req.body;
// Verify webhook signature
const signature = req.headers["x-pingcheck-signature"];
const expected = crypto
.createHmac("sha256", process.env.PINGCHECK_WEBHOOK_SECRET)
.update(JSON.stringify(req.body))
.digest("hex");
// (in production, prefer crypto.timingSafeEqual for the comparison)
if (signature !== expected) {
return res.status(401).json({ error: "Invalid signature" });
}
switch (event) {
case "monitor.down":
// Critical: site is down
await sendSlackAlert({
channel: "#incidents",
text: `🔴 ${monitor.name} is DOWN`,
blocks: [
{
type: "section",
text: {
type: "mrkdwn",
text: [
`*${monitor.name}* is unreachable`,
`URL: ${monitor.url}`,
`Status: ${check.statusCode || "timeout"}`,
`Region: ${check.region}`,
`Downtime started: ${check.timestamp}`,
].join("\n"),
},
},
],
});
// Also page on-call via PagerDuty/Opsgenie
await pageOnCall({
summary: `${monitor.name} is DOWN`,
severity: "critical",
source: "pingcheck",
});
break;
case "monitor.up":
// Recovery
await sendSlackAlert({
channel: "#incidents",
text: `🟢 ${monitor.name} is back UP (was down for ${check.downtimeDuration})`,
});
break;
case "ssl.expiring":
// SSL certificate expiring soon
await sendSlackAlert({
channel: "#engineering",
text: `⚠️ SSL certificate for ${monitor.url} expires in ${check.daysUntilExpiry} days`,
});
break;
}
res.status(200).json({ received: true });
});

This webhook handler differentiates between severity levels: downtime gets a Slack alert AND a page, recovery gets only a Slack notification, and SSL expiry gets a warning in the engineering channel.
Uptime Monitoring Metrics That Matter
Uptime Percentage
The classic metric. A 99.9% uptime target means 8.77 hours of allowed downtime per year, or 43.8 minutes per month. For context:
| Uptime | Monthly Downtime | Annual Downtime |
|---|---|---|
| 99.0% | 7.31 hours | 3.65 days |
| 99.5% | 3.65 hours | 1.83 days |
| 99.9% | 43.8 minutes | 8.77 hours |
| 99.95% | 21.9 minutes | 4.38 hours |
| 99.99% | 4.38 minutes | 52.6 minutes |
Most solo-developer SaaS products should aim for 99.9%. Achieving 99.99% requires redundant infrastructure, automated failover, and on-call rotations — overkill for a one-person team.
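The downtime figures in the table follow directly from the percentage, using an average month of 365.25 / 12 days:

```javascript
// Convert an uptime percentage into its allowed downtime budget.
function downtimeBudget(uptimePercent) {
  const downFraction = 1 - uptimePercent / 100;
  const minutesPerYear = 365.25 * 24 * 60; // average year, incl. leap days
  return {
    perYearHours: +((downFraction * minutesPerYear) / 60).toFixed(2),
    perMonthMinutes: +((downFraction * minutesPerYear) / 12).toFixed(1),
  };
}
// downtimeBudget(99.9) → { perYearHours: 8.77, perMonthMinutes: 43.8 }
```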
Mean Time to Detection (MTTD)
How quickly do you learn about an outage? Without monitoring, MTTD equals however long it takes a user to complain. With 60-second check intervals and 2-check confirmation, MTTD drops to approximately 2-3 minutes.
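A rough model of that worst case: a failure just after a passing check waits one full interval for the next check, then one more for confirmation, plus alert delivery (the 30-second delivery delay here is an assumption):

```javascript
// Rough worst-case MTTD model: one interval per confirming check,
// plus an assumed alert-delivery delay.
function worstCaseMttdSeconds({ intervalSeconds, confirmations, alertDelaySeconds = 30 }) {
  return intervalSeconds * confirmations + alertDelaySeconds;
}
// worstCaseMttdSeconds({ intervalSeconds: 60, confirmations: 2 }) → 150
```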
Mean Time to Recovery (MTTR)
How quickly do you fix the issue after learning about it? Monitoring cannot reduce MTTR directly, but good health checks that identify which dependency failed (database vs. Redis vs. external API) dramatically reduce diagnosis time.
Choosing a Monitoring Tool
The monitoring tool market is crowded. Here is what to evaluate:
Luxkern PingCheck is included in every Luxkern plan (starting at €19/month for the Solo plan). It covers HTTP checks, SSL monitoring, multi-region verification, and webhook alerts — the essentials without the feature bloat.
For a direct comparison with a popular alternative, read PingCheck vs BetterStack. For a hands-on setup guide covering HTTP checks, SSL monitoring, and alerting, see How to Monitor API Endpoints.
Try Luxkern PingCheck free — no credit card required.