How to Monitor API Endpoints
Step-by-step guide to monitoring API endpoints with HTTP checks, SSL monitoring, and alerting. Includes curl, webhook, and PingCheck API examples.
Your payment API went down at 2:17 PM on a Tuesday. For 47 minutes, every checkout attempt returned a 502 Bad Gateway. You found out at 3:04 PM -- not from your monitoring stack, but from a customer who tweeted "is @yourapp down?" Your uptime dashboard showed green because it was pinging your marketing site, not your API. In those 47 minutes, 312 transactions failed, your support queue tripled, and your Stripe webhook backlog grew to 1,400 events. The root cause was a misconfigured Nginx upstream after a deploy. A single HTTP check against
/api/health with a 30-second interval would have caught it in under a minute. This guide shows you how to set up that check -- and everything beyond it -- using curl for manual diagnostics and PingCheck for automated continuous monitoring.
What "Healthy" Actually Means for an API
Most monitoring setups check one thing: does the endpoint return a 200? That catches roughly 40% of real-world API failures. The other 60% are subtler: the API responds but takes 8 seconds instead of 200 milliseconds, the response body is empty, the JSON structure changed after a deploy, or the SSL certificate expired and every client using certificate pinning is getting rejected.
A complete health check covers five dimensions:
1. Reachability -- the endpoint accepts connections at all.
2. Status code -- it returns the code you expect (usually 200).
3. Response time -- latency stays within your normal baseline.
4. SSL certificate validity -- the certificate chain is valid and not close to expiry.
5. Response body content -- the payload matches expectations (e.g. "status":"ok", valid JSON structure, or a minimum payload size).
If you are only checking dimensions 1 and 2, you are flying with instruments that cover less than half the dashboard. The remaining dimensions are where the sneaky outages live.
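A health endpoint built for these dimensions should report per-dependency status in its body, not just a bare 200. Here is a minimal sketch in plain Node; `checkDatabase` and `checkQueue` are hypothetical stand-ins for your real dependency pings:

```javascript
import http from "node:http";

// Hypothetical stand-ins for real dependency pings (e.g. SELECT 1, broker heartbeat).
async function checkDatabase() { return true; }
async function checkQueue() { return true; }

// Pure helper: map dependency results to an HTTP status and a JSON body.
// A degraded dependency flips both the status code and the "status" field,
// so a monitor matching "status":"ok" in the body catches partial failures
// that a bare 200 check would miss.
function healthResponse(deps) {
  const healthy = Object.values(deps).every(Boolean);
  return {
    statusCode: healthy ? 200 : 503,
    body: JSON.stringify({ status: healthy ? "ok" : "degraded", ...deps }),
  };
}

const server = http.createServer(async (req, res) => {
  if (req.url !== "/health") { res.writeHead(404); res.end(); return; }
  const deps = { db: await checkDatabase(), queue: await checkQueue() };
  const { statusCode, body } = healthResponse(deps);
  res.writeHead(statusCode, { "Content-Type": "application/json" });
  res.end(body);
});
// server.listen(3000);
```

Keeping the status mapping in a pure function makes it trivial to unit-test the degraded path without standing up a server.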
Manual Diagnostics with curl
Before you automate anything, you need to understand your API's baseline behavior. curl gives you a complete timing breakdown that no GUI tool matches for precision.
Full Timing Breakdown
# Measure every phase of the request lifecycle
curl -w "\n--- Timing ---
DNS Lookup: %{time_namelookup}s
TCP Connect: %{time_connect}s
TLS Handshake: %{time_appconnect}s
TTFB: %{time_starttransfer}s
Total: %{time_total}s
HTTP Code: %{http_code}
Download Size: %{size_download} bytes
" -s -o /tmp/api-response.json \
https://api.yourapp.com/v1/health
# Pretty-print the response
cat /tmp/api-response.json | python3 -m json.tool
Run this 10 times across different hours. You will get a distribution of response times that tells you where to set your thresholds. If your P95 is 380ms on a normal day, set your alert threshold at 1,200ms (roughly 3x) -- tight enough to catch real problems, loose enough to avoid false alarms from occasional slow responses.
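Turning those samples into a threshold is simple arithmetic. A sketch in Node, using the nearest-rank percentile and the ~3x rule of thumb described above (the multiplier is a heuristic, not a PingCheck setting):

```javascript
// Nearest-rank P95 over sampled response times (milliseconds).
function p95(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil(0.95 * sorted.length) - 1; // nearest-rank index
  return sorted[idx];
}

// Alert threshold = P95 × multiplier (default 3x, per the rule of thumb above).
function alertThreshold(samples, multiplier = 3) {
  return Math.round(p95(samples) * multiplier);
}

const samples = [210, 190, 240, 380, 220, 205, 260, 230, 195, 215];
console.log(p95(samples));            // → 380
console.log(alertThreshold(samples)); // → 1140
```

With 10 samples, nearest-rank P95 is simply the largest value; collect more samples across different hours for a stabler estimate.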
Here is what each timing field means for diagnosing issues:
- DNS Lookup (time_namelookup): if this is high, the problem is name resolution -- your DNS provider or resolver caching, not your server.
- TCP Connect (time_connect): a large gap after DNS lookup points at network routing or firewall issues between the client and your server.
- TLS Handshake (time_appconnect): slow handshakes suggest certificate chain problems or an overloaded TLS terminator.
- TTFB (time_starttransfer): this is server processing time. If it dominates, /health might be heavier than you think, or the application server is under load.
- Total (time_total): everything above plus the body download; compare it against TTFB to see whether payload size is a factor.
SSL Certificate Check
# Check certificate expiry and details
echo | openssl s_client -servername api.yourapp.com \
-connect api.yourapp.com:443 2>/dev/null | \
openssl x509 -noout -dates -subject -issuer
Output:
notBefore=Mar 1 00:00:00 2026 GMT
notAfter=May 30 23:59:59 2026 GMT
subject=CN = api.yourapp.com
issuer=O = Let's Encrypt, CN = R3
A 2025 analysis by Netcraft found that 3.8% of production SSL certificates expired without renewal each quarter. For an API serving paying customers, an expired certificate means every HTTPS client rejects the connection -- a complete outage that is entirely preventable with 14-day advance monitoring.
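To automate that 14-day window, you can parse the notAfter date from the openssl output and compute days remaining. A sketch in Node (the `daysUntilExpiry` helper is illustrative; it relies on V8's lenient date parsing for openssl's date format):

```javascript
// Parse openssl's notAfter date (e.g. "May 30 23:59:59 2026 GMT")
// and return whole days until expiry; negative means already expired.
function daysUntilExpiry(notAfter, now = new Date()) {
  const expiry = new Date(notAfter);
  return Math.floor((expiry - now) / (1000 * 60 * 60 * 24));
}

const days = daysUntilExpiry("May 30 23:59:59 2026 GMT",
                             new Date("2026-05-20T00:00:00Z"));
console.log(days); // → 10
if (days <= 14) {
  console.log("WARN: certificate expires soon -- renew now");
}
```

Run this from a daily cron against each domain's certificate and you have a bare-bones version of the expiry monitoring that PingCheck automates below.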
Setting Up Automated Monitoring with PingCheck
Manual checks tell you what is happening now. Automated monitoring tells you what happened at 3 AM while you were asleep. Here is how to set up continuous monitoring using the PingCheck API.
Creating Your First Monitor
# Create a health check monitor via the PingCheck API
curl -X POST "https://api.luxkern.com/pingcheck/monitors" \
-H "Authorization: Bearer ${LUXKERN_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"name": "Production API Health",
"url": "https://api.yourapp.com/v1/health",
"method": "GET",
"interval": 30,
"timeout": 10000,
"expectedStatus": 200,
"bodyMatch": "\"status\":\"ok\"",
"regions": ["eu-west", "us-east"],
"confirmations": 2,
"ssl": {
"checkExpiry": true,
"warnDaysBefore": 14
},
"alertChannels": ["slack", "email"],
"tags": ["critical", "api", "production"]
}'
This creates a monitor that checks your API every 30 seconds from two geographic regions. The confirmations: 2 parameter means PingCheck will retry from a second region before declaring the endpoint down -- this eliminates false positives from transient network issues in a single region.
The bodyMatch parameter verifies that the response contains "status":"ok". If your API returns a 200 but the health check endpoint reports a degraded database connection, the body match catches it.
Multi-Endpoint Monitoring Script
Most APIs have multiple endpoints that need separate monitors with different thresholds. Here is a Node.js script that sets up a complete monitoring suite:
// setup-monitors.js
// Creates PingCheck monitors for all critical API endpoints.
const API = "https://api.luxkern.com/pingcheck/monitors";
const KEY = process.env.LUXKERN_API_KEY;
const monitors = [
{
name: "API Health Check",
url: "https://api.yourapp.com/v1/health",
interval: 30,
timeout: 10000,
expectedStatus: 200,
bodyMatch: '"status":"ok"',
tags: ["critical"],
},
{
name: "Auth Service",
url: "https://api.yourapp.com/v1/auth/health",
interval: 60,
timeout: 8000,
expectedStatus: 200,
tags: ["critical"],
},
{
name: "Payment Webhook",
url: "https://api.yourapp.com/webhooks/stripe",
method: "GET",
interval: 300,
timeout: 5000,
expectedStatus: 405, // Method Not Allowed = endpoint exists
tags: ["payments"],
},
{
name: "Public API v2",
url: "https://api.yourapp.com/v2/status",
interval: 60,
timeout: 5000,
expectedStatus: 200,
bodyMatch: '"version"',
tags: ["public"],
},
];
async function create(config) {
const res = await fetch(API, {
method: "POST",
headers: {
Authorization: `Bearer ${KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
...config,
regions: ["eu-west", "us-east"],
confirmations: 2,
ssl: { checkExpiry: true, warnDaysBefore: 14 },
alertChannels: ["slack", "email"],
}),
});
if (!res.ok) throw new Error(`${config.name}: ${res.status}`);
const data = await res.json();
console.log(`Created: ${data.name} (${data.id})`);
}
async function main() {
for (const m of monitors) {
await create(m);
}
console.log(`\n${monitors.length} monitors created.`);
}
main().catch(console.error);
Run this once during your infrastructure setup. PingCheck starts checking immediately. Four monitors covering your critical endpoints, each with SSL monitoring, multi-region verification, and dual-channel alerting.
Choosing the Right Check Intervals
Not every endpoint deserves 30-second checks. Over-monitoring wastes resources and can even trigger rate limiters on your own API. Under-monitoring means slow detection.
| Endpoint Type | Recommended Interval | Rationale |
|---|---|---|
| Primary API health | 30 seconds | Revenue-critical, fast detection needed |
| Authentication | 60 seconds | Login failures are high-impact but slightly less time-sensitive |
| Payment webhooks | 5 minutes | Webhook processors queue and retry; a 5-minute gap is recoverable |
| Marketing site | 60 seconds | User-facing but not transactional |
| Internal admin tools | 5 minutes | Low traffic, internal users tolerate brief outages |
| CDN / static assets | 10 minutes | CDN has its own redundancy; you are checking origin health |
At 30-second intervals, a single monitor generates 2,880 checks per day. PingCheck's free tier includes enough checks for most small-to-medium setups. For context, monitoring 5 endpoints at 60-second intervals produces 7,200 checks per day -- well within typical free-tier limits.
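The check-volume arithmetic generalizes to any interval and region count. A sketch (each additional region multiplies the total):

```javascript
// Checks per day for one monitor: seconds in a day / interval, times regions.
function checksPerDay(intervalSeconds, regions = 1) {
  return (86400 / intervalSeconds) * regions;
}

console.log(checksPerDay(30));     // → 2880 (one monitor, 30s interval, 1 region)
console.log(checksPerDay(60) * 5); // → 7200 (5 endpoints at 60s, 1 region)
console.log(checksPerDay(30, 2));  // → 5760 (same 30s monitor from 2 regions)
```

Running this against your planned monitor list tells you quickly whether a given plan's check quota will hold.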
Multi-Region Monitoring: Why It Matters
A single-region monitor can report a false positive when the issue is between the monitor's location and your server, not your server itself. It can also miss region-specific outages: your API might be up in eu-west but unreachable from us-east because of a CDN edge node failure.
PingCheck supports multi-region checks with a confirmation mechanism. When a check fails from one region, PingCheck immediately retries from a second region. Only if both regions report failure does it mark the endpoint as down and fire an alert.
This reduces false-positive alert rates by roughly 85% compared to single-region monitoring, based on PingCheck's internal data across 12,000+ monitors. The tradeoff is a 15-30 second delay between the actual failure and the alert -- the time it takes to confirm from the second region. For most teams, eliminating noise is worth the extra half-minute of detection latency.
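The confirmation flow can be sketched in a few lines. This is illustrative pseudologic, not PingCheck's actual implementation; `probe` is a hypothetical function that runs one check from one region:

```javascript
// Multi-region confirmation: declare DOWN only when a second region
// confirms the failure, suppressing single-region false positives.
// `probe(url, region)` is a hypothetical per-region check returning { ok }.
async function checkWithConfirmation(url, regions, probe) {
  const first = await probe(url, regions[0]);
  if (first.ok) return { status: "up", region: regions[0] };

  // First region failed -- confirm from the remaining regions before alerting.
  for (const region of regions.slice(1)) {
    const confirm = await probe(url, region);
    if (confirm.ok) {
      // Second region succeeded: likely a network issue near the first region.
      return { status: "up", suspectRegion: regions[0] };
    }
  }
  return { status: "down", confirmedBy: regions.length };
}
```

The extra round trip from the confirming region is exactly the 15-30 second detection-latency tradeoff described above.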
Building an Alert Routing Pipeline
A monitoring check is only useful if the alert reaches the right person in the right channel at the right time. Here is how to build a webhook receiver that classifies alerts and routes them appropriately.
// routes/webhooks/pingcheck.js
import crypto from "crypto";
import express from "express";

const app = express();
app.use(express.json());
const ROUTES = {
critical: {
slack: { channel: "#incidents", mention: "@oncall" },
email: ["oncall@yourapp.com"],
pagerduty: true,
},
warning: {
slack: { channel: "#monitoring" },
email: ["devs@yourapp.com"],
pagerduty: false,
},
info: {
slack: { channel: "#monitoring" },
email: [],
pagerduty: false,
},
};
function classify(event, monitor) {
if (event === "monitor.down" && monitor.tags?.includes("critical")) {
return "critical";
}
if (event === "monitor.down") return "warning";
if (event === "ssl.expiring") return "warning";
return "info"; // recovery events
}
app.post("/webhooks/pingcheck", async (req, res) => {
// Verify webhook signature. Note: HMAC over JSON.stringify(req.body) assumes
// the sender serializes identically; for strict verification, HMAC the raw
// request body instead.
const sig = req.headers["x-pingcheck-signature"];
const expected = crypto
.createHmac("sha256", process.env.PINGCHECK_SECRET)
.update(JSON.stringify(req.body))
.digest("hex");
if (sig !== expected) return res.status(401).json({ error: "bad sig" });
const { event, monitor, check } = req.body;
const severity = classify(event, monitor);
const route = ROUTES[severity];
// Slack notification
if (route.slack) {
const mention = route.slack.mention ? `\n${route.slack.mention}` : "";
await fetch(process.env.SLACK_WEBHOOK, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
channel: route.slack.channel,
text: `[${severity.toUpperCase()}] ${monitor.name}: ${event}${mention}`,
attachments: [{
color: severity === "critical" ? "#dc2626" : "#f59e0b",
fields: [
{ title: "URL", value: monitor.url, short: true },
{ title: "Status", value: String(check?.statusCode ?? "N/A"), short: true },
{ title: "Response Time", value: `${check?.responseTimeMs ?? "N/A"}ms`, short: true },
{ title: "Region", value: check?.region ?? "N/A", short: true },
],
}],
}),
});
}
// PagerDuty for critical
if (route.pagerduty) {
await fetch("https://events.pagerduty.com/v2/enqueue", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
routing_key: process.env.PD_KEY,
event_action: event === "monitor.down" ? "trigger" : "resolve",
dedup_key: `pingcheck-${monitor.id}`,
payload: {
summary: `${monitor.name} is DOWN (${check?.statusCode})`,
severity: "critical",
source: "pingcheck",
},
}),
});
}
res.json({ processed: true, severity });
});
This receiver classifies each alert based on the event type and monitor tags, then routes it to the matching channels. Critical API-down events go to Slack with an @oncall mention, email, and PagerDuty simultaneously. Recovery events post a quiet update to #monitoring. The result is zero noise on quiet days and loud, unmissable alerts when something breaks.
What 99.9% Uptime Actually Costs You
Teams throw around "99.9% uptime" like it is a high bar. It is not. 99.9% uptime allows 8 hours, 45 minutes, and 36 seconds of downtime per year. That is 43 minutes per month. Your 47-minute payment API outage already blew your monthly budget.
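The error-budget arithmetic is worth having as a function. A sketch:

```javascript
// Allowed downtime for a given uptime percentage (the "error budget"),
// using a 365-day year.
function downtimePerYear(uptimePercent) {
  const hours = (1 - uptimePercent / 100) * 365 * 24;
  return { hours, minutesPerMonth: (hours * 60) / 12 };
}

const budget = downtimePerYear(99.9);
console.log(budget.hours.toFixed(2));           // → "8.76" (8h 45m 36s)
console.log(budget.minutesPerMonth.toFixed(1)); // → "43.8"
```

Each additional "nine" divides the budget by ten: 99.99% leaves you under 53 minutes for the whole year.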
For a deeper analysis of what uptime percentages mean in practice -- including the financial impact of each "nine" -- read our breakdown of what 99.9% uptime really means.
The takeaway: without monitoring, you do not know your actual uptime. You might think you are at 99.9% when you are actually at 99.2% (which is roughly 70 hours of downtime per year). Monitoring gives you the data to measure, and alerting gives you the speed to respond.
If you are evaluating monitoring tools and comparing alternatives, our UptimeRobot alternative guide covers how PingCheck stacks up against the most popular option in the space.
CI/CD Integration: Auto-Monitor New Deployments
The best monitoring setup is one you never have to think about. Add monitor creation to your deployment pipeline so every new service is automatically watched from the moment it goes live.
# .github/workflows/deploy.yml (monitoring step)
- name: Create or update PingCheck monitor
  run: |
    curl -X PUT \
      "https://api.luxkern.com/pingcheck/monitors/by-name/${SERVICE_NAME}" \
      -H "Authorization: Bearer ${{ secrets.LUXKERN_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "'"${SERVICE_NAME}"'",
        "url": "'"${DEPLOY_URL}"'/health",
        "interval": 60,
        "expectedStatus": 200,
        "bodyMatch": "\"status\":\"ok\"",
        "ssl": { "checkExpiry": true, "warnDaysBefore": 14 },
        "alertChannels": ["slack", "email"],
        "regions": ["eu-west", "us-east"],
        "confirmations": 2
      }'
  env:
    SERVICE_NAME: ${{ github.event.repository.name }}
    DEPLOY_URL: ${{ steps.deploy.outputs.url }}
This ensures no service ships without monitoring. The PUT method is idempotent -- it creates the monitor on first deploy and updates it on subsequent deploys. No manual dashboard configuration needed.
Stop Learning About Outages from Twitter
Your users should never be the first to tell you your API is down. A 30-second health check with body validation, SSL monitoring, and multi-region confirmation catches failures before they reach your customers.
Set up PingCheck in under 5 minutes. Create your first monitor, configure Slack alerts, and know about outages in seconds instead of hours. Free tier available -- no credit card required.