
99.9% Uptime

Understand what 99.9% uptime really means in minutes of downtime per year. Includes SLA tier calculations, JavaScript formulas, and monitoring setup.

Tags: uptime, SLA, monitoring, reliability, devops

Your hosting provider guarantees 99.9% uptime. Your status page shows 99.9% uptime for the last 30 days. Your sales team tells prospects you have 99.9% uptime. But none of you can answer the simplest follow-up question: how many minutes of downtime is that?

The answer is 8 hours, 45 minutes, and 36 seconds per year. Not per decade. Per year. That means your service can be completely unreachable for an entire workday and still technically meet a 99.9% SLA.

This article breaks down what uptime percentages actually mean in real time, how to calculate them correctly, the difference between each SLA tier, and how to monitor and measure uptime in a way that reflects reality instead of marketing.

The Uptime Percentage Illusion



Uptime percentages feel high because humans are bad at intuitively understanding percentages above 99%. The difference between 99% and 99.9% sounds trivial — it is one-tenth of one percent. But in terms of allowed downtime, the gap is enormous:

  • 99% uptime = 3.65 days of downtime per year
  • 99.9% uptime = 8 hours, 45 minutes per year
  • 99.99% uptime = 52 minutes, 35 seconds per year
  • 99.999% uptime = 5 minutes, 15 seconds per year


Going from 99% to 99.9% means reducing your allowed downtime by a factor of 10. Going from 99.9% to 99.99% means reducing it by another factor of 10. Each additional nine is exponentially harder and more expensive to achieve.
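The arithmetic behind these figures is easy to script. Here is a minimal JavaScript sketch, assuming a 365-day year (providers that calculate on a 365.25-day basis get slightly different numbers):

```javascript
// Convert an SLA percentage into its allowed downtime for a period.
// Assumes a 365-day year; a 365.25-day basis shifts the results slightly.
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600

function allowedDowntimeMinutes(slaPercent, periodMinutes = MINUTES_PER_YEAR) {
  return periodMinutes * (1 - slaPercent / 100);
}

console.log(allowedDowntimeMinutes(99.9));   // ~525.6 minutes = 8h 45m 36s
console.log(allowedDowntimeMinutes(99.99));  // ~52.56 minutes
console.log(allowedDowntimeMinutes(99.999)); // ~5.26 minutes
```

Pass a different `periodMinutes` (e.g. `30 * 24 * 60` for a 30-day month) to get monthly budgets instead of yearly ones.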

    The Complete Downtime Table



    Here is the full breakdown across all common SLA tiers, calculated for yearly, monthly, weekly, and daily windows:

    | SLA Level | Downtime/Year | Downtime/Month | Downtime/Week | Downtime/Day |
    |-----------|---------------|----------------|---------------|--------------|
    | 99% (two nines) | 3d 15h 39m | 7h 18m 17s | 1h 40m 48s | 14m 24s |
    | 99.5% | 1d 19h 49m | 3h 39m 8s | 50m 24s | 7m 12s |
    | 99.9% (three nines) | 8h 45m 36s | 43m 49s | 10m 4s | 1m 26s |
    | 99.95% | 4h 22m 48s | 21m 54s | 5m 2s | 43s |
    | 99.99% (four nines) | 52m 35s | 4m 23s | 1m 0s | 8.6s |
    | 99.999% (five nines) | 5m 15s | 26.3s | 6.0s | 0.86s |

    Let those numbers sink in. A 99.9% SLA allows nearly 44 minutes of downtime per month. That is enough time for a failed deployment to take down your service, your team to notice via an alert, investigate the root cause, roll back, and verify recovery — if everything goes smoothly.

    A 99.99% SLA allows 4 minutes and 23 seconds per month. At that level, a human cannot be in the incident response loop. You need automated failover, health checks running every few seconds, and zero-downtime deployment strategies.

    Calculating Uptime: The JavaScript Formula



    The formula is straightforward:

    Uptime % = ((Total Time - Downtime) / Total Time) * 100


    Here is a complete JavaScript implementation that calculates uptime from incident data:

    /**
     * Calculate SLA uptime percentage from incident records.
     *
     * @param {Date} periodStart - Start of the measurement period
     * @param {Date} periodEnd - End of the measurement period
     * @param {Array<{start: Date, end: Date, partial: boolean}>} incidents
     * @returns {Object} Uptime metrics
     */
    function calculateUptime(periodStart, periodEnd, incidents) {
      const totalMs = periodEnd.getTime() - periodStart.getTime();

      // Calculate total downtime in milliseconds
      let downtimeMs = 0;

      for (const incident of incidents) {
        // Clamp incident to the measurement period
        const incidentStart = Math.max(
          incident.start.getTime(),
          periodStart.getTime()
        );
        const incidentEnd = Math.min(
          incident.end.getTime(),
          periodEnd.getTime()
        );

        if (incidentEnd > incidentStart) {
          if (incident.partial) {
            // Partial outage counts as 50% downtime (configurable)
            downtimeMs += (incidentEnd - incidentStart) * 0.5;
          } else {
            // Full outage counts as 100% downtime
            downtimeMs += incidentEnd - incidentStart;
          }
        }
      }

      const uptimeMs = totalMs - downtimeMs;
      const uptimePercent = (uptimeMs / totalMs) * 100;

      return {
        totalMinutes: totalMs / 60000,
        downtimeMinutes: downtimeMs / 60000,
        uptimeMinutes: uptimeMs / 60000,
        uptimePercent: uptimePercent,
        uptimeFormatted: uptimePercent.toFixed(4) + "%",
        slaMet: checkSlaCompliance(uptimePercent),
        downtimeFormatted: formatDuration(downtimeMs),
      };
    }

    /**
     * Check SLA compliance against common tier thresholds.
     */
    function checkSlaCompliance(uptimePercent) {
      return {
        "99%": uptimePercent >= 99,
        "99.5%": uptimePercent >= 99.5,
        "99.9%": uptimePercent >= 99.9,
        "99.95%": uptimePercent >= 99.95,
        "99.99%": uptimePercent >= 99.99,
        "99.999%": uptimePercent >= 99.999,
      };
    }

    /**
     * Format milliseconds into a human-readable duration.
     */
    function formatDuration(ms) {
      const seconds = Math.floor(ms / 1000);
      const minutes = Math.floor(seconds / 60);
      const hours = Math.floor(minutes / 60);
      const days = Math.floor(hours / 24);

      if (days > 0) return `${days}d ${hours % 24}h ${minutes % 60}m`;
      if (hours > 0) return `${hours}h ${minutes % 60}m ${seconds % 60}s`;
      if (minutes > 0) return `${minutes}m ${seconds % 60}s`;
      return `${seconds}s`;
    }

    // ── Example Usage ──────────────────────────────────────────

    const periodStart = new Date("2026-06-01T00:00:00Z");
    const periodEnd = new Date("2026-07-01T00:00:00Z");

    const incidents = [
      {
        // Full outage: 45 minutes during a deployment
        start: new Date("2026-06-12T14:00:00Z"),
        end: new Date("2026-06-12T14:45:00Z"),
        partial: false,
      },
      {
        // Partial degradation: slow responses for 2 hours
        start: new Date("2026-06-20T08:00:00Z"),
        end: new Date("2026-06-20T10:00:00Z"),
        partial: true,
      },
    ];

    const result = calculateUptime(periodStart, periodEnd, incidents);
    console.log(result);
    // {
    //   totalMinutes: 43200,
    //   downtimeMinutes: 105,
    //   uptimeMinutes: 43095,
    //   uptimePercent: 99.756944...,
    //   uptimeFormatted: "99.7569%",
    //   slaMet: {
    //     "99%": true,
    //     "99.5%": true,
    //     "99.9%": false,   <-- 99.9% SLA breached
    //     "99.95%": false,
    //     "99.99%": false,
    //     "99.999%": false
    //   },
    //   downtimeFormatted: "1h 45m 0s"
    // }


    This implementation handles several real-world complexities:

  • Period clamping — incidents that span period boundaries are clamped to the measurement window
  • Partial outages — degraded performance counted at 50% (configurable per incident)
  • Multiple SLA tiers — checks compliance against all common thresholds simultaneously


    What Counts as Downtime?



    This is where SLA definitions get contentious. Different providers define downtime differently:

    Total Outage Only



    Some providers only count complete, total outages. If your API returns 500 errors for 80% of requests but 200 for the other 20%, they consider the service "up." This is the most provider-friendly definition and the least useful for customers.

    Error Rate Threshold



    A better definition: downtime begins when the error rate exceeds a threshold (e.g., 5% of requests return errors). This captures partial outages that meaningfully impact users.
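To make the error-rate definition concrete, here is a hedged sketch that classifies each minute of per-minute request statistics as down when the error rate crosses a threshold. The record shape and the 5% default are illustrative assumptions, not a standard schema:

```javascript
// Count "down" minutes from per-minute request stats using an error-rate
// threshold. The { total, errors } shape and 5% default are illustrative.
function downtimeMinutesByErrorRate(minuteStats, thresholdPercent = 5) {
  return minuteStats.filter(
    (m) => m.total > 0 && (m.errors / m.total) * 100 > thresholdPercent
  ).length;
}

const stats = [
  { total: 1200, errors: 2 },  // 0.17% errors -> up
  { total: 1100, errors: 90 }, // 8.18% errors -> down
  { total: 1000, errors: 40 }, // 4.00% errors -> up (below threshold)
];

console.log(downtimeMinutesByErrorRate(stats)); // 1
```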

    Response Time Threshold



    The strictest definition: downtime includes periods where response times exceed an acceptable threshold (e.g., p95 latency > 2 seconds). Slow is the new down for many applications.

    Scheduled Maintenance



    Most SLAs exclude scheduled maintenance from downtime calculations. This is reasonable as long as maintenance windows are communicated in advance and kept within agreed limits (e.g., no more than 4 hours per month).
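If your SLA excludes maintenance windows, the uptime calculation must subtract any overlap between incidents and announced windows. A minimal sketch of that interval arithmetic, assuming windows do not overlap one another (the data shapes are illustrative):

```javascript
// Subtract scheduled-maintenance overlap from an incident's duration.
// Both the incident and each window are { start: Date, end: Date }.
// Assumes maintenance windows do not overlap each other.
function billableDowntimeMs(incident, maintenanceWindows) {
  const start = incident.start.getTime();
  const end = incident.end.getTime();
  let downtimeMs = end - start;

  for (const w of maintenanceWindows) {
    const overlapMs =
      Math.min(end, w.end.getTime()) - Math.max(start, w.start.getTime());
    if (overlapMs > 0) downtimeMs -= overlapMs; // ignore non-overlapping windows
  }

  return Math.max(0, downtimeMs);
}

// 80-minute incident, 60 minutes of it inside an announced window
const incident = {
  start: new Date("2026-06-05T01:50:00Z"),
  end: new Date("2026-06-05T03:10:00Z"),
};
const windows = [
  {
    start: new Date("2026-06-05T02:00:00Z"),
    end: new Date("2026-06-05T03:00:00Z"),
  },
];

console.log(billableDowntimeMs(incident, windows) / 60000); // 20 minutes
```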

    A good SLA clearly defines what constitutes downtime. A bad SLA uses vague language that lets the provider redefine downtime after the fact.

    The Cost of Each Nine



    Adding a nine to your uptime is not just a technical challenge — it is an economic one:

    | Level | Downtime/Year | What It Requires | Approximate Cost Premium |
    |-------|---------------|------------------|--------------------------|
    | 99% | 3.65 days | Single server, basic monitoring | Baseline |
    | 99.9% | 8h 45m | Redundancy, health checks, alerting | 2-5x |
    | 99.99% | 52m | Multi-AZ, auto-failover, zero-downtime deploys | 10-20x |
    | 99.999% | 5m 15s | Multi-region, active-active, sub-second failover | 50-100x |

    The jump from 99.9% to 99.99% typically requires:

  • Multi-availability-zone deployment — your application runs in at least two data centers
  • Automated health checking — checks every 10-30 seconds, not every 5 minutes
  • Automated failover — traffic reroutes without human intervention
  • Zero-downtime deployments — rolling updates, blue-green, or canary deployments
  • Dependency redundancy — databases, caches, and queues all have failover
  • Chaos engineering — regularly testing failure scenarios in production


    Each of these adds operational complexity and infrastructure cost. The question is not "can we achieve 99.99%?" but "does the business value justify the engineering investment?"

    Measuring Uptime Correctly



    You cannot claim an uptime number without measuring it. And how you measure determines how accurate the number is.

    Synthetic Monitoring



    Synthetic monitors send requests to your service at regular intervals from multiple geographic locations. This is the industry standard for uptime measurement.

    // PingCheck synthetic monitoring configuration
    const createUptimeMonitor = async () => {
      const response = await fetch("https://api.luxkern.com/v1/pingcheck/monitors", {
        method: "POST",
        headers: {
          "Authorization": "Bearer YOUR_API_KEY",
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          name: "Production API",
          url: "https://api.yourproduct.com/health",
          method: "GET",
          interval: 30,                    // Check every 30 seconds
          timeout: 10000,                  // 10 second timeout
          expectedStatus: 200,
          expectedBody: '{"status":"ok"}', // Optional response body check
          regions: [
            "us-east-1",
            "eu-west-1",
            "ap-southeast-1",
          ],
          alertAfterFailures: 2,           // Alert after 2 consecutive failures
          alertChannels: [
            { type: "slack", webhookUrl: "https://hooks.slack.com/..." },
            { type: "email", address: "oncall@company.com" },
          ],
          sla: {
            target: 99.9,                  // Track against 99.9% SLA
            period: "monthly",
            notifyOnBreach: true,
          },
        }),
      });

      return response.json();
    };


    Check Interval Matters



    The frequency of your uptime checks directly affects the accuracy of your measurement:

  • 5-minute checks — 288 data points per day. A 4-minute outage between checks could go undetected.
  • 1-minute checks — 1,440 data points per day. Better, but still misses sub-minute blips.
  • 30-second checks — 2,880 data points per day. Good for 99.9% SLAs.
  • 10-second checks — 8,640 data points per day. Required for 99.99%+ SLAs.


    If you are claiming 99.99% uptime but only checking every 5 minutes, your number is unreliable. The measurement granularity must match the SLA precision.
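A quick way to check whether your monitoring can back the SLA you claim: compare the worst-case blind spot (check interval times the consecutive failures required before alerting) against the monthly downtime budget. A rough sketch, assuming a 30-day month and a rule of thumb that the blind spot should be at most a tenth of the budget:

```javascript
// Compare the monitoring blind spot against the monthly SLA budget.
// Assumes a 30-day month; the "10x finer" rule is a heuristic, not a standard.
const MINUTES_PER_MONTH = 30 * 24 * 60; // 43,200

function granularityCheck(slaPercent, checkIntervalSeconds, failuresBeforeAlert = 1) {
  const budgetMinutes = MINUTES_PER_MONTH * (1 - slaPercent / 100);
  // An outage can run this long before the monitor confirms it
  const blindSpotMinutes = (checkIntervalSeconds * failuresBeforeAlert) / 60;
  return {
    budgetMinutes,
    blindSpotMinutes,
    adequate: blindSpotMinutes <= budgetMinutes / 10,
  };
}

console.log(granularityCheck(99.99, 300).adequate);   // false: 5-min checks can't back 99.99%
console.log(granularityCheck(99.99, 10, 2).adequate); // true: 20s blind spot vs ~4.32-min budget
```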

    Multi-Region Checking



    A single monitoring location gives you a single perspective. Your service might be up in Virginia but down in Frankfurt because of a DNS issue, a CDN misconfiguration, or a regional cloud provider outage. Always monitor from at least three regions.

    SLA Credits and Financial Implications



    Most SLAs include a credit mechanism: if the provider fails to meet the uptime guarantee, the customer receives a credit against their bill. Here is what typical credit structures look like:

    | Monthly Uptime | Credit (% of monthly bill) |
    |----------------|----------------------------|
    | 99.0% - 99.9% | 10% |
    | 95.0% - 99.0% | 25% |
    | < 95.0% | 50% |

    Notice the asymmetry: the provider risks a 10-50% credit, but the customer bears the full cost of the outage — lost revenue, damaged reputation, broken integrations, and engineering time investigating the impact. SLA credits never make you whole. They are a signal of commitment, not compensation.
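Translated into code, a credit schedule like the example above is a plain tier lookup (the tiers mirror the table; real contracts define their own):

```javascript
// Map measured monthly uptime to an SLA credit percentage.
// Tiers mirror the example table above; real contracts vary.
function slaCreditPercent(monthlyUptimePercent) {
  if (monthlyUptimePercent >= 99.9) return 0; // SLA met, no credit
  if (monthlyUptimePercent >= 99.0) return 10;
  if (monthlyUptimePercent >= 95.0) return 25;
  return 50;
}

console.log(slaCreditPercent(99.95)); // 0
console.log(slaCreditPercent(99.5));  // 10
console.log(slaCreditPercent(96.2));  // 25
console.log(slaCreditPercent(92));    // 50
```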

    This is why you monitor your providers independently rather than trusting their self-reported numbers.

    The SLA Calculator: A Free Tool



    If you do not want to do the math manually, use the Luxkern SLA Calculator — a free tool that:

  • Converts any uptime percentage to real downtime (yearly, monthly, weekly, daily)
  • Compares SLA tiers side by side
  • Calculates your actual uptime from incident data
  • Generates SLA compliance reports


    Bookmark it. You will use it every time you negotiate a vendor contract or review your own SLA.

    Building an Internal SLA Dashboard



    For teams that want real-time SLA tracking, here is a pattern using the PingCheck API:

    /**
     * Fetch SLA compliance data for all monitors
     * and generate a dashboard summary.
     */
    async function generateSlaDashboard() {
      const response = await fetch(
        "https://api.luxkern.com/v1/pingcheck/monitors?include=sla",
        {
          headers: { "Authorization": "Bearer YOUR_API_KEY" },
        }
      );

      const { monitors } = await response.json();

      const dashboard = monitors.map((monitor) => ({
        name: monitor.name,
        url: monitor.url,
        currentStatus: monitor.status, // "up" | "down" | "degraded"
        sla: {
          target: monitor.sla.target,
          currentMonth: monitor.sla.currentMonthUptime,
          previousMonth: monitor.sla.previousMonthUptime,
          trailing90Days: monitor.sla.trailing90DaysUptime,
          compliance:
            monitor.sla.currentMonthUptime >= monitor.sla.target
              ? "COMPLIANT"
              : "BREACHED",
          remainingBudget: calculateRemainingBudget(
            monitor.sla.target,
            monitor.sla.currentMonthUptime,
            monitor.sla.currentMonthTotalMinutes,
            monitor.sla.currentMonthDowntimeMinutes
          ),
        },
        incidents: {
          thisMonth: monitor.incidents.currentMonthCount,
          mttr: monitor.incidents.meanTimeToRecover, // in minutes
        },
      }));

      return dashboard;
    }

    function calculateRemainingBudget(target, current, totalMin, downtimeMin) {
      const allowedDowntimeMin = totalMin * (1 - target / 100);
      const remainingMin = allowedDowntimeMin - downtimeMin;

      return {
        allowedMinutes: allowedDowntimeMin.toFixed(1),
        usedMinutes: downtimeMin.toFixed(1),
        remainingMinutes: Math.max(0, remainingMin).toFixed(1),
        percentUsed: ((downtimeMin / allowedDowntimeMin) * 100).toFixed(1) + "%",
      };
    }


    This gives you a real-time view of your error budget — how much downtime you can still "spend" before breaching your SLA this month. When the remaining budget drops below 25%, it is time to freeze risky deployments and focus on stability.

    Practical Recommendations by SLA Tier



    Targeting 99.9% (Most SaaS Products)



  • Monitor from 3+ regions every 30 seconds
  • Automated alerting with 2-minute detection time
  • Single-AZ deployment with auto-restart
  • Blue-green or rolling deployments
  • Database replication with manual failover


    Targeting 99.99% (High-Value Enterprise)



  • Monitor from 5+ regions every 10 seconds
  • Automated alerting with 30-second detection time
  • Multi-AZ deployment with automated failover
  • Zero-downtime deployments (canary + automated rollback)
  • Database multi-AZ with automatic failover
  • Quarterly game day exercises


    Targeting 99.999% (Financial, Healthcare)



  • Multi-region active-active architecture
  • Sub-second health checks
  • DNS-based traffic management with automatic region failover
  • Full dependency redundancy (database, cache, queue, CDN)
  • Chaos engineering in production
  • Dedicated SRE team


    Try PingCheck free (no credit card required).

    The Uptime Number Is Not the Goal



    Uptime is a proxy metric. The real goal is user experience. A service with 99.99% uptime but 5-second response times is worse than a service with 99.95% uptime and 100ms responses. When you optimize for uptime alone, you miss the forest for the trees.

    Track uptime alongside latency, error rate, and throughput. Together these metrics, which map closely to the "four golden signals" (latency, traffic, errors, and saturation) from Google's SRE book, give you a complete picture of service health.

    For the math behind uptime calculations, read our detailed tutorial on how to calculate SLA uptime and downtime. For a broader introduction to monitoring, see our guide on what is uptime monitoring.