
SLO Metrics & Calculations

This document provides an in-depth explanation of the core metrics used in Service Level Objectives (SLOs) within Motadata AIOps. These metrics are critical for understanding service health, performance violations, and maintaining reliability against expected thresholds.

SLO Evaluation Frequency Windows

The table below recaps how often an SLO is evaluated for each frequency.

| Frequency | Evaluation Window | Evaluation Interval |
|-----------|-------------------|---------------------|
| Daily | 00:00 to 23:59 of the same day | Every 15 min |
| Weekly | 00:00 of the start day to 23:59 of the 7th day from the start date | Every 30 min |
| Monthly | 00:00 of the 1st day to 23:59 of the 30th day from the start date | Every 1 hr |
| Quarterly | 00:00 of the 1st day to 23:59 of the 90th day from the start date | Every 1 hr |
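The windows and intervals listed above can be captured in a small lookup. The following is a minimal sketch, not product code: the names `EVALUATION_WINDOWS` and `evaluations_per_window` are hypothetical, and it assumes each window is treated as a whole number of 24-hour days.

```python
from datetime import timedelta

# Window length and evaluation interval per frequency, taken from the table above.
# Assumption: "00:00 to 23:59" is treated as a full 24-hour day.
EVALUATION_WINDOWS = {
    "Daily": (timedelta(days=1), timedelta(minutes=15)),
    "Weekly": (timedelta(days=7), timedelta(minutes=30)),
    "Monthly": (timedelta(days=30), timedelta(hours=1)),
    "Quarterly": (timedelta(days=90), timedelta(hours=1)),
}


def evaluations_per_window(frequency: str) -> int:
    """Return how many evaluations fit into one full window (hypothetical helper)."""
    window, interval = EVALUATION_WINDOWS[frequency]
    # Floor division of two timedeltas yields an integer count.
    return window // interval


if __name__ == "__main__":
    for name in EVALUATION_WINDOWS:
        print(name, evaluations_per_window(name))
```

Under these assumptions, a Daily SLO is evaluated 96 times per window (24 hours at 15-minute intervals).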

Card View

The card view gives a bird's-eye view of each SLO and shows the following information.

| Metric | Value |
|--------|-------|
| Target | 95% |
| Achieved | 22.92% |
| Violation | 77.08% |
| Status | Breached |

You can also toggle between views and filter using Breached, Warning, OK, or Total.

Overall Monitor Distribution

Clicking a card drills down to the Overview tab, which summarizes the service.

In this example, only 2 out of 10 monitors have violated their thresholds and fall under the Breached category.

| Status | Count |
|--------|-------|
| Breached | 2 |
| Warning | 0 |
| OK | 8 |
| Total | 10 |

The grid of configured monitors at the bottom of the screen lists each monitor with its details. In this example, it shows that only the monitor M01Z02PIFW01 caused the SLO breach.

MTTR (Mean Time To Recovery)

Formula

MTTR = T_violation_all / downincidentCount

Where:

  • T_violation_all: Total downtime
  • downincidentCount: Number of incidents

Purpose

Measures the average time taken to resolve each incident.

Example

  • T_violation_all: 18h 30m
  • Incident Count: 1 (Continuous outage)

MTTR = 18.5 h / 1 = 18.5 h (18h 30m)
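The MTTR arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function names `mttr_hours` and `format_hours` are hypothetical, durations are expressed in hours, and returning `None` when there are no incidents is an illustrative choice rather than documented product behavior.

```python
from typing import Optional


def mttr_hours(total_violation_hours: float, down_incident_count: int) -> Optional[float]:
    """MTTR = T_violation_all / downincidentCount, in hours."""
    if down_incident_count == 0:
        # Assumption: with no incidents there is nothing to average.
        return None
    return total_violation_hours / down_incident_count


def format_hours(hours: float) -> str:
    """Render a duration in hours as 'Xh YYm'."""
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60}h {total_minutes % 60:02d}m"


if __name__ == "__main__":
    # Values from the example: 18h 30m of downtime in one continuous incident.
    print(format_hours(mttr_hours(18.5, 1)))  # prints "18h 30m"
```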

Service Reliability Metrics section on the Overview tab

MTBF (Mean Time Between Failures)

Formula

MTBF = (T_total_all - T_violation_all) / downincidentCount

Where:

  • T_total_all: Total time monitored
  • T_violation_all: Total time in violation
  • downincidentCount: Number of incidents

Purpose

Shows how long the system ran without a failure, on average.

Example (not shown)

Because the example contains only one continuous violation, there is no interval between failures to average, so MTBF is shown as N/A.
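A minimal sketch of the MTBF formula follows. The function name `mtbf_hours` is hypothetical, and the rule that fewer than two incidents yields `None` (displayed as N/A) is an assumption inferred from the example above, not a documented product rule. The three-incident input in the usage lines is made up purely for illustration.

```python
from typing import Optional


def mtbf_hours(
    total_hours: float, violation_hours: float, down_incident_count: int
) -> Optional[float]:
    """MTBF = (T_total_all - T_violation_all) / downincidentCount, in hours."""
    if down_incident_count < 2:
        # Assumption: mirror the example, where a single continuous
        # violation is reported as N/A.
        return None
    return (total_hours - violation_hours) / down_incident_count


if __name__ == "__main__":
    # Example from the text: 24h monitored, 18.5h in violation, 1 incident.
    print(mtbf_hours(24, 18.5, 1))  # prints "None"
    # Hypothetical input: 24h monitored, 6h in violation, 3 incidents.
    print(mtbf_hours(24, 6, 3))  # prints "6.0"
```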

SLO Achieved (%)

Formula

SLO Achieved (%) = ((T_total - T_violation) / T_total) * 100

Where:

  • T_total: Total monitoring time during the evaluation window
  • T_violation: Time during which SLO was violated

Purpose

Reflects the percentage of time the service remained within acceptable performance levels.

Example (from Firewall SLO)

In the walkthrough example:

  • Total Monitoring Time (T_total): 24 hours
  • Violation Time (T_violation): 18h 30m
  • SLO Achieved: 22.92%

SLO Achieved = ((24 - 18.5) / 24) * 100 = 22.92%

The green progress bar represents the Achieved % (22.92%) and the blue marker shows the Target (95%).
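The SLO Achieved calculation can be checked with a short Python sketch. The function name `slo_achieved_pct` is hypothetical, and durations are assumed to be in hours.

```python
def slo_achieved_pct(total_hours: float, violation_hours: float) -> float:
    """SLO Achieved (%) = ((T_total - T_violation) / T_total) * 100."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return (total_hours - violation_hours) / total_hours * 100


if __name__ == "__main__":
    # Firewall SLO example: 24h monitored, 18h 30m in violation.
    achieved = slo_achieved_pct(24, 18.5)
    print(f"Achieved: {achieved:.2f}%")         # prints "Achieved: 22.92%"
    print(f"Violation: {100 - achieved:.2f}%")  # prints "Violation: 77.08%"
```

The two values match the Achieved and Violation figures on the card.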

Error Budget Left (%)

Formula

Error Budget Left (%) = ((T_allowed - T_violation) / T_allowed) * 100

Where:

  • T_allowed: Error budget (e.g., 5% of daily time)
  • T_violation: Actual time the service was in violation

Purpose

Indicates how much of the error budget remains. If this reaches zero or goes negative, the SLO is considered Breached.

Example (from Firewall SLO)

  • Acceptable Violation Time: 1h 12m (T_allowed)
  • Violated Time: 18h 30m

Error Budget Left = ((1.2 - 18.5) / 1.2) * 100 = -1441.67%

The red bar under the SLO Trend section confirms that the service has far exceeded its allowed violation time, which marks the SLO as Degraded.
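The error budget arithmetic can be sketched as follows. The function names are hypothetical. Deriving `T_allowed` as `(100 - target) %` of the window is an assumption based on the "5% of daily time" note for a 95% target.

```python
def allowed_violation_hours(target_pct: float, window_hours: float) -> float:
    """T_allowed, assumed to be (100 - target)% of the evaluation window."""
    return (100 - target_pct) / 100 * window_hours


def error_budget_left_pct(allowed_hours: float, violation_hours: float) -> float:
    """Error Budget Left (%) = ((T_allowed - T_violation) / T_allowed) * 100."""
    if allowed_hours <= 0:
        raise ValueError("allowed_hours must be positive")
    return (allowed_hours - violation_hours) / allowed_hours * 100


if __name__ == "__main__":
    # Firewall SLO example: 95% target on a 24h window, 18h 30m in violation.
    t_allowed = allowed_violation_hours(95, 24)  # about 1.2 h, i.e. 1h 12m
    left = error_budget_left_pct(t_allowed, 18.5)
    print(f"{left:.2f}%")  # prints "-1441.67%"
```

A negative result means the violation time has already exceeded the whole budget.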

Burn Rate (Elapsed Time-Based)

Formula

Burn Rate = (violations_so_far / time_elapsed) / (total_error_budget / total_cycle_duration)

Where:

  • violations_so_far: Total violation duration
  • time_elapsed: Time since the cycle began
  • total_error_budget: Allowable error time
  • total_cycle_duration: Full evaluation window (e.g., 24h for Daily SLO)

Purpose

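Indicates how fast you're consuming the error budget. A burn rate of 1 means the budget would run out exactly at the end of the cycle; a higher value means it runs out sooner.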

Example (from Firewall SLO)

  • Violation Time: 18h 30m
  • Time Elapsed: 18h 30m
  • Error Budget: 1h 12m
  • Cycle Duration: 24h

Burn Rate = (18.5 / 18.5) / (1.2 / 24) = 1 / 0.05 = 20

The Burn Rate graph shows a value of 20, which indicates that the error budget is being consumed 20 times faster than the sustainable rate.
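The burn rate formula translates directly into code. This is a minimal sketch with a hypothetical function name; all four inputs are assumed to be in the same unit (hours here).

```python
def burn_rate(
    violations_so_far: float,
    time_elapsed: float,
    total_error_budget: float,
    total_cycle_duration: float,
) -> float:
    """Burn Rate = (violations_so_far / time_elapsed)
                   / (total_error_budget / total_cycle_duration)."""
    if time_elapsed <= 0 or total_error_budget <= 0 or total_cycle_duration <= 0:
        raise ValueError("time_elapsed, budget and cycle duration must be positive")
    observed_rate = violations_so_far / time_elapsed            # share of elapsed time in violation
    sustainable_rate = total_error_budget / total_cycle_duration  # share the budget allows
    return observed_rate / sustainable_rate


if __name__ == "__main__":
    # Firewall SLO example: 18.5h violated out of 18.5h elapsed,
    # with a 1.2h budget on a 24h cycle.
    print(round(burn_rate(18.5, 18.5, 1.2, 24), 2))  # prints "20.0"
```

The result is rounded before printing because floating-point division can leave a tiny residue after the last decimal place.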

SLO History Tab

This tab helps you understand SLO compliance over multiple days.

Key Takeaways

  • SLO Achieved shows how close you are to the target.
  • Error Budget Left tracks remaining safe time.
  • Burn Rate forecasts future risk based on consumption speed.
  • MTTR / MTBF benchmark your recovery and uptime reliability.

These metrics empower teams to move from reactive fixes to proactive reliability engineering.