SRE-bench: Kubernetes SRE Agent Benchmark

Welcome to SRE-bench, an open, reproducible framework for evaluating autonomous agents on real-world Kubernetes Site Reliability Engineering (SRE) tasks.

What is SRE-bench?

SRE-bench is inspired by SWE-bench and provides a comprehensive testing ground for evaluating AI agents on:

  • Incident Response - How quickly can agents diagnose and resolve production incidents?
  • Infrastructure Changes - Can agents safely apply configuration updates and deployments?
  • Observability Triage - How effectively do agents analyze metrics, logs, and traces?
  • Reliability Improvements - Can agents identify and remediate reliability issues proactively?

Key Features

  • 10 Real-World Scenarios - Production-grade failure scenarios covering GitOps drift, resource pressure, networking issues, and more
  • Reproducible Environments - Each scenario can run in isolated Kind clusters or existing Kubernetes environments
  • GitOps Integration - Scenarios use ArgoCD for realistic deployment workflows where applicable
  • Practical Metrics - Measure time-to-diagnose, safe remediation rate, MTTR, and explainability
  • Community-Driven - Open for contributions of new scenarios and agent implementations
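
As a rough illustration of how metrics like time-to-diagnose, MTTR, and safe remediation rate could be derived from run results, the sketch below summarizes a small CSV with awk. The file layout (scenario, start, diagnosed, and resolved as Unix seconds, plus a 1/0 safe flag), the function name summarize_runs, and the sample numbers are all assumptions made for this example; SRE-bench's actual result format and metric definitions may differ.

```shell
#!/usr/bin/env bash
# Hypothetical results layout (an assumption, not defined by SRE-bench):
#   scenario,start_epoch,diagnosed_epoch,resolved_epoch,safe(1|0)
summarize_runs() {
  awk -F',' '
    NR == 1 { next }          # skip the header row
    {
      ttd  += $3 - $2         # time-to-diagnose for this run, in seconds
      ttr  += $4 - $2         # time-to-recovery for this run, in seconds
      safe += $5              # count runs flagged as safely remediated
      n++
    }
    END {
      if (n == 0) { print "no runs"; exit 1 }
      printf "runs=%d mean_ttd_s=%.1f mttr_s=%.1f safe_rate=%.2f\n", n, ttd / n, ttr / n, safe / n
    }
  ' "$1"
}

# Made-up sample data, for illustration only.
results="$(mktemp)"
cat > "$results" <<'EOF'
scenario,start,diagnosed,resolved,safe
1,1000,1060,1300,1
2,2000,2180,2600,0
EOF

summarize_runs "$results"
# prints: runs=2 mean_ttd_s=120.0 mttr_s=450.0 safe_rate=0.50
rm -f "$results"
```

Measuring both intervals from the same start timestamp keeps time-to-diagnose and MTTR directly comparable; a different reference point (for example, the first alert) would be an equally reasonable choice.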

Who is this for?

  • SRE Teams - Training ground for understanding complex Kubernetes failure modes
  • AI/Agent Developers - Benchmark platform for evaluating autonomous agent capabilities
  • Platform Engineers - Test environment for validating infrastructure changes
  • Security Teams - Safe sandbox for chaos engineering and failure injection

Quick Navigation

  • Understand the Architecture
  • Run Scenarios - Execute failure scenarios and observe incidents. See Running Scenarios for a step-by-step guide to running SRE scenarios.
  • Contribute - Add your own scenarios to the benchmark. See the Contributing Guide to learn how to contribute new scenarios.

Project Purpose

This repository serves multiple purposes:

  1. Benchmarking Platform - Evaluate SRE agent performance against standardized scenarios
  2. Agentkube POC - Proof-of-concept test environment for autonomous Kubernetes agents
  3. Community Scenario Library - Users can contribute diverse scenarios to test their own agents
  4. SRE Training - Hands-on learning environment for understanding Kubernetes failure modes

Getting Started

The fastest way to get started is to run a scenario:

# Clone the repository
git clone https://github.com/siddhantprateek/SRE-bench.git
cd SRE-bench

# Run scenario 1: ConfigMap Drift
./scripts/1_scenerio.sh

Each scenario is self-contained and will:

  1. Create a Kind cluster (or use your existing cluster with the --cluster flag)
  2. Install necessary components (ArgoCD, metrics-server, etc.)
  3. Deploy the initial stable state
  4. Trigger the failure condition
  5. Demonstrate the cascading failure
  6. Show detection signals and mitigation steps
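
The steps above can be pictured roughly as the following shell outline. This is a sketch under stated assumptions, not the repository's actual script: the function names run and scenario_outline, the cluster name, and the manifests/stable/ and manifests/failure/ paths are hypothetical placeholders. By default it only prints the commands it would run; setting DRY_RUN=0 executes them and requires kind and kubectl to be installed.

```shell
#!/usr/bin/env bash
# Print each command instead of executing it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical outline of one scenario; the manifest paths are placeholders.
scenario_outline() {
  local cluster="$1"
  # 1. Create an isolated Kind cluster.
  run kind create cluster --name "$cluster"
  # 2. Install supporting components (Argo CD shown as an example).
  run kubectl create namespace argocd
  run kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
  # 3. Deploy the initial stable state.
  run kubectl apply -f manifests/stable/
  # 4. Trigger the failure condition.
  run kubectl apply -f manifests/failure/
  # 5-6. Observe the resulting failure and its detection signals.
  run kubectl get pods --all-namespaces
  run kubectl get events --all-namespaces --sort-by=.lastTimestamp
}

scenario_outline sre-bench-demo
```

Routing every command through a small wrapper makes it possible to review the full plan before anything touches a cluster, which is useful when pointing a scenario at an existing environment.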

What's Next?

Support

For questions, issues, or contributions: