SRE-bench: Kubernetes SRE Agent Benchmark
Welcome to SRE-bench, an open and reproducible framework designed to evaluate autonomous agents on real-world Kubernetes Site Reliability Engineering (SRE) tasks.
What is SRE-bench?
SRE-bench is inspired by SWE-bench and provides a comprehensive testing ground for evaluating AI agents on:
- Incident Response - How quickly can agents diagnose and resolve production incidents?
- Infrastructure Changes - Can agents safely apply configuration updates and deployments?
- Observability Triage - How effectively do agents analyze metrics, logs, and traces?
- Reliability Improvements - Can agents identify and remediate reliability issues proactively?
Key Features
- 10 Real-World Scenarios - Production-grade failure scenarios covering GitOps drift, resource pressure, networking issues, and more
- Reproducible Environments - Each scenario can run in isolated Kind clusters or existing Kubernetes environments
- GitOps Integration - Scenarios use ArgoCD for realistic deployment workflows where applicable
- Practical Metrics - Measure time-to-diagnose, safe remediation rate, MTTR, and explainability
- Community-Driven - Open for contributions of new scenarios and agent implementations
Who is this for?
- SRE Teams - Training ground for understanding complex Kubernetes failure modes
- AI/Agent Developers - Benchmark platform for evaluating autonomous agent capabilities
- Platform Engineers - Test environment for validating infrastructure changes
- Security Teams - Safe sandbox for chaos engineering and failure injection
Quick Navigation
Learn how the codebase is organized.
Architecture OverviewExplore the project structure and components.
Execute failure scenarios and observe incidents.
Running ScenariosStep-by-step guide to running SRE scenarios.
Add your own scenarios to the benchmark.
Contributing GuideLearn how to contribute new scenarios.
Project Purpose
This repository serves multiple purposes:
- Benchmarking Platform - Evaluate SRE agent performance against standardized scenarios
- Agentkube POC - Testing environment for autonomous Kubernetes agents
- Community Scenario Library - Users can contribute diverse scenarios to test their own agents
- SRE Training - Hands-on learning environment for understanding Kubernetes failure modes
Getting Started
The fastest way to get started is to run a scenario:
# Clone the repository
git clone https://github.com/siddhantprateek/SRE-bench.git
cd SRE-bench
# Run scenario 1: ConfigMap Drift
./scripts/1_scenerio.sh
Each scenario is self-contained and will:
- Create a Kind cluster (or use your existing cluster with
--clusterflag) - Install necessary components (ArgoCD, metrics-server, etc.)
- Deploy the initial stable state
- Trigger the failure condition
- Demonstrate the cascading failure
- Show detection signals and mitigation steps
What's Next?
- Architecture Overview - Understand the codebase structure
- Running Scenarios - Learn how to execute scenarios
- Contributing - Add your own scenarios
Support
For questions, issues, or contributions:
- GitHub Issues: github.com/siddhantprateek/SRE-bench/issues
- Documentation: You're reading it!
- Scenario Details: See scenario README for detailed descriptions of all 10 scenarios