By Arjun Mehta
DORA Metrics: The Complete Guide for Engineering Leaders
DORA metrics are the gold standard for measuring engineering team performance. They were identified by Google's DevOps Research and Assessment program and validated across thousands of companies.
Unlike vanity metrics or activity metrics, DORA metrics measure outcomes that directly correlate with business success: how quickly you ship, how reliable your systems are, and how fast you recover from failures.
Yet most teams either ignore DORA metrics or measure them incorrectly.
This guide explains what the four DORA metrics actually measure, why they matter, how to measure them correctly, and how to improve them.
The Four DORA Metrics
1. Deployment Frequency
What it measures: How often you deploy code to production.
Why it matters: High deployment frequency means:
- You're shipping value constantly
- You have the infrastructure to deploy safely
- You can respond to problems quickly
- Your team is not blocked by long release cycles
- Elite teams: Deploy multiple times per day
- High performers: Deploy 1-7 times per week
- Medium performers: Deploy 1-4 times per month
- Low performers: Deploy every few months or less frequently
How to measure: Count production deployments per day/week/month.
Note: This is tricky to define. Does a canary deployment to 5% of users count? Does a feature flag change count? There's no universal answer: decide what counts as shipping value to users, and apply that definition consistently.
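The counting step can be sketched in a few lines of Python. The log format and the policy of excluding sub-100% canary rollouts are illustrative assumptions, not part of the DORA definition; substitute whatever "counts" for your team.

```python
from datetime import date

# Hypothetical deployment log: (date, environment, percent_of_traffic).
# Policy choice for this sketch: count only full production rollouts,
# excluding canaries below 100% traffic.
deploys = [
    (date(2024, 3, 4), "production", 100),
    (date(2024, 3, 4), "production", 5),    # canary: excluded by our policy
    (date(2024, 3, 5), "staging", 100),     # not production: excluded
    (date(2024, 3, 6), "production", 100),
    (date(2024, 3, 8), "production", 100),
]

def deployment_frequency(deploys, days_in_window):
    """Full production deployments per day over the window."""
    counted = [d for d in deploys if d[1] == "production" and d[2] == 100]
    return len(counted) / days_in_window

print(deployment_frequency(deploys, days_in_window=5))  # 3 deploys / 5 days = 0.6
```

The point of encoding the policy in code is that it gets applied the same way every week, instead of being re-argued in every retro.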
2. Lead Time for Changes
What it measures: How long from commit to production.
Why it matters: Lead time indicates:
- How quickly you can respond to customer needs
- How much friction is in your deployment process
- Whether you're blocked waiting for reviews, tests, or manual approvals
- Elite teams: < 1 hour commit to production
- High performers: 1-24 hours
- Medium performers: 1-7 days
- Low performers: > 1 week
How to measure: Track the time from commit merge to production deployment.
Gotcha: Be deliberate about whether waiting for reviews and manual approvals counts. If those steps are a required part of your delivery process, the time spent waiting for them is part of the system and belongs in your lead time; don't exclude it just because it's uncomfortable to see.
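A minimal calculation, assuming you can pair each merged change with the production deployment that shipped it (the record fields here are illustrative):

```python
from datetime import datetime
from statistics import median

# Hypothetical records pairing a merge timestamp with the production
# deployment that shipped it.
changes = [
    {"merged_at": datetime(2024, 3, 4, 9, 0),  "deployed_at": datetime(2024, 3, 4, 10, 30)},
    {"merged_at": datetime(2024, 3, 4, 11, 0), "deployed_at": datetime(2024, 3, 4, 11, 45)},
    {"merged_at": datetime(2024, 3, 5, 16, 0), "deployed_at": datetime(2024, 3, 6, 9, 0)},
]

def lead_times_hours(changes):
    return [(c["deployed_at"] - c["merged_at"]).total_seconds() / 3600
            for c in changes]

# Median rather than mean: one change that sat in a queue overnight
# shouldn't dominate the picture.
print(median(lead_times_hours(changes)))  # median of [1.5, 0.75, 17.0] hours = 1.5
```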
3. Mean Time to Recovery (MTTR)
What it measures: How long to fix a production incident.
Why it matters: Incidents happen. What matters is how fast you recover:
- Can you detect issues quickly?
- Can you identify the root cause?
- Can you deploy a fix?
- Elite teams: < 1 hour
- High performers: 1-24 hours
- Medium performers: 1-7 days
- Low performers: > 1 week
How to measure: Track time from incident detection to resolution in production.
Gotcha: This is hard to measure consistently. What counts as "resolution"? Code deployed? Monitoring shows recovery? Customer-facing impact gone?
Be consistent about your definition.
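Whatever definition you pick, encoding it once keeps it consistent. A sketch, assuming a simple incident log with detection and resolution timestamps (field names are illustrative):

```python
from datetime import datetime

# Hypothetical incident log. "resolved_at" follows whatever definition of
# resolution your team chose; the point is applying it the same way every time.
incidents = [
    {"detected_at": datetime(2024, 3, 1, 14, 0), "resolved_at": datetime(2024, 3, 1, 14, 40)},
    {"detected_at": datetime(2024, 3, 7, 2, 15), "resolved_at": datetime(2024, 3, 7, 5, 15)},
]

def mttr_hours(incidents):
    """Mean time from detection to resolution, in hours."""
    total_seconds = sum(
        (i["resolved_at"] - i["detected_at"]).total_seconds() for i in incidents
    )
    return total_seconds / len(incidents) / 3600

print(round(mttr_hours(incidents), 2))  # (40 min + 180 min) / 2 = 110 min, about 1.83 h
```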
4. Change Failure Rate
What it measures: What percentage of deployments cause problems requiring rollback or hotfix.
Why it matters: This measures the quality and safety of your deployments:
- Do you test thoroughly?
- Do you catch issues before production?
- Can you safely deploy frequently?
- Elite teams: 0-15% change failure rate
- High performers: 16-30%
- Medium performers: 31-45%
- Low performers: > 45%
How to measure: Count failed deployments / total deployments.
Gotcha: What counts as "failed"? A P2 bug that required a hotfix? A feature flag disable? A rollback? Define it clearly.
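One way to keep the definition stable is to encode it as data. This sketch assumes a deployment outcome log and an illustrative failure policy (rollbacks and hotfixes count, flag disables don't); adjust the set to match your own definition.

```python
# Policy choice for this sketch: which deployment outcomes count as failures.
FAILURE_OUTCOMES = {"rollback", "hotfix"}

# Hypothetical outcome log, one entry per production deployment.
outcomes = ["ok", "ok", "rollback", "ok", "hotfix", "ok", "ok", "ok", "ok", "ok"]

def change_failure_rate(outcomes):
    failed = sum(1 for o in outcomes if o in FAILURE_OUTCOMES)
    return failed / len(outcomes)

print(change_failure_rate(outcomes))  # 2 failures / 10 deployments = 0.2
```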
Why These Metrics?
These four metrics were chosen because they:
- Correlate with business outcomes: Companies with elite DORA metrics have higher customer satisfaction, faster time-to-market, and better financial performance.
- Don't reward bad behavior: You can't game these metrics by working overtime or cutting corners. Gaming one metric usually hurts the others.
- Are actionable: They point to specific areas for improvement (deployment frequency, test reliability, monitoring, incident response).
- Are culture-agnostic: They work for startups and enterprises, monoliths and microservices, waterfall and agile teams.
Anti-Patterns and Pitfalls
Anti-Pattern 1: Chasing Metrics Instead of Outcomes
If you optimize for deployment frequency at the expense of reliability, you'll ship broken code faster. That's not progress.
Right approach: Optimize all four metrics together. They should move in the same direction.
Anti-Pattern 2: Measuring Individual Contribution
"Our deployment frequency is low because developers aren't shipping enough code."
Wrong. Deployment frequency is a team and system metric. It's limited by:
- How fast tests run
- How complex code review is
- How much manual approval is required
- Whether infrastructure is reliable
Individual developers can't improve deployment frequency alone.
Anti-Pattern 3: Ignoring Context
A monolithic legacy system with a thousand services' worth of dependencies will have different DORA metrics than a greenfield microservices architecture. That's okay. Compare yourself to your own baseline, not to Google.
Anti-Pattern 4: Missing Data
Measuring DORA metrics correctly requires solid tooling and data:
- Version control integrations
- CI/CD system logs
- Incident tracking
- Production monitoring
If you're estimating these metrics by hand, your data is wrong.
How to Improve DORA Metrics
To Improve Deployment Frequency
- Reduce batch size: Deploy smaller changes more frequently
- Reduce lead time: Fix bottlenecks in your deployment process
- Increase automation: Remove manual approval steps
- Use feature flags: Deploy incomplete features hidden behind flags
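The feature-flag point deserves a concrete picture: code for an unfinished feature ships to production but stays dark until the flag flips. This is a minimal sketch with an in-memory dict standing in for flag storage; real systems use a flag service or config store.

```python
# In-memory flag store, for illustration only.
FLAGS = {"new_checkout": False}

def checkout(cart):
    # The incomplete feature is deployed but hidden behind the flag.
    if FLAGS.get("new_checkout", False):
        return "new checkout flow"
    return "legacy checkout flow"

print(checkout(cart=[]))       # legacy path while the flag is off
FLAGS["new_checkout"] = True   # flipping the flag needs no redeploy
print(checkout(cart=[]))       # new path
```

This decouples "deploy" from "release": you can deploy many times a day without exposing half-finished work.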
To Improve Lead Time
- Make tests fast: Slow tests create bottlenecks
- Parallelize testing: Run unit, integration, and E2E tests in parallel
- Reduce review time: Don't wait weeks for code review
- Automate gates: Let CI/CD check for quality, not humans
To Improve MTTR
- Improve monitoring: Detect issues faster
- Invest in incident response: Document runbooks, practice incident response
- Make rollback easy: Zero-downtime deployments, reversible migrations
- Reduce complexity: Simpler systems are easier to debug
To Improve Change Failure Rate
- Improve testing: Better test coverage, faster feedback
- Use canary deployments: Detect issues before they hit all users
- Automate code quality checks: Lint, type checks, security scans
- Reduce blast radius: Deploy to small subset first
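"Deploy to a small subset first" usually means routing a percentage of users to the new version. A common sketch is hashing each user ID to a stable bucket, so the same user consistently sees the same version as the rollout percentage ramps up (the function and its parameters are illustrative):

```python
import hashlib

def in_canary(user_id: str, rollout_percent: int) -> bool:
    """Stable assignment: the same user always lands in the same bucket."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent

print(in_canary("user-42", 0))    # False: nobody is in the canary at 0%
print(in_canary("user-42", 100))  # True: everyone is in at 100%
```

Ramp `rollout_percent` from 1 to 100 while watching your error rates; any regression hits a small blast radius first.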
The Counter-Intuitive Truth About DORA Metrics
Elite teams deploy more frequently AND have lower change failure rates. They're not trading off quality for speed.
Why?
- Smaller deployments are safer: A 10-line commit is less risky than a 1000-line commit
- Better infrastructure: Elite teams invest in testing, monitoring, and CI/CD
- Better practices: Code review, feature flags, canary deployments
- Better culture: Blameless incident response, continuous learning
If you're trading off quality for frequency, you're doing it wrong.
Measuring DORA Metrics in Practice
Setup
- Version control: Ensure all deployments are tracked in your VCS
- CI/CD logging: Capture timestamps for builds and production deployments
- Incident tracking: Log incidents with timestamps
- Monitoring: Track when issues are detected
Calculation
- Deployment Frequency: Count production deployments per week/month
- Lead Time: Measure time from commit to deployment for a random sample of deployments
- MTTR: Measure time from incident detection to resolution for a sample of incidents
- Change Failure Rate: Count failed deployments / total deployments
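Once the raw numbers exist, mapping them to the performance bands quoted earlier in this guide is mechanical. A sketch for lead time, using those bands; the other three metrics follow the same pattern with their own thresholds:

```python
def lead_time_tier(hours: float) -> str:
    """Classify lead time against the bands quoted earlier in this guide."""
    if hours < 1:
        return "elite"       # < 1 hour commit to production
    if hours <= 24:
        return "high"        # 1-24 hours
    if hours <= 24 * 7:
        return "medium"      # 1-7 days
    return "low"             # > 1 week

print(lead_time_tier(0.5))  # elite
print(lead_time_tier(36))   # medium
```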
Tools
- GitLab CI: Built-in DORA metrics dashboards
- GitHub + custom integration: Use GitHub API + monitoring data
- Jira + custom integration: Track deployments and incidents
- Datadog, New Relic: Some monitoring platforms calculate DORA metrics
The Bigger Picture
DORA metrics are one lens on team performance. They measure engineering velocity and reliability.
They don't measure:
- Product decisions (are you building the right thing?)
- Code quality (is the code maintainable?)
- Technical debt (are you accumulating debt?)
- Developer satisfaction (are people happy?)
Use DORA metrics alongside other metrics:
- Code quality metrics: Test coverage, code review comments, refactoring rate
- Product metrics: Customer satisfaction, feature adoption, time-to-value
- Team metrics: Developer satisfaction, onboarding time, retention
DORA metrics tell you how fast. You need other metrics to know how well, and toward what.
Getting Started
- Establish baselines: Measure where you are now
- Pick the biggest bottleneck: Which metric is worst?
- Identify the root cause: Why is it bad?
- Make one improvement: Fix one thing at a time
- Remeasure: Did it improve?
- Repeat: Continuous improvement
DORA metrics aren't a destination. They're a framework for continuous improvement toward engineering excellence.
Frequently Asked Questions
Q: Should we compare our metrics to other companies?
A: Only for context. Your 2 deployments per day might be elite for your domain and low for another. Compare yourself to your own baseline and your previous quarter.

Q: Our change failure rate is high. Does that mean we should deploy less?
A: No. Deploying less won't fix quality. Fix the root cause: unreliable tests, missing monitoring, or lack of testing discipline. Then deploy more frequently with confidence.

Q: What if we can't measure some metrics accurately?
A: Start with what you can measure. Even rough measurements are better than none. Improve measurement over time as you invest in tooling.