What is MTTD?
Mean time to detect (MTTD) measures how long it takes to discover a problem after it starts. For testers, it's the gap between when something breaks and when your monitoring tools or users report it.
Do you have any examples of MTTD?
Let's look at a week of production incidents:
- Memory leak: 5 minutes to detect
- API timeout: 12 minutes to detect
- Database slowdown: 8 minutes to detect
The formula for MTTD is:
- MTTD = Total detection time ÷ Number of incidents
So:
- MTTD = (5 + 12 + 8) minutes ÷ 3 = 25 minutes ÷ 3 ≈ 8.3 minutes
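This means problems are detected about 8 minutes after they start, on average. Individual incidents in this sample ranged from 5 to 12 minutes, so the average alone does not show your slowest detections.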
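To make the arithmetic concrete, here is a minimal Python sketch that derives MTTD from start and detection timestamps. The incident records, the dates, and the function name `mean_time_to_detect` are all hypothetical; the timestamps were chosen only so that the gaps match the 5, 12, and 8 minute figures in the example.

```python
from datetime import datetime

# Hypothetical incident records: (name, started_at, detected_at).
# The dates are invented; only the gaps (5, 12, and 8 minutes) come from the example.
incidents = [
    ("Memory leak", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 5)),
    ("API timeout", datetime(2024, 1, 10, 14, 30), datetime(2024, 1, 10, 14, 42)),
    ("Database slowdown", datetime(2024, 1, 12, 22, 15), datetime(2024, 1, 12, 22, 23)),
]


def mean_time_to_detect(records):
    """Return MTTD in minutes: total detection time divided by incident count."""
    if not records:
        raise ValueError("MTTD is undefined with no incidents")
    total_minutes = sum(
        (detected - started).total_seconds() / 60
        for _, started, detected in records
    )
    return total_minutes / len(records)


mttd = mean_time_to_detect(incidents)
print(f"MTTD = {mttd:.1f} minutes")  # prints "MTTD = 8.3 minutes"
```

In practice the hard part is usually not the division but deciding what counts as the start time, since the moment something actually broke is often only known after investigation.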
Why is MTTD important?
MTTD directly affects how much damage an issue can cause before you start fixing it. A high MTTD means problems stay in your system longer, potentially affecting more users or data. It's especially critical for security issues, where each additional minute of undetected access can widen the impact.
What are the challenges with MTTD?
Setting the right monitoring thresholds is tricky: thresholds that are too sensitive lead to alert fatigue, while thresholds that are too loose miss real issues. Some problems, like gradual performance degradation, are harder to detect than sudden failures because no single measurement looks dramatically wrong. You also need monitoring coverage across your entire system, including third-party services and dependencies.
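The threshold trade-off can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the latency samples are synthetic, and the constants `STATIC_LIMIT_MS`, `BASELINE_SAMPLES`, and `RELATIVE_LIMIT` are invented values, not recommendations. It compares a fixed alert limit against a limit set relative to a recent baseline when latency degrades slowly.

```python
# Synthetic latency samples in milliseconds, one per minute.
# Latency creeps up by 10 ms per minute, starting from 200 ms.
latencies = [200 + 10 * minute for minute in range(40)]

STATIC_LIMIT_MS = 500   # assumed fixed alert threshold
BASELINE_SAMPLES = 5    # assumed number of early samples treated as "normal"
RELATIVE_LIMIT = 1.25   # assumed: alert when latency is 25% above the baseline


def first_alert(samples, should_alert):
    """Return the minute of the first sample that triggers an alert, or None."""
    for minute, value in enumerate(samples):
        if should_alert(value):
            return minute
    return None


# Baseline is the mean of the first few samples (220 ms for this data).
baseline = sum(latencies[:BASELINE_SAMPLES]) / BASELINE_SAMPLES

static_minute = first_alert(latencies, lambda v: v > STATIC_LIMIT_MS)
relative_minute = first_alert(latencies, lambda v: v > baseline * RELATIVE_LIMIT)

print(f"Static threshold fires at minute {static_minute}")      # minute 31
print(f"Relative threshold fires at minute {relative_minute}")  # minute 8
```

With this data the relative check detects the drift 23 minutes earlier, which would lower MTTD for this kind of incident. The cost is the one described above: a tighter limit also fires on harmless fluctuations, so it can add to alert fatigue on noisier real-world data.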