Incident Response
Severity Levels
| Level | Description | Response Time |
|---|---|---|
| P1 | Service down | Immediate |
| P2 | Major degradation | < 1 hour |
| P3 | Minor issue | < 4 hours |
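
As a rough illustration, the table above could be encoded so that tooling can compute a response deadline from the time an incident is reported. This is a minimal sketch, not an existing implementation: the names `RESPONSE_TARGETS` and `response_deadline` are hypothetical, and treating "Immediate" as a zero-length window is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone

# Response-time targets taken from the severity table.
# Assumption: "Immediate" (P1) is modeled as a zero-length window.
RESPONSE_TARGETS = {
    "P1": timedelta(0),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
}


def response_deadline(level: str, reported_at: datetime) -> datetime:
    """Return the time by which a response is due for the given level.

    Raises KeyError if the level is not one of P1, P2, P3.
    """
    return reported_at + RESPONSE_TARGETS[level]


if __name__ == "__main__":
    reported = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    for level in RESPONSE_TARGETS:
        print(level, response_deadline(level, reported).isoformat())
```

Keeping the targets in one mapping means a change to the table only needs to be mirrored in one place.
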
Response Steps
- Acknowledge - Note the time and initial symptoms
- Assess - Check health endpoints, logs, metrics
- Communicate - Update stakeholders
- Mitigate - Roll back, restart, or apply a fix
- Resolve - Confirm the service is restored
- Post-mortem - Document root cause and action items
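
The steps above can be sketched as a small tracker that records when each step was completed, which also gives the post-mortem a timeline to work from. This is only an illustration: `Step` and `IncidentLog` are hypothetical names, and enforcing a strict order is an assumption of the sketch (in practice, stakeholder updates are likely to happen more than once).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Step(IntEnum):
    """Response steps in the order they are listed in this document."""
    ACKNOWLEDGE = 1
    ASSESS = 2
    COMMUNICATE = 3
    MITIGATE = 4
    RESOLVE = 5
    POST_MORTEM = 6


@dataclass
class IncidentLog:
    """Hypothetical tracker mapping each completed step to (time, note)."""
    completed: dict = field(default_factory=dict)

    def next_step(self):
        """Return the first step not yet completed, or None if all are done."""
        for step in Step:
            if step not in self.completed:
                return step
        return None

    def complete(self, step, note, at=None):
        """Record a step; assumes steps are completed strictly in order."""
        expected = self.next_step()
        if step != expected:
            expected_name = expected.name if expected is not None else "nothing"
            raise ValueError(f"expected {expected_name}, got {step.name}")
        self.completed[step] = (at or datetime.now(timezone.utc), note)


if __name__ == "__main__":
    log = IncidentLog()
    log.complete(Step.ACKNOWLEDGE, "Alert received; health check failing")
    log.complete(Step.ASSESS, "Checked health endpoints, logs, metrics")
    print("Next step:", log.next_step().name)
```

Recording a timestamp with every step makes it straightforward to compare the acknowledgement time against the response-time target for the incident's severity level.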