Incident Response

Severity Levels

Level  Description        Response Time
P1     Service down       Immediate
P2     Major degradation  < 1 hour
P3     Minor issue        < 4 hours
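
The severity table can be encoded so that tooling computes a response deadline from the detection time. The following is a minimal sketch, not part of any existing tooling: the names `Severity`, `RESPONSE_TARGET`, and `response_deadline` are hypothetical, and treating "Immediate" as a zero delay is an assumption.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Severity(Enum):
    """Severity levels and descriptions, taken from the table above."""
    P1 = "Service down"
    P2 = "Major degradation"
    P3 = "Minor issue"


# Response targets from the table. "Immediate" is modelled as zero delay,
# which is an assumption made for this sketch.
RESPONSE_TARGET = {
    Severity.P1: timedelta(0),
    Severity.P2: timedelta(hours=1),
    Severity.P3: timedelta(hours=4),
}


def response_deadline(severity: Severity, detected_at: datetime) -> datetime:
    """Return the latest time by which a response should begin."""
    return detected_at + RESPONSE_TARGET[severity]


if __name__ == "__main__":
    detected = datetime.now(timezone.utc)
    for level in Severity:
        deadline = response_deadline(level, detected)
        print(f"{level.name} ({level.value}): respond by {deadline.isoformat()}")
```

Keeping the targets in one mapping means a change to the table needs a change in only one place.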

Response Steps

  1. Acknowledge - Note the time and initial symptoms
  2. Assess - Check health endpoints, logs, metrics
  3. Communicate - Update stakeholders
  4. Mitigate - Rollback, restart, or apply fix
  5. Resolve - Confirm service restored
  6. Post-mortem - Document root cause and action items
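
The six steps above can be tracked as a timestamped log, which also gives the post-mortem a ready-made timeline. The following is a minimal sketch under stated assumptions: `IncidentLog` and its methods are hypothetical names, and enforcing a strict step order is a simplification (in practice, communication usually repeats throughout an incident).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Step names in the order listed above.
STEPS = ["Acknowledge", "Assess", "Communicate", "Mitigate", "Resolve", "Post-mortem"]


@dataclass
class IncidentLog:
    """Hypothetical helper that records each response step with a timestamp."""
    title: str
    entries: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        """Return the next expected step, or None once all steps are recorded."""
        if len(self.entries) < len(STEPS):
            return STEPS[len(self.entries)]
        return None

    def record(self, step: str, note: str, at: Optional[datetime] = None) -> None:
        """Append a step to the log; reject steps recorded out of order."""
        expected = self.next_step()
        if expected is None:
            raise ValueError("all steps are already recorded")
        if step != expected:
            raise ValueError(f"expected step {expected!r}, got {step!r}")
        self.entries.append((at or datetime.now(timezone.utc), step, note))


if __name__ == "__main__":
    log = IncidentLog("Example incident")
    log.record("Acknowledge", "Noted start time and initial symptoms")
    log.record("Assess", "Checked health endpoints, logs, metrics")
    for when, step, note in log.entries:
        print(f"{when.isoformat()}  {step}: {note}")
```

Recording the time at the Acknowledge step matters most, because the response-time targets are measured against it.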