Incident response runbook
Operational playbook for P1/P2 incidents, covering communication and mitigation.
howto • updated 2026-03-15
When to use
Use this runbook when there is confirmed impact on checkout, webhooks, fiscal issuance, or dashboard access.
Operational sequence
1) Classify severity
Classify the incident as P1, P2, or P3 based on customer impact and duration.
2) Mitigate first
Enable containment feature flags before starting deep root-cause analysis.
3) Communicate status
For P1, publish internal and external updates every 15 minutes.
4) Recover and validate
Confirm that the backlog has drained, latency is back to normal, and no fiscal issuance remains in a pending state.
5) Postmortem
Close out the incident with an RCA, action items, and an owner within 24 hours.
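The timing rules in steps 3 and 5 can be tracked mechanically. The Python sketch below computes when P1 status updates are due and when the postmortem should be closed. The function names are hypothetical; the runbook defines no update cadence for P2 or P3, so the sketch returns no schedule for them; and counting the 24-hour postmortem window from the resolution time is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Cadence stated in the runbook: P1 updates every 15 minutes.
# No cadence is defined for P2/P3, so they are deliberately absent here.
UPDATE_INTERVAL = {"P1": timedelta(minutes=15)}

# Step 5 deadline. Assumption: the 24 hours are counted from resolution.
POSTMORTEM_WINDOW = timedelta(hours=24)


def update_schedule(severity: str, started_at: datetime, until: datetime) -> list[datetime]:
    """Return every time a status update is due after started_at, up to and including until."""
    interval = UPDATE_INTERVAL.get(severity)
    if interval is None:
        return []  # the runbook gives no cadence for this severity
    due = []
    t = started_at + interval
    while t <= until:
        due.append(t)
        t += interval
    return due


def postmortem_due(resolved_at: datetime) -> datetime:
    """Latest time the postmortem (RCA, action items, owner) should be closed."""
    return resolved_at + POSTMORTEM_WINDOW


# Illustrative P1 starting at 14:20 UTC, as in the communication template.
start = datetime(2026, 3, 15, 14, 20, tzinfo=timezone.utc)
for t in update_schedule("P1", start, start + timedelta(minutes=45)):
    print(t.strftime("%H:%M"))  # prints 14:35, then 14:50, then 15:05
```

Keeping the cadence in a lookup table makes it easy to add P2/P3 intervals later without touching the scheduling logic.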
Initial communication template
```markdown
## Incident in progress
- Severity: P1
- Started at: 14:20 UTC
- Impact: intermittent checkout failures
- Mitigation: retry queue fallback enabled
- Next update: 15 minutes
```
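When several responders draft updates, filling the template from structured fields keeps the format consistent. A minimal sketch, assuming a hypothetical `IncidentUpdate` type and `render_initial_update` helper (neither is an existing tool referenced by this runbook):

```python
from dataclasses import dataclass

VALID_SEVERITIES = {"P1", "P2", "P3"}


@dataclass
class IncidentUpdate:
    """Fields of the initial communication template (hypothetical type)."""
    severity: str                  # "P1", "P2", or "P3"
    started_at: str                # e.g. "14:20 UTC"
    impact: str
    mitigation: str
    next_update_minutes: int = 15  # runbook cadence for P1 updates


def render_initial_update(u: IncidentUpdate) -> str:
    """Fill the initial communication template with incident-specific values."""
    if u.severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {u.severity!r}")
    lines = [
        "## Incident in progress",
        f"- Severity: {u.severity}",
        f"- Started at: {u.started_at}",
        f"- Impact: {u.impact}",
        f"- Mitigation: {u.mitigation}",
        f"- Next update: {u.next_update_minutes} minutes",
    ]
    return "\n".join(lines)


update = IncidentUpdate(
    severity="P1",
    started_at="14:20 UTC",
    impact="intermittent checkout failures",
    mitigation="retry queue fallback enabled",
)
print(render_initial_update(update))
```

With these values the output matches the template line for line; rejecting unknown severities catches typos before an update is published.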
Recovery signals
- Checkout success rate returns to baseline.
- Webhook queue stops growing.
- No new critical errors in fiscal reconciliation.
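The three recovery signals can be combined into a single check before recovery is declared. This is a sketch under stated assumptions: the `RecoverySnapshot` fields, the 0.5 percentage-point `BASELINE_TOLERANCE`, and reading "stops growing" as "no queue-depth sample exceeds the one before it" are hypothetical choices, not values defined by this runbook.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical tolerance: how far below baseline still counts as "returned".
BASELINE_TOLERANCE = 0.005  # 0.5 percentage points on a 0.0-1.0 rate


@dataclass
class RecoverySnapshot:
    checkout_success_rate: float          # current rate, 0.0-1.0
    baseline_success_rate: float          # pre-incident baseline, 0.0-1.0
    webhook_queue_depths: Sequence[int]   # recent samples, oldest first
    new_critical_fiscal_errors: int       # new critical reconciliation errors


def recovery_signals(s: RecoverySnapshot) -> dict[str, bool]:
    """Evaluate each recovery signal from the runbook independently."""
    depths = list(s.webhook_queue_depths)
    return {
        "checkout_at_baseline": (
            s.checkout_success_rate >= s.baseline_success_rate - BASELINE_TOLERANCE
        ),
        # Assumption: the queue has stopped growing if no sample exceeds its predecessor.
        "webhook_queue_not_growing": all(b <= a for a, b in zip(depths, depths[1:])),
        "no_new_fiscal_errors": s.new_critical_fiscal_errors == 0,
    }


def is_recovered(s: RecoverySnapshot) -> bool:
    """Recovery holds only when every signal is satisfied."""
    return all(recovery_signals(s).values())


# Illustrative values only.
snapshot = RecoverySnapshot(
    checkout_success_rate=0.993,
    baseline_success_rate=0.995,
    webhook_queue_depths=[120, 80, 80, 40],
    new_critical_fiscal_errors=0,
)
print(recovery_signals(snapshot))
print(is_recovered(snapshot))
```

Returning the individual signals, not just a single boolean, shows which condition is still blocking recovery.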