# Write Post-Incident Review (Postmortem)
Act as an Engineering Manager and write a post-incident review using the template below.
## Post-Incident Review Template
### Incident Summary
- **Incident ID**: [ID]
- **Date & Time**: [Start] - [End] (include time zone)
- **Duration**: [Duration]
- **Severity**: [SEV-1/2/3/4]
- **Status**: [Resolved/Mitigated]
### Executive Summary
[2-3 sentence summary of what happened and impact]
### Timeline of Events
**Detection** ([Time]):
- [What happened]
- [How it was detected]
- [Who was notified]
**Investigation** ([Time]):
- [What was investigated]
- [Key findings]
- [Decisions made]
**Mitigation** ([Time]):
- [Actions taken]
- [Workarounds deployed]
- [Resources involved]
**Resolution** ([Time]):
- [Root cause identified]
- [Permanent fix deployed]
- [Verification completed]
---
### Impact Assessment
**User Impact**:
- [ ] Affected users: [Number/%]
- [ ] Affected regions: [List]
- [ ] Affected features: [List]
- [ ] User-visible errors: [Description]
**Business Impact**:
- [ ] Revenue impact: [$ or %]
- [ ] Customer churn: [Number]
- [ ] Support tickets: [Number]
- [ ] Brand reputation: [Impact]
**Technical Impact**:
- [ ] Services affected: [List]
- [ ] Error rate increase: [% or number]
- [ ] Performance degradation: [%]
- [ ] Data integrity: [Any issues]
---
### Root Cause Analysis
**Primary Root Cause**:
- [Detailed explanation]
- [Why it happened]
- [What triggered it]
**Contributing Factors**:
- [ ] [Factor 1]: [Explanation]
- [ ] [Factor 2]: [Explanation]
- [ ] [Factor 3]: [Explanation]
**Five Whys Analysis**:
1. Why did [symptom] happen? → [Answer]
2. Why did [answer 1] happen? → [Answer]
3. Why did [answer 2] happen? → [Answer]
4. Why did [answer 3] happen? → [Answer]
5. Why did [answer 4] happen? → [Root cause]
---
### What Went Well
- [ ] [Positive aspect 1]
- [ ] [Positive aspect 2]
- [ ] [Positive aspect 3]
### What Went Wrong
- [ ] [Problem 1]
- [ ] [Problem 2]
- [ ] [Problem 3]
### What We Learned
- [ ] [Learning 1]
- [ ] [Learning 2]
- [ ] [Learning 3]
---
### Action Items
**Immediate Actions** (This Week):
- [ ] [Action] - Owner: [Name] - Due: [Date]
- [ ] [Action] - Owner: [Name] - Due: [Date]
**Short-term Actions** (This Month):
- [ ] [Action] - Owner: [Name] - Due: [Date]
- [ ] [Action] - Owner: [Name] - Due: [Date]
**Long-term Actions** (This Quarter):
- [ ] [Action] - Owner: [Name] - Due: [Date]
- [ ] [Action] - Owner: [Name] - Due: [Date]
**Action Item Categories**:
- Prevention: [Actions to prevent recurrence]
- Detection: [Actions to improve detection]
- Response: [Actions to improve response]
- Recovery: [Actions to improve recovery]
---
### Metrics & Statistics
**Incident Metrics**:
- Mean Time to Acknowledge (MTTA), rolling average including this incident: [Time]
- Mean Time to Resolve (MTTR), rolling average including this incident: [Time]
- Time to Detection: [Time]
- Time to Mitigation: [Time]
- Time to Resolution: [Time]
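
The per-incident durations above can be derived from the timeline timestamps rather than estimated by hand. The following is a minimal sketch, not a prescribed method: the `timeline` keys, the sample timestamps, and the `incident_metrics` helper are all hypothetical, and it assumes each duration is measured from the incident start (teams define these intervals differently, so adjust to your own definitions).

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for illustration only; replace with the
# real values from the Timeline of Events section.
timeline = {
    "started": datetime(2024, 1, 15, 14, 0),
    "detected": datetime(2024, 1, 15, 14, 12),
    "mitigated": datetime(2024, 1, 15, 15, 2),
    "resolved": datetime(2024, 1, 15, 16, 30),
}


def incident_metrics(t: dict) -> dict:
    """Return per-incident durations, each measured from the start time.

    Assumption: "time to X" means X minus incident start. Some teams
    measure mitigation and resolution from detection instead.
    """
    return {
        "time_to_detection": t["detected"] - t["started"],
        "time_to_mitigation": t["mitigated"] - t["started"],
        "time_to_resolution": t["resolved"] - t["started"],
    }


metrics = incident_metrics(timeline)
for name, duration in metrics.items():
    print(f"{name}: {duration}")
```

Keeping the timestamps in one place means the Duration field in the Incident Summary and the metrics here cannot drift apart.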
**Comparison to Goals**:
- [ ] MTTA within target? [Yes/No]
- [ ] MTTR within target? [Yes/No]
- [ ] Communication timeline met? [Yes/No]
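
Because MTTA and MTTR are averages over several incidents, the Yes/No answers above depend on which incidents are included. A hedged sketch of that check follows; the sample durations, the 15-minute target, and the helper names `mean_duration` and `within_target` are assumptions made for illustration, not values taken from any real policy.

```python
from datetime import timedelta


def mean_duration(durations: list) -> timedelta:
    """Average a non-empty list of timedeltas."""
    return sum(durations, timedelta()) / len(durations)


def within_target(actual: timedelta, target: timedelta) -> str:
    """Return the Yes/No answer used in the checklist."""
    return "Yes" if actual <= target else "No"


# Hypothetical acknowledgement times for recent incidents,
# with this incident's value appended last.
recent_ack_times = [
    timedelta(minutes=5),
    timedelta(minutes=9),
    timedelta(minutes=22),
]
mtta_target = timedelta(minutes=15)  # assumed target, set by your own SLOs

mtta = mean_duration(recent_ack_times)
print(f"MTTA: {mtta}, within target? {within_target(mtta, mtta_target)}")
```

Note that a single slow acknowledgement (22 minutes here) can still leave the mean within target, so it is worth recording the per-incident value alongside the average.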
---
### Follow-up Actions
**Review Schedule**:
- [ ] Action item review: [Date]
- [ ] Retrospective meeting: [Date]
- [ ] Document update: [Date]
**Stakeholder Communication**:
- [ ] Executive summary shared: [Date]
- [ ] Customer communication sent: [Date]
- [ ] Status page updated: [Date]
---
## Post-Incident Review Best Practices
**Blameless Culture**:
- Focus on systems and processes, not individuals
- Ask "What" and "How", not "Who"
- Encourage learning and improvement
**Thoroughness**:
- Include all relevant details
- Document lessons learned
- Track action items to completion
**Transparency**:
- Share learnings across teams
- Update documentation
- Communicate changes to stakeholders