Create Incident Recovery Plan
Develop detailed incident recovery procedures including rollback plans, failover procedures, data recovery steps, and service restoration workflows.
v3
Last updated: November 6, 2025
management
Engineering Manager
incident-recovery
disaster-recovery
devops
Prompt Template
# Create Incident Recovery Plan

Act as an Engineering Manager creating an incident recovery plan.

## Recovery Plan Context

- **System**: [System/service name]
- **Criticality**: [Critical/High/Medium/Low]
- **Recovery Time Objective (RTO)**: [Target recovery time]
- **Recovery Point Objective (RPO)**: [Acceptable data loss window]

## Recovery Procedures

### 1. Pre-Incident Preparation

**Backup Strategy**:
- [ ] Database backups: [Frequency, retention]
- [ ] Configuration backups: [Frequency, retention]
- [ ] Code backups: [Version control, release tags]
- [ ] Backup verification: [How to verify backups]

**Disaster Recovery Infrastructure**:
- [ ] Secondary region/environment: [Location]
- [ ] Failover capabilities: [Active-passive/Active-active]
- [ ] Standby resources: [Resources available]
- [ ] DNS failover: [Configured/Load balancer]

**Documentation**:
- [ ] System architecture documented
- [ ] Dependencies mapped
- [ ] Recovery procedures documented
- [ ] Contact lists maintained

### 2. Recovery Decision Tree

**Scenario: Service Down**
- [ ] Check health endpoints
- [ ] Verify infrastructure status
- [ ] Review recent deployments
- [ ] Check logs for errors
- [ ] Option A: Rollback deployment
- [ ] Option B: Restart services
- [ ] Option C: Scale up resources
- [ ] Option D: Failover to secondary region

**Scenario: Data Corruption**
- [ ] Identify corrupted data
- [ ] Assess scope of corruption
- [ ] Option A: Restore from backup (if data loss acceptable)
- [ ] Option B: Repair corrupted data (if possible)
- [ ] Option C: Rebuild from source (if available)

**Scenario: Security Breach**
- [ ] Isolate affected systems
- [ ] Preserve evidence
- [ ] Notify security team
- [ ] Assess data exposure
- [ ] Restore from clean backup
- [ ] Deploy security patches

### 3. Rollback Procedures

**Deployment Rollback**:
- [ ] Identify deployment version
- [ ] Verify previous stable version
- [ ] Execute rollback: [Step-by-step commands]
- [ ] Verify rollback success
- [ ] Monitor system health
- [ ] Notify stakeholders

**Database Rollback**:
- [ ] Stop database writes
- [ ] Identify restore point
- [ ] Restore from backup: [Commands]
- [ ] Verify data integrity
- [ ] Resume database operations
- [ ] Monitor for issues

**Configuration Rollback**:
- [ ] Identify configuration change
- [ ] Restore previous configuration
- [ ] Restart affected services
- [ ] Verify configuration applied
- [ ] Monitor system behavior

### 4. Failover Procedures

**Active-Passive Failover**:
- [ ] Verify secondary region health
- [ ] Stop primary region traffic
- [ ] Update DNS/load balancer
- [ ] Verify failover success
- [ ] Monitor secondary region
- [ ] Document failover event

**Active-Active Failover**:
- [ ] Identify failing region
- [ ] Drain traffic from failing region
- [ ] Redirect traffic to healthy region
- [ ] Monitor traffic distribution
- [ ] Verify service continuity

**Database Failover**:
- [ ] Identify primary database failure
- [ ] Promote secondary database
- [ ] Update connection strings
- [ ] Verify database connectivity
- [ ] Monitor replication lag

### 5. Data Recovery Procedures

**Database Recovery**:
- [ ] Identify data loss/corruption
- [ ] Stop data writes
- [ ] Select restore point (within RPO)
- [ ] Restore database: [Commands]
- [ ] Verify data integrity
- [ ] Resume operations

**File System Recovery**:
- [ ] Identify lost/corrupted files
- [ ] Restore from backup: [Commands]
- [ ] Verify file integrity
- [ ] Restore permissions
- [ ] Resume file operations

**Application State Recovery**:
- [ ] Identify lost state
- [ ] Restore from checkpoint/backup
- [ ] Replay logs if available
- [ ] Verify state consistency
- [ ] Resume application operations

### 6. Service Restoration Workflow

**Step 1: Assess Impact**
- [ ] What services are affected?
- [ ] How many users impacted?
- [ ] What functionality is broken?
- [ ] What data is at risk?

**Step 2: Choose Recovery Strategy**
- [ ] Quick fix/workaround available?
- [ ] Rollback possible?
- [ ] Failover needed?
- [ ] Full restoration required?

**Step 3: Execute Recovery**
- [ ] Execute recovery procedures
- [ ] Monitor recovery progress
- [ ] Verify recovery success
- [ ] Document actions taken

**Step 4: Validate Recovery**
- [ ] Test critical functionality
- [ ] Verify data integrity
- [ ] Monitor error rates
- [ ] Confirm user impact resolved

**Step 5: Post-Recovery**
- [ ] Update status page
- [ ] Notify stakeholders
- [ ] Schedule post-incident review
- [ ] Update documentation

### 7. Recovery Testing

**Test Scenarios**:
- [ ] Database failure recovery
- [ ] Deployment rollback
- [ ] Region failover
- [ ] Data corruption recovery
- [ ] Security incident recovery

**Testing Schedule**:
- [ ] Quarterly: [Full disaster recovery test]
- [ ] Monthly: [Component recovery tests]
- [ ] Weekly: [Backup verification]

**Test Documentation**:
- [ ] Test results recorded
- [ ] Recovery time measured
- [ ] Issues documented
- [ ] Improvements identified

### 8. Recovery Tools & Scripts

**Automation Scripts**:
- [ ] Rollback script: [Location]
- [ ] Failover script: [Location]
- [ ] Database restore script: [Location]
- [ ] Health check script: [Location]

**Monitoring Tools**:
- [ ] [Tool] - [Purpose]
- [ ] [Tool] - [Purpose]

**Backup Tools**:
- [ ] [Tool] - [Purpose]
- [ ] [Tool] - [Purpose]

## Success Metrics

**Recovery Metrics**:
- Recovery Time Objective (RTO): [Target]
- Recovery Point Objective (RPO): [Target]
- Recovery Success Rate: [Target %]
- Mean Time to Recovery (MTTR): [Target]
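
The recovery metrics listed in the template can be computed from a simple log of past incidents. The following is a minimal sketch, not part of the template itself: the `Incident` record, its field names, and the `recovery_metrics` helper are all hypothetical, and it assumes MTTR is averaged over successful recoveries only and measured from detection to recovery. Teams that define MTTR differently would need to adjust it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One recovery event; field names are illustrative assumptions."""
    detected_at: datetime
    recovered_at: datetime
    recovered: bool  # True if the recovery procedure succeeded


def recovery_metrics(incidents: list[Incident], rto: timedelta) -> dict:
    """Summarize MTTR, recovery success rate, and RTO compliance."""
    if not incidents:
        raise ValueError("at least one incident is required")
    successes = [i for i in incidents if i.recovered]
    # Time from detection to recovery, for successful recoveries only.
    durations = [i.recovered_at - i.detected_at for i in successes]
    mttr = sum(durations, timedelta()) / len(durations) if durations else None
    return {
        "mttr": mttr,
        "success_rate_pct": 100 * len(successes) / len(incidents),
        "within_rto": sum(d <= rto for d in durations),
    }


# Example with made-up incident data and an assumed RTO of one hour.
incidents = [
    Incident(datetime(2025, 1, 1, 10, 0), datetime(2025, 1, 1, 10, 30), True),
    Incident(datetime(2025, 2, 1, 9, 0), datetime(2025, 2, 1, 10, 30), True),
    Incident(datetime(2025, 3, 1, 9, 0), datetime(2025, 3, 1, 9, 0), False),
]
report = recovery_metrics(incidents, rto=timedelta(hours=1))
print(report)
```

Comparing `mttr` against the RTO target and `success_rate_pct` against the target percentage gives a quick check after each recovery test in the testing schedule.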