Create Incident Recovery Plan

Develop detailed incident recovery procedures including rollback plans, failover procedures, data recovery steps, and service restoration workflows.

Last updated: November 6, 2025

Tags: management, Engineering Manager, incident-recovery, disaster-recovery, devops

Prompt Template

# Create Incident Recovery Plan

Act as an Engineering Manager creating an incident recovery plan.

## Recovery Plan Context
- **System**: [System/service name]
- **Criticality**: [Critical/High/Medium/Low]
- **Recovery Time Objective (RTO)**: [Target recovery time]
- **Recovery Point Objective (RPO)**: [Acceptable data loss window]

## Recovery Procedures

### 1. Pre-Incident Preparation

**Backup Strategy**:
- [ ] Database backups: [Frequency, retention]
- [ ] Configuration backups: [Frequency, retention]
- [ ] Code backups: [Version control, release tags]
- [ ] Backup verification: [How to verify backups]
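
One way to fill in the backup verification item is a checksum comparison. The sketch below is only an illustration: it assumes each backup is a single file and that a SHA-256 digest was recorded when the backup was taken. The function names `sha256_of` and `backup_is_intact` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large backups do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """True if the backup file exists and matches the recorded checksum."""
    # Assumption: the expected digest was stored at backup time.
    return path.is_file() and sha256_of(path) == expected_sha256.strip().lower()
```

A matching checksum only shows that the file is unchanged since the digest was recorded. It does not prove the backup can be restored, so a periodic test restore is still needed.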

**Disaster Recovery Infrastructure**:
- [ ] Secondary region/environment: [Location]
- [ ] Failover capabilities: [Active-passive/Active-active]
- [ ] Standby resources: [Resources available]
- [ ] Traffic failover mechanism: [DNS failover/Load balancer]

**Documentation**:
- [ ] System architecture documented
- [ ] Dependencies mapped
- [ ] Recovery procedures documented
- [ ] Contact lists maintained

### 2. Recovery Decision Tree

**Scenario: Service Down**
- [ ] Check health endpoints
- [ ] Verify infrastructure status
- [ ] Review recent deployments
- [ ] Check logs for errors
- [ ] Option A: Rollback deployment
- [ ] Option B: Restart services
- [ ] Option C: Scale up resources
- [ ] Option D: Failover to secondary region
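
The service-down checks and options above can be encoded as a small decision function. This is a sketch under stated assumptions: the priority order (infrastructure first, then recent deployment, then resource saturation, then restart) is one plausible ordering, not one prescribed by this template. `ServiceDownSignals` and `choose_recovery_option` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServiceDownSignals:
    """Findings from the initial checks (health, infrastructure, deploys, logs)."""
    infrastructure_healthy: bool
    recent_deployment: bool
    resources_saturated: bool
    restart_already_tried: bool = False


def choose_recovery_option(signals: ServiceDownSignals) -> str:
    """Map the check results to one of the options A-D (assumed priority order)."""
    if not signals.infrastructure_healthy:
        return "D: Failover to secondary region"
    if signals.recent_deployment:
        return "A: Rollback deployment"
    if signals.resources_saturated:
        return "C: Scale up resources"
    if not signals.restart_already_tried:
        return "B: Restart services"
    return "Escalate: no standard option applies"
```

For example, `choose_recovery_option(ServiceDownSignals(True, True, False))` returns `"A: Rollback deployment"`. Adjust the ordering to match how your team actually triages.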

**Scenario: Data Corruption**
- [ ] Identify corrupted data
- [ ] Assess scope of corruption
- [ ] Option A: Restore from backup (if the data lost since that backup is within the RPO)
- [ ] Option B: Repair corrupted data (if possible)
- [ ] Option C: Rebuild from source (if available)

**Scenario: Security Breach**
- [ ] Isolate affected systems
- [ ] Preserve evidence
- [ ] Notify security team
- [ ] Assess data exposure
- [ ] Restore from clean backup
- [ ] Deploy security patches

### 3. Rollback Procedures

**Deployment Rollback**:
- [ ] Identify deployment version
- [ ] Verify previous stable version
- [ ] Execute rollback: [Step-by-step commands]
- [ ] Verify rollback success
- [ ] Monitor system health
- [ ] Notify stakeholders
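
The "verify previous stable version" step can be sketched as a lookup over release history. The example assumes releases are tracked as an oldest-to-newest list of `(version, is_stable)` pairs. Both that data shape and the function name `previous_stable_version` are hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def previous_stable_version(
    releases: Sequence[Tuple[str, bool]], current: str
) -> Optional[str]:
    """Return the newest stable version released before `current`, if any.

    `releases` is ordered oldest to newest; each item is (version, is_stable).
    """
    versions = [version for version, _ in releases]
    if current not in versions:
        raise ValueError(f"unknown version: {current}")
    index = versions.index(current)
    # Walk backwards from the release just before the current one.
    for version, is_stable in reversed(releases[:index]):
        if is_stable:
            return version
    return None
```

With `releases = [("v1.0", True), ("v1.1", False), ("v1.2", True), ("v1.3", False)]`, rolling back from `"v1.3"` selects `"v1.2"`. A `None` result means no earlier stable release exists, so rollback is not an option.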

**Database Rollback**:
- [ ] Stop database writes
- [ ] Identify restore point
- [ ] Restore from backup: [Commands]
- [ ] Verify data integrity
- [ ] Resume database operations
- [ ] Monitor for issues

**Configuration Rollback**:
- [ ] Identify configuration change
- [ ] Restore previous configuration
- [ ] Restart affected services
- [ ] Verify configuration applied
- [ ] Monitor system behavior

### 4. Failover Procedures

**Active-Passive Failover**:
- [ ] Verify secondary region health
- [ ] Stop primary region traffic
- [ ] Update DNS/load balancer
- [ ] Verify failover success
- [ ] Monitor secondary region
- [ ] Document failover event
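
The active-passive steps can be sketched as an ordered procedure that refuses to move traffic when the secondary region is unhealthy. The four callables are placeholders for whatever your platform provides (assumptions, not real APIs). The returned list doubles as a record for the "document failover event" step.

```python
from typing import Callable, List


def run_active_passive_failover(
    secondary_is_healthy: Callable[[], bool],
    stop_primary_traffic: Callable[[], None],
    point_traffic_at_secondary: Callable[[], None],
    failover_succeeded: Callable[[], bool],
) -> List[str]:
    """Run the failover steps in order and return a log of what happened."""
    log: List[str] = []
    # Never fail over to a region that cannot serve traffic.
    if not secondary_is_healthy():
        log.append("aborted: secondary region unhealthy")
        return log
    log.append("secondary region healthy")
    stop_primary_traffic()
    log.append("primary traffic stopped")
    point_traffic_at_secondary()
    log.append("DNS/load balancer updated")
    if failover_succeeded():
        log.append("failover verified")
    else:
        log.append("failover NOT verified: investigate")
    return log
```

Passing the steps in as functions keeps the ordering logic testable without touching real infrastructure.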

**Active-Active Failover**:
- [ ] Identify failing region
- [ ] Drain traffic from failing region
- [ ] Redirect traffic to healthy region
- [ ] Monitor traffic distribution
- [ ] Verify service continuity

**Database Failover**:
- [ ] Identify primary database failure
- [ ] Promote secondary database
- [ ] Update connection strings
- [ ] Verify database connectivity
- [ ] Monitor replication lag

### 5. Data Recovery Procedures

**Database Recovery**:
- [ ] Identify data loss/corruption
- [ ] Stop data writes
- [ ] Select restore point (within RPO)
- [ ] Restore database: [Commands]
- [ ] Verify data integrity
- [ ] Resume operations
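
Selecting a restore point within the RPO can be sketched as follows: pick the newest backup taken at or before the incident, then check whether the gap fits the RPO. This assumes backup timestamps are available as `datetime` values. The function name `select_restore_point` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple


def select_restore_point(
    backups: Sequence[datetime], incident_time: datetime, rpo: timedelta
) -> Tuple[Optional[datetime], bool]:
    """Return (restore point, whether the implied data loss is within the RPO)."""
    # Only backups taken at or before the incident are free of its damage.
    candidates = [taken_at for taken_at in backups if taken_at <= incident_time]
    if not candidates:
        return None, False
    chosen = max(candidates)
    return chosen, (incident_time - chosen) <= rpo
```

If the second value is `False`, restoring from that backup loses more data than the RPO allows, which is a signal to consider repair or log replay instead.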

**File System Recovery**:
- [ ] Identify lost/corrupted files
- [ ] Restore from backup: [Commands]
- [ ] Verify file integrity
- [ ] Restore permissions
- [ ] Resume file operations

**Application State Recovery**:
- [ ] Identify lost state
- [ ] Restore from checkpoint/backup
- [ ] Replay logs if available
- [ ] Verify state consistency
- [ ] Resume application operations

### 6. Service Restoration Workflow

**Step 1: Assess Impact**
- [ ] What services are affected?
- [ ] How many users are impacted?
- [ ] What functionality is broken?
- [ ] What data is at risk?

**Step 2: Choose Recovery Strategy**
- [ ] Quick fix/workaround available?
- [ ] Rollback possible?
- [ ] Failover needed?
- [ ] Full restoration required?

**Step 3: Execute Recovery**
- [ ] Execute recovery procedures
- [ ] Monitor recovery progress
- [ ] Verify recovery success
- [ ] Document actions taken

**Step 4: Validate Recovery**
- [ ] Test critical functionality
- [ ] Verify data integrity
- [ ] Monitor error rates
- [ ] Confirm user impact resolved

**Step 5: Post-Recovery**
- [ ] Update status page
- [ ] Notify stakeholders
- [ ] Schedule post-incident review
- [ ] Update documentation

### 7. Recovery Testing

**Test Scenarios**:
- [ ] Database failure recovery
- [ ] Deployment rollback
- [ ] Region failover
- [ ] Data corruption recovery
- [ ] Security incident recovery

**Testing Schedule**:
- [ ] Quarterly: [Full disaster recovery test]
- [ ] Monthly: [Component recovery tests]
- [ ] Weekly: [Backup verification]

**Test Documentation**:
- [ ] Test results recorded
- [ ] Recovery time measured
- [ ] Issues documented
- [ ] Improvements identified

### 8. Recovery Tools & Scripts

**Automation Scripts**:
- [ ] Rollback script: [Location]
- [ ] Failover script: [Location]
- [ ] Database restore script: [Location]
- [ ] Health check script: [Location]

**Monitoring Tools**:
- [ ] [Tool] - [Purpose]
- [ ] [Tool] - [Purpose]

**Backup Tools**:
- [ ] [Tool] - [Purpose]
- [ ] [Tool] - [Purpose]

## Success Metrics

**Recovery Metrics**:
- Recovery Time Objective (RTO): [Target]
- Recovery Point Objective (RPO): [Target]
- Recovery Success Rate: [Target %]
- Mean Time to Recovery (MTTR): [Target]
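
These metrics can be computed from recovery records gathered during incidents and tests. The sketch below makes two assumptions: MTTR is averaged over successful recoveries only, and each record carries a duration and a success flag. `RecoveryRecord` and `summarize_recoveries` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, Optional, Sequence, Union


@dataclass
class RecoveryRecord:
    duration: timedelta  # time from incident start to recovery (or abandonment)
    succeeded: bool


def summarize_recoveries(
    records: Sequence[RecoveryRecord], rto: timedelta
) -> Dict[str, Union[Optional[timedelta], float]]:
    """Compute MTTR, success rate, and the share of recoveries within the RTO."""
    if not records:
        raise ValueError("at least one recovery record is required")
    successes = [record for record in records if record.succeeded]
    mttr: Optional[timedelta] = None
    if successes:
        # Assumption: MTTR averages successful recoveries only.
        total = sum((record.duration for record in successes), timedelta())
        mttr = total / len(successes)
    within_rto = sum(1 for record in successes if record.duration <= rto)
    return {
        "mttr": mttr,
        "success_rate_pct": 100.0 * len(successes) / len(records),
        "within_rto_pct": 100.0 * within_rto / len(records),
    }
```

Comparing `mttr` against the RTO target shows whether typical recoveries meet the objective. `within_rto_pct` shows how often they do.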
