Disaster Recovery Plan
Last Updated: YYYY-MM-DD
Purpose: High-level recovery planning and scenario overview
Recovery Principles
- Stay calm - Panic leads to mistakes
- Assess first - Understand what's actually broken
- Check backups - Verify backups exist before proceeding
- Document everything - Take notes as you recover
- One thing at a time - Don't change multiple things simultaneously
Disaster Scenarios
| Scenario | Impact | Recovery Doc |
|---|---|---|
| Infrastructure server failure | All VMs down | proxmox-backup-restore.md |
| Storage server failure | All data unavailable | truenas-backup-restore.md |
| Both servers lost | Complete homelab loss | See Catastrophic Loss (Both Servers) below |
| Accidental VM deletion | Single service down | proxmox-backup-restore.md |
| Service misconfiguration | Single service broken | Restore from app backup |
Pre-Disaster Checklist
Do these NOW, before you need them:
Critical (Must Have)
- [ ] This documentation accessible from multiple locations
- [ ] Backup verification performed in last 30 days
- [ ] Recovery procedures tested at least once
- [ ] Emergency credentials stored securely (password manager)
- [ ] Restore scripts saved offline (USB, cloud storage)
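The 30-day backup item above can be partly automated. The following is a minimal sketch, assuming backups land as files under a single directory; the function names and the directory path are hypothetical. It only checks that a recent file exists. It does not prove the backup can be restored, so it supplements a real restore test rather than replacing one.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 30  # threshold taken from the checklist item above


def newest_backup_age_days(backup_dir, now=None):
    """Return the age in days of the newest file under backup_dir, or None if no files exist."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 86400  # 86400 seconds per day


def backups_are_fresh(backup_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """True only if at least one file exists and the newest is within max_age_days."""
    age = newest_backup_age_days(backup_dir, now)
    return age is not None and age <= max_age_days
```

Example call, with an assumed mount point: `backups_are_fresh("/mnt/backups")`.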
Important (Should Have)
- [ ] VPN account recovery methods set up
- [ ] Cloud provider account recovery methods set up
- [ ] Offsite backup tested
- [ ] OS install media on bootable USB
Catastrophic Loss (Both Servers)
Scenario: Fire, flood, theft - both servers destroyed
This will take days. Accept it. Don't rush.
Phase 1: Get Infrastructure Running (Day 1)
- Obtain replacement hardware
- Install hypervisor -> proxmox-backup-restore.md
- Install storage OS -> truenas-backup-restore.md
- Basic network configuration
Phase 2: Restore Critical Services (Day 1-2)
Priority order:
1. Storage - Need this for everything
2. DNS - Network functionality
3. Remote access - Work capability
4. Monitoring - Visibility
Phase 3: Restore Data (Day 2+)
- Pull data from offsite backups (may take days for large datasets)
- Prioritize: Photos > Documents > Media
- Let it run in background
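To put a number on "may take days", a rough estimate helps set expectations before the download starts. This is a sketch only: the dataset sizes and the 100 Mbit/s link speed are made-up assumptions, and it assumes a sequential restore in the priority order above at a sustained rate.

```python
def transfer_days(size_gb, mbit_per_s):
    """Days needed to download size_gb (decimal GB) at a sustained mbit_per_s."""
    seconds = (size_gb * 8000) / mbit_per_s  # 1 GB = 8000 megabits
    return seconds / 86400


# Hypothetical dataset sizes in GB, listed in priority order.
datasets = [("Photos", 500), ("Documents", 50), ("Media", 4000)]
LINK_MBIT = 100  # assumed sustained download speed


def restore_schedule(datasets, mbit_per_s):
    """Return (name, cumulative days until finished) for a sequential restore."""
    elapsed = 0.0
    schedule = []
    for name, size_gb in datasets:
        elapsed += transfer_days(size_gb, mbit_per_s)
        schedule.append((name, round(elapsed, 2)))
    return schedule


print(restore_schedule(datasets, LINK_MBIT))
# [('Photos', 0.46), ('Documents', 0.51), ('Media', 4.21)]
```

With these assumed numbers, photos and documents are back within the first day, while media accounts for most of the wait.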
Phase 4: Applications (Day 3+)
- Reinstall apps once data is restored
- Restore configurations
- Test functionality
Recovery Service Priorities
When recovering multiple services, restore in this order:
- DNS - Network needs name resolution
- Home Automation - Safety and daily routines
- Monitoring - Need visibility into health
- Remote access - Work remotely
- Everything else - Docker hosts, media, etc.
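If the list of affected services comes from a script or inventory, the order above can be applied mechanically. A minimal sketch, assuming service names match the list exactly; the example service names are hypothetical.

```python
# Priority order copied from the list above; anything else is restored last.
PRIORITY = ["DNS", "Home Automation", "Monitoring", "Remote access"]


def restore_order(services):
    """Sort services by PRIORITY; unlisted services go last in their original order."""
    rank = {name: i for i, name in enumerate(PRIORITY)}
    # sorted() is stable, so services with equal rank keep their input order.
    return sorted(services, key=lambda s: rank.get(s, len(PRIORITY)))


down = ["Media", "Monitoring", "Docker host", "DNS", "Remote access"]
print(restore_order(down))
# ['DNS', 'Monitoring', 'Remote access', 'Media', 'Docker host']
```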
Testing Schedule
Monthly
- [ ] Verify backups running
- [ ] Verify offsite sync completing
- [ ] Check storage space usage
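The storage space check can be scripted for any locally mounted path. This sketch uses Python's standard `shutil.disk_usage`; the 80% threshold and the example path are assumptions, and a storage appliance's own reporting may be the better source for pool-level usage.

```python
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def over_threshold(path, threshold=80.0):
    """True if usage is at or above threshold percent (80 is an assumed default)."""
    return usage_percent(path) >= threshold
```

Example call, with an assumed mount point: `over_threshold("/mnt/backups")`.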
Quarterly
- [ ] Test VM restore from backup
- [ ] Test file restore from offsite
- [ ] Verify documentation is current
Annually
- [ ] Full disaster recovery simulation
- [ ] Update all documentation
Recovery Log Template
Use this when performing actual recovery:
Date: ___________
Scenario: ___________
Cause: ___________
Timeline:
- Event discovered: ___________
- Recovery started: ___________
- Services restored: ___________
- Full recovery: ___________
What Worked:
-
What Didn't Work:
-
Lessons Learned:
-
Documentation Updates Needed:
-
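Once the timeline fields are filled in, the elapsed time to each milestone is worth recording alongside the lessons learned. A small sketch, assuming timestamps are written as `YYYY-MM-DD HH:MM`; the format and the example times are illustrative assumptions, not part of the template.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format for the log


def timeline_durations(discovered, started, services_restored, full_recovery):
    """Return hours elapsed from discovery to each later milestone."""
    t0 = datetime.strptime(discovered, FMT)
    milestones = {
        "Recovery started": started,
        "Services restored": services_restored,
        "Full recovery": full_recovery,
    }
    return {
        label: (datetime.strptime(ts, FMT) - t0).total_seconds() / 3600
        for label, ts in milestones.items()
    }


# Hypothetical incident times.
print(timeline_durations(
    "2024-01-01 08:00", "2024-01-01 09:30", "2024-01-02 08:00", "2024-01-03 20:00"
))
# {'Recovery started': 1.5, 'Services restored': 24.0, 'Full recovery': 60.0}
```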
Recovery Complete Checklist
Recovery is NOT done until:
- [ ] All critical services operational
- [ ] Data integrity verified
- [ ] Backups resuming automatically
- [ ] Monitoring operational
- [ ] Documentation updated with lessons learned
Related Documentation
- proxmox-backup-restore.md - Hypervisor recovery
- truenas-backup-restore.md - Storage recovery
- backup-strategy.md - Overall backup philosophy
Emergency Resources
- Proxmox Community: https://forum.proxmox.com
- TrueNAS Community: https://forums.truenas.com
- r/homelab: https://reddit.com/r/homelab