AIOpenLibrary

High Availability & Disaster Recovery Planner

Design high availability architectures with failover strategies, RTO/RPO calculations, and disaster recovery runbooks.

Updated Mar 11, 2026

Prompt

You are a site reliability architect. Design a high availability and disaster recovery strategy for my system.

System: [SYSTEM_DESCRIPTION]
Current availability: [CURRENT_AVAILABILITY]
Target availability: [TARGET_SLA] (e.g., 99.9%, 99.99%)
Current infrastructure: [INFRASTRUCTURE]
Data criticality: [DATA_CRITICALITY]
Regulatory requirements: [COMPLIANCE]
Budget for HA/DR: [BUDGET]

Design the HA/DR strategy:

**1. Availability Math**
- Target SLA → allowed downtime per year/month/day
  - 99.9% = 8.76 hours/year
  - 99.95% = 4.38 hours/year
  - 99.99% = 52.6 minutes/year
- Current failure modes and the individual availability of each component
- Composite availability calculation (serial vs. parallel components)
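The availability arithmetic above can be checked with a few lines of Python. This is a minimal sketch: it assumes a 365-day year and statistically independent component failures (correlated failures, such as replicas sharing an AZ or a bad deploy reaching every instance, make real composite availability lower), and the component figures in the example path are hypothetical.

```python
# Availability math sketch: downtime budgets plus serial/parallel composition.
# Availabilities are fractions (0.999 == 99.9%). Assumes independent failures.

HOURS_PER_YEAR = 365 * 24  # 8760 hours; ignores leap years


def allowed_downtime_hours(sla: float) -> dict[str, float]:
    """Downtime budget in hours per year, per month (year / 12), and per day."""
    unavailable = 1.0 - sla
    return {
        "year": unavailable * HOURS_PER_YEAR,
        "month": unavailable * HOURS_PER_YEAR / 12,
        "day": unavailable * 24,
    }


def serial(*availabilities: float) -> float:
    """Components in series: the path is up only if every component is up."""
    up = 1.0
    for a in availabilities:
        up *= a
    return up


def parallel(*availabilities: float) -> float:
    """Redundant components: the tier is down only if all replicas are down at once."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


if __name__ == "__main__":
    for sla in (0.999, 0.9995, 0.9999):
        budget = allowed_downtime_hours(sla)
        print(f"{sla:.2%}: {budget['year']:.2f} h/year, {budget['month'] * 60:.1f} min/month")

    # Hypothetical path: load balancer -> two redundant app instances -> one database
    path = serial(0.9999, parallel(0.999, 0.999), 0.9995)
    print(f"composite availability: {path:.4%}")  # about 99.94%, below a 99.95% target
```

A single non-redundant database at 99.95% caps the whole path below 99.95%: in a serial chain the weakest component dominates, which is why redundancy has to reach every layer.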

**2. RTO/RPO Definition**
- **RPO** (Recovery Point Objective): Maximum acceptable data loss
- **RTO** (Recovery Time Objective): Maximum acceptable downtime
- Set RPO and RTO per service tier (not every service needs the same targets)
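Per-tier objectives are easiest to audit when they sit next to what the current setup can actually deliver. A minimal sketch, where the tier names, targets, and the nightly-backup capability are hypothetical placeholders: with periodic backups the worst-case data loss is roughly the backup interval, and with asynchronous replication it is roughly the maximum replication lag.

```python
# Sketch: per-tier RPO/RTO targets checked against measured recovery capability.
# Tier names and all durations are hypothetical placeholders.
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rpo: timedelta  # maximum acceptable data loss
    rto: timedelta  # maximum acceptable downtime


@dataclass(frozen=True)
class Capability:
    worst_case_data_loss: timedelta  # e.g. backup interval or max replication lag
    recovery_time: timedelta  # ideally measured in a drill, not estimated on paper


def gaps(tier: Tier, cap: Capability) -> list[str]:
    """List the objectives the current setup cannot meet for this tier."""
    problems = []
    if cap.worst_case_data_loss > tier.rpo:
        problems.append(f"{tier.name}: data loss {cap.worst_case_data_loss} exceeds RPO {tier.rpo}")
    if cap.recovery_time > tier.rto:
        problems.append(f"{tier.name}: recovery time {cap.recovery_time} exceeds RTO {tier.rto}")
    return problems


if __name__ == "__main__":
    checkout = Tier("checkout", rpo=timedelta(minutes=1), rto=timedelta(minutes=15))
    reporting = Tier("reporting", rpo=timedelta(hours=24), rto=timedelta(hours=8))
    nightly_backup = Capability(worst_case_data_loss=timedelta(hours=24), recovery_time=timedelta(hours=4))
    for tier in (checkout, reporting):
        print(tier.name, gaps(tier, nightly_backup) or "OK")
```

With these placeholder numbers, nightly backups satisfy the reporting tier but miss both objectives for checkout. That is the point of tiering: only the services that need tight RPO/RTO pay for replication and automated failover.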

**3. High Availability Design**
- Redundancy at every layer (load balancers, application, database, storage)
- Multi-AZ deployment strategy
- Database HA: Primary-replica, multi-master, or distributed
- Session management (stateless services, externalized state)
- Health checks and auto-healing
- Graceful degradation plan (what to shed under extreme load)
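Health checks usually use consecutive-probe thresholds so that one slow response does not pull an instance out of rotation (flapping). The sketch below is a hypothetical, simplified tracker in the style of load-balancer health checks: an instance turns unhealthy after N failed probes in a row and healthy again after M successes in a row. The thresholds are illustrative defaults, and the "replace the instance" step stands in for whatever auto-healing your orchestrator provides.

```python
# Sketch of threshold-based health checking with hysteresis (hypothetical defaults).
from dataclasses import dataclass


@dataclass
class HealthTracker:
    unhealthy_threshold: int = 3  # failed probes in a row before marking unhealthy
    healthy_threshold: int = 2  # successful probes in a row before marking healthy again
    healthy: bool = True
    streak: int = 0  # consecutive probes that disagree with the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the (possibly updated) health state."""
        if probe_ok == self.healthy:
            self.streak = 0  # probe agrees with the current state: reset the counter
            return self.healthy
        self.streak += 1
        threshold = self.healthy_threshold if probe_ok else self.unhealthy_threshold
        if self.streak >= threshold:
            self.healthy = probe_ok
            self.streak = 0
        return self.healthy


if __name__ == "__main__":
    tracker = HealthTracker()
    for ok in [True, False, False, True, False, False, False]:
        if not tracker.record(ok):
            # Auto-healing hook: an orchestrator would drain and replace the instance here
            print("unhealthy: drain traffic and replace the instance")
            break
```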

**4. Disaster Recovery Plan**
- DR strategy: Active-Active vs. Active-Passive vs. Pilot Light vs. Backup & Restore
- Multi-region architecture (if required)
- Data replication strategy (sync vs. async, RPO implications)
- Failover automation (DNS, load balancer, database)
- Failback procedure

**5. Failure Scenarios & Runbooks**
For each scenario (single-instance failure, AZ failure, region failure, data corruption):
- Detection: How do we know it happened?
- Response: Step-by-step recovery procedure
- Communication: Who to notify and status page updates
- Estimated recovery time
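One way to keep runbooks honest is to store them as structured data and lint them in CI, so a scenario with no detection signal or no one to notify is caught before an incident. A minimal sketch: the field names mirror the four items listed above, while the `Runbook` class, the `lint` helper, and the sample AZ runbook are hypothetical.

```python
# Sketch: runbooks as structured data, linted for missing scenarios and empty sections.
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Optional

REQUIRED_SCENARIOS = {"single-instance failure", "AZ failure", "region failure", "data corruption"}


@dataclass
class Runbook:
    scenario: str
    detection: str  # the alert or signal that tells us it happened
    response: list[str] = field(default_factory=list)  # ordered recovery steps
    notify: list[str] = field(default_factory=list)  # people, channels, status page
    estimated_recovery: Optional[timedelta] = None


def lint(runbooks: list[Runbook]) -> list[str]:
    """Return a list of problems; an empty list means every runbook is complete."""
    missing = REQUIRED_SCENARIOS - {r.scenario for r in runbooks}
    problems = [f"missing runbook: {s}" for s in sorted(missing)]
    for r in runbooks:
        if not r.detection:
            problems.append(f"{r.scenario}: no detection signal")
        if not r.response:
            problems.append(f"{r.scenario}: no response steps")
        if not r.notify:
            problems.append(f"{r.scenario}: nobody to notify")
        if r.estimated_recovery is None:
            problems.append(f"{r.scenario}: no estimated recovery time")
    return problems


if __name__ == "__main__":
    az = Runbook(
        scenario="AZ failure",
        detection="error-rate alert scoped to a single AZ",
        response=["confirm via provider status page", "shift traffic to healthy AZs", "verify remaining capacity"],
        notify=["on-call SRE", "incident channel", "status page"],
        estimated_recovery=timedelta(minutes=10),
    )
    for problem in lint([az]):
        print(problem)
```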

**6. Chaos Engineering Plan**
- Game day exercises to practice failures
- Automated chaos testing (Chaos Monkey, Gremlin, Litmus)
- Failure injection starting points (start small)
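Starting small can mean injecting faults into a single code path at a low rate, with a kill switch that ends the experiment at once. The decorator below is a hypothetical application-level sketch of that idea (the names and defaults are illustrative); it is not how Chaos Monkey, Gremlin, or Litmus work, since those tools inject faults at the infrastructure level.

```python
# Minimal application-level fault injection for "start small" chaos experiments.
import functools
import random
import time
from typing import Callable, Optional


class InjectedFault(Exception):
    """Raised on purpose, so experiment failures can be told apart from real ones."""


def inject_faults(error_rate: float = 0.01, latency_rate: float = 0.0, latency_s: float = 0.5,
                  enabled: Callable[[], bool] = lambda: True,
                  rng: Optional[random.Random] = None):
    """Decorator: fail `error_rate` of calls and delay another `latency_rate` of them."""
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if enabled():  # kill switch: return False to stop the experiment at once
                roll = rng.random()
                if roll < error_rate:
                    raise InjectedFault(f"injected failure in {func.__name__}")
                if roll < error_rate + latency_rate:
                    time.sleep(latency_s)  # injected latency, then the real call proceeds
            return func(*args, **kwargs)
        return wrapper
    return decorator


if __name__ == "__main__":
    @inject_faults(error_rate=0.05, rng=random.Random(7))  # seeded so a run is repeatable
    def fetch_profile(user_id: int) -> dict:
        return {"id": user_id}

    failures = 0
    for i in range(1000):
        try:
            fetch_profile(i)
        except InjectedFault:
            failures += 1
    print(f"{failures} injected failures out of 1000 calls")
```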

**7. Monitoring & Alerting**
- SLI/SLO definitions for each critical path
- Error budget tracking
- Escalation procedures
- On-call rotation design
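Error-budget tracking reduces to two numbers: how much of the budget is left and how fast it is burning. A minimal sketch for a request-based SLI (good events / total events); the counts are hypothetical inputs from your metrics system. The paging rule follows the multiwindow idea described in Google's SRE Workbook, where a burn rate of 14.4 corresponds to spending 2% of a 30-day budget in one hour (0.02 × 720 h); treat the threshold as a starting point, not a rule.

```python
# Sketch of error-budget tracking and burn-rate paging for a request-based SLI.


def error_budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the window's error budget left (negative means overspent)."""
    if total == 0:
        return 1.0
    allowed_bad = (1.0 - slo) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def burn_rate(slo: float, good: int, total: int) -> float:
    """Budget consumption speed: 1.0 spends exactly the whole budget over the SLO window."""
    if total == 0:
        return 0.0
    error_rate = (total - good) / total
    return error_rate / (1.0 - slo)


def should_page(burn_1h: float, burn_5m: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast: the long window proves it matters,
    the short window proves it is still happening."""
    return burn_1h > threshold and burn_5m > threshold


if __name__ == "__main__":
    # Hypothetical window: 99.9% SLO, 1,000,000 requests, 500 failed
    print(f"budget left: {error_budget_remaining(0.999, 999_500, 1_000_000):.0%}")
    print(f"burn rate: {burn_rate(0.999, 999_500, 1_000_000):.2f}")
```

In the example, 500 failures out of an allowance of 1,000 leaves half the budget, and a burn rate of 0.5 means the window will end with budget to spare.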

**8. Cost Analysis**
| HA Level | Architecture | Monthly Cost | Downtime/Year |
|---|---|---|---|

Recommend the right trade-off for your budget and requirements.
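The trade-off in the cost table can be framed as infrastructure spend plus the expected cost of downtime. A minimal sketch: every option name, price, availability figure, and cost-per-hour value below is a hypothetical placeholder to be replaced with your own estimates.

```python
# Sketch: pick the HA option with the lowest total annual cost that still meets the SLA.
from dataclasses import dataclass
from typing import Optional

HOURS_PER_YEAR = 365 * 24


@dataclass(frozen=True)
class Option:
    name: str
    monthly_cost: float  # infrastructure cost for this HA level
    availability: float  # expected availability as a fraction

    def downtime_hours_per_year(self) -> float:
        return (1.0 - self.availability) * HOURS_PER_YEAR

    def total_annual_cost(self, downtime_cost_per_hour: float) -> float:
        """Infrastructure spend plus expected revenue/productivity lost to downtime."""
        return self.monthly_cost * 12 + self.downtime_hours_per_year() * downtime_cost_per_hour


def cheapest_meeting(options: list[Option], target_sla: float,
                     downtime_cost_per_hour: float) -> Optional[Option]:
    """Among options that meet the target SLA, return the lowest total annual cost."""
    eligible = [o for o in options if o.availability >= target_sla]
    return min(eligible, key=lambda o: o.total_annual_cost(downtime_cost_per_hour), default=None)


if __name__ == "__main__":
    options = [
        Option("single-AZ", monthly_cost=2_000, availability=0.995),
        Option("multi-AZ", monthly_cost=5_000, availability=0.9995),
        Option("multi-region active-passive", monthly_cost=12_000, availability=0.9999),
    ]
    for cost_per_hour in (10_000, 100_000):
        best = cheapest_meeting(options, target_sla=0.999, downtime_cost_per_hour=cost_per_hour)
        print(cost_per_hour, best.name if best else "no option meets the SLA")
```

With these placeholder figures, multi-AZ wins when an hour of downtime costs 10,000 and multi-region wins at 100,000: the right HA level depends on what downtime actually costs, not only on the SLA target.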

Pro Tips

  • HA/DR planning comes down to math (availability calculations), architecture (redundancy), and operations (runbooks). This prompt covers all three systematically.

More Software Architecture Prompts

🏗️ Software Architecture

Architecture Decision Record Writer

Write well-structured Architecture Decision Records (ADRs) that document the context, options considered, and rationale behind key technical decisions.

You are a principal software architect who believes that documented decisions ar...

Claude
Intermediate

🏗️ Software Architecture

System Design Document Generator

Generate comprehensive system design documents (RFCs/design docs) with component architecture, data flow, API contracts, and operational considerations.

You are a staff engineer writing a design document for a new system. Create a co...

Claude
Advanced

🏗️ Software Architecture

Event-Driven Architecture Planner

Design event-driven systems with event sourcing, CQRS, message brokers, and eventual consistency patterns.

You are a distributed systems architect specializing in event-driven architectur...

Claude
Advanced

You Might Also Like

✍️ Writing & Content ✦ Premium

Blog Post Architect

Create SEO-optimized, engaging blog posts with structured outlines, compelling hooks, and strategic keyword placement.

You are an expert content strategist and SEO specialist. Create a comprehensive ...

Claude Opus 4
Intermediate

📚 Education ✦ Premium

Socratic Method Tutor

Learn any concept through guided questioning that builds deep understanding instead of memorization.

You are a Socratic tutor. Your role is to help me deeply understand a concept th...

Claude Opus 4
Beginner

📦 Product Management ✦ Premium

Product Requirements Document (PRD)

Generate comprehensive PRDs with user stories, acceptance criteria, technical requirements, and success metrics.

You are a senior product manager at a top tech company. Write a comprehensive PR...

Claude Opus 4
Intermediate