Disaster Recovery Plan

1. Introduction

This Disaster Recovery Plan (DRP) outlines Eververse's procedures for restoring critical systems and services in the event of a disaster. The plan aims to minimize downtime, restore essential operations, and protect data during significant disruptions caused by natural disasters, cyber-attacks, or hardware failures.

2. Objectives

Ensure that critical systems are restored as quickly as possible following a disaster.
Minimize the loss of data by maintaining regular backups and implementing recovery protocols.
Provide clear instructions for recovering infrastructure, databases, and monitoring services.
Ensure communication is maintained with stakeholders during recovery efforts.

3. Scope

This DRP applies to all critical systems that support Eververse’s business operations, including but not limited to:

Hosting and application delivery
Databases and data storage
Logging, uptime management, and incident reporting

4. Disaster Definition

A disaster is defined as any event that results in prolonged disruption or unavailability of critical services. Examples include:

Natural disasters (earthquakes, floods, etc.)
Hardware or software failures
Cybersecurity attacks (DDoS, ransomware, etc.)
Extended loss of critical service providers

5. Recovery Priorities

In the event of a disaster, systems and services must be restored in the following priority order, highest priority first:

Eververse Platform: To ensure customers can access the platform.
Database: To restore access to all customer and system data.
Monitoring and Alerts: To provide visibility into ongoing recovery efforts and system health.
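
The priority order above can be encoded so that recovery tooling or runbooks sort affected systems consistently. The following is a minimal sketch, not an existing Eververse tool: the keys "platform", "database", and "monitoring" and the function name recovery_order are hypothetical labels chosen for illustration.

```python
# Priority order taken from Section 5; a lower number is restored first.
# The short system names used as keys are illustrative assumptions.
RECOVERY_PRIORITY = {
    "platform": 1,
    "database": 2,
    "monitoring": 3,
}


def recovery_order(affected):
    """Return the affected systems sorted by recovery priority."""
    unknown = [name for name in affected if name not in RECOVERY_PRIORITY]
    if unknown:
        # Fail loudly rather than guess where an unlisted system belongs.
        raise ValueError(f"no recovery priority defined for: {unknown}")
    return sorted(affected, key=RECOVERY_PRIORITY.__getitem__)
```

For example, if only monitoring and the platform are affected, the platform is still restored first.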

6. Recovery Procedures

6.1 Infrastructure Recovery

Fail Over to Backup Versions: Use our hosting provider's built-in failover capabilities to switch to a backup version of the platform, hosted in an alternate region if needed.
Redeploy the Application: If a significant portion of the infrastructure is compromised, redeploy the Eververse platform using our hosting provider's automated deployment process, leveraging Git repositories for source code and configurations.
Global DNS Failover: Ensure DNS failover is configured to redirect traffic to healthy instances in the event of a regional outage.
Collaborate with Hosting Provider: Contact our hosting provider's technical support if required for additional assistance in the restoration process.
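
The failover decision described above amounts to choosing the first healthy region from an ordered list. The sketch below illustrates that logic only; it does not call any hosting or DNS provider API. The region names and the shape of the health map are assumptions, and in practice the provider's own failover mechanism would make this choice.

```python
def pick_failover_region(health, primary, preferred_order):
    """Return the region that should receive traffic, or None if all are down.

    health: dict mapping region name -> True if the region passes health checks.
    primary: the region that normally serves traffic.
    preferred_order: regions to try, in order, when the primary is unhealthy.
    """
    # Stay on the primary region whenever it is healthy.
    if health.get(primary, False):
        return primary
    # Otherwise take the first healthy alternate; unknown regions count as down.
    for region in preferred_order:
        if region != primary and health.get(region, False):
            return region
    return None
```

A result of None means no healthy region is available, which is the point at which contacting the hosting provider's support becomes necessary.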

6.2 Database Recovery

Restore Database from Backups: Our database provider offers automated, point-in-time backups of the database. Restore the most recent backup to recover critical data.
Switch to a Replica: If the primary database region is impacted, switch to a healthy database replica in an alternate region (our provider offers multi-region support).
Data Integrity Checks: After the database restoration, perform integrity checks to ensure data consistency and accuracy.
Verify Application Functionality: Once the database is restored, verify that all database connections within the Eververse platform are functioning correctly.
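
One simple form of the integrity check above is to compare per-table row counts in the restored database against a reference taken before the incident. The sketch below assumes such reference counts exist and that tables only grow between the reference and the restore point; both are assumptions, and the table names in the example are hypothetical. It complements, rather than replaces, provider-level consistency checks.

```python
def find_integrity_issues(expected_counts, restored_counts):
    """Compare restored row counts against reference counts.

    Both arguments map table name -> row count. A table is flagged if it is
    missing after the restore or holds fewer rows than the reference.
    Returns a list of human-readable issue strings (empty if none found).
    """
    issues = []
    for table, expected in sorted(expected_counts.items()):
        if table not in restored_counts:
            issues.append(f"{table}: missing after restore")
        elif restored_counts[table] < expected:
            issues.append(
                f"{table}: {restored_counts[table]} rows, expected at least {expected}"
            )
    return issues
```

An empty list indicates that no table is missing or short of rows; it does not prove that individual records are correct, so application-level verification is still required.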

6.3 Monitoring Recovery

Restore Access to Logs: Ensure that our logging and uptime monitoring systems are functional and receiving data. This provides visibility into the recovery process and the state of the systems.
Incident Reporting: Use our uptime monitoring system to analyze and report on any ongoing incidents. Ensure that all incident reports are kept up to date for later analysis.
Set Up Alerts: Re-enable any uptime monitoring alerts that were disabled, to ensure real-time notifications during the recovery process.
Coordinate with Monitoring Provider: If restoration of monitoring services is delayed, contact our uptime monitoring system provider to expedite resolution.

7. Backup and Restoration

7.1 Data Backup

Frequency: Automated backups of databases are conducted daily, with the ability to restore from specific points in time.
Storage Location: Backups are securely stored within our provider's infrastructure, using multi-region redundancy to ensure availability.
Testing: Periodic recovery tests must be conducted to ensure that backups can be restored in case of a disaster.
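
Because backups run daily, the age of the most recent backup is a useful signal that the schedule is still working. The following is a minimal sketch of such a freshness check. The one-hour grace margin and the function name are assumptions, and the timestamp of the latest backup would need to come from the database provider, which is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Daily schedule from Section 7.1 plus a one-hour grace margin (an assumption).
MAX_BACKUP_AGE = timedelta(hours=24) + timedelta(hours=1)


def backup_is_fresh(last_backup_at, now=None, max_age=MAX_BACKUP_AGE):
    """Return True if the latest backup is no older than max_age.

    last_backup_at and now must be timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_backup_at <= max_age
```

A check like this could run on a schedule and raise an alert when it returns False, so that a silently failing backup job is noticed before a restore is needed.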

7.2 Restoration Testing

Frequency: Disaster recovery tests must be conducted at a frequency determined by the Incident Response Team (IRT) to ensure that all procedures are effective and that backups are functioning as expected.
Scope: Each test must cover the restoration of critical systems.

8. Communication Plan

8.1 Internal Communication

Crisis Management Team: The Incident Response Team (IRT) will immediately begin coordinating recovery efforts internally via Slack.
Daily Updates: The IRT will provide daily updates on the status of recovery efforts to senior management.

8.2 External Communication

Customer Notifications: Critical incidents affecting customer data or service availability will be communicated to customers via email and the Eververse status page, hosted on our uptime monitoring system.
Service Status Page: Regular updates will be posted to the Eververse status page to keep customers informed of recovery progress.

9. Post-Recovery Procedures

Post-Incident Review: After the successful recovery from a disaster, the Incident Response Team will conduct a thorough post-incident review to identify the root cause of the issue and areas for improvement.
Update DRP: Any lessons learned or improvements identified during the post-incident review must be incorporated into this Disaster Recovery Plan.
Audit and Report: Document the recovery process and report findings to senior management, including whether each critical system was restored within its recovery time objective (RTO).
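
The RTO audit above can be expressed as a comparison between how long each system took to restore and its target. The sketch below is illustrative only: this plan does not state RTO values, so any targets passed in are hypothetical, as are the system names in the example.

```python
from datetime import timedelta


def rto_report(recovery_times, rto_targets):
    """Return a dict mapping each system to True if it met its RTO.

    recovery_times: system name -> timedelta from outage start to restoration.
    rto_targets: system name -> timedelta target for that system.
    Raises KeyError if a recovered system has no target defined.
    """
    return {
        system: elapsed <= rto_targets[system]
        for system, elapsed in recovery_times.items()
    }
```

For instance, with a hypothetical one-hour target for the platform and a four-hour target for the database, a 30-minute platform recovery passes and a five-hour database recovery does not.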

10. Roles and Responsibilities

The roles below may be held by the same people or by different people, depending on the severity of the incident and the capacity of the team.

Chief Information Security Officer (CISO): Oversees the execution of the DRP, coordinates all recovery efforts, and ensures alignment with business priorities.
Incident Response Team (IRT): Responsible for implementing the recovery procedures and restoring critical systems.
Engineering Team: Handles the technical aspects of recovery, including system redeployment, database restoration, and monitoring system status.
Customer Support Team: Communicates with customers, providing regular updates during the disaster recovery process.

11. Contact Information

For any questions or clarifications regarding this Disaster Recovery Plan, please contact us.