The past month has been a mind-blowing series of natural disasters. First, Hurricane Harvey completely flooded Houston, causing billions of dollars in damage and claiming more than 70 lives. Then, a massive 8.2 magnitude earthquake struck the southern Mexican state of Chiapas. At the same time, Hurricane Irma pummeled the Caribbean, Florida and the southeastern U.S., including Georgia. More than half of all Florida residents, approximately 16 million people, were left without power.
Next, Hurricane Katia hit eastern Mexico. Then Hurricane Jose hovered just off the East Coast, while Hurricane Maria utterly destroyed the entire island of Puerto Rico before moving on to its next victim. And if that weren’t enough, a second earthquake hit Mexico City, killing at least 245 people so far. All of this within 40 days.
In addition to natural disasters, 2017 was a dangerous year for cybersecurity as well. Last April, the hacking group known as the “Shadow Brokers” released a trove of extremely powerful alleged NSA tools, including a Windows exploit known as EternalBlue, which hackers used to infect targets in two high-profile ransomware attacks:
- WannaCry, which infected hundreds of thousands of targets, including public utilities and large corporations. It is the largest ransomware attack on record so far, with the majority of devices infected within a single day.
- Petya/NotPetya/Nyetya/Goldeneye, which infected networks in multiple countries, including those of US pharmaceutical company Merck, Danish shipping company Maersk, and Russian oil giant Rosneft.
Earlier, on March 7, WikiLeaks published a data trove of 8,761 documents allegedly stolen from the CIA, with extensive documentation of alleged spying operations and hacking tools. Revelations included iOS and Android vulnerabilities, bugs in Windows, and the ability to turn some smart TVs into listening devices.
Still not taking disaster recovery seriously? That would be a huge mistake. The time to act is NOW.
Having even a bare-bones disaster recovery plan is absolutely critical to restoring company operations. While reviewing the contents of an exhaustive DR plan is outside the scope of this post (contact us for help on this), you’ll want to ensure the following items are included in your disaster recovery plan (DRP), at a minimum.
Emergency Contact List
In the case of a disaster, certain people have to be notified (e.g. executives, board members, stakeholders). An emergency contact list should include the names, phone numbers and email addresses of everyone who needs to be contacted, including any external resources needed in the recovery process (e.g. data center/hosting contacts, Managed Service Provider, etc.).
DR Plan Scope
To define the scope of your DRP, you will first need to conduct a thorough business impact analysis (or get help in doing so). You need to know how long particular systems can acceptably be offline and how long their data can be unavailable to your applications. Since both of these affect your bottom line, they should fit within the scope of your DRP.
Here are some common scope areas:
- Network infrastructure
- Data storage
- End-user computers
- Software systems
- IT documentation
- Database systems
IT networks that are not mission-critical, such as labs and staging environments, should be defined as lower priority within your DRP. But unless something is never coming back, it needs to be a part of your DRP.
Disaster Recovery Team
Define the roles and responsibilities of the various teams you’ll need in the recovery process. Members of each team should be motivated, well-informed and dedicated to handling any situation at a moment’s notice.
It’s better to distribute responsibilities between various teams within the organization who can work in unison if disaster strikes. The teams can be formed in the following order:
Disaster Recovery Team Lead
This is the person or team that will head the operation. It is the most important position in the recovery effort, because the team lead will guide the other teams to take action. People involved in the recovery process will report to the team lead.
This person/team should also work separately from the other teams in order to make unbiased decisions during the recovery process. The disaster recovery lead will supervise the process. Here are the teams that will perform different roles during recovery:
Network Team
This team will determine the damage done to the network and will be responsible for reestablishing network functionality. The exact function of each member of this team can be listed according to the needs of your business.
For networks provided by a third-party, this team will coordinate with them to ensure quick recovery. In case of a disaster in the primary zone, the network team is also responsible for migration of the system to a secondary zone.
Once connectivity is restored to critical systems, this team will first provide connectivity to the DR team, executive staff, IT employees and then to the remaining employees. Once the company is back on track, this team will prepare a report that includes the cost of the restoration process and a summary of its role during the process and send it to the Team Lead.
Server Team
IT needs to keep its operations and applications running during a disaster, so the Server Team will provide the physical server infrastructure to do so. Its role is to ensure server functionality and provide assistance to IT and the Applications Team.
This team will figure out which servers are not working and what needs to be done to recover them with minimum business impact. If the business is functioning from a secondary facility, the server team will install any tools or hardware that may be required. After the company is back on track, the team will prepare a report that contains a summary of its activities during disaster recovery and share it with the Team Lead.
Applications Team
Every company uses different applications for its day-to-day activities. When disaster strikes, the applications team will be responsible for ensuring that all applications are functioning seamlessly again.
If multiple applications have been impacted by a disaster, the team will assess risk levels and prioritize the recovery process accordingly. The applications team will need to update the secondary servers with the latest application versions, patches and data copies. Application settings may need to be changed to reflect the recovery environment. Custom applications may have to be refactored. Once the business is up and running, this team will prepare a report summarizing its role in disaster recovery, and share it with the Team Lead.
Data and Backups
Here's a fun fact: did you know that Pixar's Toy Story 2 would never have been released if it weren’t for backups? The database containing master copies of characters, sets, animation, etc. was ‘accidentally’ deleted by an employee. Thanks to the Technical Director’s backups, the movie was saved.
While making a disaster recovery plan, you have to note where your company’s critical data resides. In the event of a disaster, you will use this information to locate the data and speed up the recovery process. This can be done via a spreadsheet that lists each piece of data with its type, backup frequency and backup location. Company leadership will decide which data should be prioritized.
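As a rough sketch of what such an inventory could look like, the Python snippet below writes one row per data set to CSV, which any spreadsheet tool can open. The `DataAsset` class, its field names, the priority column and the sample entries are all illustrative assumptions, not a prescribed format.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataAsset:
    """One row of the data inventory (all field names are hypothetical)."""
    name: str
    data_type: str
    backup_frequency: str
    backup_location: str
    priority: int  # 1 = restore first; decided by company leadership


def inventory_to_csv(assets):
    """Return the inventory as CSV text, highest-priority rows first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DataAsset)])
    writer.writeheader()
    for asset in sorted(assets, key=lambda a: a.priority):
        writer.writerow(asdict(asset))
    return buffer.getvalue()


# Made-up sample entries, for illustration only.
inventory = [
    DataAsset("File shares", "documents", "weekly", "offsite tape", 2),
    DataAsset("Customer database", "SQL database", "nightly", "secondary data center", 1),
]

print(inventory_to_csv(inventory))
```

Keeping the inventory in a plain format like CSV means it can be printed or stored offline, which matters if the systems that normally host it are the ones that went down.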
Restoring IT Functionality
This section will be frequently referred to during the disaster, as it includes all the information needed to recover systems. Detailed diagrams of the system architecture and the manuals required to run various parts of the system will be a part of the plan.
Current IT System Architecture
Include a detailed diagram of the current system, its various components and system locations. IT systems should also be mentioned in order of priority.
Make a list of connectivity systems and their circuit types, bandwidth and onsite locations.
Make a list of all the network equipment in your organization and then prioritize each one according to usage. Include miscellaneous network equipment as well so that you don't leave anything out when recovering the system. Rank each component on the basis of its importance in running the business and the time it will take to bring it back to functionality. Make a list of all the servers along with their RAM, CPU, OS version and purpose. Follow this procedure for every system in your organization.
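One possible way to turn that ranking into a recovery order is sketched below in Python. The `Component` class, the 1-to-3 importance scale, the hour estimates and the sample equipment are assumptions made up for this example. Breaking ties by shortest recovery time is one design choice among several.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """An inventory entry; the names and scales here are hypothetical."""
    name: str
    importance: int        # 1 = most critical to running the business
    recovery_hours: float  # estimated time to bring it back


def recovery_order(components):
    """Sort by importance first, then by shortest estimated recovery time."""
    return sorted(components, key=lambda c: (c.importance, c.recovery_hours))


# Made-up sample equipment, for illustration only.
components = [
    Component("Lab switch", 3, 1.0),
    Component("Core router", 1, 4.0),
    Component("Mail server", 2, 6.0),
    Component("Firewall", 1, 2.0),
]

for rank, component in enumerate(recovery_order(components), start=1):
    print(rank, component.name)
```

With these sample values the firewall comes first, then the core router, the mail server and finally the lab switch.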
Plan Testing & Maintenance
Monitor your plan regularly and keep it updated. Back up data regularly. The most important thing to consider here is that your company’s disaster recovery needs will change over time. In order to keep up with those needs, your DRP must be updated as frequently as possible.
The maintenance of this plan is a crucial step in your disaster recovery process. Keep reviewing the plan to find any loopholes and take into account organizational shifts, mergers, changes and goals. It is also important to take into account any new legal requirements and update the plan accordingly. Any updates to the plan must also be recorded, and the person making those changes must be held accountable for them.
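A simple change log is one way to record plan updates along with who made them. The Python sketch below is only an illustration: `PlanChange`, `PlanChangeLog`, their fields and the sample entry are hypothetical names, and a shared document or ticketing system could serve the same purpose.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlanChange:
    """A single recorded update to the DRP (hypothetical structure)."""
    changed_on: date
    changed_by: str
    section: str
    summary: str


class PlanChangeLog:
    """Append-only list of plan changes, each tied to a named person."""

    def __init__(self):
        self._entries = []

    def record(self, changed_by, section, summary, changed_on=None):
        # Refuse anonymous changes so every update has an accountable owner.
        if not changed_by.strip():
            raise ValueError("every change needs a named owner")
        entry = PlanChange(changed_on or date.today(), changed_by, section, summary)
        self._entries.append(entry)
        return entry

    def changes_by(self, person):
        """Return all changes recorded for the given person."""
        return [e for e in self._entries if e.changed_by == person]


# Made-up sample entry, for illustration only.
log = PlanChangeLog()
log.record("A. Admin", "Emergency Contact List", "Updated hosting provider contact")
print(len(log.changes_by("A. Admin")))
```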
A good way to review the plan is by carrying out a disaster recovery rehearsal where every team plays its role. Since this involves a large number of people, you can highlight any coordination problems during the rehearsal and revise the roles if required.
In an isolated-environment test, all the applications and servers are brought up in an isolated environment to ensure that all systems and servers are working seamlessly. The administrators will check whether all applications are performing as expected.
Live Fail-over Testing
In this scenario, you activate the entire DR plan live (scary, I know, but necessary). This will disrupt normal operation, and the system will go into disaster recovery mode. That’s why you must ensure that the steps mentioned above have been completed before this step. Because of all the disruption involved, proceed with this step cautiously.
The Disaster Recovery Team Lead will note any gaps that arise during this exercise and make changes accordingly. Additional resources required during this exercise will also be added to the budget.
Once your disaster recovery plan has been implemented and completed, the team lead will also collect signed forms from all teams stating clearly that systems have been recovered to normal functionality.
A well-designed disaster recovery plan ensures that your business is functioning smoothly after catastrophe hits (and it will, someday!).
Formulating and orchestrating an exhaustive DR plan is not for the faint of heart. Many organizations simply don't have the resources or expertise. UTG can help with the design and testing of your DR plan and more. Contact us to set up an introductory call. We're here to help!