Man-made and natural disasters are constantly occurring around us. Some we can control fairly easily, while others literally shake the foundations of our earth. As economies around the world become more tightly knit together through the supply chains being forged between them, these events challenge the stability of businesses and, ultimately, individual lives. Without a good disaster preparedness and recovery plan, critical businesses can become the weak link in the chain, with far-reaching effects.
One of the departments I am responsible for at IQMS handles system administration support for customers using EnterpriseIQ. We receive a variety of queries and requests, ranging from how to install a specific component to "HELP!!! Our system is down." System down events are rare, which makes them difficult situations in themselves: we have to determine what went wrong, why it happened and how to correct it. The situation doesn't get any easier when we ask about backups and the customer responds,
"Backup?? What backup? I'm not sure if we have a backup."
Fortunately for the majority of our customers, our team has been able either to recover the database or to rely on the automated backup process IQMS puts in place when a customer's system is installed.
Another situation we have encountered is the customer who confidently states that they have a backup, but when we go to access it, we find it is either invalid because of file corruption or years old. In one case, a customer's backups were being compressed and moved off the server daily by a typical batch file process. Everything at the destination looked fine: files were tagged and dated as expected. What they did not realize was that the actual export of the database was never happening. The file being captured and moved had been the same one for almost two years (one would think an unchanging file size might have hinted that something was wrong). Fortunately, in this case the customer was not down; they only needed the backup for some troubleshooting.
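That story comes down to one missing step: nobody ever verified that the file being shipped each night was actually new. Below is a minimal sketch in Python of the kind of check that would have caught it, assuming the backups land as zipped dump files in a shared drop folder. The path, the 24-hour freshness window and the state file name are illustrative assumptions, not part of IQMS's process.

```python
#!/usr/bin/env python3
"""Sanity-check a backup drop folder: flag stale or unchanged dumps.

A minimal sketch, not tied to any particular database or product.
The path, the freshness window and the state file are assumptions
made for illustration only.
"""
import hashlib
import json
import sys
import time
from pathlib import Path

BACKUP_DIR = Path(r"\\backupserver\exports")   # assumed destination share
STATE_FILE = Path("last_backup_state.json")    # hypothetical state file
MAX_AGE_HOURS = 24                             # assumed freshness window


def sha256(path: Path) -> str:
    """Hash the file so an identical dump re-copied each night is caught."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def main() -> int:
    dumps = sorted(BACKUP_DIR.glob("*.zip"), key=lambda p: p.stat().st_mtime)
    if not dumps:
        print("FAIL: no backup files found")
        return 1

    newest = dumps[-1]
    age_hours = (time.time() - newest.stat().st_mtime) / 3600
    if age_hours > MAX_AGE_HOURS:
        print(f"FAIL: newest backup {newest.name} is {age_hours:.0f}h old")
        return 1

    current = {"name": newest.name,
               "size": newest.stat().st_size,
               "sha256": sha256(newest)}
    previous = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else None
    STATE_FILE.write_text(json.dumps(current))

    # The two-year-old dump in the story would trip this check: the same
    # size and the same hash night after night means the export never ran.
    if previous and previous["size"] == current["size"] \
            and previous["sha256"] == current["sha256"]:
        print(f"WARN: {newest.name} is byte-identical to the previous run")
        return 1

    print(f"OK: {newest.name} ({current['size']} bytes, {age_hours:.1f}h old)")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Run on a schedule and wired to an alert, a check like this turns "the backup looked fine" into something that is actually verified every day.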
These two situations should make it clear that without a disaster recovery policy in place, unplanned events can easily cripple a company that relies on any computer system for accurate and timely information about its business processes. No matter the size of the company or the quality of the IT department (if one even exists), there needs to be a point of responsibility for ensuring continued operation within reasonable, defined constraints. At a minimum this would entail the following:
1. Identify an entity or entities who will be responsible for carrying out a disaster recovery plan
2. Evaluate and classify critical components
3. Determine what exposure these components represent to the company's daily operation if they fail
4. Implement processes to adequately protect the failure points
5. Continually test the recovery process to identify gaps and validate fulfillment of goals (a minimal example of such a test follows this list)
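Step 5 does not have to start as a full-blown restore drill. As a first pass, even proving that the newest backup archive can be opened and extracted catches a surprising number of problems. The sketch below assumes the same hypothetical drop folder of .zip archives as above; a real recovery test would go further and restore the dump into a scratch database.

```python
#!/usr/bin/env python3
"""Smoke-test the newest backup archive by actually opening it.

A minimal sketch of step 5, assuming zipped backups in a shared
folder. All paths and thresholds are illustrative assumptions;
this is a first check, not a substitute for a real restore test.
"""
import sys
import tempfile
import zipfile
from pathlib import Path

BACKUP_DIR = Path(r"\\backupserver\exports")  # assumed destination share
MIN_DUMP_BYTES = 10 * 1024 * 1024             # assumed sanity threshold


def main() -> int:
    archives = sorted(BACKUP_DIR.glob("*.zip"), key=lambda p: p.stat().st_mtime)
    if not archives:
        print("FAIL: nothing to test")
        return 1

    newest = archives[-1]
    with zipfile.ZipFile(newest) as zf:
        # testzip() re-reads every member and reports the first corrupt one.
        bad = zf.testzip()
        if bad is not None:
            print(f"FAIL: {newest.name} is corrupt at member {bad}")
            return 1
        with tempfile.TemporaryDirectory() as tmp:
            zf.extractall(tmp)
            total = sum(p.stat().st_size
                        for p in Path(tmp).rglob("*") if p.is_file())
            if total < MIN_DUMP_BYTES:
                print(f"FAIL: extracted only {total} bytes from {newest.name}")
                return 1

    print(f"OK: {newest.name} extracted cleanly ({total} bytes)")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```
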
These basic steps can go a long way toward providing a foundation where none may currently exist. There are many resources for more formalized processes. Some are daunting in the scope of what they cover, but all have good information that can be shaped into something that will work for any business. A quick Google search on the term "disaster recovery" turned up definitions, blogs, organizations, journals and government sites discussing processes and services for putting a plan in place.
If there is anything to take away from this blog post, I hope it is this: Do something. Companies, no matter how large or small, need to take action regarding the health and well-being of the critical data and processes needed to run everyday business. The cost of not doing so could mean "GAME OVER."
I would love to hear from anyone who has had a system down emergency. What processes did you have in place to recover your critical data?