Disaster Recovery Plan and Testing for a 5,000-Site Global Retailer

A global retailer with more than 5,000 locations worldwide sought to improve disaster recovery (DR) for order processing. The company turned to Burwood Group to address technical challenges and tailor a solution within its current infrastructure. Upon completion, the company wanted to run a major test exercise encompassing its mainframe, production data center, and a wide range of applications.

The Challenge: Slow Recovery in a Complicated Network Environment

The risks of downtime to mission-critical systems include lost data, productivity, and revenue, as well as a negative impact on the customer experience. When the company experienced a five-day mainframe outage, its board and IT leaders agreed it was time for an update to uphold the company’s commitment to customer satisfaction. Its existing infrastructure was too complex for an off-the-shelf solution, so Burwood sought to improve recovery time without a complete network overhaul.

The Solution: Create an Isolated Disaster Recovery Environment

Working as an extension of the retailer’s IT team, Burwood Group engaged stakeholders across the company to identify past order-processing DR strategies and requirements, and to recommend the foundation for a new approach. The critical concern was that the company’s DR backup site was a remote mainframe, but the order-processing backup was not functioning properly. Burwood created an isolated order-processing network, collaborating with the company’s IT team to identify the critical order-processing applications to be replicated. Because the environment was isolated, Burwood was able to test the solution without disrupting the corporate network.

One challenge was that the company’s current infrastructure is based on the legacy SNA network protocol, which is not compatible with the TCP/IP protocol used in contemporary DR solutions. Burwood configured a virtual “tunnel” based on a Cisco switch to integrate the order-processing system with the new DR solution.

Intensive Testing for Fail-safe Disaster Recovery

Upon implementing the solution, the company wanted to test a broad DR scenario far more extensive than any disaster that might occur, to encompass the entire ordering, payment processing, and delivery lifecycle.

One challenge involved the unknown number of dependencies between the Tier 1 and Tier 2 applications to be tested. In addition, the company was planning to eliminate its order-processing mainframe but had not yet implemented the new solution. Burwood therefore needed to design the testing exercise to accommodate both the current and future states of the order-processing workflow.

Working as an extension of the company’s IT team, Burwood Group interviewed roughly 70 stakeholders across the company to gather requirements and uncover hundreds of dependencies in dozens of applications. With a deep understanding of the entire ecosystem, the Burwood project team determined the optimal failover order for every application, and structured a comprehensive testing exercise that would ensure business continuity in the event of a major outage.
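To illustrate how a failover order can be derived from a dependency map, here is a minimal sketch in Python. It is not the tooling Burwood used, which the case study does not describe; the application names and dependencies are hypothetical, and the approach shown (a topological sort using the standard library's graphlib module) is just one common way to order components so that each starts only after the things it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical applications and dependencies, for illustration only.
# Each key maps to the set of applications that must be up before it.
dependencies = {
    "inventory-db": set(),
    "auth-service": set(),
    "order-entry": {"inventory-db", "auth-service"},
    "payment-processing": {"order-entry", "auth-service"},
    "delivery-scheduling": {"order-entry", "inventory-db"},
}


def failover_order(deps):
    """Return applications ordered so every dependency comes first.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = failover_order(dependencies)
for step, app in enumerate(order, start=1):
    print(f"{step}. fail over {app}")
```

In a real exercise with hundreds of dependencies, the same idea would also surface circular dependencies early, since the sort fails with an error rather than producing an order that cannot work.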

Given the scope of the DR test, Burwood recommended using an automated toolset to rapidly and seamlessly transition workflow from the primary systems to the DR servers. The Burwood team created the automation scripts and executed multiple iterations to resolve emerging technical issues.

Burwood Group Services:

  • Disaster Recovery

  • Data Optimization

  • Network Management

  • Program Management

  • Data Center Automation

  • Technology Strategy

The Outcome: Improved Business Continuity

Collaborating with the retailer, Burwood executed a successful DR test for all critical business processes. The four-day exercise encompassed four main sites in the United States and India, including two data centers, and a total of 171 servers. In addition, the company gained a deeper and more detailed view of its IT environment and application dependencies. Using the playbooks and automated tools created by Burwood, the company can now respond to a disaster affecting any of its core business systems.

Now that an isolated DR network is in place, the company can use it not only for testing DR, but also for testing application upgrades without risking the critical production environment. Based on the success of the order-processing DR solution and testing, Burwood and the retail company are now creating a DR and refresh strategy solution for all of the company’s applications and data. Most important, the exercise demonstrated that the business could withstand a major disruption without missing a beat. For senior management, that assurance means confidence and peace of mind.

I did not hear an ‘it can’t be done’ during this entire initiative. This was the largest and most comprehensive disaster recovery test ever performed and completed at our company. We were able to uncover and learn a tremendous amount about the complexities of our infrastructure and systems, and demonstrate what we could accomplish together. As part of the exercise, we were able to successfully complete all of our primary goals, secondary goals, and also add in additional scope to protect our mainframe from outages.
— Senior Director, Infrastructure & Security