Re: Disaster recovery, alternatives to CBU machine in alternate site

Timothy Sipples Thu, 21 Feb 2019 00:31:50 -0800

Thomas Ambrous wrote:
>Are there alternatives to either of these arrangements that allows a customer
>to support production processing capacity in two sites without incurring the
>extra cost of CMP, or even the CBU?

Yes, potentially, but backing up half a step, CBU (Capacity Backup) is a
terrific offer. With practically everything else in IT, infrastructure
costs double or more than double when even mediocre disaster recovery
capabilities are added to the mix. Not so with CBU mainframes. Many people
don't understand these economics as well as they should.

One potential option is a CBU "sibling": Capacity for Planned Events (CPE).
CPE allows you to activate and to run capacity for certain defined periods
of time when you have, well, certain planned events such as relocations of
workloads during system migrations or data center moves -- not only for
disasters and disaster rehearsals. CBU and CPE can be combined and
frequently are.

Another rather common option is to share DR facilities and capacity with
one or more other like-minded installations, via a DR services firm or
through a private arrangement with one or more other companies. IBM
Business Continuity and Resiliency Services is a notable example. You might
think of this as a "DR cloud," and many companies and government agencies
have taken this approach for years or even decades. One important
prerequisite is that you keep reasonably current with server and
software technologies so that you stay coordinated with the shared DR
services.

There are a very few customers I know that have fairly demanding or very
demanding RPOs (Recovery Point Objectives) but not-so-demanding RTOs
(Recovery Time Objectives). Their business management claims they'll
tolerate several days of unplanned downtime if they were to lose their
primary site. In that unusual case your DR solution might be based on
server-less physically separate storage. That could be your own storage at
distance, a DR services firm's shared storage, a consortium's or
partnership's shared storage, and/or cloud object storage (via IBM Cloud
Tape Connector for z/OS). In a disaster the idea would be to go buy or rent
a machine, somewhere, quickly, then use the secondary storage instance to
recover. There's obviously some risk in not being able to rehearse
recovery, and it's a rare business (or government agency) that can
genuinely tolerate something like a one week outage, especially once
experienced. :-)

A variation on the above is a "nightlight DR service." That is, in a
disaster the services offered are severely limited. The storage is
replicated and backed up to achieve RPO, and there are some threadbare
servers providing emergency services only, as little as a "We'll be back
soon!" message (a "nightlight"). Or the "nightlight" service is just enough
to reroute users. For example, if the primary site is providing a payment
service, the "DR nightlight" might be only enough to reroute payment
requests to an alternate payment provider. This scenario is also quite
rare, but it isn't unprecedented.

Some installations decide to use a machine one or even two models older for
DR. For example, they might upgrade their primary machine every model cycle
and their DR machine every second model cycle. That can be
viable, but that can also hold them back a bit on enjoying the new model's
benefits because they have to make sure that anything they need to recover
can still run on the older model. There's at least an opportunity cost in
that. The same goes for storage: installations sometimes don't match storage
models and configurations exactly across sites.

Some customers put a lot of development/test capacity on their secondary
machines using the newer and interesting IBM offers such as Container
Pricing and Solution Edition for Application Development. That often works
quite well in a variety of respects.

Some installations run a Parallel Sysplex on the primary site (on a single
machine or across two or more machines) and then, in a disaster, run a
single member Parallel Sysplex at the DR site. The DR machine is configured
with a bit less memory and maybe slightly reduced I/O, and it runs just
like the primary site but with one of the two z/OS LPARs either nominally
present or absent. Or they run a 2 machine (or multi-machine) Parallel
Sysplex on the primary site and a single machine Parallel Sysplex on the DR
site.

....OK, I'll stop there for now. There are lots of possible variations.

--------------------------------------------------------------------------------------------------------
Timothy Sipples
IT Architect Executive, Industry Solutions, IBM Z & LinuxONE
--------------------------------------------------------------------------------------------------------

E-Mail: sipp...@sg.ibm.com
