Highly Available TSM

Johnson, Milton Fri, 14 May 2004 08:41:47 -0700

Management wants to implement a highly available TSM system with the
following requirements:


The campus consists of 3 buildings, and the lone TSM server is presently
in Bldg. 3. If Bldg. 3 goes down, all TSM activities are unavailable, and
management wants to eliminate that single point of failure. The goal is
that the loss of a single TSM server or building has zero impact on TSM
activities (backups, restores, producing off-site tapes) and requires no
or minimal intervention. They do not want to have to modify the clients
to use an alternative TSM server. Our present server is running on AIX.

OK, so this sounds like running TSM on an HACMP cascading cluster could
be an answer. I can see the following shared/non-shared resources:
Physical Location
---------------------
Bldg 3     Bldg 1     Description
----------------------------------------------------------------------
HDISK2     HDISK3     TSM DB volumes, mirrored via TSM. Separate VGs.
HDISK4     HDISK5     TSM LOG volumes, mirrored via TSM. Separate VGs.
HDISK6     HDISK7     Single mirrored VG housing the TSM storage pool
                      used as the on-site "tape" pool. Stgpool
                      name=TAPEPOOL. An alternative could be a
                      mirrored virtual tape library.
3494ATL               Used to create/read off-site tape volumes when
                      Bldg 3 is the active node, using 3494E drives.
                      Stgpool name=COPYPOOL. This is a non-shared TSM
                      resource.
           3494ATL    Used to create/read off-site tape volumes when
                      Bldg 1 is the active node, using 3494E drives.
                      Stgpool name=COPYPOOL. This is a non-shared TSM
                      resource.

So when failover to Bldg 1 happens, HDISKs 3, 5 & 7 are used. That takes
care of the DB, logs and on-site "tapes", and clients can continue to
back up/restore to/from TAPEPOOL. The tape library for COPYPOOL raises
some questions. The issues I see include:
1) After failover, TSM will have an incorrect view of what is in the
   library. I assume that running an "Audit Library LibName
   Checklabel=barcode" followed by a "LABEL LIBVolume LibName SEARCH=Yes
   CHECKIN=SCRatch DEVType=3590 OVERWRITE=Yes" will give TSM an accurate
   view of the library contents and a supply of scratch tapes.

2) After failover, COPYPOOL volumes in the failed library may actually
   be physically lost or destroyed. I believe that if I identify those
   volumes, update their access to destroyed, and back up TAPEPOOL to
   COPYPOOL, I will have solved that problem.
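
The two recovery steps above can be written down as an ordered list of
administrative commands. This is a minimal sketch in Python that only
builds the command strings and does not run anything. The library name
LIB3494 and the volume names are hypothetical; the pool names and the
command parameters are the ones quoted in this post.

```python
# Sketch only: builds the TSM administrative commands, does not execute them.
# LIBRARY is a hypothetical name; the pool names are taken from the post.

LIBRARY = "LIB3494"
PRIMARY_POOL = "TAPEPOOL"
COPY_POOL = "COPYPOOL"


def post_failover_commands(lost_volumes):
    """Return the commands for the surviving node, in the order to run them."""
    commands = [
        # Step 1a: reconcile TSM's inventory with the library's real contents.
        f"AUDIT LIBRARY {LIBRARY} CHECKLABEL=BARCODE",
        # Step 1b: label unlabeled tapes and check them in as scratch.
        f"LABEL LIBVOLUME {LIBRARY} SEARCH=YES CHECKIN=SCRATCH "
        f"DEVTYPE=3590 OVERWRITE=YES",
    ]
    # Step 2a: mark every copy-pool volume left in the failed building.
    commands += [f"UPDATE VOLUME {vol} ACCESS=DESTROYED" for vol in lost_volumes]
    # Step 2b: back up the primary pool again to replace the lost copies.
    commands.append(f"BACKUP STGPOOL {PRIMARY_POOL} {COPY_POOL}")
    return commands


if __name__ == "__main__":
    # Hypothetical volume names, for illustration only.
    for command in post_failover_commands(["A00001", "A00002"]):
        print(command)
```

Keeping the list of lost volumes as an input makes the manual part of
the procedure explicit: someone still has to decide which COPYPOOL
volumes were in the failed library.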

When Bldg 3 comes back online and HDISKs 2, 4 & 6 are available, I would:
1) Resync HDISK6 via AIX (this could take a while).
2) Vary on the VGs for HDISKs 2 & 4 and mount the file systems.
3) Resync the DB & LOG volumes on HDISKs 2 & 4 via TSM.
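
The failback order can be sketched the same way. This Python snippet
only lists the steps as (tool, command) pairs and runs nothing. The
volume group names, mount points and TSM volume paths are all
hypothetical, and I am assuming that syncvg handles the AIX mirror and
that VARY ONLINE is what brings the TSM-mirrored DB and LOG copies back
into sync. The returned disk may also need to be made available to AIX
before syncvg will do anything, which is not shown here.

```python
# Sketch only: lists the failback steps in order, does not execute them.
# Every name passed in below is hypothetical.

def failback_steps(mirrored_vg, tsm_vgs, filesystems, tsm_volumes):
    """Return (tool, command) pairs for bringing Bldg 3 storage back."""
    # 1) Resync the AIX-mirrored storage pool VG (HDISK6/HDISK7).
    steps = [("aix", f"syncvg -v {mirrored_vg}")]
    # 2) Vary on the separate DB and LOG VGs, then mount their file systems.
    steps += [("aix", f"varyonvg {vg}") for vg in tsm_vgs]
    steps += [("aix", f"mount {fs}") for fs in filesystems]
    # 3) Bring the TSM mirror copies back online so TSM resyncs them.
    steps += [("tsm", f"VARY ONLINE {vol}") for vol in tsm_volumes]
    return steps


if __name__ == "__main__":
    plan = failback_steps(
        mirrored_vg="stgpoolvg",
        tsm_vgs=["tsmdbvg3", "tsmlogvg3"],
        filesystems=["/tsm/db3", "/tsm/log3"],
        tsm_volumes=["/tsm/db3/db01.dsm", "/tsm/log3/log01.dsm"],
    )
    for tool, command in plan:
        print(f"[{tool}] {command}")
```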

So my questions are:
What have I missed? Is anyone out there administering an HA-TSM system
with cluster nodes in different buildings? If so, what is your
architecture like?
Is there a way to achieve these goals without HACMP? If so, how?

Thanks,
Milton Johnson
