Management wants to implement a highly available TSM system with the following requirements:
The campus consists of 3 buildings, presently with the lone TSM server in Bldg. 3. If Bldg. 3 goes down, all TSM activities are unavailable, and management wants to eliminate that single point of failure. The goal is that the loss of a single TSM server or building would have zero impact on TSM activities (i.e. backups, restores, producing off-site tapes), with no or minimal intervention. They do not want to have to modify the clients to use an alternative TSM server. Our present server is running on AIX.

OK, so this sounds like running TSM on an HACMP cascading cluster could be an answer. I can see the following shared/non-shared resources:

  Physical Location
  ------------------
  Bldg 3    Bldg 1    Description
  --------  --------  ----------------------------------------------------
  HDISK2    HDISK3    TSM DB volumes mirrored via TSM - separate VGs
  HDISK4    HDISK5    TSM LOG volumes mirrored via TSM - separate VGs
  HDISK6    HDISK7    Single mirrored VG housing the TSM storage pool used
                      as the on-site "tape" pool, Stgpool name=TAPEPOOL.
                      An alternative could be a mirrored virtual tape
                      library.
  3494ATL             Used to create/read offsite tape volumes when Bldg 3
                      is the active node, using 3494E drives.
                      Stgpool name=COPYPOOL.
                      This is a non-shared TSM resource.
            3494ATL   Used to create/read offsite tape volumes when Bldg 1
                      is the active node, using 3494E drives.
                      Stgpool name=COPYPOOL.
                      This is a non-shared TSM resource.

So when failover to Bldg 1 happens, HDISKs 3, 5 & 7 are used. That takes care of the DB, logs & onsite tapes, and clients can continue to back up/restore to/from TAPEPOOL.

The tape library for COPYPOOL presents me with some questions. The issues I see include:

1) After failover, TSM will have an incorrect view of what is in the library. I assume that running an "Audit Library LibName Checklabel=barcode" followed by a "LABEL LIBVolume LibName SEARCH=Yes CHECKIN=SCRatch DEVType=3590 OVERWRITE=Yes" will give TSM an accurate view of the library contents and a supply of scratch tapes.

2) After failover, COPYPOOL volumes in the failed library may actually be physically lost/destroyed. I believe that if I identify those volumes, update their access to destroyed, and back up TAPEPOOL to COPYPOOL, I will have solved that problem.

When Bldg 3 comes back on-line and HDISKs 2, 4 & 6 are available, I would:

1) Resync HDISK6 via AIX (this could take a while).
2) Vary on the VGs for HDISKs 2 & 4 and mount the file systems.
3) Resync the DB & LOG volumes on HDISKs 2 & 4 via TSM.

So my questions are:

- What have I missed?
- Is anyone out there administering an HA-TSM system with cluster nodes in different buildings? If so, what is your architecture like?
- Is there a way to achieve these goals without HACMP? If so, how?

Thanks,
Milton Johnson