I would use MOVE NODEDATA commands to move the data for the affected nodes to a (possibly new) collocated pool before the clients start their restores. That gets a lot of the tape mounts and so forth out of the way while the clients aren't yet ready to be restored. You can pace how much of this is done, and when, by how many MOVE NODEDATA processes you have running and when you start them manually. It won't change the fact that you're running at capacity, but it will let you minimize restore times for the victims of the fire.
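To make the pacing idea concrete, here is a minimal sketch in Python that only builds the command text and splits it into batches to start by hand from an administrative session. It does not talk to the server. The node names (LAB_PC1 and so on) and the pool names (TAPEPOOL, RESTOREPOOL) are made up for illustration, and the command form should be checked against the Administrator's Reference for your server level before use.

```python
from typing import Iterable, List


def move_nodedata_commands(nodes: Iterable[str], from_pool: str, to_pool: str) -> List[str]:
    """Build one MOVE NODEDATA command string per node.

    from_pool and to_pool are hypothetical storage pool names; the
    destination pool is assumed to already exist with collocation enabled.
    """
    return [
        f"MOVE NODEDATA {node} FROMSTGPOOL={from_pool} TOSTGPOOL={to_pool}"
        for node in nodes
    ]


def batches(commands: List[str], size: int) -> List[List[str]]:
    """Split the commands into groups of at most `size` to run at one time."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [commands[i:i + size] for i in range(0, len(commands), size)]


if __name__ == "__main__":
    # Hypothetical node names; substitute the nodes known to be damaged.
    damaged_nodes = ["LAB_PC1", "LAB_PC2", "LAB_PC3"]
    commands = move_nodedata_commands(damaged_nodes, "TAPEPOOL", "RESTOREPOOL")
    # Two at a time is an arbitrary choice; size it to the drives you can spare.
    for number, group in enumerate(batches(commands, 2), start=1):
        print(f"Batch {number}:")
        for command in group:
            print(f"  {command}")
```

The batch size is the pacing knob: a tape-to-tape move ties up an input drive and an output drive, so with 8 drives that are already busy, one or two moves at a time may be all that fits alongside normal work.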
Just a thought,
Nick

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Roger Deschner
Sent: Tuesday, January 22, 2008 2:40 AM
To: ADSM-L@VM.MARIST.EDU
Subject: [ADSM-L] DISASTER: How to do a LOT of restores?

We like to talk about disaster preparedness, and one just happened here at UIC.

On Saturday morning, a fire damaged portions of the UIC College of Pharmacy Building. It affected several laboratories and offices. The Chicago Fire Department, wearing hazmat moon suits due to the highly dangerous contents of the laboratories, put it out efficiently in about 15 minutes. The temperature was around 0F (-18C), which compounded the problems - anything that took on water became a block of ice. Fortunately nobody was hurt; only a few people were in the building on a Saturday morning, and they all got out safely.

Now, both the good news and the bad news is that many of the damaged computers were backed up to our large TSM system. The good news is that their data can be restored. The bad news is that their data can be restored. And so now it must be.

Our TSM system is currently an old-school tape-based setup from the ADSM days. (Upgrades involving a lot more disk coming real soon!) Most of the nodes affected are not collocated, so I have to plan to do a number of full restores of nodes whose data is scattered across numerous tape volumes each. There are only 8 tape drives, and they are kept busy since this system is in a heavily-loaded, about-to-be-upgraded state. (Timing couldn't be worse; Murphy's Law.)

TSM was recently upgraded to version 5.5.0.0. It runs on AIX 5.3 with a SCSI library. Since it is a v5.5 server, there may be new facilities available that I'm not aware of yet.

I have the luxury of a little bit of time in advance. The hazmat guys aren't letting anyone in to assess damage yet, so we don't know which client node computers are damaged or not. We should know in a day or two, so in the meantime I'm running as much reclamation as possible.

Given that this is our situation, how can I best optimize these restores? I'm looking for ideas to get the most restoration done for this disaster, while still continuing normal client-backup, migration, expiration, reclamation cycles, because somebody else unrelated to this situation could also need to restore...

Roger Deschner
University of Illinois at Chicago
[EMAIL PROTECTED]