I had considered that, but the problem is that when a mount is blocked by an operator request, the process using it is blocked too, which could really slow things down if we can't respond to the request immediately.
I've ended up making judicious use of simultaneous copy, which seems to have
helped this week. I've had to identify hosts that simply will never be able
to stream to tape and point those to a separate FILE pool that has
simultaneous copy disabled, but those are a fraction of our total backup
load, so storage pool backups finish by the time the checkout occurs.

I'm hopeful that this will solve, or at least substantially mitigate, the
problem for the long term.

Thanks all for the suggestions and discussion!

On Mon, Jun 09, 2014 at 06:41:45PM -0700, Alex Paschal wrote:
> Hi, Skylar. Have you tried setting your MOUNTWAIT to 0 or 1? It seems
> to me that should allow the operator request to time out and your
> processing to continue.
>
>
> On 6/3/2014 2:12 PM, Skylar Thompson wrote:
> > We've been suffering with the effects of this APAR for a while, which IBM
> > fixed as a documentation errata rather than fixing TSM itself:
> >
> > http://www-01.ibm.com/support/docview.wss?uid=swg1IC87352
> >
> > Basically the issue is that there is a race condition with running MOVE
> > DRMEDIA on tape volumes while BACKUP STGPOOL is also running. BACKUP
> > STGPOOL might choose a FILLING volume that MOVE DRMEDIA is also removing
> > from the library, which causes an operator request to be raised. We must
> > either check the volume back in, or cancel the request, allow TSM to mark
> > the volume UNAVAILABLE and then update the volume to be OFFSITE.
> >
> > We have some challenges in our TSM environment:
> >
> > 1. The data ingest is highly bursty - some days we might have 100GB in
> > backups, while others we might have 60TB. We average around 2TB/day in
> > additions to primary storage.
> >
> > 2. We are not staffed 24x7, so we can't have operator requests going off
> > outside business hours.
> >
> > 3. We have no dedicated staff managing our TSM/tape library environment, so
> > we prefer not getting any operator requests since we might not be able to
> > act on them immediately.
> >
> > 4. For budget and policy reasons, we have a weekly (not daily) shipment of
> > tape to our offsite vault.
> >
> > I've rejiggered our client and admin schedules, and reclamation to try to
> > avoid having writes into the copy pools happen while we do the checkout
> > during business hours, but it's quite difficult to actually quiesce
> > everything.
> >
> > It seems like we have these options:
> >
> > 1. Just live with it as it is.
> >
> > 2. Don't run BACKUP STGPOOL on the day that the checkout will happen.
> >
> > 3. Automate checking for writes into copy pools and cancel the
> > session/process responsible for them. This might require restricting the
> > number of mounts in our tape device classes, and also seems like it has the
> > risk of being more disruptive than we really want.
> >
> > Have I missed anything? How are other people approaching this problem?
> >
> > Thanks,
> >
> > --
> > -- Skylar Thompson (skyl...@u.washington.edu)
> > -- Genome Sciences Department, System Administrator
> > -- Foege Building S046, (206)-685-7354
> > -- University of Washington School of Medicine

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine