Allen,

That's the most coherent and complete discussion I've ever read about managing remote volumes. Thank you MUCH!!
Wanda

On Mon, Nov 10, 2008 at 10:07 AM, Allen S. Rout <[EMAIL PROTECTED]> wrote:

> >> On Sun, 9 Nov 2008 14:08:56 -0500, "Wanda Prather" <[EMAIL PROTECTED]> said:
>
> > When you are doing reclaims of the virtual volumes, doesn't the data
> > that is being reclaimed from the virtual tape have to travel back
> > across the network to the original TSM server's buffer, then out
> > across the network again to the new virtual volume?
>
> Short answer: "No". The design is for a new copy volume to be created
> from primary volumes, and when all the data on a given to-be-reclaimed
> copy volume is present on newly built copy volumes, the old one goes
> pending.
>
> The answer gets longer, though. Sometimes the reclaiming server
> decides to read from a remote volume. The Good Reason for this is
> when the primary volume is for some reason damaged or Unavailable.
> There are other times when the reclaiming server just gets an impulse
> and changes gears. I had a few PMRs about this, and the response was
> somewhat opaque. The conclusion I drew was "Oh, we found a couple of
> bits of bad logic, and we tuned it some".
>
> One interesting aspect of this is the changing locality of the offsite
> data. When you make initial copies, your offsite data is grouped by
> time-of-backup. When you reclaim that same data, the new offsite
> volumes are built by mounting one primary volume at a time, so the
> locality gradually comes to resemble that of the primary volumes.
> Collocated, perhaps?
>
> It's a side effect, but a pleasant trend. I have often wished there
> were a collocation setting "Do what you can, but don't have a fit
> about it".
>
> > What has been your experience of managing that? Do you just keep
> > the virtual volumes really small compared to physical media?
> > (assuming the target physical media is still something enormous like
> > LTO3) Or do you just resolve to have a really low utilization on the
> > target physical media?
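Allen's short answer, that reclamation rebuilds new copy volumes by reading the LOCAL primary volumes rather than pulling data back across the network from the remote copy volume, can be sketched as a toy model. This is only a simplification of the behavior he describes; the function and the dict-based volume layout are hypothetical, not actual TSM internals:

```python
def reclaim(old_copy_volume, primary_data):
    """Toy model of copy-volume reclamation as described above.

    old_copy_volume: list of file IDs stored on the to-be-reclaimed
        remote (virtual) volume.
    primary_data: dict mapping file ID -> data, read from LOCAL
        primary volumes -- the remote copy volume is never read,
        so nothing round-trips over the network.
    """
    new_copy_volume = {}
    for file_id in old_copy_volume:
        # Each file is re-read from its on-site primary volume...
        new_copy_volume[file_id] = primary_data[file_id]
    # ...and only once every file on the old volume exists on a newly
    # built copy volume does the old remote volume go "pending".
    done = set(new_copy_volume) == set(old_copy_volume)
    return new_copy_volume, ("pending" if done else "filling")
```

Because new copy volumes are filled one mounted primary volume at a time, this is also where the locality drift he mentions comes from: the new offsite volumes inherit the primary volumes' grouping instead of time-of-backup order.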
> This is something I don't have a good theoretical answer for, yet.
> And boy howdy, I've tried. Certainly, I waste more space in
> reclaimable blocks, because there are two levels, at least, of
> reclaiming going on: the physical remote volumes, and the virtual
> volumes within them.
>
> Here is the answer I have used, with no particular opinion that it's
> theoretically sound:
>
> + Most of my remote storage access is directly to the remote tapes. I
>   have a few clients who have tighter bottlenecks and send them to
>   disk, but 'direct to tape' is the rule. Note that this means when I
>   have >N streams trying to write, clients get in line and wait for a
>   drive, round-robin style.
>
> + I have some servers storing remote volumes of 50G MAXCAP, some of
>   20G. I haven't noted a big difference between them. The biggest
>   theoretical basis for choosing that I can come up with is the speed
>   of round-robin access to the remote tapes.
>
> + My biggest pain in the patoot so far comes from individual files
>   that are much bigger than the remote volume size. I hate re-sending
>   an initial chunk, then 4 intermediate volumes I know to be identical
>   to the remote volumes already present, and then re-sending the tail
>   chunk.
>
> + The other biggest pain in the patoot is that, while devices are
>   round-robining at the remote site, the source media is allocated at
>   the local site. This means that you can deadlock your way into a
>   mess of 'no tapes available' if you get congested.
>
> I find this to be a metastable situation: things go very smoothly
> until you hit some boundary condition, and then you have a
> turbulence incident which takes intense, sustained effort to
> resolve.
>
> > How do you know how big the "really big pipe" needs to be to take
> > care of the reclaims?
>
> This I -do- have a theoretical answer for. See above where I talked
> about round-robin on the remote tape drives? You want a pipe big
> enough to stream all the remote drives.
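The oversized-file pain point above can be put into rough numbers. A hedged sketch — the sizes and the helper are made up for illustration, not a TSM utility: a file larger than the remote volume MAXCAP spans a partial head chunk, some full volumes that are byte-identical to what is already offsite, and a partial tail chunk, yet the head and tail still get re-sent on reclaim.

```python
def resent_on_reclaim(file_gb, maxcap_gb, head_gb):
    """Rough arithmetic for a file spanning several remote volumes.

    head_gb: portion of the file written to the first, partially
        filled remote volume.
    Returns (GB that must be re-sent, count of full intermediate
    volumes already identical offsite). Hypothetical helper.
    """
    body_gb = file_gb - head_gb
    full_volumes = int(body_gb // maxcap_gb)     # unchanged, already remote
    tail_gb = body_gb - full_volumes * maxcap_gb
    return head_gb + tail_gb, full_volumes

# e.g. a 220G file on 50G MAXCAP volumes with a 30G head chunk:
# resent_on_reclaim(220, 50, 30) -> (70, 3)
# 3 intermediate volumes are identical, but 70G still crosses the wire.
```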
> By implication, you can stream the same count of local drives; this
> means that, while you may have processes waiting in line for remote
> access, they won't be waiting on a network-constrained bottleneck.
>
> Of course, that's easier said than done: 3592 E05s are theoretically
> capable of what, 200 MB/s? I mean, that's what the brag sheet
> says... :) In realistic terms I get 60 MB/s sustained, with 80-90 MB/s
> spikes. In other realistic terms, you don't often have -everything-
> streaming at once. So to calculate what your site would want, I
> suggest:
>
> + Get a Gb connection up. Run one stream. Optimize. Measure
>   sustained bandwidth.
>
> + Multiply sustained bandwidth * number of remote drives. Attempt to
>   get this size pipe.
>
> + Return to your cube, frustrated that they Just Don't Understand. Be
>   happy you've got a Gb. Work to fill it 24x7.
>
> Actually, I'm lucky: I've got budget to go to about 2 Gb this fiscal
> year. Woot!
>
> - Allen S. Rout
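Allen's sizing rule (measured sustained bandwidth times the number of remote drives) works out like this. The 60 MB/s figure is his measured number; the drive count of 4 is an assumed example, not from his site:

```python
def pipe_gbit(sustained_mb_s, remote_drives):
    """Pipe needed to stream all remote drives at once, in Gbit/s.

    Converts MB/s to Gbit/s at 8 bits per byte (decimal units
    throughout, ignoring protocol overhead).
    """
    return sustained_mb_s * remote_drives * 8 / 1000.0

# With a measured 60 MB/s sustained and, say, 4 remote drives:
# pipe_gbit(60, 4) -> 1.92
# i.e. roughly two streaming drives saturate a 1 Gb link, which is
# why "be happy you've got a Gb" -- and a 2 Gb budget -- make sense.
```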