Howdy, all. I'm giving my offsites a little bit of a workout, and am trying to identify a bottleneck in the remote-volume access path. I'm hoping someone else has messed with this too.
My offsites live in a machine room 300-some miles from my main site, which led to a variety of TCP tuning experiments while I tried to get it right. After setting the TCP windows to ~2M, I get essentially local performance out of my tape drives (peaks at ~100MB/s, reading from Gainesville tape and writing to Atlanta tape), and good speed on the way back. But when I restore from one of those copystg volumes, my throughput is about 2.5MB/s, which is suspiciously close to what I was getting before I tuned the TCP window.

So I've been doing some experiments. I can get a client to back up and restore directly to Atlanta at 16-20MB/s, but if I insert the local TSM server into the path, the throughput doesn't change, even though each individual leg goes MUCH faster by itself. I'm thinking I've got a TSM protocol-level analogue of the TCP-level window problem: I can only have so much data in flight before someone wants an ACK, and that limits the total throughput. But I think it's in the TSM-level command stream. I've dodged questions of file count: I would understand it if objects were moving faster than DB commits could happen, but my current test case is a single ~1G file.

Now, TCPBUF at the server level would seem a tempting knob, but it's too small (32K documented max) and specifically disavows any relationship with TCPWindow. No other options look suggestive. TCPBuff at the client level isn't documented up to 2M, but when I moved it from the default to 512K I saw zero difference in speed, so I don't think that's it.

Ideally, I should be able to restore from the offsite datastore with only the interference of non-collocated, tiny volumes (as if that's not plenty). It'd be nice if at least the transfer speed were better. Any insight, experience, whatever?

- Allen S. Rout
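P.S. For the curious, here's the back-of-the-envelope I keep doing. It's just window-size divided by round-trip time; the 20ms RTT is an assumption I'm using for illustration (roughly what a ~300-mile path plus gear might cost), not a measurement off my link, and the window sizes are ballpark figures, not my exact settings.

    # Sustained single-stream throughput is bounded by how much data can be
    # in flight between acknowledgements, divided by the round-trip time.
    # RTT and window sizes below are illustrative assumptions, not measurements.

    def throughput_ceiling_mb_s(window_bytes: int, rtt_seconds: float) -> float:
        """Upper bound on one-stream throughput, in MB/s."""
        return window_bytes / rtt_seconds / 1e6

    rtt = 0.020  # assumed ~20 ms round trip for the ~300-mile path

    for label, window in [("untuned, ~64K window", 64 * 1024),
                          ("tuned,   ~2M window ", 2 * 1024 * 1024)]:
        print(f"{label}: ~{throughput_ceiling_mb_s(window, rtt):.1f} MB/s ceiling")

    # Prints roughly 3.3 MB/s for the untuned case (right in my 2.5MB/s
    # neighborhood) and ~105 MB/s for the tuned case (matching the ~100MB/s
    # I see tape-to-tape).

The point being: if the TSM session protocol only allows some small, fixed amount of data outstanding before it wants an acknowledgement, the same arithmetic applies one layer up, and no amount of TCP window tuning will push a single session past that ceiling.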
