Yesterday, as I was still waiting for a large rsync mirror to finish, I was thinking that it would be interesting if you could run multiple rsyncs and have them cooperate to mirror a repository from several different sources. I think a close approximation should be fairly easy to do, but I just won't have any time to do it.
My thought is that it could be implemented fairly inexpensively by mostly relying on the temporary files that are already written. If the temp files were given a common extension (even if it were just common among a concurrent set of rsyncs), the processes could use the temporary files to determine which daemon works on which file. In other words, get a truly random file name, open it, get its inode number, and then rename it to the common temp file name. If your inode number no longer belongs to that file at any point during the transfer, skip on to the next file. Of course, this only applies if the temp file doesn't already exist when you start the attempt to rsync that file; if it does, another rsync already has it and you skip it.

I'm not really sure how rsync does the transfers, which could lead to some stickiness. I know it builds up a list of changed files at the very beginning, which it then works on. It's not clear to me whether the "deltas" it exchanges are computed while the transfer is happening, or whether that's all computed up front. If they are computed during the transfer, it may be possible to just leave the transfer portion as it is, and have the different rsync daemons skip files that are already in transit. If it's all computed up front, I could imagine a mode where you store the inode number of the old file and skip the transfer if the inode has changed since the initial building of the file lists.

This is pretty lightweight, and there are certainly instances where it could get things wrong. However, for my situation, it would get me close enough, and then I could run a final rsync to catch the stragglers. It seems like this would be much less work to implement than anything like using sockets/pipes/spread to communicate between the daemons, setting up a master daemon, etc. Just thought it might be worth sharing.

My usage is that the push-primary Debian mirror I was syncing against went away a few weeks ago, and the new mirror I'm syncing against is only giving me 60KB/sec.
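A rough sketch of the temp-file claim described above, in Python rather than rsync's C, just to make the idea concrete. The `.rsync-coop-tmp` suffix and the function names `try_claim`, `still_owner`, and `dest_changed` are made up for illustration; none of this is existing rsync behaviour, and it assumes all the cooperating processes write into the same local directory tree.

```python
import os
import tempfile


def try_claim(dest_path, suffix=".rsync-coop-tmp"):
    """Try to claim dest_path for this process.

    Returns (shared_tmp_path, inode), or None if a shared temp file
    for dest_path already exists (another process got there first).
    """
    shared_tmp = dest_path + suffix  # hypothetical common temp name
    if os.path.exists(shared_tmp):
        return None

    # Create a uniquely named file in the same directory, so the
    # rename below stays on one filesystem and is atomic.
    fd, unique_tmp = tempfile.mkstemp(dir=os.path.dirname(dest_path) or ".")
    inode = os.fstat(fd).st_ino
    os.close(fd)  # a real transfer would keep this open and write to it

    # rename() silently replaces an existing file, so two processes can
    # both reach this point. The last rename wins; the loser finds out
    # through still_owner().
    os.rename(unique_tmp, shared_tmp)
    return shared_tmp, inode


def still_owner(shared_tmp, inode):
    """True while the shared temp name still refers to our inode."""
    try:
        return os.stat(shared_tmp).st_ino == inode
    except FileNotFoundError:
        return False


def dest_changed(dest_path, inode_at_listing):
    """True if dest_path was replaced since the file list was built.

    inode_at_listing is None if the file did not exist at that time.
    """
    try:
        return os.stat(dest_path).st_ino != inode_at_listing
    except FileNotFoundError:
        return inode_at_listing is not None
```

Each process would call `try_claim()` before starting a file, check `still_owner()` now and then during the transfer, and move on to the next file as soon as either says somebody else has it. `dest_changed()` covers the other mode: it relies on the finished file being renamed into place, so a completed transfer by another process shows up as a new inode.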
I wouldn't mind getting more than 60KB/sec, particularly during low-usage periods, when I might be able to get much more from a fast site. Yeah, I know I could change the mirror I'm using, but there aren't many push-primaries. That's what gave me the idea.

Sean
-- 
 I hear a cow jack-knifed on the Harley Memorial Bridge... There was milk
 everywhere.  -- Stephanie, _Newhart_
Sean Reifschneider, Member of Technical Staff <[EMAIL PROTECTED]>
tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin
