Hi there, I'm using rsync with some large trees of files (on one disk we have 30M files, for example, and we might be copying, say, 500k files in one tree). The file trees are reasonably balanced -- no single directory has thousands of files in it, for example. Our file system, at the moment, is ext3. We are very comfortable with it, and are hesitant to switch away from it, though JFS or ReiserFS could be persuasive if people's experience strongly suggests that they would help. My guess is that because the tree is reasonably balanced, changing filesystems isn't going to have a major effect on how big a bottleneck the filesystem may be.
ANYWAY, the point is, as you've guessed, that I hate having to wait 20 or 30 minutes for a transfer to start (even when I'm copying to a destination that doesn't have anything there yet, and thus no deltas to figure out). I've never really asked about this because my assumption has always been that it simply takes that long to scan the disks, populate rsync's data structures, and get the show on the road, and that if I want it quicker, I can darn well get faster disks, etc.

(a) Is that assumption correct? Or am I missing anything?

(b) For those of you who understand rsync internals better than I do (e.g. anyone at all who's done anything with the code :P): is there any possibility of rsync in daemon mode being able to leverage the File Alteration Monitor (FAM) efforts in order to cheaply maintain a more-or-less up-to-the-moment map of the trees it is exporting? (I have reservations about this, because I seem to recall that FAM was *not* designed to watch *vast* portions of huge filesystems -- it was more designed for monitoring specific resources.) For that matter, is this not the sort of thing that ReiserFS, with its evolution towards a pluggable architecture, might be perfect for?

(c) I assume that it would be folly (i.e. something that complicates the problem space substantially) to try to write something that simply started copying and built the map as it went along, or in the background (though I could see this being very interesting for situations where one's network is *much* slower than one's disks).

One of the reasons I ask is that I've often come across rsync being used as a sort of lazy filesystem mirroring tool, the point being to sync with a remote filesystem every, say, 10 minutes.
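To make the "cheap map" idea in (b) concrete: the essence is keeping a snapshot of (size, mtime) per path and diffing successive snapshots, so only the changed set ever needs to be handed to the transfer stage. Here's a minimal Python sketch of that bookkeeping (function names are mine, purely illustrative -- a FAM/dnotify-backed version would update the state dict from kernel change events instead of rescanning, which is the part that would make it cheap):

```python
import os

def snapshot(root):
    """Walk the tree once and record (size, mtime) for every file."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished mid-scan; skip it
            state[path] = (st.st_size, st.st_mtime)
    return state

def changed_paths(old, new):
    """Paths that appeared, disappeared, or changed between snapshots."""
    changed = set()
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            changed.add(path)
    return changed
```

With event-driven updates to the snapshot, the per-sync cost becomes proportional to the number of changes rather than the size of the tree -- which is exactly what would kill the 20-30 minute startup scan.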
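The "start copying while the map is still being built" idea in (c) is essentially a streaming directory walk: emit a per-directory work unit as soon as the walker reaches it, instead of materializing the whole file list up front. A rough Python sketch, again with invented names, just to show the shape of it:

```python
import os

def progressive_plan(root):
    """Yield one work unit per directory as the walk reaches it,
    instead of building one monolithic file list up front.

    Each unit is (directory, sorted file names); a real syncer would
    stat these and compare against the destination before copying.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        yield dirpath, sorted(filenames)
```

Because this is a generator, the consumer can start transferring the first directory while the walk is still descending the rest of the tree -- trading the monolithic up-front scan for overlap between scanning and copying, at the cost of not knowing the total transfer size in advance.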
Which is fine, until the file tree grows too large to scan in 10 minutes, in which case you have to (a) reduce the transfer frequency, and (b) resign yourself to having your I/O subsystem running flat out *all the time*. Also, with the "monolithic" scan, the filesystem can easily change between the scan being done and the actual directory/file in question being copied. Might it not be better all round to walk the tree progressively, making a sync plan for each "leaf node" of the tree as one reaches it?

Anyway, I'd be interested in what people think -- this is an awesome tool, and if there's any chance that addressing some of these things is technically possible, I'd like to know. (You never know, I might be able to help get the work done, or at least fund someone.)

All the best,

-Cedric

--
  | CCj/ClearLine - Unix/NT Administration and TCP/IP Network Services
  | 118 Louisa Street, Kitchener, Ontario, N2H 5M3, 519-741-2157
   \____________________________________________________________________
    Cedric Puddy, IS Director       [EMAIL PROTECTED]
                                    PGP Key Available at:
                                    http://www.thinkers.org/cedric

--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html