On Mon, Nov 07, 2005 at 10:59:21PM -0500, Chris Shoemaker wrote:
> Ok, so the purpose of the directory list is to make sure all the local
> directories are scanned for potential basis files, even directories not
> mentioned in the transmitted file-list, right?  I didn't realize that
> would require a table and delaying the scan of unknown directories
> until *after* the file-list scan was done.

The idea behind that was to avoid delaying the reception of the file
list, but it would also be possible to scan the extra directories
immediately instead (but this is largely moot -- see below).

> Are you saying only unchanged files are available as alternate basis
> files?  If we can, I think it's worth avoiding this restriction.

If we were to use the files directly, it would be complicated to order
the updates so that no file gets changed before another file has had a
chance to use it as a basis file.  However, I've come up with an
algorithm I like better that avoids this restriction completely:

Rsync already supports the idea of a "partial dir" that can be scanned
for partially-transferred files and delayed updates.  I'm thinking that
hard-linking files into this directory makes this new feature much
easier and more memory efficient (the dir is named ".~tmp~" by default,
relative to the containing directory of the to-be-updated files).

I also thought through where I'd like the rename scan to go.  I finally
decided that I liked the idea of piggy-backing the scan on the existing
delete-before or delete-during scans.  This makes the logic much simpler,
since the code already exists to handle all the proper include/exclude
logic, including local .cvsignore/.rsync-filter files.  It should also
make the scan quick, because it takes advantage of disk I/O that is
either already occurring or is at least in close proximity to identical
stat() calls that the generator's update code is going to make.  (If
either --delete-after was selected or no deletions are occurring, rsync
does the rename scan during the transfer using a non-deleting version of
the delete-during code.)

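To make the hard-link idea concrete, here is a rough sketch in Python
rather than rsync's C.  It is not taken from the patch: the helper name
link_into_partial_dir() is hypothetical, and it assumes the link is named
after the to-be-updated file so that a later look in that file's partial
dir would find it.

```python
import os

# Directory name the text above gives as the default.
PARTIAL_DIR = ".~tmp~"


def link_into_partial_dir(old_path, new_path, partial_dir=PARTIAL_DIR):
    """Hard-link old_path into the partial dir beside new_path.

    Hypothetical helper, not rsync code. Assumption: the link takes the
    basename of the to-be-updated file (new_path).
    """
    dest_dir = os.path.join(os.path.dirname(new_path) or ".", partial_dir)
    os.makedirs(dest_dir, exist_ok=True)
    link_path = os.path.join(dest_dir, os.path.basename(new_path))
    # Assumption: an entry that is already there (for example a real
    # partial transfer) wins, so it is left untouched.
    if not os.path.lexists(link_path):
        os.link(old_path, link_path)  # same inode, no data copied
    return link_path
```

Because the link is just a second name for the same inode, the old data
stays reachable under the partial dir even if the original name is later
deleted or replaced by a rename, and no in-memory table of pending basis
files is needed.  Hard links only work within a single filesystem, so
os.link() would raise OSError if the two paths were on different ones.
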
The only potential problem with this scan position is that the
receiving side may not have fully finished its scan when we encounter a
missing file that doesn't have a size+mtime match yet.  So I allow
missing files to be delayed until the receiving-side scan is complete, at
which point we check whether a match has shown up.

My code also attempts to match up files even when they're not missing.
This works to the fullest extent when a delete-before scan is in effect,
but it still handles the case of the rotating log files quite nicely
(associating all the moved files together as you would expect).

A patch for the CVS version is here:

    http://opencoder.net/detect-renames.diff

The code is still a little ugly, but it does appear to work well in my
limited testing.  If I like the idea, I'll look into how to share the
code for the delete scan in a way that is not as ugly as the current
logic.

> $ cp foo foo.orig; edit foo
>
> Not using the old foo as the basis for foo.orig just because foo
> changed really hurts.

If the user uses "cp -p foo foo.orig" we will find it, since -p preserves
the mtime that the size+mtime match relies on.  The patch could be
extended to switch from size+mtime to size+checksum, but I haven't done
that yet (and checksumming is so slow that most folks tend to avoid it).

..wayne..