On Mon, May 23, 2005 at 03:24:07PM +0200, Edwin Eefting wrote:
> My idea is to create a patch for something like a --cache option that
> will use a cached version of the filelist:
Something like that would be fairly easy to write, but only if there are
no conflicts between the cache and the live disk. One would simply need
an on-disk representation of the file-list's in-memory data structure,
and a way to save/restore it. If you limited the code to a single source
hierarchy, it might even be possible to reuse the current send & receive
code for the file-list (with just a little touch-up of the dir.root
value, which is sender-side only and thus not properly set by the
receive code).

Every time the server updates, it would want to use an atomic-update
algorithm, like the one implemented in the atomic-rsync perl script in
the "support" dir (which uses a parallel hierarchy and the --link-dest
option to update all the files at the same time; see the first sketch
at the end of this message).

An alternative to this --cache idea is to use the existing batch-file
mechanism to provide a daily (or twice-daily, etc.) update method for
users. It would work like this:

- A master cache server would maintain its files using a batch-writing
  rsync transfer that updates them atomically (as mentioned above) so
  that (1) the batch-creating process can be restarted from scratch if
  the rsync run doesn't finish successfully, and (2) users have a
  source hierarchy that exactly matches the last batch file's end-state
  (see the second sketch below).

- The resulting batch file would be put somewhere it could be
  downloaded via some file-transfer protocol, such as on a webserver.

- As long as the user didn't modify the portage hierarchy between
  batched runs, it would be possible to just apply each batched
  transfer, one after the other, to update to the latest hierarchy. If
  something goes wrong with the receive, it is safe to just run the
  batch-reading command again, since rsync skips the updates that were
  already applied (N.B. --partial must NOT be enabled). As a fall-back,
  a normal rsync command to fetch files from the server would repair
  any defects and get you back in sync with the batched updates.

- I'd imagine using something like an HTTP-capable perl script to grab
  the data and output it on stdout -- this would let the batch be
  processed as it arrived instead of being written out to disk first
  (see the third sketch below).

Such an update mechanism would work quite well for a consistent N
batched updates per day (where N is not overly large). A set of source
servers could even use this method to mirror the N-update hierarchy
throughout the day. As long as the batch files are named uniquely, the
end-user doesn't need to run the command on a regular schedule: the
script could be smart enough to notice when the local portage hierarchy
was last updated and choose either to perform one or more batch-reading
runs, or to fall back to doing a normal rsync update (see the last
sketch below).
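To make the atomic swap concrete, here is a minimal sketch of the
parallel-hierarchy approach (the paths and the server/module name are
hypothetical; the real atomic-rsync script also handles errors and
option pass-through more carefully):

    #!/bin/sh
    # Minimal sketch of an atomic update via a parallel hierarchy.
    # All paths and the server/module name are hypothetical.
    LIVE=/srv/portage
    NEW=/srv/portage-new
    OLD=/srv/portage-old

    # Build the updated tree beside the live one; --link-dest hard-links
    # unchanged files from the live tree, so only changed files arrive.
    rsync -a --delete --link-dest="$LIVE/" \
          rsync://master.example.com/portage/ "$NEW/" || exit 1

    # Swap the trees: the live name points at a complete hierarchy at
    # every instant, so readers never see a half-updated tree.
    mv "$LIVE" "$OLD" && mv "$NEW" "$LIVE" && rm -rf "$OLD"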
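For the master cache server, a cron job along these lines could both
update the local copy and record the transfer in a uniquely named batch
file (the names and URLs are again made up, and folding in the atomic
swap above is omitted for brevity):

    #!/bin/sh
    # Hypothetical hourly cron job on the master cache server.
    STAMP=`date -u +%Y%m%d%H`
    BATCH=/var/www/batches/portage-$STAMP

    # --write-batch records every change this run makes, producing
    # $BATCH plus a $BATCH.sh helper script.  If the run fails, discard
    # the partial batch so the whole thing can be re-run from scratch.
    rsync -a --delete --write-batch="$BATCH" \
          rsync://master.example.com/portage/ /srv/portage/ \
      || rm -f "$BATCH" "$BATCH.sh"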
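On the user's side, a perl script isn't strictly required: rsync
accepts --read-batch=- to read the batch data from stdin, so the
download can be piped straight in and processed as it arrives. A sketch
(the batch name and URL are hypothetical):

    #!/bin/sh
    # Fetch one batch over HTTP and apply it as it arrives -- nothing
    # is written to disk first.  Re-running this after a failed receive
    # is safe, since rsync skips updates that were already applied (as
    # long as --partial is NOT among the options).
    wget -q -O - http://mirror.example.com/batches/portage-2005052301 \
      | rsync -a --read-batch=- /usr/portage/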
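Finally, the "smart" end-user script might look something like this:
remember the serial of the last batch applied, apply any newer ones in
order, and fall back to a normal rsync run when too far behind.
Everything here (the state file, the URLs, the LATEST index file, and
the LIMIT value) is a made-up convention, not an existing tool:

    #!/bin/sh
    # Catch up via batches when possible, else do a full rsync update.
    STATE=$HOME/.last-portage-batch
    LAST=`cat "$STATE" 2>/dev/null`
    LAST=${LAST:-0}
    LATEST=`wget -q -O - http://mirror.example.com/batches/LATEST`
    [ -n "$LATEST" ] || exit 1
    LIMIT=8   # missing more than this many batches => full resync

    if [ $((LATEST - LAST)) -gt $LIMIT ]; then
        # Too far behind: one normal rsync run repairs everything.
        rsync -a --delete rsync://mirror.example.com/portage/ /usr/portage/
    else
        n=$LAST
        while [ "$n" -lt "$LATEST" ]; do
            n=$((n + 1))
            wget -q -O - "http://mirror.example.com/batches/portage-$n" \
              | rsync -a --read-batch=- /usr/portage/ || exit 1
        done
    fi
    echo "$LATEST" > "$STATE"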
..wayne..