So this is an interesting problem to think about. It seems to take this form:
- Data comes in fixed-size blocks
- Blocks may be copied elsewhere
- Blocks may be modified in place
- Blocks may be copied with modification

BUT

- There are no insertions/deletions
- The overall file does not change in size (or does so in terms of whole blocks)

This suggests restarting the "match with preceding data in the same file" algorithm at each block boundary. Also, a block could match data at either lower or higher addresses, but assuming we progress from lower to higher, the "lower address" data obviously has to come from the new version of the file and the "higher address" data from the old version.

A simple form of the algorithm would attempt to produce, in order, a directive for each block that would (a) leave the block as is, (b) copy the block from another block on the destination, or (c) provide new data for the block. (A rough sketch appears at the end of this message.) Case (c) could be further refined by analyzing the block with the traditional rsync algorithm to reduce the number of bytes transferred. While rsync obviously works for arbitrary data, it is organized to detect similarities at any offset, and thus to handle insertion/deletion of bytes well.

I further observe that *in place* update, if performed in a predetermined order, such as low to high addresses, deals less efficiently with some shufflings of data than with others. In particular, if an application copies data from low addresses to high ones, and the algorithm also processes in that same order, then the copied data will have to be resynthesized/transferred, since the old copy will have been overwritten at the destination before we get to the place for the new copy.

It would be possible in principle to do in-place update *and* to use a scratch temp file at the destination to hold data that you are about to overwrite and will want later in the process. This saves network bandwidth at the cost of I/O bandwidth and space at the destination. I think the maximum space required is half the size of the old version of the file at the destination, though the simplest use of the temp file might lead to an upper bound of the size of the target file (imagine a cyclic shuffling of blocks, where all but the last move up one "slot" and the last moves to the first slot).

Maybe rsync already does this? I don't know enough about the internals of rsync to know what all this might suggest ...

Regards -- Eliot Moss
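
P.S. To make the per-block directive idea concrete, here is a minimal sketch, in Python for brevity. The names, the block size, and the choice of SHA-1 as the strong hash are all mine, not anything taken from rsync's internals; for simplicity it also assumes one side can see both the old and the new file contents, where a real protocol would instead ship block hashes from the receiver, as rsync does.

import hashlib

BLOCK = 4096  # illustrative fixed block size, not rsync's choice

def blocks(data, size=BLOCK):
    """Split data into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def make_directives(old, new):
    """Emit one directive per block of the new file, low to high:
    KEEP  (a) the destination block is already correct,
    COPY  (b) the block equals some other block of the old file,
    DATA  (c) literal data must be transferred."""
    old_blocks = blocks(old)
    index = {}  # strong hash of old block -> first index where it occurs
    for j, b in enumerate(old_blocks):
        index.setdefault(hashlib.sha1(b).digest(), j)
    directives = []
    for i, b in enumerate(blocks(new)):
        h = hashlib.sha1(b).digest()
        if i < len(old_blocks) and old_blocks[i] == b:
            directives.append(("KEEP", i))
        elif h in index:
            directives.append(("COPY", index[h], i))
        else:
            directives.append(("DATA", i, b))
    return directives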
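
Building on that, here is a sketch of applying the directives in place, low to high, with the scratch temp file. It stashes up front every old block that is the copy source for a higher-addressed destination, since the low-to-high sweep may clobber those before they are read. This is the "simplest use of the temp file" I described above, so the scratch space is bounded by the size of the file rather than half of it.

def apply_in_place(path, directives, scratch_path):
    """Apply directives to the old file at `path`, in place, low to high,
    stashing endangered copy sources in a scratch file first."""
    needed = {}  # old block index -> offset of its stashed copy
    with open(path, "r+b") as f, open(scratch_path, "w+b") as scratch:
        # Pass 1: stash every copy source that lies below its destination;
        # the sweep would overwrite it before the copy is performed.
        for d in directives:
            if d[0] == "COPY":
                _, j, i = d
                if j < i and j not in needed:
                    f.seek(j * BLOCK)
                    needed[j] = scratch.tell()
                    scratch.write(f.read(BLOCK))
        # Pass 2: apply directives in ascending destination order.
        for d in directives:
            if d[0] == "KEEP":
                continue
            if d[0] == "COPY":
                _, j, i = d
                if j < i:                 # source may be overwritten; use stash
                    scratch.seek(needed[j])
                    data = scratch.read(BLOCK)
                else:                     # higher address: still holds old data
                    f.seek(j * BLOCK)
                    data = f.read(BLOCK)
            else:                         # DATA carries the literal bytes
                _, i, data = d
            f.seek(i * BLOCK)
            f.write(data)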
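
The cyclic shuffling worst case shows up directly in this scheme: in the example below, all but one copy source lies below its destination, so the scratch file ends up holding nearly the whole old file, even though no literal data crosses the network.

import os, tempfile

old = b"A" * BLOCK + b"B" * BLOCK + b"C" * BLOCK + b"D" * BLOCK
new = b"D" * BLOCK + b"A" * BLOCK + b"B" * BLOCK + b"C" * BLOCK  # cyclic shift

ds = make_directives(old, new)
# -> [('COPY', 3, 0), ('COPY', 0, 1), ('COPY', 1, 2), ('COPY', 2, 3)]

with tempfile.NamedTemporaryFile(delete=False) as tf:
    tf.write(old)
apply_in_place(tf.name, ds, tf.name + ".scratch")
with open(tf.name, "rb") as f:
    assert f.read() == new  # blocks rotated in place, purely by copying
os.unlink(tf.name)
os.unlink(tf.name + ".scratch")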