On Fri, May 31, 2002 at 05:25:15PM -0700, jw schultz wrote:
> On Fri, May 31, 2002 at 11:45:43AM +1000, Donovan Baarda wrote:
> > On Thu, May 30, 2002 at 03:35:05PM -0700, jw schultz wrote:
[...]
> > I would guess that the number of changes meeting this criterion would
> > be almost non-existent. I suspect that the gzip-rsyncable patch does
> > nearly nothing except produce worse compression. It _might_ slightly
> > increase rsyncability up to the point where the first change in the
> > uncompressed text occurs, but the chance of it re-syncing after that
> > point would be extremely low.
>
> Actually, many file modifications do just fine. The key is to
> recognize that any plaintext modification will alter the compresstext
> from that point to the end. Most content modifications alter the
> blocks nearest the end of the file. Think about how you edit text and
> word-processor documents.
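The point that one plaintext edit alters the compresstext from there to the end is easy to demonstrate (a rough sketch in Python, with zlib standing in for gzip's deflate; the sample data is made up):

```python
import zlib

# Rough sketch: make one in-place plaintext edit, then compare the two
# deflate streams byte-for-byte.  Sample data is made up.
basis = b"the quick brown fox jumps over the lazy dog\n" * 2000
target = basis[:50000] + b"EDITED!!" + basis[50008:]  # same length, one edit

ca = zlib.compress(basis)
cb = zlib.compress(target)

# Length of the common prefix of the two compressed streams.
prefix = 0
while prefix < min(len(ca), len(cb)) and ca[prefix] == cb[prefix]:
    prefix += 1

# The streams agree up to (roughly) the output emitted before the edit,
# then diverge: the identical plaintext tail does not come back as
# identical compresstext.
print(prefix, len(ca), len(cb))
```

The common prefix covers only the output emitted before the edit; the unchanged plaintext tail never reappears as matching compresstext, which is why rsync gets no block matches past that point.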
So it is not possible for rsync to get any matches on a gzip-rsyncable
compressed file after the first modification. Does the gzip-rsyncable
patch actually improve the rsyncability of compressed files at all?
AFAICT, files compressed normally should be pretty much rsyncable up to
the same point. Resetting the compression every 4K probably does let you
rsync closer up to that point, but only because the resets make the
compression less efficient... i.e. any savings from matching closer to
the modification are lost to the overall larger file.

[...]

> This trend will also affect several other aspects of systems and
> network administration. We are rapidly approaching a day when most
> application files stored in home directories and shared work areas
> will be compressed. This means that those areas will not benefit from
> network or filesystem compression, and our so-called 200GB tape drives
> will barely exceed 1:1 compression and only hold 100GB of these types
> of files. I expect non-application files to remain uncompressed for
> the foreseeable future, but we should recognize that the character of
> the data stored is changing in ways that disrupt the assumptions many
> of our tools are built upon.

I think the increasing use of compressed files is going to require that
rsync-like tools become compression-aware, and be smart enough to
decompress/recompress files when syncing them. I see no way around it,
other than throwing heaps of bandwidth at the problem :-). Needless to
say, this will make the load on servers even worse. However, server-side
signature caching and client-side delta calculation would probably end
up making the load on servers even lower than it currently is.

[...]

> > I don't think it is possible to come up with a scheme where the
> > reset windows could re-sync after a change and then stay synced
> > until the next change, unless you dynamically alter the compression
> > at sync time... you may as well rsync the decompressed files.
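A sketch of why the reset windows can't re-sync (Python again, with Z_FULL_FLUSH every 4K of plaintext as a crude stand-in for gzip-rsyncable's resets; the sample data is made up): a single inserted byte shifts the plaintext in every later window, so no later compressed block can match.

```python
import zlib

def compress_with_resets(data, block=4096):
    """Stand-in for gzip-rsyncable: a full flush (dictionary reset)
    every `block` bytes of plaintext, so each window compresses
    independently of the windows before it."""
    c = zlib.compressobj()
    out = []
    for i in range(0, len(data), block):
        out.append(c.compress(data[i:i + block]))
        out.append(c.flush(zlib.Z_FULL_FLUSH))
    out.append(c.flush())
    return b"".join(out)

basis = b"all work and no play makes jack a dull boy\n" * 2000
target = basis[:100] + b"X" + basis[100:]  # one inserted byte

ca = compress_with_resets(basis)
cb = compress_with_resets(target)

prefix = 0
while prefix < min(len(ca), len(cb)) and ca[prefix] == cb[prefix]:
    prefix += 1

# The insertion shifts the plaintext in every subsequent 4K window, so
# the compressed windows never line up again: the streams share only a
# short prefix despite the resets.
print(prefix, len(ca), len(cb))
```

On repetitive data like this, the reset stream also comes out noticeably larger than a single-shot zlib.compress of the same plaintext, which is the efficiency cost described above.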
> The only way to do it is to make a content-aware compressor that
> compresses large chunks and then pads the compresstext to an aligned
> offset. That would be too much waste to be a good compression system.

Even this wouldn't do it... the large chunks would have to be split on
identical boundaries over unchanged plaintext in both the basis and the
target. The only way that could be achieved would be to compress the
target using resets on boundaries determined by analysing the changes
and the boundaries used when the basis was compressed. If the end that
has the target file has that degree of intimate knowledge of the other
end's basis file, then you can toss the whole rsync algorithm and revert
to some sort of compressed xdelta.

-- 
----------------------------------------------------------------------
ABO: finger [EMAIL PROTECTED] for more info, including pgp key
----------------------------------------------------------------------

-- 
To unsubscribe or change options:
http://lists.samba.org/mailman/listinfo/rsync
Before posting, read:
http://www.tuxedo.org/~esr/faqs/smart-questions.html