On Fri, May 31, 2002 at 11:45:43AM +1000, Donovan Baarda wrote:
> On Thu, May 30, 2002 at 03:35:05PM -0700, jw schultz wrote:
> [...]
> > > There is a patch available to gzip to add an option --rsyncable that's
> > > supposed to make it work better with rsync. It's been put into the
> > > "patches" directory for the next release of rsync, or you can get it at
> > >
> > > http://rsync.samba.org/ftp/unpacked/rsync/patches/gzip-rsyncable.diff
> >
> > I took a quick look at this patch and I think it does what I expected.
> > It resets the compression algorithm after each 4KB of
> > compresstext. This means that if you change 1 byte early in
> > the file it might or might not affect the blocks later on.
> > The reason for the equivocation is that if the change alters
> > the compression ratio the savings are gone.
>
> If that is how it works, and I think you are right, then it would only work
> for the smallest of cases, rendering the gzip-rsyncable patch worse than
> useless for the vast majority of cases.
>
> Regular resets hurt the compression ratio. Resets must occur at the same
> begin/end boundary points of an unchanged sequence of uncompresstext for the
> resultant compresstext to be unchanged. The only changes that will result in
> resets occurring at the same boundary points for any unchanged text following
> the change _must_ result in compresstext that is an exact multiple of 4KB.
> This means any insertion/deletion/replacement must not change the size of
> the resulting compresstext unless it is by an exact multiple of 4KB.
>
> I would guess that the number of changes meeting this criteria would be
> almost non-existent. I suspect that the gzip-rsyncable patch does nearly
> nothing except produce worse compression. It _might_ slightly increase the
> rsyncability up to the point where the first change in the uncompresstext
> occurs, but the chance of it re-syncing after that point would be extremely
> low.
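
The boundary argument quoted above can be tried out with a short
sketch. This is only an illustration under stated assumptions, not the
patch's actual method: it uses Python's zlib module, resets the
compressor with Z_FULL_FLUSH after every 4096 bytes of *plaintext*
(a simplification of the 4KB-of-compresstext behaviour described
above), and the helper names and sample data are made up for the
example.

```python
import zlib

BLOCK = 4096  # reset interval in plaintext bytes (assumption for this sketch)


def compress_with_resets(data, block=BLOCK):
    """Raw-deflate `data`, resetting the compressor after every `block`
    input bytes. Returns one compressed piece per input block so that
    the pieces of two versions of a file can be compared."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw deflate, no header/trailer
    pieces = []
    for start in range(0, len(data), block):
        piece = comp.compress(data[start:start + block])
        # Z_FULL_FLUSH ends the current deflate block, byte-aligns the
        # output and discards the history, so the next piece does not
        # depend on anything that came before it.
        piece += comp.flush(zlib.Z_FULL_FLUSH)
        pieces.append(piece)
    return pieces


def unchanged_pieces(a, b):
    """Count the positions at which two piece lists hold identical bytes."""
    return sum(1 for x, y in zip(a, b) if x == y)


# Hypothetical sample file: 4000 numbered lines of text.
original = b"".join(b"line %06d: the quick brown fox\n" % i for i in range(4000))
replaced = original[:100] + b"X" + original[101:]   # same-length replacement
inserted = original[:100] + b"X" + original[100:]   # one-byte insertion

base = compress_with_resets(original)
print("pieces:", len(base))
print("unchanged after replacement:", unchanged_pieces(base, compress_with_resets(replaced)))
print("unchanged after insertion:  ", unchanged_pieces(base, compress_with_resets(inserted)))
```

With this sample data the same-length replacement should leave every
piece after the first one identical, because the resets still fall at
the same points in the unchanged text. The one-byte insertion moves
every later boundary relative to the text, so no piece should survive
unchanged.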

Actually many file modifications do just fine. The key is to
recognize that any plaintext modification will alter the
compresstext from that point to the end. Most content
modifications alter the blocks nearest the end of the file.
Think about how you edit text and word processor documents.

What this does bring up in my mind is a trend I see in data
formats, specifically the use of compressed XML.
StarOffice/OpenOffice, KOffice and I think several others
are going this route. Maintaining volatile meta-data at the
beginning of their files will defeat rsync's rolling
checksums. I'm not sure how, but perhaps we could encourage
the developers to isolate the volatile meta-data at the end
of the file or in a fixed-size block at the beginning.
Otherwise a user opening a file and changing the view mode
or fixing a single typo in the last paragraph would alter
the entire binary file.

This trend will also affect several other aspects of systems
and network administration. We are rapidly approaching a
day when most application files stored in home directories
and shared work areas will be compressed. This means that
those areas will not benefit from network or filesystem
compression. And our so-called 200GB tape drives will
barely exceed 1:1 compression and only hold 100GB of these
types of files.

I expect non-application files to remain uncompressed for
the foreseeable future, but we should recognize that the
character of the data stored is changing in ways that
disrupt the assumptions many of our tools are built upon.

> I tried to think of a way of doing this so that it would eventually re-sync,
> with things like resets every <some-prime> bytes so that the reset window
> moves, but the problem is the source and target reset windows must move
> together for it to work, so any scheme that moves the reset window into sync
> will also move the window _out_ of sync.
>
> I don't think it is possible to come up with a scheme where the reset
> windows could re-sync after a change and then stay sync'ed until the next
> change, unless you dynamically alter the compression at sync time... you may
> as well rsync the decompressed files.

The only way to do it is to make a content-aware compressor
that compresses large chunks and then pads the compresstext
to an aligned offset. That would be too much waste to be a
good compression system.

--
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies

		Remember Cernan and Schmitt