On Sat, Jun 01, 2002 at 08:51:26PM +1000, Donovan Baarda wrote:
> On Fri, May 31, 2002 at 05:25:15PM -0700, jw schultz wrote:
> > On Fri, May 31, 2002 at 11:45:43AM +1000, Donovan Baarda wrote:
> > > On Thu, May 30, 2002 at 03:35:05PM -0700, jw schultz wrote:
[...]
> > > I don't think it is possible to come up with a scheme where the reset
> > > windows could re-sync after a change and then stay sync'ed until the
> > > next change, unless you dynamically alter the compression at sync
> > > time... you may as well rsync the decompressed files.
> >
> > The only way to do it is to make a content-aware compressor
> > that compresses large chunks and then pads the compresstext
> > to an aligned offset. That would be too much waste to be a
> > good compression system.
>
> Even this wouldn't do it... the large chunks would have to be split on
> identical boundaries over unchanged uncompressedtext in the basis and the
> target. The only way this could be achieved would be if the target was
> compressed using resets on boundaries determined by analysing the changes
> and boundaries used when the basis was compressed. If the end that has the
> target file has that degree of intimate knowledge about the other end's
> basis file, then you can toss the whole rsync algorithm and revert to some
> sort of compressed xdelta.
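To make the boundary problem concrete, here is a quick toy sketch (plain
Python, invented names, fixed-size chunks; nothing to do with gzip's real
internals) that compresses with a reset per chunk and counts how many
compressed chunks survive a single edit:

# Toy illustration only: compress each plaintext chunk with a fresh zlib
# stream (a "reset" per chunk), then see how many compressed chunks are
# unchanged after one length-changing edit.
import zlib

CHUNK = 8192

def chunked_compress(data):
    """Compress each CHUNK-sized piece of plaintext independently."""
    return [zlib.compress(data[i:i + CHUNK])
            for i in range(0, len(data), CHUNK)]

basis = b"".join(b"record %08d: some payload text\n" % n for n in range(4000))
target = basis[:70000] + b"ONE SMALL INSERTION" + basis[70000:]

old = chunked_compress(basis)
new = chunked_compress(target)
matching = sum(1 for a, b in zip(old, new) if a == b)
print("%d of %d compressed chunks unchanged" % (matching, len(old)))
# Everything before the edit still matches; everything after it has been
# shifted off the old chunk boundaries and compresses differently, so fixed
# reset points never get back in step.

If memory serves, the --rsyncable patch to gzip attacks this from the other
direction: it resets the deflate stream whenever a rolling checksum over the
plaintext hits a trigger value, so the reset points move with the content
instead of sitting at fixed offsets.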
I guess I wasn't clear enough, but that's OK because your response made me
think a bit more on the subject, so ignore my idea of padding the
compresstext blocks. When I said "content-aware compressor" what I meant was
that the compressor would actually analyze the plaintext to find
semantically identifiable blocks. For example, a large HOWTO could be broken
up by its level-2 headings. This would be largely (though not always)
consistent across plaintext changes without requiring any awareness of file
history.

Another option is to have rsync be compression-aware. When rsync hits a
gzipped file it could treat that file as multiple streams in series, where
it would restart the checksumming each time the compression table is reset.
I can't see this actually happening, but it could work where the compression
is done by the application that creates the file. If, and only if, that were
done widely enough to be worthwhile, rsync could be made compression-aware
in this way, but that would require a protocol change.

Your idea of having rsync actually do the checksums on the plaintext of
compressed files might have some merit in the future. It would mean,
essentially, that we would zcat the source twice and the destination would
be ungzipped, merged and then regzipped (rough sketch at the end of this
message). Ghastly as far as CPU goes, but it would help save network
bandwidth, which is growing at a lower rate. The questions are: what is the
mean offset of the first change as a proportion of file size, and are enough
files gzipped to merit the effort?

-- 
________________________________________________________________
        J.W. Schultz            Pegasystems Technologies
        email address:          [EMAIL PROTECTED]

                Remember Cernan and Schmitt
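Here is the sketch promised above: the plaintext-delta pipeline reduced to
its shape. Names are invented and the block matching only recognizes blocks
at identical offsets (real rsync uses a rolling weak checksum so matches can
land at any alignment), so read it as an outline of where the zcat and
regzip costs fall, not as an implementation:

import gzip, hashlib

BLOCK = 4096

def basis_signature(basis_gz):
    """Receiver: zcat its basis and checksum the plaintext blocks."""
    with gzip.open(basis_gz, "rb") as f:
        plain = f.read()
    return {hashlib.md5(plain[i:i + BLOCK]).digest(): i
            for i in range(0, len(plain), BLOCK)}

def source_delta(source_gz, signature):
    """Sender: zcat the source and emit copy/literal instructions."""
    with gzip.open(source_gz, "rb") as f:
        plain = f.read()
    delta = []
    for i in range(0, len(plain), BLOCK):
        block = plain[i:i + BLOCK]
        digest = hashlib.md5(block).digest()
        if digest in signature:
            delta.append(("copy", signature[digest], len(block)))
        else:
            delta.append(("literal", block))
    return delta

def merge_and_regzip(basis_gz, delta, out_gz):
    """Receiver: ungzip the basis again, merge in the delta, regzip."""
    with gzip.open(basis_gz, "rb") as f:
        basis = f.read()
    parts = []
    for entry in delta:
        if entry[0] == "copy":
            _, offset, length = entry
            parts.append(basis[offset:offset + length])
        else:
            parts.append(entry[1])
    with gzip.open(out_gz, "wb") as f:
        f.write(b"".join(parts))

Only the copy/literal instructions cross the wire, which is where the
bandwidth saving comes from; both ends pay for at least one full
decompression, and the receiver for a recompression, which is where the
ghastly CPU cost goes.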