>>>>> " " == Otto Wyss <[EMAIL PROTECTED]> writes:
> It's commonly agreed that compression prevents rsync from profiting
> from older versions of packages when synchronizing Debian mirrors.
> All the discussion about fixing rsync to solve this, even through a
> deb plugin, is IMHO not the right way. Rsync's task is to
> synchronize files without knowing what's inside. So why not solve
> the compression problem at the root? Why not try to change the
> compression in a way that produces a compressed result with the
> same (or a similar) difference rate as the source?

> As my understanding of compression goes, all have a kind of lookup
> table at the beginning where all compression codes are declared.
> Each time this table is created anew, each time slightly different
> from the previous one depending on the

Nope. Only a few compression programs use a table at the start of the
file. Most build the table as they go along; not copying the table
saves a lot of space.

gzip (I hope I remember this correctly), for example, grows its table
with every character it encodes. So when you compress a file that
contains only 0s, the table will not contain any a's, and an a cannot
even be encoded.

bzip2, on the other hand, re-sorts the input (the Burrows-Wheeler
block-sorting transform) to get better compression ratios. You cannot
sort different data in the same way; the compression rate would drop
dramatically otherwise.

PPM, as a third example, rebuilds its model for every character that
is transferred and encodes the probability range of the actual
character in one of the current contexts. The contexts are based on
all previous characters, so the first character is sent as plain text
and the rest of the file will (most likely) differ if that character
changes.

> source. So to get similar results when compressing means using the
> same or at least an equivalent lookup table. If it were possible to
> feed the lookup table of the previous compressed file to the new
> compression process, an equal or at least similar compression could
> be achieved.

> Of course, always using the same lookup table means a decrease in
> the compression rate. If there is an algorithm which compares the
> old rate with an optimal rate, even this could be solved. This
> means a completely different compression from time to time. It all
> depends on how easily an equivalent lookup table could be created
> without losing too much of the compression rate.

Knowing the structure of the data can greatly increase the
compression ratio. Knowing the structure can also greatly reduce the
differences needed to sync two files. So why should rsync stay
stupid?

MfG Goswin
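
P.S.: To illustrate the point about adaptive coding, here is a minimal
Python sketch using zlib (the same deflate algorithm gzip uses). The
"Packages"-style data and all names in it are invented for the
illustration; it only shows that flipping a single input byte
typically makes the two compressed streams diverge early and stay
different, which is why rsync's block matching finds almost nothing
to reuse in compressed files.

  import zlib

  # Fake Packages-style input, invented purely for this demo.
  original = b"".join(
      b"Package: pkg%d\nVersion: 1.%d-1\nDepends: libc6\n\n" % (i, i)
      for i in range(2000)
  )

  modified = bytearray(original)
  modified[50] ^= 0xFF              # change one byte near the start
  modified = bytes(modified)

  c_orig = zlib.compress(original, 9)
  c_mod = zlib.compress(modified, 9)

  # Offset of the first differing byte in the compressed streams.
  first_diff = next(
      (i for i, (a, b) in enumerate(zip(c_orig, c_mod)) if a != b),
      min(len(c_orig), len(c_mod)),
  )
  # How many bytes after that point still happen to match at the
  # same offset (usually only a tiny fraction).
  same_after = sum(a == b for a, b in zip(c_orig[first_diff:],
                                          c_mod[first_diff:]))

  print("compressed sizes:", len(c_orig), len(c_mod))
  print("first difference at compressed byte:", first_diff)
  print("same-offset matches after that:", same_after)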
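
P.P.S.: The quoted idea of feeding the previous file's "lookup table"
into the next compression run is close to an existing mechanism,
deflate preset dictionaries. The sketch below only demonstrates that
mechanism, not the proposal itself: zlib can seed its window with up
to 32 KB of old data, but the receiver must have the same dictionary
to decompress, and the output is still not byte-for-byte
rsync-friendly on its own. The file contents here are again made up.

  import zlib

  # Two made-up versions of the same uncompressed file.
  old_version = b"".join(
      b"Package: pkg%d\nVersion: 1.%d-1\nDescription: test %d\n\n"
      % (i, i, i) for i in range(400)
  )
  new_version = old_version.replace(b"-1\n", b"-2\n")

  # deflate uses at most the last 32 KB of a preset dictionary.
  DICT = old_version[-32768:]

  def compress_with_dict(data, zdict):
      co = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                            zlib.DEF_MEM_LEVEL,
                            zlib.Z_DEFAULT_STRATEGY, zdict)
      return co.compress(data) + co.flush()

  def decompress_with_dict(blob, zdict):
      do = zlib.decompressobj(zlib.MAX_WBITS, zdict)
      return do.decompress(blob) + do.flush()

  plain = zlib.compress(new_version, 9)
  seeded = compress_with_dict(new_version, DICT)

  print("without dictionary:", len(plain), "bytes")
  print("with the old file as dictionary:", len(seeded), "bytes")
  assert decompress_with_dict(seeded, DICT) == new_version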