Goswin von Brederlow wrote:
[snip]
> >> With cumulative patches you run into the problem that you need a new
> >> cumulative patch for every day that contains most of what the
> >> previous one did. That really quickly becomes a space issue.
> >
> > Errm, no, it doesn't need _one_ new cumulative patch. All the
> > previously made cumulative diffs need to be updated.
>
> I was thinking of a
>
> -1day.diff
> -2day.diff
> -3day.diff
> ...
>
> So every day a new file appears at the end and contains most of what
> all the others already contain.
>
> Updating those cumulative diffs is also either inefficient (cat the
> daily diffs together),

That wouldn't be a cumulative diff.

> very hard (figure out how to make a minimal diff
> from the dailies) or you need every day's Packages file (apt-dupdate
> does that).

It is not "very hard" to re-diff a few files to incorporate the changes
between the old and new Packages file.

> Having to store and diff every past day's Packages file is a huge
> resource drain and can't be done for more than a couple of days, maybe
> up to 2 weeks.

You don't need to store it.

> Ask the apt-dupdate author how long it takes every night and how
> much disk space it uses.

If that's true, then apt-dupdate is an example of how not to do it.

> > If we assume we hold 14 update cycles, have a cutoff if the size of
> > the cumulative diff exceeds the size of the Packages file, and have
> > linear growth of the diffs, then the additional space used is at most
> > seven times the size of the Packages file. Normally it will be much
> > less, because large archives don't tend to change that quickly.
>
> 14 update cycles is a limitation on the process and isn't needed with
> sorted Packages files.

It is not a hard limit, and speeding up exorbitant numbers of update
cycles isn't needed except in pathological cases.

> Also how do you get 'seven times'?

... linear growth of the diffs ...

> Say every day one package changes
> but on the last nearly every package changes. That means all 14
> cumulative diffs will be the size of the Packages file (change as
> many packages as possible but so that all stay below the cutoff).
>
> That would be nearly 14 times the space.

Which is the (unlikely) worst case.

> > [snip]
> >> >> - extra space needed for the diff files
> >> >
> >> > Which is minimal in comparison to the archive size.
> >>
> >> Not for something like snapshots.debian.net. They do have a tad more
> >> Packages files than Debian has. And why waste even a byte if it is
> >> absolutely not needed to achieve the same?
> >
> > Again, snapshots shouldn't have any need for updating a snapshot.
>
> Yes they do. Every time a new version of a package is released the
> Packages file updates. And it never gets smaller. Those would be
> perfect for date sorting.

This makes no sense at all for a single package (which was, at least,
the example you cited).

[snip]

> > No, they won't if cumulative diffs are used.
>
> Tell me how you plan to create the 30 cumulative diffs each
> day. Storing the Packages files as plain text wastes too much
> space. bunzip2ing them every night takes too long. Just diffing them
> is also not that fast.

Sorry, but if your mirror server is that slow, then you can't afford to
do anything.

> Or for 60 days, which would still be <50% of the size.
>
> >> For stable and especially security the amount of change will be even
> >> less and even more diff files would still be worth it. The size would
> >> be smaller but the number of files higher.
> >
> > I can't follow you. stable would have three additional diffs by now.
>
> stable-proposed-updates
>
> What I mean is that each change is very small. So the diff files don't
> grow much and a large number of diffs is still below the size of the
> Packages file.

So you would have the normal Packages.gz and probably 30 small diffs.
That's quite OK for tracking stable-proposed-updates.

> It is not like sid where you have 100+ package changes every day.
>
> > For stable-security I assume it's either tracked closely or very
> > infrequently. Providing a slightly faster update in the latter case
> > doesn't seem to be worthwhile.
>
> The date sorted method gets it for free.

No, munging archive state in the Packages file isn't "for free".

[snip]

> >> Do you have any benefits for diffs apart from applying them being
> >> simpler?
> >
> > They keep a backward compatible Packages file which is proven to
> > work with old tools. Furthermore, updating on the server side
> > can be done by a simple script which invokes diff a few times.
> >
> > The latter is especially interesting for partial mirror scripts
> > which usually fail to implement a decent parser for Packages files.
>
> How would a diff be better for a mirror script that doesn't parse
> Packages files? You still need a Packages file parser. You lost me
> there.

... fail to implement a _decent_ parser ...


Thiemo
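For reference, the arithmetic behind "seven times" versus "nearly 14 times". The notation is introduced here, not taken from the thread: n = 14 retained update cycles, P the size of the Packages file, and s_k the size of the cumulative diff spanning k cycles. With linear growth from near zero up to the cutoff, the diffs average roughly P/2; in the worst case every diff sits just below the cutoff:

```latex
\sum_{k=1}^{n} s_k \;\approx\; n \cdot \frac{P}{2} = 14 \cdot \frac{P}{2} = 7P
\qquad \text{(linear growth from near } 0 \text{ up to the cutoff } P\text{)}

\sum_{k=1}^{n} s_k \;<\; n \cdot P = 14P
\qquad \text{(worst case: every } s_k \text{ just below } P\text{)}
```

Both sums count only the diffs, i.e. space on top of the Packages file itself.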
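Why "cat the daily diffs together" isn't a cumulative diff: when the same package changes every day, the concatenation ships every intermediate version, while a real cumulative diff compares the old file with the newest one directly. A minimal Python sketch using the standard library's difflib; the two-package Packages file is made up for illustration, and counting changed lines is only a rough stand-in for diff size.

```python
import difflib


def changed_lines(old, new):
    """Number of removed plus added lines in a line-based diff from old to new."""
    # ndiff prefixes removed lines with "- ", added lines with "+ ",
    # unchanged lines with "  " and intraline hint lines with "? ".
    return sum(1 for line in difflib.ndiff(old, new) if line[:2] in ("- ", "+ "))


def packages_on_day(day):
    """Hypothetical Packages file in which package a gets a new version daily."""
    return ["Package: a", f"Version: {day + 1}", "Package: b", "Version: 7"]


days = [packages_on_day(d) for d in range(4)]

# "cat the daily diffs together": one diff per day, shipped back to back.
concatenated = sum(changed_lines(days[d], days[d + 1]) for d in range(3))
# A real cumulative diff: the oldest file compared directly with the newest.
cumulative = changed_lines(days[0], days[3])

print(concatenated, cumulative)  # 6 2
```

The gap grows with every further day on which the same package changes again, while the cumulative diff stays the same size.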
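One way to re-diff without storing every past Packages file: keep only the current one and recover each older version on the fly by reverse-applying the cumulative diff already published for it. This is a rough Python sketch of that reading, not the archive's actual tooling. The hunk format, the names make_diff, apply_forward, apply_reverse, diff_size and refresh, and measuring the cutoff in lines rather than bytes are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def make_diff(old, new):
    """Reversible line diff: hunks of (i1, i2, j1, j2, old_lines, new_lines)."""
    sm = SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, j1, j2, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_forward(old, hunks):
    """Turn the older file into the newer one."""
    out, pos = [], 0
    for i1, i2, _j1, _j2, _old_lines, new_lines in hunks:
        out.extend(old[pos:i1])  # unchanged lines before this hunk
        out.extend(new_lines)
        pos = i2
    out.extend(old[pos:])
    return out


def apply_reverse(new, hunks):
    """Recover the older file from the newer one (hunks keep the removed lines)."""
    out, pos = [], 0
    for _i1, _i2, j1, j2, old_lines, _new_lines in hunks:
        out.extend(new[pos:j1])  # unchanged lines before this hunk
        out.extend(old_lines)
        pos = j2
    out.extend(new[pos:])
    return out


def diff_size(hunks):
    """Size in lines: removed plus added lines (an assumed stand-in for bytes)."""
    return sum(len(h[4]) + len(h[5]) for h in hunks)


def refresh(current, new, cumulative, keep=14):
    """Rebuild the cumulative diffs when `new` replaces `current`.

    cumulative[k] turns the Packages file from k+1 cycles before `current`
    into `current`. Only `current` has to be on disk: each older version is
    reconstructed by reverse-applying the diff already published for it.
    """
    olds = [current] + [apply_reverse(current, d) for d in cumulative]
    result = []
    for old in olds[:keep]:
        d = make_diff(old, new)
        # Cutoff: a diff bigger than the Packages file itself is pointless.
        # Older versions usually differ more, so stop at the first such diff.
        if diff_size(d) > len(new):
            break
        result.append(d)
    return result
```

Each nightly run is then one reconstruction and one diff per retained cycle, and the on-disk state is the current Packages file plus the diffs that are published anyway.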
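On the client side, the point of the -1day.diff, -2day.diff, ... naming quoted above is that a client N cycles behind fetches exactly one file. A minimal Python sketch of that choice; pick_update and its arguments are hypothetical, and falling back to the full Packages.gz once the client is older than every published diff is an assumption about how such a scheme would behave.

```python
def pick_update(age_in_cycles, available):
    """Decide what a client downloads to catch up.

    age_in_cycles: how many update cycles old the local Packages file is.
    available:     how many cumulative diffs the server still publishes;
                   older ones were dropped by the window or the size cutoff.
    """
    if age_in_cycles == 0:
        return None  # already current, nothing to fetch
    if age_in_cycles <= available:
        return f"-{age_in_cycles}day.diff"  # one diff straight to today
    return "Packages.gz"  # too old for any published diff


print(pick_update(3, 14))   # -3day.diff
print(pick_update(20, 14))  # Packages.gz
```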