On Fri, Nov 30, 2012 at 09:35:07AM -0800, Zac Medico wrote:
> > However, I'm not aware of gnu tar's incremental archive. If it's much
> > faster than the above, then it should probably replace
> > emerge-delta-webrsync.
> If it has benefits over the current diffball approach used by
> emerge-delta-webrsync, then it seems like a good idea. It would be nice
> to integrate it directly into emerge-webrsync, and eventually deprecate
> emerge-delta-webrsync.

I went and did a rough comparison of tar incrementals vs the existing deltas.
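
For anyone wanting to re-check the ratios, here is a minimal Python sketch that recomputes them from the daily sizes listed under Numbers. It assumes that "M" and "k" in the bandwidth estimate mean MiB and KiB (that is what makes 55M/269k come out at ~209). It also reads the ~30 overhead as a count subtracted from the number of deltas, which is how ~209 becomes ~180. The helper names are purely illustrative.

```python
# Daily compressed sizes in bytes, copied from the Numbers section
# (20121123 -> 20121130, seven daily steps per method).
TAR_INCREMENTAL = [2554334, 2045216, 1936313, 2355342, 2063612, 2582600, 2720135]
RSYNC_BATCH = [2224311, 1869241, 1802648, 1936937, 1868771, 2240386, 2028207]
EXISTING_DELTA = [252400, 267094, 161136, 225349, 245804, 232549, 332835]


def mean(sizes):
    """Arithmetic mean of a list of sizes."""
    return sum(sizes) / len(sizes)


def ratio_vs_delta(sizes):
    """How many times larger the average of `sizes` is than the average existing delta."""
    return mean(sizes) / mean(EXISTING_DELTA)


def breakeven_deltas(snapshot_mib, delta_kib, overhead):
    """Deltas that fit in one full snapshot's download, before and after overhead.

    Assumption: snapshot size is in MiB and delta size in KiB, so the MiB
    figure is scaled by 1024 before dividing. The overhead is subtracted
    as a plain count (hypothetical reading of the ~30 figure).
    """
    per_snapshot = snapshot_mib * 1024 / delta_kib
    return per_snapshot, per_snapshot - overhead


if __name__ == "__main__":
    print(f"tar-incremental vs delta: {ratio_vs_delta(TAR_INCREMENTAL):.1f}x")
    print(f"rsync-batch vs delta:     {ratio_vs_delta(RSYNC_BATCH):.1f}x")
    per, net = breakeven_deltas(55, 269, 30)
    print(f"deltas per snapshot: {per:.0f}, after overhead: {net:.0f}")
```

On these seven days the averages work out to roughly 9.5x (tar) and 8.1x (rsync), consistent with the rounded 9x and 8x figures.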

TL;DR:
======
- Existing deltas are 8-9x smaller than the other options.
- We should consider retaining monthly snapshots, plus all the deltas.

Results:
========
1. Using bzip2 -9 compression:
   - Existing deltas are 9x smaller than tar-incremental.
   - Existing deltas are 8x smaller than rsync-batch.
2. If you just want to save bandwidth, the average full snapshot,
   compressed w/ bzip2, is 55M. The average delta is 269k.
   55M/269k = ~209. Ergo it is LESS bandwidth to download ~180 deltas
   and apply those than it is to download the full snapshot (assuming
   the upstream side of the transaction accounts for ~30 snapshots'
   worth of overhead).

Notes:
======
1. When extracting tar incrementals, you must be VERY careful to perform
   operations in the correct order, otherwise removed files will not
   actually be deleted.
2. When the Git repo goes live, we should tag it at the point where we
   take the daily snapshot, and also use those tags to evaluate git
   bundles.

Numbers:
========
All sizes are in bytes.

Baseline tarball:
57919736 portage-20121123.0.tar.bz2

Tar incrementals, daily:
 2554334 portage-20121123-20121124.1.tar.bz2
 2045216 portage-20121124-20121125.1.tar.bz2
 1936313 portage-20121125-20121126.1.tar.bz2
 2355342 portage-20121126-20121127.1.tar.bz2
 2063612 portage-20121127-20121128.1.tar.bz2
 2582600 portage-20121128-20121129.1.tar.bz2
 2720135 portage-20121129-20121130.1.tar.bz2

Rsync incrementals, daily:
 2224311 portage-20121123-20121124.rsync-batch.bz2
 1869241 portage-20121124-20121125.rsync-batch.bz2
 1802648 portage-20121125-20121126.rsync-batch.bz2
 1936937 portage-20121126-20121127.rsync-batch.bz2
 1868771 portage-20121127-20121128.rsync-batch.bz2
 2240386 portage-20121128-20121129.rsync-batch.bz2
 2028207 portage-20121129-20121130.rsync-batch.bz2

Existing deltas, daily:
  252400 snapshot-20121123-20121124.patch.bz2
  267094 snapshot-20121124-20121125.patch.bz2
  161136 snapshot-20121125-20121126.patch.bz2
  225349 snapshot-20121126-20121127.patch.bz2
  245804 snapshot-20121127-20121128.patch.bz2
  232549 snapshot-20121128-20121129.patch.bz2
  332835 snapshot-20121129-20121130.patch.bz2

Rsync incrementals, from baseline:
 2224311 portage-20121123-20121124.rsync-batch.bz2
 2536620 portage-20121123-20121125.rsync-batch.bz2
 2700715 portage-20121123-20121126.rsync-batch.bz2
 2986403 portage-20121123-20121127.rsync-batch.bz2
 3258723 portage-20121123-20121128.rsync-batch.bz2
 3824015 portage-20121123-20121129.rsync-batch.bz2
 4232674 portage-20121123-20121130.rsync-batch.bz2

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail     : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85