On Fri, Nov 30, 2012 at 09:35:07AM -0800, Zac Medico wrote:
> > However, I'm not aware of gnu tar's incremental archive. If it's much
> > faster than the above, then it should probably replace
> > emerge-delta-webrsync.
> If it has benefits over the current diffball approach used by
> emerge-delta-webrsync, then it seems like a good idea. It would be nice
> to integrate it directly into emerge-webrsync, and eventually deprecate
> emerge-delta-webrsync.
I went and did a rough comparison of Tar incrementals vs the existing
deltas.

TL;DR:
======
- Existing deltas are 8-9x smaller than the other options tested (tar
  incrementals and rsync batches).
- We should consider retaining monthly snapshots, plus all the deltas.

Results:
========
1.
Using bzip2 -9 compression:
- Existing deltas are 9x smaller than tar-incremental.
- Existing deltas are 8x smaller than rsync-batch.

2.
If you just want to save bandwidth: the average full snapshot,
compressed with bzip2, is 55M. The average delta is 269k.
55M/269k = ~209.
Ergo it takes LESS bandwidth to download ~180 deltas and apply them than
it does to download the full snapshot (assuming the upstream side of the
transaction accounts for ~30 deltas' worth of overhead).
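
As a rough sketch of the arithmetic in item 2 (not part of the original
measurements), the break-even point can be recomputed as below. It
assumes "M" and "k" mean MiB and KiB, which is what reproduces the ~209
figure; the function names are illustrative only.

```python
# Averages quoted in the message; the MiB/KiB interpretation is an assumption.
FULL_SNAPSHOT_KIB = 55 * 1024   # average full snapshot, bzip2-compressed
AVG_DELTA_KIB = 269             # average daily delta, bzip2-compressed


def whole_deltas_per_snapshot(full_kib: int, delta_kib: int) -> int:
    """Number of whole deltas that fit in the size of one full snapshot."""
    return full_kib // delta_kib


def headroom_in_deltas(n_deltas: int, full_kib: int, delta_kib: int) -> float:
    """Bandwidth left over after n_deltas, expressed in delta-sized units."""
    return (full_kib - n_deltas * delta_kib) / delta_kib


if __name__ == "__main__":
    print(whole_deltas_per_snapshot(FULL_SNAPSHOT_KIB, AVG_DELTA_KIB))
    print(headroom_in_deltas(180, FULL_SNAPSHOT_KIB, AVG_DELTA_KIB))
```

With these inputs, 180 deltas still leave a little under 30 deltas of
headroom before a full snapshot becomes the cheaper download.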

Notes:
======
1.
When extracting tar incrementals, you must be VERY careful to apply them
in the correct order (baseline first, then each incremental in sequence);
otherwise removed files will not actually be deleted.

2.
When the Git repo goes live, we should tag the commit at the point we
take the daily snapshot, and use those tags to evaluate git bundles too.
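
For note 1, one way to guard against out-of-order extraction is to
derive the order from the archive names instead of trusting a directory
listing. The sketch below assumes the naming used in the Numbers section
(portage-DATE.0.tar.bz2 for the baseline, portage-FROM-TO.1.tar.bz2 for
incrementals). The helper names are hypothetical, and the commands are
only built, not run. It relies on GNU tar's documented behaviour of
accepting --listed-incremental=/dev/null on extraction.

```python
import re

# Naming taken from the listing in this message:
#   portage-YYYYMMDD.0.tar.bz2            level-0 baseline
#   portage-YYYYMMDD-YYYYMMDD.1.tar.bz2   incremental, FROM date -> TO date
BASE_RE = re.compile(r"portage-(\d{8})\.0\.tar\.bz2$")
INCR_RE = re.compile(r"portage-(\d{8})-(\d{8})\.1\.tar\.bz2$")


def extraction_order(baseline, incrementals):
    """Return archives in the order they must be extracted.

    Raises ValueError if the incrementals do not form an unbroken
    chain starting at the baseline date.
    """
    m = BASE_RE.search(baseline)
    if m is None:
        raise ValueError(f"not a baseline archive: {baseline}")
    current = m.group(1)

    by_start = {}
    for name in incrementals:
        m = INCR_RE.search(name)
        if m is None:
            raise ValueError(f"not an incremental archive: {name}")
        by_start[m.group(1)] = (m.group(2), name)

    ordered = [baseline]
    while by_start:
        if current not in by_start:
            raise ValueError(f"no incremental starting at {current}")
        current, name = by_start.pop(current)
        ordered.append(name)
    return ordered


def tar_commands(ordered, dest):
    """Build GNU tar command lines for each archive (not executed here)."""
    return [
        ["tar", "--extract", "--bzip2", "--listed-incremental=/dev/null",
         "--file", name, "--directory", dest]
        for name in ordered
    ]
```

A gap in the chain raises an error instead of silently skipping a day,
which is the failure mode that would leave removed files behind.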

Numbers:
========

Baseline tarball:
57919736 portage-20121123.0.tar.bz2

Tar incrementals, daily:
 2554334 portage-20121123-20121124.1.tar.bz2
 2045216 portage-20121124-20121125.1.tar.bz2
 1936313 portage-20121125-20121126.1.tar.bz2
 2355342 portage-20121126-20121127.1.tar.bz2
 2063612 portage-20121127-20121128.1.tar.bz2
 2582600 portage-20121128-20121129.1.tar.bz2
 2720135 portage-20121129-20121130.1.tar.bz2

Rsync incrementals, daily:
 2224311 portage-20121123-20121124.rsync-batch.bz2
 1869241 portage-20121124-20121125.rsync-batch.bz2
 1802648 portage-20121125-20121126.rsync-batch.bz2
 1936937 portage-20121126-20121127.rsync-batch.bz2
 1868771 portage-20121127-20121128.rsync-batch.bz2
 2240386 portage-20121128-20121129.rsync-batch.bz2
 2028207 portage-20121129-20121130.rsync-batch.bz2

Existing deltas, daily:
  252400 snapshot-20121123-20121124.patch.bz2
  267094 snapshot-20121124-20121125.patch.bz2
  161136 snapshot-20121125-20121126.patch.bz2
  225349 snapshot-20121126-20121127.patch.bz2
  245804 snapshot-20121127-20121128.patch.bz2
  232549 snapshot-20121128-20121129.patch.bz2
  332835 snapshot-20121129-20121130.patch.bz2

Rsync incrementals, from baseline:
 2224311 portage-20121123-20121124.rsync-batch.bz2
 2536620 portage-20121123-20121125.rsync-batch.bz2
 2700715 portage-20121123-20121126.rsync-batch.bz2
 2986403 portage-20121123-20121127.rsync-batch.bz2
 3258723 portage-20121123-20121128.rsync-batch.bz2
 3824015 portage-20121123-20121129.rsync-batch.bz2
 4232674 portage-20121123-20121130.rsync-batch.bz2
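
The 9x and 8x ratios in Results can be rechecked from the daily sizes
listed above. This is only a sanity-check sketch over the seven days
shown; the variable and function names are illustrative.

```python
# Daily sizes in bytes, copied from the listings above (20121123..20121130).
TAR_INCR = [2554334, 2045216, 1936313, 2355342, 2063612, 2582600, 2720135]
RSYNC_DAILY = [2224311, 1869241, 1802648, 1936937, 1868771, 2240386, 2028207]
DELTAS = [252400, 267094, 161136, 225349, 245804, 232549, 332835]


def size_ratio(candidate, reference):
    """Total bytes of candidate divided by total bytes of reference."""
    return sum(candidate) / sum(reference)


if __name__ == "__main__":
    print(f"tar-incremental vs deltas: {size_ratio(TAR_INCR, DELTAS):.2f}x")
    print(f"rsync-batch vs deltas:     {size_ratio(RSYNC_DAILY, DELTAS):.2f}x")
```

Over this one-week sample the ratios come out at roughly 9.5x for tar
incrementals and 8.1x for rsync batches.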

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail     : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
