On Sun, Sep 16, 2012 at 09:56:27AM +0200, Michał Górny wrote:
> But consider that for example Zac & AxS (correct me if I recall it
> correctly) considered changing the meaning of RDEPEND to install
> them before the build, thus effectively making 'build,run' useless.

I really am not trying to be a blatant dick to you, but this has 
/zero/ relevance.  RDEPEND means "required for runtime".  That ain't 
changing.  If they were discussing changing what RDEPEND meant, then 
they were high, period.

If zac/axs want to try and make the resolver install RDEPEND before 
DEPEND... well, they're free to.  That doesn't change the fact that 
the deps still must be specified correctly; in short, build,run is 
very much relevant.

What I suspect they were actually intending is letting the resolver 
work on the RDEPENDs of a pkg in parallel with that pkg being built; 
that's a parallelization/scheduling optimization, and it still 
requires accurate deps.
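
To make the distinction concrete, here's a toy python sketch of that 
scheduling idea as I read it; every name and atom in it is made up 
for illustration, it's not anyone's actual plan:

from concurrent.futures import ThreadPoolExecutor

def merge(dep):
    print("merged", dep)   # stand-in for actually installing a dep

def build(pkg):
    print("built", pkg)    # stand-in for compiling the package

def install(pkg, depend, rdepend):
    for dep in depend:     # build deps: strictly before the build
        merge(dep)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # runtime deps can merge in parallel with the build itself...
        future = pool.submit(lambda: [merge(d) for d in rdepend])
        build(pkg)
        # ...but must be in place before the pkg itself is merged
        future.result()

install("foo", ["cat/buildtool"], ["cat/runtimelib"])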

I'm trying to be nice here, but you're very confused on this matter.


> > Total cache savings from doing this for a full tree conversion, for 
> > our existing md5-cache format is 2.73MB (90 bytes per cache entry).  
> > Calculating the savings from the ebuild/eclass standpoint is
> > dependent on how the deps are built up, so I skipped that.
> 
> You're storing the cache in a tarball?

Going to assume you're not trolling, and instead use this as a 
way to point out that this actually *does* matter, although it's 
admittedly not obvious if you don't know much about the guts of 
package managers, or don't spend your Saturday nights doing fun 
things like optimizing ebuild package manager performance.

First, the figure is 3.204MB if default context is used; that's ~9.5% 
of the content footprint for md5-cache specifically.

Little-known fact: rsync transfers for gentoo are required to be 
--whole-file, meaning no intra-file delta compression; the whole file 
itself gets transferred.  This is done to keep cpu load on the rsync 
nodes low (else they'd be calculating, minimally, 97k md4s for every 
sync, not counting the rolling adler32 checksum over all content, 
dependent on the window cut-off threshold; sounds minor, but it's 
death by a thousand cuts).
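
For the curious, the weak rolling checksum is roughly the sketch 
below (python, from memory, illustrative only; real rsync pairs this 
with an md4 digest per block, which is where the bulk of the cpu 
cost lives):

def weak_checksum(block):
    # O(n) checksum of an initial window; block is a bytes object
    a = sum(block) & 0xffff
    b = 0
    for i, byte in enumerate(block):
        b += (len(block) - i) * byte
    return a, b & 0xffff

def roll(a, b, out_byte, in_byte, block_len):
    # O(1) slide of the window by one byte; this is what lets rsync
    # scan a whole file cheaply, but it still touches every byte
    a = (a - out_byte + in_byte) & 0xffff
    b = (b - block_len * out_byte + a) & 0xffff
    return a, b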

For obvious reasons, the cache is the hottest part of the tree, thanks 
to cascading updates from eclass changes.  In other words, that ~9.5% 
reduction targets the core data actually transferred in a sync.

In terms of the total tree footprint, it's a 1% reduction; mostly lost 
in blocksize overhead unless you're using squashfs (which a decent 
number of folks do for speed reasons), or a tail-packing FS for the 
tree (again, more common than you'd think, known primarily due to 
reiserfs corruption bugs causing some hell on PM caches).

There's also the fact that doing this means, best case, two fewer 
inodes per VDB entry (more once we start adding dependency types).  
For my vdb, I have 15523 files across 798 pkgs.  1331 of those are 
*DEPEND files; converted to DEPENDENCIES, that count drops to 748.  
Note that's preserving DEPEND, although it's worthless at this stage 
of the vdb.  So roughly a 5% reduction in files in there.  
Whoopy-de-doo, right?
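
For illustration, the conversion per vdb entry amounts to something 
like the sketch below; the labels and output format are my own 
invention for the example, not the actual DEPENDENCIES spec, and the 
point is purely the three-files-to-one collapse:

import os

LABELS = (("DEPEND", "build"), ("RDEPEND", "run"), ("PDEPEND", "post"))

def convert(entry_dir):
    # read each legacy dep file, tag it, and fold them into one file
    chunks = []
    for fname, label in LABELS:
        path = os.path.join(entry_dir, fname)
        if not os.path.exists(path):
            continue
        with open(path) as f:
            deps = f.read().strip()
        if deps:
            chunks.append("%s: %s" % (label, deps))
    with open(os.path.join(entry_dir, "DEPENDENCIES"), "w") as f:
        f.write("\n".join(chunks) + "\n")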

This one I can't test as well, since the only rotational media I've 
got these days is a hardware raid w/ a beefy cache; the closest I can 
manage is local network nfs to an ssd-backed FS, so that'll have to 
serve as a stand-in for cold cache/hot cache, and as a demonstration 
of why having a backend that is a pile of small individual files is 
bad.

Best of 5 is displayed below:

Iterating over the vdb, parsing and rendering all depends for our 
current layout, w/ the vdb stored on nfs:

cold cache:
real    0m30.405s
user    0m1.046s
sys     0m0.390s

hot cache:
real    0m16.483s
user    0m0.883s
sys     0m0.168s

The same iteration, non-optimized and hacked to work (the quick hack 
is known to parse slower than a proper implementation would), with 
the deps stored as DEPENDENCIES; parsing it and literally rendering 
DEPEND, RDEPEND, and PDEPEND back out of it:

cold cache:
real    0m18.329s
user    0m0.908s
sys     0m0.280s

hot cache:
real    0m12.185s
user    0m0.860s
sys     0m0.128s
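
If anyone wants to poke at this themselves, the harness boils down to 
something like the following; note this is a from-memory approximation 
that assumes the stock vdb layout and only exercises the I/O pattern, 
whereas the real test parsed and rendered the deps as well:

import glob, time

def read_deps(vdb="/var/db/pkg"):
    # touch every *DEPEND file under <vdb>/<category>/<pkg-version>/
    start = time.time()
    count = 0
    for name in ("DEPEND", "RDEPEND", "PDEPEND"):
        for path in glob.glob("%s/*/*/%s" % (vdb, name)):
            with open(path) as f:
                f.read()
            count += 1
    return count, time.time() - start

if __name__ == "__main__":
    print("%d files in %.3fs" % read_deps())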


You get the idea.  If in doubt, see the various infamous cold 
cache/hot cache performance tests; I can tell you that a similar trick 
done in '07, literally just skipping loading USE till it was needed 
for provides parsing, was enough to bring a 5400RPM drive's run time 
down from 15s to 12s for cold cache, for parsing provides *alone*, 
nothing else.  Either way, do your own investigation; it's a good 
education on performance.


Hopefully, for the others listening, that last section was a random 
but useful tidbit of info; if not, pardon, just being thorough to make 
sure this point is not raised again.

~harring
