Hi Matt,

On Tue, Apr 5, 2022 at 10:38 PM Matt Turner <matts...@gentoo.org> wrote:
>
> On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld <zx...@gentoo.org> wrote:
> > By the way, we're not currently _checking_ two hash functions during
> > src_prepare(), are we?
>
> I don't know, but the hash-checking is definitely checked before 
> src_prepare().

Er, during the builtin fetch phase. Anyway, you know what I meant. :)

Looking at the portage source code, to answer my own question,
it looks like the file is actually being read twice and both hashes
computed. I would have at least expected an optimization like:

hash1_init(&hash1);
hash2_init(&hash2);
for chunk in file:
    hash1_update(&hash1, chunk);
    hash2_update(&hash2, chunk);
hash1_final(&hash1, out1);
hash2_final(&hash2, out2);

But actually what's happening is the even less efficient:

hash1_init(&hash1);
for chunk in file:
    hash1_update(&hash1, chunk);
hash1_final(&hash1, out1);
hash2_init(&hash2);
for chunk in file:
    hash2_update(&hash2, chunk);
hash2_final(&hash2, out2);

So the file winds up being opened and read twice. For huge tarballs like
chromium or libreoffice...

But either way you do it - the missed optimization in the first
snippet or the unoptimized reality in the second - there's still twice
as much hashing work being done as with a single hash. This is all
unless I've misread the source code, which is possible, so if somebody
knows this code well and I'm wrong here, please do speak up.
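
For concreteness, here is a rough Python sketch of the single-pass
variant from the first snippet. This is not portage's actual code:
hash_once() is a made-up helper name, the chunk size is arbitrary, and
I'm assuming the two hashes are BLAKE2B and SHA512 as found in current
Manifest files. It only uses the standard hashlib module.

```python
import hashlib


def hash_once(path, names=("blake2b", "sha512"), chunk_size=1 << 20):
    """Read the file a single time and feed each chunk to every hasher.

    Hypothetical helper for illustration; not taken from portage.
    """
    hashers = [hashlib.new(name) for name in names]
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers:
                h.update(chunk)
    return {name: h.hexdigest() for name, h in zip(names, hashers)}
```

This saves the second open and read of the file, but each hash
function still has to process every byte, so the CPU cost of hashing
stays doubled.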

Jason
