On Mon, Apr 11, 2022 at 7:14 PM Joshua Kinard <ku...@gentoo.org> wrote:
>
> On 4/5/2022 17:49, Jason A. Donenfeld wrote:
> > Hi Matt,
> >
> > On Tue, Apr 5, 2022 at 10:38 PM Matt Turner <matts...@gentoo.org> wrote:
> >>
> >> On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld <zx...@gentoo.org>
> >> wrote:
> >>> By the way, we're not currently _checking_ two hash functions during
> >>> src_prepare(), are we?
> >>
> >> I don't know, but the hash-checking is definitely checked before
> >> src_prepare().
> >
> > Er, during the builtin fetch phase. Anyway, you know what I meant. :)
> >
> > Anyway, looking at the portage source code, to answer my own question,
> > it looks like the file is actually being read twice and both hashes
> > computed. I would have at least expected an optimization like:
> >
> >   hash1_init(&hash1);
> >   hash2_init(&hash2);
> >   for chunk in file:
> >       hash1_update(&hash1, chunk);
> >       hash2_update(&hash2, chunk);
> >   hash1_final(&hash1, out1);
> >   hash2_final(&hash2, out2);
> >
> > But actually what's happening is the even less efficient:
> >
> >   hash1_init(&hash1);
> >   for chunk in file:
> >       hash1_update(&hash1, chunk);
> >   hash1_final(&hash1, out1);
> >   hash2_init(&hash2);
> >   for chunk in file:
> >       hash2_update(&hash2, chunk);
> >   hash2_final(&hash2, out2);
> >
> > So the file winds up being open and read twice. For huge tarballs like
> > chromium or libreoffice...
> >
> > But either way you do it - the missed optimization above or the
> > unoptimized reality below - there's still twice as much work being
> > done. This is all unless I've misread the source code, which is
> > possible, so if somebody knows this code well and I'm wrong here,
> > please do speak up.
>
> Not to go off-topic, but where in Portage's source is this logic at? It
> seems like an easy fix for a slightly more efficient Portage.

I believe it's the portage.checksum.verify_all() function.

https://gitweb.gentoo.org/proj/portage.git/tree/lib/portage/checksum.py?h=portage-3.0.30#n471
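
For illustration, here is a minimal sketch of the single-pass variant Jason describes above, written against Python's standard hashlib only. It is not Portage's code and not a patch for verify_all(); the function name hash_file_once, the default algorithm names, and the chunk size are all assumptions made for the example.

```python
import hashlib


def hash_file_once(path, algorithms=("sha512", "blake2b"), chunk_size=1 << 20):
    """Return {algorithm: hexdigest}, reading the file at `path` only once.

    Hypothetical helper for illustration; not part of Portage.
    """
    # One hasher object per requested algorithm (the "init" step).
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Each chunk is read from disk once and fed to every hasher
        # (the "update" step), instead of re-reading the file per hash.
        while chunk := f.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    # The "final" step: collect one digest per algorithm.
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling hash_file_once("/path/to/distfile") would give both digests from one read of the file. As noted in the quoted mail, this only saves the second read; the hashing work itself is still done twice.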