Hi,

On Friday, 15.10.2021 at 20:54 +0200, Ludovic Courtès wrote:
> Hello,
>
> Consider this file as if it were a patch you’re reviewing:
>
> (define-module (content-addressed))
> (use-modules (guix)
>              (guix build-system gnu)
>              (guix licenses)
>              (gnu packages perl))
>
> (define-public sed
>   (package
>     (name "sed")
>     (version "4.8")
>     (source (origin
>               (method url-fetch)
>               (uri (string-append "mirror://gnu/zed/sed-" version
>                                   ".tar.gz"))

To be fair, gnu/zed sounds wonky, but you could try inserting a version
that does not exist (e.g. 1+ the current latest version) and, as a
committer, thereby bypass review entirely.  However, given that we trust
committers in this respect, I'd say they should be able to verify both
the URI and the version field.  This is trivially possible with most
schemes, save for the mirror:// one.

>               (sha256
>                (base32
>                 "1yy33kiwrxrwj2nxa4fg15bvmwyghqbs8qwkdvy5phm784f7brjq"))))
>     (build-system gnu-build-system)
>     (synopsis "Stream editor")
>     (native-inputs
>      `(("perl" ,perl)))                 ;for tests
>     (description
>      "Sed is a non-interactive, text stream editor.  It receives a text
> input from a file or from standard input and it then applies a series of text
> editing commands to the stream and prints its output to standard output.  It
> is often used for substituting text patterns in a stream.  The GNU
> implementation offers several extensions over the standard utility.")
>     (license gpl3+)
>     (home-page "https://www.gnu.org/software/sed/")))
>
> sed
>
> It builds just fine:
>
> --8<---------------cut here---------------start------------->8---
> $ guix build -f /tmp/content-addressed.scm
> /gnu/store/lpais26sjwxcyl7y7jqns6f5qrbrnb34-sed-4.8
> $ guix build -f /tmp/content-addressed.scm -S --check -v0
> /gnu/store/mgais6lk92mm8n5kyx70knr11jbwgfhr-sed-4.8.tar.gz
> --8<---------------cut here---------------end--------------->8---
>
> Did you spot a problem?
>
> …
>
>
> So, what did we just build?
>
> --8<---------------cut here---------------start------------->8---
> $ ls $(guix build -f /tmp/content-addressed.scm)/bin
> egrep fgrep grep
> --8<---------------cut here---------------end--------------->8---
>
> Oh oh!  This ‘sed’ package is giving us ‘grep’!  How come?
>
> The trick is easy: we give a URL that’s actually 404, with the hash
> of a file that can be found on Software Heritage (in this case, that
> of ‘grep-3.4.tar.xz’).  When downloading the source, the automatic
> content-addressed fallback kicks in, and voilà:
>
> --8<---------------cut here---------------start------------->8---
> $ guix build -f /tmp/content-addressed.scm -S --check
> La jena derivaĵo estos konstruata:
>   /gnu/store/nq2jdzbv3nh9b1mglan54dcpfz4l7bli-sed-4.8.tar.gz.drv
> building /gnu/store/nq2jdzbv3nh9b1mglan54dcpfz4l7bli-sed-4.8.tar.gz.drv...
>
> Starting download of /gnu/store/1mlpazwwa2mi35v7jab5552lm3ssvn6r-sed-4.8.tar.gz
> From https://ftpmirror.gnu.org/gnu/zed/sed-4.8.tar.gz...
> following redirection to `https://mirror.cyberbits.eu/gnu/zed/sed-4.8.tar.gz'...
> download failed "https://mirror.cyberbits.eu/gnu/zed/sed-4.8.tar.gz" 404 "Not Found"
>
> [...]
>
> Starting download of /gnu/store/1mlpazwwa2mi35v7jab5552lm3ssvn6r-sed-4.8.tar.gz
> From https://archive.softwareheritage.org/api/1/content/sha256:58e6751c41a7c25bfc6e9363a41786cff3ba5709cf11d5ad903cf7cce31cc3fb/raw/...
> downloading from https://archive.softwareheritage.org/api/1/content/sha256:58e6751c41a7c25bfc6e9363a41786cff3ba5709cf11d5ad903cf7cce31cc3fb/raw/ ...
>
> warning: rewriting hashes in `/gnu/store/mgais6lk92mm8n5kyx70knr11jbwgfhr-sed-4.8.tar.gz'; cross fingers
> successfully built /gnu/store/nq2jdzbv3nh9b1mglan54dcpfz4l7bli-sed-4.8.tar.gz.drv
> --8<---------------cut here---------------end--------------->8---
>
> It’s nothing new, it’s what I do when I want to test the download
> fallbacks (see also ‘GUIX_DOWNLOAD_FALLBACK_TEST’ in commit
> c4a7aa82e25503133a1bd33148d17968c899a5f5).  Still, I wonder if it
> could somehow be abused to have malicious packages pass review.

I don't think this is much of a problem for packages where we have
another source of truth (in this case mirrors/archives of sed), but it
does point at a bigger problem when SWH is our only source of truth.
That is, when trying to preserve such software for the future, when
other archives might fail and perhaps SHA256 itself might be broken, we
can no longer be sure that the Guix time-machine indeed does what it
promises.

> Also, just because a URL looks nice and is reachable doesn’t mean the
> source is trustworthy either.  An attacker could submit a package for
> an obscure piece of software that happens to be malware.  The
> difference here is that the trick above would allow targeting a
> high-impact package.

Again, this is less of an issue w.r.t. review, because the reviewers can
check at review time that the tarball matches their expectations.  I
personally find "I can't find this source anywhere but on SWH" to be a
perfect reason to reject software in the main Guix channel, though
perhaps that rule is a bit softer in Guix Past.

> On the plus side, such an attack would be recorded forever in Git
> history.

On the minus side, time-machine makes said record a landmine to step
on.

> Also on the plus side, it turns out our origin URLs are currently
> (unintentionally) limited to ASCII, so I couldn’t write “/ṡed” in the
> URL.

Couldn't one circumvent that with percent encoding and a nice enough
file name, however?
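
To make the question concrete, here is a small Python sketch (plain
Python, not Guix code) showing that the percent-encoded form of “/ṡed”
consists of ASCII characters only.  I have not checked how Guix's
download code would treat such a URL, so whether this actually gets
around the limitation is an assumption on my part.

```python
from urllib.parse import quote, unquote

# Path component from the example above; 'ṡ' is U+1E61, a non-ASCII letter.
path = "/ṡed"

# quote() percent-encodes the UTF-8 bytes of every character outside its
# safe set; '/' is left untouched by default.
encoded = quote(path)

print(encoded)            # /%E1%B9%A1ed
print(encoded.isascii())  # True
print(unquote(encoded))   # /ṡed
```

The encoded form is of course much easier to spot in a patch than a
single dotted letter would be, which is why the file name would have to
be "nice enough" for this to slip through.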

> All in all, it’s probably not as worrisome as it first
> seems.  However, it’s worth keeping in mind when reviewing a package.
>
> Thoughts?

I agree that cross-checking with “guix download” might be good practice
for review.  Perhaps in light of this we should extend it to
Git/SVN/other VCS?

Regards,
Liliana
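
PS: As a rough illustration of one such cross-check, the Python sketch
below decodes the base32 hash from the package definition above into
hexadecimal, which is the form that shows up in the Software Heritage
URL in the build log.  It assumes the hash uses the nix-base32 alphabet
and bit order; the helper name `nix_base32_to_hex` is made up for this
example and is not part of Guix.

```python
# Alphabet of the nix-base32 encoding (it has no 'e', 'o', 't' or 'u').
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_to_hex(encoded: str, size: int = 32) -> str:
    """Decode a nix-base32 string into the hex form of a SIZE-byte hash.

    Hypothetical helper for illustration only; not a Guix API.
    """
    value = 0
    for char in encoded:
        # The first character carries the most significant 5 bits.
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 32 + NIX_BASE32_ALPHABET.index(char)
    # The accumulated number is the hash read as a little-endian integer.
    # to_bytes raises OverflowError if it does not fit into SIZE bytes.
    return value.to_bytes(size, "little").hex()


# Hash from the 'sed' package definition quoted above.
package_hash = "1yy33kiwrxrwj2nxa4fg15bvmwyghqbs8qwkdvy5phm784f7brjq"
# Hex digest from the Software Heritage URL in the build log.
swh_digest = "58e6751c41a7c25bfc6e9363a41786cff3ba5709cf11d5ad903cf7cce31cc3fb"

print(nix_base32_to_hex(package_hash) == swh_digest)  # True
```

In other words, it is the hash in the patch, not the URL, that decides
which content the fallback asks Software Heritage for, so a reviewer
should compare that hash against a tarball they fetched themselves.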