Hello,

Ludovic Courtès <ludovic.cour...@inria.fr> writes:

> As for past tarballs, #swh-devel comrades say we could send them a list
> of URLs and they’d create “Save Code Now” requests on our behalf (we
> cannot do it ourselves since the site doesn’t accept plain tarballs.)
>
> Any volunteer to write a script that’d generate a list of Bioconductor
> content-addressed URLs (the bordeaux.guix.gnu.org/file ones) for say the
> past couple of years?

Sorry I’m a little late to this party, but I wrote a similar script a
while ago.  It creates a “sources.json” file of all the sources that the
PoG database analyzed and found missing in SWH.  It only covers what PoG
monitors (which is *almost* everything, but not quite).

    $ git clone https://git.ngyro.com/preservation-of-guix
    $ cd preservation-of-guix
    $ wget https://ngyro.com/pog-reports/latest/pog.db
    [Wait a long time because my server is sloooow.]
    $ guile -L . etc/sources.scm pog.db > missing-sources.json

With some modifications, I used it to generate the attached list of
Bioconductor sources (based off of recent, unpublished PoG data).  I’ve
also attached the modifications in case anyone is curious or wants to
make a similar list.

I will publish the PoG database soon (today?), so maybe wait for that
before generating any lists.


-- Tim
bioconductor-sources.json.gz
Description: Binary data
diff --git a/etc/sources.scm b/etc/sources.scm
index 71d157d..515cf00 100644
--- a/etc/sources.scm
+++ b/etc/sources.scm
@@ -1,5 +1,5 @@
 ;;; Preservation of Guix
-;;; Copyright © 2022 Timothy Sample <samp...@ngyro.com>
+;;; Copyright © 2022, 2024 Timothy Sample <samp...@ngyro.com>
 ;;;
 ;;; This file is part of Preservation of Guix.
 ;;;
@@ -61,6 +61,7 @@ FROM fods f
 WHERE f.algorithm = 'sha256'
   AND (fr.reference LIKE '\"%'
        OR fr.reference LIKE '(\"%')
+  AND fr.reference LIKE '%bioconductor.org%'
   AND NOT fr.is_error
   AND f.is_in_swh IS NOT NULL
   AND NOT f.is_in_swh")
@@ -85,22 +86,25 @@ Subresource Integrity metadata value."
   (define b64 (base64-encode bv))
   (string-append "sha256-" b64))
 
-(define (web-reference-urls reference)
+(define (web-reference-filename reference)
   (define uris
     (match (call-with-input-string reference read)
       ((urls ...) (map string->uri urls))
       (url (list (string->uri url)))))
-  (append-map (lambda (uri)
-                (map uri->string
-                     (maybe-expand-mirrors uri %mirrors)))
-              uris))
+  (or (any (lambda (uri)
+             (and (string-suffix? "bioconductor.org" (uri-host uri))
+                  (basename (uri-path uri))))
+           uris)
+      (error "Not a 'bioconductor.org' reference" reference)))
 
 (define (record->url-source rec)
   (match-let ((#(digest reference) rec))
-    (let ((urls (web-reference-urls reference))
-          (integrity (nix-base32-sha256->subresource-integrity digest)))
+    (let* ((filename (web-reference-filename reference))
+           (url (string-append "https://bordeaux.guix.gnu.org/file/"
+                               filename "/sha256/" digest))
+           (integrity (nix-base32-sha256->subresource-integrity digest)))
       `(("type" . "url")
-        ("urls" . ,(list->vector urls))
+        ("urls" . ,(vector url))
         ("integrity" . ,integrity)))))
 
 (define (lookup-missing-sources db)
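
For anyone who only wants to see the shape of the URLs the patch emits, here is a rough shell sketch of the same string construction. It is not part of PoG. The function names and the "DIGEST UPSTREAM-URL" input format are made up for illustration, and it assumes the digest is already in the form the database stores (presumably nix-base32, going by the name of the conversion procedure in the patch).

```shell
#!/bin/sh
# Sketch only: mirrors the URL shape built by record->url-source in the
# patch.  Function names and the input format are hypothetical.

# Print the content-addressed URL for a file name ($1) and a digest ($2).
# The digest is inserted unchanged, as in the patch.
bordeaux_url() {
    printf 'https://bordeaux.guix.gnu.org/file/%s/sha256/%s\n' "$1" "$2"
}

# Read "DIGEST UPSTREAM-URL" pairs from standard input and print one
# content-addressed URL per pair.  The file name is the last path
# component of the upstream URL; unlike the patch, this runs basename on
# the whole URL, so it assumes there is no query string.
bordeaux_urls() {
    while read -r digest upstream; do
        bordeaux_url "$(basename "$upstream")" "$digest"
    done
}
```

With a made-up digest "abc" and a made-up upstream URL ending in "foo_1.0.0.tar.gz", piping the line "abc https://bioconductor.org/.../foo_1.0.0.tar.gz" into bordeaux_urls prints https://bordeaux.guix.gnu.org/file/foo_1.0.0.tar.gz/sha256/abc.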