Hi Mark, Mark H Weaver <m...@netris.org> skribis:
> From 6eec36e66d20d82fe02c6de793422875477b90d8 Mon Sep 17 00:00:00 2001 > From: Mark H Weaver <m...@netris.org> > Date: Fri, 2 Apr 2021 18:36:23 -0400 > Subject: [PATCH] DRAFT: grafts: Support rewriting UTF-16 and UTF-32 store > references. > > * guix/build/graft.scm (replace-store-references): Add support for > finding and rewriting UTF-16 and UTF-32 store references. > * tests/grafts.scm: Add tests. Please add a “Fixes” line in the commit log. I’m not reviewing the code in depth and I trust your judgment. The risks of bugs I can think of are: missed ASCII references (a regression, whereby some ASCII references would not get rewritten), and unrelated UTF-{16,32}-base32-looking references getting rewritten. I guess the latter is very unlikely because only sequences found in the replacement table may be rewritten, right? The former should be caught by ‘tests/grafts.scm’ but we could also check the closure of real-world systems, for instance, to make sure it doesn’t refer to ungrafted things. Do you know how this affects performance? Some superficial comments: > +(define (possible-utf16-hash? buffer i w) > + (and (<= (* 2 hash-length) (- i w)) > + (let loop ((j (+ 1 (- i (* 2 hash-length))))) > + (or (>= j i) > + (and (zero? (bytevector-u8-ref buffer j)) > + (loop (+ j 2))))))) > + > +(define (possible-utf32-hash? buffer i w) > + (and (<= (* 4 hash-length) (- i w)) > + (let loop ((j (+ 1 (- i (* 4 hash-length))))) > + (or (>= j i) > + (and (zero? (bytevector-u8-ref buffer j)) > + (zero? (bytevector-u8-ref buffer (+ j 1))) > + (zero? (bytevector-u8-ref buffer (+ j 2))) > + (loop (+ j 4))))))) > + > +(define (insert-nuls char-size bv) Perhaps add short docstrings for clarity. > +(for-each > + (lambda (char-size1) > + (for-each > + (lambda (char-size2) > + (for-each > + (lambda (gap) > + (for-each > + (lambda (offset) > + (test-equal (format #f "replace-store-references, char-sizes ~a > ~a, gap ~s, offset ~a" > + char-size1 char-size2 gap offset) > + (string-append (make-string offset #\=) > + (nul-expand (string-append "/gnu/store/" > + (make-string 32 #\6) > + "-BlahBlaH") > + char-size1) > + gap > + (nul-expand (string-append "/gnu/store/" > + (make-string 32 #\8) > + "-SoMeTHiNG") > + char-size2) > + (list->string (map integer->char (iota 77 33)))) > + > + ;; Create input data where the right-hand-size of the dash > ("-something" > + ;; here) goes beyond the end of the internal buffer of > + ;; 'replace-store-references'. > + (let* ((content (string-append (make-string offset #\=) > + (nul-expand (string-append > "/gnu/store/" > + > (make-string 32 #\5) > + > "-blahblah") > + char-size1) > + gap > + (nul-expand (string-append > "/gnu/store/" > + > (make-string 32 #\7) > + > "-something") > + char-size2) > + (list->string > + (map integer->char (iota 77 > 33))))) > + (replacement (alist->vhash > + `((,(make-string 32 #\5) > + . ,(string->utf8 (string-append > + (make-string 32 #\6) > + "-BlahBlaH"))) > + (,(make-string 32 #\7) > + . ,(string->utf8 (string-append > + (make-string 32 #\8) > + "-SoMeTHiNG"))))))) > + (call-with-output-string > + (lambda (output) > + ((@@ (guix build graft) replace-store-references) > + (open-input-string content) output > + replacement > + "/gnu/store")))))) > + ;; offsets to test > + (map (lambda (i) (- buffer-size (* 40 char-size1) i)) > + (iota 30)))) > + ;; gaps > + '("" "-" " " "a"))) > + ;; char-size2 values to test > + '(1 2))) > + ;; char-size1 values to test > + '(1 2 4)) For clarity, perhaps you can define a top-level procedure for the test and call it from ‘for-each’. Modulo these very minor issues, it looks like it’s ready to go! Thank you, Ludo’.