‘regexp-exec’ sometimes gets match boundaries wrong when operating on a Unicode string but in a C locale (this is with af96820e072d18c49ac03e80c6f3466d568dc77d):
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (setlocale LC_ALL "C")
$52 = "C"
scheme@(guile-user)> (string-match "start (.*)" (string-append "start " (string (integer->char 1002))))
$53 = #("start \u03ea" (0 . 8) (6 . 8))
scheme@(guile-user)> (match:substring $53 1)
ice-9/boot-9.scm:1683:22: In procedure raise-exception:
Value out of range 6 to< 7: 8

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
--8<---------------cut here---------------end--------------->8---

The attached program produces more failures at random.  (The example
above works well under a UTF-8 locale.)

So I believe ‘fixup_multibyte_match’ isn’t quite correct.

Ludo’.

PS: This originates in <https://issues.guix.gnu.org/77283>.
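(use-modules (ice-9 regex))

(define rx (make-regexp "^start (.*)"))

(setlocale LC_ALL "C")

;; Repeatedly match "start " followed by one random character in the
;; range U+0100..U+04FF; exit on the first exception.
(let loop ()
  (let* ((i (+ 256 (random (expt 2 10))))
         (str (string-append "start "
                             (string (integer->char i)))))
    (with-exception-handler
        (lambda (exc)
          ;; Print the exception and the offending code point.
          (pk 'exc exc '<-- i)
          (display-backtrace (make-stack #t) (current-error-port))
          (exit 1))
      (lambda ()
        ;; Raises when the reported match boundaries are out of range.
        (match:substring (regexp-exec rx str) 1)))
    (loop)))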
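
A small sanity check on the offsets in the transcript above, as a sketch rather than a diagnosis: the string has 7 characters, yet the reported match end is 8, which happens to equal the length of its UTF-8 encoding.  The snippet below only computes both lengths, using ‘string->utf8’ from (rnrs bytevectors).  It assumes nothing about how ‘regexp-exec’ actually encodes the string in the C locale, so the coincidence is a hint, not a confirmed cause.

```scheme
(use-modules (rnrs bytevectors))

;; Same string as in the transcript: "start " followed by U+03EA.
(define str
  (string-append "start " (string (integer->char 1002))))

;; Length in characters, which is what 'match:substring' indexes by.
(define char-count (string-length str))

;; Length in bytes of the UTF-8 encoding.  Assumption: UTF-8 is used
;; here purely for comparison with the offsets in the match structure.
(define byte-count (bytevector-length (string->utf8 str)))

(format #t "characters: ~a, UTF-8 bytes: ~a~%" char-count byte-count)
```

If the match end were a character offset, it could not exceed the character count printed here.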