Hi,

As shown by https://gitlab.com/lilypond/lilypond/-/issues/6463, Guile regular expressions are a trap when it comes to Unicode. Under a non-Unicode locale, characters that cannot be expressed in the locale encoding are converted to "?", both in the pattern and in the search string, before the underlying POSIX regex functions are invoked.
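To make the failure mode concrete, here is a minimal Python sketch that simulates the lossy conversion. It is not Guile code and Python's `re` module is not the POSIX regex engine; the helper names `to_c_locale` and `lossy_search` are hypothetical, and ASCII stands in for a non-Unicode locale encoding. It only illustrates how replacing unencodable characters with "?" on both sides can turn a non-match into a match.

```python
import re


def to_c_locale(text: str) -> str:
    """Simulate conversion to an ASCII-only locale encoding.

    Characters that cannot be encoded are replaced by "?", mimicking
    the lossy conversion described above (assumption: one "?" per
    unencodable character).
    """
    return text.encode("ascii", errors="replace").decode("ascii")


def lossy_search(pattern: str, subject: str):
    """Search after degrading both the pattern and the subject."""
    return re.search(to_c_locale(pattern), to_c_locale(subject))


pattern = "[é]"  # intended meaning: match the letter é
subject = "ü"    # contains no é at all

# A Unicode-aware search correctly finds nothing.
unicode_aware = re.search(pattern, subject)

# After lossy conversion the pattern is "[?]" and the subject is "?",
# so the search reports a spurious match.
lossy = lossy_search(pattern, subject)

print("Unicode-aware match:", unicode_aware)
print("Lossy match:", lossy.group(0) if lossy else None)
```

The point of the sketch is that the damage is silent: nothing raises an error, the pattern simply means something different from what was written.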
I would like feedback on this approach: https://gitlab.com/jeanas/lilypond/-/commits/regex-glib/

LilyPond requires GLib (for Pango), and GLib has a regex API wrapping that of PCRE, which is fully Unicode-aware. This branch wraps the GLib regex API into a Scheme API that LilyPond should then use.

On the plus side, it means we no longer have to worry about Unicode, eliminating the nasty trap that brought us a critical regression. On the minus side, it is ~250 lines of code, and I don't immediately see regexes in the current code base that would be problematic with Unicode.

Thoughts?

Regards,
Jean