Hi,

As shown by https://gitlab.com/lilypond/lilypond/-/issues/6463,
Guile regular expressions are a trap when it comes to Unicode.
Under a non-Unicode locale, characters that can't be expressed
in the locale encoding get converted to "?", both in the pattern
and the search string, before invoking the underlying POSIX regex
functions.
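
To make the trap concrete, here is a rough Python simulation of the
effect. It is not Guile's actual code: `to_locale` and
`locale_regex_search` are hypothetical helpers, and an ASCII encoding
with "replace" error handling stands in for a non-Unicode locale.

```python
import re


def to_locale(text, encoding="ascii"):
    """Mimic the lossy conversion: unencodable characters become "?"."""
    return text.encode(encoding, errors="replace").decode(encoding)


def locale_regex_search(pattern, text, encoding="ascii"):
    """Convert BOTH pattern and subject before matching (assumed model)."""
    return re.search(to_locale(pattern, encoding), to_locale(text, encoding))


# The pattern "né" silently turns into "n?", i.e. "an optional n",
# which matches the empty string in any subject.
converted = to_locale("né")
spurious = locale_regex_search("né", "xyz")

# A bracketed "é" turns into a literal "?", which then matches any
# other unencodable character in the subject.
confused = locale_regex_search("caf[é]", "cafè")
```

In this model the converted pattern is still a valid regex, so nothing
fails loudly; it just matches the wrong things.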

I would like feedback on this approach:

https://gitlab.com/jeanas/lilypond/-/commits/regex-glib/

LilyPond requires GLib (for Pango), and GLib has a regex
API wrapping that of PCRE, which is fully Unicode-aware.
This branch wraps the GLib regex API into a Scheme API that
LilyPond should then use.

On the plus side, it means we no longer have to worry about Unicode,
eliminating the nasty trap that brought us a critical regression.

On the minus side, it is ~250 lines of code, and I don't
immediately see regexes in the current code base that
would be problematic with Unicode.

Thoughts?

Regards,
Jean

