bug#60708: pcre: improve support for linking with a library without unicode

Carlo Arenas Wed, 11 Jan 2023 20:05:24 -0800

On Wed, Jan 11, 2023 at 6:29 PM Paul Eggert <egg...@cs.ucla.edu> wrote:
>
> Oh, I think see your point, but doesn't this mean that even my code was
> too trusting? It should be something like this:
>
>    if (localeinfo.multibyte)
>      {
>        uint32_t unicode;
>        if (! (localeinfo.using_utf8
>               && 0 <= pcre2_config (PCRE2_CONFIG_UNICODE, &unicode)
>              && unicode))
>          die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8
> locales"));
>        ...
>
> That is, we're better off diagnosing the problem and not attempting to
> use pcre2 if the result will be wrong (or even result in undefined
> behavior). The problem is unlikely to occur so it's good to be
> conservative here.


Maybe we are not clear on what the "problem" is.

The issue the original code was trying to avoid was to set PCRE_UTF if
the library doesn't have Unicode support, as that would block grep
with a PCRE error (as shown in the commit message), and which also
disabled some tests as it couldn't be differentiated with a failure in
grep because -P wasn't supported.  Your suggested code doesn't address
that, it merely changes the error message with one that would be IMHO
even less clear and worsens the problem.

Using a non Unicode PCRE library is perfectly fine, and there is no
"undefined behavior" risk, and indeed `grep -P` without the UTF flag
is exactly what the alternate path uses and what is recommended for
speed, so?

Carlo

bug#60708: pcre: improve support for linking with a library without unicode

Reply via email to