On Wed, Jan 11, 2023 at 6:29 PM Paul Eggert <egg...@cs.ucla.edu> wrote: > > Oh, I think see your point, but doesn't this mean that even my code was > too trusting? It should be something like this: > > if (localeinfo.multibyte) > { > uint32_t unicode; > if (! (localeinfo.using_utf8 > && 0 <= pcre2_config (PCRE2_CONFIG_UNICODE, &unicode) > && unicode)) > die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 > locales")); > ... > > That is, we're better off diagnosing the problem and not attempting to > use pcre2 if the result will be wrong (or even result in undefined > behavior). The problem is unlikely to occur so it's good to be > conservative here.
Maybe we are not clear on what the "problem" is. The issue the original code was trying to avoid was to set PCRE_UTF if the library doesn't have Unicode support, as that would block grep with a PCRE error (as shown in the commit message), and which also disabled some tests as it couldn't be differentiated with a failure in grep because -P wasn't supported. Your suggested code doesn't address that, it merely changes the error message with one that would be IMHO even less clear and worsens the problem. Using a non Unicode PCRE library is perfectly fine, and there is no "undefined behavior" risk, and indeed `grep -P` without the UTF flag is exactly what the alternate path uses and what is recommended for speed, so? Carlo