On 1/11/23 20:03, Carlo Arenas wrote:
> Your suggested code doesn't address
> that, it merely changes the error message with one that would be IMHO
> even less clear and worsens the problem.

In that case let's improve the error message wording; something like the
attached patch, say.

> Using a non Unicode PCRE library is perfectly fine, and there is no
> "undefined behavior" risk, and indeed `grep -P` without the UTF flag
> is exactly what the alternate path uses and what is recommended for
> speed, so?

It's not a question of undefined behavior. It's a question of whether
grep does what the user requested. Without the attached patch, in a
UTF-8 locale "grep -P '[[:alpha:]]'" won't report matching alphabetic
characters, if they're multibyte. Silent misbehavior is quite bad, and
it's better for grep to issue a diagnostic and exit than to silently do
the wrong thing.

From ad986ac2a50f98a8731a141a2d55c49b613e48e5 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 12 Jan 2023 19:35:08 -0800
Subject: [PATCH] grep: diagnose no UTF-8 support (Bug#60708)

* src/pcresearch.c (Pcompile): Issue a diagnostic and exit instead
of misbehaving if libpcre2 does not support the requested locale.
---
src/pcresearch.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index a8034fb..5b111be 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -145,10 +145,12 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
     = pcre2_general_context_create (private_malloc, private_free, NULL);
   pcre2_compile_context *ccontext = pcre2_compile_context_create (gcontext);
-  uint32_t unicode = 1;
-  pcre2_config (PCRE2_CONFIG_UNICODE, &unicode);
-  if (unicode && localeinfo.multibyte)
+  if (localeinfo.multibyte)
     {
+      uint32_t unicode;
+      if (pcre2_config (PCRE2_CONFIG_UNICODE, &unicode) < 0 || !unicode)
+        die (EXIT_TROUBLE, 0,
+             _("-P supports only unibyte locales on this platform"));
       if (! localeinfo.using_utf8)
         die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
       flags |= (PCRE2_UTF | PCRE2_UCP);
--
2.39.0