On 5/22/22 15:22, Paul Eggert wrote:
We've already uncovered one area, where \a doesn't work as expected and where a warning diagnostic would be helpful.
I installed the attached patch to cause 'grep' to warn about these. Comments welcome. Most of the changes were in Gnulib's dfa module, which see.
Here's another one, where an oddly-placed '*' doesn't work as one would expect: $ printf '*\na\n*a\n' | grep '\(*\)' * *a $ printf '*\na\n*a\n' | grep -E '(*)' grep: Unmatched ( or \( $ printf '*\na\n*a\n' | grep '\(*a\)' *a $ printf '*\na\n*a\n' | grep -E '(*a)' a*a
I plan to look at this next. We shouldn't warn about BREs like '\(*\)' and '\(*a\)' as these conform to POSIX and work fine. But it makes sense to warn about EREs like '(*)', '(*a)', '(+)', '(+a)', '({1})', '({1}a)' as POSIX does not specify their behavior, their semantics are unpredictable with GNU grep, and it's plausible that people are making mistakes in this area.
From 42db5cc8f58620b4c9c58a91c7683279c50503f9 Mon Sep 17 00:00:00 2001 From: Paul Eggert <egg...@cs.ucla.edu> Date: Mon, 23 May 2022 12:22:15 -0700 Subject: [PATCH 1/2] build: update gnulib submodule to latest --- gnulib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gnulib b/gnulib index b19a107..55d1a73 160000 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit b19a10775e54f8ed17e3a8c08a72d261d8c26244 +Subproject commit 55d1a73c1a79e94c443f51798c4c76449a0c7d62 -- 2.34.1
From e7f8e8eb1fd41b308ee10741bbd8068acc1847c2 Mon Sep 17 00:00:00 2001 From: Paul Eggert <egg...@cs.ucla.edu> Date: Mon, 23 May 2022 12:38:42 -0700 Subject: [PATCH 2/2] grep: warn about stray backslashes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This papers over a problem reported by Benno Schulenberg and Tomasz Dziendzielski <https://bugs.gnu.org/39678> involving regular expressions like \a that have unspecified behavior. * src/dfasearch.c (dfawarn): Just output a warning. Don’t exit, as DFA_CONFUSING_BRACKETS_ERROR now does that for us, and we need the ability to warn without exiting to diagnose \a etc. (GEAcompile): Use new dfa options DFA_CONFUSING_BRACKETS_ERROR and DFA_STRAY_BACKSLASH_WARN. --- NEWS | 6 ++++++ src/dfasearch.c | 7 ++++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index 38ac035..112d85b 100644 --- a/NEWS +++ b/NEWS @@ -11,6 +11,12 @@ GNU grep NEWS -*- outline -*- release 2.5.3 (2007), now warn that they are obsolescent and should be replaced by grep -E and grep -F. + Regular expressions with stray backslashes now cause warnings, as + their unspecified behavior can lead to unexpected results. + For example, '\a' and 'a' are not always equivalent + <https://bugs.gnu.org/39768>. The warnings are intended as a + transition aid; they are likely to be errors in future releases. + Regular expressions like [:space:] are now errors even if POSIXLY_CORRECT is set, since POSIX now allows the GNU behavior. diff --git a/src/dfasearch.c b/src/dfasearch.c index 8f48296..7547a8a 100644 --- a/src/dfasearch.c +++ b/src/dfasearch.c @@ -53,10 +53,10 @@ dfaerror (char const *mesg) die (EXIT_TROUBLE, 0, "%s", mesg); } -_Noreturn void +void dfawarn (char const *mesg) { - dfaerror (mesg); + error (0, 0, _("warning: %s"), mesg); } /* If the DFA turns out to have some set of fixed strings one of @@ -196,7 +196,8 @@ GEAcompile (char *pattern, idx_t size, reg_syntax_t syntax_bits, if (match_icase) syntax_bits |= RE_ICASE; - int dfaopts = eolbyte ? 0 : DFA_EOL_NUL; + int dfaopts = (DFA_CONFUSING_BRACKETS_ERROR | DFA_STRAY_BACKSLASH_WARN + | (eolbyte ? 0 : DFA_EOL_NUL)); dfasyntax (dc->dfa, &localeinfo, syntax_bits, dfaopts); bool bs_safe = !localeinfo.multibyte | localeinfo.using_utf8; -- 2.34.1