On 5/23/22 13:03, Paul Eggert wrote:
> it makes sense to warn about EREs like '(*)', '(*a)', '(+)', '(+a)', '({1})', '({1}a)' as POSIX does not specify their behavior, their semantics are unpredictable with GNU grep, and it's plausible that people are making mistakes in this area.
I installed the attached to do that. As before, most of the changes were in Gnulib's dfa module.
With all these changes, we now see behavior like this:

  $ echo a | src/grep -oi '\a'
  src/grep: warning: stray \ before a

which is not ideal (so I'll leave the bug report open), but at least this gives the user a warning that the pattern is not on the up-and-up.
From 682c5138bb65e90fba695a6bcb17b33d4ce7ed7f Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 24 May 2022 16:05:18 -0700
Subject: [PATCH 1/2] build: update gnulib submodule to latest

---
 gnulib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnulib b/gnulib
index 55d1a73..88d3598 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 55d1a73c1a79e94c443f51798c4c76449a0c7d62
+Subproject commit 88d3598a277061b855c778103c1f5a114ea0afd7
-- 
2.36.1

From 74daef5986e3d52f0ec85dc1aa5411f26ea92bbb Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 24 May 2022 16:14:12 -0700
Subject: [PATCH 2/2] =?UTF-8?q?grep:=20warn=20about=20=E2=80=98(+x)?=
 =?UTF-8?q?=E2=80=99=20etc.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

These expressions are not portable and don’t always work as
expected, so warn about them. For example, “grep -E '(+)'”
doesn’t act like “grep '\(\+\)'”.
* src/dfasearch.c (GEAcompile): Warn about a repetition op at the
start of a regular expression or subexpression, except for ‘*’ in
BREs which is portable.
---
 NEWS            | 5 ++++-
 src/dfasearch.c | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 112d85b..499eadf 100644
--- a/NEWS
+++ b/NEWS
@@ -14,7 +14,10 @@ GNU grep NEWS -*- outline -*-
   Regular expressions with stray backslashes now cause warnings, as
   their unspecified behavior can lead to unexpected results.
   For example, '\a' and 'a' are not always equivalent
-  <https://bugs.gnu.org/39768>. The warnings are intended as a
+  <https://bugs.gnu.org/39768>. Similarly, regular expressions or
+  subexpressions that start with a repetition operator now also cause
+  warnings due to their unspecified behavior; for example, *a(+b|{1}c)
+  now has three reasons to warn. The warnings are intended as a
   transition aid; they are likely to be errors in future releases.
 
   Regular expressions like [:space:] are now errors even if
diff --git a/src/dfasearch.c b/src/dfasearch.c
index 7547a8a..8d832f0 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -197,6 +197,8 @@ GEAcompile (char *pattern, idx_t size, reg_syntax_t syntax_bits,
   if (match_icase)
     syntax_bits |= RE_ICASE;
   int dfaopts = (DFA_CONFUSING_BRACKETS_ERROR | DFA_STRAY_BACKSLASH_WARN
+                 | DFA_PLUS_WARN
+                 | (syntax_bits & RE_CONTEXT_INDEP_OPS ? DFA_STAR_WARN : 0)
                  | (eolbyte ? 0 : DFA_EOL_NUL));
   dfasyntax (dc->dfa, &localeinfo, syntax_bits, dfaopts);
   bool bs_safe = !localeinfo.multibyte | localeinfo.using_utf8;
-- 
2.36.1
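
To make the "three reasons to warn" in the NEWS entry concrete, here is a rough standalone sketch of the kind of check involved. It is not the Gnulib dfa code: the function name count_leading_repetition_ops is made up for illustration, it handles only ERE syntax, it looks only at '*', '+' and '{' (the operators mentioned in this message), and it treats bracket expressions simplistically.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of grep or Gnulib.  Count the places
   in the ERE string ERE where '*', '+' or '{' appears at the start of
   the pattern or of a subexpression, i.e., right after '(' or '|'.
   Assumption: bracket expressions are skipped up to the first closing
   ']', which is only an approximation of the POSIX rules.  */
static int
count_leading_repetition_ops (const char *ere)
{
  int count = 0;
  bool at_start = true;  /* True at pattern start and after '(' or '|'.  */

  for (size_t i = 0; ere[i] != '\0'; i++)
    {
      char c = ere[i];

      if (c == '\\')
        {
          /* An escaped character is an ordinary atom; skip over it.  */
          if (ere[i + 1] != '\0')
            i++;
          at_start = false;
        }
      else if (c == '[')
        {
          /* Skip a bracket expression.  A ']' immediately after '['
             or '[^' is a literal member, not the terminator.  */
          size_t j = i + 1;
          if (ere[j] == '^')
            j++;
          if (ere[j] == ']')
            j++;
          while (ere[j] != '\0' && ere[j] != ']')
            j++;
          i = ere[j] == '\0' ? j - 1 : j;
          at_start = false;
        }
      else if (c == '(' || c == '|')
        at_start = true;
      else
        {
          if (at_start && (c == '*' || c == '+' || c == '{'))
            count++;
          at_start = false;
        }
    }

  return count;
}
```

Under these assumptions, the NEWS example *a(+b|{1}c) is counted three times: once for the leading '*', once for the '+' after '(', and once for the '{' after '|'. An ordinary pattern such as a* is not counted at all.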