On 5/23/22 13:03, Paul Eggert wrote:
> it makes sense to warn about EREs like '(*)', '(*a)', '(+)', '(+a)', '({1})', '({1}a)' as POSIX does not specify their behavior, their semantics are unpredictable with GNU grep, and it's plausible that people are making mistakes in this area.

I installed the attached to do that. As before, most of the changes were in Gnulib's dfa module.
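
The class of patterns in question, a repetition operator with nothing before it to repeat, can be sketched in C roughly as follows. This is only an illustration and not the code in grep or in Gnulib's dfa module: count_leading_repetition_ops is a hypothetical helper, it assumes that only the start of the pattern, '(' and '|' open a new context, it looks only at '*', '+' and '{' (the operators in the examples above), and it skips bracket expressions naively.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep or Gnulib: count the ERE
   repetition operators that appear where there is nothing to repeat,
   i.e. at the start of the pattern or right after '(' or '|'.  */
static int
count_leading_repetition_ops (const char *ere)
{
  int count = 0;
  bool at_start = true;   /* true when nothing precedes the next token */

  for (size_t i = 0; ere[i] != '\0'; i++)
    {
      char c = ere[i];

      if (c == '\\' && ere[i + 1] != '\0')
        {
          /* An escaped character is treated as an ordinary token.  */
          i++;
          at_start = false;
        }
      else if (c == '[')
        {
          /* Naive skip of a bracket expression; a ']' right after
             '[' or '[^' is taken as a literal member.  */
          size_t j = i + 1;
          if (ere[j] == '^')
            j++;
          if (ere[j] == ']')
            j++;
          while (ere[j] != '\0' && ere[j] != ']')
            j++;
          if (ere[j] == ']')
            i = j;
          at_start = false;
        }
      else if (c == '(' || c == '|')
        at_start = true;
      else
        {
          if (at_start && (c == '*' || c == '+' || c == '{'))
            count++;
          at_start = false;
        }
    }
  return count;
}
```

Under these assumptions, the pattern *a(+b|{1}c) yields 3, each of the six EREs quoted above yields 1, and a*(b+) yields 0.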

With all these changes, we now see behavior like this:

$ echo a | src/grep -oi '\a'
src/grep: warning: stray \ before a

which is not ideal (so I'll leave the bug report open), but at least this gives the user a warning that the pattern is not on the up-and-up.
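
For readers who want the gist of the second patch without reading the diff, here is a small standalone sketch of how the warning options are chosen. The SKETCH_ names and their numeric values are made up for illustration; the real flags are DFA_PLUS_WARN, DFA_STAR_WARN and RE_CONTEXT_INDEP_OPS, and sketch_warn_opts is a hypothetical function that mirrors only the two lines the patch adds.

```c
/* Illustrative stand-ins with made-up values; the real flags are
   defined in Gnulib's dfa.h and regex.h.  */
enum
{
  SKETCH_RE_CONTEXT_INDEP_OPS = 1 << 0,  /* set for EREs, clear for BREs */
  SKETCH_DFA_STAR_WARN = 1 << 1,
  SKETCH_DFA_PLUS_WARN = 1 << 2
};

/* Hypothetical function: the '+' warning is always requested, while
   the '*' warning is requested only when the syntax treats operators
   as context-independent, since a leading '*' is portable in BREs.  */
static int
sketch_warn_opts (int syntax_bits)
{
  return (SKETCH_DFA_PLUS_WARN
          | (syntax_bits & SKETCH_RE_CONTEXT_INDEP_OPS
             ? SKETCH_DFA_STAR_WARN : 0));
}
```

With ERE-style syntax bits the sketch returns both warning flags; with BRE-style bits it returns only the '+' flag, which matches the exception for '*' described in the commit message.
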
From 682c5138bb65e90fba695a6bcb17b33d4ce7ed7f Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 24 May 2022 16:05:18 -0700
Subject: [PATCH 1/2] build: update gnulib submodule to latest

---
 gnulib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnulib b/gnulib
index 55d1a73..88d3598 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 55d1a73c1a79e94c443f51798c4c76449a0c7d62
+Subproject commit 88d3598a277061b855c778103c1f5a114ea0afd7
-- 
2.36.1

From 74daef5986e3d52f0ec85dc1aa5411f26ea92bbb Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 24 May 2022 16:14:12 -0700
Subject: [PATCH 2/2] grep: warn about ‘(+x)’ etc.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

These expressions are not portable and don’t always work as
expected, so warn about them.  For example, “grep -E '(+)'”
doesn’t act like “grep '\(\+\)'”.
* src/dfasearch.c (GEAcompile): Warn about a repetition op at the
start of a regular expression or subexpression, except for ‘*’ in
BREs which is portable.
---
 NEWS            | 5 ++++-
 src/dfasearch.c | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 112d85b..499eadf 100644
--- a/NEWS
+++ b/NEWS
@@ -14,7 +14,10 @@ GNU grep NEWS                                    -*- outline -*-
   Regular expressions with stray backslashes now cause warnings, as
   their unspecified behavior can lead to unexpected results.
   For example, '\a' and 'a' are not always equivalent
-  <https://bugs.gnu.org/39768>.  The warnings are intended as a
+  <https://bugs.gnu.org/39768>.  Similarly, regular expressions or
+  subexpressions that start with a repetition operator now also cause
+  warnings due to their unspecified behavior; for example, *a(+b|{1}c)
+  now has three reasons to warn.  The warnings are intended as a
   transition aid; they are likely to be errors in future releases.
 
   Regular expressions like [:space:] are now errors even if
diff --git a/src/dfasearch.c b/src/dfasearch.c
index 7547a8a..8d832f0 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -197,6 +197,8 @@ GEAcompile (char *pattern, idx_t size, reg_syntax_t syntax_bits,
   if (match_icase)
     syntax_bits |= RE_ICASE;
   int dfaopts = (DFA_CONFUSING_BRACKETS_ERROR | DFA_STRAY_BACKSLASH_WARN
+                 | DFA_PLUS_WARN
+                 | (syntax_bits & RE_CONTEXT_INDEP_OPS ? DFA_STAR_WARN : 0)
                  | (eolbyte ? 0 : DFA_EOL_NUL));
   dfasyntax (dc->dfa, &localeinfo, syntax_bits, dfaopts);
   bool bs_safe = !localeinfo.multibyte | localeinfo.using_utf8;
-- 
2.36.1
