On 11/9/20 9:34 AM, JIang Yuancheng wrote:
grep -E “.*{10,}{10,}{10,}{10,}{10,}” can exhaust stack space then stack 
overflow comes out. (Tested on latest version 3.6)

This is a longstanding issue with the regex matcher. I installed the attached patch to document the issue better. Fortunately, the problem is mostly limited to contrived examples.
>From f0d97db2a2104c5fd558178713054f3f267623b2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 27 Aug 2021 18:20:58 -0700
Subject: [PATCH] doc: document interval expression limitations

* doc/grep.texi (Basic vs Extended, Performance):
Document limitations of interval expressions (Bug#44538).
---
 doc/grep.texi | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/doc/grep.texi b/doc/grep.texi
index b92ecb7..e5b9fd8 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1526,7 +1526,7 @@ before an interval expression's closing @samp{@}}, and an unmatched
 @code{\)} is invalid.
 
 Portable scripts should avoid the following constructs, as
-POSIX says they produce undefined results:
+POSIX says they produce unspecified results:
 
 @itemize @bullet
 @item
@@ -1541,6 +1541,8 @@ Empty alternatives (as in, e.g, @samp{a|}).
 Repetition operators that immediately follow empty expressions,
 unescaped @samp{$}, or other repetition operators.
 @item
+Interval expressions containing repetition counts greater than 255.
+@item
 A backslash escaping an ordinary character (e.g., @samp{\S}),
 unless it is a back-reference.
 @item
@@ -1965,6 +1967,17 @@ bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be
 surprisingly inefficient due to difficulties in fast portable access to
 concepts like multi-character collating elements.
 
+@cindex interval expressions
+Interval expressions may be implemented internally via repetition.
+For example, @samp{^(a|bc)@{2,4@}$} might be implemented as
+@samp{^(a|bc)(a|bc)((a|bc)(a|bc)?)?$}.  A large repetition count may
+exhaust memory or greatly slow matching.  Even small counts can cause
+problems if cascaded; for example, @samp{grep -E
+".*@{10,@}@{10,@}@{10,@}@{10,@}@{10,@}"} is likely to overflow a
+stack.  Fortunately, regular expressions like these are typically
+artificial, and cascaded repetitions do not conform to POSIX so cannot
+be used in portable programs anyway.
+
 @cindex back-references
 A back-reference such as @samp{\1} can hurt performance significantly
 in some cases, since back-references cannot in general be implemented
-- 
2.30.2

Reply via email to