On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote:
> Sadly, hadn't been able to generate a release,

Does this mean you're having trouble running 'make dist'? If so, what's the trouble?


> it seems to be ready for some broader testing, specially if the
> attached patch is applied on top of a 10.37 release (tested that way
> in OpenBSD i386)

OK, thanks. I installed it into the Savannah master copy of GNU grep, except that I didn't rename m4/pcre.m4 to m4/pcre2.m4 or rename the macros to use PCRE2. This made the change easier to audit. Revised patch 0001 attached.

Also, I followed up with several related patches (also attached as 0002-0012). Please take a look at them and let us know of any problems. In the attached patch "grep: prefer signed integers" I followed the usual grep approach of preferring signed to unsigned integers (e.g., idx_t to size_t) when either will do; this lets us debug better with -fsanitize=undefined to catch integer overflow.

One issue I discovered: PCRE2_EXTRA_MATCH_WORD (which is used by pcre2grep -w) is incompatible with 'grep -w'. For example, 'echo a%%a | grep -Pw %%' outputs nothing, whereas 'echo a%%a | pcre2grep -w %%' outputs 'a%%a'. I think the GNU grep behavior (which is the same as with 'grep -w', either on Linux or OpenBSD) is more intuitive here: do you happen to know why PCRE behaves the way it does? Is that worth a PCRE2 bug report? Anyway, the attached patches avoid using PCRE2_EXTRA_MATCH_WORD for that reason.


> * no more version restrictions (should work with >~10.20)

I tested with 10.00 and found one more glitch (it doesn't have PCRE2_SIZE_MAX), which is fixed by the attached patch "grep: port to PCRE2 10.20".


> Pending:
> * what to do with the current support of \C (enabled for now)

Let's open another bug report for that; I'm still a bit fuzzy about what the pros and cons are.


> * merge of non critical bugfix in #51710[1]

I plan to follow up in that bug report.

Marking this bug as done. Thanks again for working on this.
From 3bf3f812183abadf2931e026063f7c20b2f4ce56 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <care...@gmail.com>
Date: Fri, 12 Nov 2021 16:45:04 -0800
Subject: [PATCH 01/12] grep: migrate to pcre2

Mostly a bug-for-bug translation of the original code to the PCRE2
API.  The code could still use some optimization, but it should be
good as a starting point.

The API changes the signedness of some types, so some ugly casts
were needed; some of the other changes just make sure that all
variables fit the newer types better.

Includes backward compatibility and could be made to build all the
way back to 10.00, but assumes a recent enough version; it has been
tested with 10.23 (the oldest, from CentOS 7).

Performance seems equivalent, and it also seems functionally complete.

* m4/pcre.m4 (gl_FUNC_PCRE): Check for PCRE2, not the original PCRE.
* src/pcresearch.c (struct pcre_comp, jit_exec)
(Pcompile, Pexecute):
Use PCRE2, not the original PCRE.
* tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics.
---
 doc/grep.in.1            |   8 +-
 doc/grep.texi            |   2 +-
 m4/pcre.m4               |  21 ++--
 src/pcresearch.c         | 249 +++++++++++++++++++--------------------
 tests/filename-lineno.pl |   4 +-
 5 files changed, 138 insertions(+), 146 deletions(-)

diff --git a/doc/grep.in.1 b/doc/grep.in.1
index b014f65..208cb76 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -756,7 +756,7 @@ In other implementations, basic regular expressions are less powerful.
 The following description applies to extended regular expressions;
 differences for basic regular expressions are summarized afterwards.
 Perl-compatible regular expressions give additional functionality, and are
-documented in B<pcresyntax>(3) and B<pcrepattern>(3), but work only if
+documented in B<pcre2syntax>(3) and B<pcre2pattern>(3), but work only if
 PCRE support is enabled.
 .PP
 The fundamental building blocks are the regular expressions
@@ -1360,9 +1360,9 @@ from the globbing syntax that the shell uses to match file names.
 .BR sort (1),
 .BR xargs (1),
 .BR read (2),
-.BR pcre (3),
-.BR pcresyntax (3),
-.BR pcrepattern (3),
+.BR pcre2 (3),
+.BR pcre2syntax (3),
+.BR pcre2pattern (3),
 .BR terminfo (5),
 .BR glob (7),
 .BR regex (7)
diff --git a/doc/grep.texi b/doc/grep.texi
index e5b9fd8..c3c4bbf 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1168,7 +1168,7 @@ In other implementations, basic regular expressions are less powerful.
 The following description applies to extended regular expressions;
 differences for basic regular expressions are summarized afterwards.
 Perl-compatible regular expressions give additional functionality, and
-are documented in the @i{pcresyntax}(3) and @i{pcrepattern}(3) manual
+are documented in the @i{pcre2syntax}(3) and @i{pcre2pattern}(3) manual
 pages, but work only if PCRE is available in the system.
 
 @menu
diff --git a/m4/pcre.m4 b/m4/pcre.m4
index 78b7fda..a1c6c82 100644
--- a/m4/pcre.m4
+++ b/m4/pcre.m4
@@ -1,4 +1,4 @@
-# pcre.m4 - check for libpcre support
+# pcre.m4 - check for PCRE library support
 
 # Copyright (C) 2010-2021 Free Software Foundation, Inc.
 # This file is free software; the Free Software Foundation
@@ -9,7 +9,7 @@ AC_DEFUN([gl_FUNC_PCRE],
 [
   AC_ARG_ENABLE([perl-regexp],
     AS_HELP_STRING([--disable-perl-regexp],
-                   [disable perl-regexp (pcre) support]),
+                   [disable perl-regexp (pcre2) support]),
     [case $enableval in
        yes|no) test_pcre=$enableval;;
        *) AC_MSG_ERROR([invalid value $enableval for --disable-perl-regexp]);;
@@ -21,24 +21,25 @@ AC_DEFUN([gl_FUNC_PCRE],
   use_pcre=no
 
   if test $test_pcre != no; then
-    PKG_CHECK_MODULES([PCRE], [libpcre], [], [: ${PCRE_LIBS=-lpcre}])
+    PKG_CHECK_MODULES([PCRE], [libpcre2-8], [], [: ${PCRE_LIBS=-lpcre2-8}])
 
-    AC_CACHE_CHECK([for pcre_compile], [pcre_cv_have_pcre_compile],
+    AC_CACHE_CHECK([for pcre2_compile], [pcre_cv_have_pcre2_compile],
       [pcre_saved_CFLAGS=$CFLAGS
        pcre_saved_LIBS=$LIBS
        CFLAGS="$CFLAGS $PCRE_CFLAGS"
        LIBS="$PCRE_LIBS $LIBS"
        AC_LINK_IFELSE(
-         [AC_LANG_PROGRAM([[#include <pcre.h>
+         [AC_LANG_PROGRAM([[#define PCRE2_CODE_UNIT_WIDTH 8
+                            #include <pcre2.h>
                           ]],
-            [[pcre *p = pcre_compile (0, 0, 0, 0, 0);
+            [[pcre2_code *p = pcre2_compile (0, 0, 0, 0, 0, 0);
               return !p;]])],
-         [pcre_cv_have_pcre_compile=yes],
-         [pcre_cv_have_pcre_compile=no])
+         [pcre_cv_have_pcre2_compile=yes],
+         [pcre_cv_have_pcre2_compile=no])
        CFLAGS=$pcre_saved_CFLAGS
        LIBS=$pcre_saved_LIBS])
 
-    if test "$pcre_cv_have_pcre_compile" = yes; then
+    if test "$pcre_cv_have_pcre2_compile" = yes; then
       use_pcre=yes
     elif test $test_pcre = maybe; then
       AC_MSG_WARN([AC_PACKAGE_NAME will be built without pcre support.])
@@ -50,7 +51,7 @@ AC_DEFUN([gl_FUNC_PCRE],
   if test $use_pcre = yes; then
     AC_DEFINE([HAVE_LIBPCRE], [1],
       [Define to 1 if you have the Perl Compatible Regular Expressions
-       library (-lpcre).])
+       library (-lpcre2).])
   else
     PCRE_CFLAGS=
     PCRE_LIBS=
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 09f92c8..630678b 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -17,41 +17,32 @@
    02110-1301, USA.  */
 
 /* Written August 1992 by Mike Haertel. */
+/* Updated for PCRE2 by Carlo Arenas. */
 
 #include <config.h>
 #include "search.h"
 #include "die.h"
 
-#include <pcre.h>
+#define PCRE2_CODE_UNIT_WIDTH 8
+#include <pcre2.h>
 
-/* This must be at least 2; everything after that is for performance
-   in pcre_exec.  */
-enum { NSUB = 300 };
-
-#ifndef PCRE_EXTRA_MATCH_LIMIT_RECURSION
-# define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0
-#endif
-#ifndef PCRE_STUDY_JIT_COMPILE
-# define PCRE_STUDY_JIT_COMPILE 0
-#endif
-#ifndef PCRE_STUDY_EXTRA_NEEDED
-# define PCRE_STUDY_EXTRA_NEEDED 0
+/* Needed for backward compatibility for PCRE2 < 10.30  */
+#ifndef PCRE2_CONFIG_DEPTHLIMIT
+#define PCRE2_CONFIG_DEPTHLIMIT PCRE2_CONFIG_RECURSIONLIMIT
+#define PCRE2_ERROR_DEPTHLIMIT  PCRE2_ERROR_RECURSIONLIMIT
+#define pcre2_set_depth_limit   pcre2_set_recursion_limit
 #endif
 
 struct pcre_comp
 {
-  /* Compiled internal form of a Perl regular expression.  */
-  pcre *cre;
-
-  /* Additional information about the pattern.  */
-  pcre_extra *extra;
-
-#if PCRE_STUDY_JIT_COMPILE
   /* The JIT stack and its maximum size.  */
-  pcre_jit_stack *jit_stack;
-  int jit_stack_size;
-#endif
+  pcre2_jit_stack *jit_stack;
+  PCRE2_SIZE jit_stack_size;
 
+  /* Compiled internal form of a Perl regular expression.  */
+  pcre2_code *cre;
+  pcre2_match_context *mcontext;
+  pcre2_match_data *data;
   /* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
      string matches when that flag is used.  */
   int empty_match[2];
@@ -60,54 +51,49 @@ struct pcre_comp
 
 /* Match the already-compiled PCRE pattern against the data in SUBJECT,
    of size SEARCH_BYTES and starting with offset SEARCH_OFFSET, with
-   options OPTIONS, and storing resulting matches into SUB.  Return
-   the (nonnegative) match location or a (negative) error number.  */
+   options OPTIONS.
+   Return the (nonnegative) match count or a (negative) error number.  */
 static int
-jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
-          int search_offset, int options, int *sub)
+jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
+          PCRE2_SIZE search_offset, int options)
 {
   while (true)
     {
-      int e = pcre_exec (pc->cre, pc->extra, subject, search_bytes,
-                         search_offset, options, sub, NSUB);
-
-#if PCRE_STUDY_JIT_COMPILE
-      /* Going over this would trigger an int overflow bug within PCRE.  */
-      int jitstack_max = INT_MAX - 8 * 1024;
-
-      if (e == PCRE_ERROR_JIT_STACKLIMIT
-          && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
+      int e = pcre2_match (pc->cre, (PCRE2_SPTR)subject, search_bytes,
+                           search_offset, options, pc->data, pc->mcontext);
+      if (e == PCRE2_ERROR_JIT_STACKLIMIT
+          && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
         {
-          int old_size = pc->jit_stack_size;
-          int new_size = pc->jit_stack_size = old_size * 2;
+          PCRE2_SIZE old_size = pc->jit_stack_size;
+          PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2;
+
           if (pc->jit_stack)
-            pcre_jit_stack_free (pc->jit_stack);
-          pc->jit_stack = pcre_jit_stack_alloc (old_size, new_size);
-          if (!pc->jit_stack)
+            pcre2_jit_stack_free (pc->jit_stack);
+          pc->jit_stack = pcre2_jit_stack_create (old_size, new_size, NULL);
+
+          if (!pc->mcontext)
+            pc->mcontext = pcre2_match_context_create (NULL);
+
+          if (!pc->jit_stack || !pc->mcontext)
             die (EXIT_TROUBLE, 0,
                  _("failed to allocate memory for the PCRE JIT stack"));
-          pcre_assign_jit_stack (pc->extra, NULL, pc->jit_stack);
+          pcre2_jit_stack_assign (pc->mcontext, NULL, pc->jit_stack);
           continue;
         }
-#endif
-
-#if PCRE_EXTRA_MATCH_LIMIT_RECURSION
-      if (e == PCRE_ERROR_RECURSIONLIMIT
-          && (PCRE_STUDY_EXTRA_NEEDED || pc->extra))
+      if (e == PCRE2_ERROR_DEPTHLIMIT)
         {
-          unsigned long lim
-            = (pc->extra->flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION
-               ? pc->extra->match_limit_recursion
-               : 0);
-          if (lim <= ULONG_MAX / 2)
-            {
-              pc->extra->match_limit_recursion = lim ? 2 * lim : (1 << 24) - 1;
-              pc->extra->flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION;
-              continue;
-            }
-        }
-#endif
+          uint32_t lim;
+          pcre2_config (PCRE2_CONFIG_DEPTHLIMIT, &lim);
+          if (lim >= UINT32_MAX / 2)
+            return e;
+
+          lim <<= 1;
+          if (!pc->mcontext)
+            pc->mcontext = pcre2_match_context_create (NULL);
 
+          pcre2_set_depth_limit (pc->mcontext, lim);
+          continue;
+        }
       return e;
     }
 }
@@ -118,27 +104,35 @@ jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
 void *
 Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
 {
-  int e;
-  char const *ep;
+  PCRE2_SIZE e;
+  int ec;
+  PCRE2_UCHAR8 ep[128]; /* 120 code units is suggested to avoid truncation  */
   static char const wprefix[] = "(?<!\\w)(?:";
   static char const wsuffix[] = ")(?!\\w)";
   static char const xprefix[] = "^(?:";
   static char const xsuffix[] = ")$";
   int fix_len_max = MAX (sizeof wprefix - 1 + sizeof wsuffix - 1,
                          sizeof xprefix - 1 + sizeof xsuffix - 1);
-  char *re = xnmalloc (4, size + (fix_len_max + 4 - 1) / 4);
-  int flags = PCRE_DOLLAR_ENDONLY | (match_icase ? PCRE_CASELESS : 0);
+  unsigned char *re = xmalloc (size + fix_len_max + 1);
+  int flags = PCRE2_DOLLAR_ENDONLY | (match_icase ? PCRE2_CASELESS : 0);
   char *patlim = pattern + size;
-  char *n = re;
-  char const *p;
-  char const *pnul;
+  char *n = (char *)re;
   struct pcre_comp *pc = xcalloc (1, sizeof (*pc));
+  pcre2_compile_context *ccontext = pcre2_compile_context_create(NULL);
 
   if (localeinfo.multibyte)
     {
       if (! localeinfo.using_utf8)
         die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
-      flags |= PCRE_UTF8;
+      flags |= PCRE2_UTF;
+#if 0
+      /* do not match individual code units but only UTF-8  */
+      flags |= PCRE2_NEVER_BACKSLASH_C;
+#endif
+#ifdef PCRE2_MATCH_INVALID_UTF
+      /* consider invalid UTF-8 as a barrier, instead of error  */
+      flags |= PCRE2_MATCH_INVALID_UTF;
+#endif
     }
 
   /* FIXME: Remove this restriction.  */
@@ -151,56 +145,42 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   if (match_lines)
     strcpy (n, xprefix);
   n += strlen (n);
-
-  /* The PCRE interface doesn't allow NUL bytes in the pattern, so
-     replace each NUL byte in the pattern with the four characters
-     "\000", removing a preceding backslash if there are an odd
-     number of backslashes before the NUL.  */
-  *patlim = '\0';
-  for (p = pattern; (pnul = p + strlen (p)) < patlim; p = pnul + 1)
+  memcpy (n, pattern, size);
+  n += size;
+  if (match_words && !match_lines)
     {
-      memcpy (n, p, pnul - p);
-      n += pnul - p;
-      for (p = pnul; pattern < p && p[-1] == '\\'; p--)
-        continue;
-      n -= (pnul - p) & 1;
-      strcpy (n, "\\000");
-      n += 4;
-    }
-  memcpy (n, p, patlim - p + 1);
-  n += patlim - p;
-  *patlim = '\n';
-
-  if (match_words)
     strcpy (n, wsuffix);
+    n += strlen(wsuffix);
+    }
   if (match_lines)
+    {
     strcpy (n, xsuffix);
+    n += strlen(xsuffix);
+    }
 
-  pc->cre = pcre_compile (re, flags, &ep, &e, pcre_maketables ());
+  pcre2_set_character_tables (ccontext, pcre2_maketables (NULL));
+  pc->cre = pcre2_compile (re, n - (char *)re, flags, &ec, &e, ccontext);
   if (!pc->cre)
-    die (EXIT_TROUBLE, 0, "%s", ep);
-
-  int pcre_study_flags = PCRE_STUDY_EXTRA_NEEDED | PCRE_STUDY_JIT_COMPILE;
-  pc->extra = pcre_study (pc->cre, pcre_study_flags, &ep);
-  if (ep)
-    die (EXIT_TROUBLE, 0, "%s", ep);
+    {
+      pcre2_get_error_message (ec, ep, sizeof (ep));
+      die (EXIT_TROUBLE, 0, "%s", ep);
+    }
 
-#if PCRE_STUDY_JIT_COMPILE
-  if (pcre_fullinfo (pc->cre, pc->extra, PCRE_INFO_JIT, &e))
-    die (EXIT_TROUBLE, 0, _("internal error (should never happen)"));
+  pc->data = pcre2_match_data_create_from_pattern (pc->cre, NULL);
 
-  /* The PCRE documentation says that a 32 KiB stack is the default.  */
-  if (e)
-    pc->jit_stack_size = 32 << 10;
-#endif
+  ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
+  if (ec && ec != PCRE2_ERROR_JIT_BADOPTION && ec != PCRE2_ERROR_NOMEMORY)
+    die (EXIT_TROUBLE, 0, _("JIT internal error: %d"), ec);
+  else
+    {
+      /* The PCRE documentation says that a 32 KiB stack is the default.  */
+      pc->jit_stack_size = 32 << 10;
+    }
 
   free (re);
 
-  int sub[NSUB];
-  pc->empty_match[false] = pcre_exec (pc->cre, pc->extra, "", 0, 0,
-                                      PCRE_NOTBOL, sub, NSUB);
-  pc->empty_match[true] = pcre_exec (pc->cre, pc->extra, "", 0, 0, 0, sub,
-                                     NSUB);
+  pc->empty_match[false] = jit_exec (pc, "", 0, 0, PCRE2_NOTBOL);
+  pc->empty_match[true] = jit_exec (pc, "", 0, 0, 0);
 
   return pc;
 }
@@ -209,15 +189,15 @@ ptrdiff_t
 Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
           char const *start_ptr)
 {
-  int sub[NSUB];
   char const *p = start_ptr ? start_ptr : buf;
   bool bol = p[-1] == eolbyte;
   char const *line_start = buf;
-  int e = PCRE_ERROR_NOMATCH;
+  int e = PCRE2_ERROR_NOMATCH;
   char const *line_end;
   struct pcre_comp *pc = vcp;
+  PCRE2_SIZE *sub = pcre2_get_ovector_pointer (pc->data);
 
-  /* The search address to pass to pcre_exec.  This is the start of
+  /* The search address to pass to PCRE.  This is the start of
      the buffer, or just past the most-recently discovered encoding
      error or line end.  */
   char const *subject = buf;
@@ -229,14 +209,14 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
          better and the correctness issues were too puzzling.  See
          Bug#22655.  */
       line_end = rawmemchr (p, eolbyte);
-      if (INT_MAX < line_end - p)
+      if (PCRE2_SIZE_MAX < line_end - p)
         die (EXIT_TROUBLE, 0, _("exceeded PCRE's line length limit"));
 
       for (;;)
         {
           /* Skip past bytes that are easily determined to be encoding
              errors, treating them as data that cannot match.  This is
-             faster than having pcre_exec check them.  */
+             faster than having PCRE check them.  */
           while (localeinfo.sbclen[to_uchar (*p)] == -1)
             {
               p++;
@@ -244,10 +224,10 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
               bol = false;
             }
 
-          int search_offset = p - subject;
+          PCRE2_SIZE search_offset = p - subject;
 
           /* Check for an empty match; this is faster than letting
-             pcre_exec do it.  */
+             PCRE do it.  */
           if (p == line_end)
             {
               sub[0] = sub[1] = search_offset;
@@ -257,13 +237,14 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
 
           int options = 0;
           if (!bol)
-            options |= PCRE_NOTBOL;
+            options |= PCRE2_NOTBOL;
 
-          e = jit_exec (pc, subject, line_end - subject, search_offset,
-                        options, sub);
-          if (e != PCRE_ERROR_BADUTF8)
+          e = jit_exec (pc, subject, line_end - subject,
+                        search_offset, options);
+          /* PCRE2 provides 22 different error codes for bad UTF-8  */
+          if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))
             break;
-          int valid_bytes = sub[0];
+          PCRE2_SIZE valid_bytes = pcre2_get_startchar (pc->data);
 
           if (search_offset <= valid_bytes)
             {
@@ -273,14 +254,15 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
                   /* Handle the empty-match case specially, for speed.
                      This optimization is valid if VALID_BYTES is zero,
                      which means SEARCH_OFFSET is also zero.  */
+                  sub[0] = valid_bytes;
                   sub[1] = 0;
                   e = pc->empty_match[bol];
                 }
               else
                 e = jit_exec (pc, subject, valid_bytes, search_offset,
-                              options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL, sub);
+                              options | PCRE2_NO_UTF_CHECK | PCRE2_NOTEOL);
 
-              if (e != PCRE_ERROR_NOMATCH)
+              if (e != PCRE2_ERROR_NOMATCH)
                 break;
 
               /* Treat the encoding error as data that cannot match.  */
@@ -291,7 +273,7 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
           subject += valid_bytes + 1;
         }
 
-      if (e != PCRE_ERROR_NOMATCH)
+      if (e != PCRE2_ERROR_NOMATCH)
         break;
       bol = true;
       p = subject = line_start = line_end + 1;
@@ -302,26 +284,35 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
     {
       switch (e)
         {
-        case PCRE_ERROR_NOMATCH:
+        case PCRE2_ERROR_NOMATCH:
           break;
 
-        case PCRE_ERROR_NOMEMORY:
+        case PCRE2_ERROR_NOMEMORY:
           die (EXIT_TROUBLE, 0, _("%s: memory exhausted"), input_filename ());
 
-#if PCRE_STUDY_JIT_COMPILE
-        case PCRE_ERROR_JIT_STACKLIMIT:
+        case PCRE2_ERROR_JIT_STACKLIMIT:
           die (EXIT_TROUBLE, 0, _("%s: exhausted PCRE JIT stack"),
                input_filename ());
-#endif
 
-        case PCRE_ERROR_MATCHLIMIT:
+        case PCRE2_ERROR_MATCHLIMIT:
           die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's backtracking limit"),
                input_filename ());
 
-        case PCRE_ERROR_RECURSIONLIMIT:
-          die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's recursion limit"),
+        case PCRE2_ERROR_DEPTHLIMIT:
+          die (EXIT_TROUBLE, 0,
+               _("%s: exceeded PCRE's nested backtracking limit"),
                input_filename ());
 
+        case PCRE2_ERROR_RECURSELOOP:
+          die (EXIT_TROUBLE, 0, _("%s: PCRE detected recurse loop"),
+               input_filename ());
+
+#ifdef PCRE2_ERROR_HEAPLIMIT
+        case PCRE2_ERROR_HEAPLIMIT:
+          die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's heap limit"),
+               input_filename ());
+#endif
+
         default:
           /* For now, we lump all remaining PCRE failures into this basket.
              If anyone cares to provide sample grep usage that can trigger
diff --git a/tests/filename-lineno.pl b/tests/filename-lineno.pl
index 1e84b45..1ff3d6a 100755
--- a/tests/filename-lineno.pl
+++ b/tests/filename-lineno.pl
@@ -101,13 +101,13 @@ my @Tests =
    ],
    ['invalid-re-P-paren', '-P ")"', {EXIT=>2},
     {ERR => $ENV{PCRE_WORKS} == 1
-       ? "$prog: unmatched parentheses\n"
+       ? "$prog: unmatched closing parenthesis\n"
        : $no_pcre
     },
    ],
    ['invalid-re-P-star-paren', '-P "a.*)"', {EXIT=>2},
     {ERR => $ENV{PCRE_WORKS} == 1
-       ? "$prog: unmatched parentheses\n"
+       ? "$prog: unmatched closing parenthesis\n"
        : $no_pcre
     },
    ],
-- 
2.32.0

From b6b43bea32b5e059929fa58ed8ad182fe5fa1ecd Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 12 Nov 2021 16:56:53 -0800
Subject: [PATCH 02/12] maint: minor rewording and reindenting

---
 NEWS             |  4 ++++
 TODO             |  4 ++--
 m4/pcre.m4       |  8 ++++----
 src/pcresearch.c | 44 ++++++++++++++++++++++----------------------
 tests/pcre-abort |  2 +-
 5 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/NEWS b/NEWS
index 4a62fb7..2f63071 100644
--- a/NEWS
+++ b/NEWS
@@ -4,10 +4,14 @@ GNU grep NEWS                                    -*- outline -*-
 
 ** Changes in behavior
 
+  The -P option is now based on PCRE2 instead of the older PCRE,
+  thanks to code contributed by Carlo Arenas.
+
   The egrep and fgrep commands, which have been deprecated since
   release 2.5.3 (2007), now warn that they are obsolescent and should
   be replaced by grep -E and grep -F.
 
+
 * Noteworthy changes in release 3.7 (2021-08-14) [stable]
 
 ** Changes in behavior
diff --git a/TODO b/TODO
index 5211ac1..0b82eff 100644
--- a/TODO
+++ b/TODO
@@ -31,13 +31,13 @@ GNU grep originally did 32-bit arithmetic.  Although it has moved to
 64-bit on 64-bit platforms by using types like ptrdiff_t and size_t,
 this conversion has not been entirely systematic and should be checked.
 
-Lazy dynamic linking of libpcre.  See Debian’s 03-397262-dlopen-pcre.patch.
+Lazy dynamic linking of the PCRE library.
 
 Check FreeBSD’s integration of zgrep (-Z) and bzgrep (-J) in one
 binary.  Is there a possibility of doing even better by automatically
 checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
 0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?  Once what to do with
-libpcre is decided, do the same for libz and libbz2.
+the PCRE library is decided, do the same for libz and libbz2.
 
 
 ===================
diff --git a/m4/pcre.m4 b/m4/pcre.m4
index a1c6c82..970a229 100644
--- a/m4/pcre.m4
+++ b/m4/pcre.m4
@@ -9,7 +9,7 @@ AC_DEFUN([gl_FUNC_PCRE],
 [
   AC_ARG_ENABLE([perl-regexp],
     AS_HELP_STRING([--disable-perl-regexp],
-                   [disable perl-regexp (pcre2) support]),
+                   [disable perl-regexp (PCRE) support]),
     [case $enableval in
        yes|no) test_pcre=$enableval;;
        *) AC_MSG_ERROR([invalid value $enableval for --disable-perl-regexp]);;
@@ -42,16 +42,16 @@ AC_DEFUN([gl_FUNC_PCRE],
     if test "$pcre_cv_have_pcre2_compile" = yes; then
       use_pcre=yes
     elif test $test_pcre = maybe; then
-      AC_MSG_WARN([AC_PACKAGE_NAME will be built without pcre support.])
+      AC_MSG_WARN([AC_PACKAGE_NAME will be built without PCRE support.])
     else
-      AC_MSG_ERROR([pcre support not available])
+      AC_MSG_ERROR([PCRE support not available])
     fi
   fi
 
   if test $use_pcre = yes; then
     AC_DEFINE([HAVE_LIBPCRE], [1],
       [Define to 1 if you have the Perl Compatible Regular Expressions
-       library (-lpcre2).])
+       library.])
   else
     PCRE_CFLAGS=
     PCRE_LIBS=
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 630678b..daa0c42 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -16,9 +16,6 @@
    Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
    02110-1301, USA.  */
 
-/* Written August 1992 by Mike Haertel. */
-/* Updated for PCRE2 by Carlo Arenas. */
-
 #include <config.h>
 #include "search.h"
 #include "die.h"
@@ -26,24 +23,27 @@
 #define PCRE2_CODE_UNIT_WIDTH 8
 #include <pcre2.h>
 
-/* Needed for backward compatibility for PCRE2 < 10.30  */
+/* For PCRE2 < 10.30.  */
 #ifndef PCRE2_CONFIG_DEPTHLIMIT
-#define PCRE2_CONFIG_DEPTHLIMIT PCRE2_CONFIG_RECURSIONLIMIT
-#define PCRE2_ERROR_DEPTHLIMIT  PCRE2_ERROR_RECURSIONLIMIT
-#define pcre2_set_depth_limit   pcre2_set_recursion_limit
+# define PCRE2_CONFIG_DEPTHLIMIT PCRE2_CONFIG_RECURSIONLIMIT
+# define PCRE2_ERROR_DEPTHLIMIT PCRE2_ERROR_RECURSIONLIMIT
+# define pcre2_set_depth_limit pcre2_set_recursion_limit
 #endif
 
 struct pcre_comp
 {
-  /* The JIT stack and its maximum size.  */
-  pcre2_jit_stack *jit_stack;
-  PCRE2_SIZE jit_stack_size;
-
   /* Compiled internal form of a Perl regular expression.  */
   pcre2_code *cre;
+
+  /* Match context and data block.  */
   pcre2_match_context *mcontext;
   pcre2_match_data *data;
-  /* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
+
+  /* The JIT stack and its maximum size.  */
+  pcre2_jit_stack *jit_stack;
+  PCRE2_SIZE jit_stack_size;
+
+  /* Table, indexed by ! (flag & PCRE2_NOTBOL), of whether the empty
      string matches when that flag is used.  */
   int empty_match[2];
 };
@@ -59,7 +59,7 @@ jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
 {
   while (true)
     {
-      int e = pcre2_match (pc->cre, (PCRE2_SPTR)subject, search_bytes,
+      int e = pcre2_match (pc->cre, (PCRE2_SPTR) subject, search_bytes,
                            search_offset, options, pc->data, pc->mcontext);
       if (e == PCRE2_ERROR_JIT_STACKLIMIT
           && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
@@ -118,7 +118,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   char *patlim = pattern + size;
   char *n = (char *)re;
   struct pcre_comp *pc = xcalloc (1, sizeof (*pc));
-  pcre2_compile_context *ccontext = pcre2_compile_context_create(NULL);
+  pcre2_compile_context *ccontext = pcre2_compile_context_create (NULL);
 
   if (localeinfo.multibyte)
     {
@@ -126,11 +126,11 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
         die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
       flags |= PCRE2_UTF;
 #if 0
-      /* do not match individual code units but only UTF-8  */
+      /* Do not match individual code units but only UTF-8.  */
       flags |= PCRE2_NEVER_BACKSLASH_C;
 #endif
 #ifdef PCRE2_MATCH_INVALID_UTF
-      /* consider invalid UTF-8 as a barrier, instead of error  */
+      /* Consider invalid UTF-8 as a barrier, instead of error.  */
       flags |= PCRE2_MATCH_INVALID_UTF;
 #endif
     }
@@ -149,13 +149,13 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   n += size;
   if (match_words && !match_lines)
     {
-    strcpy (n, wsuffix);
-    n += strlen(wsuffix);
+      strcpy (n, wsuffix);
+      n += strlen (wsuffix);
     }
   if (match_lines)
     {
-    strcpy (n, xsuffix);
-    n += strlen(xsuffix);
+      strcpy (n, xsuffix);
+      n += strlen (xsuffix);
     }
 
   pcre2_set_character_tables (ccontext, pcre2_maketables (NULL));
@@ -204,8 +204,8 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
 
   do
     {
-      /* Search line by line.  Although this code formerly used
-         PCRE_MULTILINE for performance, the performance wasn't always
+      /* Search line by line.  Although this formerly used something like
+         PCRE2_MULTILINE for performance, the performance wasn't always
          better and the correctness issues were too puzzling.  See
          Bug#22655.  */
       line_end = rawmemchr (p, eolbyte);
diff --git a/tests/pcre-abort b/tests/pcre-abort
index 51cee25..772a1d2 100755
--- a/tests/pcre-abort
+++ b/tests/pcre-abort
@@ -1,5 +1,5 @@
 #! /bin/sh
-# Show that grep handles PCRE's PCRE_ERROR_MATCHLIMIT.
+# Show that grep handles PCRE2_ERROR_MATCHLIMIT.
 # In grep-2.8, it would abort.
 #
 # Copyright (C) 2011-2021 Free Software Foundation, Inc.
-- 
2.32.0

From e896d8b0ecda036233dfa20ac0b17a6ac3d65431 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 12 Nov 2021 21:30:25 -0800
Subject: [PATCH 03/12] =?UTF-8?q?grep:=20Don=E2=80=99t=20limit=20jitstack?=
 =?UTF-8?q?=5Fmax=20to=20INT=5FMAX?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/pcresearch.c (jit_exec): Remove arbitrary INT_MAX limit on JIT
stack size.
---
 src/pcresearch.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index daa0c42..bf966f8 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -59,10 +59,16 @@ jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
 {
   while (true)
     {
+      /* STACK_GROWTH_RATE is taken from PCRE's src/pcre2_jit_compile.c.
+         Going over the jitstack_max limit could trigger an int
+         overflow bug within PCRE.  */
+      int STACK_GROWTH_RATE = 8192;
+      size_t jitstack_max = SIZE_MAX - (STACK_GROWTH_RATE - 1);
+
       int e = pcre2_match (pc->cre, (PCRE2_SPTR) subject, search_bytes,
                            search_offset, options, pc->data, pc->mcontext);
       if (e == PCRE2_ERROR_JIT_STACKLIMIT
-          && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
+          && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
         {
           PCRE2_SIZE old_size = pc->jit_stack_size;
           PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2;
-- 
2.32.0

From 8c9ceb884346ce25f153fd236c245b3713181f22 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 12 Nov 2021 21:34:12 -0800
Subject: [PATCH 04/12] grep: improve pcre2_get_error_message comments

* src/pcresearch.c (Pcompile): Improve comments re
pcre2_get_error_message buffer.
---
 src/pcresearch.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index bf966f8..286e1dc 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -112,7 +112,6 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
 {
   PCRE2_SIZE e;
   int ec;
-  PCRE2_UCHAR8 ep[128]; /* 120 code units is suggested to avoid truncation  */
   static char const wprefix[] = "(?<!\\w)(?:";
   static char const wsuffix[] = ")(?!\\w)";
   static char const xprefix[] = "^(?:";
@@ -168,7 +167,9 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   pc->cre = pcre2_compile (re, n - (char *)re, flags, &ec, &e, ccontext);
   if (!pc->cre)
     {
-      pcre2_get_error_message (ec, ep, sizeof (ep));
+      enum { ERRBUFSIZ = 256 }; /* Taken from pcre2grep.c ERRBUFSIZ.  */
+      PCRE2_UCHAR8 ep[ERRBUFSIZ];
+      pcre2_get_error_message (ec, ep, sizeof ep);
       die (EXIT_TROUBLE, 0, "%s", ep);
     }
 
-- 
2.32.0

From a1b444027231caac247ab0fbd5be8bda8eb3d626 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sat, 13 Nov 2021 13:52:23 -0800
Subject: [PATCH 05/12] grep: speed up, fix bad-UTF8 check with -P

* src/pcresearch.c (bad_utf8_from_pcre2): New function.  Fix bug
where PCRE2_ERROR_UTF8_ERR1 was not treated as an encoding error.
Improve performance when PCRE2_MATCH_INVALID_UTF is defined.
(Pexecute): Use it.
---
 src/pcresearch.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 286e1dc..953aca2 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -104,6 +104,18 @@ jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
     }
 }
 
+/* Return true if E is an error code for bad UTF-8, and if pcre2_match
+   could return E because PCRE lacks PCRE2_MATCH_INVALID_UTF.  */
+static bool
+bad_utf8_from_pcre2 (int e)
+{
+#ifdef PCRE2_MATCH_INVALID_UTF
+  return false;
+#else
+  return PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1;
+#endif
+}
+
 /* Compile the -P style PATTERN, containing SIZE bytes that are
    followed by '\n'.  Return a description of the compiled pattern.  */
 
@@ -248,9 +260,9 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
 
           e = jit_exec (pc, subject, line_end - subject,
                         search_offset, options);
-          /* PCRE2 provides 22 different error codes for bad UTF-8  */
-          if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))
+          if (!bad_utf8_from_pcre2 (e))
             break;
+
           PCRE2_SIZE valid_bytes = pcre2_get_startchar (pc->data);
 
           if (search_offset <= valid_bytes)
-- 
2.32.0
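
The fix in bad_utf8_from_pcre2 corrects an off-by-one error at the upper end of a range. PCRE2's 21 UTF-8 error codes are consecutive negative numbers. PCRE2_ERROR_UTF8_ERR21 is the smallest and PCRE2_ERROR_UTF8_ERR1 the largest, so the check must be inclusive at both ends. The sketch below uses local stand-ins for the pcre2.h constants so that it compiles without PCRE2. The values -3 and -23 are assumptions, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the pcre2.h constants; the values are assumed
   (PCRE2_ERROR_UTF8_ERR1 is -3 and PCRE2_ERROR_UTF8_ERR21 is -23).  */
enum { UTF8_ERR1 = -3, UTF8_ERR21 = -23 };

/* The 21 codes are consecutive, with ERR21 the most negative.  */
static_assert (UTF8_ERR1 - UTF8_ERR21 == 20, "21 consecutive codes");

/* Fixed range check, inclusive at both ends as in bad_utf8_from_pcre2.  */
static bool
is_bad_utf8_error (int e)
{
  return UTF8_ERR21 <= e && e <= UTF8_ERR1;
}

/* Pre-patch check: its strict upper bound misses UTF8_ERR1 itself.  */
static bool
old_is_bad_utf8_error (int e)
{
  return UTF8_ERR21 <= e && e < UTF8_ERR1;
}
```

With these stand-ins, is_bad_utf8_error (-3) is true but old_is_bad_utf8_error (-3) is false. That is the PCRE2_ERROR_UTF8_ERR1 case the commit message describes as mishandled. When PCRE2_MATCH_INVALID_UTF is defined, the real function returns false, as its comment explains.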

From cb2725e28eed832c6d295cdb23e3c0b73002521a Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sat, 13 Nov 2021 17:30:23 -0800
Subject: [PATCH 06/12] grep: prefer signed integers

* src/pcresearch.c (struct pcre_comp, jit_exec, Pexecute):
Prefer signed to unsigned types when either will do.
(jit_exec): Use INT_MULTIPLY_WRAPV instead of doing it by hand.
(Pexecute): Omit line length limit test that is no longer
needed with PCRE2.
---
 src/pcresearch.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 953aca2..fdecbe8 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -41,7 +41,7 @@ struct pcre_comp
 
   /* The JIT stack and its maximum size.  */
   pcre2_jit_stack *jit_stack;
-  PCRE2_SIZE jit_stack_size;
+  idx_t jit_stack_size;
 
   /* Table, indexed by ! (flag & PCRE2_NOTBOL), of whether the empty
      string matches when that flag is used.  */
@@ -54,24 +54,24 @@ struct pcre_comp
    options OPTIONS.
    Return the (nonnegative) match count or a (negative) error number.  */
 static int
-jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
-          PCRE2_SIZE search_offset, int options)
+jit_exec (struct pcre_comp *pc, char const *subject, idx_t search_bytes,
+          idx_t search_offset, int options)
 {
   while (true)
     {
       /* STACK_GROWTH_RATE is taken from PCRE's src/pcre2_jit_compile.c.
          Going over the jitstack_max limit could trigger an int
-         overflow bug within PCRE.  */
+         overflow bug.  */
       int STACK_GROWTH_RATE = 8192;
-      size_t jitstack_max = SIZE_MAX - (STACK_GROWTH_RATE - 1);
+      idx_t jitstack_max = MIN (IDX_MAX, SIZE_MAX - (STACK_GROWTH_RATE - 1));
 
       int e = pcre2_match (pc->cre, (PCRE2_SPTR) subject, search_bytes,
                            search_offset, options, pc->data, pc->mcontext);
       if (e == PCRE2_ERROR_JIT_STACKLIMIT
-          && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
+          && pc->jit_stack_size <= jitstack_max / 2)
         {
-          PCRE2_SIZE old_size = pc->jit_stack_size;
-          PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2;
+          idx_t old_size = pc->jit_stack_size;
+          idx_t new_size = pc->jit_stack_size = old_size * 2;
 
           if (pc->jit_stack)
             pcre2_jit_stack_free (pc->jit_stack);
@@ -90,10 +90,8 @@ jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
         {
           uint32_t lim;
           pcre2_config (PCRE2_CONFIG_DEPTHLIMIT, &lim);
-          if (lim >= UINT32_MAX / 2)
+          if (INT_MULTIPLY_WRAPV (lim, 2, &lim))
             return e;
-
-          lim <<= 1;
           if (!pc->mcontext)
             pc->mcontext = pcre2_match_context_create (NULL);
 
@@ -243,7 +241,7 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
               bol = false;
             }
 
-          PCRE2_SIZE search_offset = p - subject;
+          idx_t search_offset = p - subject;
 
           /* Check for an empty match; this is faster than letting
              PCRE do it.  */
@@ -263,7 +261,7 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
           if (!bad_utf8_from_pcre2 (e))
             break;
 
-          PCRE2_SIZE valid_bytes = pcre2_get_startchar (pc->data);
+          idx_t valid_bytes = pcre2_get_startchar (pc->data);
 
           if (search_offset <= valid_bytes)
             {
-- 
2.32.0
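
PATCH 06 replaces the hand-written `lim >= UINT32_MAX / 2` test with gnulib's INT_MULTIPLY_WRAPV. That macro stores the wrapped product and reports whether the multiplication overflowed. The sketch below spells this out for the only case used here, doubling a uint32_t. double_u32_wrapv is a hypothetical name; the real macro in gnulib's intprops.h works with any integer type. Unlike the old test, the new one accepts lim == UINT32_MAX / 2, because doubling that value does not overflow:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, type-specific stand-in for gnulib's
   INT_MULTIPLY_WRAPV (a, 2, r) with uint32_t operands: store the
   product modulo 2^32 in *R and return true if it overflowed.  */
static bool
double_u32_wrapv (uint32_t a, uint32_t *r)
{
  *r = (uint32_t) (a * 2u);   /* unsigned arithmetic wraps; no UB */
  return UINT32_MAX / 2 < a;
}
```

In jit_exec, an overflow makes the function return the PCRE2_ERROR_DEPTHLIMIT code. Otherwise the doubled limit is installed in the match context and the match is retried.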

From b08fd96f28cc8b2c5b1afb2ddf0f26425f3779af Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sun, 14 Nov 2021 08:12:59 -0800
Subject: [PATCH 07/12] grep: use PCRE2_EXTRA_MATCH_LINE

* src/pcresearch.c (Pcompile): If available, use
PCRE2_EXTRA_MATCH_LINE instead of doing it by hand.
Simplify construction of substitute regular expression.
---
 src/pcresearch.c | 54 +++++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index fdecbe8..6e1f217 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -122,16 +122,8 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
 {
   PCRE2_SIZE e;
   int ec;
-  static char const wprefix[] = "(?<!\\w)(?:";
-  static char const wsuffix[] = ")(?!\\w)";
-  static char const xprefix[] = "^(?:";
-  static char const xsuffix[] = ")$";
-  int fix_len_max = MAX (sizeof wprefix - 1 + sizeof wsuffix - 1,
-                         sizeof xprefix - 1 + sizeof xsuffix - 1);
-  unsigned char *re = xmalloc (size + fix_len_max + 1);
   int flags = PCRE2_DOLLAR_ENDONLY | (match_icase ? PCRE2_CASELESS : 0);
   char *patlim = pattern + size;
-  char *n = (char *)re;
   struct pcre_comp *pc = xcalloc (1, sizeof (*pc));
   pcre2_compile_context *ccontext = pcre2_compile_context_create (NULL);
 
@@ -154,27 +146,41 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   if (rawmemchr (pattern, '\n') != patlim)
     die (EXIT_TROUBLE, 0, _("the -P option only supports a single pattern"));
 
-  *n = '\0';
-  if (match_words)
-    strcpy (n, wprefix);
+  void *re_storage = NULL;
   if (match_lines)
-    strcpy (n, xprefix);
-  n += strlen (n);
-  memcpy (n, pattern, size);
-  n += size;
-  if (match_words && !match_lines)
     {
-      strcpy (n, wsuffix);
-      n += strlen (wsuffix);
+#ifdef PCRE2_EXTRA_MATCH_LINE
+      pcre2_set_compile_extra_options (ccontext, PCRE2_EXTRA_MATCH_LINE);
+#else
+      static char const /* These sizes omit trailing NUL.  */
+        xprefix[4] = "^(?:", xsuffix[2] = ")$";
+      idx_t re_size = size + sizeof xprefix + sizeof xsuffix;
+      char *re = re_storage = ximalloc (re_size);
+      char *rez = mempcpy (re, xprefix, sizeof xprefix);
+      rez = mempcpy (rez, pattern, size);
+      memcpy (rez, xsuffix, sizeof xsuffix);
+      pattern = re;
+      size = re_size;
+#endif
     }
-  if (match_lines)
+  else if (match_words)
     {
-      strcpy (n, xsuffix);
-      n += strlen (xsuffix);
+      /* PCRE2_EXTRA_MATCH_WORD is incompatible with grep -w;
+         do things the grep way.  */
+      static char const /* These sizes omit trailing NUL.  */
+        wprefix[10] = "(?<!\\w)(?:", wsuffix[7] = ")(?!\\w)";
+      idx_t re_size = size + sizeof wprefix + sizeof wsuffix;
+      char *re = re_storage = ximalloc (re_size);
+      char *rez = mempcpy (re, wprefix, sizeof wprefix);
+      rez = mempcpy (rez, pattern, size);
+      memcpy (rez, wsuffix, sizeof wsuffix);
+      pattern = re;
+      size = re_size;
     }
 
   pcre2_set_character_tables (ccontext, pcre2_maketables (NULL));
-  pc->cre = pcre2_compile (re, n - (char *)re, flags, &ec, &e, ccontext);
+  pc->cre = pcre2_compile ((PCRE2_SPTR) pattern, size, flags,
+                           &ec, &e, ccontext);
   if (!pc->cre)
     {
       enum { ERRBUFSIZ = 256 }; /* Taken from pcre2grep.c ERRBUFSIZ.  */
@@ -183,6 +189,8 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       die (EXIT_TROUBLE, 0, "%s", ep);
     }
 
+  free (re_storage);
+
   pc->data = pcre2_match_data_create_from_pattern (pc->cre, NULL);
 
   ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
@@ -194,8 +202,6 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       pc->jit_stack_size = 32 << 10;
     }
 
-  free (re);
-
   pc->empty_match[false] = jit_exec (pc, "", 0, 0, PCRE2_NOTBOL);
   pc->empty_match[true] = jit_exec (pc, "", 0, 0, 0);
 
-- 
2.32.0
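
For -w, the patch still wraps the pattern by hand as (?<!\w)(?:PATTERN)(?!\w), instead of using PCRE2_EXTRA_MATCH_WORD, whose behavior differs from grep -w. The prefix and suffix arrays are declared with sizes that leave no room for a terminating NUL. This is valid C, so sizeof gives exactly the number of bytes to copy, and the result can be passed to pcre2_compile with an explicit length. The standalone sketch below builds the wrapped pattern. It uses plain malloc and memcpy in place of gnulib's ximalloc and mempcpy, and wrap_for_word_match is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of PATTERN (SIZE bytes,
   not necessarily NUL-terminated) wrapped for word matching, storing
   its length in *RE_SIZE.  The result has no trailing NUL.  Return
   NULL if allocation fails.  */
static char *
wrap_for_word_match (char const *pattern, size_t size, size_t *re_size)
{
  /* As in the patch, these arrays deliberately omit the trailing NUL
     (valid C, though not C++), so sizeof gives the text length.  */
  static char const wprefix[10] = "(?<!\\w)(?:", wsuffix[7] = ")(?!\\w)";
  assert (size <= SIZE_MAX - sizeof wprefix - sizeof wsuffix);
  size_t n = sizeof wprefix + size + sizeof wsuffix;
  char *re = malloc (n);
  if (!re)
    return NULL;
  memcpy (re, wprefix, sizeof wprefix);
  memcpy (re + sizeof wprefix, pattern, size);
  memcpy (re + sizeof wprefix + size, wsuffix, sizeof wsuffix);
  *re_size = n;
  return re;
}
```

Wrapping "%%" this way produces the 19-byte regex (?<!\w)(?:%%)(?!\w). It matches %% only where %% is neither preceded nor followed by a word character, so "a%%a" does not match.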

From c11ea452b0ba48a47926d844e1ee7d06a64e2354 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sun, 14 Nov 2021 08:18:42 -0800
Subject: [PATCH 08/12] grep: simplify JIT setup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/pcresearch.c (Pcompile): Simplify since ‘die’ cannot return.
---
 src/pcresearch.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 6e1f217..9898e04 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -196,11 +196,9 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
   if (ec && ec != PCRE2_ERROR_JIT_BADOPTION && ec != PCRE2_ERROR_NOMEMORY)
     die (EXIT_TROUBLE, 0, _("JIT internal error: %d"), ec);
-  else
-    {
-      /* The PCRE documentation says that a 32 KiB stack is the default.  */
-      pc->jit_stack_size = 32 << 10;
-    }
+
+  /* The PCRE documentation says that a 32 KiB stack is the default.  */
+  pc->jit_stack_size = 32 << 10;
 
   pc->empty_match[false] = jit_exec (pc, "", 0, 0, PCRE2_NOTBOL);
   pc->empty_match[true] = jit_exec (pc, "", 0, 0, 0);
-- 
2.32.0

From 3e7de1b45c7e6dfa2d923142bc40ee1fba589a25 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sun, 14 Nov 2021 09:34:15 -0800
Subject: [PATCH 09/12] grep: improve memory exhaustion checking with -P
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/pcresearch.c (struct pcre_comp): New member gcontext.
(private_malloc, private_free): New functions.
(jit_exec): It is OK to call pcre2_jit_stack_free (NULL), so simplify.
Use gcontext for allocation.  Check for pcre2_jit_stack_create
failure, since sljit bypasses private_malloc.  Redo to avoid two
‘continue’s.
(Pcompile): Create and use gcontext.
---
 src/pcresearch.c | 50 ++++++++++++++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 9898e04..a99835e 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -32,6 +32,9 @@
 
 struct pcre_comp
 {
+  /* General context for PCRE operations.  */
+  pcre2_general_context *gcontext;
+
   /* Compiled internal form of a Perl regular expression.  */
   pcre2_code *cre;
 
@@ -48,6 +51,19 @@ struct pcre_comp
   int empty_match[2];
 };
 
+/* Memory allocation functions for PCRE.  */
+static void *
+private_malloc (PCRE2_SIZE size, _GL_UNUSED void *unused)
+{
+  if (IDX_MAX < size)
+    xalloc_die ();
+  return ximalloc (size);
+}
+static void
+private_free (void *ptr, _GL_UNUSED void *unused)
+{
+  free (ptr);
+}
 
 /* Match the already-compiled PCRE pattern against the data in SUBJECT,
    of size SEARCH_BYTES and starting with offset SEARCH_OFFSET, with
@@ -72,33 +88,27 @@ jit_exec (struct pcre_comp *pc, char const *subject, idx_t search_bytes,
         {
           idx_t old_size = pc->jit_stack_size;
           idx_t new_size = pc->jit_stack_size = old_size * 2;
-
-          if (pc->jit_stack)
-            pcre2_jit_stack_free (pc->jit_stack);
-          pc->jit_stack = pcre2_jit_stack_create (old_size, new_size, NULL);
-
+          pcre2_jit_stack_free (pc->jit_stack);
+          pc->jit_stack = pcre2_jit_stack_create (old_size, new_size,
+                                                  pc->gcontext);
+          if (!pc->jit_stack)
+            xalloc_die ();
           if (!pc->mcontext)
-            pc->mcontext = pcre2_match_context_create (NULL);
-
-          if (!pc->jit_stack || !pc->mcontext)
-            die (EXIT_TROUBLE, 0,
-                 _("failed to allocate memory for the PCRE JIT stack"));
+            pc->mcontext = pcre2_match_context_create (pc->gcontext);
           pcre2_jit_stack_assign (pc->mcontext, NULL, pc->jit_stack);
-          continue;
         }
-      if (e == PCRE2_ERROR_DEPTHLIMIT)
+      else if (e == PCRE2_ERROR_DEPTHLIMIT)
         {
           uint32_t lim;
           pcre2_config (PCRE2_CONFIG_DEPTHLIMIT, &lim);
           if (INT_MULTIPLY_WRAPV (lim, 2, &lim))
             return e;
           if (!pc->mcontext)
-            pc->mcontext = pcre2_match_context_create (NULL);
-
+            pc->mcontext = pcre2_match_context_create (pc->gcontext);
           pcre2_set_depth_limit (pc->mcontext, lim);
-          continue;
         }
-      return e;
+      else
+        return e;
     }
 }
 
@@ -125,7 +135,9 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   int flags = PCRE2_DOLLAR_ENDONLY | (match_icase ? PCRE2_CASELESS : 0);
   char *patlim = pattern + size;
   struct pcre_comp *pc = xcalloc (1, sizeof (*pc));
-  pcre2_compile_context *ccontext = pcre2_compile_context_create (NULL);
+  pcre2_general_context *gcontext = pc->gcontext
+    = pcre2_general_context_create (private_malloc, private_free, NULL);
+  pcre2_compile_context *ccontext = pcre2_compile_context_create (gcontext);
 
   if (localeinfo.multibyte)
     {
@@ -178,7 +190,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       size = re_size;
     }
 
-  pcre2_set_character_tables (ccontext, pcre2_maketables (NULL));
+  pcre2_set_character_tables (ccontext, pcre2_maketables (gcontext));
   pc->cre = pcre2_compile ((PCRE2_SPTR) pattern, size, flags,
                            &ec, &e, ccontext);
   if (!pc->cre)
@@ -191,7 +203,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
 
   free (re_storage);
 
-  pc->data = pcre2_match_data_create_from_pattern (pc->cre, NULL);
+  pc->data = pcre2_match_data_create_from_pattern (pc->cre, gcontext);
 
   ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
   if (ec && ec != PCRE2_ERROR_JIT_BADOPTION && ec != PCRE2_ERROR_NOMEMORY)
-- 
2.32.0
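
PATCH 09 routes PCRE2's own allocations through a general context. Running out of memory then ends grep cleanly instead of surfacing as a PCRE2 error code. pcre2_general_context_create takes three arguments: a malloc-like callback of type void *(*)(PCRE2_SIZE, void *), a matching free callback, and an opaque data pointer. The sketch below covers only the callback side and does not use PCRE2. It is modeled on the patch's private_malloc and private_free but is not grep's code. SKETCH_IDX_MAX is a hypothetical stand-in for gnulib's IDX_MAX. Exiting with status 2 is an assumption that stands in for xalloc_die under grep's EXIT_TROUBLE convention:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for gnulib's IDX_MAX (idx_t is a signed type).  */
#define SKETCH_IDX_MAX ((size_t) PTRDIFF_MAX)

/* Stand-in for xalloc_die: report and exit with an assumed status of 2.  */
static _Noreturn void
sketch_alloc_die (void)
{
  fputs ("memory exhausted\n", stderr);
  exit (2);
}

/* Allocation callback with the shape PCRE2 expects; never returns NULL.  */
static void *
private_malloc (size_t size, void *unused)
{
  (void) unused;
  if (SKETCH_IDX_MAX < size)
    sketch_alloc_die ();
  void *p = malloc (size ? size : 1);   /* avoid malloc (0) returning NULL */
  if (!p)
    sketch_alloc_die ();
  return p;
}

/* Matching release callback.  */
static void
private_free (void *ptr, void *unused)
{
  (void) unused;
  free (ptr);
}
```

The commit message notes that sljit allocates the JIT stack outside these callbacks. That is why jit_exec still checks the result of pcre2_jit_stack_create for NULL itself.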

From 84a5d359815a6f0d208ec8ceba1ccf8104749637 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sun, 14 Nov 2021 09:39:05 -0800
Subject: [PATCH 10/12] grep: use ximalloc, not xcalloc

* src/pcresearch.c (Pcompile): Use ximalloc, not xcalloc,
and explicitly initialize the two slots that should be null.
This is more likely to catch future errors if we use valgrind.
---
 src/pcresearch.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index a99835e..dea39f0 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -134,7 +134,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   int ec;
   int flags = PCRE2_DOLLAR_ENDONLY | (match_icase ? PCRE2_CASELESS : 0);
   char *patlim = pattern + size;
-  struct pcre_comp *pc = xcalloc (1, sizeof (*pc));
+  struct pcre_comp *pc = ximalloc (sizeof *pc);
   pcre2_general_context *gcontext = pc->gcontext
     = pcre2_general_context_create (private_malloc, private_free, NULL);
   pcre2_compile_context *ccontext = pcre2_compile_context_create (gcontext);
@@ -203,6 +203,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
 
   free (re_storage);
 
+  pc->mcontext = NULL;
   pc->data = pcre2_match_data_create_from_pattern (pc->cre, gcontext);
 
   ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
@@ -210,6 +211,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
     die (EXIT_TROUBLE, 0, _("JIT internal error: %d"), ec);
 
   /* The PCRE documentation says that a 32 KiB stack is the default.  */
+  pc->jit_stack = NULL;
   pc->jit_stack_size = 32 << 10;
 
   pc->empty_match[false] = jit_exec (pc, "", 0, 0, PCRE2_NOTBOL);
-- 
2.32.0

From 2ebdebfc843ea43f93ef1a729cd8ff7d79ae3305 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sun, 14 Nov 2021 10:54:12 -0800
Subject: [PATCH 11/12] grep: fix minor -P memory leak

* src/pcresearch.c (Pcompile): Free ccontext when no longer needed.
---
 src/pcresearch.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index dea39f0..ef8215f 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -202,6 +202,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
     }
 
   free (re_storage);
+  pcre2_compile_context_free (ccontext);
 
   pc->mcontext = NULL;
   pc->data = pcre2_match_data_create_from_pattern (pc->cre, gcontext);
-- 
2.32.0

From aaafe3de9d0ec00f97a7db8eaf7ddfe7312e9f5f Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sun, 14 Nov 2021 11:30:21 -0800
Subject: [PATCH 12/12] grep: port to PCRE2 10.20

* src/pcresearch.c (PCRE2_SIZE_MAX): Default to SIZE_MAX.
---
 src/pcresearch.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index ef8215f..c12c674 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -23,7 +23,10 @@
 #define PCRE2_CODE_UNIT_WIDTH 8
 #include <pcre2.h>
 
-/* For PCRE2 < 10.30.  */
+/* For older PCRE2.  */
+#ifndef PCRE2_SIZE_MAX
+# define PCRE2_SIZE_MAX SIZE_MAX
+#endif
 #ifndef PCRE2_CONFIG_DEPTHLIMIT
 # define PCRE2_CONFIG_DEPTHLIMIT PCRE2_CONFIG_RECURSIONLIMIT
 # define PCRE2_ERROR_DEPTHLIMIT PCRE2_ERROR_RECURSIONLIMIT
-- 
2.32.0
