[PATCH] regex: fix backreference matching

2021-06-16 Thread Egor Ignatov
This fixes a bug described in 70b673eb7.

* lib/regexec.c (set_regs): Revert pop condition changed in the
commit mentioned above.
(proceed_next_node): Always proceed on OP_BACK_REF to the
next node if naccepted is 0.
(update_regs): Fix optional sub expression boundaries matching.
* tests/test-regex.c: Fix tests.

Signed-off-by: Egor Ignatov 
---
 lib/regexec.c  | 12 ++--
 tests/test-regex.c |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/regexec.c b/lib/regexec.c
index 5d4113c9d..23b984a21 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -1292,9 +1292,9 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
  if (__glibc_unlikely (! ok))
return -2;
  dest_node = dfa->edests[node].elems[0];
- if (re_node_set_contains (&mctx->state_log[*pidx]->nodes,
-   dest_node))
-   return dest_node;
+ if(dfa->nodes[dest_node].type == END_OF_RE)
+   regs[0].rm_eo = *pidx;
+ return dest_node;
}
}
 
@@ -1413,8 +1413,7 @@ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
 {
   update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
 
-  if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
- || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
+  if (idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
{
  Idx reg_idx;
  cur_node = -1;
@@ -1514,7 +1513,8 @@ update_regs (const re_dfa_t *dfa, regmatch_t *pmatch,
  else
{
  if (dfa->nodes[cur_node].opt_subexp
- && prev_idx_match[reg_num].rm_so != -1)
+ && prev_idx_match[reg_num].rm_so != -1
+ && pmatch[reg_num].rm_eo != -1)
/* We transited through an empty match for an optional
   subexpression, like (a?)*, and this is not the subexp's
   first match.  Copy back the old content of the registers
diff --git a/tests/test-regex.c b/tests/test-regex.c
index 7ea73cfb6..f73909258 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -119,7 +119,7 @@ static struct
   /* Test for *+ match.  */
   { "^a*+(.)", "ab", REG_EXTENDED, 2, { { 0, 2 }, { 1, 2 } } },
   /* Test for ** match.  */
-  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 0, 1 }, { 1, 2 } } },
+  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 1, 1 }, { 1, 2 } } },
 };
 
 static void
@@ -431,7 +431,7 @@ main (void)
   else if (! (regs.start[0] == 0 && regs.end[0] == 1))
 report_error ("re_search '%s' on '%s' returned wrong match [%d,%d)",
   pat_sub2, data, (int) regs.start[0], (int) regs.end[0]);
-  else if (! (regs.start[1] == 0 && regs.end[1] == 0))
+  else if (! (regs.start[1] == 1 && regs.end[1] == 1))
 report_error ("re_search '%s' on '%s' returned wrong submatch [%d,%d)",
   pat_sub2, data, regs.start[1], regs.end[1]);
   regfree (&regex);
-- 
2.29.3




Re: [PATCH] regex: fix backreference matching

2021-06-16 Thread Dmitry V. Levin
On Wed, Jun 16, 2021 at 12:46:15PM +0300, Egor Ignatov wrote:
> This fixes a bug described in 70b673eb7.
[...]
> -  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 0, 1 }, { 1, 2 } } },
> +  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 1, 1 }, { 1, 2 } } },

Sorry, but how could this be correct?
Since the expression consists of two consecutive parts, the whole match
should also consist of two consecutive substring matches, shouldn't it?


-- 
ldv



Re: [PATCH] regex: fix match with possessive quantifier

2021-06-16 Thread Dmitry V. Levin
On Mon, Jun 07, 2021 at 04:10:27AM +0300, Dmitry V. Levin wrote:
> On Mon, Jun 07, 2021 at 12:45:02AM +0300, Dmitry V. Levin wrote:
> > On Wed, May 26, 2021 at 12:08:19PM +0300, Egor Ignatov wrote:
> > > Fix behaviour introduced in 70b673e, where regexps with
> > > possessive quantifier ("*+") didn't match.
> > > * lib/regexec.c
> > > (set_regs): Pop if CUR_NODE has already been checked only when
> > > we have a fail stack.
> > > 
> > > Signed-off-by: Egor Ignatov 
> > > ---
> > > Hi Paul,
> > > 
> > > Do you have any test cases for bug 11053(glibc) for gnulib?
> > > This patch fixes the issue with "*+", but I'm not sure it
> > > doesn't break your fix for 11053.
> > 
> > Thanks, the fix looks plausible, it doesn't break any tests
> > (including those introduced along with commit 70b673eb7),
> 
> Apparently, there are more issues with commit 70b673eb7, for example:
> 
> $ echo ab | sed -E 's/^(a*)*(.)\1/\1/'
> Segmentation fault
> 
> $ echo ab | strace -enone -- sed --debug -E 's/^(a*)*(.)\1/\1/'
> SED PROGRAM:
>   s/^(a*)*(.)\\1/\1/
> INPUT:   'STDIN' line 1
> PATTERN: ab
> COMMAND: s/^(a*)*(.)\\1/\1/
> MATCHED REGEX REGISTERS
>   regex[0] = 0-2 'ab'
>   regex[1] = 0--1 'ab!!ab
> '
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x20} ---
> +++ killed by SIGSEGV +++
> Segmentation fault

And here is a tests/test-regex.c entry for this bug:

diff --git a/tests/test-regex.c b/tests/test-regex.c
index 7ea73cfb6..fdb1a1f1d 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -120,6 +120,8 @@ static struct
   { "^a*+(.)", "ab", REG_EXTENDED, 2, { { 0, 2 }, { 1, 2 } } },
   /* Test for ** match.  */
   { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 0, 1 }, { 1, 2 } } },
+  /* Test for ** match with backreferences.  */
+  { "^(a*)*\\1", "a", REG_EXTENDED, 2, { { 0, 0 }, { 0, 0 } } },
 };
 
 static void


-- 
ldv



Re: Seeking input from developers: glibc copyright assignment policy.

2021-06-16 Thread Eli Zaretskii
> From: Paul Smith 
> Date: Tue, 15 Jun 2021 12:32:35 -0400
> 
> On Tue, 2021-06-15 at 07:03 -0500, Eric Blake wrote:
> > I recall how long it took for me to get permission to sign assignment
> > papers from my previous employer, for work I was doing in my spare
> > time, and being able to use the DCO instead would have made my
> > efforts easier at that time.
> 
> This is what concerns me (not necessarily in Eric's case per se but in
> general).  I worry that people think that a DCO is a hassle-free
> replacement for an employer's copyright assignment.  Maybe, in some
> jurisdictions, it even can be.

In addition to what Paul Smith and Bruno Haible wrote, there is IMO
one other important aspect to be considered, which is specific to
Gnulib.  Unlike GCC, glibc, and many other projects, which are
basically separate, and therefore their decisions in this matter
affect only them and their users, Gnulib is different.  Gnulib is not
a separate project, it is in effect a collection of library functions
from which dozens of other GNU projects borrow code for their
distributions.  Thus, any changes in this matter that Gnulib
developers decide upon will affect all those "client" projects as
well.

For example, Emacs imports more than 200 source files from Gnulib, and
distributes them as part of its release tarballs and in its Git
repository.  If Gnulib folks decide that they can accept contributions
under DCO, does it mean Emacs will be unable to change its license to
GPL of version greater than 3?  Does it mean Emacs will bear part of
the risk of distributing sources whose DCO is invalid (for reasons
described by Paul Smith)?  (And it doesn't help that some/much of the
Gnulib code is taken from glibc, which will probably decide to follow
GCC's example.)

Given these aspects, I submit that Gnulib developers shouldn't make
these kinds of decisions without consulting with other GNU projects.



Re: Seeking input from developers: glibc copyright assignment policy.

2021-06-16 Thread Dmitry V. Levin
On Mon, Jun 14, 2021 at 01:39:26PM -0700, Paul Eggert wrote:
> A proposal to change the glibc copyright assignment policy is being 
> circulated on libc-alpha. The email thread starts at 
> , and 
> the text of the email seeking input is at the end of this message.
> 
> I'm sending this to bug-gnulib because we copy some files directly from 
> glibc and eventually I expect these files to be affected. The simplest 
> approach I see for Gnulib is to adopt glibc's policy, at least for files 
> or code copied from glibc.

Here is the list of affected gnulib modules:

$ git grep -A1 ^Maintainer: modules/ | sed -n '/glibc/ s/-[^-]*$//p'
modules/alphasort
modules/argp
modules/atoll
modules/crypto/md5
modules/crypto/md5-buffer
modules/dynarray
modules/eloop-threshold
modules/error
modules/euidaccess
modules/filename
modules/fnmatch
modules/fnmatch-h
modules/getcwd
modules/getopt-gnu
modules/getopt-posix
modules/getpass
modules/getpass-gnu
modules/getsubopt
modules/glob
modules/glob-h
modules/idx
modules/inet_ntop
modules/inet_pton
modules/memchr
modules/memcmp
modules/memrchr
modules/mktime
modules/nstrftime
modules/obstack
modules/posix_spawn
modules/posix_spawn-internal
modules/posix_spawn_file_actions_addclose
modules/posix_spawn_file_actions_adddup2
modules/posix_spawn_file_actions_addopen
modules/posix_spawn_file_actions_destroy
modules/posix_spawn_file_actions_init
modules/posix_spawnattr_destroy
modules/posix_spawnattr_getflags
modules/posix_spawnattr_getpgroup
modules/posix_spawnattr_getschedparam
modules/posix_spawnattr_getschedpolicy
modules/posix_spawnattr_getsigdefault
modules/posix_spawnattr_getsigmask
modules/posix_spawnattr_init
modules/posix_spawnattr_setflags
modules/posix_spawnattr_setpgroup
modules/posix_spawnattr_setschedparam
modules/posix_spawnattr_setschedpolicy
modules/posix_spawnattr_setsigdefault
modules/posix_spawnattr_setsigmask
modules/posix_spawnp
modules/pt_chown
modules/putenv
modules/random
modules/random_r
modules/rawmemchr
modules/scandir
modules/scratch_buffer
modules/spawn
modules/stpcpy
modules/stpncpy
modules/strchrnul
modules/strcspn
modules/strdup
modules/strdup-posix
modules/strndup
modules/strpbrk
modules/strptime
modules/strsignal
modules/strtok_r
modules/strtol
modules/strtoll
modules/strtoul
modules/strtoull
modules/strverscmp
modules/timegm
modules/tsearch


-- 
ldv



Re: Seeking input from developers: glibc copyright assignment policy.

2021-06-16 Thread Paul Smith
On Tue, 2021-06-15 at 22:08 +0100, Pádraig Brady wrote:
> Yes the fact that one needs to repeat this process as one changes
> employers is very awkward.

This is exactly what I was worried about with my previous message:
people saying "it's awkward to get my employer to assign copyright so
I'd rather use a DCO" when they are not legally allowed to do that.

A DCO is not a magic bullet.  If your employer has copyright to the
changes you made then the DCO is useless to you because the ownership
is not yours to certify; you'll STILL have to get your employer to
agree to it.  If you have the copyright yourself then you don't need
your employer to sign anything in the first place!

Allowing contributors to follow a simpler process doesn't make the
situation less complicated, it just makes it easier to ignore.  But
ignoring it doesn't make it go away.

The assignment is awkward, and it is extra effort, but that effort is
not useless or wasted.  IMO it's important for the project that people
pay attention to this and handle it BEFORE their code is accepted.




tsearch: Relicense under LGPLv2+

2021-06-16 Thread Bruno Haible
I need the module 'tsearch' under LGPLv2+, for use in GNU libintl.
Fortunately it is easy to do, because
  - It was under LGPLv2+ when I took the source code from glibc in 2006 (see
).
  - The only significant changes to lib/tsearch.c since then are from me.


2021-06-16  Bruno Haible  

tsearch: Relicense under LGPLv2+.
* modules/tsearch (License): Change to LGPLv2+.
* lib/tsearch.c: Update license notice.

diff --git a/lib/tsearch.c b/lib/tsearch.c
index 08f7061..0d4d838 100644
--- a/lib/tsearch.c
+++ b/lib/tsearch.c
@@ -7,7 +7,7 @@
 
This file is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 3 of the
+   published by the Free Software Foundation; either version 2.1 of the
License, or (at your option) any later version.
 
This file is distributed in the hope that it will be useful,
diff --git a/modules/tsearch b/modules/tsearch
index 4a2b5ed..6ffdf2f 100644
--- a/modules/tsearch
+++ b/modules/tsearch
@@ -22,7 +22,7 @@ Include:
 
 
 License:
-LGPL
+LGPLv2+
 
 Maintainer:
 all, glibc