Re: [PATCH 07/10] regex: fix longstanding backref match bug

2021-02-09 Thread Adhemerval Zanella
Hi Paul,

Trying to sync gnulib with glibc code, this patch trigger some regression
on glibc testcases:

FAIL: posix/tst-boost
FAIL: posix/tst-pcre
FAIL: posix/tst-rxspencer
FAIL: posix/tst-rxspencer-no-utf8

$ grep -n "FAIL rm" posix/tst-boost.out
445:FAIL rm[1] 3..-1 != expected 2..3
448:FAIL rm[1] 3..-1 != expected 2..3
451:FAIL rm[1] 3..-1 != expected 2..3
454:FAIL rm[1] 3..-1 != expected 2..3

$ cat posix/tst-pcre.out 
1168: /([a]*)*/ on a unexpectedly failed to match register 1 1..-1
1171: /([a]*)*/ on a  unexpectedly failed to match register 1 5..-1
1176: /([ab]*)*/ on a unexpectedly failed to match register 1 1..-1
1179: /([ab]*)*/ on b unexpectedly failed to match register 1 1..-1
1182: /([ab]*)*/ on ababab unexpectedly failed to match register 1 6..-1
1185: /([ab]*)*/ on bcde unexpectedly failed to match register 1 5..-1
1188: /([ab]*)*/ on  unexpectedly failed to match register 1 4..-1
1193: /([^a]*)*/ on b unexpectedly failed to match register 1 1..-1
1196: /([^a]*)*/ on  unexpectedly failed to match register 1 4..-1
1203: /([^ab]*)*/ on  unexpectedly failed to match register 1 4..-1

$ cat posix/tst-rxspencer.out  | grep FAIL
FAIL rm[1] unexpectedly did not match
FAIL rm[1] unexpectedly did not match

$ cat posix/tst-rxspencer-no-utf8.out | grep FAIL
FAIL rm[1] unexpectedly did not match
FAIL rm[1] unexpectedly did not match

On 05/02/2021 22:25, Paul Eggert wrote:
> This fixes a longstanding glibc bug concerning backreferences
>  (2009-12-04).
> * lib/regexec.c (proceed_next_node, push_fail_stack)
> (pop_fail_stack): Push and pop the previous registers
> as well as the current ones.  All callers changed.
> (set_regs): Also pop if CUR_NODE has already been checked,
> so that it does not get added as a duplicate set entry.
> (update_regs): Fix comment location.
> * tests/test-regex.c (tests): New constant.
> (bug_regex11): New test function.
> (main): Bump alarm value.  Call new test function.
> ---
>  ChangeLog  |  13 ++
>  lib/regexec.c  |  26 +++
>  tests/test-regex.c | 113 -
>  3 files changed, 141 insertions(+), 11 deletions(-)
> 
> diff --git a/ChangeLog b/ChangeLog
> index 74304474b..bd7d1fa16 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,5 +1,18 @@
>  2021-02-05  Paul Eggert  
>  
> + regex: fix longstanding backref match bug
> + This fixes a longstanding glibc bug concerning backreferences
> +  (2009-12-04).
> + * lib/regexec.c (proceed_next_node, push_fail_stack)
> + (pop_fail_stack): Push and pop the previous registers
> + as well as the current ones.  All callers changed.
> + (set_regs): Also pop if CUR_NODE has already been checked,
> + so that it does not get added as a duplicate set entry.
> + (update_regs): Fix comment location.
> + * tests/test-regex.c (tests): New constant.
> + (bug_regex11): New test function.
> + (main): Bump alarm value.  Call new test function.
> +
>   regex: avoid duplicate in espilon closure
>   * lib/regcomp.c (calc_eclosure_iter): Insert NODE into epsilon
>   closure first rather than last.  Otherwise, the epsilon closure
> diff --git a/lib/regexec.c b/lib/regexec.c
> index fdd2e373e..424bc8d15 100644
> --- a/lib/regexec.c
> +++ b/lib/regexec.c
> @@ -59,7 +59,7 @@ static void update_regs (const re_dfa_t *dfa, regmatch_t 
> *pmatch,
>Idx cur_idx, Idx nmatch);
>  static reg_errcode_t push_fail_stack (struct re_fail_stack_t *fs,
> Idx str_idx, Idx dest_node, Idx nregs,
> -   regmatch_t *regs,
> +   regmatch_t *regs, regmatch_t *prevregs,
> re_node_set *eps_via_nodes);
>  static reg_errcode_t set_regs (const regex_t *preg,
>  const re_match_context_t *mctx,
> @@ -1211,6 +1211,7 @@ check_halt_state_context (const re_match_context_t 
> *mctx,
>  
>  static Idx
>  proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t 
> *regs,
> +regmatch_t *prevregs,
>  Idx *pidx, Idx node, re_node_set *eps_via_nodes,
>  struct re_fail_stack_t *fs)
>  {
> @@ -1243,7 +1244,7 @@ proceed_next_node (const re_match_context_t *mctx, Idx 
> nregs, regmatch_t *regs,
> /* Otherwise, push the second epsilon-transition on the fail 
> stack.  */
> else if (fs != NULL
>  && push_fail_stack (fs, *pidx, candidate, nregs, regs,
> -eps_via_nodes))
> +prevregs, eps_via_nodes))
>   return -2;
>  
> /* We know we are going to exit.  */
> @@ -1316,7 +1317,8 @@ proceed_next_node (const re_match_context_t *mctx, Idx 
> nregs, regmatch_t *regs,
>  static reg_errcode_t
>  __attribute_warn_un

Re: [PATCH 07/10] regex: fix longstanding backref match bug

2021-02-09 Thread Paul Eggert

On 2/9/21 11:42 AM, Adhemerval Zanella wrote:

Trying to sync gnulib with glibc code, this patch trigger some regression
on glibc testcases:


Thanks for letting me know. That's odd, as I didn't find any regressions 
when I synced. I do know of remaining bugs and will add these to my list.




[PATCH] extern-inline: Pull __header_inline definition

2021-02-09 Thread Roman Bolshakov
* m4/extern-inline.m4 (gl_EXTERN_INLINE): __header_inline is defined in
   but the header is not included and as result
  _GL_EXTERN_INLINE doesn't work as intended on macOS.

Signed-off-by: Roman Bolshakov 
---
Hi,

The issue prevents -O1/-O0 compilation of libtasn1 on macOS:
https://gitlab.com/gnutls/libtasn1/-/issues/28#note_505357689

Thanks,
Roman

 m4/extern-inline.m4 | 8 
 1 file changed, 8 insertions(+)

diff --git a/m4/extern-inline.m4 b/m4/extern-inline.m4
index a2acf126c..b2ac5e0fe 100644
--- a/m4/extern-inline.m4
+++ b/m4/extern-inline.m4
@@ -7,6 +7,8 @@ dnl with or without modifications, as long as this notice is 
preserved.
 
 AC_DEFUN([gl_EXTERN_INLINE],
 [
+  AC_CHECK_HEADERS_ONCE([sys/cdefs.h])
+
   AH_VERBATIM([extern_inline],
 [/* Please see the Gnulib manual for how to use these macros.
 
@@ -51,6 +53,12 @@ AC_DEFUN([gl_EXTERN_INLINE],
  warning: to disable this warning use -fgnu89-inline or the gnu_inline 
function attribute
It defines a macro __GNUC_GNU_INLINE__ to indicate this situation.
  */
+
+/* Pull __header_inline definition */
+#ifdef HAVE_SYS_CDEFS_H
+# include 
+#endif
+
 #if (((defined __APPLE__ && defined __MACH__) \
   || defined __DragonFly__ || defined __FreeBSD__) \
  && (defined __header_inline \
-- 
2.30.0




Re: data structures for use in signal handlers

2021-02-09 Thread Bruno Haible
Ben Pfaff wrote:
> On Sun, Feb 7, 2021 at 2:57 PM Bruno Haible  wrote:
> > (2) Let the signal handler work only on immutable copies of the data
> > structure. Whenever the other code manipulates the data structure,
> > it creates an immutable copy of it, for the signal handler to use.
> > Use an 'asyncsafe-spin' lock through which the signal handler tells
> > the other threads to not free that immutable copy while it running.
> >
> > This is tricky; can it actually be made to work?
> >
> > Btw, there is an obvious requirement: the technique should use 
> > malloc/
> > free for memory management, and should not have memory leaks.
> > Algorithms that assume a garbage collected memory management are not
> > suitable here.
> 
> This makes me think of read-copy-update aka RCU:
> https://en.wikipedia.org/wiki/Read-copy-update
> https://lwn.net/Articles/262464/
> 
> In RCU, code that updates the data structure takes a lock, creates and
> modifies a copy, and then installs a new pointer to the data structure,
> which is otherwise immutable. Readers always access the data structure
> through a pointer. Whichever pointer they happen to see when they
> read the pointer remains available until they're done with it.
> 
> Using RCU is pretty straightforward once you've done a little of it, but
> it takes some reading to understand all of its concepts. It's probably best
> for me not to try to explain it all, because I'll surely get some parts of it
> wrong.
> 
> Building an RCU implementation isn't necessarily difficult (I have done it,
> but the implementation isn't suitable for gnulib).
> 
> There is a liburcu that is under LGPL v2.1: https://liburcu.org/

Thanks for the pointers, Ben. Yes, RCU is the technique I had in mind.
So: "can it actually be made to work?" -> Yes.

But the implementation of liburcu uses extra signals and a helper thread of
its own (or even a helper thread per CPU). 

I have to rephrase the question: can it be made to work in a straightforward
(not necessarily scalability-optimized) way?

Bruno




Re: data structures for use in signal handlers

2021-02-09 Thread Bruno Haible
Hi Eric,

Eric Wong wrote:
> I would've recommended just using a pipe, socket or eventfd;

How would setting up a pipe or eventfd help with a communication
problem between threads in the same process? I thought these were
for communication between processes.

Bruno




Re: data structures for use in signal handlers

2021-02-09 Thread Eric Wong
Bruno Haible  wrote:
> Hi Eric,
> 
> Eric Wong wrote:
> > I would've recommended just using a pipe, socket or eventfd;
> 
> How would setting up a pipe or eventfd help with a communication
> problem between threads in the same process? I thought these were
> for communication between processes.

The "self-pipe trick" is a common idiom in Unix for signal
handling; especially when pselect/ppoll aren't available.
I seem to recall learning of it reading something by djb.

It's perfectly valid to use pipes in single-threaded or
multi-threaded, or multi-process situations.  POSIX gives
well-defined semantics w.r.t. atomicity.

I know the (C)Ruby VM uses it internally (or eventfd on Linux);
I think CPython's GVL does, too.

My gnulib-using cmogstored uses a self-pipe when epoll_pwait
isn't available (Though it could probably use EVFILT_SIGNAL on
*BSD).



Re: data structures for use in signal handlers

2021-02-09 Thread Ben Pfaff
On Tue, Feb 9, 2021 at 5:01 PM Bruno Haible  wrote:
>
> Ben Pfaff wrote:
> > Building an RCU implementation isn't necessarily difficult (I have done it,
> > but the implementation isn't suitable for gnulib).
> >
> > There is a liburcu that is under LGPL v2.1: https://liburcu.org/
>
> Thanks for the pointers, Ben. Yes, RCU is the technique I had in mind.
> So: "can it actually be made to work?" -> Yes.
>
> But the implementation of liburcu uses extra signals and a helper thread of
> its own (or even a helper thread per CPU).
>
> I have to rephrase the question: can it be made to work in a straightforward
> (not necessarily scalability-optimized) way?

Signals are not needed. I haven't studied liburcu (it wasn't suitable for my
userspace use case because it only worked with GCC), so I am not sure
why it needs signals.

The userspace RCU implementation that I built in OVS (which is free but
Apache licensed) does use a thread, but not signals. The nice thing about
using a thread is that you always have something freeing memory that is
known to no longer be in use.

But I can imagine implementations that don't even use an extra thread.
For example, each thread could keep its own local list of callbacks to be
called after the next grace period. Then, whenever a thread quiesces, it
goes through its own list and executes all of the callbacks that now can
be. It could take a long time for some callbacks to be called, though,
especially if a thread goes to sleep and thus won't call any of its own
until it wakes up again.



Re: data structures for use in signal handlers

2021-02-09 Thread Bruno Haible
Eric Wong wrote:
> > How would setting up a pipe or eventfd help with a communication
> > problem between threads in the same process? I thought these were
> > for communication between processes.
> 
> The "self-pipe trick" is a common idiom in Unix for signal
> handling; especially when pselect/ppoll aren't available.
> I seem to recall learning of it reading something by djb.
> 
> It's perfectly valid to use pipes in single-threaded or
> multi-threaded, or multi-process situations.  POSIX gives
> well-defined semantics w.r.t. atomicity.

Atomicity alone is not an argument; the compiler and OS have
sufficient primitives (CAS instructions, barriers, spin-locks, locks)
to make things atomic between threads.

What I understand from [1] is that if a program uses signals AND
select()/poll(), then it is useful to map signals into events that
can be received by select()/poll(). The self-pipe trick is a way
to do that.

Maybe I didn't describe accurately the situation I am asking for.
I'm not attempting to transform a program to an event-based engine.
I'm looking to let signal handlers (SIGTERM, SIGSEGV, SIGWINCH, etc.)
read and traverse data structures that the main program has prepared
(and that contain information about files to delete, memory areas,
window lines to redraw etc.).

Bruno

[1] http://osiris.978.org/~alex/safesignalhandling.html




Re: data structures for use in signal handlers

2021-02-09 Thread Paul Eggert

On 2/9/21 5:21 PM, Eric Wong wrote:

I know the (C)Ruby VM uses it internally (or eventfd on Linux);
I think CPython's GVL does, too.


GNU Make does too. It's where I learned of the trick.



Re: data structures for use in signal handlers

2021-02-09 Thread Ben Pfaff
On Tue, Feb 9, 2021 at 6:07 PM Bruno Haible  wrote:
> Maybe I didn't describe accurately the situation I am asking for.
> I'm not attempting to transform a program to an event-based engine.
> I'm looking to let signal handlers (SIGTERM, SIGSEGV, SIGWINCH, etc.)
> read and traverse data structures that the main program has prepared
> (and that contain information about files to delete, memory areas,
> window lines to redraw etc.).

This is a situation that I've always worked hard to avoid. So little can
be safely done in a signal handler that it's worthwhile to try not to do
more than the minimum possible. Have you already concluded that
you must do significant work in a signal handler? I would not want to
try to, say, redraw lines in a window in a signal handler, because the
functions that I would need to call to do that would not be prepared
to run inside such a handler: I don't think ncurses or GTK+, for
example, would be happy at all about it.



Re: data structures for use in signal handlers

2021-02-09 Thread Eric Wong
Bruno Haible  wrote:
> Eric Wong wrote:
> > > How would setting up a pipe or eventfd help with a communication
> > > problem between threads in the same process? I thought these were
> > > for communication between processes.
> > 
> > The "self-pipe trick" is a common idiom in Unix for signal
> > handling; especially when pselect/ppoll aren't available.
> > I seem to recall learning of it reading something by djb.

> What I understand from [1] is that if a program uses signals AND
> select()/poll(), then it is useful to map signals into events that
> can be received by select()/poll(). The self-pipe trick is a way
> to do that.

Right.

> Maybe I didn't describe accurately the situation I am asking for.
> I'm not attempting to transform a program to an event-based engine.
> I'm looking to let signal handlers (SIGTERM, SIGSEGV, SIGWINCH, etc.)
> read and traverse data structures that the main program has prepared
> (and that contain information about files to delete, memory areas,
> window lines to redraw etc.).

No worries, I figured pipes/sockets weren't an options given
your mention of libsigsegv in the original post.

One possible solution is to spawn a dedicated thread which does
nothing but run an event loop to handle signals sequentially
without reentrancy.  Then, signal handlers are guaranteed to run
alone and existing thread-safe structures (e.g. rculfhash) and
locks would be able to work properly.

It all depends on whether you can spawn a dedicated thread for
that.  FWIW, liburcu uses background threads internally, too.


> [1] http://osiris.978.org/~alex/safesignalhandling.html