bug#59055: [PATCH] Fix possible deadlock.

2022-11-20 Thread Ludovic Courtès
Hi,

Olivier Dion  skribis:

> If we got interrupted while waiting on our condition variable, we unlock
> the kernel mutex momentarily while executing asynchronous operations
> before putting us back into the waiting queue.
>
> However, we have to retry acquiring the mutex before getting back into
> the queue, otherwise it's possible that we wait indefinitely since
> nobody could be the owner for a while.
>
> * libguile/threads.c (lock_mutex): Try acquring the mutex after signal
> interruption.

Looks reasonable to me; applied.

Did you try to come up with a reproducer?  That would be awesome but I
guess it’s hard because you need to trigger EINTR at the right point.

Thanks,
Ludo’.





bug#59321: ice-9's open-input-pipe is unexpectedly slow on some systems

2022-11-20 Thread Ludovic Courtès
Hi,

Andrew Whatson  skribis:

> Forcibly closing file descriptors like this shouldn't be necessary if
> the application has properly opened descriptors with the FD_CLOEXEC
> flag.  It would be good to get input from some more experienced Guile
> hackers on the potential consequences of this change.

Libguile opens all its own file descriptors at O_CLOEXEC (one omission
was recently fixed in 0aa1a9976fc3c6af4d1087e59d728cb8fe7d369a) so it
may be possible to remove that FD-closing loop.  There’s still the
possibility that application bug unwillingly leaks FDs, but we could
consider it’s none of our business.

Thoughts?

Similarly, with commit a356ceebee000efe91a2a16dbcaa64d6c6a3a922, it’s
possible to pass ‘open-file’ a flag that corresponds to O_CLOEXEC, which
wasn’t possible before.  I’ve also been thinking that files opened with
‘call-with-*’ should be O_CLOEXEC.  That’d be an incompatible change
though, so maybe not something for 3.0.x.

Ludo’.





bug#59021: Unbounded heap growth when combining dynamic states & delimited continuation

2022-11-20 Thread Ludovic Courtès
Hi,

Ludovic Courtès  skribis:

> Ludovic Courtès  skribis:
>
>> Consider this code:
>>
>> ;; https://issues.guix.gnu.org/58631
>> ;; https://github.com/wingo/fibers/issues/65
>>
>> (define loss
>>   (make-vector 100))
>>
>> (let ((tag (make-prompt-tag "my prompt")))
>>   (define handler
>> (lambda (k i)
>>   (when (zero? (modulo i 200))
>> (pk 'heap-size (assoc-ref (gc-stats) 'heap-size)))
>>
>>   (call-with-prompt tag
>> (lambda ()
>>   (k (modulo (+ 1 i) 1000)))
>> handler)))
>>
>>   (call-with-prompt tag
>> (let ((state (current-dynamic-state)))
>>   (lambda ()
>> ;; (define (with-dynamic-state state thunk)
>> ;;   (let ((previous #f))
>> ;; (dynamic-wind
>> ;;   (lambda () (set! previous (set-current-dynamic-state 
>> state)))
>> ;;   thunk
>> ;;   (lambda () (set-current-dynamic-state previous)
>> (with-dynamic-state state
>> (lambda ()
>>   (let loop ((i 0))
>> (loop (abort-to-prompt tag i)))
>> handler))
>>
>> On Guile 3.0.8, this program exhibits seemingly unbounded heap growth.
>
> This is fixed by the patch below (tested against the test case above and
> the Fibers and Shepherd test cases mentioned before):

Pushed as e47a153317c046ea5d335940412999e7dc604c33.

> Using a simple heap profiler (more on that later), I noticed that the
> stacks allocated at ‘p->stack_bottom’ would be partly retained,
> explaining the heap growth.
>
> I couldn’t pinpoint what exactly is keeping a pointer to the stack, but
> what I can tell is that the trick above makes that impossible (because
> we disable interior pointer tracing), hence the difference.
>
> Also, why changing the SCM_DYNSTACK_TYPE_DYNAMIC_STATE entry to an
> SCM_DYNSTACK_TYPE_UNWINDER entry would make a difference remains a
> mystery to me.
>
> I’m interested in theories that would explain all this in more detail!
> I’ll go ahead with the fix above if there are no objections.

I still am.  :-)

Ludo’.





bug#59055: [PATCH] Fix possible deadlock.

2022-11-20 Thread Bug reports for GUILE, GNU's Ubiquitous Extension Language
On Sun, 20 Nov 2022, Ludovic Courtès  wrote:

> Did you try to come up with a reproducer?  That would be awesome but I
> guess it’s hard because you need to trigger EINTR at the right point.

With a stress test in guile-parallel.  Very hard to reproduce indeed.
You can also reproduce it with `ice-9 futures` I think.

Here's the stress test that I've been using:
--8<---cut here---start->8---
(use-modules
  ((ice-9 futures) #:prefix ice-9:)
  (srfi srfi-1)
  (srfi srfi-26))

(define (run-stress-test N future touch)
  (for-each
   touch
   (unfold
(cut = N <>)
(lambda (_)
  (future
   (const #t)))
1+
0)))

(run-stress-test 1000 ice-9:make-future ice-9:touch)
--8<---cut here---end--->8---

-- 
Olivier Dion
oldiob.dev





bug#59321: ice-9's open-input-pipe is unexpectedly slow on some systems

2022-11-20 Thread tomas
On Sun, Nov 20, 2022 at 06:24:57PM +0100, Ludovic Courtès wrote:
> Hi,
> 
> Andrew Whatson  skribis:
> 
> > Forcibly closing file descriptors like this shouldn't be necessary if
> > the application has properly opened descriptors with the FD_CLOEXEC
> > flag.  It would be good to get input from some more experienced Guile
> > hackers on the potential consequences of this change.
> 
> Libguile opens all its own file descriptors at O_CLOEXEC (one omission
> was recently fixed in 0aa1a9976fc3c6af4d1087e59d728cb8fe7d369a) so it
> may be possible to remove that FD-closing loop.  There’s still the
> possibility that application bug unwillingly leaks FDs, but we could
> consider it’s none of our business.
> 
> Thoughts?

Hm. Socket FDs don't "have" O_CLOEXEC. Arguably, they are at least as
"interesting" as file FDs (meaning: source of obscure bugs).

Moreover, misbehaving C libraries can be an additional source of bugs
we have no control of.

The reference I posted upthread makes a compelling case for at least
needing an option for this (admittedly ugly) close orgy (perhaps with
some additional platform-dependent mitigations, but that's an ugliness
in its own, sigh).

Cheers
-- 
t


signature.asc
Description: PGP signature


bug#59321: ice-9's open-input-pipe is unexpectedly slow on some systems

2022-11-20 Thread Andrew Whatson
Ludovic Courtès  wrote:
>
> Andrew Whatson  skribis:
>
> > Forcibly closing file descriptors like this shouldn't be necessary if
> > the application has properly opened descriptors with the FD_CLOEXEC
> > flag.  It would be good to get input from some more experienced Guile
> > hackers on the potential consequences of this change.
>
> Libguile opens all its own file descriptors at O_CLOEXEC (one omission
> was recently fixed in 0aa1a9976fc3c6af4d1087e59d728cb8fe7d369a) so it
> may be possible to remove that FD-closing loop.  There’s still the
> possibility that application bug unwillingly leaks FDs, but we could
> consider it’s none of our business.
>
> Thoughts?

I agree with this approach in principle, but from what Tomas is saying
it seems like it's not currently possible for applications to do the
right thing in all cases.

> Similarly, with commit a356ceebee000efe91a2a16dbcaa64d6c6a3a922, it’s
> possible to pass ‘open-file’ a flag that corresponds to O_CLOEXEC,
> which wasn’t possible before.

Nice!

> I’ve also been thinking that files opened with ‘call-with-*’ should be
> O_CLOEXEC.  That’d be an incompatible change though, so maybe not
> something for 3.0.x.

This sounds reasonable to me.

We also need equivalent functionality around SOCK_CLOEXEC.  It seems
this is implemented for ‘accept’, but not ‘socket’ or ‘socketpair’.

Python's PEP 433 contains a good explanation of the issues which can
arise from leaked file descriptors:
https://peps.python.org/pep-0433/#inherited-file-descriptors-issues

Given the risks, I'm convinced that Guile's conservative approach is
actually quite sensible.  It seems like the best path forward would be
to implement platform-specific optimizations where possible.

I've attached a draft patch which implements a fast-path on systems
where "/proc/self/fd" is available.
commit 08943cae90545dddea44ca55eab68047e5ae2f9d
Author: Andrew Whatson 
Date:   Mon Nov 21 13:40:33 2022 +1000

Reduce redundant close() calls when forking on some systems.

Some systems provide "/proc/self/fd" which is a directory containing an
entry for each open file descriptor in the current process.  We use this
to limit the number of close() calls needed to ensure file descriptors
aren't leaked to the child process when forking.

* libguile/posix.c (close_inherited_fds_slow):
(close_inherited_fds): New static helper functions.
(start_child): Attempt to close inherited file descriptors efficiently
using 'close_inherited_fds', falling back to the brute-force approach in
'close_inherited_fds_slow'.

diff --git a/libguile/posix.c b/libguile/posix.c
index b5352c2c4..fc3512054 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -24,6 +24,7 @@
 #  include 
 #endif
 
+#include 
 #include 
 #include 
 #include 
@@ -1337,6 +1338,46 @@ renumber_file_descriptor (int fd, int err)
 }
 #endif /* HAVE_FORK */
 
+static void
+close_inherited_fds_slow(int max_fd, int in, int out, int err)
+{
+  while (max_fd--)
+if (max_fd != in && max_fd != out && max_fd != err)
+  close (max_fd);
+}
+
+static void
+close_inherited_fds(int max_fd, int in, int out, int err)
+{
+  DIR *dirp;
+  struct dirent *d;
+  int fd;
+
+  /* Try to use the platform-specific list of open file descriptors, so
+ we don't need to use the brute force approach. */
+  dirp = opendir ("/proc/self/fd");
+
+  if (dirp == NULL)
+return close_inherited_fds_slow (max_fd, in, out, err);
+
+  while ((d = readdir (dirp)) != NULL)
+{
+  fd = atoi (d->d_name);
+
+  /* Skip "." and "..", and any garbage entries. */
+  if (fd <= 0)
+continue;
+
+  /* Keep in/out/err open. */
+  if (fd == in || fd == out || fd == err)
+continue;
+
+  close (fd);
+}
+
+  closedir (dirp);
+}
+
 #ifdef HAVE_FORK
 #define HAVE_START_CHILD 1
 /* Since Guile uses threads, we have to be very careful to avoid calling
@@ -1373,9 +1414,7 @@ start_child (const char *exec_file, char **exec_argv,
 
   /* Close all file descriptors in ports inherited from the parent
  except for in, out, and err.  Heavy-handed, but robust.  */
-  while (max_fd--)
-if (max_fd != in && max_fd != out && max_fd != err)
-  close (max_fd);
+  close_inherited_fds (max_fd, in, out, err);
 
   /* Ignore errors on these open() calls.  */
   if (in == -1)