On 12/21/22 11:23, Martin Pieuchot wrote:
On 21/12/22(Wed) 09:20, David Hill wrote:


On 12/21/22 07:08, David Hill wrote:


On 12/21/22 05:33, Martin Pieuchot wrote:
On 18/12/22(Sun) 20:55, Martin Pieuchot wrote:
On 17/12/22(Sat) 14:15, David Hill wrote:


On 10/28/22 03:46, Renato Aguiar wrote:
Use of the bbolt Go library causes 7.2 to freeze. I suspect it is
triggering some sort of deadlock in mmap because threads get stuck at
vmmaplk.

I managed to reproduce it consistently on a laptop with 4 cores
(i5-1135G7) using one unit test from bbolt:

     $ doas pkg_add git go
     $ git clone https://github.com/etcd-io/bbolt.git
     $ cd bbolt
     $ git checkout v1.3.6
     $ go test -v -run TestSimulate_10000op_10p

The test never ends and this is the 'top' report:

      PID      TID PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
    32181   438138 -18    0   57M   13M idle      uvn_fls   0:00  0.00% bbolt.test
    32181   331169  10    0   57M   13M sleep/1   nanoslp   0:00  0.00% bbolt.test
    32181   497390  10    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
    32181   380477  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
    32181   336950  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
    32181   491043  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
    32181   347071   2    0   57M   13M idle      kqread    0:00  0.00% bbolt.test

After this, most commands just hang. For example, running a
'ps | grep foo' in another shell would do it.


I can reproduce this on MP, but not SP.  Here is a trace from ddb after
using the ddb.trigger sysctl.  Is there any other information I could
pull from DDB that may help?

Thanks for the useful report, David!

The issue seems to be a deadlock between the `vmmaplk' and a particular
`vmobjlock'.  uvm_map_clean() calls uvn_flush() which sleeps with the
`vmmaplk' held.
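
In other words (a condensed sketch of the code path involved, not a
verbatim copy of the uvm code):

    /* msync(2) path, simplified: */
    vm_map_lock_read(map);          /* takes `vmmaplk' shared */
    uvm_map_clean(map, start, end, PGO_CLEANIT | PGO_SYNCIO);
        /*
         * -> locks the backing object's `vmobjlock' and calls
         *    uvn_flush(), which sleeps on "uvn_flsh" waiting for
         *    the pageouts to complete, with `vmmaplk' still held
         */
    vm_map_unlock_read(map);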

I'll think a bit about this and try to come up with a fix ASAP.

I'm missing a piece of information.  All the threads in your report seem
to want a read version of the `vmmaplk' so they should not block.  Could
you reproduce the hang with a WITNESS kernel and print 'show all locks'
in addition to all the information you've reported?
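
(WITNESS is not enabled in GENERIC, so this means building a custom
kernel.  Roughly, assuming the standard build procedure; see options(4)
and config(8) for the details:)

    $ cd /sys/arch/$(machine)/conf
    $ cp GENERIC.MP WITNESS.MP
    $ echo 'option WITNESS' >> WITNESS.MP
    $ doas config WITNESS.MP
    $ cd ../compile/WITNESS.MP
    $ doas make obj && doas make && doas make install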


Sure.  It's always the same: 2 processes (sysctl and bbolt.test) and 3
locks (sysctllk, kernel_lock, and vmmaplk), with bbolt.test always on
the uvn_flsh thread.


Process 98301 (sysctl) thread 0xfff......
exclusive rwlock sysctllk r = 0 (0xfffff...)
exclusive kernel_lock &kernel_lock r = 0 (0xffffff......)
Process 32181 (bbolt.test) thread (0xffffff...) (438138)
shared rwlock vmmaplk r = 0 (0xfffff......)

To reproduce, just do:
$ doas pkg_add git go
$ git clone https://github.com/etcd-io/bbolt.git
$ cd bbolt
$ git checkout v1.3.6
$ go test -v -run TestSimulate_10000op_10p

The test will hang almost instantly.


Not sure if this is a hint...

https://github.com/etcd-io/bbolt/blob/master/db.go#L27-L31

// IgnoreNoSync specifies whether the NoSync field of a DB is ignored when
// syncing changes to a file.  This is required as some operating systems,
// such as OpenBSD, do not have a unified buffer cache (UBC) and writes
// must be synchronized using the msync(2) syscall.
const IgnoreNoSync = runtime.GOOS == "openbsd"
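
For anyone unfamiliar with the pattern that comment refers to, flushing
mmap'd writes with msync(2) looks roughly like this in C (an
illustration only, not bbolt's actual code):

    #include <sys/mman.h>
    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            int fd = open("db.test", O_RDWR | O_CREAT, 0644);
            if (fd == -1)
                    err(1, "open");
            if (ftruncate(fd, 4096) == -1)
                    err(1, "ftruncate");

            char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");

            memcpy(p, "hello", 5);

            /* Synchronously write the dirty pages back to the file. */
            if (msync(p, 4096, MS_SYNC) == -1)
                    err(1, "msync");

            munmap(p, 4096);
            close(fd);
            return 0;
    }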

Yes, the issue is related to msync(2).  Could you try the diff below (it
is not a fix) and tell me if you can still reproduce the issue with it?
I can't.

Ran it 20 times and all runs completed and passed.  I was also able to
interrupt it without issues.

Excellent!


Index: kern/kern_rwlock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.48
diff -u -p -r1.48 kern_rwlock.c
--- kern/kern_rwlock.c  10 May 2022 16:56:16 -0000      1.48
+++ kern/kern_rwlock.c  21 Dec 2022 16:14:44 -0000
@@ -61,7 +61,7 @@ rw_cas(volatile unsigned long *p, unsign
   *
   * RW_WRITE   The lock must be completely empty. We increment it with
   *            RWLOCK_WRLOCK and the proc pointer of the holder.
- *             Sets RWLOCK_WAIT|RWLOCK_WRWANT while waiting.
+ *             Sets RWLOCK_WAIT while waiting.
   * RW_READ    RWLOCK_WRLOCK|RWLOCK_WRWANT may not be set. We increment
   *            with RWLOCK_READ_INCR. RWLOCK_WAIT while waiting.
   */
@@ -75,7 +75,7 @@ static const struct rwlock_op {
        {       /* RW_WRITE */
                RWLOCK_WRLOCK,
                ULONG_MAX,
-               RWLOCK_WAIT | RWLOCK_WRWANT,
+               RWLOCK_WAIT,
                1,
                PLOCK - 4
        },
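
For the archives, my reading of why this diff helps: the thread sleeping
in uvn_flush() holds `vmmaplk' shared.  As soon as another thread queues
for the exclusive lock it sets RWLOCK_WRWANT, and from that point on
every new reader is blocked as well:

    /*
     * Reader admission test in rw_enter(), simplified from
     * kern_rwlock.c (not a verbatim copy):
     */
    if ((rwl->rwl_owner & (RWLOCK_WRLOCK | RWLOCK_WRWANT)) == 0) {
            /* take the lock: add RWLOCK_READ_INCR to the owner word */
    } else {
            /* set RWLOCK_WAIT and sleep */
    }

With the diff, a queued writer no longer sets RWLOCK_WRWANT, so new
readers keep passing this test while the flush completes.  The trade-off
is that writers can now starve behind a steady stream of readers, which
is presumably why this is a diagnostic rather than a fix.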

