On 12/21/22 11:23, Martin Pieuchot wrote:
On 21/12/22(Wed) 09:20, David Hill wrote:
On 12/21/22 07:08, David Hill wrote:
On 12/21/22 05:33, Martin Pieuchot wrote:
On 18/12/22(Sun) 20:55, Martin Pieuchot wrote:
On 17/12/22(Sat) 14:15, David Hill wrote:
On 10/28/22 03:46, Renato Aguiar wrote:
Use of bbolt Go library causes 7.2 to freeze. I suspect
it is triggering some
sort of deadlock in mmap because threads get stuck at vmmaplk.
I managed to reproduce it consistently in a laptop with
4 cores (i5-1135G7)
using one unit test from bbolt:
$ doas pkg_add git go
$ git clone https://github.com/etcd-io/bbolt.git
$ cd bbolt
$ git checkout v1.3.6
$ go test -v -run TestSimulate_10000op_10p
The test never ends and this is the 'top' report:
PID TID PRI NICE SIZE RES STATE
WAIT TIME CPU COMMAND
32181 438138 -18 0 57M 13M idle uvn_fls
0:00 0.00% bbolt.test
32181 331169 10 0 57M 13M sleep/1 nanoslp
0:00 0.00% bbolt.test
32181 497390 10 0 57M 13M idle vmmaplk
0:00 0.00% bbolt.test
32181 380477 14 0 57M 13M idle vmmaplk
0:00 0.00% bbolt.test
32181 336950 14 0 57M 13M idle vmmaplk
0:00 0.00% bbolt.test
32181 491043 14 0 57M 13M idle vmmaplk
0:00 0.00% bbolt.test
32181 347071 2 0 57M 13M idle kqread
0:00 0.00% bbolt.test
After this, most commands just hang. For example,
running a 'ps | grep foo' in
another shell would do it.
I can reproduce this on MP, but not SP. Here is /trace from
ddb after using
the ddb.trigger sysctl. Is there any other information I
could pull from
DDB that may help?
Thanks for the useful report David!
The issue seems to be a deadlock between the `vmmaplk' and a particular
`vmobjlock'. uvm_map_clean() calls uvn_flush() which sleeps with the
`vmmaplk' held.
I'll think a bit about this and try to come up with a fix ASAP.
I'm missing a piece of information. All the threads in your report seem
to want a read version of the `vmmaplk' so they should not block. Could
you reproduce the hang with a WITNESS kernel and print 'show all locks'
in addition to all the informations you've reported?
Sure. Its always the same; 2 processes (sysctl and bbolt.test) and 3
locks (sysctllk, kernel_lock, and vmmaplk) with bbolt.test always on the
uvn_flsh thread.
Process 98301 (sysctl) thread 0xfff......
exclusive rwlock sysctllk r = 0 (0xfffff...)
exclusive kernel_lock &kernel_lock r = 0 (0xffffff......)
Process 32181 (bbolt.test) thread (0xffffff...) (438138)
shared rwlock vmmaplk r = 0 (0xfffff......)
To reproduce, just do:
$ doas pkg_add git go
$ git clone https://github.com/etcd-io/bbolt.git
$ cd bbolt
$ git checkout v1.3.6
$ go test -v -run TestSimulate_10000op_10p
The test will hang happen almost instantly.
Not sure if this is a hint..
https://github.com/etcd-io/bbolt/blob/master/db.go#L27-L31
// IgnoreNoSync specifies whether the NoSync field of a DB is ignored when
// syncing changes to a file. This is required as some operating systems,
// such as OpenBSD, do not have a unified buffer cache (UBC) and writes
// must be synchronized using the msync(2) syscall.
const IgnoreNoSync = runtime.GOOS == "openbsd"
Yes, the issue is related to sync(2). Could you try the diff below, it
is not a fix, and tell me if you can produce the issue with it? I can't.
Ran it 20 times and all completed and passed. I was also able to
interrupt it as well. no issues.
Excellent!
Index: kern/kern_rwlock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.48
diff -u -p -r1.48 kern_rwlock.c
--- kern/kern_rwlock.c 10 May 2022 16:56:16 -0000 1.48
+++ kern/kern_rwlock.c 21 Dec 2022 16:14:44 -0000
@@ -61,7 +61,7 @@ rw_cas(volatile unsigned long *p, unsign
*
* RW_WRITE The lock must be completely empty. We increment it with
* RWLOCK_WRLOCK and the proc pointer of the holder.
- * Sets RWLOCK_WAIT|RWLOCK_WRWANT while waiting.
+ * Sets RWLOCK_WAIT while waiting.
* RW_READ RWLOCK_WRLOCK|RWLOCK_WRWANT may not be set. We increment
* with RWLOCK_READ_INCR. RWLOCK_WAIT while waiting.
*/
@@ -75,7 +75,7 @@ static const struct rwlock_op {
{ /* RW_WRITE */
RWLOCK_WRLOCK,
ULONG_MAX,
- RWLOCK_WAIT | RWLOCK_WRWANT,
+ RWLOCK_WAIT,
1,
PLOCK - 4
},