On May 22, 2023, at 2:27 AM, Claudio Jeker <clau...@openbsd.org> wrote: > I have seen these WITNESS warnings on other systems as well. I doubt this > is the problem. IIRC this warning is because sys_mount() is doing it wrong > but it is not really an issue since sys_mount is not called often.
Yup. I see that now that I have tested witness on several arches. They all show this lock order reversal right after booting the system. I guess this means what I am seeing isn’t something that witness detects. On -current with my T4-1, I can reliably reproduce the issues I am seeing. While the problem is intermittent I can’t get very far into the jdk build without tripping it. Instructions for reproducing the issue are: Add wxallowed to /usr/local/ and /usr/ports (or wherever WRKOBJDIR has been changed to) doas pkg_add jdk zip unzip cups-libs bash gmake libiconv giflib cd /usr/ports/devel/jdk/1.8 FLAVOR=native_bootstrap make There are two stages to the problem. A java command (or javac or javah) gets stuck making forward progress and nearly all of its cpu time is in sys time category. You can see this in top as 1500-3000% CPU time on the java process. ktrace of the process in this state shows endless sched_yield() calls. Debugging shows many threads in pthread_cond_wait(3). The condition vars are configured to use CLOCK_MONOTONIC. The second stage of the problem is when things lock up. While java is spinning in this sched_yield() state, if you display the process arguments in top (pressing the right arrow) you trip the lockups. top stops responding. getty will reprompt if enter is pressed, but locks up if a username is entered. Most processes lock up when doing anything after this point. ddb ps at this stage shows top waiting on vmmaplk and the rest of the stuck processes waiting on sysctllk (sshd, systat, login). I tried bisecting when this was introduced but as I go back in time with kernels it becomes more intermittent and I didn’t notice that so I would need to redo the bisecting. I can say I have seen the problem reproduce as far back as Feb 16th kernel. When I updated the jdk in late January I didn’t notice it but it could have been a lucky build as I tend to only do one native_bootstrap build of the jdk when updating as a way to test the resulting package. Here is some sample output of top, systat and ddb ps output on -current in my last reproduction of the problem. load averages: 15.27, 4.25, 1.68 oracle.intricatesoftware.com 16:13:10 64 processes: 62 idle, 2 on processor up 0 days 00:04:58 64 CPUs: 0.0% user, 1.8% nice, 55.5% sys, 0.4% spin, 0.1% intr, 42.2% idle Memory: Real: 143M/2676M act/tot Free: 13G Cache: 2334M Swap: 0K/16G PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND 38582 _pbuild 10 5 290M 64M onproc/19 fsleep 17:16 2892.14% javac 3 users Load 15.27 4.25 1.68 oracle.intricatesof 16:13:09 memory totals (in KB) PAGING SWAPPING Interrupts real virtual free in out in out 12777 total Active 146752 146752 13198032 ops vcons0 All 2740160 2740160 29713104 pages 5 vpci0:0 2 mpii0 Proc:r d s w Csw Trp Sys Int Sof Flt forks mpii1 18 148 78089 18666 8090 94097 18891 fkppw 2 em0 fksvm 31 ehci0 0.1%Int 0.3%Spn 55.1%Sys 1.8%Usr 42.7%Idle pwait 12737 clock | | | | | | | | | | | relck ============================> rlkok noram Namei Sys-cache Proc-cache No-cache ndcpy Calls hits % hits % miss % fltcp zfod cow Disks cd0 sd0 sd1 sd2 cd1 sd3 66409 fmin seeks 88545 ftarg xfers 3 itarg speed 18K 2 wired sec 0.0 pdfre pdscn pzidl 1 IPKTS 13 kmape 1 OPKTS ddb{0}> ps PID TID PPID UID S FLAGS WAIT COMMAND 94135 299634 9252 0 3 0x82 sysctllk sshd 38582 227559 44647 55 3 0x200083 fsleep javac 38582 430054 44647 55 3 0x4200083 fsleep javac 38582 454002 44647 55 7 0x4200003 javac 38582 92205 44647 55 3 0x4200083 fsleep javac 38582 391595 44647 55 7 0x4200003 javac 38582 306671 44647 55 7 0x4200003 javac 38582 444915 44647 55 7 0x4200003 javac 38582 68842 44647 55 7 0x4200003 javac 38582 210759 44647 55 7 0x4200003 javac 38582 295002 44647 55 3 0x4200003 uobjlk javac 38582 20089 44647 55 7 0x4200003 javac 38582 426972 44647 55 7 0x4200003 javac 38582 485651 44647 55 7 0x4200003 javac 38582 198526 44647 55 7 0x4200003 javac 38582 140392 44647 55 7 0x4200003 javac 38582 426536 44647 55 7 0x4200003 javac 38582 237845 44647 55 3 0x4200003 vmmaplk javac 38582 11519 44647 55 7 0x4200003 javac 38582 493801 44647 55 7 0x4200003 javac 38582 431112 44647 55 7 0x4200003 javac 38582 5954 44647 55 3 0x4200003 uobjlk javac 38582 334521 44647 55 7 0x4200003 javac 38582 63717 44647 55 7 0x4200003 javac 38582 278840 44647 55 3 0x4200003 uobjlk javac 38582 288942 44647 55 3 0x4200003 vmmaplk javac 38582 358539 44647 55 3 0x4200003 vmmaplk javac 38582 292341 44647 55 3 0x4200003 uobjlk javac 38582 71353 44647 55 7 0x4200003 javac 38582 173462 44647 55 7 0x4200003 javac 38582 369162 44647 55 7 0x4200003 javac 38582 103608 44647 55 3 0x4200003 uobjlk javac 38582 283692 44647 55 7 0x4200003 javac 38582 73296 44647 55 7 0x4200003 javac 38582 107123 44647 55 7 0x4200003 javac 38582 190345 44647 55 7 0x4200003 javac 38582 519032 44647 55 7 0x4200003 javac 38582 512468 44647 55 7 0x4200003 javac 38582 30652 44647 55 7 0x4200003 javac 38582 379236 44647 55 3 0x4200003 uobjlk javac 38582 269178 44647 55 7 0x4200003 javac 38582 268232 44647 55 3 0x4200003 uobjlk javac 38582 352770 44647 55 7 0x4200003 javac 38582 377632 44647 55 7 0x4200003 javac 38582 244819 44647 55 7 0x4200003 javac 38582 72610 44647 55 7 0x4200003 javac 38582 469953 44647 55 3 0x4200083 fsleep javac 38582 371383 44647 55 3 0x4200083 fsleep javac 38582 158857 44647 55 3 0x4200083 fsleep javac 38582 11735 44647 55 3 0x4200083 fsleep javac 38582 84102 44647 55 3 0x4200083 fsleep javac 38582 75248 44647 55 3 0x4200083 fsleep javac 38582 375420 44647 55 3 0x4200083 fsleep javac 38582 518610 44647 55 3 0x4200083 fsleep javac 38582 127230 44647 55 3 0x4200003 vmmaplk javac 38582 83423 44647 55 3 0x4200083 fsleep javac 38582 159603 44647 55 3 0x4200083 fsleep javac 38582 31091 44647 55 3 0x4200083 fsleep javac 38582 390888 44647 55 3 0x4200083 fsleep javac 38582 166886 44647 55 3 0x4200083 fsleep javac 38582 377933 44647 55 3 0x4200083 fsleep javac 38582 376867 44647 55 3 0x4200083 fsleep javac 38582 110280 44647 55 3 0x4200083 fsleep javac 38582 396501 44647 55 3 0x4200083 fsleep javac 38582 337157 44647 55 3 0x4200083 fsleep javac 38582 312539 44647 55 3 0x4200083 fsleep javac 38582 439725 44647 55 3 0x4200083 fsleep javac 38582 174728 44647 55 3 0x4200083 fsleep javac 38582 350117 44647 55 3 0x4200083 fsleep javac 38582 336583 44647 55 3 0x4200083 fsleep javac 44647 53285 6389 55 3 0x100089 sigsusp sh 6389 9627 3586 55 3 0x10008b sigsusp sh 3586 519556 31484 55 3 0x83 wait gmake 29401 282226 87417 55 3 0x100083 piperd tee 31484 384291 87417 55 3 0x81 wait bash 10686 220950 84161 55 3 0x100083 piperd tee 87417 52900 84161 55 3 0x81 wait bash 84161 354717 90894 55 3 0x83 wait bash 90894 299418 54844 55 3 0x100089 sigsusp sh 54844 38202 23322 55 3 0x10008b sigsusp sh 23322 499463 12410 55 3 0x83 wait gmake 2137 306483 5971 1000 3 0x83 sysctllk systat 96354 374280 1 1000 3 0x1000b0 kqread ssh-agent 5971 326285 53265 1000 3 0x10008b sigsusp ksh 53265 95696 81543 1000 3 0x98 kqread sshd 81543 262878 9252 0 3 0x82 kqread sshd 12410 25695 53684 1000 3 0x10008b sigsusp make 53684 389550 2559 1000 3 0x10008b sigsusp sh 91269 101622 48618 1000 3 0x100003 vmmaplk top 26730 50864 1 1000 3 0x1000b0 kqread ssh-agent 48618 194268 38192 1000 3 0x10008b sigsusp ksh 38192 94599 15809 1000 3 0x98 kqread sshd 15809 420302 9252 0 3 0x82 kqread sshd 2559 172546 11253 1000 3 0x10008b sigsusp make 48347 451749 1 1000 3 0x1000b0 kqread ssh-agent 11253 329919 75474 1000 3 0x10008b sigsusp ksh 75474 434695 24864 1000 3 0x98 kqread sshd 24864 460675 9252 0 3 0x82 kqread sshd 80339 457654 1 0 3 0x83 sysctllk login 2783 471804 1 0 3 0x100098 kqread cron 13986 329572 1 99 3 0x1100090 kqread sndiod 17161 216329 1 110 3 0x100090 kqread sndiod 7125 459497 81420 95 3 0x1100092 kqread smtpd 34588 306264 81420 103 3 0x1100092 kqread smtpd 35539 431962 81420 95 3 0x1100092 kqread smtpd 95046 110434 81420 95 3 0x100092 kqread smtpd 75832 395854 81420 95 3 0x1100092 kqread smtpd 90905 131444 81420 95 3 0x1100092 kqread smtpd 81420 40917 1 0 3 0x100080 kqread smtpd 9252 412804 1 0 3 0x88 kqread sshd 77215 300303 47896 0 3 0x80 nfsd nfsd 39163 328218 47896 0 3 0x80 nfsd nfsd 18940 88416 47896 0 3 0x80 nfsd nfsd 30770 448875 47896 0 3 0x80 nfsd nfsd 47896 35491 1 0 3 0x80 netcon nfsd 69834 381834 1172 0 3 0x80 kqread mountd 1172 8898 1 0 3 0x100080 kqread mountd 34849 159509 1 28 3 0x1100090 kqread portmap 99111 374653 1 0 3 0x100080 kqread ntpd 96514 463552 69056 83 3 0x100092 sysctllk ntpd 69056 185686 1 83 3 0x1100092 sysctllk ntpd 76215 96020 37258 74 3 0x1100092 bpf pflogd 37258 158509 1 0 3 0x80 netio pflogd 74134 93758 35337 73 3 0x1100090 kqread syslogd 35337 436947 1 0 3 0x100082 netio syslogd 49461 134918 1 0 3 0x100080 kqread resolvd 15802 138340 35598 77 3 0x100092 kqread dhcpleased 85374 145907 35598 77 3 0x100092 kqread dhcpleased 35598 69070 1 0 3 0x80 kqread dhcpleased 78965 485143 22355 115 3 0x100092 kqread slaacd 49372 473630 22355 115 3 0x100092 kqread slaacd 22355 252711 1 0 3 0x100080 kqread slaacd 77472 293410 0 0 3 0x14200 bored smr 96287 2177 0 0 3 0x14200 pgzero zerothread 22397 479660 0 0 3 0x14200 aiodoned aiodoned 40550 268537 0 0 3 0x14200 syncer update 30711 491254 0 0 3 0x14200 cleaner cleaner 19845 282487 0 0 3 0x14200 reaper reaper 85959 130144 0 0 3 0x14200 pgdaemon pagedaemon 70483 182566 0 0 3 0x14200 usbtsk usbtask 51296 152380 0 0 3 0x14200 usbatsk usbatsk 16787 444929 0 0 3 0x14200 bored sensors 63429 82409 0 0 7 0x40014200 idle63 92645 493403 0 0 7 0x40014200 idle62 20350 375563 0 0 7 0x40014200 idle61 35222 74521 0 0 7 0x40014200 idle60 33426 375581 0 0 7 0x40014200 idle59 22679 177213 0 0 7 0x40014200 idle58 14639 386227 0 0 7 0x40014200 idle57 39400 120262 0 0 7 0x40014200 idle56 12654 7800 0 0 7 0x40014200 idle55 48608 174405 0 0 3 0x40014200 idle54 94678 113021 0 0 7 0x40014200 idle53 6170 359112 0 0 3 0x40014200 idle52 92056 451703 0 0 7 0x40014200 idle51 94062 149790 0 0 3 0x40014200 idle50 7490 352645 0 0 7 0x40014200 idle49 58618 321162 0 0 3 0x40014200 idle48 34596 30496 0 0 3 0x40014200 idle47 42019 493114 0 0 7 0x40014200 idle46 19753 346089 0 0 3 0x40014200 idle45 35289 255668 0 0 7 0x40014200 idle44 71157 135665 0 0 7 0x40014200 idle43 2489 156561 0 0 7 0x40014200 idle42 69267 449921 0 0 3 0x40014200 idle41 17078 23764 0 0 3 0x40014200 idle40 18986 140559 0 0 7 0x40014200 idle39 31565 508859 0 0 3 0x40014200 idle38 83892 503397 0 0 3 0x40014200 idle37 3971 218332 0 0 3 0x40014200 idle36 56087 426177 0 0 3 0x40014200 idle35 8255 404604 0 0 3 0x40014200 idle34 88769 308362 0 0 7 0x40014200 idle33 17805 369954 0 0 7 0x40014200 idle32 44764 388232 0 0 7 0x40014200 idle31 6184 164272 0 0 7 0x40014200 idle30 57793 246870 0 0 3 0x40014200 idle29 94847 282657 0 0 3 0x40014200 idle28 20730 274685 0 0 7 0x40014200 idle27 49977 358582 0 0 3 0x40014200 idle26 76662 491808 0 0 3 0x40014200 idle25 339 381654 0 0 7 0x40014200 idle24 95798 435838 0 0 7 0x40014200 idle23 17729 220014 0 0 3 0x40014200 idle22 32933 457100 0 0 3 0x40014200 idle21 93991 300506 0 0 7 0x40014200 idle20 83270 35155 0 0 3 0x40014200 idle19 113 190949 0 0 7 0x40014200 idle18 47202 13764 0 0 3 0x40014200 idle17 26572 32690 0 0 7 0x40014200 idle16 79156 94305 0 0 3 0x40014200 idle15 91078 227738 0 0 3 0x40014200 idle14 92007 190955 0 0 3 0x40014200 idle13 30258 465800 0 0 7 0x40014200 idle12 21174 286794 0 0 7 0x40014200 idle11 38572 384863 0 0 3 0x40014200 idle10 39753 266722 0 0 7 0x40014200 idle9 2402 328673 0 0 3 0x40014200 idle8 7204 135132 0 0 3 0x40014200 idle7 50857 147720 0 0 3 0x40014200 idle6 98253 405788 0 0 3 0x40014200 idle5 45666 511079 0 0 3 0x40014200 idle4 31437 507896 0 0 3 0x40014200 idle3 22223 410682 0 0 7 0x40014200 idle2 66020 259824 0 0 3 0x40014200 idle1 1702 438104 0 0 3 0x14200 bored softnet3 79408 470282 0 0 3 0x14200 bored softnet2 65184 187165 0 0 3 0x14200 bored softnet1 19984 214261 0 0 3 0x14200 bored softnet0 96155 318872 0 0 3 0x14200 bored systqmp 56400 116597 0 0 3 0x14200 bored systq 90431 390796 0 0 3 0x40014200 bored softclock *64111 191058 0 0 7 0x40014200 idle0 18862 308472 0 0 3 0x14200 kmalloc kmthread 1 421752 0 0 3 0x82 wait init 0 0 -1 0 3 0x10200 scheduler swapper