On 26/06/22(Sun) 20:36, Caspar Schutijser wrote:
> A laptop of mine (dmesg below) frequently hangs. After some bisecting
> and extensive testing I think I found the commit that causes this:
> mpi@'s
> "Always acquire the `vmobjlock' before incrementing an object's reference."
> commit from 2022-04-28.
>
> My definition of "the system hangs":
> * Display is frozen
> * Switching to ttyC0 using Ctrl+Alt+F1 doesn't do anything
> * System does not respond to keyboard or mouse input
> * Pressing the power button for 1-2 seconds doesn't achieve anything
> (usually this initiates a system shutdown)
> * And also the fan starts spinning
>
> The system sometimes hangs very soon after booting the system, I've
> seen it happen once while I was typing my username in xenodm to log in.
> But sometimes it takes a couple of hours.
>
> For some reason I put
> "@reboot while sleep 1 ; do sync ; done"
> in my crontab and it *seems* (I'm not sure) that the hangs occur more
> frequently this way. Not sure if that is useful information.
>
> I don't see similar problems on my other machines.
>
> It looks like when the system hangs, it's stuck spinning in the new
> code that was added in that commit; to confirm that I added some code
> (see the diff below) to enter ddb if it's spinning there for 10 seconds
> (and then it indeed enters ddb). If my thinking and diff make sense
> I think that indeed confirms that is the problem.
>
> Any tips for debugging this?
I believe I introduced a deadlock. If you can reproduce it could you
get us the output of `ps' in ddb(4) and the trace of all the active
processes.
I guess one is waiting for the KERNEL_LOCK() while holding the uobj's
vmobjlock.
> Index: uvm/uvm_vnode.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v
> retrieving revision 1.124
> diff -u -p -r1.124 uvm_vnode.c
> --- uvm/uvm_vnode.c 3 May 2022 21:20:35 -0000 1.124
> +++ uvm/uvm_vnode.c 26 Jun 2022 18:21:19 -0000
> @@ -1455,6 +1455,10 @@ uvm_vnp_sync(struct mount *mp)
> {
> struct uvm_vnode *uvn;
> struct vnode *vp;
> + struct timespec start;
> + struct timespec limit = { 10, 0 };
> + struct timespec now, diff;
> + int i = 0;
>
> /*
> * step 1: ensure we are only ones using the uvn_sync_q by locking
> @@ -1472,9 +1476,18 @@ uvm_vnp_sync(struct mount *mp)
> if (mp && vp->v_mount != mp)
> continue;
>
> + nanotime(&start);
> /* Spin to ensure `uvn_wlist' isn't modified concurrently. */
> while (rw_enter(uvn->u_obj.vmobjlock, RW_WRITE|RW_NOSLEEP)) {
> CPU_BUSY_CYCLE();
> + if (++i % 4096 == 0) {
> + nanotime(&now);
> + timespecsub(&now, &start, &diff);
> + if (timespeccmp(&diff, &limit, >)) {
> + printf("i: %d\n", i);
> + db_enter();
> + }
> + }
> }
>
> /*
>
>
> ==== 2: ddb output
>
> i: 175194112
> Stopped at db_enter+0x10: popq %rbp
> ddb{1}> ddb{1}> db_enter() at db_enter+0x10
> uvm_vnp_sync(ffff80000087f400) at uvm_vnp_sync+0xc9
> sys_sync(ffff8000fffc0a90,ffff800033ba0b30,ffff800033ba0b80) at sys_sync+0x85
> syscall(ffff800033ba0bf0) at syscall+0x374
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7fffff78c0, count: -5
> ddb{1}>
>
>
> ==== 3: dmesg
>
> OpenBSD 7.1-current (GENERIC.MP) #1: Sun Jun 26 19:19:30 CEST 2022
> caspar@laptop:/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17032646656 (16243MB)
> avail mem = 16499073024 (15734MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xa9764000 (34 entries)
> bios0: vendor HP version "P96 Ver. 01.38" date 01/06/2021
> bios0: HP HP EliteBook 1040 G4
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT RTMA UEFI SSDT TPM2 SSDT MSDM SLIC WSMT HPET
> APIC MCFG SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT DBGP DBG2 DMAR NHLT
> SSDT ASF! FPDT BGRT SSDT
> acpi0: wakeup devices GLAN(S4) XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4)
> TXHC(S3) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S0)
> PXSX(S0) RP06(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 23999999 Hz
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 2593.96 MHz, 06-8e-09
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 2592.92 MHz, 06-8e-09
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 1 (application processor)
> cpu2: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 2593.96 MHz, 06-8e-09
> cpu2:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 1, core 0, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 2593.96 MHz, 06-8e-09
> cpu3:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu3: 256KB 64b/line 8-way L2 cache
> cpu3: smt 1, core 1, package 0
> ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 120 pins
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf0000000, bus 0-127
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (RP01)
> acpiprt2 at acpi0: bus -1 (RP02)
> acpiprt3 at acpi0: bus -1 (RP03)
> acpiprt4 at acpi0: bus -1 (RP04)
> acpiprt5 at acpi0: bus 108 (RP05)
> acpiprt6 at acpi0: bus -1 (RP06)
> acpiprt7 at acpi0: bus -1 (RP07)
> acpiprt8 at acpi0: bus -1 (RP08)
> acpiprt9 at acpi0: bus 109 (RP09)
> acpiprt10 at acpi0: bus -1 (RP10)
> acpiprt11 at acpi0: bus -1 (RP11)
> acpiprt12 at acpi0: bus -1 (RP12)
> acpiprt13 at acpi0: bus -1 (RP13)
> acpiprt14 at acpi0: bus -1 (RP14)
> acpiprt15 at acpi0: bus -1 (RP15)
> acpiprt16 at acpi0: bus -1 (RP16)
> acpiprt17 at acpi0: bus -1 (RP17)
> acpiprt18 at acpi0: bus -1 (RP18)
> acpiprt19 at acpi0: bus -1 (RP19)
> acpiprt20 at acpi0: bus -1 (RP20)
> acpiprt21 at acpi0: bus -1 (RP21)
> acpiprt22 at acpi0: bus -1 (RP22)
> acpiprt23 at acpi0: bus -1 (RP23)
> acpiprt24 at acpi0: bus -1 (RP24)
> acpiec0 at acpi0
> "PNP0C14" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> acpipci0 at acpi0 PCI0: 0x00000010 0x00000011 0x00000000
> "INT3403" at acpi0 not configured
> "INT3403" at acpi0 not configured
> "ALP0113" at acpi0 not configured
> pchgpio0 at acpi0 GPI0 addr 0xfdaf0000/0x10000 0xfdae0000/0x10000
> 0xfdac0000/0x10000 irq 14, 176 pins
> "ALP0013" at acpi0 not configured
> "PNP0C51" at acpi0 not configured
> "PNP0C51" at acpi0 not configured
> acpibat0 at acpi0: BAT0 model "Primary" serial 01525 2018/04/20 type LIon oem
> "Hewlett-Packard"
> acpiac0 at acpi0: AC unit online
> acpibtn0 at acpi0: SLPB
> acpibtn1 at acpi0: LID_
> acpibtn2 at acpi0: PWRB
> "PNP0C14" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> "HPQ6001" at acpi0 not configured
> tpm0 at acpi0 TPM_ 2.0 (TIS) addr 0xfed40000/0x5000, device 0x001b15d1 rev
> 0x10
> "INT3400" at acpi0 not configured
> acpicpu0 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33),
> C1(1000@1 mwait.1), PSS
> acpicpu1 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33),
> C1(1000@1 mwait.1), PSS
> acpicpu2 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33),
> C1(1000@1 mwait.1), PSS
> acpicpu3 at acpi0: C3(200@1034 mwait.1@0x60), C2(200@151 mwait.1@0x33),
> C1(1000@1 mwait.1), PSS
> acpipwrres0 at acpi0: WRST
> acpitz0 at acpi0: critical temperature is 128 degC
> acpitz1 at acpi0: critical temperature is 128 degC
> acpitz2 at acpi0: critical temperature is 128 degC
> acpitz3 at acpi0: critical temperature is 128 degC
> acpitz4 at acpi0: critical temperature is 128 degC
> acpitz5 at acpi0acpitz5: CHGZ: failed to read _CRT
> acpitz5: CHGZ: failed to read _CRT
> : no critical temperature defined
> acpitz6 at acpi0: critical temperature is 128 degC
> acpitz7 at acpi0: critical temperature is 128 degC
> acpitz8 at acpi0: critical temperature is 128 degC
> acpivideo0 at acpi0: GFX0
> acpivout0 at acpivideo0: DD1F
> acpivout1 at acpivideo0: DD2F
> cpu0: using VERW MDS workaround (except on vmm entry)
> cpu0: Enhanced SpeedStep 2593 MHz: speeds: 2701, 2700, 2600, 2500, 2400,
> 2200, 2000, 1800, 1600, 1500, 1300, 1100, 800, 700, 600, 400 MHz
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel Core 7G Host" rev 0x02
> inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 620" rev 0x02
> drm0 at inteldrm0
> inteldrm0: msi, KABYLAKE, gen 9
> "Intel Core 6G Thermal" rev 0x02 at pci0 dev 4 function 0 not configured
> xhci0 at pci0 dev 20 function 0 "Intel 100 Series xHCI" rev 0x21: msi, xHCI
> 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00
> addr 1
> pchtemp0 at pci0 dev 20 function 2 "Intel 100 Series Thermal" rev 0x21
> dwiic0 at pci0 dev 21 function 0 "Intel 100 Series I2C" rev 0x21: apic 2 int
> 16
> iic0 at dwiic0
> dwiic1 at pci0 dev 21 function 1 "Intel 100 Series I2C" rev 0x21: apic 2 int
> 17
> iic1 at dwiic1
> ihidev0 at iic1 addr 0x2c gpio 82, vendor 0x44e product 0x120a, ALP0013
> ihidev0: 12 report ids
> imt0 at ihidev0: clickpad, 5 contacts
> wsmouse0 at imt0 mux 0
> ims0 at ihidev0 reportid 1: 2 buttons, Z and W dir
> wsmouse1 at ims0 mux 0
> hid at ihidev0 reportid 3 not configured
> hid at ihidev0 reportid 4 not configured
> hid at ihidev0 reportid 5 not configured
> hid at ihidev0 reportid 6 not configured
> hid at ihidev0 reportid 10 not configured
> hid at ihidev0 reportid 12 not configured
> "Intel 100 Series MEI" rev 0x21 at pci0 dev 22 function 0 not configured
> ppb0 at pci0 dev 28 function 0 "Intel 100 Series PCIE" rev 0xf1: msi
> pci1 at ppb0 bus 1
> ppb1 at pci0 dev 28 function 4 "Intel 100 Series PCIE" rev 0xf1: msi
> pci2 at ppb1 bus 108
> iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi
> ppb2 at pci0 dev 29 function 0 "Intel 100 Series PCIE" rev 0xf1: msi
> pci3 at ppb2 bus 109
> nvme0 at pci3 dev 0 function 0 "Samsung SM981/PM981 NVMe" rev 0x00: msix,
> NVMe 1.2
> nvme0: SAMSUNG MZVLB512HAJQ-000H1, firmware EXA72H1Q, serial S3WTNB0K561746
> scsibus1 at nvme0: 2 targets, initiator 0
> sd0 at scsibus1 targ 1 lun 0: <NVMe, SAMSUNG MZVLB512, EXA7>
> sd0: 488386MB, 512 bytes/sector, 1000215216 sectors
> pcib0 at pci0 dev 31 function 0 "Intel 200 Series LPC" rev 0x21
> "Intel 100 Series PMC" rev 0x21 at pci0 dev 31 function 2 not configured
> azalia0 at pci0 dev 31 function 3 "Intel 200 Series HD Audio" rev 0x21: msi
> azalia0: codecs: Conexant/0x1fd6, Intel/0x280b, using Conexant/0x1fd6
> audio0 at azalia0
> ichiic0 at pci0 dev 31 function 4 "Intel 100 Series SMBus" rev 0x21: apic 2
> int 16
> iic2 at ichiic0
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard
> pms0 at pckbc0 (aux slot)
> wsmouse2 at pms0 mux 0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> vmm0 at mainbus0: VMX/EPT
> efifb at mainbus0 not configured
> cdce0 at uhub0 port 2 configuration 2 interface 0 "HP Inc. HP lt4132
> LTE/HSPA+ 4G Module" rev 2.00/1.02 addr 2
> cdce0: could not find data bulk in
> ugen0 at uhub0 port 2 configuration 2 "HP Inc. HP lt4132 LTE/HSPA+ 4G Module"
> rev 2.00/1.02 addr 2
> ugen1 at uhub0 port 3 "Intel Bluetooth" rev 2.00/0.10 addr 3
> uhidev0 at uhub0 port 4 configuration 1 interface 0 "Logitech USB Receiver"
> rev 2.00/24.07 addr 4
> uhidev0: iclass 3/1
> ukbd0 at uhidev0: 8 variable keys, 6 key codes
> wskbd1 at ukbd0 mux 1
> uhidev1 at uhub0 port 4 configuration 1 interface 1 "Logitech USB Receiver"
> rev 2.00/24.07 addr 4
> uhidev1: iclass 3/1, 8 report ids
> ums0 at uhidev1 reportid 2: 16 buttons, Z and W dir
> wsmouse3 at ums0 mux 0
> ucc0 at uhidev1 reportid 3: 652 usages, 18 keys, array
> wskbd2 at ucc0 mux 1
> uhid0 at uhidev1 reportid 4: input=1, output=0, feature=0
> uhid1 at uhidev1 reportid 8: input=1, output=0, feature=0
> uhidev2 at uhub0 port 4 configuration 1 interface 2 "Logitech USB Receiver"
> rev 2.00/24.07 addr 4
> uhidev2: iclass 3/0, 33 report ids
> uhidpp0 at uhidev2 device 1 mouse "MX Master 2S" serial ae-f6-7f-70
> uhid2 at uhidev2 reportid 32: input=14, output=14, feature=0
> uhid3 at uhidev2 reportid 33: input=31, output=31, feature=0
> uvideo0 at uhub0 port 7 configuration 1 interface 0 "Generic HP HD Camera"
> rev 2.00/0.16 addr 5
> video0 at uvideo0
> uvideo1 at uhub0 port 7 configuration 1 interface 2 "Generic HP HD Camera"
> rev 2.00/0.16 addr 5
> video1 at uvideo1
> ugen2 at uhub0 port 8 "Validity Sensors product 0x0092" rev 2.00/1.64 addr 6
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006>
> sd1: 409599MB, 512 bytes/sector, 838860272 sectors
> root on sd1a (06ac9358f268d3c0.a) swap on sd1b dump on sd1b
> drm:pid0:intel_ddi_sanitize_encoder_pll_mapping *NOTICE* [drm]
> [ENCODER:102:DDI \M-J/PHY h] is disabled/in DSI mode with an ungated DDI
> clock, gate it
> drm:pid0:intel_ddi_sanitize_encoder_pll_mapping *NOTICE* [drm]
> [ENCODER:116:DDI \M-J/PHY h] is disabled/in DSI mode with an ungated DDI
> clock, gate it
> inteldrm0: 1920x1080, 32bpp
> wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
> wskbd1: connecting to wsdisplay0
> wskbd2: connecting to wsdisplay0
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> iwm0: hw rev 0x230, fw ver 36.ca7b901d.0, address 98:3b:8f:xx:xx:xx
>