On 2024/08/24 12:13, Martin Pieuchot wrote:
> Hugh,
>
> If you can reproduce this easily, please send a new panic with the
> outputs of:
> - show uvm
> - show bcstats
> - And the traces of all running processes... In the two reports below we
> only have the trace of pax(1) which is running on CPU2.
>
> The two panics are due to corruptions of two different global data
> structures related to buffers: the tree of pages and the tree of buffers.
>
> In both cases it happens when the buffer cache reaches low DMA watermark
> and tries to flip a buffer high. The fact that global data structures
> are corrupted and the given buffer cannot be found tends to indicate
> there is a race. And this is coherent with the use of pax | nc which
> are currently running on two different CPUs.
>
> I fear there's a sleeping point somewhere, we could try converting the
> splbio() to a mutex which should help.
>
> On 24/08/24(Sat) 01:11, Hugh Graham wrote:
> > On Fri, Aug 23, 2024 at 01:52:52PM -0600, Bob Beck wrote:
> > > My immediate suspicion would also fall there. Nothing in here has
> > > recently changed.
> > >
> > > You should probably share this with a wider audience, like bugs@ or tech@
> > > instead of just
> > > Mailing individuals.
> >
> > Apologies for the lack of process. I am only barely awake after a
> > long slumber.
> >
> > It ran all day, but I did manage to reproduce the crash on 7.4,
> > so that absolves a whole bunch of "recent" changes.
> >
> > Also, as yet, I have only the single machine for testing and
> > can't exclude hardware. If anyone wants to make an independent
> > confirmation, sending a ports tree with plenty of packages and
> > distfiles might be a successful recipe.
> >
> > pax -w ports | network | pax -r
> >
> > Where the receiver's network media is forced to 10BaseT, or the
> > sending machine is just that slow. My latest crash was near the
> > 25GB mark, but this varies greatly and is usually sooner. I
> > will confirm this recipe when I see my next crash.
> >
> > /Hugh
> >
> > >> OpenBSD/amd64 BOOTX64 3.65
> > boot> boot bsd.mp.74.dist -s
> > booting hd0a:bsd.mp.74.dist: 17249612+4142096+368672+0+1241088
> > [1340407+128+1321080+1013316]=0x1973738
> > entry point at 0x1001000
> > [ using 3675960 bytes of bsd ELF symbol table ]
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> > The Regents of the University of California. All rights reserved.
> > Copyright (c) 1995-2023 OpenBSD. All rights reserved.
> > https://www.OpenBSD.org
> >
> > OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 33551818752 (31997MB)
> > ...
> > panic: kernel diagnostic assertion "tpg != NULL" failed: file "/usr/src/sys/uvm/uvm_page.c", line 855
> > Stopped at db_enter+0x14: popq %rbp
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > * 1154 46918 0 0x100003 0 2 pax
> > 401298 12308 0 0x100003 0 3 nc
> > 404052 38135 0 0x14000 0x200 1 softnet0
> > db_enter() at db_enter+0x14
> > panic(ffffffff820a9e1f) at panic+0xc3
> > __assert(ffffffff82122f8e,ffffffff8207775c,357,ffffffff8215caa2) at __assert+0x29
> > uvm_pagerealloc_multi(fffffd8741fa1218,4000000000,4000,22,ffffffff8250d8d0) at uvm_pagerealloc_multi+0x2f8
> > buf_realloc_pages(fffffd8741fa1158,ffffffff8250d8d0,2) at buf_realloc_pages+0xbf
> > buf_flip_high(fffffd8741fa1158) at buf_flip_high+0x7e
> > bufcache_recover_dmapages(0,4) at bufcache_recover_dmapages+0x12b
> > buf_get(fffffd873280ab58,3be4,4000) at buf_get+0xcb
> > getblk(fffffd873280ab58,3be4,4000,0,ffffffffffffffff) at getblk+0x71
> > ffs2_balloc(fffffd872b47be18,ef90000,2d,fffffd880dad9ea0,1,ffff8000443a11a8) at ffs2_balloc+0xeef
> > ffs_write(ffff8000443a1228) at ffs_write+0x229
> > VOP_WRITE(fffffd873280ab58,ffff8000443a1388,1,fffffd880dad9ea0) at VOP_WRITE+0x45
> > vn_write(fffffd8718901708,ffff8000443a1388,0) at vn_write+0xcc
> > dofilewritev(ffff80004436e2b0,6,ffff8000443a1388,0,ffff8000443a1460) at dofilewritev+0x151
> > end trace frame: 0xffff8000443a13f0, count: 0
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports. Insufficient info makes it difficult to find and fix bugs.
> > ddb{2}>
> > > > > > > > Very likely related to some of the changes being made in uvm.

I've just been sent a photo from a crashed machine (not local to me - it's
running 7.6-beta from Aug 19) with a trace which doesn't look entirely
dissimilar to this first one from Hugh. It would have been idling at the
time with X, mate, possibly chromium running but not actively used.

Sadly I don't have any further information from DDB beyond what was
on-screen, the machine was already rebooted so I can't get it now, so
I'm afraid this is probably not all that useful a report.

Hand-retyped below (I do have messages from multiple boots in dmesg
output, but nothing from DDB, so you'll have to live with any typos,
photo available for rechecking if needed). Before this snapshot it
was running 7.3.

uvm_fault(0xfffffd83fdf596c8, 0x60, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at bread+0x33 testq $0x180,0x60(%rax)
tid pid uid prflags pflags cpu command
*310554 16143 35 0x18000012 0 3K Xorg
456271 71988 0 0x14000 0x200 1 i915_modeset
351969 42416 0 0x14000 0x200 2 i915-unordered
309135 79587 0 0x14000 0x200 0 drmubwq
bread(fffffd84744efb2,ffffffffffff2ff4,4000,ffff800048363da58) at bread+0x33
ffs2_balloc(fffffd83fd837010,34275c23,e,fffffd83fc91cbd0,1,ffff80004863db18) at ffs2_balloc+0x6a3
ffs_write(ffff80004863db98) at ffs_write+0x20d
VOP_WRITE(fffffd84744efb28,ffff80004863dcf8,1,fffffd83fc91cbd0) at VOP_WRITE+0x45
vn_write(fffffd83fd6b24c8,ffff80004863dcf8,0) at vn_write+oxd9
dofilewritev(fff800036b067b8,3,ffff80004863dcf8,0,ffff80004863dd90) at dofilewritev+0171
sys_write(ffff800036b067b8,ffff80004863de20,ffff80004863dd90) at sys_write+0x55
syscall(ffff80004863de20) at syscall+0x620
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x79e71da64660, count: 6

OpenBSD 7.6-beta (GENERIC.MP) #269: Mon Aug 19 19:01:12 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16927842304 (16143MB)
avail mem = 16391426048 (15632MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.5 @ 0x75d9e000 (124 entries)
bios0: vendor American Megatrends International, LLC. version "ADLNV105" date 12/12/2023
bios0: AZW MINI S
efi0 at bios0: UEFI 2.8
efi0: American Megatrends rev 0x5001a
acpi0 at bios0: ACPI 6.4Undefined scope: \\_SB_.PC00.TXHC.RHUB.SS01
Undefined scope: \\_SB_.PC00.TXHC.RHUB.SS02
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP FIDT SSDT SSDT SSDT SSDT HPET APIC MCFG SSDT UEFI RTCT PSDS NHLT LPIT SSDT SSDT DBGP DBG2 SSDT DMAR FPDT SSDT SSDT SSDT SSDT TPM2 PHAT WSMT
acpi0: wakeup devices PEGP(S4) PEGP(S4) PEGP(S4) SIO1(S3) RP09(S4) PXSX(S4) RP10(S4) PXSX(S4) RP11(S4) PXSX(S4) RP12(S4) PXSX(S4) RP13(S4) PXSX(S4) RP14(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 19200000 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) N100, 3392.18 MHz, 06-be-00, patch 00000017
cpu0: cpuid 1 edx=bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> ecx=77fafbbf<SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND>
cpu0: cpuid 6 eax=578ff7<SENSOR,ARAT> ecx=9<EFFFREQ>
cpu0: cpuid 7.0 ebx=239ca7eb<FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA> ecx=98c007ac<UMIP,PKU,WAITPKG,PKS> edx=fc184410<MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD>
cpu0: cpuid a vers=5, gp=6, gpwidth=48, ff=3, ffwidth=48
cpu0: cpuid d.1 eax=f<XSAVEOPT,XSAVEC,XGETBV1,XSAVES>
cpu0: cpuid 80000001 edx=2c100800<NXE,PAGE1GB,RDTSCP,LONG> ecx=121<LAHF,ABM,3DNOWP>
cpu0: cpuid 80000007 edx=100<ITSC>
cpu0: msr 10a=1580fd6b<IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,TAA_NO,MISC_PKG_CT,ENERGY_FILT,DOITM,SBDR_SSDP_N,FBSDP_NO,PSDP_NO,OVERCLOCK,PBRSB_NO,GDS_NO,RFDS_CLEAR>
cpu0: 32KB 64b/line 8-way D-cache, 64KB 64b/line 8-way I-cache, 2MB 64b/line 16-way L2 cache, 6MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) N100, 3392.18 MHz, 06-be-00, patch 00000017
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) N100, 3092.87 MHz, 06-be-00, patch 00000017
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) N100, 2893.33 MHz, 06-be-00, patch 00000017
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 120 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xc0000000, bus 0-255
acpiprt0 at acpi0: bus 0 (PC00)
acpiprt1 at acpi0: bus -1 (RP09)
acpiprt2 at acpi0: bus -1 (RP10)
acpiprt3 at acpi0: bus 2 (RP11)
acpiprt4 at acpi0: bus -1 (RP12)
acpiprt5 at acpi0: bus -1 (RP13)
acpiprt6 at acpi0: bus -1 (RP14)
acpiprt7 at acpi0: bus -1 (RP15)
acpiprt8 at acpi0: bus -1 (RP16)
acpiprt9 at acpi0: bus -1 (RP01)
acpiprt10 at acpi0: bus -1 (RP02)
acpiprt11 at acpi0: bus -1 (RP03)
acpiprt12 at acpi0: bus -1 (RP04)
acpiprt13 at acpi0: bus -1 (RP05)
acpiprt14 at acpi0: bus -1 (RP06)
acpiprt15 at acpi0: bus 1 (RP07)
acpiprt16 at acpi0: bus -1 (RP08)
acpiprt17 at acpi0: bus -1 (RP17)
acpiprt18 at acpi0: bus -1 (RP18)
acpiprt19 at acpi0: bus -1 (RP19)
acpiprt20 at acpi0: bus -1 (RP20)
acpiprt21 at acpi0: bus -1 (RP21)
acpiprt22 at acpi0: bus -1 (RP22)
acpiprt23 at acpi0: bus -1 (RP23)
acpiprt24 at acpi0: bus -1 (RP24)
acpiprt25 at acpi0: bus -1 (RP25)
acpiprt26 at acpi0: bus -1 (RP26)
acpiprt27 at acpi0: bus -1 (RP27)
acpiprt28 at acpi0: bus -1 (RP28)
acpiec0 at acpi0: not present
acpipci0 at acpi0 PC00: 0x00000010 0x00000011 0x00000000
com0 at acpi0 UAR1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
"ACPI000E" at acpi0 not configured
pchgpio0 at acpi0 GPI0 addr 0xfd6e0000/0x10000 0xfd6d0000/0x10000 0xfd6a0000/0x10000 0xfd690000/0x10000 irq 14, 384 pins
"INTC1023" at acpi0 not configured
"INTC1024" at acpi0 not configured
acpibtn0 at acpi0: SLPB
acpicpu0 at acpi0: C3(200@1048 mwait.1@0x60), C2(350@127 mwait.1@0x21), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(200@1048 mwait.1@0x60), C2(350@127 mwait.1@0x21), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C3(200@1048 mwait.1@0x60), C2(350@127 mwait.1@0x21), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C3(200@1048 mwait.1@0x60), C2(350@127 mwait.1@0x21), C1(1000@1 mwait.1), PSS
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
intelpmc0 at acpi0: PEPD
state 0: 0x7f:1:2:0x00:0x0000000000000060
counter: 0x7f:64:0:0x00:0x0000000000000632
frequency: 0
state 1: 0x7f:1:2:0x00:0x0000000000000060
counter: 0x00:32:0:0x03:0x00000000fe00193c
frequency: 8197
acpibtn1 at acpi0: PWRB
tpm0 at acpi0 TPM_ 2.0 (CRB) addr 0xfed40000/0x5000, device 0x00000000 rev 0x0
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
acpipwrres0 at acpi0: BTRT
acpipwrres1 at acpi0: WRST
acpipwrres2 at acpi0: FN00, resource for FAN0
acpipwrres3 at acpi0: FN01, resource for FAN1
acpipwrres4 at acpi0: FN02, resource for FAN2
acpipwrres5 at acpi0: FN03, resource for FAN3
acpipwrres6 at acpi0: FN04, resource for FAN4
acpitz0 at acpi0: no critical temperature defined
acpipwrres7 at acpi0: PIN_
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: DD1F
acpivout1 at acpivideo0: DD2F
cpu0: using VERW MDS workaround
cpu0: Enhanced SpeedStep 3392 MHz: speeds: 801, 800, 700 MHz
pci0 at mainbus0 bus 0
0:31:5: mem address conflict 0xfe010000/0x1000
pchb0 at pci0 dev 0 function 0 "Intel N100 Host" rev 0x00
inteldrm0 at pci0 dev 2 function 0 "Intel Graphics" rev 0x00
drm0 at inteldrm0
inteldrm0: msi, ALDERLAKE_P, gen 12
xhci0 at pci0 dev 20 function 0 "Intel ADL-N xHCI" rev 0x00: msi, xHCI 1.20
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel ADL-N SRAM" rev 0x00 at pci0 dev 20 function 2 not configured
iwx0 at pci0 dev 20 function 3 "Intel Wi-Fi 6 AX211" rev 0x00, msix
dwiic0 at pci0 dev 21 function 0 "Intel ADL-N I2C" rev 0x00: apic 2 int 27
iic0 at dwiic0
dwiic1 at pci0 dev 21 function 1 "Intel ADL-N I2C" rev 0x00: apic 2 int 40
iic1 at dwiic1
"Intel ADL-N HECI" rev 0x00 at pci0 dev 22 function 0 not configured
ahci0 at pci0 dev 23 function 0 "Intel ADL-N AHCI" rev 0x00: msi, AHCI 1.3.1
ahci0: PHY offline on port 1
scsibus1 at ahci0: 32 targets
dwiic2 at pci0 dev 25 function 0 "Intel ADL-N I2C" rev 0x00: apic 2 int 31
iic2 at dwiic2
dwiic3 at pci0 dev 25 function 1 "Intel ADL-N I2C" rev 0x00: apic 2 int 32
iic3 at dwiic3
ppb0 at pci0 dev 28 function 0 "Intel ADL-N PCIE" rev 0x00: msi
pci1 at ppb0 bus 1
re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x15: RTL8168H/8111H (0x5400), msi, address e8:ff:1e:d1:7c:46
rgephy0 at re0 phy 7: RTL8251, rev. 0
ppb1 at pci0 dev 29 function 0 "Intel ADL-N PCIE" rev 0x00: msi
pci2 at ppb1 bus 2
nvme0 at pci2 dev 0 function 0 unknown vendor 0x1e4b product 0x1202 rev 0x01: msix, NVMe 1.4
nvme0: 512GB SSD, firmware SN12221, serial PAU138A020585
scsibus2 at nvme0: 2 targets, initiator 0
sd0 at scsibus2 targ 1 lun 0: <NVMe, 512GB SSD, SN12>
sd0: 488386MB, 512 bytes/sector, 1000215216 sectors
"Intel ADL-N UART" rev 0x00 at pci0 dev 30 function 0 not configured
"Intel ADL-N GSPI" rev 0x00 at pci0 dev 30 function 3 not configured
pcib0 at pci0 dev 31 function 0 "Intel ADL-N eSPI" rev 0x00
azalia0 at pci0 dev 31 function 3 "Intel ADL-N HD Audio" rev 0x00: msi
azalia0: codecs: Realtek ALC897
audio0 at azalia0
ichiic0 at pci0 dev 31 function 4 "Intel ADL-N SMBus" rev 0x00: apic 2 int 16
iic4 at ichiic0
spdmem0 at iic4 addr 0x50: 16GB DDR4 SDRAM PC4-25600 SO-DIMM
"Intel ADL-N SPI" rev 0x00 at pci0 dev 31 function 5 not configured
isa0 at pcib0
isadma0 at isa0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
efifb at mainbus0 not configured
uhidev0 at uhub0 port 3 configuration 1 interface 0 "PixArt Microsoft USB Optical Mouse" rev 1.10/1.00 addr 2
uhidev0: iclass 3/1
ums0 at uhidev0: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
uhidev1 at uhub0 port 4 configuration 1 interface 0 "Dell Dell Smart Card Reader Keyboard" rev 2.00/1.00 addr 3
uhidev1: iclass 3/1
ukbd0 at uhidev1: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard
ugen0 at uhub0 port 4 configuration 1 "Dell Dell Smart Card Reader Keyboard" rev 2.00/1.00 addr 3
ugen1 at uhub0 port 10 "Intel Bluetooth" rev 2.01/0.02 addr 4
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (90223823770b16de.a) swap on sd0b dump on sd0b
drm:pid0:ct_send *ERROR* [drm] *ERROR* GT0: GUC: CT: No response for request 0x4000 (fence 1)
drm:pid0:intel_guc_ct_send *ERROR* [drm] *ERROR* GT0: GUC: CT: Sending action 0x4000 failed (0xffffffffffffffc4e) status=0
drm:pid0:intel_huc_auth *ERROR* [drm] *ERROR* GT0: HuC: all workloads authentication failed 0xffffffffffffffc4e
drm:pid87075:ct_handle_response *ERROR* [drm] *ERROR* GT0: GUC: CT: Unsolicited response message: len 1, data 0xf0000000 (fence 1, last 1)
drm:pid87075:ct_handle_hxg *ERROR* [drm] *ERROR* GT0: GUC: CT: Failed to handle HXG message (0xfffffffffffffffee) 0xffff8000013076d8h
drm:pid87075:ct_handle_msg *ERROR* [drm] *ERROR* GT0: GUC: CT: Failed to process CT message (0xfffffffffffffffee) 0xffff8000013076d4h
inteldrm0: 1920x1080, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
wsdisplay0: screen 1-5 added (std, vt100 emulation)
iwx0: hw rev 0x370, fw 77.f92b5fed.0, address 74:3a:f4:b0:f3:ce