> On 25.10.2023 at 14:32, Dave Voutila <d...@sisu.io> wrote:
>
>
> Mike Fischer <fischer+o...@lavielle.com> writes:
>
>> I have been observing occasional bouts of high load averages on
>> several servers I administer and I am trying to find the cause. (I
>> monitor these machines so that I can implement corrective measures in
>> case of any malicious or abnormal activity. I think this is benign,
>> but I’d still like to find the cause.)
>>
>> Once the high load average starts, only a reboot seems to (temporarily)
>> return the values to their normal levels.
>>
>> The actual CPU usage (as measured by vmstat) stays low even if the load
>> average is elevated.
>>
>> The servers are VMs running on a VMWare host (ESXi). This was seen with
>> OpenBSD 7.3 and 7.4 amd64.
>>
>> I can not determine anything inside the VM that causes this. There
>> seems to be no correlation to pfstat(8) graphs, log entries, known
>> events, or anything else I can determine. restarting all of the rc.d
>> services never made any difference.
>>
>> Could this be caused by something on the VMWare host machine? (The
>> host seems to be operating at limit regarding RAM for example. But the
>> VM is only using the normal percentage of its allocated RAM, way
>> below 100% and very constant usage, no swap.)
>>
>> How can I further debug this, keeping in mind that these are production
>> machines and experimentation is limited to benign things that don’t cause
>> outages.
>>
>
> Can you share a dmesg of one of the 7.4 vm? The output of `vmstat -iz`
> might help narrow it down to a stuck interrupt. Also, try running
> systat(1) and observe things as they happen.

The dmesg follows. But the high load went away on the two affected machines. On one machine I did a reboot after installing the syspatches released today; on the other, which I left untouched on purpose, the load normalised by itself after almost a day. A third machine was not affected this time.

So vmstat will probably not show anything interesting now:

The rebooted machine:

# vmstat -iz
interrupt            total    rate
irq96/acpi0              0       0
irq97/pciide0       123004      10
irq98/pciide0            0       0
irq114/em0          118842       9
irq99/ppb2               0       0
irq100/ppb3              0       0
irq101/ppb4              0       0
irq102/ppb5              0       0
irq103/ppb6              0       0
irq104/ppb7              0       0
irq105/ppb8              0       0
irq106/ppb9              0       0
irq107/ppb10             0       0
irq108/ppb11             0       0
irq109/ppb12             0       0
irq110/ppb13             0       0
irq111/ppb14             0       0
irq115/ppb15             0       0
irq116/ppb16             0       0
irq117/ppb17             0       0
irq118/ppb18             0       0
irq119/ppb19             0       0
irq120/ppb20             0       0
irq121/ppb21             0       0
irq122/ppb22             0       0
irq123/ppb23             0       0
irq124/ppb24             0       0
irq125/ppb25             0       0
irq126/ppb26             0       0
irq127/ppb27             0       0
irq128/ppb28             0       0
irq129/ppb29             0       0
irq130/ppb30             0       0
irq131/ppb31             0       0
irq132/ppb32             0       0
irq133/ppb33             0       0
irq144/pckbc0            0       0
irq145/pckbc0            0       0
irq0/clock         4894675     398
irq0/ipi            378105      30
Total              5514626     448
#

The affected machine that I didn’t reboot:

# vmstat -iz
interrupt            total    rate
irq96/acpi0              0       0
irq97/pciide0      2653816      21
irq98/pciide0            0       0
irq114/em0         2383849      19
irq99/ppb2               0       0
irq100/ppb3              0       0
irq101/ppb4              0       0
irq102/ppb5              0       0
irq103/ppb6              0       0
irq104/ppb7              0       0
irq105/ppb8              0       0
irq106/ppb9              0       0
irq107/ppb10             0       0
irq108/ppb11             0       0
irq109/ppb12             0       0
irq110/ppb13             0       0
irq111/ppb14             0       0
irq115/ppb15             0       0
irq116/ppb16             0       0
irq117/ppb17             0       0
irq118/ppb18             0       0
irq119/ppb19             0       0
irq120/ppb20             0       0
irq121/ppb21             0       0
irq122/ppb22             0       0
irq123/ppb23             0       0
irq124/ppb24             0       0
irq125/ppb25             0       0
irq126/ppb26             0       0
irq127/ppb27             0       0
irq128/ppb28             0       0
irq129/ppb29             0       0
irq130/ppb30             0       0
irq131/ppb31             0       0
irq132/ppb32             0       0
irq133/ppb33             0       0
irq144/pckbc0           14       0
irq145/pckbc0            0       0
irq0/clock        48652532     398
irq0/ipi           5468186      44
Total             59158397     484
#

A machine that was not affected today:

# vmstat -iz
interrupt            total    rate
irq96/acpi0              0       0
irq97/pciide0       378235      29
irq98/pciide0            0       0
irq114/em0          213348      16
irq99/ppb2               0       0
irq100/ppb3              0       0
irq101/ppb4              0       0
irq102/ppb5              0       0
irq103/ppb6              0       0
irq104/ppb7              0       0
irq105/ppb8              0       0
irq106/ppb9              0       0
irq107/ppb10             0       0
irq108/ppb11             0       0
irq109/ppb12             0       0
irq110/ppb13             0       0
irq111/ppb14             0       0
irq115/ppb15             0       0
irq116/ppb16             0       0
irq117/ppb17             0       0
irq118/ppb18             0       0
irq119/ppb19             0       0
irq120/ppb20             0       0
irq121/ppb21             0       0
irq122/ppb22             0       0
irq123/ppb23             0       0
irq124/ppb24             0       0
irq125/ppb25             0       0
irq126/ppb26             0       0
irq127/ppb27             0       0
irq128/ppb28             0       0
irq129/ppb29             0       0
irq130/ppb30             0       0
irq131/ppb31             0       0
irq132/ppb32             0       0
irq133/ppb33             0       0
irq144/pckbc0            0       0
irq145/pckbc0            0       0
irq0/clock         5134964     398
irq0/ipi            870473      67
Total              6597020     511
#

systat is boring at the moment as well, but I’ll keep it in mind for the next time this happens.

The machines are very similar (RAM, CPU, storage, OS version), but here is the one I rebooted:

OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
    r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277010432 (4078MB)
avail mem = 4127657984 (3936MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xe0010 (242 entries)
bios0: vendor Phoenix Technologies LTD version "6.00" date 12/12/2018
bios0: VMware, Inc. VMware Virtual Platform
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP BOOT APIC MCFG SRAT HPET WAET
acpi0: wakeup devices PCI0(S3) USB_(S1) P2P0(S3) S1F0(S3) S2F0(S3) S8F0(S3) S16F(S3) S18F(S3) S22F(S3) S23F(S3) S24F(S3) S25F(S3) PE40(S3) S1F0(S3) PE50(S3) S1F0(S3) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 2100.20 MHz, 06-3a-00
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,ARAT,RSBA,SKIP_L1DFL,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 66MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 2100.32 MHz, 06-3a-00
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,ARAT,RSBA,SKIP_L1DFL,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu1: smt 0, core 0, package 2
ioapic0 at mainbus0: apid 1 pa 0xfec00000, version 11, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf0000000, bus 0-127
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
acpicmos0 at acpi0
"PNP0A05" at acpi0 not configured
acpiac0 at acpi0: AC unit online
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
cpu0: using VERW MDS workaround
pvbus0 at mainbus0: VMware
vmt0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82443BX AGP" rev 0x01
ppb0 at pci0 dev 1 function 0 "Intel 82443BX AGP" rev 0x01
pci1 at ppb0 bus 1
pcib0 at pci0 dev 7 function 0 "Intel 82371AB PIIX4 ISA" rev 0x08
pciide0 at pci0 dev 7 function 1 "Intel 82371AB IDE" rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <VMware Virtual IDE Hard Drive>
wd0: 64-sector PIO, LBA48, 256000MB, 524288000 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <NECVMWar, VMware IDE CDR10, 1.00> removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
piixpm0 at pci0 dev 7 function 3 "Intel 82371AB Power" rev 0x08: SMBus disabled
"VMware VMCI" rev 0x10 at pci0 dev 7 function 7 not configured
vga1 at pci0 dev 15 function 0 "VMware SVGA II" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb1 at pci0 dev 17 function 0 "VMware PCI" rev 0x02
pci2 at ppb1 bus 2
em0 at pci2 dev 0 function 0 "Intel 82545EM" rev 0x01: apic 1 int 18, address 00:50:56:a5:4b:67
ppb2 at pci0 dev 21 function 0 "VMware PCIE" rev 0x01: msi
pci3 at ppb2 bus 3
ppb3 at pci0 dev 21 function 1 "VMware PCIE" rev 0x01: msi
pci4 at ppb3 bus 4
ppb4 at pci0 dev 21 function 2 "VMware PCIE" rev 0x01: msi
pci5 at ppb4 bus 5
ppb5 at pci0 dev 21 function 3 "VMware PCIE" rev 0x01: msi
pci6 at ppb5 bus 6
ppb6 at pci0 dev 21 function 4 "VMware PCIE" rev 0x01: msi
pci7 at ppb6 bus 7
ppb7 at pci0 dev 21 function 5 "VMware PCIE" rev 0x01: msi
pci8 at ppb7 bus 8
ppb8 at pci0 dev 21 function 6 "VMware PCIE" rev 0x01: msi
pci9 at ppb8 bus 9
ppb9 at pci0 dev 21 function 7 "VMware PCIE" rev 0x01: msi
pci10 at ppb9 bus 10
ppb10 at pci0 dev 22 function 0 "VMware PCIE" rev 0x01: msi
pci11 at ppb10 bus 11
ppb11 at pci0 dev 22 function 1 "VMware PCIE" rev 0x01: msi
pci12 at ppb11 bus 12
ppb12 at pci0 dev 22 function 2 "VMware PCIE" rev 0x01: msi
pci13 at ppb12 bus 13
ppb13 at pci0 dev 22 function 3 "VMware PCIE" rev 0x01: msi
pci14 at ppb13 bus 14
ppb14 at pci0 dev 22 function 4 "VMware PCIE" rev 0x01: msi
pci15 at ppb14 bus 15
ppb15 at pci0 dev 22 function 5 "VMware PCIE" rev 0x01: msi
pci16 at ppb15 bus 16
ppb16 at pci0 dev 22 function 6 "VMware PCIE" rev 0x01: msi
pci17 at ppb16 bus 17
ppb17 at pci0 dev 22 function 7 "VMware PCIE" rev 0x01: msi
pci18 at ppb17 bus 18
ppb18 at pci0 dev 23 function 0 "VMware PCIE" rev 0x01: msi
pci19 at ppb18 bus 19
ppb19 at pci0 dev 23 function 1 "VMware PCIE" rev 0x01: msi
pci20 at ppb19 bus 20
ppb20 at pci0 dev 23 function 2 "VMware PCIE" rev 0x01: msi
pci21 at ppb20 bus 21
ppb21 at pci0 dev 23 function 3 "VMware PCIE" rev 0x01: msi
pci22 at ppb21 bus 22
ppb22 at pci0 dev 23 function 4 "VMware PCIE" rev 0x01: msi
pci23 at ppb22 bus 23
ppb23 at pci0 dev 23 function 5 "VMware PCIE" rev 0x01: msi
pci24 at ppb23 bus 24
ppb24 at pci0 dev 23 function 6 "VMware PCIE" rev 0x01: msi
pci25 at ppb24 bus 25
ppb25 at pci0 dev 23 function 7 "VMware PCIE" rev 0x01: msi
pci26 at ppb25 bus 26
ppb26 at pci0 dev 24 function 0 "VMware PCIE" rev 0x01: msi
pci27 at ppb26 bus 27
ppb27 at pci0 dev 24 function 1 "VMware PCIE" rev 0x01: msi
pci28 at ppb27 bus 28
ppb28 at pci0 dev 24 function 2 "VMware PCIE" rev 0x01: msi
pci29 at ppb28 bus 29
ppb29 at pci0 dev 24 function 3 "VMware PCIE" rev 0x01: msi
pci30 at ppb29 bus 30
ppb30 at pci0 dev 24 function 4 "VMware PCIE" rev 0x01: msi
pci31 at ppb30 bus 31
ppb31 at pci0 dev 24 function 5 "VMware PCIE" rev 0x01: msi
pci32 at ppb31 bus 32
ppb32 at pci0 dev 24 function 6 "VMware PCIE" rev 0x01: msi
pci33 at ppb32 bus 33
ppb33 at pci0 dev 24 function 7 "VMware PCIE" rev 0x01: msi
pci34 at ppb33 bus 34
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on wd0a (8bb2ebc939040c08.a) swap on wd0b dump on wd0b

Thanks!
Mike
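
For the next time the load climbs, two `vmstat -iz` snapshots taken a known number of seconds apart could be compared mechanically instead of by eye. This is only a rough sketch, not a tested tool: it assumes the three-column "interrupt total rate" layout shown in the output above, and the function names `parse_vmstat_iz` and `busiest` are hypothetical helpers made up for illustration.

```python
def parse_vmstat_iz(text):
    """Parse `vmstat -iz` output into {source: (total, rate)}.

    Assumes each data row has exactly three whitespace-separated fields,
    e.g. "irq114/em0 118842 9". The header and the "Total" row are skipped.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, total, rate = fields
        if name == "Total" or not (total.isdigit() and rate.isdigit()):
            continue
        counters[name] = (int(total), int(rate))
    return counters


def busiest(before, after, seconds):
    """Return [(source, interrupts_per_second)] for sources whose total
    increased between the two snapshots, highest rate first."""
    rates = []
    for name, (total, _rate) in after.items():
        delta = total - before.get(name, (0, 0))[0]
        if delta > 0:
            rates.append((name, delta / seconds))
    return sorted(rates, key=lambda item: item[1], reverse=True)
```

Feeding it the saved output of two runs and the interval between them gives a per-source rate for just that interval, whereas the "rate" column printed by vmstat is an average over the whole uptime. A stuck interrupt would be expected to show up near the top of the list with a rate well above its usual value.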