> On 25.10.2023 at 14:32, Dave Voutila <d...@sisu.io> wrote:
> 
> 
> Mike Fischer <fischer+o...@lavielle.com> writes:
> 
>> I have been observing occasional bouts of high load averages on
>> several servers I administer and I am trying to find the cause. (I
>> monitor these machines so that I can implement corrective measures in
>> case of any malicious or abnormal activity. I think this is benign,
>> but I’d still like to find the cause.)
>> 
>> Once the high load average starts, only a reboot seems to (temporarily) 
>> return the values to their normal levels.
>> 
>> The actual CPU usage (as measured by vmstat) stays low even if the load 
>> average is elevated.
>> 
>> The servers are VMs running on a VMWare host (ESXi). This was seen with 
>> OpenBSD 7.3 and 7.4 amd64.
>> 
>> I cannot determine anything inside the VM that causes this. There
>> seems to be no correlation to pfstat(8) graphs, log entries, known
>> events, or anything else I can determine. Restarting all of the rc.d
>> services never made any difference.
>> 
>> Could this be caused by something on the VMWare host machine? (The
>> host seems to be operating at limit regarding RAM for example. But the
>> VM is only using the normal percentage of its allocated RAM — way
>> below 100% and very constant usage, no swap.)
>> 
>> How can I further debug this, keeping in mind that these are production 
>> machines and experimentation is limited to benign things that don’t cause 
>> outages.
>> 
> 
> Can you share a dmesg of one of the 7.4 VMs? The output of `vmstat -iz`
> might help narrow it down to a stuck interrupt. Also, try running
> systat(1) and observe things as they happen.

dmesg follows, but the high load has gone away on the two affected machines. I 
rebooted one machine after installing the syspatches released today. On the 
other, which I deliberately left untouched, the load normalised by itself after 
almost a day. A third machine was not affected this time. So vmstat will 
probably not show anything interesting now:

The rebooted machine:
# vmstat -iz
interrupt                       total     rate
irq96/acpi0                         0        0
irq97/pciide0                  123004       10
irq98/pciide0                       0        0
irq114/em0                     118842        9
irq99/ppb2                          0        0
irq100/ppb3                         0        0
irq101/ppb4                         0        0
irq102/ppb5                         0        0
irq103/ppb6                         0        0
irq104/ppb7                         0        0
irq105/ppb8                         0        0
irq106/ppb9                         0        0
irq107/ppb10                        0        0
irq108/ppb11                        0        0
irq109/ppb12                        0        0
irq110/ppb13                        0        0
irq111/ppb14                        0        0
irq115/ppb15                        0        0
irq116/ppb16                        0        0
irq117/ppb17                        0        0
irq118/ppb18                        0        0
irq119/ppb19                        0        0
irq120/ppb20                        0        0
irq121/ppb21                        0        0
irq122/ppb22                        0        0
irq123/ppb23                        0        0
irq124/ppb24                        0        0
irq125/ppb25                        0        0
irq126/ppb26                        0        0
irq127/ppb27                        0        0
irq128/ppb28                        0        0
irq129/ppb29                        0        0
irq130/ppb30                        0        0
irq131/ppb31                        0        0
irq132/ppb32                        0        0
irq133/ppb33                        0        0
irq144/pckbc0                       0        0
irq145/pckbc0                       0        0
irq0/clock                    4894675      398
irq0/ipi                       378105       30
Total                         5514626      448
# 

The affected machine that I didn’t reboot:
# vmstat -iz
interrupt                       total     rate
irq96/acpi0                         0        0
irq97/pciide0                 2653816       21
irq98/pciide0                       0        0
irq114/em0                    2383849       19
irq99/ppb2                          0        0
irq100/ppb3                         0        0
irq101/ppb4                         0        0
irq102/ppb5                         0        0
irq103/ppb6                         0        0
irq104/ppb7                         0        0
irq105/ppb8                         0        0
irq106/ppb9                         0        0
irq107/ppb10                        0        0
irq108/ppb11                        0        0
irq109/ppb12                        0        0
irq110/ppb13                        0        0
irq111/ppb14                        0        0
irq115/ppb15                        0        0
irq116/ppb16                        0        0
irq117/ppb17                        0        0
irq118/ppb18                        0        0
irq119/ppb19                        0        0
irq120/ppb20                        0        0
irq121/ppb21                        0        0
irq122/ppb22                        0        0
irq123/ppb23                        0        0
irq124/ppb24                        0        0
irq125/ppb25                        0        0
irq126/ppb26                        0        0
irq127/ppb27                        0        0
irq128/ppb28                        0        0
irq129/ppb29                        0        0
irq130/ppb30                        0        0
irq131/ppb31                        0        0
irq132/ppb32                        0        0
irq133/ppb33                        0        0
irq144/pckbc0                      14        0
irq145/pckbc0                       0        0
irq0/clock                   48652532      398
irq0/ipi                      5468186       44
Total                        59158397      484
# 

A machine that was not affected today:
# vmstat -iz
interrupt                       total     rate
irq96/acpi0                         0        0
irq97/pciide0                  378235       29
irq98/pciide0                       0        0
irq114/em0                     213348       16
irq99/ppb2                          0        0
irq100/ppb3                         0        0
irq101/ppb4                         0        0
irq102/ppb5                         0        0
irq103/ppb6                         0        0
irq104/ppb7                         0        0
irq105/ppb8                         0        0
irq106/ppb9                         0        0
irq107/ppb10                        0        0
irq108/ppb11                        0        0
irq109/ppb12                        0        0
irq110/ppb13                        0        0
irq111/ppb14                        0        0
irq115/ppb15                        0        0
irq116/ppb16                        0        0
irq117/ppb17                        0        0
irq118/ppb18                        0        0
irq119/ppb19                        0        0
irq120/ppb20                        0        0
irq121/ppb21                        0        0
irq122/ppb22                        0        0
irq123/ppb23                        0        0
irq124/ppb24                        0        0
irq125/ppb25                        0        0
irq126/ppb26                        0        0
irq127/ppb27                        0        0
irq128/ppb28                        0        0
irq129/ppb29                        0        0
irq130/ppb30                        0        0
irq131/ppb31                        0        0
irq132/ppb32                        0        0
irq133/ppb33                        0        0
irq144/pckbc0                       0        0
irq145/pckbc0                       0        0
irq0/clock                    5134964      398
irq0/ipi                       870473       67
Total                         6597020      511
# 

systat is boring at the moment as well, but I’ll keep it in mind for the next 
time this happens.
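
For the next occurrence, comparing two `vmstat -iz` snapshots taken a known 
number of seconds apart should be more telling than the since-boot rates 
above. Below is a minimal sketch in Python (not part of the base system, so 
this assumes the python3 package is installed, or that the saved output is 
analysed on another machine). The function names parse_vmstat_iz and busiest 
are my own, and the parser only assumes the three-column layout shown in the 
output above.

```python
import re


def parse_vmstat_iz(text):
    """Return {interrupt name: (total, rate)} from `vmstat -iz` output."""
    rows = {}
    for line in text.splitlines():
        # Data rows look like: "irq114/em0    118842    9".
        # The header line and the shell prompt do not match this pattern.
        m = re.match(r"^(\S+)\s+(\d+)\s+(\d+)\s*$", line)
        if m and m.group(1) != "Total":
            rows[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return rows


def busiest(before, after, seconds):
    """Interrupts per second for each source between two snapshots,
    sorted with the most active source first."""
    deltas = {
        name: (after[name][0] - before.get(name, (0, 0))[0]) / seconds
        for name in after
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```

Saving the output to two files, say 60 seconds apart, and passing the parsed 
results to busiest(before, after, 60) would list the sources that are active 
right now. A stuck interrupt should then stand out near the top.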


The machines are very similar (RAM, CPU, storage, OS version), but here is the 
dmesg of the one I rebooted:

OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
    r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277010432 (4078MB)
avail mem = 4127657984 (3936MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xe0010 (242 entries)
bios0: vendor Phoenix Technologies LTD version "6.00" date 12/12/2018
bios0: VMware, Inc. VMware Virtual Platform
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP BOOT APIC MCFG SRAT HPET WAET
acpi0: wakeup devices PCI0(S3) USB_(S1) P2P0(S3) S1F0(S3) S2F0(S3) S8F0(S3) S16F(S3) S18F(S3) S22F(S3) S23F(S3) S24F(S3) S25F(S3) PE40(S3) S1F0(S3) PE50(S3) S1F0(S3) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 2100.20 MHz, 06-3a-00
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,ARAT,RSBA,SKIP_L1DFL,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 66MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 2100.32 MHz, 06-3a-00
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,ARAT,RSBA,SKIP_L1DFL,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu1: smt 0, core 0, package 2
ioapic0 at mainbus0: apid 1 pa 0xfec00000, version 11, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf0000000, bus 0-127
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
acpicmos0 at acpi0
"PNP0A05" at acpi0 not configured
acpiac0 at acpi0: AC unit online
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
cpu0: using VERW MDS workaround
pvbus0 at mainbus0: VMware
vmt0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82443BX AGP" rev 0x01
ppb0 at pci0 dev 1 function 0 "Intel 82443BX AGP" rev 0x01
pci1 at ppb0 bus 1
pcib0 at pci0 dev 7 function 0 "Intel 82371AB PIIX4 ISA" rev 0x08
pciide0 at pci0 dev 7 function 1 "Intel 82371AB IDE" rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <VMware Virtual IDE Hard Drive>
wd0: 64-sector PIO, LBA48, 256000MB, 524288000 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <NECVMWar, VMware IDE CDR10, 1.00> removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
piixpm0 at pci0 dev 7 function 3 "Intel 82371AB Power" rev 0x08: SMBus disabled
"VMware VMCI" rev 0x10 at pci0 dev 7 function 7 not configured
vga1 at pci0 dev 15 function 0 "VMware SVGA II" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb1 at pci0 dev 17 function 0 "VMware PCI" rev 0x02
pci2 at ppb1 bus 2
em0 at pci2 dev 0 function 0 "Intel 82545EM" rev 0x01: apic 1 int 18, address 00:50:56:a5:4b:67
ppb2 at pci0 dev 21 function 0 "VMware PCIE" rev 0x01: msi
pci3 at ppb2 bus 3
ppb3 at pci0 dev 21 function 1 "VMware PCIE" rev 0x01: msi
pci4 at ppb3 bus 4
ppb4 at pci0 dev 21 function 2 "VMware PCIE" rev 0x01: msi
pci5 at ppb4 bus 5
ppb5 at pci0 dev 21 function 3 "VMware PCIE" rev 0x01: msi
pci6 at ppb5 bus 6
ppb6 at pci0 dev 21 function 4 "VMware PCIE" rev 0x01: msi
pci7 at ppb6 bus 7
ppb7 at pci0 dev 21 function 5 "VMware PCIE" rev 0x01: msi
pci8 at ppb7 bus 8
ppb8 at pci0 dev 21 function 6 "VMware PCIE" rev 0x01: msi
pci9 at ppb8 bus 9
ppb9 at pci0 dev 21 function 7 "VMware PCIE" rev 0x01: msi
pci10 at ppb9 bus 10
ppb10 at pci0 dev 22 function 0 "VMware PCIE" rev 0x01: msi
pci11 at ppb10 bus 11
ppb11 at pci0 dev 22 function 1 "VMware PCIE" rev 0x01: msi
pci12 at ppb11 bus 12
ppb12 at pci0 dev 22 function 2 "VMware PCIE" rev 0x01: msi
pci13 at ppb12 bus 13
ppb13 at pci0 dev 22 function 3 "VMware PCIE" rev 0x01: msi
pci14 at ppb13 bus 14
ppb14 at pci0 dev 22 function 4 "VMware PCIE" rev 0x01: msi
pci15 at ppb14 bus 15
ppb15 at pci0 dev 22 function 5 "VMware PCIE" rev 0x01: msi
pci16 at ppb15 bus 16
ppb16 at pci0 dev 22 function 6 "VMware PCIE" rev 0x01: msi
pci17 at ppb16 bus 17
ppb17 at pci0 dev 22 function 7 "VMware PCIE" rev 0x01: msi
pci18 at ppb17 bus 18
ppb18 at pci0 dev 23 function 0 "VMware PCIE" rev 0x01: msi
pci19 at ppb18 bus 19
ppb19 at pci0 dev 23 function 1 "VMware PCIE" rev 0x01: msi
pci20 at ppb19 bus 20
ppb20 at pci0 dev 23 function 2 "VMware PCIE" rev 0x01: msi
pci21 at ppb20 bus 21
ppb21 at pci0 dev 23 function 3 "VMware PCIE" rev 0x01: msi
pci22 at ppb21 bus 22
ppb22 at pci0 dev 23 function 4 "VMware PCIE" rev 0x01: msi
pci23 at ppb22 bus 23
ppb23 at pci0 dev 23 function 5 "VMware PCIE" rev 0x01: msi
pci24 at ppb23 bus 24
ppb24 at pci0 dev 23 function 6 "VMware PCIE" rev 0x01: msi
pci25 at ppb24 bus 25
ppb25 at pci0 dev 23 function 7 "VMware PCIE" rev 0x01: msi
pci26 at ppb25 bus 26
ppb26 at pci0 dev 24 function 0 "VMware PCIE" rev 0x01: msi
pci27 at ppb26 bus 27
ppb27 at pci0 dev 24 function 1 "VMware PCIE" rev 0x01: msi
pci28 at ppb27 bus 28
ppb28 at pci0 dev 24 function 2 "VMware PCIE" rev 0x01: msi
pci29 at ppb28 bus 29
ppb29 at pci0 dev 24 function 3 "VMware PCIE" rev 0x01: msi
pci30 at ppb29 bus 30
ppb30 at pci0 dev 24 function 4 "VMware PCIE" rev 0x01: msi
pci31 at ppb30 bus 31
ppb31 at pci0 dev 24 function 5 "VMware PCIE" rev 0x01: msi
pci32 at ppb31 bus 32
ppb32 at pci0 dev 24 function 6 "VMware PCIE" rev 0x01: msi
pci33 at ppb32 bus 33
ppb33 at pci0 dev 24 function 7 "VMware PCIE" rev 0x01: msi
pci34 at ppb33 bus 34
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on wd0a (8bb2ebc939040c08.a) swap on wd0b dump on wd0b

Thanks!
Mike
