Answering myself: Yesterday we had an issue with an OpenBSD 7.5 stable x86 virtual server at Hetzner. After running syspatch and rebooting, the machine did not come back up! Two other similar machines at Hetzner, albeit in a different datacenter, had previously worked fine. (Of course the problem popped up with the most critical server, thanks Murphy ;-)

The symptoms:
• With the Hetzner web console open, I rebooted the server using `# reboot`.
• The shutdown phase worked fine.
• Almost immediately after the boot started, the console disconnected. I’m not sure if I even saw the boot> prompt.

Reconnecting to the console was not possible and the machine did not come back up. Repeated tries yielded the same result.

After starting mitigation steps (creating a snapshot, creating a new server at a datacenter where another host worked fine, restoring the snapshot to that server, then dealing with the changed IPs, etc.) I opened a support ticket.

Hetzner support answered that they were in the process of rolling out a patched version of QEMU which would fix the issue. They acknowledged that the problem was related to the issue Thomas Siegmund reported here.

Indeed, the server that had issues came back up automatically, and a manual reboot now also works fine. So it seems this issue is now fixed.

Mike

> On 08.09.2024 at 19:36, Mike Fischer <fischer+o...@lavielle.com> wrote:
>
> Is this still an issue?
>
> We run several OpenBSD 7.5 VMs at Hetzner and I am somewhat concerned about
> the machines not coming back up after a reboot.
>
> So I created a new CX22 host, installed OpenBSD 7.5 and it seems to work fine.
>
> dmesg:
> OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4177379328 (3983MB)
> avail mem = 4029792256 (3843MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5210 (11 entries)
> bios0: vendor Hetzner version "20171111" date 11/11/2017
> bios0: Hetzner vServer
> acpi0 at bios0: ACPI 3.0
> acpi0: sleep states S5
> acpi0: tables DSDT FACP APIC HPET MCFG WAET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel Xeon Processor (Skylake, IBRS, no TSX), 2100.74 MHz, 06-55-04
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLWB,AVX512CD,AVX512BW,AVX512VL,PKU,MD_CLEAR,IBRS,IBPB,SSBD,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 4MB 64b/line 16-way L2 cache, 16MB 64b/line 16-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 1000MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel Xeon Processor (Skylake, IBRS, no TSX), 2133.33 MHz, 06-55-04
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLWB,AVX512CD,AVX512BW,AVX512VL,PKU,MD_CLEAR,IBRS,IBPB,SSBD,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 4MB 64b/line 16-way L2 cache, 16MB 64b/line 16-way L3 cache
> cpu1: smt 0, core 1, package 0
> ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
> acpihpet0 at acpi0: 100000000 Hz
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xb0000000, bus 0-255
> acpiprt0 at acpi0: bus 0 (PCI0)
> "ACPI0006" at acpi0 not configured
> acpipci0 at acpi0 PCI0: 0x00000010 0x00000011 0x00000000
> com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> acpicmos0 at acpi0
> "PNP0A06" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "QEMU0002" at acpi0 not configured
> "ACPI0010" at acpi0 not configured
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> cpu0: using VERW MDS workaround
> pvbus0 at mainbus0: KVM
> pvclock0 at pvbus0
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel 82G33 Host" rev 0x00
> vga1 at pci0 dev 1 function 0 "Qumranet Virtio 1.x GPU" rev 0x01
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> ppb0 at pci0 dev 2 function 0 "Red Hat PCIE" rev 0x00: apic 0 int 22
> pci1 at ppb0 bus 1
> virtio0 at pci1 dev 0 function 0 "Qumranet Virtio 1.x Network" rev 0x01
> vio0 at virtio0: address 96:00:03:b0:6b:55
> virtio0: msix per-VQ
> ppb1 at pci0 dev 2 function 1 "Red Hat PCIE" rev 0x00: apic 0 int 22
> pci2 at ppb1 bus 2
> xhci0 at pci2 dev 0 function 0 "Red Hat xHCI" rev 0x01: msix, xHCI 0.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "Red Hat xHCI root hub" rev 3.00/1.00 addr 1
> ppb2 at pci0 dev 2 function 2 "Red Hat PCIE" rev 0x00: apic 0 int 22
> pci3 at ppb2 bus 3
> virtio1 at pci3 dev 0 function 0 "Qumranet Virtio 1.x Console" rev 0x01
> virtio1: no matching child driver; not configured
> ppb3 at pci0 dev 2 function 3 "Red Hat PCIE" rev 0x00: apic 0 int 22
> pci4 at ppb3 bus 4
> virtio2 at pci4 dev 0 function 0 "Qumranet Virtio 1.x Memory Balloon" rev 0x01
> viomb0 at virtio2
> virtio2: apic 0 int 22
> ppb4 at pci0 dev 2 function 4 "Red Hat PCIE" rev 0x00: apic 0 int 22
> pci5 at ppb4 bus 5
> virtio3 at pci5 dev 0 function 0 "Qumranet Virtio 1.x RNG" rev 0x01
> viornd0 at virtio3
> virtio3: apic 0 int 22
> ppb5 at pci0 dev 2 function 5 "Red Hat PCIE" rev 0x00: apic 0 int 22
> pci6 at ppb5 bus 6
> virtio4 at pci6 dev 0 function 0 "Qumranet Virtio 1.x SCSI" rev 0x01
> vioscsi0 at virtio4: qsize 256
> scsibus1 at vioscsi0: 255 targets
> sd0 at scsibus1 targ 0 lun 0: <QEMU, QEMU HARDDISK, 2.5+>
> sd0: 39064MB, 512 bytes/sector, 80003072 sectors, thin
> virtio4: msix per-VQ
> ppb6 at pci0 dev 2 function 6 "Red Hat PCIE" rev 0x00: apic 0 int 22
> pci7 at ppb6 bus 7
> ppb7 at pci0 dev 2 function 7 "Red Hat PCIE" rev 0x00: apic 0 int 22
> pci8 at ppb7 bus 8
> ppb8 at pci0 dev 3 function 0 "Red Hat PCIE" rev 0x00: apic 0 int 23
> pci9 at ppb8 bus 9
> pcib0 at pci0 dev 31 function 0 "Intel 82801IB LPC" rev 0x02
> ahci0 at pci0 dev 31 function 2 "Intel 82801I AHCI" rev 0x02: msi, AHCI 1.0
> ahci0: port 0: 1.5Gb/s
> scsibus2 at ahci0: 32 targets
> cd0 at scsibus2 targ 0 lun 0: <QEMU, QEMU DVD-ROM, 2.5+> removable
> ichiic0 at pci0 dev 31 function 3 "Intel 82801I SMBus" rev 0x02: apic 0 int 16
> iic0 at ichiic0
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pms0 at pckbc0 (aux slot)
> wsmouse0 at pms0 mux 0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> uhidev0 at uhub0 port 5 configuration 1 interface 0 "QEMU QEMU USB Tablet" rev 2.00/0.00 addr 2
> uhidev0: iclass 3/0
> ums0 at uhidev0: 3 buttons, Z dir
> wsmouse1 at ums0 mux 0
> vscsi0 at root
> scsibus3 at vscsi0: 256 targets
> softraid0 at root
> scsibus4 at softraid0: 256 targets
> root on sd0a (cb7ffb503da3a7aa.a) swap on sd0b dump on sd0b
>
> That doesn’t prove anything other than that the problem does not seem to be
> universal for all VMs at Hetzner.
>
>
> Thanks!
> Mike
>
>> On 06.09.2024 at 22:32, Stefan Fritsch <s...@openbsd.org> wrote:
>>
>> Hi,
>>
>> I accidentally replied only to the sender. Here is the reply to the list.
>>
>> On Fri, 6 Sep 2024, Thomas Siegmund wrote:
>>> We at SEPPmail have been using OpenBSD as the operating system for our
>>> SEPPmail mail gateway since the company was founded.
>>> For two years we have been developing and operating a cloud service for
>>> mail processing, encryption/decryption, spam filtering and so on.
>>>
>>> Yesterday we had an incident at Hetzner, a German data center operator.
>>> They upgraded their QEMU version (we assume to 8.2.3, but they are not
>>> willing to tell us the exact version) and this caused all OpenBSD 7.5
>>> appliances to fail to boot (short boot up to initialize the Virtio
>>> network interface or the Vioscsi driver and then the appliance just
>>> shuts down).
>>> We found out that this was caused by the Virtio drivers if_vio and
>>> vioscsi. By undoing one of your commits
>>> https://github.com/openbsd/src/commit/cdd248411fe303b936d5a056fde97097bd7015f0
>>> "virtio: Set DRIVER_OK earlier" we were able to get our service up and
>>> running again.
>>> Hetzner themselves found out that it could be a problem related to the
>>> QEMU commit
>>> https://gitlab.com/qemu-project/qemu/-/commit/fcbb086ae590e910614fe5b8bf76e264f71ef304.
>>>
>>> Since I am not familiar with the OpenBSD kernel and the virtio system, I
>>> am not able to figure out what is really causing the crash/shutdown.
>>> Maybe you can figure it out and this information will be helpful.
>>
>> I have seen a qemu crash related to virtio with openbsd on some qemu 9.1
>> rc, I think rc2. This commit fixed it:
>> https://gitlab.com/qemu-project/qemu/-/commit/a8e63ff289d137197ad7a701a587cc432872d798
>>
>> I have also tried an older version and that had the same issue, I believe
>> that was some 9.0.2 package from debian. The fix is in qemu 9.1 release. I
>> think it likely that this is your issue. Maybe you can ask Hetzner if they
>> can backport the fix to their build?
>>
>> I don't think there is a bug in OpenBSD in this. The OpenBSD commit you
>> mentioned fixes various problems, both with qemu and with other
>> hypervisors. We definitely don't want to revert that commit.
>>
>> Cheers,
>> Stefan
>
> --
> Mike Fischer
> fisc...@lavielle.com
>
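
For anyone who wants to check whether a given VM attaches the two virtio drivers named in this thread (vio and vioscsi), here is a rough sketch that scans saved dmesg text for child devices attached to virtio buses. It only assumes the "<driver><unit> at virtioN" line format visible in the dmesg quoted above; the function names and the AFFECTED set are made up for illustration, and a match says nothing about whether the hypervisor underneath carries the QEMU fix.

```python
import re

# Drivers mentioned in the thread as the point where affected VMs stopped
# booting (assumption: matched by driver name only).
AFFECTED = {"vio", "vioscsi"}

# Matches lines such as "vio0 at virtio0: address ..." and tolerates
# leading mail quote markers ("> ").
CHILD_RE = re.compile(r"^(?:>\s*)*([a-z]+)(\d+) at (virtio\d+)\b")


def virtio_children(dmesg_text):
    """Map each virtio bus (e.g. 'virtio0') to its attached child device."""
    children = {}
    for line in dmesg_text.splitlines():
        match = CHILD_RE.match(line.strip())
        if match:
            driver, unit, bus = match.groups()
            children[bus] = driver + unit
    return children


def affected_devices(dmesg_text):
    """Return the sorted attached devices whose driver is in AFFECTED."""
    found = []
    for device in virtio_children(dmesg_text).values():
        driver = re.sub(r"\d+$", "", device)
        if driver in AFFECTED:
            found.append(device)
    return sorted(found)


# Excerpt taken from the dmesg quoted in this thread.
SAMPLE = """\
virtio0 at pci1 dev 0 function 0 "Qumranet Virtio 1.x Network" rev 0x01
vio0 at virtio0: address 96:00:03:b0:6b:55
virtio1: no matching child driver; not configured
viomb0 at virtio2
viornd0 at virtio3
vioscsi0 at virtio4: qsize 256
"""

if __name__ == "__main__":
    print(virtio_children(SAMPLE))
    print(affected_devices(SAMPLE))
```

On a live system the same functions could be fed the output of dmesg saved to a file instead of SAMPLE.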