Dear Kevin,

Sorry for the delayed reply. This patch works well for me. Thanks a lot!

Recently, I found another odd thing. A qemu-kvm VM gets stuck in SeaBIOS after rebooting itself many times. Analyzing the SeaBIOS log attached below, I think there may be something wrong in this block of code:

/src/fw/smp.c

    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
    while (cmos_smp_count != CountCPUs)
        asm volatile(
            // Release lock and allow other processors to use the stack.
            " movl %%esp, %1\n"
            " movl $0, %0\n"
            // Reacquire lock and take back ownership of stack.
            "1:rep ; nop\n"
            " lock btsl $0, %0\n"
            " jc 1b\n"
            : "+m" (SMPLock), "+m" (SMPStack)
            : : "cc", "memory");
    yield();

It seems that if SeaBIOS sometimes reads an incorrect number from QEMU through CMOS 0x5f, SeaBIOS may really get stuck. So I wonder what could cause this problem after a VM has rebooted itself many times?

================bad SeaBIOS log===========
[2015-11-13 18:45:58] In resume (status=0)
[2015-11-13 18:45:58] In 32bit resume
[2015-11-13 18:45:58] Attempting a hard reboot
[2015-11-13 18:46:00] SeaBIOS (version rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org)
[2015-11-13 18:46:00] No Xen hypervisor found.
[2015-11-13 18:46:00] Running on QEMU (i440fx)
[2015-11-13 18:46:00] Running on KVM
[2015-11-13 18:46:00] RamSize: 0xc0000000 [cmos]
[2015-11-13 18:46:00] Relocating init from 0x000de8f0 to 0xbffaec00 (size 70464)
[2015-11-13 18:46:00] Found QEMU fw_cfg
[2015-11-13 18:46:00] RamBlock: addr 0x0000000000000000 len 0x00000000c0000000 [e820]
[2015-11-13 18:46:00] RamBlock: addr 0x0000000100000000 len 0x0000000340000000 [e820]
[2015-11-13 18:46:00] Moving pm_base to 0x600
[2015-11-13 18:46:00] boot order:
[2015-11-13 18:46:00] 1: /pci@i0cf8/scsi@e/disk@0,0
[2015-11-13 18:46:00] 2: HALT
[2015-11-13 18:46:00] CPU Mhz=2402
[2015-11-13 18:46:00] === PCI bus & bridge init ===
[2015-11-13 18:46:00] PCI: pci_bios_init_bus_rec bus = 0x0
[2015-11-13 18:46:00] === PCI device probing ===
[2015-11-13 18:46:00] Found 21 PCI devices (max PCI bus is 00)
[2015-11-13 18:46:00] === PCI new allocation pass #1 ===
[2015-11-13 18:46:00] PCI: check devices
[2015-11-13 18:46:00] === PCI new allocation pass #2 ===
[2015-11-13 18:46:00] PCI: IO: c000 - c1cf
[2015-11-13 18:46:00] PCI: 32: 00000000c0000000 - 00000000fec00000
[2015-11-13 18:46:00] PCI: map device bdf=00:1f.0 bar 0, addr 0000c000, size 00000100 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0e.0 bar 0, addr 0000c100, size 00000040 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0f.0 bar 0, addr 0000c140, size 00000040 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:01.2 bar 4, addr 0000c180, size 00000020 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0d.0 bar 0, addr 0000c1a0, size 00000020 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:01.1 bar 4, addr 0000c1c0, size 00000010 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 6, addr febe0000, size 00010000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 1, addr febf0000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0d.0 bar 1, addr febf1000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0e.0 bar 1, addr febf2000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0f.0 bar 1, addr febf3000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:10.0 bar 0, addr febf4000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 0, addr f6000000, size 02000000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:10.0 bar 2, addr f8000000, size 01000000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:03.0 bar 2, addr f9000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:04.0 bar 2, addr f9800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:05.0 bar 2, addr fa000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:06.0 bar 2, addr fa800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:07.0 bar 2, addr fb000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:08.0 bar 2, addr fb800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:09.0 bar 2, addr fc000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0a.0 bar 2, addr fc800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0b.0 bar 2, addr fd000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0c.0 bar 2, addr fd800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: init bdf=00:00.0 id=8086:1237
[2015-11-13 18:46:00] PCI: init bdf=00:01.0 id=8086:7000
[2015-11-13 18:46:00] PIIX3/PIIX4 init: elcr=00 0c
[2015-11-13 18:46:00] PCI: init bdf=00:01.1 id=8086:7010
[2015-11-13 18:46:00] PCI: init bdf=00:01.2 id=8086:7020
[2015-11-13 18:46:00] PCI: init bdf=00:01.3 id=8086:7113
[2015-11-13 18:46:00] Using pmtimer, ioport 0x608
[2015-11-13 18:46:00] PCI: init bdf=00:02.0 id=1013:00b8
[2015-11-13 18:46:00] PCI: init bdf=00:03.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:04.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:05.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:06.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:07.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:08.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:09.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0a.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0b.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0c.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0d.0 id=1af4:1003
[2015-11-13 18:46:00] PCI: init bdf=00:0e.0 id=1af4:1001
[2015-11-13 18:46:00] PCI: init bdf=00:0f.0 id=1af4:1001
[2015-11-13 18:46:00] PCI: init bdf=00:10.0 id=1af4:1110
[2015-11-13 18:46:00] PCI: init bdf=00:1f.0 id=1af4:8888
[2015-11-13 18:46:00] PCI: Using 00:02.0 for primary VGA
[2015-11-13 18:46:00] handle_smp: apic_id=1
[2015-11-13 18:46:00] handle_smp: apic_id=6
[2015-11-13 18:46:00] handle_smp: apic_id=7
[2015-11-13 18:46:00] handle_smp: apic_id=3
[2015-11-13 18:46:00] handle_smp: apic_id=2
[2015-11-13 18:46:00] handle_smp: apic_id=5
[2015-11-13 18:46:00] handle_smp: apic_id=4
========The End, nothing more======

>On Mon, Nov 09, 2015 at 03:06:18PM -0500, Kevin O'Connor wrote:
>> On Mon, Nov 09, 2015 at 08:32:53AM -0500, Kevin O'Connor wrote:
>> > On Fri, Nov 06, 2015 at 09:12:34AM +0000, Xulei (Stone) wrote:
>> > > >On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:
>> > > >I'm surprised you would see the above on a recent qemu/kvm though - as
>> > > >on a newer KVM I think the second reset would have to happen after
>> > > >HaveAttemptedReboot is set and prior to the memcpy in
>> > > >qemu_prep_reset() completing. Can you verify your KVM version?
>> > >
>> > > I've tested on KVM-3.6 and KVM-4.1.3. On both of these versions, i can
>> > > see this problem.
>> > > I do like this: put a HA and a watchdog mechanism in a VM. Deliberately,
>> > > let this VM lose heartbeat and don't feed dog.
>> > > Then, after 2 minutes, a self-defined timeout, HA mechnism will issue
>> > > a internal reboot command to the VM and watchdog mechanism will issue
>> > > a "virsh reset" from the host. Then, aforementioned problem will
>> > > occurs in high probability.
>> >
>> > Ah, okay. I'm not sure what the best solution to this problem is.
>>
>> After thinking about this further, I think we can move the
>> HaveAttemptedReboot assignment after the memcpy.
>
>The previous patch could cause corruption if the memcpy() failed. I
>think the new SeaBIOS patch below should be okay though.
>
>-Kevin
>
>
>commit 8a6e44ad5c953266d2339b3299f5fb4ff32c8cbb
>Author: Kevin O'Connor <ke...@koconnor.net>
>Date: Mon Nov 9 15:00:19 2015 -0500
>
>    resume: Make KVM soft reboot loop detection more flexible
>
>    Move the check for soft reboot loops from resume.c to shadow.c and
>    directly check for the case where the memcpy fails. This prevents a
>    hang if an external reboot request occurs during the BIOS memcpy.
>
>    Signed-off-by: Kevin O'Connor <ke...@koconnor.net>
>
>diff --git a/src/fw/shadow.c b/src/fw/shadow.c
>index ee87d36..b2f2dd8 100644
>--- a/src/fw/shadow.c
>+++ b/src/fw/shadow.c
>@@ -156,6 +156,8 @@ make_bios_readonly(void)
>         make_bios_readonly_intel(ShadowBDF, Q35_HOST_BRIDGE_PAM0);
> }
>
>+static u8 AttemptingReboot;
>+
> void
> qemu_prep_reset(void)
> {
>@@ -164,6 +166,19 @@ qemu_prep_reset(void)
>     // QEMU doesn't map 0xc0000-0xfffff back to the original rom on a
>     // reset, so do that manually before invoking a hard reset.
>     make_bios_writable();
>+    AttemptingReboot = 1;
>+    barrier();
>+    if (!AttemptingReboot)
>+        goto fail;
>+    barrier();
>     memcpy(VSYMBOL(code32flat_start), VSYMBOL(code32flat_start) + BIOS_SRC_OFFSET
>            , SYMBOL(code32flat_end) - SYMBOL(code32flat_start));
>+    barrier();
>+    if (AttemptingReboot)
>+        goto fail;
>+    return;
>+fail:
>+    // Attempt to restore code has failed - try to shutdown machine.
>+    dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
>+    apm_shutdown();
> }
>diff --git a/src/resume.c b/src/resume.c
>index a5465d8..afeadcf 100644
>--- a/src/resume.c
>+++ b/src/resume.c
>@@ -114,19 +114,10 @@ s3_resume(void)
>     farcall16big(&br);
> }
>
>-u8 HaveAttemptedReboot VARLOW;
>-
> // Attempt to invoke a hard-reboot.
> static void
> tryReboot(void)
> {
>-    if (HaveAttemptedReboot) {
>-        // Hard reboot has failed - try to shutdown machine.
>-        dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
>-        apm_shutdown();
>-    }
>-    HaveAttemptedReboot = 1;
>-
>     dprintf(1, "Attempting a hard reboot\n");
>
>     // Setup for reset on qemu.
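
For anyone following the thread, here is a minimal user-space sketch of how I understand the post-copy check in the quoted patch. My reading, which is an assumption, is that the flag lives inside the region that the memcpy overwrites, so a successful restore from the pristine image clears it, and a flag that is still set afterwards means the copy did not take effect. The names struct bios_image, prep_reset_sketch and copy_works are hypothetical and exist only for this illustration; the sketch does not model barrier(), the earlier writability check, or the real memory layout.

```c
#include <string.h>

/* Hypothetical stand-in for the BIOS image. The flag is stored inside
 * the image, so restoring the image also restores the flag. */
struct bios_image {
    unsigned char attempting_reboot;
    unsigned char code[32];
};

enum reset_result { RESET_OK = 0, RESET_FAILED = -1 };

/* Sketch of the post-copy check. copy_works is an illustration-only knob
 * that simulates whether the copy actually reaches the live image. */
static enum reset_result
prep_reset_sketch(struct bios_image *live,
                  const struct bios_image *pristine,
                  int copy_works)
{
    /* Mark the live image before attempting the restore. */
    live->attempting_reboot = 1;

    /* Restore the live image from the pristine copy (flag is 0 there). */
    if (copy_works)
        memcpy(live, pristine, sizeof(*live));

    /* If the mark survived, the restore did not happen. */
    if (live->attempting_reboot)
        return RESET_FAILED;
    return RESET_OK;
}
```

With a zero-initialized pristine image, calling prep_reset_sketch() with copy_works set to 1 reports RESET_OK, and with copy_works set to 0 it reports RESET_FAILED, which corresponds to the "goto fail" path in the patch.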