Hi Amos,

On 04/21/15 01:31, Amos Kong wrote:
> Hi Kevin,
> 
> When I use old seabios in some stable linux release, some bootable
> devices (2 ide disks) would be lost when I try to restart guest by
> Ctrl+Alt+Delete during boot stage.
> 
> Releated Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1129549
> 
> I found an upstream commit [1] fixed this bug, but when I backport
> this patch to old seabios, guest will shutdown when I try to restart
> by Ctrl+Alt+Delete during boot stage.
> 
> Kevin, can you help to describe that:
> "Unfortunately, kvm does not keep a  pristine copy of the BIOS at 0xffff0000"
> It's a kvm (userspace, QEMU) bug?
> 
> If it's a qemu-kvm bug, I should also fix this bz in old stable release.
> 
> 
> [1] ===========================================
> commit 244caf86f11f5f65d166d91704f64cb673167abc
> Author: Kevin O'Connor <ke...@koconnor.net>
> Date:   Wed Sep 15 21:48:16 2010 -0400
> 
>     Try to hard-reboot on rerun of post even on emulators.
>     
>     Extend the hard-reboot logic to qemu and kvm.  On qemu, a reboot will
>     not reset the memory settings for 0xc0000-0xfffff, so copy that memory
>     area manually before rebooting.  Unfortunately, kvm does not keep a
>     pristine copy of the BIOS at 0xffff0000, so detect that case and
>     shutdown the machine.
> 
> Two backport dependences:
>   [PATCH] Try to hard-reboot processor on rerun of post under coreboot.
>   [PATCH] Don't do shadow copying of optionroms when 
> CONFIG_OPTIONROMS_DEPLOYED.
> 

- Please open <https://bugzilla.redhat.com/show_bug.cgi?id=1027565>
  (it's a public bug)
- Please locate the "Unwrap comments" link, and click it
- Then go to comment #20 in the bug -- it's simplest to click this here:
  <https://bugzilla.redhat.com/show_bug.cgi?id=1027565#c20>

The diagrams in that comment explain the difference between the RHEL-6
and RHEL-7 memory maps that the corresponding QEMU versions provide.
Importantly, as Kevin explained too:

(a) in RHEL-6 qemu, the PAM registers that control *where* reads and
    writes to the region [0xc0000, 0xfffff] end up are not implemented
(b) in RHEL-6 qemu, the "pc.ram" RAMBlock that provides the guest's
    "main RAM" is *hidden* by the "pc.bios" RAMBlock in the
    [0xe0000, 0xfffff] region.
(c) in addition, the exact same "pc.bios" RAMBlock is visible at
    [0xfffe0000, 0xffffffff]

RHEL-7 is different. The PAM registers *are* implemented (well, mostly),
and some of these registers control whether [0xe0000, 0xfffff] shows a
window into RAM or into PCI address space.

If the registers are set so that reads go to the PCI address space
instead of RAM (= the "pc.ram" RAMBlock; grep the source for
"pam-pci"), then the [0xe0000, 0xfffff] range ends up showing the
"isa-bios" region. You can check this with the "info mtree" monitor
command. And "isa-bios" is again an alias, a window into the "pc.bios"
RAMBlock, which is visible in PCI address space at 0xfffe0000.
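
To make the two memory maps concrete, here is a minimal toy model in C.
It is only a sketch: the struct, the toy_* names and the two booleans
are made up for illustration and have nothing to do with QEMU's real
MemoryRegion code. It also ignores the fact that "pc.bios" is read-only
on RHEL-7; it models only *where* an access to the low window lands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BIOS_SIZE (128u * 1024u)   /* size of the "pc.bios" RAMBlock */
#define LOW_BASE  0x000e0000u      /* start of [0xe0000, 0xfffff] */
#define HIGH_BASE 0xfffe0000u      /* start of [0xfffe0000, 0xffffffff] */

/* Toy machine: only the two backing stores that matter for this discussion. */
struct toy_machine {
    bool    pam_works;             /* false: RHEL-6-like, true: RHEL-7-like */
    bool    pam_low_is_ram;        /* consulted only when pam_works is true */
    uint8_t pc_ram_ef[BIOS_SIZE];  /* slice of "pc.ram" behind the E/F segments */
    uint8_t pc_bios[BIOS_SIZE];    /* the "pc.bios" RAMBlock */
};

/* Map a guest-physical address to the byte backing it, or NULL if unmodeled. */
static uint8_t *toy_resolve(struct toy_machine *m, uint32_t gpa)
{
    if (gpa >= HIGH_BASE) {
        return &m->pc_bios[gpa - HIGH_BASE];       /* always "pc.bios" */
    }
    if (gpa >= LOW_BASE && gpa < LOW_BASE + BIOS_SIZE) {
        uint32_t off = gpa - LOW_BASE;

        if (m->pam_works && m->pam_low_is_ram) {
            return &m->pc_ram_ef[off];             /* window into "pc.ram" */
        }
        return &m->pc_bios[off];                   /* window into "pc.bios" */
    }
    return NULL;
}

/* Write a marker through the low window, read it back through the high one. */
static uint8_t toy_low_write_seen_high(bool pam_works, bool pam_low_is_ram)
{
    static struct toy_machine m;   /* static: 256 KiB is a lot for a stack */
    uint8_t *low, *high;

    memset(&m, 0, sizeof m);
    m.pam_works = pam_works;
    m.pam_low_is_ram = pam_low_is_ram;

    low = toy_resolve(&m, LOW_BASE + 0x10);
    high = toy_resolve(&m, HIGH_BASE + 0x10);
    assert(low != NULL && high != NULL);

    *low = 0xAA;
    return *high;
}
```

With pam_works == false (the RHEL-6-like case) the marker written at
0xe0010 shows up at 0xfffe0010 as well; with pam_works == true and the
low window routed to RAM, the high window still reads zero.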

In summary,
- on RHEL-6 you have no working PAM registers, and [0xe0000, 0xfffff]
  and [0xfffe0000, 0xffffffff] always show the same contents.
- on RHEL-7, you have (mostly) working PAM registers, and those can
  change what [0xc0000, 0xfffff] shows. Dependent on the PAM settings,
  this range can be a window into RAM, or it can be a window into
  "pc.bios", which resides at 0xfffe0000.

Now, different versions of SeaBIOS handle these environments
differently. The topic of bug 1027565 was the following situation:

Suppose that you boot a virtual machine on a RHEL-6 host (which implies
RHEL-6, i.e. PAM-less, qemu *and* RHEL-6 SeaBIOS), then migrate it to a
RHEL-7 host (which implies RHEL-7, PAM-capable, qemu, *but* SeaBIOS
stays the same, because the firmware image travels with the migration),
and *then* you reboot the VM on the target (RHEL-7) host. This means
that RHEL-6 SeaBIOS will reboot in a RHEL-7 (PAM-capable) VM.

This was a problem because:
- when RHEL-6 SeaBIOS originally started on the RHEL-6 host, it modified
  the "pc.bios" RAMBlock, and that modification was visible in both
  [0xe0000, 0xfffff] and [0xfffe0000, 0xffffffff].

- When this guest was migrated to a RHEL-7 host, then rebooted, then
  (due to the RHEL-7-only PAM settings showing "pc.ram" at [0xe0000,
  0xfffff]) those original modifications were *only* visible at
  [0xfffe0000, 0xffffffff]. This tripped up RHEL-6 SeaBIOS at reboot,
  because it expected to see (on reboot) its original modifications at
  [0xe0000, 0xfffff]. (The PAM registers are not re-set on reset.)

The kludge we implemented for this was to manually re-shadow the BIOS
from "pc.bios" to "pc.ram" on the RHEL-7 host, when the machine type
implied RHEL-6.
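
Stripped of all QEMU types, the idea of that kludge can be sketched as
follows. This is an illustration only: the toy_* functions and the
8-byte stand-in "image" are hypothetical, and the all-zeroes check is
the same crude heuristic the attached patch uses to tell "never
shadowed" RAM apart from RAM that SeaBIOS has already populated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return true if every byte of buf is zero. */
static bool toy_all_zero(const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Copy the firmware image into the RAM segment, but only if the RAM
 * segment looks untouched (all zeroes). Returns true if a copy was made.
 */
static bool toy_reshadow(uint8_t *ram_seg, const uint8_t *rom, size_t len)
{
    assert(ram_seg != NULL && rom != NULL);

    if (!toy_all_zero(ram_seg, len)) {
        return false;              /* live data in RAM: do not trample it */
    }
    memcpy(ram_seg, rom, len);
    return true;
}

/* Demo: run the re-shadow against a RAM segment whose first byte is given. */
static bool toy_reshadow_demo(uint8_t first_ram_byte)
{
    const uint8_t rom[8] = { 0xEA, 0x5B, 0xE0, 0x00, 0xF0, 0x30, 0x34, 0x2F };
    uint8_t ram[8] = { 0 };
    bool copied;

    ram[0] = first_ram_byte;
    copied = toy_reshadow(ram, rom, sizeof ram);
    if (copied) {
        assert(memcmp(ram, rom, sizeof ram) == 0);
    } else {
        assert(ram[0] == first_ram_byte);
    }
    return copied;
}
```

A guest coming in from a RHEL-6 emulator corresponds to
toy_reshadow_demo(0), where the copy happens; a guest whose RAM segment
already holds data corresponds to a nonzero argument, where it is left
alone.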

Okay, so how does this relate to your question? The SeaBIOS commit you
reference, 244caf86, makes several statements:

(1) "On qemu, a reboot will not reset the memory settings for
    0xc0000-0xfffff"
(2) "kvm does not keep a pristine copy of the BIOS at 0xffff0000"

Claim (1) remains true on RHEL-7 qemu as well.

Wrt. claim (2), you can see that this SeaBIOS commit dates back to the
"RHEL-6 era", because it is *no longer true* for RHEL-7 qemu. On RHEL-7
qemu, the PAM registers work (mostly), and the copy of the BIOS at
[0xfffe0000, 0xffffffff] *is* pristine. The function
old_pc_system_rom_init() in "hw/i386/pc_sysfw.c" makes it read-only
(both directly and also when seen via the isa-bios alias, dependent on
the PAM registers):

    if (!isapc_ram_fw) {
        memory_region_set_readonly(bios, true);
    }

...

    if (!isapc_ram_fw) {
        memory_region_set_readonly(isa_bios, true);
    }

This makes a verbatim backport of this upstream SeaBIOS commit
inappropriate for a RHEL-7 qemu (and that situation emerges when a VM is
migrated from a RHEL-6 to a RHEL-7 host).

As to how you can fix
<https://bugzilla.redhat.com/show_bug.cgi?id=1129549> (which is the
motivation for this entire discussion):

- You need to identify *what variable exactly* stores the list of
  bootable devices. Is that a SeaBIOS variable? Is it some field in the
  BDA (BIOS Data Area)? Something else? Because the bug is that this
  variable, wherever it lives, is not re-set on reboot.

  (Kevin, can you perhaps help with this question? Thank you.)

- Once you have identified the variable or field, you should figure out
  its lifecycle -- how it is affected by all of the above.

- The fix you come up with may be a backport from upstream SeaBIOS, but
  it also might have to be downstream only (i.e. divergent). In
  particular, whatever fix you find for RHEL-6 SeaBIOS, it must also
  work when such a VM is migrated to RHEL-7, and rebooted there.

As far as I understand SeaBIOS commit 244caf86, it tries to implement
the following:
- it knows that qemu does not reset stuff "hard enough" on reset, so it
  tries to make up for it, manually
- it knows that "making up for it manually" is not possible on KVM at
  all, so it shuts down instead.

Implemented as:

The variable called HaveRunPost lives somewhere in the low [0xe0000,
0xfffff] range. When the guest is first booted, the initial value of
HaveRunPost is 0, so _start() does not invoke tryReboot(), and
HaveRunPost gets set to 1.

After a reboot, HaveRunPost is still 1, and this happens:

_start()
  tryReboot()
    qemu_prep_reset()
      make_bios_writable()
      copy from [0xfffe0000, 0xffffffff] to [0xe0000, 0xfffff]
    if (HaveRunPost) --> apm_shutdown()
    i8042_reboot()

The "copy" operation in the above seeks to *restore* (part of) the BIOS
image at [0xe0000, 0xfffff], including its variable HaveRunPost, from
the "pristine" copy at [0xfffe0000, 0xffffffff]. If that is successful,
then HaveRunPost gets implicitly cleared (due to the copy operation),
and then we proceed to a hard reboot. And this hard reboot is what
restores the boot device list, I guess.

If HaveRunPost does not go from 1 to 0 due to the copy operation, then
SeaBIOS has no way to restore itself to the "pristine" image -- there is
no pristine image. (And that's the case on RHEL-6, because when you
modify HaveRunPost in the [0xe0000, 0xfffff] range, that is immediately
reflected in [0xfffe0000, 0xffffffff] too, so the copy is actually a
no-op on RHEL-6. That's why you get the shutdown on RHEL-6.)
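
The same reasoning as a small, self-contained C simulation. Again this
is just a sketch under my own assumptions: the 16-byte "image", the
offset of the flag and all toy_* names are invented, and the real
SeaBIOS code does considerably more. The only point is that the copy
clears the flag when the two windows have separate backing stores, and
is a no-op when they alias each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_SEG_SIZE          16u  /* tiny stand-in for the 128 KiB image */
#define TOY_HAVE_RUN_POST_OFF 3u   /* hypothetical offset of HaveRunPost */

enum toy_outcome { TOY_HARD_REBOOT, TOY_SHUTDOWN };

/*
 * low:  what the guest sees at [0xe0000, 0xfffff]
 * high: what the guest sees at [0xfffe0000, 0xffffffff]
 * On the RHEL-6-like map both pointers refer to the same buffer.
 */
static enum toy_outcome toy_try_reboot(uint8_t *low, const uint8_t *high)
{
    assert(low != NULL && high != NULL);

    /* "copy from [0xfffe0000, 0xffffffff] to [0xe0000, 0xfffff]";
     * memmove because source and destination may be the same buffer. */
    memmove(low, high, TOY_SEG_SIZE);

    /* If the flag survived the copy, there was no pristine image. */
    return low[TOY_HAVE_RUN_POST_OFF] ? TOY_SHUTDOWN : TOY_HARD_REBOOT;
}

/* First boot sets the flag through the low window, then the guest resets. */
static enum toy_outcome toy_boot_then_reset(bool windows_aliased)
{
    uint8_t rom[TOY_SEG_SIZE] = { 0 };  /* pristine image: flag is 0 */
    uint8_t ram[TOY_SEG_SIZE] = { 0 };
    uint8_t *low = windows_aliased ? rom : ram;

    if (!windows_aliased) {
        memcpy(ram, rom, TOY_SEG_SIZE); /* shadowing into real RAM */
    }
    low[TOY_HAVE_RUN_POST_OFF] = 1;     /* first POST sets HaveRunPost */

    return toy_try_reboot(low, rom);    /* Ctrl+Alt+Delete */
}
```

toy_boot_then_reset(true) models the RHEL-6 map and ends in
TOY_SHUTDOWN; toy_boot_then_reset(false) models a map with a pristine
copy and ends in TOY_HARD_REBOOT.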

... I think that you cannot backport 244caf86 to RHEL-6 SeaBIOS. Namely,
RHEL-6 has exactly those circumstances where SeaBIOS has no chance at a
"real" hard reset; there is no "pristine" copy to restore stuff from.
The solution for that would be fixing RHEL-6 qemu (see your own
<https://bugzilla.redhat.com/show_bug.cgi?id=1129549#c15>), but fixing
the PAM registers and all the memory stuff in RHEL-6 is *completely* out
of scope.

So here's what you can do in RHEL-6 SeaBIOS (as I said above):
- identify the exact variables / BDA fields that control the boot device
  list, and clear them manually on *each* SeaBIOS startup. That will
  cover cold boots (when it will amount to a no-op) and warm boots (when
  it will fix your BZ).

(Upstream 244caf86 would do that clearing with the copy operation, but
that copy operation will *never* work on RHEL-6.)

- Then verify that this manual, downstream-only fix also works after
  the VM is migrated to a RHEL-7 host.
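
As a rough illustration of "clear it on each startup", consider the
sketch below. Everything in it is hypothetical: I have not identified
the real variable, so struct toy_boot_list, the device names and the
toy_* functions are placeholders, and the way stale state shows up here
(duplicate entries) is not necessarily how BZ 1129549 manifests. It
only shows that an unconditional reset at the top of POST is a no-op on
a cold boot and removes leftover state on a warm boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_BOOT_DEVS 8
#define TOY_NAME_LEN      16

/* Placeholder for whatever state really holds the boot device list. */
struct toy_boot_list {
    int  count;
    char names[TOY_MAX_BOOT_DEVS][TOY_NAME_LEN];
};

static void toy_boot_list_reset(struct toy_boot_list *bl)
{
    memset(bl, 0, sizeof *bl);
}

/* Append one device; returns 0 on success, -1 if the list is full. */
static int toy_boot_list_add(struct toy_boot_list *bl, const char *name)
{
    if (bl->count >= TOY_MAX_BOOT_DEVS) {
        return -1;
    }
    strncpy(bl->names[bl->count], name, TOY_NAME_LEN - 1);
    bl->names[bl->count][TOY_NAME_LEN - 1] = '\0';
    bl->count++;
    return 0;
}

/* One POST run: optionally reset first, then "detect" two IDE disks. */
static int toy_post(struct toy_boot_list *bl, bool reset_on_startup)
{
    assert(bl != NULL);

    if (reset_on_startup) {
        toy_boot_list_reset(bl);   /* the manual, downstream-only clearing */
    }
    toy_boot_list_add(bl, "ide0");
    toy_boot_list_add(bl, "ide1");
    return bl->count;
}

/* Cold boot followed by one warm reboot; returns the final entry count. */
static int toy_warm_boot_count(bool reset_on_startup)
{
    struct toy_boot_list bl;

    toy_boot_list_reset(&bl);            /* cold boot: memory starts zeroed */
    toy_post(&bl, reset_on_startup);     /* first POST */
    return toy_post(&bl, reset_on_startup); /* POST again after the reboot */
}
```

With the reset enabled, the warm boot ends with the same two entries as
a cold boot; without it, the state from the first POST leaks into the
second one.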

... Apologies for the very long and messy email, but I hope it helps us
all understand the issue.

Thanks
Laszlo

From 9f136b4ed4ecd83a2de6ffa52775286b106bb5f5 Mon Sep 17 00:00:00 2001
From: Laszlo Ersek <ler...@redhat.com>
Date: Thu, 17 Apr 2014 11:13:59 +0200
Subject: [RHEL-7.2 qemu-kvm PATCH] qemu_loadvm_state(): shadow SeaBIOS for VM
 incoming from RHEL-6 host

RH-Author: Laszlo Ersek <ler...@redhat.com>
Message-id: <1397733239-8835-1-git-send-email-ler...@redhat.com>
Patchwork-id: 58498
O-Subject: [RHEL-7.0 0day qemu-kvm PATCH v2] qemu_loadvm_state(): shadow SeaBIOS for VM incoming from RHEL-6 host
Bugzilla: 1027565
RH-Acked-by: Dr. David Alan Gilbert (git) <dgilb...@redhat.com>
RH-Acked-by: Marcel Apfelbaum <marce...@redhat.com>
RH-Acked-by: Eduardo Habkost <ehabk...@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1027565
Brew:     https://brewweb.devel.redhat.com/taskinfo?taskID=7352678

RHEL-only patch.

SeaBIOS's shadowing logic has no effect on "pc.ram" (only on "pc.bios" and
"pc.rom") when it runs on the RHEL-6 emulator. When such a guest is
migrated to the RHEL-7 emulator, where the PAM registers actually work,
these two UMBs under 1MB simply disappear from the guest's view, breaking
reboot and S3 resume.

Tested extensively by QE.

Signed-off-by: Laszlo Ersek <ler...@redhat.com>
---

Notes:
    Changes in v2:
    - print message to stderr when we shadow the UMBs manually [Dave]
    - also copy C and D segments from pc.rom to pc.ram
    - move to qemu_loadvm_state() from ram_load() -- we must shadow only
      when all RAMBlocks have been loaded [Dave]

 include/sysemu/sysemu.h |  1 +
 hw/i386/pc_piix.c       |  1 +
 savevm.c                | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

Signed-off-by: Miroslav Rezanina <mreza...@redhat.com>
Signed-off-by: Laszlo Ersek <ler...@redhat.com>
---
 include/sysemu/sysemu.h |  1 +
 hw/i386/pc_piix.c       |  1 +
 savevm.c                | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8dc0a4c..07181ac 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,6 +84,7 @@ void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 int qemu_loadvm_state(QEMUFile *f);
+extern bool shadow_bios_after_incoming;
 
 /* SLIRP */
 void do_info_slirp(Monitor *mon);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 940816f..fea98b6 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -954,6 +954,7 @@ static void pc_compat_rhel650(QEMUMachineInitArgs *args)
     rom_file_has_mr = false; 
     has_acpi_build = false;
     gigabyte_align = false;
+    shadow_bios_after_incoming = true;
 }
 
 static void pc_init_rhel650(QEMUMachineInitArgs *args)
diff --git a/savevm.c b/savevm.c
index 4d92a7b..6efbb75 100644
--- a/savevm.c
+++ b/savevm.c
@@ -52,6 +52,8 @@
 #define ARP_PTYPE_IP 0x0800
 #define ARP_OP_REQUEST_REV 0x3
 
+bool shadow_bios_after_incoming;
+
 static int announce_self_create(uint8_t *buf,
 				uint8_t *mac_addr)
 {
@@ -2195,6 +2197,63 @@ typedef struct LoadStateEntry {
     int version_id;
 } LoadStateEntry;
 
+static void shadow_bios(void)
+{
+    RAMBlock *block, *ram, *oprom, *bios;
+    size_t one_meg, oprom_size, bios_size;
+    uint8_t *cd_seg_host, *ef_seg_host;
+
+    ram = NULL;
+    oprom = NULL;
+    bios = NULL;
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (strcmp("pc.ram", block->idstr) == 0) {
+            assert(ram == NULL);
+            ram = block;
+        } else if (strcmp("pc.rom", block->idstr) == 0) {
+            assert(oprom == NULL);
+            oprom = block;
+        } else if (strcmp("pc.bios", block->idstr) == 0) {
+            assert(bios == NULL);
+            bios = block;
+        }
+    }
+    assert(ram != NULL);
+    assert(oprom != NULL);
+    assert(bios != NULL);
+    assert(memory_region_is_ram(ram->mr));
+    assert(memory_region_is_ram(oprom->mr));
+    assert(memory_region_is_ram(bios->mr));
+    assert(int128_eq(ram->mr->size, int128_make64(ram->length)));
+    assert(int128_eq(oprom->mr->size, int128_make64(oprom->length)));
+    assert(int128_eq(bios->mr->size, int128_make64(bios->length)));
+
+    one_meg = 1024 * 1024;
+    oprom_size = 128 * 1024;
+    bios_size = 128 * 1024;
+    assert(ram->length >= one_meg);
+    assert(oprom->length == oprom_size);
+    assert(bios->length == bios_size);
+
+    ef_seg_host = memory_region_get_ram_ptr(ram->mr) + (one_meg - bios_size);
+    cd_seg_host = ef_seg_host - oprom_size;
+
+    /* This is a crude hack, but we must distinguish a rhel6.x.0 machtype guest
+     * coming in from a RHEL-6 emulator (where shadowing has had no effect on
+     * "pc.ram") from a similar guest coming in from a RHEL-7 emulator (where
+     * shadowing has worked). In the latter case we must not trample the live
+     * SeaBIOS variables in "pc.ram".
+     */
+    if (buffer_is_zero(ef_seg_host, bios_size)) {
+        fprintf(stderr, "copying E and F segments from pc.bios to pc.ram\n");
+        memcpy(ef_seg_host, memory_region_get_ram_ptr(bios->mr), bios_size);
+    }
+    if (buffer_is_zero(cd_seg_host, oprom_size)) {
+        fprintf(stderr, "copying C and D segments from pc.rom to pc.ram\n");
+        memcpy(cd_seg_host, memory_region_get_ram_ptr(oprom->mr), oprom_size);
+    }
+}
+
 int qemu_loadvm_state(QEMUFile *f)
 {
     QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
@@ -2297,6 +2356,13 @@ int qemu_loadvm_state(QEMUFile *f)
         }
     }
 
+    /* Supplement SeaBIOS's shadowing now, because it was useless when the
+     * incoming VM started on the RHEL-6 emulator.
+     */
+    if (shadow_bios_after_incoming) {
+        shadow_bios();
+    }
+
     cpu_synchronize_all_post_init();
 
     ret = 0;
-- 
1.8.3.1
