On 2013-09-03 15:22, Zhanghaoyu (A) wrote:
>>> Hi, all
>>>
>>> A segmentation fault happens when rebooting a VM after hot-unplugging a
>>> virtio NIC; it can be reproduced 100% of the time.
>>> See a similar bug report at
>>> https://bugzilla.redhat.com/show_bug.cgi?id=988256
>>>
>>> test environment:
>>> host: SLES11SP2 (kernel version: 3.0.58)
>>> qemu: 1.5.1, upstream-qemu (commit
>>> 545825d4cda03ea292b7788b3401b99860efe8bc)
>>> libvirt: 1.1.0
>>> guest os: win2k8 R2 x64bit or sles11sp2 x64 or win2k3 32bit
>>>
>>> You can reproduce this problem with the following steps:
>>> 1. start a VM with virtio NIC(s)
>>> 2. hot-unplug a virtio NIC from the VM
>>> 3. reboot the VM; the segmentation fault happens during startup
>>>
>>> The qemu backtrace is shown below:
>>> #0  0x00007ff4be3288d0 in __memcmp_sse4_1 () from /lib64/libc.so.6
>>> #1  0x00007ff4c07f82c0 in patch_hypercalls (s=0x7ff4c15dd610)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:549
>>> #2  0x00007ff4c07f84f0 in vapic_prepare (s=0x7ff4c15dd610)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:614
>>> #3  0x00007ff4c07f85e7 in vapic_write (opaque=0x7ff4c15dd610, addr=0,
>>>     data=32, size=2)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:651
>>> #4  0x00007ff4c082a917 in memory_region_write_accessor
>>>     (opaque=0x7ff4c15df938, addr=0, value=0x7ff4bbfe3d00, size=2,
>>>     shift=0, mask=65535)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:334
>>> #5  0x00007ff4c082a9ee in access_with_adjusted_size (addr=0,
>>>     value=0x7ff4bbfe3d00, size=2, access_size_min=1,
>>>     access_size_max=4, access=0x7ff4c082a89a
>>>     <memory_region_write_accessor>, opaque=0x7ff4c15df938)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:364
>>> #6  0x00007ff4c082ae49 in memory_region_iorange_write
>>>     (iorange=0x7ff4c15dfca0, offset=0, width=2, data=32)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:439
>>> #7  0x00007ff4c08236f7 in ioport_writew_thunk (opaque=0x7ff4c15dfca0,
>>>     addr=126, data=32)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:219
>>> #8  0x00007ff4c0823078 in ioport_write (index=1, address=126, data=32)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:83
>>> #9  0x00007ff4c0823ca9 in cpu_outw (addr=126, val=32)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:296
>>> #10 0x00007ff4c0827485 in kvm_handle_io (port=126, data=0x7ff4c0510000,
>>>     direction=1, size=2, count=1)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1485
>>> #11 0x00007ff4c0827e14 in kvm_cpu_exec (env=0x7ff4c15bf270)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1634
>>> #12 0x00007ff4c07b6f27 in qemu_kvm_cpu_thread_fn (arg=0x7ff4c15bf270)
>>>     at /mnt/zhanghaoyu/qemu/qemu-1.5.1/cpus.c:759
>>> #13 0x00007ff4be58af05 in start_thread () from /lib64/libpthread.so.0
>>> #14 0x00007ff4be2cd53d in clone () from /lib64/libc.so.6
>>>
>>> If I apply the patch below to upstream qemu, this problem disappears:
>>> ---
>>>  hw/i386/kvmvapic.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
>>> index 15beb80..6fff299 100644
>>> --- a/hw/i386/kvmvapic.c
>>> +++ b/hw/i386/kvmvapic.c
>>> @@ -652,11 +652,11 @@ static void vapic_write(void *opaque, hwaddr addr, uint64_t data,
>>>      switch (size) {
>>>      case 2:
>>>          if (s->state == VAPIC_INACTIVE) {
>>> -            rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
>>> -            s->rom_state_paddr = rom_paddr + data;
>>> -
>>>              s->state = VAPIC_STANDBY;
>>>          }
>>> +        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
>>> +        s->rom_state_paddr = rom_paddr + data;
>>> +
>>>          if (vapic_prepare(s) < 0) {
>>>              s->state = VAPIC_INACTIVE;
>>>              break;
>>
>> Yes, we need to update the ROM's physical address after the BIOS reshuffled
>> the layout.
>>
>> But I'm not happy with simply updating the address unconditionally. We need
>> to understand the crash first, then make QEMU robust against the guest not
>> issuing this initial write after a ROM region layout change.
>> And finally make it work properly in the normal case.
>>
> The direct cause of the crash is an access to an invalid address, which
> happens because the ROM's physical address is not updated.
> In my opinion, since hot-plug/unplug is involved, we need to recalculate the
> ROM's physical address for every device that has a ROM during the startup
> period when the VM is rebooted/reset.
> Is it reasonable to set the vapic's state to VAPIC_INACTIVE during vapic's reset?
I checked in the meantime, and it should be sufficient to get the VAPIC
working again under sane conditions. Still, this is not yet satisfying with
respect to what *precisely* went wrong when the physical address became
incorrect, and whether a malicious guest could exploit this (e.g. by
invoking the 16-bit write from an address outside the VAPIC ROM).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux