At 03/01/2011 03:13 PM, Isaku Yamahata Write: > On Tue, Mar 01, 2011 at 02:58:44PM +0800, Wen Congyang wrote: >> At 03/01/2011 12:11 PM, Isaku Yamahata Write: >>> Hi. >>> >>> I don't suppose that just introducing pending bits solve the issue. >>> Your test only use single hot plug/unplug. How about mixing of multiple >>> hot plug/unplug with different slots. >> >> The qemu uses the same thread to deal with monitor command and I/O request >> from guest OS. So I think mixing of multiple hot plug/unplug with different >> slots can also work fine. >> >>> Zeroing up/down on piix4_device_hotplug() is the culprit. >>> >>> State machine of (up, down) would be needed. >>> (up, down) of each slots should be changed following >>> something like >> >> Why we need this feature? We access s->pci0_status only in main thread and >> do not accesss it when we deal with signal, so there is no competition to >> access it. > > We are talking about losing interrupt and events, aren't we? > Not race caused by threads.
I am sorry, I do not understand what you are talk about. > The issue is, Qemu injected sci interrupt into guest, > but before the guest completes to handled it, > users/qemu can start to inject the next sci event triggered by hot > plug/unplug. > Thus qemu loses sci interrupt and up/down status. Yes, you are right. But I still do not understand why introducing pending bits can not solve the issue? Thanks Wen Congyang > > thanks >> >>> >>> - on power-on (0, 0) >>> - on hot plug (0, 0) -> (1, 0) >>> if other combination -> error >>> - on hot unpug (1, 0) or (0, 0) -> (0, 1) >>> (0, 0) is for cold plugged devices. >>> (1, 0) is for hot plugged devices. >>> if other combination -> error >>> - on pciej_write(write on PCI_EJ_BASE) >>> (0, 1) -> (0, 0) and qdev_free() >>> if other combination -> ignore >>> >>> When enabling sci, those bits are checked and raise sci if necessary. >>> >>> Any comments on this? >>> Anyway the current pci hotplug-related commands are somewhat broken, >>> and needs clean up with multifunction hotplug support which is >>> on my todo list. >>> >>> thanks >>> >>> On Mon, Feb 28, 2011 at 10:52:40AM +0800, Wen Congyang wrote: >>>> Hi Markus Armbruster >>>> >>>> At 02/23/2011 04:30 PM, Markus Armbruster Write: >>>>> Isaku Yamahata <yamah...@valinux.co.jp> writes: >>>>> >>>> >>>> <snip> >>>> >>>>> >>>>> I don't think this patch is correct. Let me explain. >>>>> >>>>> Device hot unplug is *not* guaranteed to succeed. >>>>> >>>>> For some buses, such as USB, it always succeeds immediately, i.e. when >>>>> the device_del monitor command finishes, the device is gone. Live is >>>>> good. >>>>> >>>>> But for PCI, device_del merely initiates the ACPI unplug rain dance. It >>>>> doesn't wait for the dance to complete. Why? The dance can take an >>>>> unpredictable amount of time, including forever. >>>>> >>>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI >>>>> slot, and the unplug has not yet completed (race condition), or it >>>>> failed. Yes, Virginia, PCI hotplug *can* fail. >>>>> >>>>> When unplug succeeds, the qdev is automatically destroyed. >>>>> pciej_write() does that for PIIX4. Looks like pcie_cap_slot_event() >>>>> does it for PCIE. >>>> >>>> I got a similar problem. When I unplug a pci device by hand, it works >>>> as expected, and I can hotplug it again. But when I use a srcipt to >>>> do the same thing, sometimes it failed. I think I may find another bug. >>>> >>>> Steps to reproduce this bug: >>>> 1. cat ./test-e1000.sh # RHEL6RC is domain name >>>> #! /bin/bash >>>> >>>> while true; do >>>> virsh attach-interface RHEL6RC network default --mac >>>> 52:54:00:1f:db:c7 --model e1000 >>>> if [[ $? -ne 0 ]]; then >>>> break >>>> fi >>>> virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7 >>>> if [[ $? -ne 0 ]]; then >>>> break >>>> fi >>>> sleep 5 >>>> done >>>> >>>> 2. ./test-e1000.sh >>>> Interface attached successfully >>>> >>>> Interface detached successfully >>>> >>>> Interface attached successfully >>>> >>>> Interface detached successfully >>>> >>>> error: Failed to attach interface >>>> error: operation failed: adding >>>> e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 >>>> device failed: Duplicate ID 'net1' for device >>>> >>>> >>>> I use ftrace to trace whether sci interrupt is passed to guest OS, here is >>>> log: >>>> # cat trace | grep 'irq=9' >>>> <idle>-0 [000] 118342.634772: irq_handler_entry: irq=9 >>>> name=acpi >>>> <idle>-0 [000] 118342.634831: irq_handler_exit: irq=9 >>>> ret=handled >>>> <idle>-0 [000] 118342.693216: irq_handler_entry: irq=9 >>>> name=acpi >>>> <idle>-0 [000] 118342.693254: irq_handler_exit: irq=9 >>>> ret=handled >>>> <idle>-0 [000] 118347.737689: irq_handler_entry: irq=9 >>>> name=acpi >>>> <idle>-0 [000] 118347.737738: irq_handler_exit: irq=9 >>>> ret=handled >>>> According to the log, we only receive 3 sci interrupt, and one is lost. >>>> >>>> I enable piix4's debug and 1 line printf into piix4_device_hotplug: >>>> printf("slot: %d, up: %d, down: %d, state: %d\n", slot, >>>> s->pci0_status.up, s->pci0_status.down, state); >>>> here is the log: >>>> ======== >>>> PM: mapping to 0xb000 >>>> PM readw port=0x0004 val=0x0000 >>>> ... >>>> gpe write afe2 <== 0 >>>> gpe write afe0 <== 255 >>>> gpe write afe3 <== 0 >>>> gpe write afe1 <== 255 >>>> PM readw port=0x0000 val=0x0001 >>>> PM readw port=0x0002 val=0x0000 >>>> gpe read afe0 == 0 >>>> gpe read afe2 == 0 >>>> gpe read afe1 == 0 >>>> gpe read afe3 == 0 >>>> PM writew port=0x0000 val=0x0020 >>>> PM readw port=0x0002 val=0x0000 >>>> PM writew port=0x0002 val=0x0020 >>>> PM readw port=0x0002 val=0x0020 >>>> gpe write afe2 <== 255 >>>> gpe write afe3 <== 255 >>>> ... >>>> slot: 6, up: 0, down: 0, state: 1 <==== we attach interface the first >>>> time >>>> PM readw port=0x0000 val=0x0001 >>>> PM readw port=0x0002 val=0x0120 >>>> gpe read afe0 == 2 >>>> gpe read afe2 == ff >>>> gpe read afe2 == ff >>>> gpe write afe2 <== 253 >>>> gpe read afe1 == 0 >>>> gpe read afe3 == ff >>>> pcihotplug read ae00 == 40 >>>> pcihotplug read ae04 == 0 >>>> ... >>>> gpe write afe0 <== 2 >>>> gpe write afe2 <== 255 >>>> slot: 6, up: 64, down: 0, state: 0 <===== we detach interface the first >>>> time >>>> PM readw port=0x0000 val=0x0001 >>>> PM readw port=0x0002 val=0x0120 >>>> gpe read afe0 == 2 >>>> gpe read afe2 == ff >>>> gpe read afe2 == ff >>>> gpe write afe2 <== 253 >>>> gpe read afe1 == 0 >>>> gpe read afe3 == ff >>>> pcihotplug read ae00 == 0 >>>> pcihotplug read ae04 == 40 >>>> ... >>>> pciej write ae08 <== 64 <=== we will call qdev_free() >>>> pciej write ae08 <== 64 >>>> gpe write afe0 <== 2 >>>> gpe write afe2 <== 255 >>>> slot: 7, up: 0, down: 64, state: 1 <=== we attach interface the second >>>> time. >>>> PM readw port=0x0000 val=0x0001 >>>> PM readw port=0x0002 val=0x0120 >>>> gpe read afe0 == 2 >>>> gpe read afe2 == ff >>>> gpe read afe2 == ff >>>> gpe write afe2 <== 253 >>>> gpe read afe1 == 0 >>>> gpe read afe3 == ff >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> pcihotplug read ae00 == 80 >>>> pcihotplug read ae04 == 0 >>>> slot: 7, up: 128, down: 0, state: 0 <=== we detach interface the second >>>> time >>>> gpe write afe0 <== 2 <=== write 2 to afe0 will cause sci >>>> interupt to be lost >>>> gpe write afe2 <== 255 >>>> =========== >>>> >>>> Here is short log, and show the different between the first time and >>>> second time: >>>> =========== >>>> gpe write afe0 <== 2 >>>> gpe write afe2 <== 255 >>>> slot: 6, up: 64, down: 0, state: 0 <===== we detach interface the first >>>> time >>>> .... >>>> slot: 7, up: 128, down: 0, state: 0 <=== we detach interface the second >>>> time >>>> gpe write afe0 <== 2 <=== write 2 to afe0 will cause sci >>>> interupt to be lost >>>> gpe write afe2 <== 255 >>>> >>>> =========== >>>> >>>> Does this behavior is a normal behavior? >>>> >>>> The following patch can fix this problem. >>>> >>>> From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001 >>>> From: Wen Congyang <we...@cn.fujitsu.com> >>>> Date: Mon, 28 Feb 2011 10:33:45 +0800 >>>> Subject: [PATCH] pend hotplug event until last event is handled by guest OS >>>> >>>> --- >>>> hw/acpi_piix4.c | 51 ++++++++++++++++++++++++++++++++++++++++++++------- >>>> 1 files changed, 44 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c >>>> index b5a2762..4ff3475 100644 >>>> --- a/hw/acpi_piix4.c >>>> +++ b/hw/acpi_piix4.c >>>> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState { >>>> /* for pci hotplug */ >>>> struct gpe_regs gpe; >>>> struct pci_status pci0_status; >>>> + struct pci_status pci0_status_pending; >>>> uint32_t pci0_hotplug_enable; >>>> } PIIX4PMState; >>>> >>>> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, >>>> uint32_t val) >>>> *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift); >>>> } >>>> >>>> +static void raise_pending_hotplug(PIIX4PMState *s) >>>> +{ >>>> + if (s->pci0_status_pending.up || s->pci0_status_pending.down) { >>>> + s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS; >>>> + s->pci0_status.up = s->pci0_status_pending.up; >>>> + s->pci0_status.down = s->pci0_status_pending.down; >>>> + s->pci0_status_pending.up = 0; >>>> + s->pci0_status_pending.down = 0; >>>> + >>>> + pm_update_sci(s); >>>> + } >>>> +} >>>> + >>>> static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val) >>>> { >>>> PIIX4PMState *s = opaque; >>>> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, >>>> uint32_t val) >>>> pm_update_sci(s); >>>> >>>> PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val); >>>> + >>>> + if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & >>>> PIIX4_PCI_HOTPLUG_STATUS) { >>>> + /* check and reraise the pending hotplug event */ >>>> + raise_pending_hotplug(s); >>>> + } >>>> } >>>> >>>> static uint32_t pcihotplug_read(void *opaque, uint32_t addr) >>>> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot) >>>> s->pci0_status.up |= (1 << slot); >>>> } >>>> >>>> +static void pend_enable_device(PIIX4PMState *s, int slot) >>>> +{ >>>> + s->pci0_status_pending.up |= (1 << slot); >>>> +} >>>> + >>>> static void disable_device(PIIX4PMState *s, int slot) >>>> { >>>> s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS; >>>> s->pci0_status.down |= (1 << slot); >>>> } >>>> >>>> +static void pend_disable_device(PIIX4PMState *s, int slot) >>>> +{ >>>> + s->pci0_status_pending.down |= (1 << slot); >>>> +} >>>> + >>>> static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, >>>> PCIHotplugState state) >>>> { >>>> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, >>>> PCIDevice *dev, >>>> return 0; >>>> } >>>> >>>> - s->pci0_status.up = 0; >>>> - s->pci0_status.down = 0; >>>> - if (state == PCI_HOTPLUG_ENABLED) { >>>> - enable_device(s, slot); >>>> + if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) { >>>> + if (state == PCI_HOTPLUG_ENABLED) { >>>> + pend_enable_device(s, slot); >>>> + } else { >>>> + pend_disable_device(s, slot); >>>> + } >>>> } else { >>>> - disable_device(s, slot); >>>> - } >>>> + s->pci0_status.up = 0; >>>> + s->pci0_status.down = 0; >>>> + if (state == PCI_HOTPLUG_ENABLED) { >>>> + enable_device(s, slot); >>>> + } else { >>>> + disable_device(s, slot); >>>> + } >>>> >>>> - pm_update_sci(s); >>>> + pm_update_sci(s); >>>> + } >>>> >>>> return 0; >>>> } >>>> -- >>>> 1.7.1 >>>> >>>>> >>>>> Your patch adds a *second* qdev_free(). No good. >>>>> >>>>> >>>> >>>> >>> >> >