On Fri, 25 Mar 2016 09:38:09 +0800 Chen Fan <chen.fan.f...@cn.fujitsu.com> wrote:
> On 03/25/2016 06:54 AM, Alex Williamson wrote:
> > On Wed, 23 Mar 2016 18:12:06 +0800
> > Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> >
> >> From: Chen Fan <chen.fan.f...@cn.fujitsu.com>
> >>
> >> when a physical device aer occurred, the device state probably
> >> is not in D0 in a short time, if we recover the device quickly.
> >> we may stuck in D3 state when force to change device state to D0.
> >> we may need to wait for a short time to inject the error to guest.
> >>
> >> Signed-off-by: Chen Fan <chen.fan.f...@cn.fujitsu.com>
> >> ---
> >>  hw/vfio/pci.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 25fc095..5216e7f 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -2658,6 +2658,9 @@ static void vfio_err_notifier_handler(void *opaque)
> >>      msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> >>                               PCI_ERR_ROOT_CMD_NONFATAL_EN;
> >>
> >> +    /* wait a bit to ensure aer device is ready */
> >> +    usleep(2 * 1000);
> > Where does this number come from? Why would the device be in D3? I
> > don't understand this at all.
> Hi Alex,
>
> when I tested the code in my environment, I found that when I used
> the aer-inject module to inject a fake aer error to device on host, the qemu
> would throw out the message "vfio: Unable to power on device, stuck in D3"
> on and off. if I use "gdb" to debug the vfio_pci_pre_reset, the phenomenon
> would not appearance, I just thought it should be some timing race issue,
> so I use a sleep() to wait 2ms (double the reset time of 1ms) to ensure the
> device state is ready. maybe the root reason still need to be
> investigated deeply.

Yes, it sounds like you need to investigate this further, the delay is
arbitrary and perhaps suggests a race that needs to be fixed correctly.
Thanks,

Alex