On Wed, Mar 30, 2022 at 03:46:56PM -0400, Jason Andryuk wrote:
> PCI device assignment to an HVM with stubdom is potentially racy.  First
> the PCI device is assigned to the stubdom via the PV PCI protocol.  Then
> QEMU is sent a QMP command to attach the PCI device to QEMU running
> within the stubdom.  However, the sysfs entries within the stubdom may
> not have appeared by the time QEMU receives the device_add command
> resulting in errors like:
> 
> libxl_qmp.c:1838:qmp_ev_parse_error_messages:Domain 10:Could not open 
> '/sys/bus/pci/devices/0000:00:1f.3/config': No such file or directory
> 
> This patch retries the device assignment up to 10 times with a 1 second
> delay between.  That roughly matches the overall hotplug timeout.
> 
> The qmp_ev_parse_error_messages error is still printed since it happens
> at a lower level than the pci code controlling the retries.  With that,
> the "Retrying PCI add %d" message is also printed at ERROR level to
> clarify what is happening.
> 
> Signed-off-by: Jason Andryuk <jandr...@gmail.com>
> ---
> @@ -1252,10 +1258,22 @@ static void pci_add_qmp_device_add_cb(libxl__egc *egc,
>                                        const libxl__json_object *response,
>                                        int rc)
>  {
> -    EGC_GC;
>      pci_add_state *pas = CONTAINER_OF(qmp, *pas, qmp);
> +    STATE_AO_GC(pas->aodev->ao);

I think this changes are wrong, what is the reason to use STATE_AO_GC
instead of EGC_GC?
I think in this case, it is fine to keep using EGC_GC, as there doesn't
seems to be any allocation that needs to live after this function
returns. And you could just pass `pas->aodev->ao` to
libxl__ev_time_register_rel().

>  
> -    if (rc) goto out;
> +    if (rc) {
> +        if (pas->retries++ < 10) {
> +            LOGD(ERROR, qmp->domid, "Retrying PCI add %d", pas->retries);
> +            rc = libxl__ev_time_register_rel(ao, &pas->timeout_retries,
> +                                             pci_add_qmp_device_add_retry,
> +                                             1000);
> +            if (rc) goto out;
> +
> +            return; /* wait for the timeout to then retry */
> +        } else {
> +            goto out;
> +        }
> +    }

So this retry logic would be done regardless of whether stubdom is in
use or not. It's not going to be useful in the non-stubdom case, is it?

When adding a pci device to a domain that has its device model in a
stubdomain, there's a first call to do_pci_add() which works fine,
right? Then there's a callback to device_pci_add_stubdom_wait(), which
is supposed to wait for the stubdom to be ready, but the sysfs entry
might still be missing at that time, right? Then, there is a second call
to do_pci_add() for the guest, and it's the one that fail in your case,
right?

If my reasoning is correct, could we enable the retry logic only for the
second do_pci_add() call? So that guests without stubdom aren't impacted
as I don't think retrying in this case would be useful and would just
delay the error.

Cheers,

-- 
Anthony PERARD

Reply via email to