[Xen-devel] [PATCH v6 3/6] libxl: don't try to manipulate json config for stubdomain
Stubdomain do not have it's own config file - its configuration is derived from target domains. Do not try to manipulate it when attaching PCI device. This bug prevented starting HVM with stubdomain and PCI passthrough device attached. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Wei Liu --- Changes in v3: - skip libxl__dm_check_start too, as stubdomain is guaranteed to be running at this stage already - do not init d_config at all, as it is used only for json manipulation Changes in v4: - adjust comment style --- tools/libxl/libxl_pci.c | 50 ++ 1 file changed, 32 insertions(+), 18 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 578535f..d26fc9a 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -120,10 +120,14 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d libxl_domain_config d_config; libxl_device_pci pcidev_saved; libxl__domain_userdata_lock *lock = NULL; +bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL); -libxl_domain_config_init(&d_config); -libxl_device_pci_init(&pcidev_saved); -libxl_device_pci_copy(CTX, &pcidev_saved, pcidev); +/* Stubdomain doesn't have own config. */ +if (!is_stubdomain) { +libxl_domain_config_init(&d_config); +libxl_device_pci_init(&pcidev_saved); +libxl_device_pci_copy(CTX, &pcidev_saved, pcidev); +} be_path = libxl__domain_device_backend_path(gc, 0, domid, 0, LIBXL__DEVICE_KIND_PCI); @@ -152,27 +156,35 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d GCNEW(device); libxl__device_from_pcidev(gc, domid, pcidev, device); -lock = libxl__lock_domain_userdata(gc, domid); -if (!lock) { -rc = ERROR_LOCK_FAIL; -goto out; -} +/* + * Stubdomin config is derived from its target domain, it doesn't have + * its own file. + */ +if (!is_stubdomain) { +lock = libxl__lock_domain_userdata(gc, domid); +if (!lock) { +rc = ERROR_LOCK_FAIL; +goto out; +} -rc = libxl__get_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +rc = libxl__get_domain_configuration(gc, domid, &d_config); +if (rc) goto out; -device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, - &pcidev_saved); +device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, + &pcidev_saved); -rc = libxl__dm_check_start(gc, &d_config, domid); -if (rc) goto out; +rc = libxl__dm_check_start(gc, &d_config, domid); +if (rc) goto out; +} for (;;) { rc = libxl__xs_transaction_start(gc, &t); if (rc) goto out; -rc = libxl__set_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +if (lock) { +rc = libxl__set_domain_configuration(gc, domid, &d_config); +if (rc) goto out; +} libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back)); @@ -184,8 +196,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d out: libxl__xs_transaction_abort(gc, &t); if (lock) libxl__unlock_domain_userdata(lock); -libxl_device_pci_dispose(&pcidev_saved); -libxl_domain_config_dispose(&d_config); +if (!is_stubdomain) { +libxl_device_pci_dispose(&pcidev_saved); +libxl_domain_config_dispose(&d_config); +} return rc; } -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v6 0/6] Fix PCI passthrough for HVM with stubdomain
In this version, I add PHYSDEVOP_interrupt_control to allow stubdomain enabling MSI after mapping it, and also disabling INTx beforehand. Actual hypercall refuse to enable both of them. Related article: https://www.qubes-os.org/news/2017/10/18/msi-support/ Changes in v2: - new "xen/x86: Allow stubdom access to irq created for msi" patch - applied review comments from v1 Changes is v3: - apply suggestions by Roger - add PHYSDEVOP_msi_msix_set_enable Changes in v4: - implement suggestions by Wei, Roger, Jan - plug new physdevop into XSM Changes in v5: - rebase on master - rename to PHYSDEVOP_msi_control - move granting access to IRQ into create_irq Changes in v6: - simplify granting IRQ access, record dm domid for cleanup - rename to PHYSDEVOP_interrupt_control - include INTx control in the hypercall --- Cc: Ian Jackson Cc: Wei Liu Cc: Andrew Cooper Cc: George Dunlap Cc: Jan Beulich Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Tim Deegan Cc: Anthony PERARD Cc: "Roger Pau Monné" Cc: Suravee Suthikulpanit Cc: Brian Woods Cc: Kevin Tian Cc: Daniel De Graaf Marek Marczykowski-Górecki (6): libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use libxl: attach PCI device to qemu only after setting pciback/pcifront libxl: don't try to manipulate json config for stubdomain xen/x86: Allow stubdom access to irq created for msi. xen/x86: add PHYSDEVOP_interrupt_control tools/libxc: add wrapper for PHYSDEVOP_interrupt_control tools/libxc/include/xenctrl.h| 6 ++- tools/libxc/xc_physdev.c | 15 ++- tools/libxl/libxl_pci.c | 63 + xen/arch/x86/hpet.c | 3 +- xen/arch/x86/irq.c | 51 ++-- xen/arch/x86/msi.c | 45 ++- xen/arch/x86/physdev.c | 53 +- xen/arch/x86/x86_64/physdev.c| 4 ++- xen/drivers/char/ns16550.c | 2 +- xen/drivers/passthrough/amd/iommu_init.c | 2 +- xen/drivers/passthrough/vtd/iommu.c | 3 +- xen/include/asm-x86/irq.h| 7 ++- xen/include/asm-x86/msi.h| 2 +- xen/include/public/physdev.h | 23 +- xen/include/xlat.lst | 1 +- xen/include/xsm/dummy.h | 7 +++- xen/include/xsm/xsm.h| 6 ++- xen/xsm/dummy.c | 1 +- xen/xsm/flask/hooks.c| 24 ++- xen/xsm/flask/policy/access_vectors | 1 +- 20 files changed, 281 insertions(+), 38 deletions(-) base-commit: 6c9639a72f0ca3a9430ef75f375877182281fdef -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v6 6/6] tools/libxc: add wrapper for PHYSDEVOP_interrupt_control
Add libxc wrapper for PHYSDEVOP_interrupt_control introduced in previous commit. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - new patch Changes in v4: - adjust for updated previous patch Changes in v5: - rename to PHYSDEVOP_msi_control, adjust arguments Change in v6: - initialize struct physdev_interrupt_control inline, drop pointless rc variable - rename to PHYSDEVOP_interrupt_control --- tools/libxc/include/xenctrl.h | 6 ++ tools/libxc/xc_physdev.c | 15 +++ 2 files changed, 21 insertions(+) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 0ff6ed9..2adb114 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1639,6 +1639,12 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_physdev_interrupt_control(xc_interface *xch, + int seg, + int bus, + int devfn, + int flags); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libxc/xc_physdev.c b/tools/libxc/xc_physdev.c index 460a8e7..5af8296 100644 --- a/tools/libxc/xc_physdev.c +++ b/tools/libxc/xc_physdev.c @@ -111,3 +111,18 @@ int xc_physdev_unmap_pirq(xc_interface *xch, return rc; } +int xc_physdev_interrupt_control(xc_interface *xch, + int seg, + int bus, + int devfn, + int flags) +{ +struct physdev_interrupt_control op = { +.seg = seg, +.bus = bus, +.devfn = devfn, +.flags = flags, +}; + +return do_physdev_op(xch, PHYSDEVOP_interrupt_control, &op, sizeof(op)); +} -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control
Allow device model running in stubdomain to enable/disable INTx/MSI(-X), bypassing pciback. While pciback is still used to access config space from within stubdomain, it refuse to write to PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE in non-permissive mode. Which is the right thing to do for PV domain (the main use case for pciback), as PV domain should use XEN_PCI_OP_* commands for that. Unfortunately those commands are not good for stubdomain use, as they configure MSI in dom0's kernel too, which should not happen for HVM domain. This new physdevop is allowed only for stubdomain controlling the domain which own the device. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - new patch Changes in v4: - adjust code style - s/msi_msix/msi/ - add msi_set_enable XSM hook - flatten struct physdev_msi_set_enable - add to include/xlat.lst Changes in v5: - rename to PHYSDEVOP_msi_control - combine "mode" and "enable" into "flags" - refuse to enable both MSI and MSI-X, and also to enable MSI(-X) on incapable device - disable/enable INTx when enabling/disabling MSI (?) - refuse if !use_msi - adjust flask hook to make more sense (require "setup" access on device, not on domain) - rebase on master Changes in v6: - rename to PHYSDEVOP_interrupt_control - extend with INTx control - Ensure than MSI(-X) can't be enabled together with INTx and the other MSI(-X). - deduplicate code in msi_control - explicitly refuse to operate on hidden devices - expand flags to uint16_t to avoid implicit padding I'm not sure if XSM part is correct, compile-tested only, as I'm not sure how to set the policy. --- xen/arch/x86/msi.c | 45 +- xen/arch/x86/physdev.c | 53 ++- xen/arch/x86/x86_64/physdev.c | 4 ++- xen/include/asm-x86/msi.h | 2 +- xen/include/public/physdev.h| 23 +- xen/include/xlat.lst| 1 +- xen/include/xsm/dummy.h | 7 - xen/include/xsm/xsm.h | 6 +++- xen/xsm/dummy.c | 1 +- xen/xsm/flask/hooks.c | 24 ++- xen/xsm/flask/policy/access_vectors | 1 +- 11 files changed, 167 insertions(+) diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c index d630600..ecea91a 100644 --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1443,6 +1443,51 @@ int pci_restore_msi_state(struct pci_dev *pdev) return 0; } +int msi_control(struct pci_dev *pdev, bool msix, bool enable) +{ +unsigned int cap = msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI; +unsigned int other_cap = msix ? PCI_CAP_ID_MSI : PCI_CAP_ID_MSIX; +uint16_t cmd; + +if ( !use_msi ) +return -EOPNOTSUPP; + +if ( !pci_find_cap_offset(pdev->seg, + pdev->bus, + PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), + cap) ) +return -ENODEV; + +cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND); + +/* don't allow enabling MSI(-X) and INTx at the same time */ +if ( enable && ! (cmd & PCI_COMMAND_INTX_DISABLE) ) +return -EBUSY; + +/* don't allow enabling both MSI and MSI-X at the same time */ +if ( enable && find_msi_entry(pdev, -1, other_cap) ) +return -EBUSY; + +if ( msix ) +msix_set_enable(pdev, enable); +else +msi_set_enable(pdev, enable); + +return 0; +} + +int intx_control(struct pci_dev *pdev, bool enable) +{ +/* don't allow enabling INTx if MSI(-X) is already enabled */ +if ( enable && find_msi_entry(pdev, -1, PCI_CAP_ID_MSI) ) +return -EBUSY; +if ( enable && find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX) ) +return -EBUSY; +pci_intx(pdev, enable); +return 0; +} + void __init early_msi_init(void) { if ( use_msi < 0 ) diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index 3a3c158..7b71039 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -662,6 +662,59 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case PHYSDEVOP_interrupt_control: { +struct physdev_interrupt_control op; +struct pci_dev *pdev; +int intr_type; +bool enable; + +ret = -EFAULT; +if ( copy_from_guest(&op, arg, 1) ) +break; + +ret = -EINVAL; +if ( op.flags & ~(PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK | + PHYSDEVOP_INTERRUPT_CONTROL_ENABLE) ) +break; + +intr_type = op.flags & PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK; +enable = op.flags & PHYSDEVOP_INTERRUPT_CONTROL_ENABLE; + +pcidevs_lock(); +pdev = pci_get_pdev(op.seg, op.bus, op.devfn); +
[Xen-devel] [PATCH v6 1/6] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
HVM domains use IOMMU and device model assistance for communicating with PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain. But pciback serve also second function - it reset the device when it is deassigned from the guest and for this reason pciback needs to be used with HVM domain too. When HVM domain has device model in stubdomain, attaching xen-pciback to the target domain itself may prevent attaching xen-pciback to the (PV) stubdomain, effectively breaking PCI passthrough. Fix this by attaching pciback only to one domain: if PV stubdomain is in use, let it be stubdomain (the commit prevents attaching device to target HVM in this case); otherwise, attach it to the target domain. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Wei Liu --- Changes in v2: - previously called "libxl: attach xen-pciback only to PV domains" - instead of excluding all HVMs, change the condition to what actually matters here - check if stubdomain is in use; this way xen-pciback is always in use (either for the target domain, or it's stubdomain), fixing PCI reset by xen-pciback concerns Changes in v3: - adjust commit message --- tools/libxl/libxl_pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 03beb86..2e06a45 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1106,7 +1106,7 @@ out: } } -if (!starting) +if (!starting && !libxl_get_stubdom_id(CTX, domid)) rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting); else rc = 0; @@ -1302,7 +1302,7 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid, } } -if (d_config->num_pcidevs > 0) { +if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) { rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs, d_config->num_pcidevs); if (rc < 0) { -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v6 4/6] xen/x86: Allow stubdom access to irq created for msi.
Stubdomains need to be given sufficient privilege over the guest which it provides emulation for in order for PCI passthrough to work correctly. When a HVM domain try to enable MSI, QEMU in stubdomain calls PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as part of xc_domain_update_msi_irq. Allow for that as part of PHYSDEVOP_map_pirq. This is not needed for PCI INTx, because IRQ in that case is known beforehand and the stubdomain is given permissions over this IRQ by libxl__device_pci_add (there's a do_pci_add against the stubdomain). create_irq() already grant IRQ access to hardware_domain, with assumption the device model (something managing this IRQ) lives there. Modify create_irq() to take additional parameter pointing at device model domain - which may be dom0 or stubdomain. Save ID of the domain given permission, to revoke it in destroy_irq() - easier and cleaner than replaying logic of create_irq() parameter. Use domid instead of actual reference to the domain, because it might get destroyed before destroying IRQ (stubdomain is destroyed before its target domain). And it is not an issue, because IRQ permissions live within domain structure, so destroying a domain also implicitly revoke the permission. Potential domid reuse is detected by by checking if that domain does have permission over the IRQ being destroyed. Then, adjust all callers to provide the parameter. In case of calls not related to stubdomain-initiated allocations, give it either hardware_domain (so the behavior is unchanged there), or NULL for interrupts used by Xen internally. Inspired by https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet . Signed-off-by: Simon Gaiser Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - extend commit message Changes in v4: - add missing destroy_irq on error path Changes in v5: - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which basically make it a different patch - add get_dm_domain() helper - do not give hardware_domain permission over IRQs used in Xen internally - rename create_irq argument to just 'd', to avoid confusion when it's called by hardware domain - verify that device is de-assigned before pci_remove_device call - save ID of domain given permission in create_irq(), to revoke it in destroy_irq() - drop domain parameter from destroy_irq() and msi_free_irq() - do not give hardware domain permission over IRQ created in iommu_set_interrupt() Changes in v6: - do not give permission over hpet irq to hardware_domain - move creator_domid to arch_irq_desc - fix creator_domid initialization - always give current->domain permission instead of using get_dm_domain() helper. Analysis of all its use cases tells that it is the only value it returns. - drop unrelated change --- xen/arch/x86/hpet.c | 3 +- xen/arch/x86/irq.c | 51 ++--- xen/drivers/char/ns16550.c | 2 +- xen/drivers/passthrough/amd/iommu_init.c | 2 +- xen/drivers/passthrough/vtd/iommu.c | 3 +- xen/include/asm-x86/irq.h| 7 ++- 6 files changed, 50 insertions(+), 18 deletions(-) diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index 4b08488..5ed4405 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch) { int irq; -if ( (irq = create_irq(NUMA_NO_NODE)) < 0 ) +if ( (irq = create_irq(NUMA_NO_NODE, NULL)) < 0 ) return irq; ch->msi.irq = irq; diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 0ee3346..0b4c20a 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -254,7 +254,13 @@ void __init clear_irq_vector(int irq) /* * Dynamic irq allocate and deallocation for MSI */ -int create_irq(nodeid_t node) + +/* + * create_irq - allocate irq for MSI + * @d domain that will get permission over the allocated irq; this permission + * will automatically be revoked on destroy_irq + */ +int create_irq(nodeid_t node, struct domain *d) { int irq, ret; struct irq_desc *desc; @@ -282,23 +288,30 @@ int create_irq(nodeid_t node) } ret = assign_irq_vector(irq, mask); } +ASSERT(desc->arch.creator_domid == DOMID_INVALID); if (ret < 0) { desc->arch.used = IRQ_UNUSED; irq = ret; } -else if ( hardware_domain ) +else if ( d ) { -ret = irq_permit_access(hardware_domain, irq); +ASSERT(d == current->domain); +ret = irq_permit_access(d, irq); if ( ret ) printk(XENLOG_G_ERR - "Could not grant Dom0 access to IRQ%d (error %d)\n", -
[Xen-devel] [PATCH v6 2/6] libxl: attach PCI device to qemu only after setting pciback/pcifront
When qemu is running in stubdomain, handling "pci-ins" command will fail if pcifront is not initialized already. Fix this by sending such command only after confirming that pciback/front is running. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Wei Liu --- Changes in v2: - Fixed code style since previous version. --- tools/libxl/libxl_pci.c | 9 + 1 file changed, 9 insertions(+) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 2e06a45..578535f 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcide { libxl_ctx *ctx = libxl__gc_owner(gc); unsigned int orig_vdev, pfunc_mask; +char *be_path; libxl_device_pci *assigned; int num_assigned, i, rc; int stubdomid = 0; @@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcide rc = do_pci_add(gc, stubdomid, &pcidev_s, 0); if ( rc ) goto out; +/* Wait for the device actually being connected, otherwise device model + * running there will fail to find the device. */ +be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0", + libxl__xs_get_dompath(gc, 0), stubdomid); +rc = libxl__wait_for_backend(gc, be_path, + GCSPRINTF("%d", XenbusStateConnected)); +if (rc) +goto out; } orig_vdev = pcidev->vdevfn & ~7U; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control
On Fri, Sep 20, 2019 at 12:10:09PM +0200, Jan Beulich wrote: > On 14.09.2019 17:37, Marek Marczykowski-Górecki wrote: > > Allow device model running in stubdomain to enable/disable INTx/MSI(-X), > > bypassing pciback. While pciback is still used to access config space > > from within stubdomain, it refuse to write to > > PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE > > in non-permissive mode. Which is the right thing to do for PV domain > > (the main use case for pciback), as PV domain should use XEN_PCI_OP_* > > commands for that. Unfortunately those commands are not good for > > stubdomain use, as they configure MSI in dom0's kernel too, which should > > not happen for HVM domain. > > Why the "for HVM domain" here? I.e. why would this be correct for > a PV domain? Besides my dislike for such a bypass (imo all of the > handling should go through pciback, or none of it) I continue to > wonder whether the problem can't be addressed by a pciback change. > And even if not, I'd still wonder whether the request shouldn't go > through pciback, to retain proper layering. Ultimately it may be > better to have even the map/unmap go through pciback (it's at > least an apparent violation of the original physdev-op model that > these two are XSM_DM_PRIV). Technically it should be possible to move this part to pciback, and in fact this is what I've considered in the first version of this series. But Roger points out on each version[1] of this series that pciback is meant to serve *PV* domains, where a PCI passthrough is a completely different different beast. In fact, I even consider that using pcifront in a Linux stubdomain as a proxy for qemu there may be a bad idea (one needs to be careful to avoid stubdomain kernel fighting with qemu about device state). Roger, what is the state of Xen internal vPCI? If handling PCI passthrough in Xen (or maybe standalone emulator), without qemu help is going to happen sooner than later (I guess not 4.13, but maybe 4.14?), then maybe this whole patch doesn't make sense as a temporary measure? Anyway, if you all agree that pciback should be the way to go, I can go that route too. In practice, it would be a flag (set by the toolstack?) allowing writes to appropriate config space registers directly (with appropriate checks, as in this patch). [1] https://lists.xenproject.org/archives/html/xen-devel/2019-08/msg00486.html -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control
On Mon, Sep 23, 2019 at 09:58:27AM +0200, Jan Beulich wrote: > On 20.09.2019 18:02, Marek Marczykowski-Górecki wrote: > > On Fri, Sep 20, 2019 at 12:10:09PM +0200, Jan Beulich wrote: > >> On 14.09.2019 17:37, Marek Marczykowski-Górecki wrote: > >>> Allow device model running in stubdomain to enable/disable INTx/MSI(-X), > >>> bypassing pciback. While pciback is still used to access config space > >>> from within stubdomain, it refuse to write to > >>> PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE > >>> in non-permissive mode. Which is the right thing to do for PV domain > >>> (the main use case for pciback), as PV domain should use XEN_PCI_OP_* > >>> commands for that. Unfortunately those commands are not good for > >>> stubdomain use, as they configure MSI in dom0's kernel too, which should > >>> not happen for HVM domain. > >> > >> Why the "for HVM domain" here? I.e. why would this be correct for > >> a PV domain? Besides my dislike for such a bypass (imo all of the > >> handling should go through pciback, or none of it) I continue to > >> wonder whether the problem can't be addressed by a pciback change. > >> And even if not, I'd still wonder whether the request shouldn't go > >> through pciback, to retain proper layering. Ultimately it may be > >> better to have even the map/unmap go through pciback (it's at > >> least an apparent violation of the original physdev-op model that > >> these two are XSM_DM_PRIV). > > > > Technically it should be possible to move this part to pciback, and in > > fact this is what I've considered in the first version of this series. > > But Roger points out on each version[1] of this series that pciback is > > meant to serve *PV* domains, where a PCI passthrough is a completely > > different different beast. In fact, I even consider that using pcifront > > in a Linux stubdomain as a proxy for qemu there may be a bad idea (one > > needs to be careful to avoid stubdomain kernel fighting with qemu about > > device state). > > Well, not using pciback _at all_ in this case would be another option. > What I dislike is the furthering of hybrid-ness. Ah, I see. This may be a good idea, if this type of PCI passthrough is going to stay. If we're going away from qemu towards other options mentioned in previous email, I'd say such a rework is too much work for a time limited usefulness. > > > Anyway, if you all agree that pciback should be the way to go, I can go > > that route too. In practice, it would be a flag (set by the toolstack?) > > allowing writes to appropriate config space registers directly (with > > appropriate checks, as in this patch). > > I'm afraid I don't agree: How would allowing writes to more config space > registers by a stubdom be safe? Exactly the same as in this patch: pciback would perform the same validation (prohibit enabling MSI together with INTx etc). BTW what are the risks (besides DoS) of allowing full config space access, assuming VT-d with interrupt remapping present? This sounds similar to risks of malicious device connected to some domU, right? Can such device (or a domain controlling such device) break out to Xen or dom0? Can it steal data from other domains? -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control
On Mon, Sep 23, 2019 at 02:05:58PM +0200, Jan Beulich wrote: > On 23.09.2019 12:47, Marek Marczykowski-Górecki wrote: > > On Mon, Sep 23, 2019 at 09:58:27AM +0200, Jan Beulich wrote: > >> On 20.09.2019 18:02, Marek Marczykowski-Górecki wrote: > >>> Anyway, if you all agree that pciback should be the way to go, I can go > >>> that route too. In practice, it would be a flag (set by the toolstack?) > >>> allowing writes to appropriate config space registers directly (with > >>> appropriate checks, as in this patch). > >> > >> I'm afraid I don't agree: How would allowing writes to more config space > >> registers by a stubdom be safe? > > > > Exactly the same as in this patch: pciback would perform the same > > validation (prohibit enabling MSI together with INTx etc). > > > > BTW what are the risks (besides DoS) of allowing full config space > > access, assuming VT-d with interrupt remapping present? This sounds > > similar to risks of malicious device connected to some domU, right? Can > > such device (or a domain controlling such device) break out to Xen or > > dom0? Can it steal data from other domains? > > There shouldn't be, but this would need proving. The direction of > proof then should be the other way around (and I realize it may be > [close to] impossible): Widening what guests (including stub > domains) are allowed to do should be proven to add no additional > risks. It shouldn't be (by example, as I imply from your question) > that an actual issue needs to be pointed out. What about this: HVM guest can already do all of this when qemu is running in dom0. So, allowing those actions when qemu is running in stubdomain should not introduce _additional_ risks. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control
On Mon, Sep 23, 2019 at 03:02:49PM +0200, Jan Beulich wrote: > On 23.09.2019 14:25, Marek Marczykowski-Górecki wrote: > > What about this: HVM guest can already do all of this when qemu is > > running in dom0. So, allowing those actions when qemu is running in > > stubdomain should not introduce _additional_ risks. > > Well, in a way - yes. But I don't think it's right to have qemu do > the direct writes it does (and I wouldn't be surprised if there > were still actual issues with this model). Hence it's not going to > be an improvement if this suspicious underlying design got > extended to other components. This sounds like any workflow involving qemu would be inferior. And I agree with that. But also I do need PCI passthrough working, so I need a solution until we have an alternative implementation. If that alternative is going to happen soon, I'm also fine with carrying patches in Qubes package (like I already do). This wouldn't be nice for the rest of the community (I believe many other Xen-based projects also carry similar patches already), but it looks like upstreaming this is taking way too much effort than it's worth for a temporary solution. So, in the next version of this series I'm going to drop this patch (and the next one). -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 0/4] Fix PCI passthrough for HVM with stubdomain
In this version, I drop PHYSDEVOP_interrupt_control patch, since Jan prefer not to mix pciif and hypercalls for serving device model. Enabling MSI by the stubdomain remains unsolved here, but other patches are improvement anyway. Changes in v2: - new "xen/x86: Allow stubdom access to irq created for msi" patch - applied review comments from v1 Changes is v3: - apply suggestions by Roger - add PHYSDEVOP_msi_msix_set_enable Changes in v4: - implement suggestions by Wei, Roger, Jan - plug new physdevop into XSM Changes in v5: - rebase on master - rename to PHYSDEVOP_msi_control - move granting access to IRQ into create_irq Changes in v6: - simplify granting IRQ access, record dm domid for cleanup - rename to PHYSDEVOP_interrupt_control - include INTx control in the hypercall Changes in v7: - update "xen/x86: Allow stubdom access to irq created for msi" - drop "xen/x86: add PHYSDEVOP_interrupt_control" - drop "tools/libxc: add wrapper for PHYSDEVOP_interrupt_control" --- Cc: Ian Jackson Cc: Wei Liu Cc: Andrew Cooper Cc: George Dunlap Cc: Jan Beulich Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Tim Deegan Cc: Anthony PERARD Cc: "Roger Pau Monné" Cc: Suravee Suthikulpanit Cc: Brian Woods Cc: Kevin Tian Cc: Daniel De Graaf Marek Marczykowski-Górecki (4): libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use libxl: attach PCI device to qemu only after setting pciback/pcifront libxl: don't try to manipulate json config for stubdomain xen/x86: Allow stubdom access to irq created for msi. tools/libxl/libxl_pci.c | 63 + xen/arch/x86/hpet.c | 3 +- xen/arch/x86/irq.c | 42 +++-- xen/drivers/char/ns16550.c | 2 +- xen/drivers/passthrough/amd/iommu_init.c | 2 +- xen/drivers/passthrough/vtd/iommu.c | 3 +- xen/include/asm-x86/irq.h| 7 ++- 7 files changed, 84 insertions(+), 38 deletions(-) base-commit: 6c9639a72f0ca3a9430ef75f375877182281fdef -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 3/4] libxl: don't try to manipulate json config for stubdomain
Stubdomain do not have it's own config file - its configuration is derived from target domains. Do not try to manipulate it when attaching PCI device. This bug prevented starting HVM with stubdomain and PCI passthrough device attached. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Wei Liu --- Changes in v3: - skip libxl__dm_check_start too, as stubdomain is guaranteed to be running at this stage already - do not init d_config at all, as it is used only for json manipulation Changes in v4: - adjust comment style --- tools/libxl/libxl_pci.c | 50 ++ 1 file changed, 32 insertions(+), 18 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 578535f..d26fc9a 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -120,10 +120,14 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d libxl_domain_config d_config; libxl_device_pci pcidev_saved; libxl__domain_userdata_lock *lock = NULL; +bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL); -libxl_domain_config_init(&d_config); -libxl_device_pci_init(&pcidev_saved); -libxl_device_pci_copy(CTX, &pcidev_saved, pcidev); +/* Stubdomain doesn't have own config. */ +if (!is_stubdomain) { +libxl_domain_config_init(&d_config); +libxl_device_pci_init(&pcidev_saved); +libxl_device_pci_copy(CTX, &pcidev_saved, pcidev); +} be_path = libxl__domain_device_backend_path(gc, 0, domid, 0, LIBXL__DEVICE_KIND_PCI); @@ -152,27 +156,35 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d GCNEW(device); libxl__device_from_pcidev(gc, domid, pcidev, device); -lock = libxl__lock_domain_userdata(gc, domid); -if (!lock) { -rc = ERROR_LOCK_FAIL; -goto out; -} +/* + * Stubdomin config is derived from its target domain, it doesn't have + * its own file. + */ +if (!is_stubdomain) { +lock = libxl__lock_domain_userdata(gc, domid); +if (!lock) { +rc = ERROR_LOCK_FAIL; +goto out; +} -rc = libxl__get_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +rc = libxl__get_domain_configuration(gc, domid, &d_config); +if (rc) goto out; -device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, - &pcidev_saved); +device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, + &pcidev_saved); -rc = libxl__dm_check_start(gc, &d_config, domid); -if (rc) goto out; +rc = libxl__dm_check_start(gc, &d_config, domid); +if (rc) goto out; +} for (;;) { rc = libxl__xs_transaction_start(gc, &t); if (rc) goto out; -rc = libxl__set_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +if (lock) { +rc = libxl__set_domain_configuration(gc, domid, &d_config); +if (rc) goto out; +} libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back)); @@ -184,8 +196,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d out: libxl__xs_transaction_abort(gc, &t); if (lock) libxl__unlock_domain_userdata(lock); -libxl_device_pci_dispose(&pcidev_saved); -libxl_domain_config_dispose(&d_config); +if (!is_stubdomain) { +libxl_device_pci_dispose(&pcidev_saved); +libxl_domain_config_dispose(&d_config); +} return rc; } -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.
Stubdomains need to be given sufficient privilege over the guest which it provides emulation for in order for PCI passthrough to work correctly. When a HVM domain try to enable MSI, QEMU in stubdomain calls PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as part of xc_domain_update_msi_irq. Allow for that as part of PHYSDEVOP_map_pirq. This is not needed for PCI INTx, because IRQ in that case is known beforehand and the stubdomain is given permissions over this IRQ by libxl__device_pci_add (there's a do_pci_add against the stubdomain). create_irq() already grant IRQ access to hardware_domain, with assumption the device model (something managing this IRQ) lives there. Modify create_irq() to take additional parameter pointing at device model domain - which may be dom0 or stubdomain. Save ID of the domain given permission, to revoke it in destroy_irq() - easier and cleaner than replaying logic of create_irq() parameter. Use domid instead of actual reference to the domain, because it might get destroyed before destroying IRQ (stubdomain is destroyed before its target domain). And it is not an issue, because IRQ permissions live within domain structure, so destroying a domain also implicitly revoke the permission. Potential domid reuse is detected by by checking if that domain does have permission over the IRQ being destroyed. Then, adjust all callers to provide the parameter. In case of calls not related to stubdomain-initiated allocations, give it either hardware_domain (so the behavior is unchanged there), or NULL for interrupts used by Xen internally. Inspired by https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet . Signed-off-by: Simon Gaiser Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - extend commit message Changes in v4: - add missing destroy_irq on error path Changes in v5: - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which basically make it a different patch - add get_dm_domain() helper - do not give hardware_domain permission over IRQs used in Xen internally - rename create_irq argument to just 'd', to avoid confusion when it's called by hardware domain - verify that device is de-assigned before pci_remove_device call - save ID of domain given permission in create_irq(), to revoke it in destroy_irq() - drop domain parameter from destroy_irq() and msi_free_irq() - do not give hardware domain permission over IRQ created in iommu_set_interrupt() Changes in v6: - do not give permission over hpet irq to hardware_domain - move creator_domid to arch_irq_desc - fix creator_domid initialization - always give current->domain permission instead of using get_dm_domain() helper. Analysis of all its use cases tells that it is the only value it returns. - drop unrelated change Changes in v7: - Code style improvements (spaces, use %pd etc) - use bool parameter to create_irq, as it's only getting current->domain or NULL - remove redundant irq_access_permitted() --- xen/arch/x86/hpet.c | 3 +- xen/arch/x86/irq.c | 42 + xen/drivers/char/ns16550.c | 2 +- xen/drivers/passthrough/amd/iommu_init.c | 2 +- xen/drivers/passthrough/vtd/iommu.c | 3 +- xen/include/asm-x86/irq.h| 7 +++- 6 files changed, 41 insertions(+), 18 deletions(-) diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index 4b08488..57f68fa 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch) { int irq; -if ( (irq = create_irq(NUMA_NO_NODE)) < 0 ) +if ( (irq = create_irq(NUMA_NO_NODE, false)) < 0 ) return irq; ch->msi.irq = irq; diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 0ee3346..256dd02 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -254,7 +254,8 @@ void __init clear_irq_vector(int irq) /* * Dynamic irq allocate and deallocation for MSI */ -int create_irq(nodeid_t node) + +int create_irq(nodeid_t node, bool grant_access) { int irq, ret; struct irq_desc *desc; @@ -282,18 +283,23 @@ int create_irq(nodeid_t node) } ret = assign_irq_vector(irq, mask); } + +ASSERT(desc->arch.creator_domid == DOMID_INVALID); + if (ret < 0) { desc->arch.used = IRQ_UNUSED; irq = ret; } -else if ( hardware_domain ) +else if ( grant_access ) { -ret = irq_permit_access(hardware_domain, irq); +ret = irq_permit_access(current->domain, irq); if ( ret ) printk(XENLOG_G_ERR - "Could not grant Dom0 access to IRQ%d (error %d)\n&qu
[Xen-devel] [PATCH v7 1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
HVM domains use IOMMU and device model assistance for communicating with PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain. But pciback serve also second function - it reset the device when it is deassigned from the guest and for this reason pciback needs to be used with HVM domain too. When HVM domain has device model in stubdomain, attaching xen-pciback to the target domain itself may prevent attaching xen-pciback to the (PV) stubdomain, effectively breaking PCI passthrough. Fix this by attaching pciback only to one domain: if PV stubdomain is in use, let it be stubdomain (the commit prevents attaching device to target HVM in this case); otherwise, attach it to the target domain. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Wei Liu --- Changes in v2: - previously called "libxl: attach xen-pciback only to PV domains" - instead of excluding all HVMs, change the condition to what actually matters here - check if stubdomain is in use; this way xen-pciback is always in use (either for the target domain, or it's stubdomain), fixing PCI reset by xen-pciback concerns Changes in v3: - adjust commit message --- tools/libxl/libxl_pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 03beb86..2e06a45 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1106,7 +1106,7 @@ out: } } -if (!starting) +if (!starting && !libxl_get_stubdom_id(CTX, domid)) rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting); else rc = 0; @@ -1302,7 +1302,7 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid, } } -if (d_config->num_pcidevs > 0) { +if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) { rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs, d_config->num_pcidevs); if (rc < 0) { -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 2/4] libxl: attach PCI device to qemu only after setting pciback/pcifront
When qemu is running in stubdomain, handling "pci-ins" command will fail if pcifront is not initialized already. Fix this by sending such command only after confirming that pciback/front is running. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Wei Liu --- Changes in v2: - Fixed code style since previous version. --- tools/libxl/libxl_pci.c | 9 + 1 file changed, 9 insertions(+) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 2e06a45..578535f 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcide { libxl_ctx *ctx = libxl__gc_owner(gc); unsigned int orig_vdev, pfunc_mask; +char *be_path; libxl_device_pci *assigned; int num_assigned, i, rc; int stubdomid = 0; @@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcide rc = do_pci_add(gc, stubdomid, &pcidev_s, 0); if ( rc ) goto out; +/* Wait for the device actually being connected, otherwise device model + * running there will fail to find the device. */ +be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0", + libxl__xs_get_dompath(gc, 0), stubdomid); +rc = libxl__wait_for_backend(gc, be_path, + GCSPRINTF("%d", XenbusStateConnected)); +if (rc) +goto out; } orig_vdev = pcidev->vdevfn & ~7U; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.
On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > It would still be nice to get the missing bits (interrupt enabling), > or else this patch is kind of pointless, since it still doesn't allow > stubdomains to work correctly with passed through devices. Well, that part, as discussed, doesn't need to be in Xen. For example the solution deployed in current Qubes stable version is based on pciback for this purpose. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7.1 4/4] xen/x86: Allow stubdom access to irq created for msi.
Stubdomains need to be given sufficient privilege over the guest which it provides emulation for in order for PCI passthrough to work correctly. When a HVM domain try to enable MSI, QEMU in stubdomain calls PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as part of xc_domain_update_msi_irq. Give the stubdomain enough permissions over the mapped interrupt in order to bind it successfully to it's target domain. This is not needed for PCI INTx, because IRQ in that case is known beforehand and the stubdomain is given permissions over this IRQ by libxl__device_pci_add (there's a do_pci_add against the stubdomain). create_irq() already grant IRQ access to hardware_domain, with assumption the device model lives there. Modify create_irq() to take additional parameter, whether to grant permissions to the domain creating the IRQ, which may be dom0 or a stubdomain. Do this instead of granting access always to hardware_domain. Save ID of the domain given permission, to revoke it in destroy_irq() - easier and cleaner than replaying logic of create_irq() parameter. Use domid instead of actual reference to the domain, because it might get destroyed before destroying IRQ (stubdomain is destroyed before its target domain). And it is not an issue, because IRQ permissions live within domain structure, so destroying a domain also implicitly revoke the permission. Potential domid reuse is detected by checking if that domain does have permission over the IRQ being destroyed. Then, adjust all callers to provide the parameter. In case of Xen internal allocations, set it to false, but for domain accessible interrupt set it to true. Inspired by https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet . Signed-off-by: Simon Gaiser Signed-off-by: Marek Marczykowski-Górecki Reviewed-by: Roger Pau Monné --- Changes in v3: - extend commit message Changes in v4: - add missing destroy_irq on error path Changes in v5: - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which basically make it a different patch - add get_dm_domain() helper - do not give hardware_domain permission over IRQs used in Xen internally - rename create_irq argument to just 'd', to avoid confusion when it's called by hardware domain - verify that device is de-assigned before pci_remove_device call - save ID of domain given permission in create_irq(), to revoke it in destroy_irq() - drop domain parameter from destroy_irq() and msi_free_irq() - do not give hardware domain permission over IRQ created in iommu_set_interrupt() Changes in v6: - do not give permission over hpet irq to hardware_domain - move creator_domid to arch_irq_desc - fix creator_domid initialization - always give current->domain permission instead of using get_dm_domain() helper. Analysis of all its use cases tells that it is the only value it returns. - drop unrelated change Changes in v7: - Code style improvements (spaces, use %pd etc) - use bool parameter to create_irq, as it's only getting current->domain or NULL - remove redundant irq_access_permitted() Changes in v7.1: - adjust comments, merge if - update commit message --- xen/arch/x86/hpet.c | 3 +- xen/arch/x86/irq.c | 42 + xen/drivers/char/ns16550.c | 2 +- xen/drivers/passthrough/amd/iommu_init.c | 2 +- xen/drivers/passthrough/vtd/iommu.c | 3 +- xen/include/asm-x86/irq.h| 11 ++- 6 files changed, 45 insertions(+), 18 deletions(-) diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index 4b08488..57f68fa 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch) { int irq; -if ( (irq = create_irq(NUMA_NO_NODE)) < 0 ) +if ( (irq = create_irq(NUMA_NO_NODE, false)) < 0 ) return irq; ch->msi.irq = irq; diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 0ee3346..4304896 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -254,7 +254,8 @@ void __init clear_irq_vector(int irq) /* * Dynamic irq allocate and deallocation for MSI */ -int create_irq(nodeid_t node) + +int create_irq(nodeid_t node, bool grant_access) { int irq, ret; struct irq_desc *desc; @@ -282,18 +283,23 @@ int create_irq(nodeid_t node) } ret = assign_irq_vector(irq, mask); } + +ASSERT(desc->arch.creator_domid == DOMID_INVALID); + if (ret < 0) { desc->arch.used = IRQ_UNUSED; irq = ret; } -else if ( hardware_domain ) +else if ( grant_access ) { -ret = irq_permit_access(hardware_domain, irq); +ret = irq_
Re: [Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.
On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote: > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki wrote: > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > > It would still be nice to get the missing bits (interrupt enabling), > > > or else this patch is kind of pointless, since it still doesn't allow > > > stubdomains to work correctly with passed through devices. > > > > Well, that part, as discussed, doesn't need to be in Xen. For example > > the solution deployed in current Qubes stable version is based on > > pciback for this purpose. > > Ack. Do you think it would make sense to submit that part to Linux > then? How would an interface with toolstack (when to allow enabling MSI) should look like? Right now I have it as extra attribute in sysfs of pciback and libxl writes to it. Or rather should it be in xenstore? Or maybe pciback should somehow detect itself if it's talking to stubdomain while the device is assigned to a HVM domain, or to a target PV domain itself? The actual patch is here: https://github.com/QubesOS/qubes-linux-kernel/blob/master/0014-xen-pciback-add-attribute-to-allow-MSI-enable-flag-w.patch and the toolstack part: https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.8/patch-stubdom-allow-msi-enable.patch -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] VM_BUG_ON_PAGE(!PageOffline(page), page) in alloc_xenballooned_pages
Hi, I've hit VM_BUG_ON_PAGE(!PageOffline(page), page) in alloc_xenballooned_pages, when trying to use gnttab from userspace application. It happens on Xen PV, but not on Xen PVH or HVM with the same kernel. This happens at least with 5.1.6, but also 5.2.15 (as seen below). Based on this, it looks related to 0266def91377 (xen/balloon: Fix mapping PG_offline pages to user space) and probably 77c4adf6a6df (xen/balloon: mark inflated pages PG_offline). Any idea? Below is full message. page:ea0003e7ffc0 refcount:1 mapcount:0 mapping: index:0x0 flags: 0xe1000(reserved) raw: 000e1000 dead0100 dead0200 raw: 0001 page dumped because: VM_BUG_ON_PAGE(!PageOffline(page)) [ cut here ] kernel BUG at include/linux/page-flags.h:744! invalid opcode: [#1] SMP NOPTI CPU: 0 PID: 551 Comm: qubesdb-daemon Tainted: GW 5.2.15-200.fc30.x86_64 #1 RIP: e030:alloc_xenballooned_pages+0xef/0x110 Code: c0 0c 10 00 e8 b2 fa ff ff 85 c0 0f 84 60 ff ff ff 41 89 dd b8 f4 ff ff ff eb b0 48 c7 c6 e8 af 14 82 48 89 c7 e8 31 32 ca ff <0f> 0b 48 c7 c7 40 f 0 4d 82 e8 13 85 3f 00 31 c0 48 83 c4 08 5b 5d RSP: e02b:c90001113d98 EFLAGS: 00010246 RAX: 0037 RBX: RCX: 0149 RDX: RSI: RDI: 82143bbc RBP: 0001 R08: 0181 R09: 0149 R10: 000a R11: c90001113c38 R12: 88800d670960 R13: 7fffdff236a0 R14: 7fffdff236a0 R15: 8880108bd000 FS: 7f30e205e7c0() GS:888013e0() knlGS: CS: e030 DS: ES: CR0: 80050033 CR2: 7f30e2082000 CR3: 0c92 CR4: 00040660 Call Trace: ? __kmalloc+0x16c/0x210 gnttab_alloc_pages+0x11/0x40 gntdev_alloc_map+0xe7/0x180 [xen_gntdev] gntdev_ioctl+0x203/0x530 [xen_gntdev] do_vfs_ioctl+0x405/0x660 ksys_ioctl+0x5e/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5f/0x1a0 ? page_fault+0x8/0x30 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f30e239b3bb Code: 0f 1e fa 48 8b 05 cd ca 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d ca 0c 00 f7 d8 64 89 01 48 RSP: 002b:7fffdff23698 EFLAGS: 0202 ORIG_RAX: 0010 RAX: ffda RBX: 0003 RCX: 7f30e239b3bb RDX: 7fffdff236a0 RSI: 00184700 RDI: 000b RBP: 7fffdff23730 R08: 7fffdff2375c R09: 7fffdff23758 R10: fcc9 R11: 0202 R12: 7fffdff236a0 R13: 1000 R14: 000b R15: 0001 Modules linked in: xenfs ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_blkfront xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] VM_BUG_ON_PAGE(!PageOffline(page), page) in alloc_xenballooned_pages
On Fri, Sep 27, 2019 at 09:44:35AM +0200, David Hildenbrand wrote: > On 26.09.19 23:34, Marek Marczykowski-Górecki wrote: > > Hi, > > > > I've hit VM_BUG_ON_PAGE(!PageOffline(page), page) in > > alloc_xenballooned_pages, when trying to use gnttab from userspace > > application. It happens on Xen PV, but not on Xen PVH or HVM with the > > same kernel. This happens at least with 5.1.6, but also 5.2.15 > > (as seen below). Based on this, it looks related to 0266def91377 > > (xen/balloon: Fix mapping PG_offline pages to user space) and probably > > 77c4adf6a6df (xen/balloon: mark inflated pages PG_offline). > > > > Any idea? Below is full message. > > Now that's weird. Weird because half a year passed since > 0266def91377 (xen/balloon: Fix mapping PG_offline pages to user space). Not sure about others, but in Qubes we use PVH/HVM VMs mostly. > > page:ea0003e7ffc0 refcount:1 mapcount:0 mapping: > > index:0x0 > > flags: 0xe1000(reserved) > > So we have a PageReserved page that is not PageOffline. I assume this > happens when we do a __ClearPageOffline() in alloc_xenballooned_pages(). > > That means, that we get such a page via balloon_retrieve(true). Which > means that we have such a page sitting in the ballooned_pages list, which > is weird. > > Pages enter ballooned_pages via __balloon_append() only. > > 1. Via xen_online_page(). We have a __SetPageOffline() right in front >of it. > 2. Via balloon_add_region(). I don't see a __SetPageOffline(). > 3. Via decrease_reservation(). We seem to do a __SetPageOffline() on all >pages in the previous loop. > 4. Via free_xenballooned_pages(). We have a __SetPageOffline() right >in front of it. > > > So this smells like #2 (and matches your PV only observation). Also, > it makes sense that the page is PageReserved that way. > > > Wonder if it is as easy as: Yes, besides missing semicolon it works. Thanks! > From 0955beef5aa11da4a8398472ce3106a92599cbe6 Mon Sep 17 00:00:00 2001 > From: David Hildenbrand > Date: Fri, 27 Sep 2019 09:39:31 +0200 > Subject: [PATCH v1] xen/balloon: Set pages PageOffline() in > balloon_add_region() > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > We are missing a __SetPageOffline(), which is why we can get > !PageOffline() pages onto the balloon list, where > alloc_xenballooned_pages() will complain: > > page:ea0003e7ffc0 refcount:1 mapcount:0 mapping: index:0x0 > flags: 0xe1000(reserved) > raw: 000e1000 dead0100 dead0200 > raw: 0001 00000000 > page dumped because: VM_BUG_ON_PAGE(!PageOffline(page)) > [ cut here ] > kernel BUG at include/linux/page-flags.h:744! > invalid opcode: 0000 [#1] SMP NOPTI > > Reported-by: Marek Marczykowski-Górecki > Fixes: 77c4adf6a6df ("xen/balloon: mark inflated pages PG_offline") > Cc: sta...@vger.kernel.org # v5.1+ > Signed-off-by: David Hildenbrand Tested-by: Marek Marczykowski-Górecki > --- > drivers/xen/balloon.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > index 05b1f7e948ef..d31149068448 100644 > --- a/drivers/xen/balloon.c > +++ b/drivers/xen/balloon.c > @@ -687,6 +687,7 @@ static void __init balloon_add_region(unsigned long > start_pfn, > /* totalram_pages and totalhigh_pages do not > include the boot-time balloon extension, so > don't subtract from it. */ > + __SetPageOffline(page) ^ ; > __balloon_append(page); > } > -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 0/4] Fix PCI passthrough for HVM with stubdomain
On Fri, Sep 27, 2019 at 03:36:08PM +0100, Wei Liu wrote: > On Fri, Sep 27, 2019 at 04:21:55PM +0200, Jan Beulich wrote: > > > > > > Marek Marczykowski-Górecki (4): > > > libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use > > > libxl: attach PCI device to qemu only after setting pciback/pcifront > > > libxl: don't try to manipulate json config for stubdomain > > > xen/x86: Allow stubdom access to irq created for msi. > > > > I did commit the last one, but I'd prefer if one of you could take > > care of the three libxl ones. > > > > I tried to apply the first three. They don't apply cleanly. That's > perhaps due to Anthony's series which got committed recently. > > Marek, do you have a branch? This rebase turned out to be fairly complex, because of the whole pci attach got reworked for async api. So, I've done it, but dropped your ack on the patch needing rework for this reason. And also found one regression introduced by Anthony series. All in all - v8 on its way, still 4 patches (+1, -1). -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 4/4] libxl: don't try to manipulate json config for stubdomain
Stubdomain do not have it's own config file - its configuration is derived from target domains. Do not try to manipulate it when attaching PCI device. This bug prevented starting HVM with stubdomain and PCI passthrough device attached. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Wei Liu --- Changes in v3: - skip libxl__dm_check_start too, as stubdomain is guaranteed to be running at this stage already - do not init d_config at all, as it is used only for json manipulation Changes in v4: - adjust comment style Changes in v8: - rebase on staging --- tools/libxl/libxl_pci.c | 42 +++--- 1 file changed, 27 insertions(+), 15 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index ba0287d..dc96163 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -126,8 +126,11 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, int rc; libxl_domain_config d_config; libxl__domain_userdata_lock *lock = NULL; +bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL); -libxl_domain_config_init(&d_config); +/* Stubdomain doesn't have own config. */ +if (!is_stubdomain) +libxl_domain_config_init(&d_config); be_path = libxl__domain_device_backend_path(gc, 0, domid, 0, LIBXL__DEVICE_KIND_PCI); @@ -153,27 +156,35 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, if (!starting) flexarray_append_pair(back, "state", GCSPRINTF("%d", XenbusStateReconfiguring)); -lock = libxl__lock_domain_userdata(gc, domid); -if (!lock) { -rc = ERROR_LOCK_FAIL; -goto out; -} +/* + * Stubdomin config is derived from its target domain, it doesn't have + * its own file. + */ +if (!is_stubdomain) { +lock = libxl__lock_domain_userdata(gc, domid); +if (!lock) { +rc = ERROR_LOCK_FAIL; +goto out; +} -rc = libxl__get_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +rc = libxl__get_domain_configuration(gc, domid, &d_config); +if (rc) goto out; -device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, - pcidev); +device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, + pcidev); -rc = libxl__dm_check_start(gc, &d_config, domid); -if (rc) goto out; +rc = libxl__dm_check_start(gc, &d_config, domid); +if (rc) goto out; +} for (;;) { rc = libxl__xs_transaction_start(gc, &t); if (rc) goto out; -rc = libxl__set_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +if (lock) { +rc = libxl__set_domain_configuration(gc, domid, &d_config); +if (rc) goto out; +} libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back)); @@ -185,7 +196,8 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, out: libxl__xs_transaction_abort(gc, &t); if (lock) libxl__unlock_domain_userdata(lock); -libxl_domain_config_dispose(&d_config); +if (!is_stubdomain) +libxl_domain_config_dispose(&d_config); return rc; } -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 2/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
HVM domains use IOMMU and device model assistance for communicating with PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain. But pciback serve also second function - it reset the device when it is deassigned from the guest and for this reason pciback needs to be used with HVM domain too. When HVM domain has device model in stubdomain, attaching xen-pciback to the target domain itself may prevent attaching xen-pciback to the (PV) stubdomain, effectively breaking PCI passthrough. Fix this by attaching pciback only to one domain: if PV stubdomain is in use, let it be stubdomain (the commit prevents attaching device to target HVM in this case); otherwise, attach it to the target domain. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Wei Liu --- Changes in v2: - previously called "libxl: attach xen-pciback only to PV domains" - instead of excluding all HVMs, change the condition to what actually matters here - check if stubdomain is in use; this way xen-pciback is always in use (either for the target domain, or it's stubdomain), fixing PCI reset by xen-pciback concerns Changes in v3: - adjust commit message Changes in v8: - rebase on staging --- tools/libxl/libxl_pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 2edff64..ac597a5 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1399,7 +1399,7 @@ out_no_irq: } } -if (!starting) +if (!starting && !libxl_get_stubdom_id(CTX, domid)) rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting); else rc = 0; @@ -1697,7 +1697,7 @@ static void add_pcidevs_done(libxl__egc *egc, libxl__multidev *multidev, libxl_domid domid = apds->domid; libxl__ao_device *aodev = apds->outer_aodev; -if (d_config->num_pcidevs > 0) { +if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) { rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs, d_config->num_pcidevs); if (rc < 0) { -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 3/4] libxl: attach PCI device to qemu only after setting pciback/pcifront
When qemu is running in stubdomain, handling "pci-ins" command will fail if pcifront is not initialized already. Fix this by sending such command only after confirming that pciback/front is running. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - Fixed code style since previous version. Changes in v8: - rebase on staging - rework for async api --- tools/libxl/libxl_pci.c | 45 +- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index ac597a5..ba0287d 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1012,6 +1012,9 @@ typedef struct pci_add_state { bool starting; void (*callback)(libxl__egc *, struct pci_add_state *, int rc); +/* private to device_pci_add_stubdom_wait */ +libxl__ev_devstate pciback_ds; + /* private to do_pci_add */ libxl__xswait_state xswait; libxl__ev_qmp qmp; @@ -1487,6 +1490,10 @@ static int libxl_pcidev_assignable(libxl_ctx *ctx, libxl_device_pci *pcidev) return i != num; } +static void device_pci_add_stubdom_wait(libxl__egc *egc, +pci_add_state *pas, int rc); +static void device_pci_add_stubdom_ready(libxl__egc *egc, +libxl__ev_devstate* ds, int rc); static void device_pci_add_stubdom_done(libxl__egc *egc, pci_add_state *, int rc); static void device_pci_add_done(libxl__egc *egc, @@ -1563,7 +1570,8 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t domid, GCNEW(pcidev_s); libxl_device_pci_init(pcidev_s); libxl_device_pci_copy(CTX, pcidev_s, pcidev); -pas->callback = device_pci_add_stubdom_done; +pas->callback = device_pci_add_stubdom_wait; + do_pci_add(egc, stubdomid, pcidev_s, pas); /* must be last */ return; } @@ -1575,6 +1583,41 @@ out: device_pci_add_done(egc, pas, rc); /* must be last */ } +static void device_pci_add_stubdom_wait(libxl__egc *egc, +pci_add_state *pas, +int rc) +{ +libxl__ao_device *aodev = pas->aodev; +STATE_AO_GC(aodev->ao); +libxl_ctx *ctx = libxl__gc_owner(gc); +int stubdomid = libxl_get_stubdom_id(ctx, pas->domid); +char *state_path; + +if (rc) goto out; + +/* Wait for the device actually being connected, otherwise device model + * running there will fail to find the device. */ +state_path = libxl__sprintf(gc, "%s/backend/pci/%d/0/state", +libxl__xs_get_dompath(gc, 0), stubdomid); +rc = libxl__ev_devstate_wait(aodev->ao, &pas->pciback_ds, +device_pci_add_stubdom_ready, +state_path, XenbusStateConnected, +LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000); +if (rc) goto out; +return; +out: +device_pci_add_done(egc, pas, rc); /* must be last */ +} + +static void device_pci_add_stubdom_ready(libxl__egc *egc, + libxl__ev_devstate* ds, + int rc) +{ +pci_add_state *pas = CONTAINER_OF(ds, *pas, pciback_ds); + +device_pci_add_stubdom_done(egc, pas, rc); /* must be last */ +} + static void device_pci_add_stubdom_done(libxl__egc *egc, pci_add_state *pas, int rc) -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 0/4] Fix PCI passthrough for HVM with stubdomain
This version is rebase libxl patches on staging. Changes to "libxl: attach PCI device to qemu only after setting pciback/pcifront" are significant enough that I've dropped Wei's ack. Also, there is a new patch to fix regression introduced during async conversion. Changes in v2: - new "xen/x86: Allow stubdom access to irq created for msi" patch - applied review comments from v1 Changes is v3: - apply suggestions by Roger - add PHYSDEVOP_msi_msix_set_enable Changes in v4: - implement suggestions by Wei, Roger, Jan - plug new physdevop into XSM Changes in v5: - rebase on master - rename to PHYSDEVOP_msi_control - move granting access to IRQ into create_irq Changes in v6: - simplify granting IRQ access, record dm domid for cleanup - rename to PHYSDEVOP_interrupt_control - include INTx control in the hypercall Changes in v7: - update "xen/x86: Allow stubdom access to irq created for msi" - drop "xen/x86: add PHYSDEVOP_interrupt_control" - drop "tools/libxc: add wrapper for PHYSDEVOP_interrupt_control" Chages in v8: - drop applied "xen/x86: Allow stubdom access to irq created for msi" - new patch "libxl: fix cold plugged PCI device with stubdomain" - rebase libxl patches on staging --- Cc: Ian Jackson Cc: Wei Liu Cc: Andrew Cooper Cc: George Dunlap Cc: Jan Beulich Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Tim Deegan Cc: Anthony PERARD Cc: "Roger Pau Monné" Cc: Suravee Suthikulpanit Cc: Brian Woods Cc: Kevin Tian Cc: Daniel De Graaf Marek Marczykowski-Górecki (4): libxl: fix cold plugged PCI device with stubdomain libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use libxl: attach PCI device to qemu only after setting pciback/pcifront libxl: don't try to manipulate json config for stubdomain tools/libxl/libxl_pci.c | 96 +- 1 file changed, 77 insertions(+), 19 deletions(-) base-commit: 7a4e674905b3cbbe48e81c3222361a7f3579 -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 1/4] libxl: fix cold plugged PCI device with stubdomain
When libxl__device_pci_add() is called, stubdomain is already running, even when still constructing the target domain. Previously, do_pci_add() was called with 'starting' hardcoded to false, but now do_pci_add() shares 'starting' flag in pci_add_state for both target domain and stubdomain. Fix this by resetting (local) 'starting' to false in pci_add_dm_done() (previously part of do_pci_add()) when handling stubdomain, regardless of pas->starting value. Fixes: 11db56f9a6 (libxl_pci: Use libxl__ao_device with libxl__device_pci_add) Signed-off-by: Marek Marczykowski-Górecki --- tools/libxl/libxl_pci.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 4725817..2edff64 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1300,6 +1300,10 @@ static void pci_add_dm_done(libxl__egc *egc, if (rc) goto out; +/* stubdomain is always running by now, even at create time */ +if (isstubdom) +starting = false; + sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/resource", pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func); f = fopen(sysfs_path, "r"); @@ -1559,7 +1563,6 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t domid, GCNEW(pcidev_s); libxl_device_pci_init(pcidev_s); libxl_device_pci_copy(CTX, pcidev_s, pcidev); -/* stubdomain is always running by now, even at create time */ pas->callback = device_pci_add_stubdom_done; do_pci_add(egc, stubdomid, pcidev_s, pas); /* must be last */ return; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.
On Thu, Sep 26, 2019 at 09:10:17AM +0200, Roger Pau Monné wrote: > On Thu, Sep 26, 2019 at 06:16:06AM +0200, Marek Marczykowski-Górecki wrote: > > On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote: > > > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki > > > wrote: > > > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > > > > It would still be nice to get the missing bits (interrupt enabling), > > > > > or else this patch is kind of pointless, since it still doesn't allow > > > > > stubdomains to work correctly with passed through devices. > > > > > > > > Well, that part, as discussed, doesn't need to be in Xen. For example > > > > the solution deployed in current Qubes stable version is based on > > > > pciback for this purpose. > > > > > > Ack. Do you think it would make sense to submit that part to Linux > > > then? > > > > How would an interface with toolstack (when to allow enabling MSI) > > should look like? Right now I have it as extra attribute in sysfs of > > pciback and libxl writes to it. Or rather should it be in xenstore? > > I think xenstore would be more suitable for this. There are already > some flags passed to pciback there: msitranslate, power_mgmt and > permissive (see libxl_pci.c:63). Hmm, I see permissive is also set in sysfs (libxl_pci.c:pci_add_dm_done). And I think that is used by pciback, based on inspection of its source code. In fact, I don't see anything in pciback parsing opts-%d xenstore entry at all. It looks like it's only used by the toolstack to reconstruct libxl_device_pci struct from xenstore. > > Or maybe pciback should somehow detect itself if it's talking to > > stubdomain while the device is assigned to a HVM domain, or to a target > > PV domain itself? > > I think doing it in the toolstack and just passing an option to > pciback is likely easier than adding more logic to pciback. Agree. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.
On Sun, Sep 29, 2019 at 03:35:57AM +0200, Marek Marczykowski-Górecki wrote: > On Thu, Sep 26, 2019 at 09:10:17AM +0200, Roger Pau Monné wrote: > > On Thu, Sep 26, 2019 at 06:16:06AM +0200, Marek Marczykowski-Górecki wrote: > > > On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote: > > > > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki > > > > wrote: > > > > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > > > > > It would still be nice to get the missing bits (interrupt enabling), > > > > > > or else this patch is kind of pointless, since it still doesn't > > > > > > allow > > > > > > stubdomains to work correctly with passed through devices. BTW it is useful with permissive mode enabled. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] {xen, dom0}_vga_console_info.u.vesa_lfb.lfb_base field too small
Hi, I have a machine that allocate vesa LFB above 4GB, as reported by UEFI GOP. At 0x40 to be specific. vga_console_info.u.vesa_lfb.lfb_base is a 32bit field, so it gets truncated, leading to all kind of memory corruptions when something writes there. If that would be only about Xen, that wouldn't be that bad, but unfortunately exactly the same structure is used as an interface for dom0 start info (at least PV one). My only idea is to introduce yet another entry in *_vga_console_info.u union (efi_lfb64?) with a 64bit lfb_base field. And mark it in video_type (XEN_VGATYPE_EFI_LFB64?). But I'm not sure how non-patched Linux (or other supported OSes) would respond to this. xen_init_vga() in Linux doesn't seem to bail on unknown video_type, so it may be fragile. Any better ideas? -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 2/5] drivers/video: drop unused limits
MAX_BPP, MAX_FONT_W, MAX_FONT_H are not used in the code at all. Signed-off-by: Marek Marczykowski-Górecki --- xen/drivers/video/lfb.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c index d0c8c49..0475a68 100644 --- a/xen/drivers/video/lfb.c +++ b/xen/drivers/video/lfb.c @@ -12,9 +12,6 @@ #define MAX_XRES 1900 #define MAX_YRES 1200 -#define MAX_BPP 4 -#define MAX_FONT_W 8 -#define MAX_FONT_H 16 struct lfb_status { struct lfb_prop lfbp; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 3/5] drivers/video: Drop framebuffer size constraints
The limit 1900x1200 do not match real world devices (1900 looks like a typo, should be 1920). But in practice the limits are arbitrary and do not serve any real purpose. As discussed in "Increase framebuffer size to todays standards" thread, drop them completely. This fixes graphic console on device with 3840x2160 native resolution. Signed-off-by: Marek Marczykowski-Górecki --- Cc: Olaf Hering --- xen/drivers/video/lfb.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c index 0475a68..5022195 100644 --- a/xen/drivers/video/lfb.c +++ b/xen/drivers/video/lfb.c @@ -10,9 +10,6 @@ #include "lfb.h" #include "font.h" -#define MAX_XRES 1900 -#define MAX_YRES 1200 - struct lfb_status { struct lfb_prop lfbp; @@ -146,13 +143,6 @@ void lfb_carriage_return(void) int __init lfb_init(struct lfb_prop *lfbp) { -if ( lfbp->width > MAX_XRES || lfbp->height > MAX_YRES ) -{ -printk(XENLOG_WARNING "Couldn't initialize a %ux%u framebuffer early.\n", - lfbp->width, lfbp->height); -return -EINVAL; -} - lfb.lfbp = *lfbp; lfb.lbuf = xmalloc_bytes(lfb.lfbp.bytes_per_line); -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/5] xen/bitmap: fix bitmap_fill with zero-sized bitmap
When bitmap_fill(..., 0) is called, do not try to write anything. Before this patch, it tried to write almost LONG_MAX, surely overwriting something. Signed-off-by: Marek Marczykowski-Górecki --- Found while debugging framebuffer located above 4GB. In that case 32bit variable for it overflows and framebuffer initialization zeroed unrelated memory. Specifically, it hit mbi->mods_count, so later on bitmap_fill(module_map, mbi->mods_count) in __start_xen() crashed. --- xen/include/xen/bitmap.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/include/xen/bitmap.h b/xen/include/xen/bitmap.h index fe3c720..0430c1c 100644 --- a/xen/include/xen/bitmap.h +++ b/xen/include/xen/bitmap.h @@ -126,6 +126,8 @@ static inline void bitmap_fill(unsigned long *dst, int nbits) size_t nlongs = BITS_TO_LONGS(nbits); switch (nlongs) { + case 0: + break; default: memset(dst, -1, (nlongs - 1) * sizeof(unsigned long)); /* fall through */ -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 4/5] xen: fix handling framebuffer located above 4GB
On some machines (for example Thinkpad P52), UEFI GOP reports framebuffer located above 4GB (0x40 on that machine). This address does not fit in {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base field, which is 32bit. The overflow here cause all kind of memory corruption when anything tries to write something on the screen, starting with zeroing the whole framebuffer in vesa_init(). Fix this similar to how it's done in Linux: add ext_lfb_base field at the end of the structure, to hold upper 32bits of the address. Since the field is added at the end of the structure, it will work with older Linux versions too (other than using possibly truncated address - no worse than without this change). Thanks to ABI containing size of the structure (start_info.console.dom0.info_size), Linux can detect when this field is present and use it appropriately then. Signed-off-by: Marek Marczykowski-Górecki --- xen/arch/x86/efi/efi-boot.h | 1 + xen/drivers/video/vesa.c| 15 +++ xen/include/public/xen.h| 2 ++ 3 files changed, 14 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index 5789d2c..7a13a30 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -550,6 +550,7 @@ static void __init efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop, vga_console_info.u.vesa_lfb.bytes_per_line = (mode_info->PixelsPerScanLine * bpp + 7) >> 3; vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase; +vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase >> 32; vga_console_info.u.vesa_lfb.lfb_size = (gop->Mode->FrameBufferSize + 0x) >> 16; } diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c index c92497e..f22cf7f 100644 --- a/xen/drivers/video/vesa.c +++ b/xen/drivers/video/vesa.c @@ -84,6 +84,7 @@ void __init vesa_early_init(void) void __init vesa_init(void) { struct lfb_prop lfbp; +unsigned long lfb_base; if ( !font ) return; @@ -97,15 +98,17 @@ void __init vesa_init(void) lfbp.text_columns = vlfb_info.width / font->width; lfbp.text_rows = vlfb_info.height / font->height; -lfbp.lfb = lfb = ioremap(vlfb_info.lfb_base, vram_remap); +lfb_base = vlfb_info.lfb_base; +lfb_base |= (unsigned long)vlfb_info.ext_lfb_base << 32; +lfbp.lfb = lfb = ioremap(lfb_base, vram_remap); if ( !lfb ) return; memset(lfb, 0, vram_remap); -printk(XENLOG_INFO "vesafb: framebuffer at %#x, mapped to 0x%p, " +printk(XENLOG_INFO "vesafb: framebuffer at %#lx, mapped to 0x%p, " "using %uk, total %uk\n", - vlfb_info.lfb_base, lfb, + lfb_base, lfb, vram_remap >> 10, vram_total >> 10); printk(XENLOG_INFO "vesafb: mode is %dx%dx%u, linelength=%d, font %ux%u\n", vlfb_info.width, vlfb_info.height, @@ -152,6 +155,10 @@ void __init vesa_mtrr_init(void) MTRR_TYPE_WRCOMB, MTRR_TYPE_WRTHROUGH }; unsigned int size_total; int rc, type; +unsigned long lfb_base; + +lfb_base = vlfb_info.lfb_base; +lfb_base |= (unsigned long)vlfb_info.ext_lfb_base << 32; if ( !lfb || (vesa_mtrr == 0) || (vesa_mtrr >= ARRAY_SIZE(mtrr_types)) ) return; @@ -167,7 +174,7 @@ void __init vesa_mtrr_init(void) /* Try and find a power of two to add */ do { -rc = mtrr_add(vlfb_info.lfb_base, size_total, type, 1); +rc = mtrr_add(lfb_base, size_total, type, 1); size_total >>= 1; } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) ); } diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index ccdffc0..b0f0f7e 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -923,6 +923,8 @@ typedef struct dom0_vga_console_info { /* Mode attributes (offset 0x0, VESA command 0x4f01). */ uint16_t mode_attrs; #endif +/* high 32 bits of lfb_base */ +uint32_t ext_lfb_base; } vesa_lfb; } u; } dom0_vga_console_info_t; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 0/5] Fixes for large framebuffer, placed above 4GB
A bunch of fixes for booting Xen on Thinkpad P52, through grub2-efi + multiboot2. Most of them can be applied independently. --- cc: Andrew Cooper cc: George Dunlap cc: Ian Jackson cc: Jan Beulich cc: Julien Grall cc: Konrad Rzeszutek Wilk cc: Stefano Stabellini cc: Tim Deegan cc: Wei Liu cc: Olaf Hering Marek Marczykowski-Górecki (5): xen/bitmap: fix bitmap_fill with zero-sized bitmap drivers/video: drop unused limits drivers/video: Drop framebuffer size constraints xen: fix handling framebuffer located above 4GB drivers/video: use vlfb_info consistently xen/arch/x86/efi/efi-boot.h | 1 + xen/drivers/video/lfb.c | 13 - xen/drivers/video/vesa.c| 17 - xen/include/public/xen.h| 2 ++ xen/include/xen/bitmap.h| 2 ++ 5 files changed, 17 insertions(+), 18 deletions(-) base-commit: dc497635d93f6672f82727ad97a55205177be2aa -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 5/5] drivers/video: use vlfb_info consistently
vlfb_info is an alias for vga_console_info.u.vesa_lfb, so this change is purely cosmetic. But using the same name helps reading the code. Signed-off-by: Marek Marczykowski-Górecki --- xen/drivers/video/vesa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c index f22cf7f..3767024 100644 --- a/xen/drivers/video/vesa.c +++ b/xen/drivers/video/vesa.c @@ -44,7 +44,7 @@ void __init vesa_early_init(void) { unsigned int vram_vmode; -vga_compat = !(vga_console_info.u.vesa_lfb.gbl_caps & 2); +vga_compat = !(vlfb_info.gbl_caps & 2); if ( (vlfb_info.bits_per_pixel < 8) || (vlfb_info.bits_per_pixel > 32) ) return; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/5] xen: fix handling framebuffer located above 4GB
On Mon, May 06, 2019 at 05:15:19PM +0200, Juergen Gross wrote: > On 06/05/2019 16:50, Marek Marczykowski-Górecki wrote: > > diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h > > index ccdffc0..b0f0f7e 100644 > > --- a/xen/include/public/xen.h > > +++ b/xen/include/public/xen.h > > @@ -923,6 +923,8 @@ typedef struct dom0_vga_console_info { > > /* Mode attributes (offset 0x0, VESA command 0x4f01). */ > > uint16_t mode_attrs; > > #endif > > +/* high 32 bits of lfb_base */ > > +uint32_t ext_lfb_base; > > You will need to put this addition into an: > > #if __XEN_INTERFACE_VERSION__ >= 0x00040d00 > ... > #endif > > section (only available from Xen 4.13 onwards). Why exactly? I don't see this structure used in any hypercall argument. The only externally accessible place is start_info structure, where it has explicit struct size already. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 1/5] xen/bitmap: fix bitmap_fill with zero-sized bitmap
When bitmap_fill(..., 0) is called, do not try to write anything. Before this patch, it tried to write almost LONG_MAX, surely overwriting something. Signed-off-by: Marek Marczykowski-Górecki Reviewed-by: Andrew Cooper Reviewed-by: Jan Beulich --- xen/include/xen/bitmap.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/include/xen/bitmap.h b/xen/include/xen/bitmap.h index fe3c720..0430c1c 100644 --- a/xen/include/xen/bitmap.h +++ b/xen/include/xen/bitmap.h @@ -126,6 +126,8 @@ static inline void bitmap_fill(unsigned long *dst, int nbits) size_t nlongs = BITS_TO_LONGS(nbits); switch (nlongs) { + case 0: + break; default: memset(dst, -1, (nlongs - 1) * sizeof(unsigned long)); /* fall through */ -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 2/5] drivers/video: drop unused limits
MAX_BPP, MAX_FONT_W, MAX_FONT_H are not used in the code at all. Signed-off-by: Marek Marczykowski-Górecki Suggested-by: Jan Beulich Acked-by: Andrew Cooper --- xen/drivers/video/lfb.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c index d0c8c49..0475a68 100644 --- a/xen/drivers/video/lfb.c +++ b/xen/drivers/video/lfb.c @@ -12,9 +12,6 @@ #define MAX_XRES 1900 #define MAX_YRES 1200 -#define MAX_BPP 4 -#define MAX_FONT_W 8 -#define MAX_FONT_H 16 struct lfb_status { struct lfb_prop lfbp; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 5/5] drivers/video: use vlfb_info consistently
vlfb_info is an alias for vga_console_info.u.vesa_lfb, so this change is purely cosmetic. But using the same name helps reading the code. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Andrew Cooper --- xen/drivers/video/vesa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c index f07d293..e9918a7 100644 --- a/xen/drivers/video/vesa.c +++ b/xen/drivers/video/vesa.c @@ -49,7 +49,7 @@ void __init vesa_early_init(void) { unsigned int vram_vmode; -vga_compat = !(vga_console_info.u.vesa_lfb.gbl_caps & 2); +vga_compat = !(vlfb_info.gbl_caps & 2); if ( (vlfb_info.bits_per_pixel < 8) || (vlfb_info.bits_per_pixel > 32) ) return; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 3/5] drivers/video: Drop framebuffer size constraints
The limit 1900x1200 do not match real world devices (1900 looks like a typo, should be 1920). But in practice the limits are arbitrary and do not serve any real purpose. As discussed in "Increase framebuffer size to todays standards" thread, drop them completely. This fixes graphic console on device with 3840x2160 native resolution. Signed-off-by: Marek Marczykowski-Górecki Suggested-by: Jan Beulich Acked-by: Andrew Cooper --- Cc: Olaf Hering --- xen/drivers/video/lfb.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c index 0475a68..5022195 100644 --- a/xen/drivers/video/lfb.c +++ b/xen/drivers/video/lfb.c @@ -10,9 +10,6 @@ #include "lfb.h" #include "font.h" -#define MAX_XRES 1900 -#define MAX_YRES 1200 - struct lfb_status { struct lfb_prop lfbp; @@ -146,13 +143,6 @@ void lfb_carriage_return(void) int __init lfb_init(struct lfb_prop *lfbp) { -if ( lfbp->width > MAX_XRES || lfbp->height > MAX_YRES ) -{ -printk(XENLOG_WARNING "Couldn't initialize a %ux%u framebuffer early.\n", - lfbp->width, lfbp->height); -return -EINVAL; -} - lfb.lfbp = *lfbp; lfb.lbuf = xmalloc_bytes(lfb.lfbp.bytes_per_line); -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 0/5] Fixes for large framebuffer, placed above 4GB
A bunch of fixes for booting Xen on Thinkpad P52, through grub2-efi + multiboot2. Most of them can be applied independently. --- cc: Andrew Cooper cc: George Dunlap cc: Ian Jackson cc: Jan Beulich cc: Julien Grall cc: Konrad Rzeszutek Wilk cc: Stefano Stabellini cc: Tim Deegan cc: Wei Liu cc: Olaf Hering Marek Marczykowski-Górecki (5): xen/bitmap: fix bitmap_fill with zero-sized bitmap drivers/video: drop unused limits drivers/video: Drop framebuffer size constraints xen: fix handling framebuffer located above 4GB drivers/video: use vlfb_info consistently xen/arch/x86/efi/efi-boot.h | 1 + xen/drivers/video/lfb.c | 13 - xen/drivers/video/vesa.c| 15 ++- xen/include/public/xen.h| 4 xen/include/xen/bitmap.h| 2 ++ 5 files changed, 17 insertions(+), 18 deletions(-) base-commit: dc497635d93f6672f82727ad97a55205177be2aa -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 4/5] xen: fix handling framebuffer located above 4GB
On some machines (for example Thinkpad P52), UEFI GOP reports framebuffer located above 4GB (0x40 on that machine). This address does not fit in {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base field, which is 32bit. The overflow here cause all kind of memory corruption when anything tries to write something on the screen, starting with zeroing the whole framebuffer in vesa_init(). Fix this similar to how it's done in Linux: add ext_lfb_base field at the end of the structure, to hold upper 32bits of the address. Since the field is added at the end of the structure, it will work with older Linux versions too (other than using possibly truncated address - no worse than without this change). Thanks to ABI containing size of the structure (start_info.console.dom0.info_size), Linux can detect when this field is present and use it appropriately then. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - guard ext_lfb_base with #if __XEN_INTERFACE_VERSION__, but always include whe building Xen itself - add a helper function for lfb_base --- xen/arch/x86/efi/efi-boot.h | 1 + xen/drivers/video/vesa.c| 13 + xen/include/public/xen.h| 4 3 files changed, 14 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index 5789d2c..7a13a30 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -550,6 +550,7 @@ static void __init efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop, vga_console_info.u.vesa_lfb.bytes_per_line = (mode_info->PixelsPerScanLine * bpp + 7) >> 3; vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase; +vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase >> 32; vga_console_info.u.vesa_lfb.lfb_size = (gop->Mode->FrameBufferSize + 0x) >> 16; } diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c index c92497e..f07d293 100644 --- a/xen/drivers/video/vesa.c +++ b/xen/drivers/video/vesa.c @@ -40,6 +40,11 @@ static int __init parse_font_height(const char *s) } custom_param("font", parse_font_height); +static inline paddr_t lfb_base(void) +{ +return (paddr_t)(vlfb_info.ext_lfb_base) << 32 | vlfb_info.lfb_base; +} + void __init vesa_early_init(void) { unsigned int vram_vmode; @@ -97,15 +102,15 @@ void __init vesa_init(void) lfbp.text_columns = vlfb_info.width / font->width; lfbp.text_rows = vlfb_info.height / font->height; -lfbp.lfb = lfb = ioremap(vlfb_info.lfb_base, vram_remap); +lfbp.lfb = lfb = ioremap(lfb_base(), vram_remap); if ( !lfb ) return; memset(lfb, 0, vram_remap); -printk(XENLOG_INFO "vesafb: framebuffer at %#x, mapped to 0x%p, " +printk(XENLOG_INFO "vesafb: framebuffer at %" PRIpaddr ", mapped to 0x%p, " "using %uk, total %uk\n", - vlfb_info.lfb_base, lfb, + lfb_base(), lfb, vram_remap >> 10, vram_total >> 10); printk(XENLOG_INFO "vesafb: mode is %dx%dx%u, linelength=%d, font %ux%u\n", vlfb_info.width, vlfb_info.height, @@ -167,7 +172,7 @@ void __init vesa_mtrr_init(void) /* Try and find a power of two to add */ do { -rc = mtrr_add(vlfb_info.lfb_base, size_total, type, 1); +rc = mtrr_add(lfb_base(), size_total, type, 1); size_total >>= 1; } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) ); } diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index ccdffc0..1752252 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -923,6 +923,10 @@ typedef struct dom0_vga_console_info { /* Mode attributes (offset 0x0, VESA command 0x4f01). */ uint16_t mode_attrs; #endif +#if __XEN_INTERFACE_VERSION__ >= 0x00040d00 || defined(__XEN__) +/* high 32 bits of lfb_base */ +uint32_t ext_lfb_base; +#endif } vesa_lfb; } u; } dom0_vga_console_info_t; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Anyone using blktap2?
On Mon, May 13, 2019 at 04:34:14PM +0100, Wei Liu wrote: > Hello > > Seeing that you were the last people who changed blktap2 in a meaningful > way: do you use it at all? > > I'm thinking about dropping it (again). Fine with me too. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 0/1] Fixes for large framebuffer, placed above 4GB
A bunch of fixes for booting Xen on Thinkpad P52, through grub2-efi + multiboot2. Most of them can be applied independently. Changes in v3: - updated "xen: fix handling framebuffer located above 4GB" - dropped already applied patches --- cc: Andrew Cooper cc: George Dunlap cc: Ian Jackson cc: Jan Beulich cc: Julien Grall cc: Konrad Rzeszutek Wilk cc: Stefano Stabellini cc: Tim Deegan cc: Wei Liu cc: Olaf Hering Marek Marczykowski-Górecki (1): xen: fix handling framebuffer located above 4GB xen/arch/x86/efi/efi-boot.h | 1 + xen/drivers/video/vesa.c| 14 +- xen/include/public/xen-compat.h | 2 +- xen/include/public/xen.h| 5 + 4 files changed, 16 insertions(+), 6 deletions(-) base-commit: 9dc8043ba18409e7a2057d8cccbf0ddaff71eb7e -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 1/1] xen: fix handling framebuffer located above 4GB
On some machines (for example Thinkpad P52), UEFI GOP reports framebuffer located above 4GB (0x40 on that machine). This address does not fit in {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base field, which is 32bit. The overflow here cause all kind of memory corruption when anything tries to write something on the screen, starting with zeroing the whole framebuffer in vesa_init(). Fix this similar to how it's done in Linux: add ext_lfb_base field at the end of the structure, to hold upper 32bits of the address. Since the field is added at the end of the structure, it will work with older Linux versions too (other than using possibly truncated address - no worse than without this change). Thanks to ABI containing size of the structure (start_info.console.dom0.info_size), Linux can detect when this field is present and use it appropriately then. Since this change public interface and use __XEN_INTERFACE_VERSION__, bump __XEN_LATEST_INTERFACE_VERSION__. Note: if/when backporting this change to Xen <= 4.12, #if in xen.h needs to be extended with " || defined(__XEN__)". Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - guard ext_lfb_base with #if __XEN_INTERFACE_VERSION__, but always include whe building Xen itself - add a helper function for lfb_base Changes in v3: - add padding field - add parentheses around << - unwrap format string per coding style, restore 0x - drop #if defined(__XEN__) needed only in backport - bump __XEN_LATEST_INTERFACE_VERSION__ --- xen/arch/x86/efi/efi-boot.h | 1 + xen/drivers/video/vesa.c| 14 +- xen/include/public/xen-compat.h | 2 +- xen/include/public/xen.h| 5 + 4 files changed, 16 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index 5789d2c..7a13a30 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -550,6 +550,7 @@ static void __init efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop, vga_console_info.u.vesa_lfb.bytes_per_line = (mode_info->PixelsPerScanLine * bpp + 7) >> 3; vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase; +vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase >> 32; vga_console_info.u.vesa_lfb.lfb_size = (gop->Mode->FrameBufferSize + 0x) >> 16; } diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c index 26d4962..9cf4bad 100644 --- a/xen/drivers/video/vesa.c +++ b/xen/drivers/video/vesa.c @@ -40,6 +40,11 @@ static int __init parse_font_height(const char *s) } custom_param("font", parse_font_height); +static inline paddr_t lfb_base(void) +{ +return (paddr_t)((vlfb_info.ext_lfb_base) << 32) | vlfb_info.lfb_base; +} + void __init vesa_early_init(void) { unsigned int vram_vmode; @@ -97,15 +102,14 @@ void __init vesa_init(void) lfbp.text_columns = vlfb_info.width / font->width; lfbp.text_rows = vlfb_info.height / font->height; -lfbp.lfb = lfb = ioremap(vlfb_info.lfb_base, vram_remap); +lfbp.lfb = lfb = ioremap(lfb_base(), vram_remap); if ( !lfb ) return; memset(lfb, 0, vram_remap); -printk(XENLOG_INFO "vesafb: framebuffer at %#x, mapped to 0x%p, " - "using %uk, total %uk\n", - vlfb_info.lfb_base, lfb, +printk(XENLOG_INFO "vesafb: framebuffer at 0x%" PRIpaddr ", mapped to 0x%p, using %uk, total %uk\n", + lfb_base(), lfb, vram_remap >> 10, vram_total >> 10); printk(XENLOG_INFO "vesafb: mode is %dx%dx%u, linelength=%d, font %ux%u\n", vlfb_info.width, vlfb_info.height, @@ -167,7 +171,7 @@ void __init vesa_mtrr_init(void) /* Try and find a power of two to add */ do { -rc = mtrr_add(vlfb_info.lfb_base, size_total, type, 1); +rc = mtrr_add(lfb_base(), size_total, type, 1); size_total >>= 1; } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) ); } diff --git a/xen/include/public/xen-compat.h b/xen/include/public/xen-compat.h index 6fabca1..6708132 100644 --- a/xen/include/public/xen-compat.h +++ b/xen/include/public/xen-compat.h @@ -27,7 +27,7 @@ #ifndef __XEN_PUBLIC_XEN_COMPAT_H__ #define __XEN_PUBLIC_XEN_COMPAT_H__ -#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040a00 +#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040d00 #if defined(__XEN__) || defined(__XEN_TOOLS__) /* Xen is built with matching headers and implements the latest interface. */ diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index ccdffc0..4676521 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -923,6 +923,11 @@ typedef struct dom0_vga_console_info { /* Mode attributes (offset 0x0, VESA command 0x4f01). */ uint16_t mode_at
[Xen-devel] [PATCH v4] xen: fix handling framebuffer located above 4GB
On some machines (for example Thinkpad P52), UEFI GOP reports framebuffer located above 4GB (0x40 on that machine). This address does not fit in {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base field, which is 32bit. The overflow here cause all kind of memory corruption when anything tries to write something on the screen, starting with zeroing the whole framebuffer in vesa_init(). Fix this similar to how it's done in Linux: add ext_lfb_base field at the end of the structure, to hold upper 32bits of the address. Since the field is added at the end of the structure, it will work with older Linux versions too (other than using possibly truncated address - no worse than without this change). Thanks to ABI containing size of the structure (start_info.console.dom0.info_size), Linux can detect when this field is present and use it appropriately then. Since this change public interface and use __XEN_INTERFACE_VERSION__, bump __XEN_LATEST_INTERFACE_VERSION__. Note: if/when backporting this change to Xen <= 4.12, #if in xen.h needs to be extended with " || defined(__XEN__)". Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - guard ext_lfb_base with #if __XEN_INTERFACE_VERSION__, but always include whe building Xen itself - add a helper function for lfb_base Changes in v3: - add padding field - add parentheses around << - unwrap format string per coding style, restore 0x - drop #if defined(__XEN__) needed only in backport - bump __XEN_LATEST_INTERFACE_VERSION__ Changes in v4: - fix parentheses - fix padding --- xen/arch/x86/efi/efi-boot.h | 1 + xen/drivers/video/vesa.c| 14 +- xen/include/public/xen-compat.h | 2 +- xen/include/public/xen.h| 5 + 4 files changed, 16 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index 5789d2cb70..7a13a30bc0 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -550,6 +550,7 @@ static void __init efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop, vga_console_info.u.vesa_lfb.bytes_per_line = (mode_info->PixelsPerScanLine * bpp + 7) >> 3; vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase; +vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase >> 32; vga_console_info.u.vesa_lfb.lfb_size = (gop->Mode->FrameBufferSize + 0x) >> 16; } diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c index 26d4962b0e..fd2cb1312d 100644 --- a/xen/drivers/video/vesa.c +++ b/xen/drivers/video/vesa.c @@ -40,6 +40,11 @@ static int __init parse_font_height(const char *s) } custom_param("font", parse_font_height); +static inline paddr_t lfb_base(void) +{ +return ((paddr_t)vlfb_info.ext_lfb_base << 32) | vlfb_info.lfb_base; +} + void __init vesa_early_init(void) { unsigned int vram_vmode; @@ -97,15 +102,14 @@ void __init vesa_init(void) lfbp.text_columns = vlfb_info.width / font->width; lfbp.text_rows = vlfb_info.height / font->height; -lfbp.lfb = lfb = ioremap(vlfb_info.lfb_base, vram_remap); +lfbp.lfb = lfb = ioremap(lfb_base(), vram_remap); if ( !lfb ) return; memset(lfb, 0, vram_remap); -printk(XENLOG_INFO "vesafb: framebuffer at %#x, mapped to 0x%p, " - "using %uk, total %uk\n", - vlfb_info.lfb_base, lfb, +printk(XENLOG_INFO "vesafb: framebuffer at 0x%" PRIpaddr ", mapped to 0x%p, using %uk, total %uk\n", + lfb_base(), lfb, vram_remap >> 10, vram_total >> 10); printk(XENLOG_INFO "vesafb: mode is %dx%dx%u, linelength=%d, font %ux%u\n", vlfb_info.width, vlfb_info.height, @@ -167,7 +171,7 @@ void __init vesa_mtrr_init(void) /* Try and find a power of two to add */ do { -rc = mtrr_add(vlfb_info.lfb_base, size_total, type, 1); +rc = mtrr_add(lfb_base(), size_total, type, 1); size_total >>= 1; } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) ); } diff --git a/xen/include/public/xen-compat.h b/xen/include/public/xen-compat.h index 6fabca1889..6708132394 100644 --- a/xen/include/public/xen-compat.h +++ b/xen/include/public/xen-compat.h @@ -27,7 +27,7 @@ #ifndef __XEN_PUBLIC_XEN_COMPAT_H__ #define __XEN_PUBLIC_XEN_COMPAT_H__ -#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040a00 +#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040d00 #if defined(__XEN__) || defined(__XEN_TOOLS__) /* Xen is built with matching headers and implements the latest interface. */ diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index ccdffc0ad1..cb2917e74b 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -922,6 +922,11 @@ typedef struct dom0_vga_console_info {
Re: [Xen-devel] [RFC PATCH v2 11/17] xenconsoled: add support for consoles using 'state' xenstore entry
On Thu, Nov 01, 2018 at 05:21:39PM +, Ian Jackson wrote: > Marek Marczykowski-Górecki writes ("[RFC PATCH v2 11/17] xenconsoled: add > support for consoles using 'state' xenstore entry"): > > Add support for standard xenbus initialization protocol using 'state' > > xenstore entry. It will be necessary for secondary consoles. > > For consoles supporting it, read 'state' entry on the frontend and > > proceed accordingly - either init console or close it. When closing, > > make sure all the in-transit data is flushed (both from shared ring and > > from local buffer), if possible. This is especially important for > > MiniOS-based qemu stubdomain, which closes console just after writing > > device model state to it. > > For consoles without 'state' field behavior is unchanged - on any watch > > event try to connect console, as long as domain is alive. > > I'm not opposed to the introduction of this state field. The code > looks plausible. > > But: > > Firstly, you have put the protocol description in your commit > message (and it seems rather informal). Can you please provide > a comprehensive protocol specification in-tree ? You need to patch >docs/misc/console.txt > I think. > > I say `comprehensive' because your text is not particularly clearly > about who is supposed to `flush' which data exactly when. Nor what > `flushing' means (does it ever mean discarding?) > > Secondly: what about backwards compatibility ? I think we need to at > least think about the possibility that there are some guest frontends > out there which may look for a `state' node and do something > undesirable with it. I think this possibility is remote but it should > be mentioned in the commit message. Note that this commit _does not_ invent any new protocol at all. It merely add support for the protocol that is used by additional consoles. The backend side was implemented by qemu only before, now I add support for it to xenconsoled. This is about (additional) consoles living in /local/domain/$DOMID/device/console/$DEVID, not the special-kind-of-thing living in /local/domain/$DOMID/console. Is there any protocol specification for it already anywhere? I can't see it xen/include/public/io/ as it's for other device types - console.h have only struct xencons_interface declaration without any comment. I can't find it in other places either. If there is one, it should be added to docs/misc/console.txt and/or xen/include/public/io/console.h. Otherwise I can add some basic spec to docs/misc/console.txt. > What about the possibility that one or the other end of the connection > may be replaced by a different implementation, so that the peer > appears to gain or lose support for `state' ? Actually, I'm doing similar thing here ;) previously xenconsoled didn't know anything about 'state' field and it was one of the reason it couldn't handle multiple consoles per domain. > I'll be able to review the code more effectively when there is a > formal protocol spec to compare it to... -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.
From: Simon Gaiser Stubdomains need to be given sufficient privilege over the guest which it provides emulation for in order for PCI passthrough to work correctly. When a HVM domain try to enable MSI, QEMU in stubdomain calls PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as part of xc_domain_update_msi_irq. Allow for that as part of PHYSDEVOP_map_pirq. Based on https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet . Signed-off-by: Simon Gaiser Signed-off-by: Marek Marczykowski-Górecki --- This is only one part of fixing MSI with QEMU in stubdomain. The other part is allowing stubdomain to actually enable MSI in PCI config space. QEMU does that through pcifront/back connected to the stubdomain (see hw/xen/xen_pt_msi.c:msi_msix_enable()), but pciback by default refuse write to that register. Easy, less safe solution: enable permissive mode for the device. Safer solution - enable access to this register for stubdomain only (pciback patch that add such flag + libxl patch to set it for relevant devices) The whole story: https://www.qubes-os.org/news/2017/10/18/msi-support/ Any other ideas? Which one is preferred upstream? --- xen/arch/x86/irq.c | 23 +++ xen/arch/x86/physdev.c | 9 + 2 files changed, 32 insertions(+) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 8b44d6c..123ca69 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p, { case MAP_PIRQ_TYPE_MULTI_MSI: irq = create_irq(NUMA_NO_NODE); +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) && + current->domain->target == d ) +{ +ret = irq_permit_access(current->domain, irq); +if ( ret ) { +dprintk(XENLOG_G_ERR, +"dom%d: can't grant it's stubdom (%d) access to " +"irq %d for msi: %d!\n", +d->domain_id, +current->domain->domain_id, +irq, +ret); +return -EINVAL; +} +} } if ( irq < nr_irqs_gsi || irq >= nr_irqs ) @@ -2717,7 +2732,15 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p, case MAP_PIRQ_TYPE_MSI: if ( index == -1 ) case MAP_PIRQ_TYPE_MULTI_MSI: +{ +if ( current->domain->target == d && + irq_deny_access(current->domain, irq) ) +dprintk(XENLOG_G_ERR, +"dom%d: can't revoke stubdom's access to irq %d!\n", +d->domain_id, +irq); destroy_irq(irq); +} break; } } diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index 3a3c158..de59e39 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -164,6 +164,15 @@ int physdev_unmap_pirq(domid_t domid, int pirq) pcidevs_lock(); spin_lock(&d->event_lock); +if ( current->domain->target == d) +{ +int irq = domain_pirq_to_irq(d, pirq); +if ( irq <= 0 || irq_deny_access(current->domain, irq) ) +dprintk(XENLOG_G_ERR, +"dom%d: can't revoke stubdom's access to irq %d!\n", +d->domain_id, +irq); +} ret = unmap_domain_pirq(d, pirq); spin_unlock(&d->event_lock); pcidevs_unlock(); -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 3/4] libxl: don't try to manipulate json config for stubdomain
Stubdomain do not have it's own config file - its configuration is derived from target domains. Do not try to manipulate it when attaching PCI device. This bug prevented starting HVM with stubdomain and PCI passthrough device attached. Signed-off-by: Marek Marczykowski-Górecki --- tools/libxl/libxl_pci.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 1bde537..e974f55 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -152,14 +152,18 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d GCNEW(device); libxl__device_from_pcidev(gc, domid, pcidev, device); -lock = libxl__lock_domain_userdata(gc, domid); -if (!lock) { -rc = ERROR_LOCK_FAIL; -goto out; -} +/* Stubdomain config is derived from its target domain, it doesn't have + its own file */ +if (!libxl_is_stubdom(CTX, domid, NULL)) { +lock = libxl__lock_domain_userdata(gc, domid); +if (!lock) { +rc = ERROR_LOCK_FAIL; +goto out; +} -rc = libxl__get_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +rc = libxl__get_domain_configuration(gc, domid, &d_config); +if (rc) goto out; +} device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, &pcidev_saved); @@ -171,8 +175,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d rc = libxl__xs_transaction_start(gc, &t); if (rc) goto out; -rc = libxl__set_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +if (lock) { +rc = libxl__set_domain_configuration(gc, domid, &d_config); +if (rc) goto out; +} libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back)); -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 2/4] libxl: attach PCI device to qemu only after setting pciback/pcifront
When qemu is running in stubdomain, handling "pci-ins" command will fail if pcifront is not initialized already. Fix this by sending such command only after confirming that pciback/front is running. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - Fixed code style since previous version. --- tools/libxl/libxl_pci.c | 9 + 1 file changed, 9 insertions(+) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 3b6b23c..1bde537 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcide { libxl_ctx *ctx = libxl__gc_owner(gc); unsigned int orig_vdev, pfunc_mask; +char *be_path; libxl_device_pci *assigned; int num_assigned, i, rc; int stubdomid = 0; @@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcide rc = do_pci_add(gc, stubdomid, &pcidev_s, 0); if ( rc ) goto out; +/* Wait for the device actually being connected, otherwise device model + * running there will fail to find the device. */ +be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0", + libxl__xs_get_dompath(gc, 0), stubdomid); +rc = libxl__wait_for_backend(gc, be_path, + GCSPRINTF("%d", XenbusStateConnected)); +if (rc) +goto out; } orig_vdev = pcidev->vdevfn & ~7U; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
HVM domains use IOMMU and device model assistance for communicating with PCI devices, xen-pcifront/pciback is used only in PV domains. When HVM domain has device model in stubdomain, attaching xen-pciback to the target domain itself is not only useless, but also may prevent attaching xen-pciback to the stubdomain, effectively breaking PCI passthrough. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - previously called "libxl: attach xen-pciback only to PV domains" - instead of excluding all HVMs, change the condition to what actually matters here - check if stubdomain is in use; this way xen-pciback is always in use (either for the target domain, or it's stubdomain), fixing PCI reset by xen-pciback concerns --- tools/libxl/libxl_pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 87afa03..3b6b23c 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1106,7 +1106,7 @@ out: } } -if (!starting) +if (!starting && !libxl_get_stubdom_id(CTX, domid)) rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting); else rc = 0; @@ -1302,7 +1302,7 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid, } } -if (d_config->num_pcidevs > 0) { +if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) { rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs, d_config->num_pcidevs); if (rc < 0) { -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 0/4] Fix PCI passthrough for HVM with stubdomain
This is yet another attempt to upstream a set of PCI passthrough related fixed carried in Qubes OS packages for a long time already. Some of them were already discussed previously, see another thread with the same subject back in October 2016. As for "xen/x86: Allow stubdom access to irq created for msi" patch, this is only one part of fixing MSI with QEMU in stubdomain. The other part is allowing stubdomain to actually enable MSI in PCI config space. QEMU does that through pcifront/back connected to the stubdomain (see hw/xen/xen_pt_msi.c:msi_msix_enable()), but pciback by default refuse write to that register. Easy, less safe solution: enable permissive mode for the device. Safer solution - enable access to this register for stubdomain only: - pciback patch: https://github.com/QubesOS/qubes-linux-kernel/blob/master/0015-xen-pciback-add-attribute-to-allow-MSI-enable-flag-w.patch - libxl patch: https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.8/patch-stubdom-allow-msi-enable.patch The whole story: https://www.qubes-os.org/news/2017/10/18/msi-support/ Any other ideas? Which one is preferred upstream? Changes in v2: - new "xen/x86: Allow stubdom access to irq created for msi" patch - applied review comments from v1 Marek Marczykowski-Górecki (3): libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use libxl: attach PCI device to qemu only after setting pciback/pcifront libxl: don't try to manipulate json config for stubdomain Simon Gaiser (1): xen/x86: Allow stubdom access to irq created for msi. tools/libxl/libxl_pci.c | 37 ++--- xen/arch/x86/irq.c | 23 +++ xen/arch/x86/physdev.c | 9 + 3 files changed, 58 insertions(+), 11 deletions(-) base-commit: 93a62c544e20ba9e141e411bbaae3d65259d13a3 -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.
On Wed, Jan 16, 2019 at 10:21:29AM +0100, Roger Pau Monné wrote: > On Tue, Jan 15, 2019 at 04:36:31PM +0100, Marek Marczykowski-Górecki wrote: > > From: Simon Gaiser > > > > Stubdomains need to be given sufficient privilege over the guest which it > > provides emulation for in order for PCI passthrough to work correctly. > > When a HVM domain try to enable MSI, QEMU in stubdomain calls > > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as > > part of xc_domain_update_msi_irq. Allow for that as part of > > PHYSDEVOP_map_pirq. > > I see, that's not a problem AFAICT for PCI INTx because the IRQ in > that case is known beforehand, and the stubdomain is given permissions > over this IRQ by libxl__device_pci_add (there's a do_pci_add against > the stubdomain). Exactly. > > > > Based on > > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch > > by Eric Chanudet . > > > > Signed-off-by: Simon Gaiser > > Signed-off-by: Marek Marczykowski-Górecki > > --- > > This is only one part of fixing MSI with QEMU in stubdomain. The other > > part is allowing stubdomain to actually enable MSI in PCI config space. > > QEMU does that through pcifront/back connected to the stubdomain (see > > hw/xen/xen_pt_msi.c:msi_msix_enable()), but pciback by default refuse > > write to that register. > > Easy, less safe solution: enable permissive mode for the device. > > Safer solution - enable access to this register for stubdomain only > > (pciback patch that add such flag + libxl patch to set it for relevant > > devices) > > The whole story: > > https://www.qubes-os.org/news/2017/10/18/msi-support/ > > > > Any other ideas? Which one is preferred upstream? > > IMO, and please correct me if I'm wrong, QEMU in the stubdomain will > receive the PCI config space write to enable MSI, and since this > stub-QEMU runs in PV mode I think it should use the PV way to enable > MSI, ie: the same that Linux pcifront uses to enable MSI for > passed-through devices. > > Is this something that sounds sensible? We've considered this option too. Let me quote Simon on that (from the link above): The enable command that pcifront sends is intended for the normal PV use case where the device is passed to the VM itself (via pcifront) rather than to the stub domain target. While the command is called enable_msi, pciback does much more than simply setting the enable flag. It also configures IRQ handling in the dom0 kernel, adapts the MSI masking, and more. This makes sense in the PV case, but in the HVM case, the MSI configuration is done by QEMU, so this most likely won’t work correctly. > > --- > > xen/arch/x86/irq.c | 23 +++ > > xen/arch/x86/physdev.c | 9 + > > 2 files changed, 32 insertions(+) > > > > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c > > index 8b44d6c..123ca69 100644 > > --- a/xen/arch/x86/irq.c > > +++ b/xen/arch/x86/irq.c > > @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, int > > index, int *pirq_p, > > { > > case MAP_PIRQ_TYPE_MULTI_MSI: > > irq = create_irq(NUMA_NO_NODE); > > +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) && > > This check is already performed below, maybe you could re-arrange the > code as: > > case MAP_PIRQ_TYPE_MULTI_MSI: > irq = create_irq(NUMA_NO_NODE); > } > > if ( irq < nr_irqs_gsi || irq >= nr_irqs ) > { > dprintk(XENLOG_G_ERR, "dom%d: can't create irq for msi!\n", > d->domain_id); > return -EINVAL; > } > if ( current->domain->target == d ) > ... > > But I wonder whether it would be better to place the irq_permit_access > in map_domain_pirq, together with the existing irq_permit_access that > grant the target domain permissions over the irq. That may be a good idea. Let me try that in v3. But I'll wait for a feedback on libxl patches first. > > + current->domain->target == d ) > > +{ > > +ret = irq_permit_access(current->domain, irq); > > +if ( ret ) { > > +dprintk(XENLOG_G_ERR, > > +"dom%d: can't grant it's stubdom (%d) access > > to " > > +"irq %d for msi: %d!\n", > > +d->domain_id, > > +
Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.
On Wed, Jan 16, 2019 at 01:20:04PM +0100, Roger Pau Monné wrote: > On Wed, Jan 16, 2019 at 11:52:18AM +0100, Marek Marczykowski-Górecki wrote: > > On Wed, Jan 16, 2019 at 10:21:29AM +0100, Roger Pau Monné wrote: > > > On Tue, Jan 15, 2019 at 04:36:31PM +0100, Marek Marczykowski-Górecki > > > wrote: > > > > From: Simon Gaiser > > > > > > > > Stubdomains need to be given sufficient privilege over the guest which > > > > it > > > > provides emulation for in order for PCI passthrough to work correctly. > > > > When a HVM domain try to enable MSI, QEMU in stubdomain calls > > > > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as > > > > part of xc_domain_update_msi_irq. Allow for that as part of > > > > PHYSDEVOP_map_pirq. > > > > > > I see, that's not a problem AFAICT for PCI INTx because the IRQ in > > > that case is known beforehand, and the stubdomain is given permissions > > > over this IRQ by libxl__device_pci_add (there's a do_pci_add against > > > the stubdomain). > > > > Exactly. > > I would maybe consider adding something like this to the commit > message, so it's clear why PCI INTx works but not MSI interrupts. > > > > > > > > > > > Based on > > > > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch > > > > by Eric Chanudet . > > > > > > > > Signed-off-by: Simon Gaiser > > > > Signed-off-by: Marek Marczykowski-Górecki > > > > > > > > --- > > > > This is only one part of fixing MSI with QEMU in stubdomain. The other > > > > part is allowing stubdomain to actually enable MSI in PCI config space. > > > > QEMU does that through pcifront/back connected to the stubdomain (see > > > > hw/xen/xen_pt_msi.c:msi_msix_enable()), but pciback by default refuse > > > > write to that register. > > > > Easy, less safe solution: enable permissive mode for the device. > > > > Safer solution - enable access to this register for stubdomain only > > > > (pciback patch that add such flag + libxl patch to set it for relevant > > > > devices) > > > > The whole story: > > > > https://www.qubes-os.org/news/2017/10/18/msi-support/ > > > > > > > > Any other ideas? Which one is preferred upstream? > > > > > > IMO, and please correct me if I'm wrong, QEMU in the stubdomain will > > > receive the PCI config space write to enable MSI, and since this > > > stub-QEMU runs in PV mode I think it should use the PV way to enable > > > MSI, ie: the same that Linux pcifront uses to enable MSI for > > > passed-through devices. > > > > > > Is this something that sounds sensible? > > > > We've considered this option too. Let me quote Simon on that (from the > > link above): > > > > The enable command that pcifront sends is intended for the normal PV use > > case where the device is passed to the VM itself (via pcifront) rather > > than to the stub domain target. While the command is called enable_msi, > > pciback does much more than simply setting the enable flag. It also > > configures IRQ handling in the dom0 kernel, adapts the MSI masking, and > > more. This makes sense in the PV case, but in the HVM case, the MSI > > configuration is done by QEMU, so this most likely won’t work correctly. > > Oh great, that's unfortunate. Both pciback functions end up calling > into msi_capability_init in the Linux kernel, which does indeed more > than just toggling the PCI config space enable bit. > > OTOH adding a bypass to pciback so the stubdom is able to write to the > PCI register in order to toggle the enable bit seems quite clumsy. Not > to mention that you would be required to update Dom0 kernel in order to > fix the issue. > > Do you think it makes sense to add a domctl to enable/disable MSI(X)? > > This way the bug could be fixed by just updating Xen (and the > stubdomain). Indeed in case of stubdomain, that would make sens, as other PCI passthrough related operations already bypass pcifront/back anyway. And I agree with Jan, that physdevop makes more sense, if going this way. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
On Wed, Jan 16, 2019 at 05:47:19PM +0100, Roger Pau Monné wrote: > On Tue, Jan 15, 2019 at 04:36:28PM +0100, Marek Marczykowski-Górecki wrote: > > HVM domains use IOMMU and device model assistance for communicating with > > PCI devices, xen-pcifront/pciback is used only in PV domains. > > You still need pciback in order to reset the device when it's > deassigned from the guest, so it's functionality is not only used by > PV guests. Right, I'll update the commit message to match v2 code. > > When HVM domain has device model in stubdomain, attaching xen-pciback to > > the target domain itself is not only useless, but also may prevent > > attaching xen-pciback to the stubdomain, effectively breaking PCI > > passthrough. > > Right. When doing passthrough with a stubdomain you want the target > domain to have the memory and IO regions mapped, and the stubdomain to > handle the rest. > > > Signed-off-by: Marek Marczykowski-Górecki > > --- > > Changes in v2: > > - previously called "libxl: attach xen-pciback only to PV domains" > > - instead of excluding all HVMs, change the condition to what actually > >matters here - check if stubdomain is in use; this way xen-pciback is > >always in use (either for the target domain, or it's stubdomain), > >fixing PCI reset by xen-pciback concerns > > --- > > tools/libxl/libxl_pci.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c > > index 87afa03..3b6b23c 100644 > > --- a/tools/libxl/libxl_pci.c > > +++ b/tools/libxl/libxl_pci.c > > @@ -1106,7 +1106,7 @@ out: > > } > > } > > > > -if (!starting) > > +if (!starting && !libxl_get_stubdom_id(CTX, domid)) > > This change seems to assume that both libxl_domain_config for the > target and the stubdomain will have the assigned pci devices in the > pcidevs field. Not really. libxl__device_pci_add() calls do_pci_add() for both stubdomain (if applicable) and target domain. > Yet I cannot see where the stubdomain > libxl_domain_config will get the pci devices from the target domain > assigned, I've looked in libxl__spawn_stub_dm but there doesn't seem > to be any copy from the target to the stubdom of the list of pci > devices. I guess I'm missing something. > > Thanks, Roger. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
On Thu, Jan 17, 2019 at 10:21:34AM +0100, Roger Pau Monné wrote: > OK. From an architectural PoV I think it would make more sense to copy > the list of pci devices to the stubdom config in libxl__spawn_stub_dm, > but I'm not that familiar with pci handling in libxl, so there might > be a reason why things are done like this currently. libxl would refuse to attach the same device to two domains and also will perform some operations twice (as for example device reset). As you can see even right now some things needs to be disabled for stubdomain or target domain, because of this. If device would be included in the config for both target domain and its stubdomain, the code would need some more exceptions, which I think would be even worse. Other thing is the current code attach PCI devices when device-model is already running (regardless of its version or using stubdomain). I guess qemu doesn't setup those devices otherwise (there is nothing on qemu command line about it and none of libxl part generate such options). I'm not really sure if that couldn't be done differently, but I'm kind of afraid changing to much in PCI passthrough related code... > The change LGTM, albeit I found the pci handling code quite hard to > follow. I'm also not sure whether certain parts of the code are > correct, for example the PCI INTx seems to be mapped to both the > stubdomain and the target domain, when it's only the target domain the > one that actually uses it. That's true. Generally, I think stubdomain needs only config space access for its own, other things should be done for the target domain. This include mapping interrupts of any kind. Stubdomain is given permission to them only to be able to map them to the target domain. This also may be the reason why PCI devices are not included in stubdomain's config, but only do_pci_add() is called. It may be that in the past some more parts of the device setup was skipped this way. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.
On Wed, Jan 16, 2019 at 11:52:21AM +0100, Marek Marczykowski-Górecki wrote: > On Wed, Jan 16, 2019 at 10:21:29AM +0100, Roger Pau Monné wrote: > > On Tue, Jan 15, 2019 at 04:36:31PM +0100, Marek Marczykowski-Górecki wrote: > > > From: Simon Gaiser > > > > > > Stubdomains need to be given sufficient privilege over the guest which it > > > provides emulation for in order for PCI passthrough to work correctly. > > > When a HVM domain try to enable MSI, QEMU in stubdomain calls > > > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as > > > part of xc_domain_update_msi_irq. Allow for that as part of > > > PHYSDEVOP_map_pirq. > > > > I see, that's not a problem AFAICT for PCI INTx because the IRQ in > > that case is known beforehand, and the stubdomain is given permissions > > over this IRQ by libxl__device_pci_add (there's a do_pci_add against > > the stubdomain). > > Exactly. > > > > > > > Based on > > > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch > > > by Eric Chanudet . > > > > > > Signed-off-by: Simon Gaiser > > > Signed-off-by: Marek Marczykowski-Górecki > > > > > > --- (...) > > > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c > > > index 8b44d6c..123ca69 100644 > > > --- a/xen/arch/x86/irq.c > > > +++ b/xen/arch/x86/irq.c > > > @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, > > > int index, int *pirq_p, > > > { > > > case MAP_PIRQ_TYPE_MULTI_MSI: > > > irq = create_irq(NUMA_NO_NODE); > > > +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) && > > > > This check is already performed below, maybe you could re-arrange the > > code as: > > > > case MAP_PIRQ_TYPE_MULTI_MSI: > > irq = create_irq(NUMA_NO_NODE); > > } > > > > if ( irq < nr_irqs_gsi || irq >= nr_irqs ) > > { > > dprintk(XENLOG_G_ERR, "dom%d: can't create irq for msi!\n", > > d->domain_id); > > return -EINVAL; > > } > > if ( current->domain->target == d ) > > ... > > > > But I wonder whether it would be better to place the irq_permit_access > > in map_domain_pirq, together with the existing irq_permit_access that > > grant the target domain permissions over the irq. > > That may be a good idea. Let me try that in v3. But I'll wait for a > feedback on libxl patches first. That doesn't work, as map_domain_pirq() check irq access earlier. Which bring be to a question, what should be rules guarding stubdomain access to PHYSDEVOP_map_pirq? With this patch, stubdomain will be able to create and map multiple irq (DoS possibility?), as only target domain is validated in practice. Is that ok? If not, what additional limits could be applied here? In INTx case the problem doesn't apply, because toolstack grant access to particular IRQ and no allocation happen on stubdomain request. But in MSI case, it isn't that easy as IRQ number isn't known before (as explained in the commit message). -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.
On Fri, Jan 25, 2019 at 08:43:59PM +0100, Marek Marczykowski-Górecki wrote: > On Wed, Jan 16, 2019 at 11:52:21AM +0100, Marek Marczykowski-Górecki wrote: > > On Wed, Jan 16, 2019 at 10:21:29AM +0100, Roger Pau Monné wrote: > > > On Tue, Jan 15, 2019 at 04:36:31PM +0100, Marek Marczykowski-Górecki > > > wrote: > > > > From: Simon Gaiser > > > > > > > > Stubdomains need to be given sufficient privilege over the guest which > > > > it > > > > provides emulation for in order for PCI passthrough to work correctly. > > > > When a HVM domain try to enable MSI, QEMU in stubdomain calls > > > > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as > > > > part of xc_domain_update_msi_irq. Allow for that as part of > > > > PHYSDEVOP_map_pirq. > > > > > > I see, that's not a problem AFAICT for PCI INTx because the IRQ in > > > that case is known beforehand, and the stubdomain is given permissions > > > over this IRQ by libxl__device_pci_add (there's a do_pci_add against > > > the stubdomain). > > > > Exactly. > > > > > > > > > > Based on > > > > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch > > > > by Eric Chanudet . > > > > > > > > Signed-off-by: Simon Gaiser > > > > Signed-off-by: Marek Marczykowski-Górecki > > > > > > > > --- > (...) > > > > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c > > > > index 8b44d6c..123ca69 100644 > > > > --- a/xen/arch/x86/irq.c > > > > +++ b/xen/arch/x86/irq.c > > > > @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, > > > > int index, int *pirq_p, > > > > { > > > > case MAP_PIRQ_TYPE_MULTI_MSI: > > > > irq = create_irq(NUMA_NO_NODE); > > > > +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) && > > > > > > This check is already performed below, maybe you could re-arrange the > > > code as: > > > > > > case MAP_PIRQ_TYPE_MULTI_MSI: > > > irq = create_irq(NUMA_NO_NODE); > > > } > > > > > > if ( irq < nr_irqs_gsi || irq >= nr_irqs ) > > > { > > > dprintk(XENLOG_G_ERR, "dom%d: can't create irq for msi!\n", > > > d->domain_id); > > > return -EINVAL; > > > } > > > if ( current->domain->target == d ) > > > ... Oh, and this won't fly either, as irq_permit_access() should happen only when create_irq() was called, otherwise it will give access to arbitrary irq. > > > > > > But I wonder whether it would be better to place the irq_permit_access > > > in map_domain_pirq, together with the existing irq_permit_access that > > > grant the target domain permissions over the irq. > > > > That may be a good idea. Let me try that in v3. But I'll wait for a > > feedback on libxl patches first. > > That doesn't work, as map_domain_pirq() check irq access earlier. > Which bring be to a question, what should be rules guarding stubdomain > access to PHYSDEVOP_map_pirq? With this patch, stubdomain will be able > to create and map multiple irq (DoS possibility?), as only target domain > is validated in practice. Is that ok? If not, what additional limits > could be applied here? > In INTx case the problem doesn't apply, because toolstack grant access > to particular IRQ and no allocation happen on stubdomain request. But in > MSI case, it isn't that easy as IRQ number isn't known before (as > explained in the commit message). > -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] libxl: attach PCI device to qemu only after setting pciback/pcifront
On Thu, Jan 17, 2019 at 11:29:59AM +0100, Roger Pau Monné wrote: > On Tue, Jan 15, 2019 at 04:36:29PM +0100, Marek Marczykowski-Górecki wrote: > > When qemu is running in stubdomain, handling "pci-ins" command will fail > > if pcifront is not initialized already. Fix this by sending such command > > only after confirming that pciback/front is running. > > > > Signed-off-by: Marek Marczykowski-Górecki > > --- > > Changes in v2: > > - Fixed code style since previous version. > > --- > > tools/libxl/libxl_pci.c | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c > > index 3b6b23c..1bde537 100644 > > --- a/tools/libxl/libxl_pci.c > > +++ b/tools/libxl/libxl_pci.c > > @@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t > > domid, libxl_device_pci *pcide > > { > > libxl_ctx *ctx = libxl__gc_owner(gc); > > unsigned int orig_vdev, pfunc_mask; > > +char *be_path; > > libxl_device_pci *assigned; > > int num_assigned, i, rc; > > int stubdomid = 0; > > @@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t > > domid, libxl_device_pci *pcide > > rc = do_pci_add(gc, stubdomid, &pcidev_s, 0); > > if ( rc ) > > goto out; > > +/* Wait for the device actually being connected, otherwise device > > model > > + * running there will fail to find the device. */ > > +be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0", > > + libxl__xs_get_dompath(gc, 0), stubdomid); > > +rc = libxl__wait_for_backend(gc, be_path, > > + GCSPRINTF("%d", > > XenbusStateConnected)); > > +if (rc) > > +goto out; > > I think it would be better to use the async libxl functionality here, > see libxl__xswait_start. I will leave for the toolstack maintainers to > decide. Apart from that the change seems correct. libxl__device_pci_add() is not async-aware right now and it looks like converting it is quite a bit of work. I'd leave it out of scope for this patch series... -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 1/6] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
HVM domains use IOMMU and device model assistance for communicating with PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain. But pciback serve also second function - it reset the device when it is deassigned from the guest and for this reason pciback needs to be used with HVM domain too. When HVM domain has device model in stubdomain, attaching xen-pciback to the target domain itself may prevent attaching xen-pciback to the (PV) stubdomain, effectively breaking PCI passthrough. Fix this by attaching pciback only to one domain: if PV stubdomain is in use, let it be stubdomain (the commit prevents attaching device to target HVM in this case); otherwise, attach it to the target domain. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - previously called "libxl: attach xen-pciback only to PV domains" - instead of excluding all HVMs, change the condition to what actually matters here - check if stubdomain is in use; this way xen-pciback is always in use (either for the target domain, or it's stubdomain), fixing PCI reset by xen-pciback concerns Changes in v3: - adjust commit message --- tools/libxl/libxl_pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 87afa03..3b6b23c 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1106,7 +1106,7 @@ out: } } -if (!starting) +if (!starting && !libxl_get_stubdom_id(CTX, domid)) rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting); else rc = 0; @@ -1302,7 +1302,7 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid, } } -if (d_config->num_pcidevs > 0) { +if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) { rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs, d_config->num_pcidevs); if (rc < 0) { -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 0/6] Fix PCI passthrough for HVM with stubdomain
In this version, I add PHYSDEVOP_msi_msix_set_enable to allow stubdomain enabling MSI after mapping it. This patch is rather RFC and probably will need adjustments (see comments after commit message there), if physdevop will be the way to go. Related article: https://www.qubes-os.org/news/2017/10/18/msi-support/ Changes in v2: - new "xen/x86: Allow stubdom access to irq created for msi" patch - applied review comments from v1 Changes is v3: - apply suggestions by Roger - add PHYSDEVOP_msi_msix_set_enable Marek Marczykowski-Górecki (5): libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use libxl: attach PCI device to qemu only after setting pciback/pcifront libxl: don't try to manipulate json config for stubdomain xen/x86: add PHYSDEVOP_msi_msix_set_enable tools/libxc: add wrapper for PHYSDEVOP_msi_msix_set_enable Simon Gaiser (1): xen/x86: Allow stubdom access to irq created for msi. tools/libxc/include/xenctrl.h | 7 - tools/libxc/xc_physdev.c | 21 - tools/libxl/libxl_pci.c | 61 xen/arch/x86/irq.c| 23 ++- xen/arch/x86/msi.c| 16 +- xen/arch/x86/physdev.c| 33 +++- xen/include/asm-x86/msi.h | 1 +- xen/include/public/physdev.h | 13 - 8 files changed, 155 insertions(+), 20 deletions(-) base-commit: 93a62c544e20ba9e141e411bbaae3d65259d13a3 -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 4/6] xen/x86: Allow stubdom access to irq created for msi.
From: Simon Gaiser Stubdomains need to be given sufficient privilege over the guest which it provides emulation for in order for PCI passthrough to work correctly. When a HVM domain try to enable MSI, QEMU in stubdomain calls PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as part of xc_domain_update_msi_irq. Allow for that as part of PHYSDEVOP_map_pirq. This is not needed for PCI INTx, because IRQ in that case is known beforehand and the stubdomain is given permissions over this IRQ by libxl__device_pci_add (there's a do_pci_add against the stubdomain). Based on https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet . Signed-off-by: Simon Gaiser Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - extend commit message With this patch, stubdomain will be able to create and map multiple irq (DoS possibility?), as only target domain is validated in practice. Is that ok? If not, what additional limits could be applied here? In INTx case the problem doesn't apply, because toolstack grant access to particular IRQ and no allocation happen on stubdomain request. But in MSI case, it isn't that easy as IRQ number isn't known before (as explained in the commit message). --- xen/arch/x86/irq.c | 23 +++ xen/arch/x86/physdev.c | 9 + 2 files changed, 32 insertions(+) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 8b44d6c..67c67d4 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p, { case MAP_PIRQ_TYPE_MULTI_MSI: irq = create_irq(NUMA_NO_NODE); +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) && +current->domain->target == d ) +{ +ret = irq_permit_access(current->domain, irq); +if ( ret ) { +dprintk(XENLOG_G_ERR, +"dom%d: can't grant it's stubdom (%d) access to " +"irq %d for msi: %d!\n", +d->domain_id, +current->domain->domain_id, +irq, +ret); +return ret; +} +} } if ( irq < nr_irqs_gsi || irq >= nr_irqs ) @@ -2717,7 +2732,15 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p, case MAP_PIRQ_TYPE_MSI: if ( index == -1 ) case MAP_PIRQ_TYPE_MULTI_MSI: +{ +if ( current->domain->target == d && +irq_deny_access(current->domain, irq) ) +dprintk(XENLOG_G_ERR, +"dom%d: can't revoke stubdom's access to irq %d!\n", +d->domain_id, +irq); destroy_irq(irq); +} break; } } diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index 3a3c158..de59e39 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -164,6 +164,15 @@ int physdev_unmap_pirq(domid_t domid, int pirq) pcidevs_lock(); spin_lock(&d->event_lock); +if ( current->domain->target == d) +{ +int irq = domain_pirq_to_irq(d, pirq); +if ( irq <= 0 || irq_deny_access(current->domain, irq) ) +dprintk(XENLOG_G_ERR, +"dom%d: can't revoke stubdom's access to irq %d!\n", +d->domain_id, +irq); +} ret = unmap_domain_pirq(d, pirq); spin_unlock(&d->event_lock); pcidevs_unlock(); -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 5/6] xen/x86: add PHYSDEVOP_msi_msix_set_enable
Allow device model running in stubdomain to enable/disable MSI(-X), bypassing pciback. While pciback is still used to access config space from within stubdomain, it refuse to write to PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE in non-permissive mode. Which is the right thing to do for PV domain (the main use case for pciback), as PV domain should use XEN_PCI_OP_* commands for that. Unfortunately those commands are not good for stubdomain use, as they configure MSI in dom0's kernel too, which should not happen for HVM domain. This new physdevop is allowed only for stubdomain controlling the domain which own the device. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - new patch This is rather RFC. Any suggestions for shorter name? Also, I'm not sure if physdev_msi_msix_set_enable.flag is the best name/idea. Should it be plugged into XSM? Any suggestions how exactly? New function with XSM_DM_PRIV default action? Should it get target domain only, or also machine_bdf? --- xen/arch/x86/msi.c | 16 xen/arch/x86/physdev.c | 24 xen/include/asm-x86/msi.h| 1 + xen/include/public/physdev.h | 13 + 4 files changed, 54 insertions(+) diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c index babc414..9ba934c 100644 --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1474,6 +1474,22 @@ int pci_restore_msi_state(struct pci_dev *pdev) return 0; } +int msi_msix_set_enable(struct pci_dev *pdev, int flag, int enable) +{ +if ( !current->domain->target || pdev->domain != current->domain->target ) +return -EPERM; + +switch ( flag ) { +case PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSI: +msi_set_enable(pdev, enable); +break; +case PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSIX: +msix_set_enable(pdev, enable); +break; +} +return 0; +} + void __init early_msi_init(void) { if ( use_msi < 0 ) diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index de59e39..822846a 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -671,6 +671,30 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case PHYSDEVOP_msi_msix_set_enable: { +struct physdev_msi_msix_set_enable op; +struct pci_dev *pdev; + +ret = -EFAULT; +if ( copy_from_guest(&op, arg, 1) ) +break; + +ret = -EINVAL; +if ( op.flag != PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSI && +op.flag != PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSIX ) +break; + +pcidevs_lock(); +pdev = pci_get_pdev(op.pci.seg, op.pci.bus, op.pci.devfn); +if ( pdev ) +ret = msi_msix_set_enable(pdev, op.flag, !!op.enable); +else +ret = -ENODEV; +pcidevs_unlock(); +break; + +} + default: ret = -ENOSYS; break; diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h index 10387dc..080bf24 100644 --- a/xen/include/asm-x86/msi.h +++ b/xen/include/asm-x86/msi.h @@ -252,5 +252,6 @@ void guest_mask_msi_irq(struct irq_desc *, bool mask); void ack_nonmaskable_msi_irq(struct irq_desc *); void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector); void set_msi_affinity(struct irq_desc *, const cpumask_t *); +int msi_msix_set_enable(struct pci_dev *pdev, int flag, int enable); #endif /* __ASM_MSI_H */ diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h index b6faf83..fd797c6 100644 --- a/xen/include/public/physdev.h +++ b/xen/include/public/physdev.h @@ -344,6 +344,19 @@ struct physdev_dbgp_op { typedef struct physdev_dbgp_op physdev_dbgp_op_t; DEFINE_XEN_GUEST_HANDLE(physdev_dbgp_op_t); +#define PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSI 0 +#define PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSIX 1 + +#define PHYSDEVOP_msi_msix_set_enable 32 +struct physdev_msi_msix_set_enable { +/* IN */ +struct physdev_pci_device pci; +uint8_t flag; +uint8_t enable; +}; +typedef struct physdev_msi_msix_set_enable physdev_msi_msix_set_enable_t; +DEFINE_XEN_GUEST_HANDLE(physdev_msi_msix_set_enable_t); + /* * Notify that some PIRQ-bound event channels have been unmasked. * ** This command is obsolete since interface version 0x00030202 and is ** -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 3/6] libxl: don't try to manipulate json config for stubdomain
Stubdomain do not have it's own config file - its configuration is derived from target domains. Do not try to manipulate it when attaching PCI device. This bug prevented starting HVM with stubdomain and PCI passthrough device attached. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - skip libxl__dm_check_start too, as stubdomain is guaranteed to be running at this stage already - do not init d_config at all, as it is used only for json manipulation --- tools/libxl/libxl_pci.c | 48 ++ 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 1bde537..8d159cf 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -120,10 +120,14 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d libxl_domain_config d_config; libxl_device_pci pcidev_saved; libxl__domain_userdata_lock *lock = NULL; +bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL); -libxl_domain_config_init(&d_config); -libxl_device_pci_init(&pcidev_saved); -libxl_device_pci_copy(CTX, &pcidev_saved, pcidev); +/* Stubdomain doesn't have own config. */ +if (!is_stubdomain) { +libxl_domain_config_init(&d_config); +libxl_device_pci_init(&pcidev_saved); +libxl_device_pci_copy(CTX, &pcidev_saved, pcidev); +} be_path = libxl__domain_device_backend_path(gc, 0, domid, 0, LIBXL__DEVICE_KIND_PCI); @@ -152,27 +156,33 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d GCNEW(device); libxl__device_from_pcidev(gc, domid, pcidev, device); -lock = libxl__lock_domain_userdata(gc, domid); -if (!lock) { -rc = ERROR_LOCK_FAIL; -goto out; -} +/* Stubdomain config is derived from its target domain, it doesn't have + its own file */ +if (!is_stubdomain) { +lock = libxl__lock_domain_userdata(gc, domid); +if (!lock) { +rc = ERROR_LOCK_FAIL; +goto out; +} -rc = libxl__get_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +rc = libxl__get_domain_configuration(gc, domid, &d_config); +if (rc) goto out; -device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, - &pcidev_saved); +device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, + &pcidev_saved); -rc = libxl__dm_check_start(gc, &d_config, domid); -if (rc) goto out; +rc = libxl__dm_check_start(gc, &d_config, domid); +if (rc) goto out; +} for (;;) { rc = libxl__xs_transaction_start(gc, &t); if (rc) goto out; -rc = libxl__set_domain_configuration(gc, domid, &d_config); -if (rc) goto out; +if (lock) { +rc = libxl__set_domain_configuration(gc, domid, &d_config); +if (rc) goto out; +} libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back)); @@ -184,8 +194,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d out: libxl__xs_transaction_abort(gc, &t); if (lock) libxl__unlock_domain_userdata(lock); -libxl_device_pci_dispose(&pcidev_saved); -libxl_domain_config_dispose(&d_config); +if (!is_stubdomain) { +libxl_device_pci_dispose(&pcidev_saved); +libxl_domain_config_dispose(&d_config); +} return rc; } -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 2/6] libxl: attach PCI device to qemu only after setting pciback/pcifront
When qemu is running in stubdomain, handling "pci-ins" command will fail if pcifront is not initialized already. Fix this by sending such command only after confirming that pciback/front is running. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - Fixed code style since previous version. --- tools/libxl/libxl_pci.c | 9 + 1 file changed, 9 insertions(+) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 3b6b23c..1bde537 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcide { libxl_ctx *ctx = libxl__gc_owner(gc); unsigned int orig_vdev, pfunc_mask; +char *be_path; libxl_device_pci *assigned; int num_assigned, i, rc; int stubdomid = 0; @@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcide rc = do_pci_add(gc, stubdomid, &pcidev_s, 0); if ( rc ) goto out; +/* Wait for the device actually being connected, otherwise device model + * running there will fail to find the device. */ +be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0", + libxl__xs_get_dompath(gc, 0), stubdomid); +rc = libxl__wait_for_backend(gc, be_path, + GCSPRINTF("%d", XenbusStateConnected)); +if (rc) +goto out; } orig_vdev = pcidev->vdevfn & ~7U; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 6/6] tools/libxc: add wrapper for PHYSDEVOP_msi_msix_set_enable
Add libxc wrapper for PHYSDEVOP_msi_msix_set_enable introduced in previous commit. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - new patch --- tools/libxc/include/xenctrl.h | 7 +++ tools/libxc/xc_physdev.c | 21 + 2 files changed, 28 insertions(+) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 31cdda7..2b86f4c 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1639,6 +1639,13 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_physdev_msi_msix_set_enable(xc_interface *xch, + int seg, + int bus, + int devfn, + int flag, + int enable); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libxc/xc_physdev.c b/tools/libxc/xc_physdev.c index 460a8e7..3530cb8 100644 --- a/tools/libxc/xc_physdev.c +++ b/tools/libxc/xc_physdev.c @@ -111,3 +111,24 @@ int xc_physdev_unmap_pirq(xc_interface *xch, return rc; } +int xc_physdev_msi_msix_set_enable(xc_interface *xch, + int seg, + int bus, + int devfn, + int flag, + int enable) +{ +int rc; +struct physdev_msi_msix_set_enable op; + +memset(&op, 0, sizeof(struct physdev_msi_msix_set_enable)); +op.pci.seg = seg; +op.pci.bus = bus; +op.pci.devfn = devfn; +op.flag = flag; +op.enable = enable; + +rc = do_physdev_op(xch, PHYSDEVOP_msi_msix_set_enable, &op, sizeof(op)); + +return rc; +} -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 4/6] xen/x86: Allow stubdom access to irq created for msi.
On Mon, Jan 28, 2019 at 02:50:00PM +, Wei Liu wrote: > On Sat, Jan 26, 2019 at 03:31:15AM +0100, Marek Marczykowski-Górecki wrote: > > From: Simon Gaiser > > > > Stubdomains need to be given sufficient privilege over the guest which it > > provides emulation for in order for PCI passthrough to work correctly. > > When a HVM domain try to enable MSI, QEMU in stubdomain calls > > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as > > part of xc_domain_update_msi_irq. Allow for that as part of > > PHYSDEVOP_map_pirq. > > > > This is not needed for PCI INTx, because IRQ in that case is known > > beforehand and the stubdomain is given permissions over this IRQ by > > libxl__device_pci_add (there's a do_pci_add against the stubdomain). > > > > Based on > > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch > > by Eric Chanudet . > > > > Signed-off-by: Simon Gaiser > > Signed-off-by: Marek Marczykowski-Górecki > > --- > > Changes in v3: > > - extend commit message > > > > With this patch, stubdomain will be able to create and map multiple irq > > (DoS possibility?), as only target domain is validated in practice. Is > > that ok? If not, what additional limits could be applied here? > > In INTx case the problem doesn't apply, because toolstack grant access > > to particular IRQ and no allocation happen on stubdomain request. But in > > MSI case, it isn't that easy as IRQ number isn't known before (as > > explained in the commit message). > > --- > > xen/arch/x86/irq.c | 23 +++ > > xen/arch/x86/physdev.c | 9 + > > 2 files changed, 32 insertions(+) > > > > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c > > index 8b44d6c..67c67d4 100644 > > --- a/xen/arch/x86/irq.c > > +++ b/xen/arch/x86/irq.c > > @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, int > > index, int *pirq_p, > > { > > case MAP_PIRQ_TYPE_MULTI_MSI: > > irq = create_irq(NUMA_NO_NODE); > > +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) && > > +current->domain->target == d ) > > +{ > > +ret = irq_permit_access(current->domain, irq); > > +if ( ret ) { > > +dprintk(XENLOG_G_ERR, > > +"dom%d: can't grant it's stubdom (%d) access > > to " > > + "irq %d for msi: %d!\n", > > +d->domain_id, > > +current->domain->domain_id, > > +irq, > > +ret); > > +return ret; > > Don't you need to deallocate the irq before returning? Yes, indeed. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 3/6] libxl: don't try to manipulate json config for stubdomain
On Mon, Jan 28, 2019 at 02:41:15PM +, Wei Liu wrote: > On Sat, Jan 26, 2019 at 03:31:14AM +0100, Marek Marczykowski-Górecki wrote: > > Stubdomain do not have it's own config file - its configuration is > > derived from target domains. Do not try to manipulate it when attaching > > PCI device. > > > > So if we add the same configuration to stubdom as well, what would > happen? I guess libxl will try to attach the PCI devices to both the > stubdom and DomU? I'm not sure if I understand you here. You mean adding configuration file for stubdomain, the one managed by libxl__get_domain_configuration/libxl__set_domain_configuration? In theory it would work just fine, but in practice I fear all kind of desynchronization bugs, like adding device to target domain's config, but not stubdomain's one in some failure handling case. We'd have 4 things to care about: - attaching device to target domain - attaching device to stubdomain - saving device to target domain's config - saving device to stubdomain's config Handling all the failure cases properly will become quite complex, especially if we add async callbacks into the mix. Since stubdomain config is deterministically build based on target domian's config, I don't think adding such complexity is a good idea. > > This bug prevented starting HVM with stubdomain and PCI passthrough > > device attached. > > > > Signed-off-by: Marek Marczykowski-Górecki > > --- > > Changes in v3: > > - skip libxl__dm_check_start too, as stubdomain is guaranteed to be > >running at this stage already > > - do not init d_config at all, as it is used only for json manipulation > > --- > > tools/libxl/libxl_pci.c | 48 ++ > > 1 file changed, 30 insertions(+), 18 deletions(-) > > > > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c > > index 1bde537..8d159cf 100644 > > --- a/tools/libxl/libxl_pci.c > > +++ b/tools/libxl/libxl_pci.c > > @@ -120,10 +120,14 @@ static int libxl__device_pci_add_xenstore(libxl__gc > > *gc, uint32_t domid, libxl_d > > libxl_domain_config d_config; > > libxl_device_pci pcidev_saved; > > libxl__domain_userdata_lock *lock = NULL; > > +bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL); > > > > -libxl_domain_config_init(&d_config); > > -libxl_device_pci_init(&pcidev_saved); > > -libxl_device_pci_copy(CTX, &pcidev_saved, pcidev); > > +/* Stubdomain doesn't have own config. */ > > +if (!is_stubdomain) { > > +libxl_domain_config_init(&d_config); > > +libxl_device_pci_init(&pcidev_saved); > > +libxl_device_pci_copy(CTX, &pcidev_saved, pcidev); > > +} > > > > be_path = libxl__domain_device_backend_path(gc, 0, domid, 0, > > LIBXL__DEVICE_KIND_PCI); > > @@ -152,27 +156,33 @@ static int libxl__device_pci_add_xenstore(libxl__gc > > *gc, uint32_t domid, libxl_d > > GCNEW(device); > > libxl__device_from_pcidev(gc, domid, pcidev, device); > > > > -lock = libxl__lock_domain_userdata(gc, domid); > > -if (!lock) { > > -rc = ERROR_LOCK_FAIL; > > -goto out; > > -} > > +/* Stubdomain config is derived from its target domain, it doesn't have > > + its own file */ > > Although comment style isn't specified in CODING_STYLE, I would like to > fix this to > > /* > * Stubdom ... > * ... own file. > */ Ok. > > +if (!is_stubdomain) { > > +lock = libxl__lock_domain_userdata(gc, domid); > > +if (!lock) { > > +rc = ERROR_LOCK_FAIL; > > +goto out; > > +} > > > > -rc = libxl__get_domain_configuration(gc, domid, &d_config); > > -if (rc) goto out; > > +rc = libxl__get_domain_configuration(gc, domid, &d_config); > > +if (rc) goto out; > > > > -device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, > > - &pcidev_saved); > > +device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype, > > + &pcidev_saved); > > > > -rc = libxl__dm_check_start(gc, &d_config, domid); > > -if (rc) goto out; > > +rc = libxl__dm_check_start(gc, &d_config, domid); > > +if (rc) goto out; > > +} > > > > for (;;) { > >
[Xen-devel] [PATCH v3 13/17] libxl: use vchan for QMP access with Linux stubdomain, non-async version
Access to QMP of QEMU in Linux stubdomain is possible over vchan connection. Add appropriate handling to synchronous API. Since only one client can be connected to vchan server at the same time and it is not enforced by the libxenvchan itself, additional client-side locking is needed. Note that qemu supports only one simultaneous client on a control socket anyway (but in UNIX socket case, it enforce it server-side, so it doesn't add any extra limitation. Ideally, I woudn't need (or want) this part, and would communicate with stubdomain only with async API. But some libxl public functions that are not async-compatible do need to call qmp commands (for example libxl_domain_unpause). Alternative to this patch, would be return error, breaking all such functions, and incrementally convert them to async API. Signed-off-by: Marek Marczykowski-Górecki --- Two TODOs here: - handle locking, similarly to previous patch - select() qmp_next() can now be triggered by malicious stubdomain without actually sending any data, which would evade timeout enforcing (as select() each time is called with full timeout); the easiest thing to do, would be to re-use timeout value from previous select() call, but that works only on Linux; any better idea? Note that even without using vchan, malicious qemu could cause qmp_next() to wait almost indefinitely - by writing one byte at a time and never finishing the message Changes in v3: - new patch, in place of "libxl: access QMP socket via console for qemu-in-stubdomain" --- tools/libxl/libxl_qmp.c | 61 -- 1 file changed, 47 insertions(+), 14 deletions(-) diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 45b9f74..19ce3ce 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -113,6 +113,7 @@ typedef struct callback_id_pair { struct libxl__qmp_handler { int qmp_fd; +struct libxenvchan *vchan; bool connected; time_t timeout; /* wait_for_id will be used by the synchronous send function */ @@ -496,7 +497,10 @@ static void qmp_close(libxl__qmp_handler *qmp) callback_id_pair *pp = NULL; callback_id_pair *tmp = NULL; -close(qmp->qmp_fd); +if (qmp->vchan) +libxenvchan_close(qmp->vchan); +else +close(qmp->qmp_fd); LIBXL_STAILQ_FOREACH(pp, &qmp->callback_list, next) { free(tmp); tmp = pp; @@ -516,7 +520,7 @@ static int qmp_next(libxl__gc *gc, libxl__qmp_handler *qmp) do { fd_set rfds; -int ret = 0; +int ret = 1; /* used when select() is skipped */ struct timeval timeout = { .tv_sec = qmp->timeout, .tv_usec = 0, @@ -525,7 +529,8 @@ static int qmp_next(libxl__gc *gc, libxl__qmp_handler *qmp) FD_ZERO(&rfds); FD_SET(qmp->qmp_fd, &rfds); -ret = select(qmp->qmp_fd + 1, &rfds, NULL, NULL, &timeout); +if (!(qmp->vchan && libxenvchan_data_ready(qmp->vchan))) +ret = select(qmp->qmp_fd + 1, &rfds, NULL, NULL, &timeout); if (ret == 0) { LOGD(ERROR, qmp->domid, "timeout"); return -1; @@ -536,7 +541,14 @@ static int qmp_next(libxl__gc *gc, libxl__qmp_handler *qmp) return -1; } -rd = read(qmp->qmp_fd, qmp->buffer, QMP_RECEIVE_BUFFER_SIZE); +if (qmp->vchan) { +libxenvchan_wait(qmp->vchan); +if (!libxenvchan_data_ready(qmp->vchan)) +continue; +rd = libxenvchan_read(qmp->vchan, qmp->buffer, QMP_RECEIVE_BUFFER_SIZE); +} else { +rd = read(qmp->qmp_fd, qmp->buffer, QMP_RECEIVE_BUFFER_SIZE); +} if (rd == 0) { LOGD(ERROR, qmp->domid, "Unexpected end of socket"); return -1; @@ -684,9 +696,15 @@ static int qmp_send(libxl__qmp_handler *qmp, goto out; } -if (libxl_write_exactly(qmp->ctx, qmp->qmp_fd, buf, strlen(buf), -"QMP command", "QMP socket")) -goto out; +if (qmp->vchan) { +/* vchan->blocking == 1, so no need to wrap it in a loop */ +if (libxenvchan_write(qmp->vchan, buf, strlen(buf)) == -1) +goto out; +} else { +if (libxl_write_exactly(qmp->ctx, qmp->qmp_fd, buf, strlen(buf), +"QMP command", "QMP socket")) +goto out; +} rc = qmp->last_id_used; out: @@ -798,19 +816,34 @@ libxl__qmp_handler *libxl__qmp_initialize(libxl__gc *gc, uint32_t domid) { int ret = 0; libxl__qmp_handler *qmp = NULL; -char *qmp_socket; +int dm_domid; +char *qmp_path; qmp = qmp_init_handler(gc, domid); if (!qmp) return NULL; -qmp_socket = GCSPRINTF(&
[Xen-devel] [PATCH v3 05/17] libxl: Handle Linux stubdomain specific QEMU options.
From: Eric Shelton This patch creates an appropriate command line for the QEMU instance running in a Linux-based stubdomain. NOTE: a number of items are not currently implemented for Linux-based stubdomains, such as: - save/restore - QMP socket - graphics output (e.g., VNC) Signed-off-by: Eric Shelton Simon: * fix disk path * fix cdrom path and "format" * pass downscript for network interfaces Signed-off-by: Simon Gaiser [drop Qubes-specific parts] Signed-off-by: Marek Marczykowski-Górecki --- Changes in v2: - fix serial specified with serial=[ ... ] syntax - error out on multiple consoles (incompatible with stubdom) - drop erroneous chunk about cdrom Changes in v3: - change to use libxl__stubdomain_is_linux instead of b_info->stubdomain_version - drop libxl__stubdomain_version_running, prefer libxl__stubdomain_is_linux_running introduced by previous patch - drop ifup/ifdown script - stubdomain will handle that with qemu events itself - slightly simplify -serial argument - add support for multiple serial consoles, do not ignore b_info.u.serial(_list) - add error checking for more than 26 emulated disks ("/dev/xvd%c" format string) --- tools/libxl/libxl_create.c | 40 +++- tools/libxl/libxl_dm.c | 176 +--- tools/libxl/libxl_internal.h | 1 +- tools/libxl/libxl_mem.c | 6 +- tools/libxl/libxl_types.idl | 3 +- 5 files changed, 171 insertions(+), 55 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index bb62542..13fc304 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -160,6 +160,31 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, } } +if (b_info->type == LIBXL_DOMAIN_TYPE_HVM && +libxl_defbool_val(b_info->device_model_stubdomain)) { +if (!b_info->stubdomain_kernel) { +switch (b_info->device_model_version) { +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: +b_info->stubdomain_kernel = +libxl__abs_path(NOGC, "ioemu-stubdom.gz", libxl__xenfirmwaredir_path()); +b_info->stubdomain_ramdisk = NULL; +break; +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: +b_info->stubdomain_kernel = +libxl__abs_path(NOGC, +"stubdom-linux-kernel", +libxl__xenfirmwaredir_path()); +b_info->stubdomain_ramdisk = +libxl__abs_path(NOGC, +"stubdom-linux-rootfs", +libxl__xenfirmwaredir_path()); +break; +default: +abort(); +} +} +} + if (!b_info->max_vcpus) b_info->max_vcpus = 1; if (!b_info->avail_vcpus.size) { @@ -195,6 +220,17 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT) b_info->target_memkb = b_info->max_memkb; +if (b_info->stubdomain_memkb == LIBXL_MEMKB_DEFAULT) { +if (libxl_defbool_val(b_info->device_model_stubdomain)) { +if (libxl__stubdomain_is_linux(b_info)) +b_info->stubdomain_memkb = LIBXL_LINUX_STUBDOM_MEM * 1024; +else +b_info->stubdomain_memkb = 28 * 1024; // MiniOS +} else { +b_info->stubdomain_memkb = 0; // no stubdomain +} +} + libxl_defbool_setdefault(&b_info->claim_mode, false); libxl_defbool_setdefault(&b_info->localtime, false); @@ -1538,7 +1574,9 @@ static void domcreate_devmodel_started(libxl__egc *egc, if (dcs->sdss.dm.guest_domid) { if (d_config->b_info.device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) { -libxl__qmp_initializations(gc, domid, d_config); +if (!libxl_defbool_val(d_config->b_info.device_model_stubdomain)) { +libxl__qmp_initializations(gc, domid, d_config); +} } } diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 3601b14..1fa4fa8 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1169,6 +1169,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc, int i, connection, devid; uint64_t ram_size; const char *path, *chardev; +bool is_stubdom = libxl_defbool_val(b_info->device_model_stubdomain); dm_args = flexarray_make(gc, 16, 1); dm_envs = flexarray_make(gc, 16, 1); @@ -1179,30 +1180,33 @@ static int libxl__build_device_model_args_new(libxl__gc *gc, "-xen-domid", GCSPRINTF("%d", guest_domid), NULL);
[Xen-devel] [PATCH v3 09/17] tools/libvchan: notify server when client is connected
Let the server know when the client is connected. Otherwise server will notice only when client send some data. This change does not break existing clients, as libvchan user should handle spurious notifications anyway (for example acknowledge of remote side reading the data). Signed-off-by: Marek Marczykowski-Górecki --- I had this patch in Qubes for a long time and totally forgot it wasn't upstream thing... --- tools/libvchan/init.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/libvchan/init.c b/tools/libvchan/init.c index 180833d..50a64c1 100644 --- a/tools/libvchan/init.c +++ b/tools/libvchan/init.c @@ -447,6 +447,9 @@ struct libxenvchan *libxenvchan_client_init(struct xentoollog_logger *logger, ctrl->ring->cli_live = 1; ctrl->ring->srv_notify = VCHAN_NOTIFY_WRITE; +/* wake up the server */ +xenevtchn_notify(ctrl->event, ctrl->event_port); + out: if (xs) xs_daemon_close(xs); -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 08/17] xl: add stubdomain related options to xl config parser
Signed-off-by: Marek Marczykowski-Górecki Reviewed-by: Jason Andryuk --- docs/man/xl.cfg.5.pod.in | 23 +++ tools/xl/xl_parse.c | 7 +++ 2 files changed, 26 insertions(+), 4 deletions(-) diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in index 3b92f39..f475196 100644 --- a/docs/man/xl.cfg.5.pod.in +++ b/docs/man/xl.cfg.5.pod.in @@ -2694,10 +2694,25 @@ model which they were installed with. =item B -Override the path to the binary to be used as the device-model. The -binary provided here MUST be consistent with the -B which you have specified. You should not -normally need to specify this option. +Override the path to the binary to be used as the device-model running in +toolstack domain. The binary provided here MUST be consistent with the +B which you have specified. You should not normally need +to specify this option. + +=item B + +Override the path to the kernel image used as device-model stubdomain. +The binary provided here MUST be consistent with the +B which you have specified. +In case of B it is expected to be MiniOS-based stubdomain +image, in case of B it is expected to be Linux-based stubdomain +kernel. + +=item B + +Override the path to the ramdisk image used as device-model stubdomain. +The binary provided here is to be used by a kernel pointed by B. +It is known to be used only by Linux-based stubdomain kernel. =item B diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index 352cd21..77e4cf0 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -2502,6 +2502,13 @@ skip_usbdev: xlu_cfg_replace_string(config, "device_model_user", &b_info->device_model_user, 0); +xlu_cfg_replace_string (config, "stubdomain_kernel", +&b_info->stubdomain_kernel, 0); +xlu_cfg_replace_string (config, "stubdomain_ramdisk", +&b_info->stubdomain_ramdisk, 0); +if (!xlu_cfg_get_long (config, "stubdomain_memory", &l, 0)) +b_info->stubdomain_memkb = l * 1024; + #define parse_extra_args(type)\ e = xlu_cfg_get_list_as_string_list(config, "device_model_args"#type, \ &b_info->extra##type, 0);\ -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 11/17] libxl: move xswait declaration up in libxl_internal.h
It will be needed for qmp_ev_* over vchan. No functional change. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - new patch --- tools/libxl/libxl_internal.h | 108 ++-- 1 file changed, 54 insertions(+), 54 deletions(-) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index a26d36e..0095835 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -358,6 +358,60 @@ struct libxl__ev_child { LIBXL_LIST_ENTRY(struct libxl__ev_child) entry; }; +/*- xswait: wait for a xenstore node to be suitable -*/ + +typedef struct libxl__xswait_state libxl__xswait_state; + +/* + * rc describes the circumstances of this callback: + * + * rc==0 + * + * The xenstore path (may have) changed. It has been read for + * you. The result is in data (allocated from the ao gc). + * data may be NULL, which means that the xenstore read gave + * ENOENT. + * + * If you are satisfied, you MUST call libxl__xswait_stop. + * Otherwise, xswait will continue waiting and watching and + * will call you back later. + * + * rc==ERROR_TIMEDOUT, rc==ERROR_ABORTED + * + * The specified timeout was reached. + * This has NOT been logged (except to the debug log). + * xswait will not continue (but calling libxl__xswait_stop is OK). + * + * rc!=0, !=ERROR_TIMEDOUT, !=ERROR_ABORTED + * + * Some other error occurred. + * This HAS been logged. + * xswait will not continue (but calling libxl__xswait_stop is OK). + * + * xswait.path may start with with '@', in which case no read is done + * and the callback will always get data==0. + */ +typedef void libxl__xswait_callback(libxl__egc *egc, + libxl__xswait_state *xswa, int rc, const char *data); + +struct libxl__xswait_state { +/* caller must fill these in, and they must all remain valid */ +libxl__ao *ao; +const char *what; /* for error msgs: noun phrase, what we're waiting for */ +const char *path; +int timeout_ms; /* as for poll(2) */ +libxl__xswait_callback *callback; +/* remaining fields are private to xswait */ +libxl__ev_time time_ev; +libxl__ev_xswatch watch_ev; +}; + +void libxl__xswait_init(libxl__xswait_state*); +void libxl__xswait_stop(libxl__gc*, libxl__xswait_state*); /*idempotent*/ +bool libxl__xswait_inuse(const libxl__xswait_state *ss); + +int libxl__xswait_start(libxl__gc*, libxl__xswait_state*); + /* * QMP asynchronous calls * @@ -1401,60 +1455,6 @@ _hidden const char *libxl__device_nic_devname(libxl__gc *gc, _hidden int libxl__get_domid(libxl__gc *gc, uint32_t *domid); -/*- xswait: wait for a xenstore node to be suitable -*/ - -typedef struct libxl__xswait_state libxl__xswait_state; - -/* - * rc describes the circumstances of this callback: - * - * rc==0 - * - * The xenstore path (may have) changed. It has been read for - * you. The result is in data (allocated from the ao gc). - * data may be NULL, which means that the xenstore read gave - * ENOENT. - * - * If you are satisfied, you MUST call libxl__xswait_stop. - * Otherwise, xswait will continue waiting and watching and - * will call you back later. - * - * rc==ERROR_TIMEDOUT, rc==ERROR_ABORTED - * - * The specified timeout was reached. - * This has NOT been logged (except to the debug log). - * xswait will not continue (but calling libxl__xswait_stop is OK). - * - * rc!=0, !=ERROR_TIMEDOUT, !=ERROR_ABORTED - * - * Some other error occurred. - * This HAS been logged. - * xswait will not continue (but calling libxl__xswait_stop is OK). - * - * xswait.path may start with with '@', in which case no read is done - * and the callback will always get data==0. - */ -typedef void libxl__xswait_callback(libxl__egc *egc, - libxl__xswait_state *xswa, int rc, const char *data); - -struct libxl__xswait_state { -/* caller must fill these in, and they must all remain valid */ -libxl__ao *ao; -const char *what; /* for error msgs: noun phrase, what we're waiting for */ -const char *path; -int timeout_ms; /* as for poll(2) */ -libxl__xswait_callback *callback; -/* remaining fields are private to xswait */ -libxl__ev_time time_ev; -libxl__ev_xswatch watch_ev; -}; - -void libxl__xswait_init(libxl__xswait_state*); -void libxl__xswait_stop(libxl__gc*, libxl__xswait_state*); /*idempotent*/ -bool libxl__xswait_inuse(const libxl__xswait_state *ss); - -int libxl__xswait_start(libxl__gc*, libxl__xswait_state*); - /* * libxl__ev_devstate - waits a given time for a device to * reach a given state. Follows the libxl_ev_* conventions. -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 15/17] tools: add missing libxenvchan cflags
libxenvchan.h include xenevtchn.h and xengnttab.h, so applications built with it needs applicable -I in CFLAGS too. Signed-off-by: Marek Marczykowski-Górecki --- tools/Rules.mk | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/Rules.mk b/tools/Rules.mk index 12b3129..a7c6c21 100644 --- a/tools/Rules.mk +++ b/tools/Rules.mk @@ -160,7 +160,7 @@ SHDEPS_libxenstat = $(SHLIB_libxenctrl) $(SHLIB_libxenstore) LDLIBS_libxenstat = $(SHDEPS_libxenstat) $(XEN_LIBXENSTAT)/libxenstat$(libextension) SHLIB_libxenstat = $(SHDEPS_libxenstat) -Wl,-rpath-link=$(XEN_LIBXENSTAT) -CFLAGS_libxenvchan = -I$(XEN_LIBVCHAN) +CFLAGS_libxenvchan = -I$(XEN_LIBVCHAN) $(CFLAGS_libxengnttab) $(CFLAGS_libxenevtchn) SHDEPS_libxenvchan = $(SHLIB_libxentoollog) $(SHLIB_libxenstore) $(SHLIB_libxenevtchn) $(SHLIB_libxengnttab) LDLIBS_libxenvchan = $(SHDEPS_libxenvchan) $(XEN_LIBVCHAN)/libxenvchan$(libextension) SHLIB_libxenvchan = $(SHDEPS_libxenvchan) -Wl,-rpath-link=$(XEN_LIBVCHAN) -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 14/17] libxl: add save/restore support for qemu-xen in stubdomain
Rely on a wrapper script in stubdomain to attach FD 3/4 of qemu to relevant consoles. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - adjust for qmp_ev* - assume specific fdset id in qemu set in stubdomain --- tools/libxl/libxl_dm.c | 23 +++ tools/libxl/libxl_qmp.c | 25 - 2 files changed, 35 insertions(+), 13 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index b9a53f3..ce65321 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1715,10 +1715,17 @@ static int libxl__build_device_model_args_new(libxl__gc *gc, } if (state->saved_state) { -/* This file descriptor is meant to be used by QEMU */ -*dm_state_fd = open(state->saved_state, O_RDONLY); -flexarray_append(dm_args, "-incoming"); -flexarray_append(dm_args, GCSPRINTF("fd:%d",*dm_state_fd)); +if (is_stubdom) { +/* Linux stubdomain connects specific FD to STUBDOM_CONSOLE_RESTORE + */ +flexarray_append(dm_args, "-incoming"); +flexarray_append(dm_args, "fd:3"); +} else { +/* This file descriptor is meant to be used by QEMU */ +*dm_state_fd = open(state->saved_state, O_RDONLY); +flexarray_append(dm_args, "-incoming"); +flexarray_append(dm_args, GCSPRINTF("fd:%d",*dm_state_fd)); +} } for (i = 0; b_info->extra && b_info->extra[i] != NULL; i++) flexarray_append(dm_args, b_info->extra[i]); @@ -2192,14 +2199,6 @@ void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss) assert(libxl_defbool_val(guest_config->b_info.device_model_stubdomain)); -if (libxl__stubdomain_is_linux(&guest_config->b_info)) { -if (d_state->saved_state) { -LOG(ERROR, "Save/Restore not supported yet with Linux Stubdom."); -ret = -1; -goto out; -} -} - sdss->pvqemu.guest_domid = 0; libxl_domain_create_info_init(&dm_config->c_info); diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 19ce3ce..7643f15 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -1328,6 +1328,7 @@ static void dm_stopped(libxl__egc *egc, libxl__ev_qmp *ev, const libxl__json_object *response, int rc); static void dm_state_fd_ready(libxl__egc *egc, libxl__ev_qmp *ev, const libxl__json_object *response, int rc); +static void dm_state_save_to_fdset(libxl__egc *egc, libxl__ev_qmp *ev, int fdset); static void dm_state_saved(libxl__egc *egc, libxl__ev_qmp *ev, const libxl__json_object *response, int rc); @@ -1360,10 +1361,17 @@ static void dm_stopped(libxl__egc *egc, libxl__ev_qmp *ev, EGC_GC; libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, qmp); const char *const filename = dsps->dm_savefile; +uint32_t dm_domid = libxl_get_stubdom_id(CTX, dsps->domid); if (rc) goto error; +if (dm_domid) { +/* see Linux stubdom interface in docs/stubdom.txt */ +dm_state_save_to_fdset(egc, ev, 1); +return; +} + ev->payload_fd = open(filename, O_WRONLY | O_CREAT, 0600); if (ev->payload_fd < 0) { LOGED(ERROR, ev->domid, @@ -1394,7 +1402,6 @@ static void dm_state_fd_ready(libxl__egc *egc, libxl__ev_qmp *ev, EGC_GC; int fdset; const libxl__json_object *o; -libxl__json_object *args = NULL; libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, qmp); close(ev->payload_fd); @@ -1409,6 +1416,21 @@ static void dm_state_fd_ready(libxl__egc *egc, libxl__ev_qmp *ev, goto error; } fdset = libxl__json_object_get_integer(o); +dm_state_save_to_fdset(egc, ev, fdset); +return; + +error: +assert(rc); +libxl__remove_file(gc, dsps->dm_savefile); +dsps->callback_device_model_done(egc, dsps, rc); +} + +static void dm_state_save_to_fdset(libxl__egc *egc, libxl__ev_qmp *ev, int fdset) +{ +EGC_GC; +int rc; +libxl__json_object *args = NULL; +libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, qmp); ev->callback = dm_state_saved; @@ -1426,6 +1448,7 @@ static void dm_state_fd_ready(libxl__egc *egc, libxl__ev_qmp *ev, error: assert(rc); +/* TODO: only in non-stubdom case */ libxl__remove_file(gc, dsps->dm_savefile); dsps->callback_device_model_done(egc, dsps, rc); } -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 03/17] libxl: fix qemu-trad cmdline for no sdl/vnc case
When qemu is running in stubdomain, any attempt to initialize vnc/sdl there will crash it (on failed attempt to load a keymap from a file). If vfb is present, all those cases are skipped. But since b053f0c4c9e533f3d97837cf897eb920b8355ed3 "libxl: do not start dom0 qemu for stubdomain when not needed" it is possible to create a stubdomain without vfb and contrary to the comment -vnc none do trigger VNC initialization code (just skips exposing it externally). Change the implicit SDL avoiding method to -nographics option, used when none of SDL or VNC is enabled. Signed-off-by: Marek Marczykowski-Górecki Reviewed-by: Jason Andryuk Acked-by: Ian Jackson Acked-by: Wei Liu --- Changes in v2: - typo in qemu option Changes in v3: - add missing { } --- tools/libxl/libxl_dm.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index b245956..3601b14 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -715,14 +715,15 @@ static int libxl__build_device_model_args_old(libxl__gc *gc, if (libxl_defbool_val(vnc->findunused)) { flexarray_append(dm_args, "-vncunused"); } -} else +} else if (!sdl) { /* * VNC is not enabled by default by qemu-xen-traditional, - * however passing -vnc none causes SDL to not be - * (unexpectedly) enabled by default. This is overridden by - * explicitly passing -sdl below as required. + * however skipping -vnc causes SDL to be + * (unexpectedly) enabled by default. If undesired, disable graphics at + * all. */ -flexarray_append_pair(dm_args, "-vnc", "none"); +flexarray_append(dm_args, "-nographic"); +} if (sdl) { flexarray_append(dm_args, "-sdl"); -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 02/17] Document ioemu Linux stubdomain protocol
Add documentation for upcoming Linux stubdomain for qemu-upstream. Signed-off-by: Marek Marczykowski-Górecki --- docs/misc/stubdom.txt | 50 - 1 file changed, 50 insertions(+) diff --git a/docs/misc/stubdom.txt b/docs/misc/stubdom.txt index 4c524f2..9c94c6b 100644 --- a/docs/misc/stubdom.txt +++ b/docs/misc/stubdom.txt @@ -75,6 +75,56 @@ Defined commands: - "running" - success +Toolstack to Linux ioemu stubdomain protocol + + +This section describe communication protocol between toolstack and +qemu-upstream running in Linux stubdomain. The protocol include +expectations of both stubdomain, and qemu. + +Setup (done by toolstack, expected by stubdomain): + - Block devices for target domain are connected as PV disks to stubdomain, + according to configuration order, starting with xvda + - Network devices for target domain are connected as PV nics to stubdomain, + according to configuration order, starting with 0 + - [not implemented] if graphics output is expected, VFB and VKB devices are set for stubdomain + (its backend is responsible for exposing them using appropriate protocol + like VNC or Spice) + - other target domain's devices are not connected at this point to stubdomain + (may be hot-plugged later) + - QEMU command line is stored in + /vm//image/dmargs xenstore dir, each argument as separate key + in form /vm//image/dmargs/NNN, where NNN is 0-padded argument + number + - target domain id is stored in /local/domain//target xenstore path +?? - bios type is stored in /local/domain//hvmloader/bios + - stubdomain's console 0 is connected to qemu log file + - stubdomain's console 1 is connected to qemu save file (for saving state) + - stubdomain's console 2 is connected to qemu save file (for restoring state) + - next consoles are connected according to target guest's serial console configuration + +Environment exposed by stubdomain to qemu (needed to construct appropriate qemu command line and later interact with qmp): + - target domain's disks are available as /dev/xvd[a-z] + - console 2 (incoming domain state) is connected with FD 3 + - console 1 (saving domain state) is added over QMP to qemu as "fdset-id 1" (done by stubdomain, toolstack doesn't need to care about it) + - nics are connected to relevant stubdomain PV vifs when available (qemu -netdev should specify ifname= explicitly) + +Startup: +1. toolstack starts PV stubdomain with stubdom-linux-kernel kernel and stubdom-linux-initrd initrd +2. stubdomain initialize relevant devices +3. stubdomain starts qemu with requested command line, plus few stubdomain specific ones - including local qmp access options +4. stubdomain starts vchan server on /local/domain//device-model//qmp-vchan, exposing qmp socket to the toolstack +5. qemu signal readiness by writing "running" to /local/domain//device-model//state xenstore path +6. now device model is considered running + +QEMU can be controlled using QMP over vchan at /local/domain//device-model//qmp-vchan. Only one simultaneous connection is supported and toolstack needs to ensure that. + +Limitations: + - PCI passthrough require permissive mode + - only one nic is supported + - at most 26 emulated disks are supported (more are still available as PV disks) + - graphics output (VNC/SDL/Spice) not supported + PV-GRUB === -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain.
General idea is to allow freely set device_model_version and device_model_stubdomain_override and choose the right options based on this choice. Also, allow to specific path to stubdomain kernel/ramdisk, for greater flexibility. First two patches add documentation about expected toolstack-stubdomain-qemu interface, both for MiniOS stubdomain and Linux stubdomain. Initial version has no QMP support - in initial patches it is completely disabled, which means no suspend/restore and no PCI passthrough. Later patches add QMP over libvchan connection support. This means libxenlight will be linked with libxenvchan. Ideally talking to stubdomain would be added only to new libxl__qmp_ev* API, which is written from start with the assumption of untrusted qemu. But unfortunately some parts of libxl public API that calls into QMP, are not async-aware and can't use libxl__qmp_ev* API. Example of such API is libxl_domain_unpause(). Because of this, separate patch add support for QMP over vchan also to the old API. There is also a need for external locking access to vchan connection (one server can handle only one client and libvchan does not verify this). Since I haven't found any asynchronous locking primitives in libxl, for now I've added flock() on a lock file, but this also needs to be converted to async version. The actual stubdomain implementation is here: https://github.com/marmarek/qubes-vmm-xen-stubdom-linux (branch for-upstream, tag for-upstream-v3) See readme there for build instructions. Beware: building on Debian is dangerous, as it require installing "dracut", which will remove initramfs-tools. You may end up with broken initrd on your host. Few comments/questions about the stubdomain code: 1. There are extra patches for qemu that are necessary to run it in stubdomain. While it is desirable to upstream them, I think it can be done after merging libxl part. Stubdomain's qemu build will in most cases be separate anyway, to limit qemu's dependencies (so the stubdomain size). 2. By default Linux hvc-xen console frontend is unreliable for data transfer (qemu state save/restore) - it drops data sent faster than client is reading it. To fix it, console device needs to be switched into raw mode (`stty raw /dev/hvc1`). Especially for restoring qemu state it is tricky, as it would need to be done before opening the device, but stty (obviously) needs to open the device first. To solve this problem, for now the repository contains kernel patch which changes the default for all hvc consoles. Again, this isn't practical problem, as the kernel for stubdomain is built separately. But it would be nice to have something working with vanilla kernel. I see those options: - convert it to kernel cmdline parameter (hvc_console_raw=1 ?) - use channels instead of consoles (and on the kernel side change the default to "raw" only for channels); while in theory better design, libxl part will be more complex, as channels can be connected to sockets but not files, so libxl would need to read/write to it exactly when qemu write/read the data, not before/after as it is done now Remaining parts for eliminating dom0's instance of qemu: - do not force QDISK backend for CDROM - multiple consoles support in xenconsoled Changes in v2: - apply review comments by Jason Andryuk Changes in v3: - rework qemu arguments handling (separate xenstore keys, instead of \x1b separator) - add QMP over libvchan, instead of console - add protocol documentation - a lot of minor changes, see individual patches for full changes list - split xenconsoled patches into separate series Cc: Simon Gaiser Cc: Eric Shelton Cc: Ian Jackson Cc: Wei Liu Eric Shelton (1): libxl: Handle Linux stubdomain specific QEMU options. Marek Marczykowski-Górecki (16): Document ioemu MiniOS stubdomain protocol Document ioemu Linux stubdomain protocol libxl: fix qemu-trad cmdline for no sdl/vnc case libxl: Allow running qemu-xen in stubdomain libxl: write qemu arguments into separate xenstore keys libxl: create vkb device only for guests with graphics output xl: add stubdomain related options to xl config parser tools/libvchan: notify server when client is connected libxl: typo fix in comment libxl: move xswait declaration up in libxl_internal.h libxl: use vchan for QMP access with Linux stubdomain, libxl__ev_qmp_* version libxl: use vchan for QMP access with Linux stubdomain, non-async version libxl: add save/restore support for qemu-xen in stubdomain tools: add missing libxenvchan cflags libxl: add locking for libvchan QMP connection libxl: require qemu in dom0 even if stubdomain is in use docs/man/xl.cfg.5.pod.in | 23 +- docs/misc/stubdom.txt| 103 +- tools/Rules.mk | 6 +- tools/libvchan/init.c| 3 +- tools/libxl/Makefile | 2 +- tools/libxl/libxl_create.c | 51 +++- tools/l
[Xen-devel] [PATCH v3 07/17] libxl: create vkb device only for guests with graphics output
The forced vkb device is meant for better performance of qemu access (at least according to ebbd2561b4cefb299f0f68a88b2788504223de18 "libxl: Add a vkbd frontend/backend pair for HVM guests"), which isn't used if there is no configured channel to actually access that keyboard. One can still add vkb device manually if needed. This is continuation of b053f0c4c9e533f3d97837cf897eb920b8355ed3 "libxl: do not start dom0 qemu for stubdomain when not needed". Signed-off-by: Marek Marczykowski-Górecki Reviewed-by: Jason Andryuk --- tools/libxl/libxl_create.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 13fc304..736b93c 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -1439,9 +1439,13 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev, libxl__device_console_add(gc, domid, &console, state, &device); libxl__device_console_dispose(&console); -libxl_device_vkb_init(&vkb); -libxl__device_add(gc, domid, &libxl__vkb_devtype, &vkb); -libxl_device_vkb_dispose(&vkb); +if (libxl_defbool_val(d_config->b_info.u.hvm.vnc.enable) +|| libxl_defbool_val(d_config->b_info.u.hvm.spice.enable) +|| libxl_defbool_val(d_config->b_info.u.hvm.sdl.enable)) { +libxl_device_vkb_init(&vkb); +libxl__device_add(gc, domid, &libxl__vkb_devtype, &vkb); +libxl_device_vkb_dispose(&vkb); +} dcs->sdss.dm.guest_domid = domid; if (libxl_defbool_val(d_config->b_info.device_model_stubdomain)) -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 10/17] libxl: typo fix in comment
Signed-off-by: Marek Marczykowski-Górecki --- tools/libxl/libxl_qmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 42c8ab8..a235095 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -1452,7 +1452,7 @@ static void dm_state_saved(libxl__egc *egc, libxl__ev_qmp *ev, * * - Allowed internal state transition: * disconnected -> connecting - * connection -> capability_negotiation + * connecting -> capability_negotiation * capability_negotiation/connected -> waiting_reply * waiting_reply-> connected * any -> broken -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 01/17] Document ioemu MiniOS stubdomain protocol
Add documentation based on reverse-engineered toolstack-ioemu stubdomain protocol. Signed-off-by: Marek Marczykowski-Górecki --- docs/misc/stubdom.txt | 53 - 1 file changed, 53 insertions(+) diff --git a/docs/misc/stubdom.txt b/docs/misc/stubdom.txt index de7b6c7..4c524f2 100644 --- a/docs/misc/stubdom.txt +++ b/docs/misc/stubdom.txt @@ -23,6 +23,59 @@ and http://wiki.xen.org/wiki/Device_Model_Stub_Domains for more information on device model stub domains +Toolstack to MiniOS ioemu stubdomain protocol +- + +This section describe communication protocol between toolstack and +qemu-traditional running in MiniOS stubdomain. The protocol include +expectations of both qemu and stubdomain itself. + +Setup (done by toolstack, expected by stubdomain): + - Block devices for target domain are connected as PV disks to stubdomain, + according to configuration order, starting with xvda + - Network devices for target domain are connected as PV nics to stubdomain, + according to configuration order, starting with 0 + - if graphics output is expected, VFB and VKB devices are set for stubdomain + (its backend is responsible for exposing them using appropriate protocol + like VNC or Spice) + - other target domain's devices are not connected at this point to stubdomain + (may be hot-plugged later) + - QEMU command line (space separated arguments) is stored in + /vm//image/dmargs xenstore path + - target domain id is stored in /local/domain//target xenstore path +?? - bios type is stored in /local/domain//hvmloader/bios + - stubdomain's console 0 is connected to qemu log file + - stubdomain's console 1 is connected to qemu save file (for saving state) + - stubdomain's console 2 is connected to qemu save file (for restoring state) + - next consoles are connected according to target guest's serial console configuration + +Startup: +1. PV stubdomain is started with ioemu-stubdom.gz kernel and no initrd +2. stubdomain initialize relevant devices +2. stubdoma signal readiness by writing "running" to /local/domain//device-model//state xenstore path +3. now stubdomain is considered running + +Runtime control (hotplug etc): +Toolstack can issue command through xenstore. The sequence is (from toolstack POV): +1. Write parameter to /local/domain//device-model//parameter. +2. Write command to /local/domain//device-model//command. +3. Wait for command result in /local/domain//device-model//state (command specific value). +4. Write "running" back to /local/domain//device-model//state. + +Defined commands: + - "pci-ins" - PCI hot plug, results: + - "pci-inserted" - success + - "pci-insert-failed" - failure + - "pci-rem" - PCI hot remove, results: + - "pci-removed" - success + - ?? + - "save" - save domain state to console 1, results: + - "paused" - success + - "continue" - resume domain execution, after loading state from console 2 (require -loadvm command argument), results: + - "running" - success + + + PV-GRUB === -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 06/17] libxl: write qemu arguments into separate xenstore keys
This allows using arguments with spaces, like -append, without nominating any special "separator" character. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - previous version of this patch "libxl: use \x1b to separate qemu arguments for linux stubdomain" used specific non-printable separator, but it was rejected as xenstore doesn't cope well with non-printable chars --- tools/libxl/libxl_dm.c | 39 ++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 1fa4fa8..6cfc256 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -2047,6 +2047,40 @@ static int libxl__vfb_and_vkb_from_hvm_guest_config(libxl__gc *gc, return 0; } +static int libxl__write_stub_linux_dmargs(libxl__gc *gc, +int dm_domid, int guest_domid, +char **args) +{ +libxl_ctx *ctx = libxl__gc_owner(gc); +int i; +char *vm_path; +char *path; +struct xs_permissions roperm[2]; +xs_transaction_t t; + +roperm[0].id = 0; +roperm[0].perms = XS_PERM_NONE; +roperm[1].id = dm_domid; +roperm[1].perms = XS_PERM_READ; + +vm_path = libxl__xs_read(gc, XBT_NULL, GCSPRINTF("/local/domain/%d/vm", guest_domid)); +path = GCSPRINTF("%s/image/dmargs", vm_path); + +retry_transaction: +t = xs_transaction_start(ctx->xsh); +xs_write(ctx->xsh, t, path, "", 0); +xs_set_permissions(ctx->xsh, t, path, roperm, ARRAY_SIZE(roperm)); +i = 1; +for (i=1; args[i] != NULL; i++) +xs_write(ctx->xsh, t, GCSPRINTF("%s/%03d", path, i), args[i], strlen(args[i])); + +xs_set_permissions(ctx->xsh, t, GCSPRINTF("%s/rtc/timeoffset", vm_path), roperm, ARRAY_SIZE(roperm)); +if (!xs_transaction_end(ctx->xsh, t, 0)) +if (errno == EAGAIN) +goto retry_transaction; +return 0; +} + static int libxl__write_stub_dmargs(libxl__gc *gc, int dm_domid, int guest_domid, char **args) @@ -2249,7 +2283,10 @@ void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss) libxl__store_libxl_entry(gc, guest_domid, "dm-version", libxl_device_model_version_to_string(dm_config->b_info.device_model_version)); -libxl__write_stub_dmargs(gc, dm_domid, guest_domid, args); +if (libxl__stubdomain_is_linux(&guest_config->b_info)) +libxl__write_stub_linux_dmargs(gc, dm_domid, guest_domid, args); +else +libxl__write_stub_dmargs(gc, dm_domid, guest_domid, args); libxl__xs_printf(gc, XBT_NULL, GCSPRINTF("%s/image/device-model-domid", libxl__xs_get_dompath(gc, guest_domid)), -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 12/17] libxl: use vchan for QMP access with Linux stubdomain, libxl__ev_qmp_* version
Access to QMP of QEMU in Linux stubdomain is possible over vchan connection. Add appropriate handling in libxl__ev_qmp_* API, keeping all the asynchronous properties. Since only one client can be connected to vchan server at the same time and it is not enforced by the libxenvchan itself, additional client-side locking is needed. Note that qemu supports only one simultaneous client on a control socket anyway (but in UNIX socket case, it enforce it server-side), so it doesn't add any extra limitation. Regarding pipe to self: Vchan issue notification about incoming data (or space for it) only once - even if there is more data to read, FD returned by libxenvchan_fd_for_select() will not be readable. Similar to buffer space - there is one notification if more data can be written, but FD isn't considered "writable" after libxenvchan_wait() call, even if in fact there is a buffer space. There are two situations where it is problematic: - some QMP message results in a user callback and processing further messages must stop (even if more data is already available in vchan buffer) - data is scheduled to write after a buffer space notification; this may result from either delayed libxl__ev_qmp_send() call, or internal state change To avoid the above problems, use pipe to self to schedule vchan data processing, even if vchan would not issue notification itself. Signed-off-by: Marek Marczykowski-Górecki --- TODO: - handle locking for vchan access - a lockfile + flock comes to mind, but there is no async provisioning for it in libxl Changes in v3: - new patch, in place of "libxl: access QMP socket via console for qemu-in-stubdomain" --- tools/Rules.mk | 4 +- tools/libxl/Makefile | 2 +- tools/libxl/libxl_dm.c | 2 +- tools/libxl/libxl_internal.h | 7 +- tools/libxl/libxl_qmp.c | 293 +++- 5 files changed, 301 insertions(+), 7 deletions(-) diff --git a/tools/Rules.mk b/tools/Rules.mk index 68f2ed7..12b3129 100644 --- a/tools/Rules.mk +++ b/tools/Rules.mk @@ -187,8 +187,8 @@ SHLIB_libblktapctl = PKG_CONFIG_REMOVE += xenblktapctl endif -CFLAGS_libxenlight = -I$(XEN_XENLIGHT) $(CFLAGS_libxenctrl) $(CFLAGS_xeninclude) -SHDEPS_libxenlight = $(SHLIB_libxenctrl) $(SHLIB_libxenstore) $(SHLIB_libblktapctl) +CFLAGS_libxenlight = -I$(XEN_XENLIGHT) $(CFLAGS_libxenctrl) $(CFLAGS_xeninclude) $(CFLAGS_libxenvchan) +SHDEPS_libxenlight = $(SHLIB_libxenctrl) $(SHLIB_libxenstore) $(SHLIB_libblktapctl) $(SHLIB_libxenvchan) LDLIBS_libxenlight = $(SHDEPS_libxenlight) $(XEN_XENLIGHT)/libxenlight$(libextension) SHLIB_libxenlight = $(SHDEPS_libxenlight) -Wl,-rpath-link=$(XEN_XENLIGHT) diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index 6da342e..55b0b63 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -24,6 +24,7 @@ LIBXL_LIBS = $(LDLIBS_libxentoollog) $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) ifeq ($(CONFIG_LIBNL),y) LIBXL_LIBS += $(LIBNL3_LIBS) endif +LIBXL_LIBS += $(LDLIBS_libxenvchan) CFLAGS_LIBXL += $(CFLAGS_libxentoollog) CFLAGS_LIBXL += $(CFLAGS_libxentoolcore) @@ -35,6 +36,7 @@ CFLAGS_LIBXL += $(CFLAGS_libblktapctl) ifeq ($(CONFIG_LIBNL),y) CFLAGS_LIBXL += $(LIBNL3_CFLAGS) endif +CFLAGS_LIBXL += $(CFLAGS_libxenvchan) CFLAGS_LIBXL += -Wshadow LIBXL_LIBS-$(CONFIG_ARM) += -lfdt diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 6cfc256..b9a53f3 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1180,7 +1180,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc, "-xen-domid", GCSPRINTF("%d", guest_domid), NULL); -/* There is currently no way to access the QMP socket in the stubdom */ +/* QMP access to qemu running in stubdomain is done over vchan, not local socket */ if (!is_stubdom) { flexarray_append(dm_args, "-chardev"); if (state->dm_monitor_fd >= 0) { diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 0095835..9a903a5 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -57,6 +57,7 @@ #include #include #include +#include #include @@ -509,6 +510,12 @@ struct libxl__ev_qmp { libxl__carefd *cfd; libxl__ev_fd efd; libxl__qmp_state state; +/* pipe to wake itself if there is more data in vchan */ +libxl__carefd *pipe_cfd_read; +libxl__carefd *pipe_cfd_write; +libxl__ev_fd pipe_efd; +struct libxenvchan *vchan; +libxl__xswait_state xswait; int id; int next_id;/* next id to use */ /* receive buffer */ diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index a235095..45b9f74 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -1466,6 +1466,14 @@ static void dm_state_saved(libxl__egc *egc, libxl__ev_qmp *ev, /* prototypes */ +static i
[Xen-devel] [PATCH v3 16/17] libxl: add locking for libvchan QMP connection
It is not safe for multiple clients to (even try to) connect to the same vchan server at the same time. Contrary to QMP over local socket, connection over vchan needs external locking. For now use flock() for this. This is not ideal for async QMP API, as flock() will block the whole thread while other thread/process talks to the same QEMU instance. This may be a problem especially in cause of malicious QEMU, that could stall the communication. But since libxl doesn't have asynchronous locking primitives, keep flock() until something better can be used instead. Note that qemu will handle only one client at a time anyway, so this does not introduce artificial limit here. Signed-off-by: Marek Marczykowski-Górecki --- tools/libxl/libxl_internal.h | 1 +- tools/libxl/libxl_qmp.c | 58 +++-- 2 files changed, 56 insertions(+), 3 deletions(-) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 9a903a5..36b38fd 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -515,6 +515,7 @@ struct libxl__ev_qmp { libxl__carefd *pipe_cfd_write; libxl__ev_fd pipe_efd; struct libxenvchan *vchan; +libxl__carefd *vchan_lock; libxl__xswait_state xswait; int id; int next_id;/* next id to use */ diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 7643f15..260dc0a 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -114,6 +114,7 @@ typedef struct callback_id_pair { struct libxl__qmp_handler { int qmp_fd; struct libxenvchan *vchan; +libxl__carefd *vchan_lock; bool connected; time_t timeout; /* wait_for_id will be used by the synchronous send function */ @@ -497,9 +498,10 @@ static void qmp_close(libxl__qmp_handler *qmp) callback_id_pair *pp = NULL; callback_id_pair *tmp = NULL; -if (qmp->vchan) +if (qmp->vchan) { libxenvchan_close(qmp->vchan); -else +libxl__carefd_close(qmp->vchan_lock); +} else close(qmp->qmp_fd); LIBXL_STAILQ_FOREACH(pp, &qmp->callback_list, next) { free(tmp); @@ -805,6 +807,40 @@ static void qmp_parameters_add_integer(libxl__gc *gc, qmp_parameters_common_add(gc, param, name, obj); } +static libxl__carefd *qmp_vchan_lock(libxl__gc *gc, int domid) +{ +libxl__carefd *cfd; +char *lock_path; +int fd, r; + +lock_path = GCSPRINTF("%s/qmp-libxl-%d.lock", libxl__run_dir_path(), domid); +libxl__carefd_begin(); +fd = open(lock_path, O_RDWR | O_CREAT, 0644); +cfd = libxl__carefd_opened(CTX, fd); +if (!cfd) { +LOGED(ERROR, domid, "QMP lock file open error"); +goto error; +} + +/* Try to lock the file, retrying on EINTR */ +for (;;) { +r = flock(fd, LOCK_EX); +if (!r) +break; +if (errno != EINTR) { +/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */ +LOGED(ERROR, domid, + "unexpected error while trying to lock %s, fd=%d, errno=%d", + lock_path, fd, errno); +goto error; +} +} +return cfd; +error: +libxl__carefd_close(cfd); +return NULL; +} + #define QMP_PARAMETERS_SPRINTF(args, name, format, ...) \ qmp_parameters_add_string(gc, args, name, GCSPRINTF(format, __VA_ARGS__)) @@ -825,10 +861,15 @@ libxl__qmp_handler *libxl__qmp_initialize(libxl__gc *gc, uint32_t domid) dm_domid = libxl_get_stubdom_id(CTX, domid); if (dm_domid) { qmp_path = DEVICE_MODEL_XS_PATH(gc, dm_domid, domid, "/qmp-vchan"); -/* TODO: add locking */ +qmp->vchan_lock = qmp_vchan_lock(gc, domid); +if (!qmp->vchan_lock) { +qmp_free_handler(qmp); +return NULL; +} qmp->vchan = libxenvchan_client_init(CTX->lg, dm_domid, qmp_path); if (!qmp->vchan) { LOGED(ERROR, domid, "QMP vchan connection failed: %s", strerror(errno)); +libxl__carefd_close(qmp->vchan_lock); qmp_free_handler(qmp); return NULL; } @@ -1741,6 +1782,12 @@ static void qmp_vchan_watch_callback(libxl__egc *egc, int pipe_fd[2]; if (!rc) { +/* FIXME: convert to async locking */ +ev->vchan_lock = qmp_vchan_lock(gc, ev->domid); +if (!ev->vchan_lock) { +rc= ERROR_FAIL; +goto error; +} ev->vchan = libxenvchan_client_init(CTX->lg, dm_domid, ev->xswait.path); if (ev->vchan) { /* ok */ @@ -1775,6 +1822,8 @@ static void qmp_vchan_watch_callback(libxl__egc *egc, if (rc) goto error; } else if (errno == ENOENT) { +libxl__carefd_close(ev->vchan_lock); +ev->vchan_lock = NULL; /* not ready ye
[Xen-devel] [PATCH v3 17/17] libxl: require qemu in dom0 even if stubdomain is in use
Until xenconsoled learns how to handle multiple consoles, this is needed for save/restore support (qemu state is transferred over secondary consoles). Additionally, Linux-based stubdomain waits for all the backends to initialize during boot. Lack of some console backends results in stubdomain startup timeout. This is a temporary patch until xenconsoled will be improved. Signed-off-by: Marek Marczykowski-Górecki --- tools/libxl/libxl_dm.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index ce65321..0fdc2f8 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -2438,7 +2438,11 @@ static void spawn_stub_launch_dm(libxl__egc *egc, } } -need_qemu = libxl__need_xenpv_qemu(gc, dm_config); +/* + * Until xenconsoled learns how to handle multiple consoles, require qemu + * in dom0, since the above code always adds at least 2 consoles. + */ +need_qemu = 1; for (i = 0; i < num_console; i++) { libxl__device device; -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 04/17] libxl: Allow running qemu-xen in stubdomain
Do not prohibit anymore using stubdomain with qemu-xen. To help distingushing MiniOS and Linux stubdomain, add helper inline functions libxl__stubdomain_is_linux() and libxl__stubdomain_is_linux_running(). Those should be used where really the difference is about MiniOS/Linux, not qemu-xen/qemu-xen-traditional. Signed-off-by: Marek Marczykowski-Górecki --- Changes in v3: - new patch, instead of "libxl: Add "stubdomain_version" to domain_build_info" - helper functions as suggested by Ian Jackson --- tools/libxl/libxl_create.c | 9 - tools/libxl/libxl_internal.h | 17 + 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index a4e74a5..bb62542 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -160,15 +160,6 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, } } -if (b_info->type == LIBXL_DOMAIN_TYPE_HVM && -b_info->device_model_version != -LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL && -libxl_defbool_val(b_info->device_model_stubdomain)) { -LOG(ERROR, -"device model stubdomains require \"qemu-xen-traditional\""); -return ERROR_INVAL; -} - if (!b_info->max_vcpus) b_info->max_vcpus = 1; if (!b_info->avail_vcpus.size) { diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 459f9bf..b8c698a 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2195,6 +2195,23 @@ _hidden int libxl__device_model_version_running(libxl__gc *gc, uint32_t domid); /* Return the system-wide default device model */ _hidden libxl_device_model_version libxl__default_device_model(libxl__gc *gc); +static inline +bool libxl__stubdomain_is_linux_running(libxl__gc *gc, uint32_t domid) +{ +/* same logic as in libxl__stubdomain_is_linux */ +return libxl__device_model_version_running(gc, domid) +== LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN; +} + +static inline +bool libxl__stubdomain_is_linux(libxl_domain_build_info *b_info) +{ +/* right now qemu-tranditional implies MiniOS stubdomain and qemu-xen + * implies Linux stubdomain */ +return libxl_defbool_val(b_info->device_model_stubdomain) && +b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN; +} + #define DEVICE_MODEL_XS_PATH(gc, dm_domid, domid, fmt, _a...) \ libxl__sprintf(gc, "/local/domain/%u/device-model/%u" fmt, dm_domid, \ domid, ##_a) -- git-series 0.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 8/9] xen/gntdev.c: Convert to use vm_map_pages()
On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote: > Convert to use vm_map_pages() to map range of kernel > memory to user vma. > > map->count is passed to vm_map_pages() and internal API > verify map->count against count ( count = vma_pages(vma)) > for page array boundary overrun condition. This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages will: - use map->pages starting at vma->vm_pgoff instead of 0 - verify map->count against vma_pages()+vma->vm_pgoff instead of just vma_pages(). In practice, this breaks using a single gntdev FD for mapping multiple grants. It looks like vm_map_pages() is not a good fit for this code and IMO it should be reverted. > Signed-off-by: Souptick Joarder > Reviewed-by: Boris Ostrovsky > --- > drivers/xen/gntdev.c | 11 --- > 1 file changed, 4 insertions(+), 7 deletions(-) > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > index 5efc5ee..5d64262 100644 > --- a/drivers/xen/gntdev.c > +++ b/drivers/xen/gntdev.c > @@ -1084,7 +1084,7 @@ static int gntdev_mmap(struct file *flip, struct > vm_area_struct *vma) > int index = vma->vm_pgoff; > int count = vma_pages(vma); > struct gntdev_grant_map *map; > - int i, err = -EINVAL; > + int err = -EINVAL; > > if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED)) > return -EINVAL; > @@ -1145,12 +1145,9 @@ static int gntdev_mmap(struct file *flip, struct > vm_area_struct *vma) > goto out_put_map; > > if (!use_ptemod) { > - for (i = 0; i < count; i++) { > - err = vm_insert_page(vma, vma->vm_start + i*PAGE_SIZE, > - map->pages[i]); > - if (err) > - goto out_put_map; > - } > + err = vm_map_pages(vma, map->pages, map->count); > + if (err) > + goto out_put_map; > } else { > #ifdef CONFIG_X86 > /* -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 8/9] xen/gntdev.c: Convert to use vm_map_pages()
On Mon, Jul 29, 2019 at 02:02:54PM +0530, Souptick Joarder wrote: > On Mon, Jul 29, 2019 at 1:35 PM Souptick Joarder wrote: > > > > On Sun, Jul 28, 2019 at 11:36 PM Marek Marczykowski-Górecki > > wrote: > > > > > > On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote: > > > > Convert to use vm_map_pages() to map range of kernel > > > > memory to user vma. > > > > > > > > map->count is passed to vm_map_pages() and internal API > > > > verify map->count against count ( count = vma_pages(vma)) > > > > for page array boundary overrun condition. > > > > > > This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages > > > will: > > > - use map->pages starting at vma->vm_pgoff instead of 0 > > > > The actual code ignores vma->vm_pgoff > 0 scenario and mapped > > the entire map->pages[i]. Why the entire map->pages[i] needs to be mapped > > if vma->vm_pgoff > 0 (in original code) ? vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's basically (ab)using this parameter for "which grant reference to map". > > are you referring to set vma->vm_pgoff = 0 irrespective of value passed > > from user space ? If yes, using vm_map_pages_zero() is an alternate > > option. Yes, that should work. > > > - verify map->count against vma_pages()+vma->vm_pgoff instead of just > > >vma_pages(). > > > > In original code -> > > > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > > index 559d4b7f807d..469dfbd6cf90 100644 > > --- a/drivers/xen/gntdev.c > > +++ b/drivers/xen/gntdev.c > > @@ -1084,7 +1084,7 @@ static int gntdev_mmap(struct file *flip, struct > > vm_area_struct *vma) > > int index = vma->vm_pgoff; > > int count = vma_pages(vma); > > > > Count is user passed value. > > > > struct gntdev_grant_map *map; > > - int i, err = -EINVAL; > > + int err = -EINVAL; > > if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED)) > > return -EINVAL; > > @@ -1145,12 +1145,9 @@ static int gntdev_mmap(struct file *flip, > > struct vm_area_struct *vma) > > goto out_put_map; > > if (!use_ptemod) { > > - for (i = 0; i < count; i++) { > > - err = vm_insert_page(vma, vma->vm_start + i*PAGE_SIZE, > > - map->pages[i]); > > > > and when count > i , we end up with trying to map memory outside > > boundary of map->pages[i], which was not correct. > > typo. > s/count > i / count > map->count gntdev_find_map_index verifies it. Specifically, it looks for a map matching both index and count. > > > > - if (err) > > - goto out_put_map; > > - } > > + err = vm_map_pages(vma, map->pages, map->count); > > + if (err) > > + goto out_put_map; > > > > With this commit, inside __vm_map_pages(), we have addressed this scenario. > > > > +static int __vm_map_pages(struct vm_area_struct *vma, struct page **pages, > > + unsigned long num, unsigned long offset) > > +{ > > + unsigned long count = vma_pages(vma); > > + unsigned long uaddr = vma->vm_start; > > + int ret, i; > > + > > + /* Fail if the user requested offset is beyond the end of the object */ > > + if (offset > num) > > + return -ENXIO; > > + > > + /* Fail if the user requested size exceeds available object size */ > > + if (count > num - offset) > > + return -ENXIO; > > > > By checking count > num -offset. (considering vma->vm_pgoff != 0 as well). > > So we will never cross the boundary of map->pages[i]. > > > > > > > > > > In practice, this breaks using a single gntdev FD for mapping multiple > > > grants. > > > > How ? gntdev uses vma->vm_pgoff to select which grant entry should be mapped. map struct returned by gntdev_find_map_index() describes just the pages to be mapped. Specifically map->pages[0] should be mapped at vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE. When trying to map grant with index (aka vma->vm_pgoff) > 1, __vm_map_pages() will refuse to map it because it will expect map->count to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly vma_pages(vma). > > > It looks like vm_map_pages() is not a good fit for this code and IMO it > > > should be reverted. > > > > Did you hit any issue around this code in real time ? Yes, relevant strace output: [pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0 [
Re: [Xen-devel] [PATCH] xen: Avoid calling device suspend/resume callbacks
On Tue, Jul 30, 2019 at 07:55:02AM +, Jan Beulich wrote: > On 29.07.2019 19:06, Andrew Cooper wrote: > > On 29/07/2019 16:57, Jan Beulich wrote: > >> On 29.07.2019 17:41, Ross Lagerwall wrote: > >>> When suspending/resuming or migrating under Xen, there isn't much need > >>> for suspending and resuming all the attached devices since the Xen/QEMU > >>> should correctly maintain the hardware state. Drop these calls and > >>> replace with more specific calls to ensure Xen frontend devices are > >>> properly reconnected. > >> Is this true for the general pass-through case as well? While migration > >> may not be (fully) compatible with pass-through, iirc save/restore is. > > > > What gives you this impression? > > > > Migration and save/restore are *literally* the same thing, except that > > in one case you're piping the data to/from disk, and in the other you're > > piping it to the destination and restoring it immediately. > > > > If you look at the toolstack code, it is all in terms of reading/writing > > an fd (including libxl's API) which is either a network socket or a > > file, as chosen by xl. > > Sure. The main difference is where the restore happens (by default): > For live migration I expect this to be on a different host, whereas > for a non-live restore I'd expect this to be the same host. And it > is only the "same host" case where one can assume the same physical > piece of hardware to be available again for passing through to this > guest. In the "different host" case restore _may_ be possible, using > identical hardware. (And yes, in the "same host" case restore may > also be impossible, if the hardware meanwhile has been assigned to > another guest. But as said, I'm talking about the default case here.) > > >> Would qemu restore state of physical PCI devices? > > > > What state would Qemu be in a position to know about, which isn't > > already present in Qemu's datablob? > > That's a valid (rhetorical) question, but not helping to answer mine. > > > What we do with graphics cards is to merge Xens logdirty bitmap, with a > > dirty list provided by the card itself. This needs a device-specific > > knowledge. In addition, there is an opaque blob of data produced by the > > source card, which is handed to the destination card. That also lives > > in the stream. > > > > Intel's Scalable IOV spec is attempting to rationalise this by having a > > standard ways of getting logdirty and "internal state" information out > > of a device, but for the moment, it requires custom device-driver > > specific code to do anything migration related with real hardware. > > Which isn't very nice, since it doesn't scale well as a model. > > > As for why its safe to do like this, the best argument is that this is > > how all other vendors do migration, including KVM. Xen is the > > odd-one-out using the full S3 path. > > So how do "all other vendors" deal with device specific state? So > far I was under the impression that to deal with this is precisely > why we use the S3 logic in the kernel. FWIW in Qubes we specifically S3 domUs with PCI devices assigned just before host S3 - to let the driver save the state and later restore it. AFAIR in same cases (but it might be PV, not HVM then) not doing so was breaking host S3 altogether. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 8/9] xen/gntdev.c: Convert to use vm_map_pages()
On Tue, Jul 30, 2019 at 10:05:42AM -0400, Boris Ostrovsky wrote: > On 7/30/19 2:03 AM, Souptick Joarder wrote: > > On Mon, Jul 29, 2019 at 7:06 PM Marek Marczykowski-Górecki > > wrote: > >> On Mon, Jul 29, 2019 at 02:02:54PM +0530, Souptick Joarder wrote: > >>> On Mon, Jul 29, 2019 at 1:35 PM Souptick Joarder > >>> wrote: > >>>> On Sun, Jul 28, 2019 at 11:36 PM Marek Marczykowski-Górecki > >>>> wrote: > >>>>> On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote: > >>>>>> Convert to use vm_map_pages() to map range of kernel > >>>>>> memory to user vma. > >>>>>> > >>>>>> map->count is passed to vm_map_pages() and internal API > >>>>>> verify map->count against count ( count = vma_pages(vma)) > >>>>>> for page array boundary overrun condition. > >>>>> This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages > >>>>> will: > >>>>> - use map->pages starting at vma->vm_pgoff instead of 0 > >>>> The actual code ignores vma->vm_pgoff > 0 scenario and mapped > >>>> the entire map->pages[i]. Why the entire map->pages[i] needs to be mapped > >>>> if vma->vm_pgoff > 0 (in original code) ? > >> vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's > >> basically (ab)using this parameter for "which grant reference to map". > >> > >>>> are you referring to set vma->vm_pgoff = 0 irrespective of value passed > >>>> from user space ? If yes, using vm_map_pages_zero() is an alternate > >>>> option. > >> Yes, that should work. > > I prefer to use vm_map_pages_zero() to resolve both the issues. > > Alternatively > > the patch can be reverted as you suggested. Let me know you opinion and wait > > for feedback from others. > > > > Boris, would you like to give any feedback ? > > vm_map_pages_zero() looks good to me. Marek, does it work for you? Yes, replacing vm_map_pages() with vm_map_pages_zero() fixes the problem for me. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 8/9] xen/gntdev.c: Convert to use vm_map_pages()
On Tue, Jul 30, 2019 at 08:22:02PM +0530, Souptick Joarder wrote: > On Tue, Jul 30, 2019 at 7:52 PM Marek Marczykowski-Górecki > wrote: > > > > On Tue, Jul 30, 2019 at 10:05:42AM -0400, Boris Ostrovsky wrote: > > > On 7/30/19 2:03 AM, Souptick Joarder wrote: > > > > On Mon, Jul 29, 2019 at 7:06 PM Marek Marczykowski-Górecki > > > > wrote: > > > >> On Mon, Jul 29, 2019 at 02:02:54PM +0530, Souptick Joarder wrote: > > > >>> On Mon, Jul 29, 2019 at 1:35 PM Souptick Joarder > > > >>> wrote: > > > >>>> On Sun, Jul 28, 2019 at 11:36 PM Marek Marczykowski-Górecki > > > >>>> wrote: > > > >>>>> On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote: > > > >>>>>> Convert to use vm_map_pages() to map range of kernel > > > >>>>>> memory to user vma. > > > >>>>>> > > > >>>>>> map->count is passed to vm_map_pages() and internal API > > > >>>>>> verify map->count against count ( count = vma_pages(vma)) > > > >>>>>> for page array boundary overrun condition. > > > >>>>> This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages > > > >>>>> will: > > > >>>>> - use map->pages starting at vma->vm_pgoff instead of 0 > > > >>>> The actual code ignores vma->vm_pgoff > 0 scenario and mapped > > > >>>> the entire map->pages[i]. Why the entire map->pages[i] needs to be > > > >>>> mapped > > > >>>> if vma->vm_pgoff > 0 (in original code) ? > > > >> vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's > > > >> basically (ab)using this parameter for "which grant reference to map". > > > >> > > > >>>> are you referring to set vma->vm_pgoff = 0 irrespective of value > > > >>>> passed > > > >>>> from user space ? If yes, using vm_map_pages_zero() is an alternate > > > >>>> option. > > > >> Yes, that should work. > > > > I prefer to use vm_map_pages_zero() to resolve both the issues. > > > > Alternatively > > > > the patch can be reverted as you suggested. Let me know you opinion and > > > > wait > > > > for feedback from others. > > > > > > > > Boris, would you like to give any feedback ? > > > > > > vm_map_pages_zero() looks good to me. Marek, does it work for you? > > > > Yes, replacing vm_map_pages() with vm_map_pages_zero() fixes the > > problem for me. > > Marek, I can send a patch for the same if you are ok. > We need to cc stable as this changes are available in 5.2.4. Sounds good, thanks! -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 5/6] xen/x86: add PHYSDEVOP_msi_control
On Fri, Jul 19, 2019 at 09:43:26AM +, Jan Beulich wrote: > On 19.07.2019 11:02, Roger Pau Monné wrote: > > On Fri, Jul 19, 2019 at 08:04:45AM +, Jan Beulich wrote: > >> On 18.07.2019 18:52, Roger Pau Monné wrote: > >>> On Thu, Jul 18, 2019 at 03:17:27PM +, Jan Beulich wrote: > >>>> On 18.07.2019 15:46, Roger Pau Monné wrote: > >>>>> In fact I don't think INTx should be enabled when MSI(-X) is disabled, > >>>>> QEMU already traps writes to the command register, and it will manage > >>>>> INTx enabling/disabling by itself. I think the only check required is > >>>>> that MSI(-X) cannot be enabled if INTx is also enabled. In the same > >>>>> way both MSI caspabilities cannot be enabled simultaneously. The > >>>>> function should not explicitly disable any of the other capabilities, > >>>>> and just return -EBUSY if the caller attempts for example to enable > >>>>> MSI while INTx or MSI-X is enabled. > >>>> > >>>> You do realize that pci_intx() only ever gets called for Xen > >>>> internally used interrupts, i.e. mainly the serial console one? > >>> > >>> You will have to bear with me because I'm not sure I understand why > >>> it does matter. Do you mean to point out that dom0 is the one in full > >>> control of INTx, and thus Xen shouldn't care of whether INTx and > >>> MSI(-X) are enabled at the same time? > >>> > >>> I still think that at least a warning should be printed if a caller > >>> tries to enable MSI(-X) while INTx is also enabled, but unless there's > >>> a reason to have both MSI(-X) and INTx enabled at the same time (maybe > >>> a quirk for some hardware issue?) it shouldn't be allowed on this new > >>> interface. > >> > >> I don't mind improvements to the current situation (i.e. such a > >> warning may indeed make sense); I merely stated how things currently > >> are. INTx treatment was completely left aside when MSI support was > >> introduced into Xen. > > > > In order to give Marek a more concise reply, would you agree to return > > -EBUSY (or some error code) and print a warning message if the caller > > attempts to enable MSI(-X) while INTx is also enabled? > > As to returning an error - I think so, yes. I'm less sure about logging > a message. I'm trying to get it working and it isn't clear to me what should I check for "INTx is also enabled". I assumed PCI_COMMAND_INTX_DISABLE bit, but it looks like guest has no control over this bit, even in permissive mode. This means enabling MSI(-X) always fails because guest has no way to set PCI_COMMAND_INTX_DISABLE bit before. Should I check something different? Or change back to disabling/enabling INTx as part of msi_control call? Or maybe allow guest/qemu to control PCI_COMMAND_INTX_DISABLE bit? In that case, I'd have very similar problem as with MSI - xen-pciback doesn't allow that, so I'd either need to patch pciback (and I could move this whole patch into Linux instead), or get around with a hypercall. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] preparations for 4.12.1
On Mon, Aug 05, 2019 at 04:19:03PM +0100, Ian Jackson wrote: > > >> "libxl: fix pci device re-assigning after domain reboot": > > >> commit c19434d9284e93e6f9aaec9a70f5f361adbfaba6 > > Thanks all. I pushed this to staging-4.12 today. > > There are a couple of other things that I say in git log that might > warrant a backport, and for which I'm hoping I get replies quickly. I'd like to propose "video: fix handling framebuffer located above 4GB" 9cf11fdcd91ff8e9cd038f8336cf21f0701e8b7b (see commit message about a little change needed during backport). -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] python: Adjust xc_physinfo wrapper for updated virt_caps bits
On Mon, Aug 05, 2019 at 03:56:30PM +0100, Ian Jackson wrote: > Ian Jackson writes ("Re: [PATCH v2] python: Adjust xc_physinfo wrapper for > updated virt_caps bits"): > > Marek Marczykowski-Górecki writes ("[PATCH v2] python: Adjust xc_physinfo > > wrapper for updated virt_caps bits"): > > > Commit f089fddd94 "xen: report PV capability in sysctl and use it in > > > toolstack" changed meaning of virt_caps bit 1 - previously it was > > > "directio", but was changed to "pv" and "directio" was moved to bit 2. > > > Adjust python wrapper to use #defines for the bits values, and add > > > reporting of both "pv_directio" and "hvm_directio". > > > > > > Signed-off-by: Marek Marczykowski-Górecki > > > > > > > Acked-by: Ian Jackson > > I think this is a backport candidate for 4.12, since f089fddd94 was in > 4.12. Am I right ? Yes, definitely. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 5/6] xen/x86: add PHYSDEVOP_msi_control
On Tue, Aug 06, 2019 at 07:56:39AM +, Jan Beulich wrote: > On 05.08.2019 15:44, Marek Marczykowski-Górecki wrote: > > On Fri, Jul 19, 2019 at 09:43:26AM +, Jan Beulich wrote: > >> On 19.07.2019 11:02, Roger Pau Monné wrote: > >>> On Fri, Jul 19, 2019 at 08:04:45AM +, Jan Beulich wrote: > >>>> On 18.07.2019 18:52, Roger Pau Monné wrote: > >>>>> On Thu, Jul 18, 2019 at 03:17:27PM +, Jan Beulich wrote: > >>>>>> On 18.07.2019 15:46, Roger Pau Monné wrote: > >>>>>>> In fact I don't think INTx should be enabled when MSI(-X) is disabled, > >>>>>>> QEMU already traps writes to the command register, and it will manage > >>>>>>> INTx enabling/disabling by itself. I think the only check required is > >>>>>>> that MSI(-X) cannot be enabled if INTx is also enabled. In the same > >>>>>>> way both MSI caspabilities cannot be enabled simultaneously. The > >>>>>>> function should not explicitly disable any of the other capabilities, > >>>>>>> and just return -EBUSY if the caller attempts for example to enable > >>>>>>> MSI while INTx or MSI-X is enabled. > >>>>>> > >>>>>> You do realize that pci_intx() only ever gets called for Xen > >>>>>> internally used interrupts, i.e. mainly the serial console one? > >>>>> > >>>>> You will have to bear with me because I'm not sure I understand why > >>>>> it does matter. Do you mean to point out that dom0 is the one in full > >>>>> control of INTx, and thus Xen shouldn't care of whether INTx and > >>>>> MSI(-X) are enabled at the same time? > >>>>> > >>>>> I still think that at least a warning should be printed if a caller > >>>>> tries to enable MSI(-X) while INTx is also enabled, but unless there's > >>>>> a reason to have both MSI(-X) and INTx enabled at the same time (maybe > >>>>> a quirk for some hardware issue?) it shouldn't be allowed on this new > >>>>> interface. > >>>> > >>>> I don't mind improvements to the current situation (i.e. such a > >>>> warning may indeed make sense); I merely stated how things currently > >>>> are. INTx treatment was completely left aside when MSI support was > >>>> introduced into Xen. > >>> > >>> In order to give Marek a more concise reply, would you agree to return > >>> -EBUSY (or some error code) and print a warning message if the caller > >>> attempts to enable MSI(-X) while INTx is also enabled? > >> > >> As to returning an error - I think so, yes. I'm less sure about logging > >> a message. > > > > I'm trying to get it working and it isn't clear to me what should I > > check for "INTx is also enabled". I assumed PCI_COMMAND_INTX_DISABLE > > bit, but it looks like guest has no control over this bit, even in > > permissive mode. This means enabling MSI(-X) always fails because guest > > has no way to set PCI_COMMAND_INTX_DISABLE bit before. > > Well, the guest has no control, but in order to enable MSI{,-X} I'd > have expected qemu or the Dom0 kernel to set this bit up front. qemu would do that, when running in dom0. But in PV stubdomain it talks to pciback, which filters it out. > If > that's not the case, then of course neither checking nor logging a > message is appropriate at this point in time. It may be worthwhile > calling out this anomaly then in the description. Ok, so I'll go back to setting PCI_COMMAND_INTX_DISABLE instead of just verification. Just to clarify: should I also clear PCI_COMMAND_INTX_DISABLE when disabling MSI? Now I think yes, because nothing else would do that otherwise, but I would like to double check. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 5/6] xen/x86: add PHYSDEVOP_msi_control
On Tue, Aug 06, 2019 at 12:33:39PM +0200, Jan Beulich wrote: > On 06.08.2019 11:46, Marek Marczykowski-Górecki wrote: > > On Tue, Aug 06, 2019 at 07:56:39AM +, Jan Beulich wrote: > > > On 05.08.2019 15:44, Marek Marczykowski-Górecki wrote: > > > > On Fri, Jul 19, 2019 at 09:43:26AM +, Jan Beulich wrote: > > > > > On 19.07.2019 11:02, Roger Pau Monné wrote: > > > > > > On Fri, Jul 19, 2019 at 08:04:45AM +, Jan Beulich wrote: > > > > > > > On 18.07.2019 18:52, Roger Pau Monné wrote: > > > > > > > > On Thu, Jul 18, 2019 at 03:17:27PM +, Jan Beulich wrote: > > > > > > > > > On 18.07.2019 15:46, Roger Pau Monné wrote: > > > > > > > > > > In fact I don't think INTx should be enabled when MSI(-X) > > > > > > > > > > is disabled, > > > > > > > > > > QEMU already traps writes to the command register, and it > > > > > > > > > > will manage > > > > > > > > > > INTx enabling/disabling by itself. I think the only check > > > > > > > > > > required is > > > > > > > > > > that MSI(-X) cannot be enabled if INTx is also enabled. In > > > > > > > > > > the same > > > > > > > > > > way both MSI caspabilities cannot be enabled > > > > > > > > > > simultaneously. The > > > > > > > > > > function should not explicitly disable any of the other > > > > > > > > > > capabilities, > > > > > > > > > > and just return -EBUSY if the caller attempts for example > > > > > > > > > > to enable > > > > > > > > > > MSI while INTx or MSI-X is enabled. > > > > > > > > > > > > > > > > > > You do realize that pci_intx() only ever gets called for Xen > > > > > > > > > internally used interrupts, i.e. mainly the serial console > > > > > > > > > one? > > > > > > > > > > > > > > > > You will have to bear with me because I'm not sure I understand > > > > > > > > why > > > > > > > > it does matter. Do you mean to point out that dom0 is the one > > > > > > > > in full > > > > > > > > control of INTx, and thus Xen shouldn't care of whether INTx and > > > > > > > > MSI(-X) are enabled at the same time? > > > > > > > > > > > > > > > > I still think that at least a warning should be printed if a > > > > > > > > caller > > > > > > > > tries to enable MSI(-X) while INTx is also enabled, but unless > > > > > > > > there's > > > > > > > > a reason to have both MSI(-X) and INTx enabled at the same time > > > > > > > > (maybe > > > > > > > > a quirk for some hardware issue?) it shouldn't be allowed on > > > > > > > > this new > > > > > > > > interface. > > > > > > > > > > > > > > I don't mind improvements to the current situation (i.e. such a > > > > > > > warning may indeed make sense); I merely stated how things > > > > > > > currently > > > > > > > are. INTx treatment was completely left aside when MSI support was > > > > > > > introduced into Xen. > > > > > > > > > > > > In order to give Marek a more concise reply, would you agree to > > > > > > return > > > > > > -EBUSY (or some error code) and print a warning message if the > > > > > > caller > > > > > > attempts to enable MSI(-X) while INTx is also enabled? > > > > > > > > > > As to returning an error - I think so, yes. I'm less sure about > > > > > logging > > > > > a message. > > > > > > > > I'm trying to get it working and it isn't clear to me what should I > > > > check for "INTx is also enabled". I assumed PCI_COMMAND_INTX_DISABLE > > > > bit, but it looks like guest has no control over this bit, even in > > > > permissive mode. This means enabling MSI(-X) always fails because guest > > > > has no way to set PCI_COMMAND_INTX_DISABLE bit before. > > > > > > Well, the guest has no control, but in order to enable MSI{,-X} I'd > > > have expected qemu or the Dom0 kernel to set this bit up front. > > > > qemu would do that, when running in dom0. But in PV stubdomain it talks > > to pciback, which filters it out. > > Filtering out the guest attempt is fine, but it should then set the > bit while preparing MSI/MSI-X for the guest. (I'm not sure it would > need to do so explicitly, or whether it couldn't rely on code > elsewhere in the kernel doing so.) ... > Well, I think I've made my position clear: So far we use pci_intx() > only on devices used by Xen itself. I think this should remain to be > that way. Devices in possession of domains (including Dom0) should > be under Dom0's control in this regard. The thing is, in case of using stubdomain, it's mostly stubdomain handling it. Especially all the config space interception and applying logic to it is done by qemu in stubdomain. Do you suggest duplicating / moving that part to dom0 instead? What would be the point for stubdomain then? -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 5/6] xen/x86: add PHYSDEVOP_msi_control
On Tue, Aug 06, 2019 at 02:05:48PM +0200, Jan Beulich wrote: > On 06.08.2019 12:53, Marek Marczykowski-Górecki wrote: > > On Tue, Aug 06, 2019 at 12:33:39PM +0200, Jan Beulich wrote: > > > On 06.08.2019 11:46, Marek Marczykowski-Górecki wrote: > > > > On Tue, Aug 06, 2019 at 07:56:39AM +, Jan Beulich wrote: > > > > > On 05.08.2019 15:44, Marek Marczykowski-Górecki wrote: > > > > > > I'm trying to get it working and it isn't clear to me what should I > > > > > > check for "INTx is also enabled". I assumed PCI_COMMAND_INTX_DISABLE > > > > > > bit, but it looks like guest has no control over this bit, even in > > > > > > permissive mode. This means enabling MSI(-X) always fails because > > > > > > guest > > > > > > has no way to set PCI_COMMAND_INTX_DISABLE bit before. > > > > > > > > > > Well, the guest has no control, but in order to enable MSI{,-X} I'd > > > > > have expected qemu or the Dom0 kernel to set this bit up front. > > > > > > > > qemu would do that, when running in dom0. But in PV stubdomain it talks > > > > to pciback, which filters it out. > > > > > > Filtering out the guest attempt is fine, but it should then set the > > > bit while preparing MSI/MSI-X for the guest. (I'm not sure it would > > > need to do so explicitly, or whether it couldn't rely on code > > > elsewhere in the kernel doing so.) > > ... > > > Well, I think I've made my position clear: So far we use pci_intx() > > > only on devices used by Xen itself. I think this should remain to be > > > that way. Devices in possession of domains (including Dom0) should > > > be under Dom0's control in this regard. > > > > The thing is, in case of using stubdomain, it's mostly stubdomain > > handling it. Especially all the config space interception and applying > > logic to it is done by qemu in stubdomain. Do you suggest duplicating / > > moving that part to dom0 instead? What would be the point for stubdomain > > then? > > Nothing should be moved between components if possible (and if placed > sensibly). But isn't stubdom (being a PV DomU) relying on pciback in > Dom0 anyway? And hence doesn't the flow of events include > pci_enable_msi{,x}() invoked by pciback? I'd have expected that to > (also) take care of INTx. This was discussed in v2 of this series[1], where you also responded. Relevant part of Roger's email: Oh great, that's unfortunate. Both pciback functions end up calling into msi_capability_init in the Linux kernel, which does indeed more than just toggling the PCI config space enable bit. OTOH adding a bypass to pciback so the stubdom is able to write to the PCI register in order to toggle the enable bit seems quite clumsy. Not to mention that you would be required to update Dom0 kernel in order to fix the issue. Do you think it makes sense to add a domctl to enable/disable MSI(X)? This way the bug could be fixed by just updating Xen (and the stubdomain). [1] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01271.html -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen 4.12 panic on Thinkpad W540 with UEFI mutiboot2, efi=no-rs workarounds it
On Wed, Aug 07, 2019 at 05:33:05PM +0200, Jan Beulich wrote: > On 07.08.2019 17:17, Marek Marczykowski-Górecki wrote: > > On Wed, Aug 07, 2019 at 04:45:43PM +0200, Jan Beulich wrote: > > > On 07.08.2019 15:26, Marek Marczykowski-Górecki wrote: > > > > Hi, > > > > > > > > Xen 4.12 crashes when booting on UEFI (with multiboot2) unless I disable > > > > runtime services. The crash happens shortly after starting dom0 kernel. > > > > Unfortunately I don't have serial console there, so the only log I have > > > > is a photo of VGA console (attached). Below I retype part of the > > > > message: > > > > > > > > (XEN) [ Xen-4.12.0-3.fc29 x86_64 debug=n Not tainted ] > > > > (XEN) CPU:0 > > > > (XEN) RIP:e008:[<00f6>] 00f6 > > > > (XEN) RFLAGS: 00010287 CONTEXT: hypervisor (d0v0) > > > > ... > > > > (XEN) Xen call trace: > > > > (XEN)[<00f6>] 00f6 > > > > (XEN)[] flushtlb.c#pre_flush+0x3d/0x80 > > > > (XEN)[ ] efi_runtime_call+0x493/0xbd0 > > > > (XEN)[ ] efi_runtime_call+0x441/0xbd0 > > > > (XEN)[ ] vcpu_restore_fpu_nonlazy+0xe7/0x180 > > > > (XEN)[ ] do_platform_op+0/0x1880 > > > > (XEN)[ ] do_platform_op+0xb9c/0x1880 > > > > (XEN)[ ] do_platform_op+0xb9c/0x1880 > > > > (XEN)[ ] > > > > sched_credit2.c#csched2_schedule+0xcd0/0x13a0 > > > > (XEN)[ ] lstar_enter+0xae/0x120 > > > > (XEN)[ ] do_platform_op+0/0x1880 > > > > (XEN)[ ] pv_hypercall+0x152/0x220 > > > > (XEN)[ ] lstar_enter+0xae/0x120 > > > > (XEN)[ ] lstar_enter+0xa2/0x120 > > > > (XEN)[ ] lstar_enter+0xae/0x120 > > > > (XEN)[ ] lstar_enter+0xa2/0x120 > > > > (XEN)[ ] lstar_enter+0xae/0x120 > > > > (XEN)[ ] lstar_enter+0xa2/0x120 > > > > (XEN)[ ] lstar_enter+0xae/0x120 > > > > (XEN)[ ] lstar_enter+0xa2/0x120 > > > > (XEN)[ ] lstar_enter+0xae/0x120 > > > > (XEN)[ ] lstar_enter+0xa2/0x120 > > > > (XEN)[ ] lstar_enter+0xae/0x120 > > > > (XEN)[ ] lstar_enter+0x10c/0x120 > > > > (XEN) > > > > (XEN) > > > > (XEN) * > > > > (XEN) Panic on CPU 0: > > > > (XEN) FATAL TRAP: vector = 0 (divide error) > > > > (XEN) [error_code=] > > > > (XEN) * > > > > > > > > Any idea? Lack of serial console doesn't help with collecting logs... > > > > > > May I suggest that you repeat this with a debug hypervisor, such that > > > the call trace becomes more usable/sensible? > > > > Actually, I've already tried, but every other build I try, I get even > > less useful call trace, for example debug unstable build: > > > > Xen call trace: > > [<7b751c09>] 00007b751c09 > > [<8c2b0398edaa>] 8c2b0398edaa > > ... > > GENERAL PROTECTION FAULT > > [error_code=] > > But this makes reasonable sense: The faulting insn is "call *0x18(%rax)" > and %rax is 6954b2b0, which points into a block of type > EfiBootServicesData (and hence likely reused by the time of the crash > for other purposes). Hence the "/mapbs" option of xen.efi might help > here too, but iirc there's no equivalent for it in the MB2 case. Oh, yes, indeed in Qubes we have /mapbs enabled by default with xen.efi. I'd like to add it to MB2 case too. Is command line parsed at this point (efi_exit_boot()) already? -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen 4.12 panic on Thinkpad W540 with UEFI mutiboot2, efi=no-rs workarounds it
On Wed, Aug 07, 2019 at 05:58:59PM +0200, Jan Beulich wrote: > On 07.08.2019 17:51, Marek Marczykowski-Górecki wrote: > > On Wed, Aug 07, 2019 at 05:33:05PM +0200, Jan Beulich wrote: > > > On 07.08.2019 17:17, Marek Marczykowski-Górecki wrote: > > > > Actually, I've already tried, but every other build I try, I get even > > > > less useful call trace, for example debug unstable build: > > > > > > > > Xen call trace: > > > > [<7b751c09>] 7b751c09 > > > > [<8c2b0398edaa>] 8c2b0398edaa > > > > ... > > > > GENERAL PROTECTION FAULT > > > > [error_code=] > > > > > > But this makes reasonable sense: The faulting insn is "call *0x18(%rax)" > > > and %rax is 6954b2b0, which points into a block of type > > > EfiBootServicesData (and hence likely reused by the time of the crash > > > for other purposes). Hence the "/mapbs" option of xen.efi might help > > > here too, but iirc there's no equivalent for it in the MB2 case. > > > > Oh, yes, indeed in Qubes we have /mapbs enabled by default with xen.efi. > > I'd like to add it to MB2 case too. Is command line parsed at this point > > (efi_exit_boot()) already? > > I'm struggling a little how to reply, considering that I've already > said "iirc there's no equivalent for it in the MB2 case". So I guess > I'm simply not understanding your question, or more specifically > where you want to get with it. /mapbs option is very specific to xen.efi. I'd like to add a means to set map_bs variable in MB2 case too. I'm asking whether I can use standard command line parser, or do I need something special extracting it from MB2 structures directly. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel