[Xen-devel] [PATCH v6 3/6] libxl: don't try to manipulate json config for stubdomain

2019-09-14 Thread Marek Marczykowski-Górecki
Stubdomain do not have it's own config file - its configuration is
derived from target domains. Do not try to manipulate it when attaching
PCI device.

This bug prevented starting HVM with stubdomain and PCI passthrough
device attached.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Wei Liu 
---
Changes in v3:
 - skip libxl__dm_check_start too, as stubdomain is guaranteed to be
   running at this stage already
 - do not init d_config at all, as it is used only for json manipulation
Changes in v4:
 - adjust comment style
---
 tools/libxl/libxl_pci.c | 50 ++
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 578535f..d26fc9a 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -120,10 +120,14 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 libxl_domain_config d_config;
 libxl_device_pci pcidev_saved;
 libxl__domain_userdata_lock *lock = NULL;
+bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL);
 
-libxl_domain_config_init(&d_config);
-libxl_device_pci_init(&pcidev_saved);
-libxl_device_pci_copy(CTX, &pcidev_saved, pcidev);
+/* Stubdomain doesn't have own config. */
+if (!is_stubdomain) {
+libxl_domain_config_init(&d_config);
+libxl_device_pci_init(&pcidev_saved);
+libxl_device_pci_copy(CTX, &pcidev_saved, pcidev);
+}
 
 be_path = libxl__domain_device_backend_path(gc, 0, domid, 0,
 LIBXL__DEVICE_KIND_PCI);
@@ -152,27 +156,35 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 GCNEW(device);
 libxl__device_from_pcidev(gc, domid, pcidev, device);
 
-lock = libxl__lock_domain_userdata(gc, domid);
-if (!lock) {
-rc = ERROR_LOCK_FAIL;
-goto out;
-}
+/* 
+ * Stubdomin config is derived from its target domain, it doesn't have
+ * its own file.
+ */
+if (!is_stubdomain) {
+lock = libxl__lock_domain_userdata(gc, domid);
+if (!lock) {
+rc = ERROR_LOCK_FAIL;
+goto out;
+}
 
-rc = libxl__get_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+rc = libxl__get_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
 
-device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
- &pcidev_saved);
+device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
+ &pcidev_saved);
 
-rc = libxl__dm_check_start(gc, &d_config, domid);
-if (rc) goto out;
+rc = libxl__dm_check_start(gc, &d_config, domid);
+if (rc) goto out;
+}
 
 for (;;) {
 rc = libxl__xs_transaction_start(gc, &t);
 if (rc) goto out;
 
-rc = libxl__set_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+if (lock) {
+rc = libxl__set_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
+}
 
 libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back));
 
@@ -184,8 +196,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 out:
 libxl__xs_transaction_abort(gc, &t);
 if (lock) libxl__unlock_domain_userdata(lock);
-libxl_device_pci_dispose(&pcidev_saved);
-libxl_domain_config_dispose(&d_config);
+if (!is_stubdomain) {
+libxl_device_pci_dispose(&pcidev_saved);
+libxl_domain_config_dispose(&d_config);
+}
 return rc;
 }
 
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v6 0/6] Fix PCI passthrough for HVM with stubdomain

2019-09-14 Thread Marek Marczykowski-Górecki
In this version, I add PHYSDEVOP_interrupt_control to allow stubdomain
enabling MSI after mapping it, and also disabling INTx beforehand. Actual
hypercall refuse to enable both of them.

Related article:
https://www.qubes-os.org/news/2017/10/18/msi-support/

Changes in v2:
 - new "xen/x86: Allow stubdom access to irq created for msi" patch
 - applied review comments from v1
Changes is v3:
 - apply suggestions by Roger
 - add PHYSDEVOP_msi_msix_set_enable
Changes in v4:
 - implement suggestions by Wei, Roger, Jan
 - plug new physdevop into XSM
Changes in v5:
 - rebase on master
 - rename to PHYSDEVOP_msi_control
 - move granting access to IRQ into create_irq
Changes in v6:
 - simplify granting IRQ access, record dm domid for cleanup
 - rename to PHYSDEVOP_interrupt_control
 - include INTx control in the hypercall

---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Anthony PERARD 
Cc: "Roger Pau Monné" 
Cc: Suravee Suthikulpanit 
Cc: Brian Woods 
Cc: Kevin Tian 
Cc: Daniel De Graaf 

Marek Marczykowski-Górecki (6):
  libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
  libxl: attach PCI device to qemu only after setting pciback/pcifront
  libxl: don't try to manipulate json config for stubdomain
  xen/x86: Allow stubdom access to irq created for msi.
  xen/x86: add PHYSDEVOP_interrupt_control
  tools/libxc: add wrapper for PHYSDEVOP_interrupt_control

 tools/libxc/include/xenctrl.h|  6 ++-
 tools/libxc/xc_physdev.c | 15 ++-
 tools/libxl/libxl_pci.c  | 63 +
 xen/arch/x86/hpet.c  |  3 +-
 xen/arch/x86/irq.c   | 51 ++--
 xen/arch/x86/msi.c   | 45 ++-
 xen/arch/x86/physdev.c   | 53 +-
 xen/arch/x86/x86_64/physdev.c|  4 ++-
 xen/drivers/char/ns16550.c   |  2 +-
 xen/drivers/passthrough/amd/iommu_init.c |  2 +-
 xen/drivers/passthrough/vtd/iommu.c  |  3 +-
 xen/include/asm-x86/irq.h|  7 ++-
 xen/include/asm-x86/msi.h|  2 +-
 xen/include/public/physdev.h | 23 +-
 xen/include/xlat.lst |  1 +-
 xen/include/xsm/dummy.h  |  7 +++-
 xen/include/xsm/xsm.h|  6 ++-
 xen/xsm/dummy.c  |  1 +-
 xen/xsm/flask/hooks.c| 24 ++-
 xen/xsm/flask/policy/access_vectors  |  1 +-
 20 files changed, 281 insertions(+), 38 deletions(-)

base-commit: 6c9639a72f0ca3a9430ef75f375877182281fdef
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v6 6/6] tools/libxc: add wrapper for PHYSDEVOP_interrupt_control

2019-09-14 Thread Marek Marczykowski-Górecki
Add libxc wrapper for PHYSDEVOP_interrupt_control introduced in previous
commit.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - new patch
Changes in v4:
 - adjust for updated previous patch
Changes in v5:
 - rename to PHYSDEVOP_msi_control, adjust arguments
Change in v6:
 - initialize struct physdev_interrupt_control inline, drop pointless rc
   variable
 - rename to PHYSDEVOP_interrupt_control
---
 tools/libxc/include/xenctrl.h |  6 ++
 tools/libxc/xc_physdev.c  | 15 +++
 2 files changed, 21 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ff6ed9..2adb114 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1639,6 +1639,12 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_interrupt_control(xc_interface *xch,
+ int seg,
+ int bus,
+ int devfn,
+ int flags);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libxc/xc_physdev.c b/tools/libxc/xc_physdev.c
index 460a8e7..5af8296 100644
--- a/tools/libxc/xc_physdev.c
+++ b/tools/libxc/xc_physdev.c
@@ -111,3 +111,18 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 return rc;
 }
 
+int xc_physdev_interrupt_control(xc_interface *xch,
+ int seg,
+ int bus,
+ int devfn,
+ int flags)
+{
+struct physdev_interrupt_control op = {
+.seg = seg,
+.bus = bus,
+.devfn = devfn,
+.flags = flags,
+};
+
+return do_physdev_op(xch, PHYSDEVOP_interrupt_control, &op, sizeof(op));
+}
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control

2019-09-14 Thread Marek Marczykowski-Górecki
Allow device model running in stubdomain to enable/disable INTx/MSI(-X),
bypassing pciback. While pciback is still used to access config space
from within stubdomain, it refuse to write to
PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE
in non-permissive mode. Which is the right thing to do for PV domain
(the main use case for pciback), as PV domain should use XEN_PCI_OP_*
commands for that. Unfortunately those commands are not good for
stubdomain use, as they configure MSI in dom0's kernel too, which should
not happen for HVM domain.

This new physdevop is allowed only for stubdomain controlling the domain
which own the device.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - new patch
Changes in v4:
 - adjust code style
 - s/msi_msix/msi/
 - add msi_set_enable XSM hook
 - flatten struct physdev_msi_set_enable
 - add to include/xlat.lst
Changes in v5:
 - rename to PHYSDEVOP_msi_control
 - combine "mode" and "enable" into "flags"
 - refuse to enable both MSI and MSI-X, and also to enable MSI(-X) on
   incapable device
 - disable/enable INTx when enabling/disabling MSI (?)
 - refuse if !use_msi
 - adjust flask hook to make more sense (require "setup" access on
   device, not on domain)
 - rebase on master
Changes in v6:
 - rename to PHYSDEVOP_interrupt_control
 - extend with INTx control
 - Ensure than MSI(-X) can't be enabled together with INTx and the other 
MSI(-X).
 - deduplicate code in msi_control
 - explicitly refuse to operate on hidden devices
 - expand flags to uint16_t to avoid implicit padding

I'm not sure if XSM part is correct, compile-tested only, as I'm not
sure how to set the policy.
---
 xen/arch/x86/msi.c  | 45 +-
 xen/arch/x86/physdev.c  | 53 ++-
 xen/arch/x86/x86_64/physdev.c   |  4 ++-
 xen/include/asm-x86/msi.h   |  2 +-
 xen/include/public/physdev.h| 23 +-
 xen/include/xlat.lst|  1 +-
 xen/include/xsm/dummy.h |  7 -
 xen/include/xsm/xsm.h   |  6 +++-
 xen/xsm/dummy.c |  1 +-
 xen/xsm/flask/hooks.c   | 24 ++-
 xen/xsm/flask/policy/access_vectors |  1 +-
 11 files changed, 167 insertions(+)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index d630600..ecea91a 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1443,6 +1443,51 @@ int pci_restore_msi_state(struct pci_dev *pdev)
 return 0;
 }
 
+int msi_control(struct pci_dev *pdev, bool msix, bool enable)
+{
+unsigned int cap = msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
+unsigned int other_cap = msix ? PCI_CAP_ID_MSI : PCI_CAP_ID_MSIX;
+uint16_t cmd;
+
+if ( !use_msi )
+return -EOPNOTSUPP;
+
+if ( !pci_find_cap_offset(pdev->seg,
+  pdev->bus,
+  PCI_SLOT(pdev->devfn),
+  PCI_FUNC(pdev->devfn),
+  cap) )
+return -ENODEV;
+
+cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
+
+/* don't allow enabling MSI(-X) and INTx at the same time */
+if ( enable && ! (cmd & PCI_COMMAND_INTX_DISABLE) )
+return -EBUSY;
+
+/* don't allow enabling both MSI and MSI-X at the same time */
+if ( enable && find_msi_entry(pdev, -1, other_cap) )
+return -EBUSY;
+
+if ( msix )
+msix_set_enable(pdev, enable);
+else
+msi_set_enable(pdev, enable);
+
+return 0;
+}
+
+int intx_control(struct pci_dev *pdev, bool enable)
+{
+/* don't allow enabling INTx if MSI(-X) is already enabled */
+if ( enable && find_msi_entry(pdev, -1, PCI_CAP_ID_MSI) )
+return -EBUSY;
+if ( enable && find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX) )
+return -EBUSY;
+pci_intx(pdev, enable);
+return 0;
+}
+
 void __init early_msi_init(void)
 {
 if ( use_msi < 0 )
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 3a3c158..7b71039 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -662,6 +662,59 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_interrupt_control: {
+struct physdev_interrupt_control op;
+struct pci_dev *pdev;
+int intr_type;
+bool enable;
+
+ret = -EFAULT;
+if ( copy_from_guest(&op, arg, 1) )
+break;
+
+ret = -EINVAL;
+if ( op.flags & ~(PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK |
+  PHYSDEVOP_INTERRUPT_CONTROL_ENABLE) )
+break;
+
+intr_type = op.flags & PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK;
+enable = op.flags & PHYSDEVOP_INTERRUPT_CONTROL_ENABLE;
+
+pcidevs_lock();
+pdev = pci_get_pdev(op.seg, op.bus, op.devfn);
+

[Xen-devel] [PATCH v6 1/6] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

2019-09-14 Thread Marek Marczykowski-Górecki
HVM domains use IOMMU and device model assistance for communicating with
PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain.
But pciback serve also second function - it reset the device when it is
deassigned from the guest and for this reason pciback needs to be used
with HVM domain too.
When HVM domain has device model in stubdomain, attaching xen-pciback to
the target domain itself may prevent attaching xen-pciback to the
(PV) stubdomain, effectively breaking PCI passthrough.

Fix this by attaching pciback only to one domain: if PV stubdomain is in
use, let it be stubdomain (the commit prevents attaching device to target
HVM in this case); otherwise, attach it to the target domain.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Wei Liu 
---
Changes in v2:
 - previously called "libxl: attach xen-pciback only to PV domains"
 - instead of excluding all HVMs, change the condition to what actually
   matters here - check if stubdomain is in use; this way xen-pciback is
   always in use (either for the target domain, or it's stubdomain),
   fixing PCI reset by xen-pciback concerns
Changes in v3:
 - adjust commit message
---
 tools/libxl/libxl_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 03beb86..2e06a45 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1106,7 +1106,7 @@ out:
 }
 }
 
-if (!starting)
+if (!starting && !libxl_get_stubdom_id(CTX, domid))
 rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
 else
 rc = 0;
@@ -1302,7 +1302,7 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao 
*ao, uint32_t domid,
 }
 }
 
-if (d_config->num_pcidevs > 0) {
+if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) {
 rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs,
 d_config->num_pcidevs);
 if (rc < 0) {
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v6 4/6] xen/x86: Allow stubdom access to irq created for msi.

2019-09-14 Thread Marek Marczykowski-Górecki
Stubdomains need to be given sufficient privilege over the guest which it
provides emulation for in order for PCI passthrough to work correctly.
When a HVM domain try to enable MSI, QEMU in stubdomain calls
PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
part of xc_domain_update_msi_irq. Allow for that as part of
PHYSDEVOP_map_pirq.

This is not needed for PCI INTx, because IRQ in that case is known
beforehand and the stubdomain is given permissions over this IRQ by
libxl__device_pci_add (there's a do_pci_add against the stubdomain).

create_irq() already grant IRQ access to hardware_domain, with
assumption the device model (something managing this IRQ) lives there.
Modify create_irq() to take additional parameter pointing at device
model domain - which may be dom0 or stubdomain.  Save ID of the domain
given permission, to revoke it in destroy_irq() - easier and cleaner
than replaying logic of create_irq() parameter. Use domid instead of
actual reference to the domain, because it might get destroyed before
destroying IRQ (stubdomain is destroyed before its target domain). And
it is not an issue, because IRQ permissions live within domain
structure, so destroying a domain also implicitly revoke the permission.
Potential domid reuse is detected by by checking if that domain does
have permission over the IRQ being destroyed.

Then, adjust all callers to provide the parameter. In case of calls not
related to stubdomain-initiated allocations, give it either
hardware_domain (so the behavior is unchanged there), or NULL for
interrupts used by Xen internally.

Inspired by 
https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
 by Eric Chanudet .

Signed-off-by: Simon Gaiser 
Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - extend commit message
Changes in v4:
 - add missing destroy_irq on error path
Changes in v5:
 - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which
   basically make it a different patch
 - add get_dm_domain() helper
 - do not give hardware_domain permission over IRQs used in Xen
   internally
 - rename create_irq argument to just 'd', to avoid confusion
   when it's called by hardware domain
 - verify that device is de-assigned before pci_remove_device call
 - save ID of domain given permission in create_irq(), to revoke it in
 destroy_irq()
 - drop domain parameter from destroy_irq() and msi_free_irq()
 - do not give hardware domain permission over IRQ created in
 iommu_set_interrupt()
Changes in v6:
 - do not give permission over hpet irq to hardware_domain
 - move creator_domid to arch_irq_desc
 - fix creator_domid initialization
 - always give current->domain permission instead of using
 get_dm_domain() helper. Analysis of all its use cases tells that it is
 the only value it returns.
 - drop unrelated change
---
 xen/arch/x86/hpet.c  |  3 +-
 xen/arch/x86/irq.c   | 51 ++---
 xen/drivers/char/ns16550.c   |  2 +-
 xen/drivers/passthrough/amd/iommu_init.c |  2 +-
 xen/drivers/passthrough/vtd/iommu.c  |  3 +-
 xen/include/asm-x86/irq.h|  7 ++-
 6 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 4b08488..5ed4405 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel 
*ch)
 {
 int irq;
 
-if ( (irq = create_irq(NUMA_NO_NODE)) < 0 )
+if ( (irq = create_irq(NUMA_NO_NODE, NULL)) < 0 )
 return irq;
 
 ch->msi.irq = irq;
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 0ee3346..0b4c20a 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -254,7 +254,13 @@ void __init clear_irq_vector(int irq)
 /*
  * Dynamic irq allocate and deallocation for MSI
  */
-int create_irq(nodeid_t node)
+
+/*
+ * create_irq - allocate irq for MSI
+ * @d domain that will get permission over the allocated irq; this permission
+ * will automatically be revoked on destroy_irq
+ */
+int create_irq(nodeid_t node, struct domain *d)
 {
 int irq, ret;
 struct irq_desc *desc;
@@ -282,23 +288,30 @@ int create_irq(nodeid_t node)
 }
 ret = assign_irq_vector(irq, mask);
 }
+ASSERT(desc->arch.creator_domid == DOMID_INVALID);
 if (ret < 0)
 {
 desc->arch.used = IRQ_UNUSED;
 irq = ret;
 }
-else if ( hardware_domain )
+else if ( d )
 {
-ret = irq_permit_access(hardware_domain, irq);
+ASSERT(d == current->domain);
+ret = irq_permit_access(d, irq);
 if ( ret )
 printk(XENLOG_G_ERR
-   "Could not grant Dom0 access to IRQ%d (error %d)\n",
-

[Xen-devel] [PATCH v6 2/6] libxl: attach PCI device to qemu only after setting pciback/pcifront

2019-09-14 Thread Marek Marczykowski-Górecki
When qemu is running in stubdomain, handling "pci-ins" command will fail
if pcifront is not initialized already. Fix this by sending such command
only after confirming that pciback/front is running.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Wei Liu 
---
Changes in v2:
- Fixed code style since previous version.
---
 tools/libxl/libxl_pci.c |  9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 2e06a45..578535f 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcide
 {
 libxl_ctx *ctx = libxl__gc_owner(gc);
 unsigned int orig_vdev, pfunc_mask;
+char *be_path;
 libxl_device_pci *assigned;
 int num_assigned, i, rc;
 int stubdomid = 0;
@@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcide
 rc = do_pci_add(gc, stubdomid, &pcidev_s, 0);
 if ( rc )
 goto out;
+/* Wait for the device actually being connected, otherwise device model
+ * running there will fail to find the device. */
+be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0",
+ libxl__xs_get_dompath(gc, 0), stubdomid);
+rc = libxl__wait_for_backend(gc, be_path,
+ GCSPRINTF("%d", XenbusStateConnected));
+if (rc)
+goto out;
 }
 
 orig_vdev = pcidev->vdevfn & ~7U;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control

2019-09-20 Thread Marek Marczykowski-Górecki
On Fri, Sep 20, 2019 at 12:10:09PM +0200, Jan Beulich wrote:
> On 14.09.2019 17:37, Marek Marczykowski-Górecki  wrote:
> > Allow device model running in stubdomain to enable/disable INTx/MSI(-X),
> > bypassing pciback. While pciback is still used to access config space
> > from within stubdomain, it refuse to write to
> > PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE
> > in non-permissive mode. Which is the right thing to do for PV domain
> > (the main use case for pciback), as PV domain should use XEN_PCI_OP_*
> > commands for that. Unfortunately those commands are not good for
> > stubdomain use, as they configure MSI in dom0's kernel too, which should
> > not happen for HVM domain.
> 
> Why the "for HVM domain" here? I.e. why would this be correct for
> a PV domain? Besides my dislike for such a bypass (imo all of the
> handling should go through pciback, or none of it) I continue to
> wonder whether the problem can't be addressed by a pciback change.
> And even if not, I'd still wonder whether the request shouldn't go
> through pciback, to retain proper layering. Ultimately it may be
> better to have even the map/unmap go through pciback (it's at
> least an apparent violation of the original physdev-op model that
> these two are XSM_DM_PRIV).

Technically it should be possible to move this part to pciback, and in
fact this is what I've considered in the first version of this series.
But Roger points out on each version[1] of this series that pciback is
meant to serve *PV* domains, where a PCI passthrough is a completely
different different beast. In fact, I even consider that using pcifront
in a Linux stubdomain as a proxy for qemu there may be a bad idea (one
needs to be careful to avoid stubdomain kernel fighting with qemu about
device state).

Roger, what is the state of Xen internal vPCI? If handling PCI
passthrough in Xen (or maybe standalone emulator), without qemu help is
going to happen sooner than later (I guess not 4.13, but maybe 4.14?),
then maybe this whole patch doesn't make sense as a temporary measure?

Anyway, if you all agree that pciback should be the way to go, I can go
that route too. In practice, it would be a flag (set by the toolstack?)
allowing writes to appropriate config space registers directly (with
appropriate checks, as in this patch).

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-08/msg00486.html

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control

2019-09-23 Thread Marek Marczykowski-Górecki
On Mon, Sep 23, 2019 at 09:58:27AM +0200, Jan Beulich wrote:
> On 20.09.2019 18:02, Marek Marczykowski-Górecki  wrote:
> > On Fri, Sep 20, 2019 at 12:10:09PM +0200, Jan Beulich wrote:
> >> On 14.09.2019 17:37, Marek Marczykowski-Górecki  wrote:
> >>> Allow device model running in stubdomain to enable/disable INTx/MSI(-X),
> >>> bypassing pciback. While pciback is still used to access config space
> >>> from within stubdomain, it refuse to write to
> >>> PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE
> >>> in non-permissive mode. Which is the right thing to do for PV domain
> >>> (the main use case for pciback), as PV domain should use XEN_PCI_OP_*
> >>> commands for that. Unfortunately those commands are not good for
> >>> stubdomain use, as they configure MSI in dom0's kernel too, which should
> >>> not happen for HVM domain.
> >>
> >> Why the "for HVM domain" here? I.e. why would this be correct for
> >> a PV domain? Besides my dislike for such a bypass (imo all of the
> >> handling should go through pciback, or none of it) I continue to
> >> wonder whether the problem can't be addressed by a pciback change.
> >> And even if not, I'd still wonder whether the request shouldn't go
> >> through pciback, to retain proper layering. Ultimately it may be
> >> better to have even the map/unmap go through pciback (it's at
> >> least an apparent violation of the original physdev-op model that
> >> these two are XSM_DM_PRIV).
> > 
> > Technically it should be possible to move this part to pciback, and in
> > fact this is what I've considered in the first version of this series.
> > But Roger points out on each version[1] of this series that pciback is
> > meant to serve *PV* domains, where a PCI passthrough is a completely
> > different different beast. In fact, I even consider that using pcifront
> > in a Linux stubdomain as a proxy for qemu there may be a bad idea (one
> > needs to be careful to avoid stubdomain kernel fighting with qemu about
> > device state).
> 
> Well, not using pciback _at all_ in this case would be another option.
> What I dislike is the furthering of hybrid-ness.

Ah, I see. This may be a good idea, if this type of PCI passthrough is
going to stay. If we're going away from qemu towards other options
mentioned in previous email, I'd say such a rework is too much work for a
time limited usefulness.
> 
> > Anyway, if you all agree that pciback should be the way to go, I can go
> > that route too. In practice, it would be a flag (set by the toolstack?)
> > allowing writes to appropriate config space registers directly (with
> > appropriate checks, as in this patch).
> 
> I'm afraid I don't agree: How would allowing writes to more config space
> registers by a stubdom be safe?

Exactly the same as in this patch: pciback would perform the same
validation (prohibit enabling MSI together with INTx etc).

BTW what are the risks (besides DoS) of allowing full config space
access, assuming VT-d with interrupt remapping present? This sounds
similar to risks of malicious device connected to some domU, right? Can
such device (or a domain controlling such device) break out to Xen or
dom0? Can it steal data from other domains?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control

2019-09-23 Thread Marek Marczykowski-Górecki
On Mon, Sep 23, 2019 at 02:05:58PM +0200, Jan Beulich wrote:
> On 23.09.2019 12:47, Marek Marczykowski-Górecki  wrote:
> > On Mon, Sep 23, 2019 at 09:58:27AM +0200, Jan Beulich wrote:
> >> On 20.09.2019 18:02, Marek Marczykowski-Górecki  wrote:
> >>> Anyway, if you all agree that pciback should be the way to go, I can go
> >>> that route too. In practice, it would be a flag (set by the toolstack?)
> >>> allowing writes to appropriate config space registers directly (with
> >>> appropriate checks, as in this patch).
> >>
> >> I'm afraid I don't agree: How would allowing writes to more config space
> >> registers by a stubdom be safe?
> > 
> > Exactly the same as in this patch: pciback would perform the same
> > validation (prohibit enabling MSI together with INTx etc).
> > 
> > BTW what are the risks (besides DoS) of allowing full config space
> > access, assuming VT-d with interrupt remapping present? This sounds
> > similar to risks of malicious device connected to some domU, right? Can
> > such device (or a domain controlling such device) break out to Xen or
> > dom0? Can it steal data from other domains?
> 
> There shouldn't be, but this would need proving. The direction of
> proof then should be the other way around (and I realize it may be
> [close to] impossible): Widening what guests (including stub
> domains) are allowed to do should be proven to add no additional
> risks. It shouldn't be (by example, as I imply from your question)
> that an actual issue needs to be pointed out.

What about this: HVM guest can already do all of this when qemu is
running in dom0. So, allowing those actions when qemu is running in
stubdomain should not introduce _additional_ risks.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 5/6] xen/x86: add PHYSDEVOP_interrupt_control

2019-09-23 Thread Marek Marczykowski-Górecki
On Mon, Sep 23, 2019 at 03:02:49PM +0200, Jan Beulich wrote:
> On 23.09.2019 14:25, Marek Marczykowski-Górecki  wrote:
> > What about this: HVM guest can already do all of this when qemu is
> > running in dom0. So, allowing those actions when qemu is running in
> > stubdomain should not introduce _additional_ risks.
> 
> Well, in a way - yes. But I don't think it's right to have qemu do
> the direct writes it does (and I wouldn't be surprised if there
> were still actual issues with this model). Hence it's not going to
> be an improvement if this suspicious underlying design got
> extended to other components.

This sounds like any workflow involving qemu would be inferior. And I
agree with that. But also I do need PCI passthrough working, so I need a
solution until we have an alternative implementation. If that alternative
is going to happen soon, I'm also fine with carrying patches in Qubes
package (like I already do). This wouldn't be nice for the rest of the
community (I believe many other Xen-based projects also carry similar
patches already), but it looks like upstreaming this is taking way too
much effort than it's worth for a temporary solution.

So, in the next version of this series I'm going to drop this patch (and
the next one).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v7 0/4] Fix PCI passthrough for HVM with stubdomain

2019-09-24 Thread Marek Marczykowski-Górecki
In this version, I drop PHYSDEVOP_interrupt_control patch, since Jan prefer not
to mix pciif and hypercalls for serving device model. Enabling MSI by the
stubdomain remains unsolved here, but other patches are improvement anyway.

Changes in v2:
 - new "xen/x86: Allow stubdom access to irq created for msi" patch
 - applied review comments from v1
Changes is v3:
 - apply suggestions by Roger
 - add PHYSDEVOP_msi_msix_set_enable
Changes in v4:
 - implement suggestions by Wei, Roger, Jan
 - plug new physdevop into XSM
Changes in v5:
 - rebase on master
 - rename to PHYSDEVOP_msi_control
 - move granting access to IRQ into create_irq
Changes in v6:
 - simplify granting IRQ access, record dm domid for cleanup
 - rename to PHYSDEVOP_interrupt_control
 - include INTx control in the hypercall
Changes in v7:
 - update "xen/x86: Allow stubdom access to irq created for msi"
 - drop "xen/x86: add PHYSDEVOP_interrupt_control"
 - drop "tools/libxc: add wrapper for PHYSDEVOP_interrupt_control"

---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Anthony PERARD 
Cc: "Roger Pau Monné" 
Cc: Suravee Suthikulpanit 
Cc: Brian Woods 
Cc: Kevin Tian 
Cc: Daniel De Graaf 

Marek Marczykowski-Górecki (4):
  libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
  libxl: attach PCI device to qemu only after setting pciback/pcifront
  libxl: don't try to manipulate json config for stubdomain
  xen/x86: Allow stubdom access to irq created for msi.

 tools/libxl/libxl_pci.c  | 63 +
 xen/arch/x86/hpet.c  |  3 +-
 xen/arch/x86/irq.c   | 42 +++--
 xen/drivers/char/ns16550.c   |  2 +-
 xen/drivers/passthrough/amd/iommu_init.c |  2 +-
 xen/drivers/passthrough/vtd/iommu.c  |  3 +-
 xen/include/asm-x86/irq.h|  7 ++-
 7 files changed, 84 insertions(+), 38 deletions(-)

base-commit: 6c9639a72f0ca3a9430ef75f375877182281fdef
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v7 3/4] libxl: don't try to manipulate json config for stubdomain

2019-09-24 Thread Marek Marczykowski-Górecki
Stubdomain do not have it's own config file - its configuration is
derived from target domains. Do not try to manipulate it when attaching
PCI device.

This bug prevented starting HVM with stubdomain and PCI passthrough
device attached.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Wei Liu 
---
Changes in v3:
 - skip libxl__dm_check_start too, as stubdomain is guaranteed to be
   running at this stage already
 - do not init d_config at all, as it is used only for json manipulation
Changes in v4:
 - adjust comment style
---
 tools/libxl/libxl_pci.c | 50 ++
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 578535f..d26fc9a 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -120,10 +120,14 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 libxl_domain_config d_config;
 libxl_device_pci pcidev_saved;
 libxl__domain_userdata_lock *lock = NULL;
+bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL);
 
-libxl_domain_config_init(&d_config);
-libxl_device_pci_init(&pcidev_saved);
-libxl_device_pci_copy(CTX, &pcidev_saved, pcidev);
+/* Stubdomain doesn't have own config. */
+if (!is_stubdomain) {
+libxl_domain_config_init(&d_config);
+libxl_device_pci_init(&pcidev_saved);
+libxl_device_pci_copy(CTX, &pcidev_saved, pcidev);
+}
 
 be_path = libxl__domain_device_backend_path(gc, 0, domid, 0,
 LIBXL__DEVICE_KIND_PCI);
@@ -152,27 +156,35 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 GCNEW(device);
 libxl__device_from_pcidev(gc, domid, pcidev, device);
 
-lock = libxl__lock_domain_userdata(gc, domid);
-if (!lock) {
-rc = ERROR_LOCK_FAIL;
-goto out;
-}
+/* 
+ * Stubdomin config is derived from its target domain, it doesn't have
+ * its own file.
+ */
+if (!is_stubdomain) {
+lock = libxl__lock_domain_userdata(gc, domid);
+if (!lock) {
+rc = ERROR_LOCK_FAIL;
+goto out;
+}
 
-rc = libxl__get_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+rc = libxl__get_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
 
-device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
- &pcidev_saved);
+device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
+ &pcidev_saved);
 
-rc = libxl__dm_check_start(gc, &d_config, domid);
-if (rc) goto out;
+rc = libxl__dm_check_start(gc, &d_config, domid);
+if (rc) goto out;
+}
 
 for (;;) {
 rc = libxl__xs_transaction_start(gc, &t);
 if (rc) goto out;
 
-rc = libxl__set_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+if (lock) {
+rc = libxl__set_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
+}
 
 libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back));
 
@@ -184,8 +196,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 out:
 libxl__xs_transaction_abort(gc, &t);
 if (lock) libxl__unlock_domain_userdata(lock);
-libxl_device_pci_dispose(&pcidev_saved);
-libxl_domain_config_dispose(&d_config);
+if (!is_stubdomain) {
+libxl_device_pci_dispose(&pcidev_saved);
+libxl_domain_config_dispose(&d_config);
+}
 return rc;
 }
 
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-09-24 Thread Marek Marczykowski-Górecki
Stubdomains need to be given sufficient privilege over the guest which it
provides emulation for in order for PCI passthrough to work correctly.
When a HVM domain try to enable MSI, QEMU in stubdomain calls
PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
part of xc_domain_update_msi_irq. Allow for that as part of
PHYSDEVOP_map_pirq.

This is not needed for PCI INTx, because IRQ in that case is known
beforehand and the stubdomain is given permissions over this IRQ by
libxl__device_pci_add (there's a do_pci_add against the stubdomain).

create_irq() already grant IRQ access to hardware_domain, with
assumption the device model (something managing this IRQ) lives there.
Modify create_irq() to take additional parameter pointing at device
model domain - which may be dom0 or stubdomain.  Save ID of the domain
given permission, to revoke it in destroy_irq() - easier and cleaner
than replaying logic of create_irq() parameter. Use domid instead of
actual reference to the domain, because it might get destroyed before
destroying IRQ (stubdomain is destroyed before its target domain). And
it is not an issue, because IRQ permissions live within domain
structure, so destroying a domain also implicitly revoke the permission.
Potential domid reuse is detected by by checking if that domain does
have permission over the IRQ being destroyed.

Then, adjust all callers to provide the parameter. In case of calls not
related to stubdomain-initiated allocations, give it either
hardware_domain (so the behavior is unchanged there), or NULL for
interrupts used by Xen internally.

Inspired by 
https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
 by Eric Chanudet .

Signed-off-by: Simon Gaiser 
Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - extend commit message
Changes in v4:
 - add missing destroy_irq on error path
Changes in v5:
 - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which
   basically make it a different patch
 - add get_dm_domain() helper
 - do not give hardware_domain permission over IRQs used in Xen
   internally
 - rename create_irq argument to just 'd', to avoid confusion
   when it's called by hardware domain
 - verify that device is de-assigned before pci_remove_device call
 - save ID of domain given permission in create_irq(), to revoke it in
 destroy_irq()
 - drop domain parameter from destroy_irq() and msi_free_irq()
 - do not give hardware domain permission over IRQ created in
 iommu_set_interrupt()
Changes in v6:
 - do not give permission over hpet irq to hardware_domain
 - move creator_domid to arch_irq_desc
 - fix creator_domid initialization
 - always give current->domain permission instead of using
 get_dm_domain() helper. Analysis of all its use cases tells that it is
 the only value it returns.
 - drop unrelated change
Changes in v7:
 - Code style improvements (spaces, use %pd etc)
 - use bool parameter to create_irq, as it's only getting
 current->domain or NULL
 - remove redundant irq_access_permitted()
---
 xen/arch/x86/hpet.c  |  3 +-
 xen/arch/x86/irq.c   | 42 +
 xen/drivers/char/ns16550.c   |  2 +-
 xen/drivers/passthrough/amd/iommu_init.c |  2 +-
 xen/drivers/passthrough/vtd/iommu.c  |  3 +-
 xen/include/asm-x86/irq.h|  7 +++-
 6 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 4b08488..57f68fa 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel 
*ch)
 {
 int irq;
 
-if ( (irq = create_irq(NUMA_NO_NODE)) < 0 )
+if ( (irq = create_irq(NUMA_NO_NODE, false)) < 0 )
 return irq;
 
 ch->msi.irq = irq;
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 0ee3346..256dd02 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -254,7 +254,8 @@ void __init clear_irq_vector(int irq)
 /*
  * Dynamic irq allocate and deallocation for MSI
  */
-int create_irq(nodeid_t node)
+
+int create_irq(nodeid_t node, bool grant_access)
 {
 int irq, ret;
 struct irq_desc *desc;
@@ -282,18 +283,23 @@ int create_irq(nodeid_t node)
 }
 ret = assign_irq_vector(irq, mask);
 }
+
+ASSERT(desc->arch.creator_domid == DOMID_INVALID);
+
 if (ret < 0)
 {
 desc->arch.used = IRQ_UNUSED;
 irq = ret;
 }
-else if ( hardware_domain )
+else if ( grant_access )
 {
-ret = irq_permit_access(hardware_domain, irq);
+ret = irq_permit_access(current->domain, irq);
 if ( ret )
 printk(XENLOG_G_ERR
-   "Could not grant Dom0 access to IRQ%d (error %d)\n&qu

[Xen-devel] [PATCH v7 1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

2019-09-24 Thread Marek Marczykowski-Górecki
HVM domains use IOMMU and device model assistance for communicating with
PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain.
But pciback serve also second function - it reset the device when it is
deassigned from the guest and for this reason pciback needs to be used
with HVM domain too.
When HVM domain has device model in stubdomain, attaching xen-pciback to
the target domain itself may prevent attaching xen-pciback to the
(PV) stubdomain, effectively breaking PCI passthrough.

Fix this by attaching pciback only to one domain: if PV stubdomain is in
use, let it be stubdomain (the commit prevents attaching device to target
HVM in this case); otherwise, attach it to the target domain.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Wei Liu 
---
Changes in v2:
 - previously called "libxl: attach xen-pciback only to PV domains"
 - instead of excluding all HVMs, change the condition to what actually
   matters here - check if stubdomain is in use; this way xen-pciback is
   always in use (either for the target domain, or it's stubdomain),
   fixing PCI reset by xen-pciback concerns
Changes in v3:
 - adjust commit message
---
 tools/libxl/libxl_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 03beb86..2e06a45 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1106,7 +1106,7 @@ out:
 }
 }
 
-if (!starting)
+if (!starting && !libxl_get_stubdom_id(CTX, domid))
 rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
 else
 rc = 0;
@@ -1302,7 +1302,7 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao 
*ao, uint32_t domid,
 }
 }
 
-if (d_config->num_pcidevs > 0) {
+if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) {
 rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs,
 d_config->num_pcidevs);
 if (rc < 0) {
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v7 2/4] libxl: attach PCI device to qemu only after setting pciback/pcifront

2019-09-24 Thread Marek Marczykowski-Górecki
When qemu is running in stubdomain, handling "pci-ins" command will fail
if pcifront is not initialized already. Fix this by sending such command
only after confirming that pciback/front is running.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Wei Liu 
---
Changes in v2:
- Fixed code style since previous version.
---
 tools/libxl/libxl_pci.c |  9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 2e06a45..578535f 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcide
 {
 libxl_ctx *ctx = libxl__gc_owner(gc);
 unsigned int orig_vdev, pfunc_mask;
+char *be_path;
 libxl_device_pci *assigned;
 int num_assigned, i, rc;
 int stubdomid = 0;
@@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcide
 rc = do_pci_add(gc, stubdomid, &pcidev_s, 0);
 if ( rc )
 goto out;
+/* Wait for the device actually being connected, otherwise device model
+ * running there will fail to find the device. */
+be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0",
+ libxl__xs_get_dompath(gc, 0), stubdomid);
+rc = libxl__wait_for_backend(gc, be_path,
+ GCSPRINTF("%d", XenbusStateConnected));
+if (rc)
+goto out;
 }
 
 orig_vdev = pcidev->vdevfn & ~7U;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-09-25 Thread Marek Marczykowski-Górecki
On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote:
> It would still be nice to get the missing bits (interrupt enabling),
> or else this patch is kind of pointless, since it still doesn't allow
> stubdomains to work correctly with passed through devices.

Well, that part, as discussed, doesn't need to be in Xen. For example
the solution deployed in current Qubes stable version is based on
pciback for this purpose.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v7.1 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-09-25 Thread Marek Marczykowski-Górecki
Stubdomains need to be given sufficient privilege over the guest which it
provides emulation for in order for PCI passthrough to work correctly.
When a HVM domain try to enable MSI, QEMU in stubdomain calls
PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
part of xc_domain_update_msi_irq. Give the stubdomain enough permissions
over the mapped interrupt in order to bind it successfully to it's
target domain.

This is not needed for PCI INTx, because IRQ in that case is known
beforehand and the stubdomain is given permissions over this IRQ by
libxl__device_pci_add (there's a do_pci_add against the stubdomain).

create_irq() already grant IRQ access to hardware_domain, with
assumption the device model lives there.
Modify create_irq() to take additional parameter, whether to grant
permissions to the domain creating the IRQ, which may be dom0 or a
stubdomain. Do this instead of granting access always to
hardware_domain. Save ID of the domain given permission, to revoke it in
destroy_irq() - easier and cleaner than replaying logic of create_irq()
parameter. Use domid instead of actual reference to the domain,
because it might get destroyed before destroying IRQ (stubdomain is
destroyed before its target domain). And it is not an issue,
because IRQ permissions live within domain structure, so destroying
a domain also implicitly revoke the permission.  Potential domid
reuse is detected by checking if that domain does have permission
over the IRQ being destroyed.

Then, adjust all callers to provide the parameter. In case of Xen
internal allocations, set it to false, but for domain accessible
interrupt set it to true.

Inspired by 
https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
 by Eric Chanudet .

Signed-off-by: Simon Gaiser 
Signed-off-by: Marek Marczykowski-Górecki 
Reviewed-by: Roger Pau Monné 
---
Changes in v3:
 - extend commit message
Changes in v4:
 - add missing destroy_irq on error path
Changes in v5:
 - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which
   basically make it a different patch
 - add get_dm_domain() helper
 - do not give hardware_domain permission over IRQs used in Xen
   internally
 - rename create_irq argument to just 'd', to avoid confusion
   when it's called by hardware domain
 - verify that device is de-assigned before pci_remove_device call
 - save ID of domain given permission in create_irq(), to revoke it in
 destroy_irq()
 - drop domain parameter from destroy_irq() and msi_free_irq()
 - do not give hardware domain permission over IRQ created in
 iommu_set_interrupt()
Changes in v6:
 - do not give permission over hpet irq to hardware_domain
 - move creator_domid to arch_irq_desc
 - fix creator_domid initialization
 - always give current->domain permission instead of using
 get_dm_domain() helper. Analysis of all its use cases tells that it is
 the only value it returns.
 - drop unrelated change
Changes in v7:
 - Code style improvements (spaces, use %pd etc)
 - use bool parameter to create_irq, as it's only getting
 current->domain or NULL
 - remove redundant irq_access_permitted()
Changes in v7.1:
 - adjust comments, merge if
 - update commit message
---
 xen/arch/x86/hpet.c  |  3 +-
 xen/arch/x86/irq.c   | 42 +
 xen/drivers/char/ns16550.c   |  2 +-
 xen/drivers/passthrough/amd/iommu_init.c |  2 +-
 xen/drivers/passthrough/vtd/iommu.c  |  3 +-
 xen/include/asm-x86/irq.h| 11 ++-
 6 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 4b08488..57f68fa 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel 
*ch)
 {
 int irq;
 
-if ( (irq = create_irq(NUMA_NO_NODE)) < 0 )
+if ( (irq = create_irq(NUMA_NO_NODE, false)) < 0 )
 return irq;
 
 ch->msi.irq = irq;
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 0ee3346..4304896 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -254,7 +254,8 @@ void __init clear_irq_vector(int irq)
 /*
  * Dynamic irq allocate and deallocation for MSI
  */
-int create_irq(nodeid_t node)
+
+int create_irq(nodeid_t node, bool grant_access)
 {
 int irq, ret;
 struct irq_desc *desc;
@@ -282,18 +283,23 @@ int create_irq(nodeid_t node)
 }
 ret = assign_irq_vector(irq, mask);
 }
+
+ASSERT(desc->arch.creator_domid == DOMID_INVALID);
+
 if (ret < 0)
 {
 desc->arch.used = IRQ_UNUSED;
 irq = ret;
 }
-else if ( hardware_domain )
+else if ( grant_access )
 {
-ret = irq_permit_access(hardware_domain, irq);
+ret = irq_

Re: [Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-09-25 Thread Marek Marczykowski-Górecki
On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote:
> On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki wrote:
> > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote:
> > > It would still be nice to get the missing bits (interrupt enabling),
> > > or else this patch is kind of pointless, since it still doesn't allow
> > > stubdomains to work correctly with passed through devices.
> > 
> > Well, that part, as discussed, doesn't need to be in Xen. For example
> > the solution deployed in current Qubes stable version is based on
> > pciback for this purpose.
> 
> Ack. Do you think it would make sense to submit that part to Linux
> then?

How would an interface with toolstack (when to allow enabling MSI)
should look like? Right now I have it as extra attribute in sysfs of
pciback and libxl writes to it. Or rather should it be in xenstore?
Or maybe pciback should somehow detect itself if it's talking to
stubdomain while the device is assigned to a HVM domain, or to a target
PV domain itself?

The actual patch is here:
https://github.com/QubesOS/qubes-linux-kernel/blob/master/0014-xen-pciback-add-attribute-to-allow-MSI-enable-flag-w.patch
and the toolstack part:
https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.8/patch-stubdom-allow-msi-enable.patch

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] VM_BUG_ON_PAGE(!PageOffline(page), page) in alloc_xenballooned_pages

2019-09-26 Thread Marek Marczykowski-Górecki
Hi,

I've hit VM_BUG_ON_PAGE(!PageOffline(page), page) in
alloc_xenballooned_pages, when trying to use gnttab from userspace
application. It happens on Xen PV, but not on Xen PVH or HVM with the
same kernel. This happens at least with 5.1.6, but also 5.2.15
(as seen below). Based on this, it looks related to 0266def91377
(xen/balloon: Fix mapping PG_offline pages to user space) and probably
77c4adf6a6df (xen/balloon: mark inflated pages PG_offline).

Any idea? Below is full message.


page:ea0003e7ffc0 refcount:1 mapcount:0 mapping: index:0x0
flags: 0xe1000(reserved)
raw: 000e1000 dead0100 dead0200 
raw:   0001 
page dumped because: VM_BUG_ON_PAGE(!PageOffline(page))
[ cut here ]
kernel BUG at include/linux/page-flags.h:744!
invalid opcode:  [#1] SMP NOPTI
CPU: 0 PID: 551 Comm: qubesdb-daemon Tainted: GW 
5.2.15-200.fc30.x86_64 #1
RIP: e030:alloc_xenballooned_pages+0xef/0x110
Code: c0 0c 10 00 e8 b2 fa ff ff 85 c0 0f 84 60 ff ff ff 41 89 dd b8 f4 ff ff 
ff eb b0 48 c7 c6 e8 af 14 82 48 89 c7 e8 31 32 ca ff <0f> 0b 48 c7 c7 40 f 0 
4d 82 e8 13 85 3f 00 31 c0 48 83 c4 08 5b 5d
RSP: e02b:c90001113d98 EFLAGS: 00010246
RAX: 0037 RBX:  RCX: 0149
RDX:  RSI:  RDI: 82143bbc
RBP: 0001 R08: 0181 R09: 0149
R10: 000a R11: c90001113c38 R12: 88800d670960
R13: 7fffdff236a0 R14: 7fffdff236a0 R15: 8880108bd000
FS:  7f30e205e7c0() GS:888013e0() knlGS:
CS:  e030 DS:  ES:  CR0: 80050033
CR2: 7f30e2082000 CR3: 0c92 CR4: 00040660
Call Trace:
 ? __kmalloc+0x16c/0x210
 gnttab_alloc_pages+0x11/0x40
 gntdev_alloc_map+0xe7/0x180 [xen_gntdev]
 gntdev_ioctl+0x203/0x530 [xen_gntdev]
 do_vfs_ioctl+0x405/0x660
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x5f/0x1a0
 ? page_fault+0x8/0x30
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f30e239b3bb
Code: 0f 1e fa 48 8b 05 cd ca 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff 
c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 8b 0d 9d ca 0c 00 f7 d8 64 89 01 48
RSP: 002b:7fffdff23698 EFLAGS: 0202 ORIG_RAX: 0010
RAX: ffda RBX: 0003 RCX: 7f30e239b3bb
RDX: 7fffdff236a0 RSI: 00184700 RDI: 000b
RBP: 7fffdff23730 R08: 7fffdff2375c R09: 7fffdff23758
R10: fcc9 R11: 0202 R12: 7fffdff236a0
R13: 1000 R14: 000b R15: 0001
Modules linked in: xenfs ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel 
ghash_clmulni_intel xen_blkfront xen_scsiback target_core_mod xen_netback 
xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] VM_BUG_ON_PAGE(!PageOffline(page), page) in alloc_xenballooned_pages

2019-09-27 Thread Marek Marczykowski-Górecki
On Fri, Sep 27, 2019 at 09:44:35AM +0200, David Hildenbrand wrote:
> On 26.09.19 23:34, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > I've hit VM_BUG_ON_PAGE(!PageOffline(page), page) in
> > alloc_xenballooned_pages, when trying to use gnttab from userspace
> > application. It happens on Xen PV, but not on Xen PVH or HVM with the
> > same kernel. This happens at least with 5.1.6, but also 5.2.15
> > (as seen below). Based on this, it looks related to 0266def91377
> > (xen/balloon: Fix mapping PG_offline pages to user space) and probably
> > 77c4adf6a6df (xen/balloon: mark inflated pages PG_offline).
> > 
> > Any idea? Below is full message.
> 
> Now that's weird. Weird because half a year passed since 
> 0266def91377 (xen/balloon: Fix mapping PG_offline pages to user space).

Not sure about others, but in Qubes we use PVH/HVM VMs mostly.

> > page:ea0003e7ffc0 refcount:1 mapcount:0 mapping: 
> > index:0x0
> > flags: 0xe1000(reserved)
> 
> So we have a PageReserved page that is not PageOffline. I assume this
> happens when we do a __ClearPageOffline() in alloc_xenballooned_pages().
> 
> That means, that we get such a page via balloon_retrieve(true). Which
> means that we have such a page sitting in the ballooned_pages list, which
> is weird.
> 
> Pages enter ballooned_pages via __balloon_append() only.
> 
> 1. Via xen_online_page(). We have a __SetPageOffline() right in front
>of it.
> 2. Via balloon_add_region(). I don't see a __SetPageOffline().
> 3. Via decrease_reservation(). We seem to do a __SetPageOffline() on all
>pages in the previous loop.
> 4. Via free_xenballooned_pages(). We have a __SetPageOffline() right
>in front of it.
> 
> 
> So this smells like #2 (and matches your PV only observation). Also,
> it makes sense that the page is PageReserved that way.
> 
> 
> Wonder if it is as easy as:

Yes, besides missing semicolon it works. Thanks!

> From 0955beef5aa11da4a8398472ce3106a92599cbe6 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand 
> Date: Fri, 27 Sep 2019 09:39:31 +0200
> Subject: [PATCH v1] xen/balloon: Set pages PageOffline() in
>  balloon_add_region()
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> We are missing a __SetPageOffline(), which is why we can get
> !PageOffline() pages onto the balloon list, where
> alloc_xenballooned_pages() will complain:
> 
> page:ea0003e7ffc0 refcount:1 mapcount:0 mapping: index:0x0
> flags: 0xe1000(reserved)
> raw: 000e1000 dead0100 dead0200 
> raw:   0001 00000000
> page dumped because: VM_BUG_ON_PAGE(!PageOffline(page))
> [ cut here ]
> kernel BUG at include/linux/page-flags.h:744!
> invalid opcode: 0000 [#1] SMP NOPTI
> 
> Reported-by: Marek Marczykowski-Górecki 
> Fixes: 77c4adf6a6df ("xen/balloon: mark inflated pages PG_offline")
> Cc: sta...@vger.kernel.org # v5.1+
> Signed-off-by: David Hildenbrand 

Tested-by: Marek Marczykowski-Górecki 

> ---
>  drivers/xen/balloon.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 05b1f7e948ef..d31149068448 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -687,6 +687,7 @@ static void __init balloon_add_region(unsigned long 
> start_pfn,
>   /* totalram_pages and totalhigh_pages do not
>  include the boot-time balloon extension, so
>  don't subtract from it. */
> + __SetPageOffline(page)
  ^
  ;

>   __balloon_append(page);
>   }
>  

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 0/4] Fix PCI passthrough for HVM with stubdomain

2019-09-28 Thread Marek Marczykowski-Górecki
On Fri, Sep 27, 2019 at 03:36:08PM +0100, Wei Liu wrote:
> On Fri, Sep 27, 2019 at 04:21:55PM +0200, Jan Beulich wrote:
> > > 
> > > Marek Marczykowski-Górecki (4):
> > >   libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
> > >   libxl: attach PCI device to qemu only after setting pciback/pcifront
> > >   libxl: don't try to manipulate json config for stubdomain
> > >   xen/x86: Allow stubdom access to irq created for msi.
> > 
> > I did commit the last one, but I'd prefer if one of you could take
> > care of the three libxl ones.
> > 
> 
> I tried to apply the first three. They don't apply cleanly. That's
> perhaps due to Anthony's series which got committed recently.
> 
> Marek, do you have a branch?

This rebase turned out to be fairly complex, because of the whole pci
attach got reworked for async api. So, I've done it, but dropped your
ack on the patch needing rework for this reason. And also found one
regression introduced by Anthony series.

All in all - v8 on its way, still 4 patches (+1, -1).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 4/4] libxl: don't try to manipulate json config for stubdomain

2019-09-28 Thread Marek Marczykowski-Górecki
Stubdomain do not have it's own config file - its configuration is
derived from target domains. Do not try to manipulate it when attaching
PCI device.

This bug prevented starting HVM with stubdomain and PCI passthrough
device attached.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Wei Liu 
---
Changes in v3:
 - skip libxl__dm_check_start too, as stubdomain is guaranteed to be
   running at this stage already
 - do not init d_config at all, as it is used only for json manipulation
Changes in v4:
 - adjust comment style
Changes in v8:
 - rebase on staging
---
 tools/libxl/libxl_pci.c | 42 +++---
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index ba0287d..dc96163 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -126,8 +126,11 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
 int rc;
 libxl_domain_config d_config;
 libxl__domain_userdata_lock *lock = NULL;
+bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL);
 
-libxl_domain_config_init(&d_config);
+/* Stubdomain doesn't have own config. */
+if (!is_stubdomain)
+libxl_domain_config_init(&d_config);
 
 be_path = libxl__domain_device_backend_path(gc, 0, domid, 0,
 LIBXL__DEVICE_KIND_PCI);
@@ -153,27 +156,35 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
 if (!starting)
 flexarray_append_pair(back, "state", GCSPRINTF("%d", 
XenbusStateReconfiguring));
 
-lock = libxl__lock_domain_userdata(gc, domid);
-if (!lock) {
-rc = ERROR_LOCK_FAIL;
-goto out;
-}
+/*
+ * Stubdomin config is derived from its target domain, it doesn't have
+ * its own file.
+ */
+if (!is_stubdomain) {
+lock = libxl__lock_domain_userdata(gc, domid);
+if (!lock) {
+rc = ERROR_LOCK_FAIL;
+goto out;
+}
 
-rc = libxl__get_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+rc = libxl__get_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
 
-device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
- pcidev);
+device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
+ pcidev);
 
-rc = libxl__dm_check_start(gc, &d_config, domid);
-if (rc) goto out;
+rc = libxl__dm_check_start(gc, &d_config, domid);
+if (rc) goto out;
+}
 
 for (;;) {
 rc = libxl__xs_transaction_start(gc, &t);
 if (rc) goto out;
 
-rc = libxl__set_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+if (lock) {
+rc = libxl__set_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
+}
 
 libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back));
 
@@ -185,7 +196,8 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
 out:
 libxl__xs_transaction_abort(gc, &t);
 if (lock) libxl__unlock_domain_userdata(lock);
-libxl_domain_config_dispose(&d_config);
+if (!is_stubdomain)
+libxl_domain_config_dispose(&d_config);
 return rc;
 }
 
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 2/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

2019-09-28 Thread Marek Marczykowski-Górecki
HVM domains use IOMMU and device model assistance for communicating with
PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain.
But pciback serve also second function - it reset the device when it is
deassigned from the guest and for this reason pciback needs to be used
with HVM domain too.
When HVM domain has device model in stubdomain, attaching xen-pciback to
the target domain itself may prevent attaching xen-pciback to the
(PV) stubdomain, effectively breaking PCI passthrough.

Fix this by attaching pciback only to one domain: if PV stubdomain is in
use, let it be stubdomain (the commit prevents attaching device to target
HVM in this case); otherwise, attach it to the target domain.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Wei Liu 
---
Changes in v2:
 - previously called "libxl: attach xen-pciback only to PV domains"
 - instead of excluding all HVMs, change the condition to what actually
   matters here - check if stubdomain is in use; this way xen-pciback is
   always in use (either for the target domain, or it's stubdomain),
   fixing PCI reset by xen-pciback concerns
Changes in v3:
 - adjust commit message
Changes in v8:
 - rebase on staging
---
 tools/libxl/libxl_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 2edff64..ac597a5 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1399,7 +1399,7 @@ out_no_irq:
 }
 }
 
-if (!starting)
+if (!starting && !libxl_get_stubdom_id(CTX, domid))
 rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
 else
 rc = 0;
@@ -1697,7 +1697,7 @@ static void add_pcidevs_done(libxl__egc *egc, 
libxl__multidev *multidev,
 libxl_domid domid = apds->domid;
 libxl__ao_device *aodev = apds->outer_aodev;
 
-if (d_config->num_pcidevs > 0) {
+if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) {
 rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs,
 d_config->num_pcidevs);
 if (rc < 0) {
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 3/4] libxl: attach PCI device to qemu only after setting pciback/pcifront

2019-09-28 Thread Marek Marczykowski-Górecki
When qemu is running in stubdomain, handling "pci-ins" command will fail
if pcifront is not initialized already. Fix this by sending such command
only after confirming that pciback/front is running.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
- Fixed code style since previous version.
Changes in v8:
- rebase on staging
- rework for async api
---
 tools/libxl/libxl_pci.c | 45 +-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index ac597a5..ba0287d 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1012,6 +1012,9 @@ typedef struct pci_add_state {
 bool starting;
 void (*callback)(libxl__egc *, struct pci_add_state *, int rc);
 
+/* private to device_pci_add_stubdom_wait */
+libxl__ev_devstate pciback_ds;
+
 /* private to do_pci_add */
 libxl__xswait_state xswait;
 libxl__ev_qmp qmp;
@@ -1487,6 +1490,10 @@ static int libxl_pcidev_assignable(libxl_ctx *ctx, 
libxl_device_pci *pcidev)
 return i != num;
 }
 
+static void device_pci_add_stubdom_wait(libxl__egc *egc,
+pci_add_state *pas, int rc);
+static void device_pci_add_stubdom_ready(libxl__egc *egc,
+libxl__ev_devstate* ds, int rc);
 static void device_pci_add_stubdom_done(libxl__egc *egc,
 pci_add_state *, int rc);
 static void device_pci_add_done(libxl__egc *egc,
@@ -1563,7 +1570,8 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t 
domid,
 GCNEW(pcidev_s);
 libxl_device_pci_init(pcidev_s);
 libxl_device_pci_copy(CTX, pcidev_s, pcidev);
-pas->callback = device_pci_add_stubdom_done;
+pas->callback = device_pci_add_stubdom_wait;
+
 do_pci_add(egc, stubdomid, pcidev_s, pas); /* must be last */
 return;
 }
@@ -1575,6 +1583,41 @@ out:
 device_pci_add_done(egc, pas, rc); /* must be last */
 }
 
+static void device_pci_add_stubdom_wait(libxl__egc *egc,
+pci_add_state *pas,
+int rc)
+{
+libxl__ao_device *aodev = pas->aodev;
+STATE_AO_GC(aodev->ao);
+libxl_ctx *ctx = libxl__gc_owner(gc);
+int stubdomid = libxl_get_stubdom_id(ctx, pas->domid);
+char *state_path;
+
+if (rc) goto out;
+
+/* Wait for the device actually being connected, otherwise device model
+ * running there will fail to find the device. */
+state_path = libxl__sprintf(gc, "%s/backend/pci/%d/0/state",
+libxl__xs_get_dompath(gc, 0), stubdomid);
+rc = libxl__ev_devstate_wait(aodev->ao, &pas->pciback_ds,
+device_pci_add_stubdom_ready,
+state_path, XenbusStateConnected,
+LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000);
+if (rc) goto out;
+return;
+out:
+device_pci_add_done(egc, pas, rc); /* must be last */
+}
+
+static void device_pci_add_stubdom_ready(libxl__egc *egc,
+ libxl__ev_devstate* ds,
+ int rc)
+{
+pci_add_state *pas = CONTAINER_OF(ds, *pas, pciback_ds);
+
+device_pci_add_stubdom_done(egc, pas, rc); /* must be last */
+}
+
 static void device_pci_add_stubdom_done(libxl__egc *egc,
 pci_add_state *pas,
 int rc)
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 0/4] Fix PCI passthrough for HVM with stubdomain

2019-09-28 Thread Marek Marczykowski-Górecki
This version is rebase libxl patches on staging. Changes to "libxl: attach PCI
device to qemu only after setting pciback/pcifront" are significant enough that
I've dropped Wei's ack.
Also, there is a new patch to fix regression introduced during async conversion.

Changes in v2:
 - new "xen/x86: Allow stubdom access to irq created for msi" patch
 - applied review comments from v1
Changes is v3:
 - apply suggestions by Roger
 - add PHYSDEVOP_msi_msix_set_enable
Changes in v4:
 - implement suggestions by Wei, Roger, Jan
 - plug new physdevop into XSM
Changes in v5:
 - rebase on master
 - rename to PHYSDEVOP_msi_control
 - move granting access to IRQ into create_irq
Changes in v6:
 - simplify granting IRQ access, record dm domid for cleanup
 - rename to PHYSDEVOP_interrupt_control
 - include INTx control in the hypercall
Changes in v7:
 - update "xen/x86: Allow stubdom access to irq created for msi"
 - drop "xen/x86: add PHYSDEVOP_interrupt_control"
 - drop "tools/libxc: add wrapper for PHYSDEVOP_interrupt_control"
Chages in v8:
 - drop applied "xen/x86: Allow stubdom access to irq created for msi"
 - new patch "libxl: fix cold plugged PCI device with stubdomain"
 - rebase libxl patches on staging

---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Anthony PERARD 
Cc: "Roger Pau Monné" 
Cc: Suravee Suthikulpanit 
Cc: Brian Woods 
Cc: Kevin Tian 
Cc: Daniel De Graaf 

Marek Marczykowski-Górecki (4):
  libxl: fix cold plugged PCI device with stubdomain
  libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
  libxl: attach PCI device to qemu only after setting pciback/pcifront
  libxl: don't try to manipulate json config for stubdomain

 tools/libxl/libxl_pci.c | 96 +-
 1 file changed, 77 insertions(+), 19 deletions(-)

base-commit: 7a4e674905b3cbbe48e81c3222361a7f3579
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 1/4] libxl: fix cold plugged PCI device with stubdomain

2019-09-28 Thread Marek Marczykowski-Górecki
When libxl__device_pci_add() is called, stubdomain is already running,
even when still constructing the target domain. Previously, do_pci_add()
was called with 'starting' hardcoded to false, but now do_pci_add() shares
'starting' flag in pci_add_state for both target domain and stubdomain.

Fix this by resetting (local) 'starting' to false in pci_add_dm_done()
(previously part of do_pci_add()) when handling stubdomain, regardless
of pas->starting value.

Fixes: 11db56f9a6 (libxl_pci: Use libxl__ao_device with libxl__device_pci_add)
Signed-off-by: Marek Marczykowski-Górecki 
---
 tools/libxl/libxl_pci.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 4725817..2edff64 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1300,6 +1300,10 @@ static void pci_add_dm_done(libxl__egc *egc,
 
 if (rc) goto out;
 
+/* stubdomain is always running by now, even at create time */
+if (isstubdom)
+starting = false;
+
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/resource", pcidev->domain,
pcidev->bus, pcidev->dev, pcidev->func);
 f = fopen(sysfs_path, "r");
@@ -1559,7 +1563,6 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t 
domid,
 GCNEW(pcidev_s);
 libxl_device_pci_init(pcidev_s);
 libxl_device_pci_copy(CTX, pcidev_s, pcidev);
-/* stubdomain is always running by now, even at create time */
 pas->callback = device_pci_add_stubdom_done;
 do_pci_add(egc, stubdomid, pcidev_s, pas); /* must be last */
 return;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-09-28 Thread Marek Marczykowski-Górecki
On Thu, Sep 26, 2019 at 09:10:17AM +0200, Roger Pau Monné wrote:
> On Thu, Sep 26, 2019 at 06:16:06AM +0200, Marek Marczykowski-Górecki wrote:
> > On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote:
> > > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki 
> > > wrote:
> > > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote:
> > > > > It would still be nice to get the missing bits (interrupt enabling),
> > > > > or else this patch is kind of pointless, since it still doesn't allow
> > > > > stubdomains to work correctly with passed through devices.
> > > > 
> > > > Well, that part, as discussed, doesn't need to be in Xen. For example
> > > > the solution deployed in current Qubes stable version is based on
> > > > pciback for this purpose.
> > > 
> > > Ack. Do you think it would make sense to submit that part to Linux
> > > then?
> > 
> > How would an interface with toolstack (when to allow enabling MSI)
> > should look like? Right now I have it as extra attribute in sysfs of
> > pciback and libxl writes to it. Or rather should it be in xenstore?
> 
> I think xenstore would be more suitable for this. There are already
> some flags passed to pciback there: msitranslate, power_mgmt and
> permissive (see libxl_pci.c:63).

Hmm, I see permissive is also set in sysfs
(libxl_pci.c:pci_add_dm_done). And I think that is used by pciback,
based on inspection of its source code.
In fact, I don't see anything in pciback parsing opts-%d xenstore entry
at all. It looks like it's only used by the toolstack to reconstruct
libxl_device_pci struct from xenstore.

> > Or maybe pciback should somehow detect itself if it's talking to
> > stubdomain while the device is assigned to a HVM domain, or to a target
> > PV domain itself?
> 
> I think doing it in the toolstack and just passing an option to
> pciback is likely easier than adding more logic to pciback.

Agree.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-09-29 Thread Marek Marczykowski-Górecki
On Sun, Sep 29, 2019 at 03:35:57AM +0200, Marek Marczykowski-Górecki wrote:
> On Thu, Sep 26, 2019 at 09:10:17AM +0200, Roger Pau Monné wrote:
> > On Thu, Sep 26, 2019 at 06:16:06AM +0200, Marek Marczykowski-Górecki wrote:
> > > On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote:
> > > > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki 
> > > > wrote:
> > > > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote:
> > > > > > It would still be nice to get the missing bits (interrupt enabling),
> > > > > > or else this patch is kind of pointless, since it still doesn't 
> > > > > > allow
> > > > > > stubdomains to work correctly with passed through devices.

BTW it is useful with permissive mode enabled. 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] {xen, dom0}_vga_console_info.u.vesa_lfb.lfb_base field too small

2019-05-05 Thread Marek Marczykowski-Górecki
Hi,

I have a machine that allocate vesa LFB above 4GB, as reported by UEFI
GOP. At 0x40 to be specific.
vga_console_info.u.vesa_lfb.lfb_base is a 32bit field, so it gets
truncated, leading to all kind of memory corruptions when something
writes there.
If that would be only about Xen, that wouldn't be that bad, but
unfortunately exactly the same structure is used as an interface for
dom0 start info (at least PV one).
My only idea is to introduce yet another entry in *_vga_console_info.u
union (efi_lfb64?) with a 64bit lfb_base field. And mark it in
video_type (XEN_VGATYPE_EFI_LFB64?). But I'm not sure how non-patched
Linux (or other supported OSes) would respond to this. xen_init_vga() in
Linux doesn't seem to bail on unknown video_type, so it may be fragile.

Any better ideas?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/5] drivers/video: drop unused limits

2019-05-06 Thread Marek Marczykowski-Górecki
MAX_BPP, MAX_FONT_W, MAX_FONT_H are not used in the code at all.

Signed-off-by: Marek Marczykowski-Górecki 
---
 xen/drivers/video/lfb.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c
index d0c8c49..0475a68 100644
--- a/xen/drivers/video/lfb.c
+++ b/xen/drivers/video/lfb.c
@@ -12,9 +12,6 @@
 
 #define MAX_XRES 1900
 #define MAX_YRES 1200
-#define MAX_BPP 4
-#define MAX_FONT_W 8
-#define MAX_FONT_H 16
 
 struct lfb_status {
 struct lfb_prop lfbp;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 3/5] drivers/video: Drop framebuffer size constraints

2019-05-06 Thread Marek Marczykowski-Górecki
The limit 1900x1200 do not match real world devices (1900 looks like a
typo, should be 1920). But in practice the limits are arbitrary and do
not serve any real purpose. As discussed in "Increase framebuffer size
to todays standards" thread, drop them completely.

This fixes graphic console on device with 3840x2160 native resolution.

Signed-off-by: Marek Marczykowski-Górecki 
---
Cc: Olaf Hering 
---
 xen/drivers/video/lfb.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c
index 0475a68..5022195 100644
--- a/xen/drivers/video/lfb.c
+++ b/xen/drivers/video/lfb.c
@@ -10,9 +10,6 @@
 #include "lfb.h"
 #include "font.h"
 
-#define MAX_XRES 1900
-#define MAX_YRES 1200
-
 struct lfb_status {
 struct lfb_prop lfbp;
 
@@ -146,13 +143,6 @@ void lfb_carriage_return(void)
 
 int __init lfb_init(struct lfb_prop *lfbp)
 {
-if ( lfbp->width > MAX_XRES || lfbp->height > MAX_YRES )
-{
-printk(XENLOG_WARNING "Couldn't initialize a %ux%u framebuffer 
early.\n",
-   lfbp->width, lfbp->height);
-return -EINVAL;
-}
-
 lfb.lfbp = *lfbp;
 
 lfb.lbuf = xmalloc_bytes(lfb.lfbp.bytes_per_line);
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/5] xen/bitmap: fix bitmap_fill with zero-sized bitmap

2019-05-06 Thread Marek Marczykowski-Górecki
When bitmap_fill(..., 0) is called, do not try to write anything. Before
this patch, it tried to write almost LONG_MAX, surely overwriting
something.

Signed-off-by: Marek Marczykowski-Górecki 
---
Found while debugging framebuffer located above 4GB. In that case 32bit
variable for it overflows and framebuffer initialization zeroed
unrelated memory. Specifically, it hit mbi->mods_count, so later on
bitmap_fill(module_map, mbi->mods_count) in __start_xen() crashed.
---
 xen/include/xen/bitmap.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/bitmap.h b/xen/include/xen/bitmap.h
index fe3c720..0430c1c 100644
--- a/xen/include/xen/bitmap.h
+++ b/xen/include/xen/bitmap.h
@@ -126,6 +126,8 @@ static inline void bitmap_fill(unsigned long *dst, int 
nbits)
size_t nlongs = BITS_TO_LONGS(nbits);
 
switch (nlongs) {
+   case 0:
+   break;
default:
memset(dst, -1, (nlongs - 1) * sizeof(unsigned long));
/* fall through */
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 4/5] xen: fix handling framebuffer located above 4GB

2019-05-06 Thread Marek Marczykowski-Górecki
On some machines (for example Thinkpad P52), UEFI GOP reports
framebuffer located above 4GB (0x40 on that machine). This
address does not fit in {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base
field, which is 32bit. The overflow here cause all kind of memory
corruption when anything tries to write something on the screen,
starting with zeroing the whole framebuffer in vesa_init().

Fix this similar to how it's done in Linux: add ext_lfb_base field at
the end of the structure, to hold upper 32bits of the address. Since the
field is added at the end of the structure, it will work with older
Linux versions too (other than using possibly truncated address - no
worse than without this change). Thanks to ABI containing size of the
structure (start_info.console.dom0.info_size), Linux can detect when
this field is present and use it appropriately then.

Signed-off-by: Marek Marczykowski-Górecki 
---
 xen/arch/x86/efi/efi-boot.h |  1 +
 xen/drivers/video/vesa.c| 15 +++
 xen/include/public/xen.h|  2 ++
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 5789d2c..7a13a30 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -550,6 +550,7 @@ static void __init 
efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
 vga_console_info.u.vesa_lfb.bytes_per_line =
 (mode_info->PixelsPerScanLine * bpp + 7) >> 3;
 vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase;
+vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase 
>> 32;
 vga_console_info.u.vesa_lfb.lfb_size =
 (gop->Mode->FrameBufferSize + 0x) >> 16;
 }
diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c
index c92497e..f22cf7f 100644
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -84,6 +84,7 @@ void __init vesa_early_init(void)
 void __init vesa_init(void)
 {
 struct lfb_prop lfbp;
+unsigned long lfb_base;
 
 if ( !font )
 return;
@@ -97,15 +98,17 @@ void __init vesa_init(void)
 lfbp.text_columns = vlfb_info.width / font->width;
 lfbp.text_rows = vlfb_info.height / font->height;
 
-lfbp.lfb = lfb = ioremap(vlfb_info.lfb_base, vram_remap);
+lfb_base = vlfb_info.lfb_base;
+lfb_base |= (unsigned long)vlfb_info.ext_lfb_base << 32;
+lfbp.lfb = lfb = ioremap(lfb_base, vram_remap);
 if ( !lfb )
 return;
 
 memset(lfb, 0, vram_remap);
 
-printk(XENLOG_INFO "vesafb: framebuffer at %#x, mapped to 0x%p, "
+printk(XENLOG_INFO "vesafb: framebuffer at %#lx, mapped to 0x%p, "
"using %uk, total %uk\n",
-   vlfb_info.lfb_base, lfb,
+   lfb_base, lfb,
vram_remap >> 10, vram_total >> 10);
 printk(XENLOG_INFO "vesafb: mode is %dx%dx%u, linelength=%d, font %ux%u\n",
vlfb_info.width, vlfb_info.height,
@@ -152,6 +155,10 @@ void __init vesa_mtrr_init(void)
 MTRR_TYPE_WRCOMB, MTRR_TYPE_WRTHROUGH };
 unsigned int size_total;
 int rc, type;
+unsigned long lfb_base;
+
+lfb_base = vlfb_info.lfb_base;
+lfb_base |= (unsigned long)vlfb_info.ext_lfb_base << 32;
 
 if ( !lfb || (vesa_mtrr == 0) || (vesa_mtrr >= ARRAY_SIZE(mtrr_types)) )
 return;
@@ -167,7 +174,7 @@ void __init vesa_mtrr_init(void)
 
 /* Try and find a power of two to add */
 do {
-rc = mtrr_add(vlfb_info.lfb_base, size_total, type, 1);
+rc = mtrr_add(lfb_base, size_total, type, 1);
 size_total >>= 1;
 } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) );
 }
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index ccdffc0..b0f0f7e 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -923,6 +923,8 @@ typedef struct dom0_vga_console_info {
 /* Mode attributes (offset 0x0, VESA command 0x4f01). */
 uint16_t mode_attrs;
 #endif
+/* high 32 bits of lfb_base */
+uint32_t ext_lfb_base;
 } vesa_lfb;
 } u;
 } dom0_vga_console_info_t;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 0/5] Fixes for large framebuffer, placed above 4GB

2019-05-06 Thread Marek Marczykowski-Górecki
A bunch of fixes for booting Xen on Thinkpad P52, through grub2-efi +
multiboot2. Most of them can be applied independently.

---
cc: Andrew Cooper 
cc: George Dunlap 
cc: Ian Jackson 
cc: Jan Beulich 
cc: Julien Grall 
cc: Konrad Rzeszutek Wilk 
cc: Stefano Stabellini 
cc: Tim Deegan 
cc: Wei Liu 
cc: Olaf Hering 

Marek Marczykowski-Górecki (5):
  xen/bitmap: fix bitmap_fill with zero-sized bitmap
  drivers/video: drop unused limits
  drivers/video: Drop framebuffer size constraints
  xen: fix handling framebuffer located above 4GB
  drivers/video: use vlfb_info consistently

 xen/arch/x86/efi/efi-boot.h |  1 +
 xen/drivers/video/lfb.c | 13 -
 xen/drivers/video/vesa.c| 17 -
 xen/include/public/xen.h|  2 ++
 xen/include/xen/bitmap.h|  2 ++
 5 files changed, 17 insertions(+), 18 deletions(-)

base-commit: dc497635d93f6672f82727ad97a55205177be2aa
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 5/5] drivers/video: use vlfb_info consistently

2019-05-06 Thread Marek Marczykowski-Górecki
vlfb_info is an alias for vga_console_info.u.vesa_lfb, so this change is
purely cosmetic. But using the same name helps reading the code.

Signed-off-by: Marek Marczykowski-Górecki 
---
 xen/drivers/video/vesa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c
index f22cf7f..3767024 100644
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -44,7 +44,7 @@ void __init vesa_early_init(void)
 {
 unsigned int vram_vmode;
 
-vga_compat = !(vga_console_info.u.vesa_lfb.gbl_caps & 2);
+vga_compat = !(vlfb_info.gbl_caps & 2);
 
 if ( (vlfb_info.bits_per_pixel < 8) || (vlfb_info.bits_per_pixel > 32) )
 return;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] xen: fix handling framebuffer located above 4GB

2019-05-06 Thread Marek Marczykowski-Górecki
On Mon, May 06, 2019 at 05:15:19PM +0200, Juergen Gross wrote:
> On 06/05/2019 16:50, Marek Marczykowski-Górecki wrote:
> > diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
> > index ccdffc0..b0f0f7e 100644
> > --- a/xen/include/public/xen.h
> > +++ b/xen/include/public/xen.h
> > @@ -923,6 +923,8 @@ typedef struct dom0_vga_console_info {
> >  /* Mode attributes (offset 0x0, VESA command 0x4f01). */
> >  uint16_t mode_attrs;
> >  #endif
> > +/* high 32 bits of lfb_base */
> > +uint32_t ext_lfb_base;
> 
> You will need to put this addition into an:
> 
> #if __XEN_INTERFACE_VERSION__ >= 0x00040d00
> ...
> #endif
> 
> section (only available from Xen 4.13 onwards).

Why exactly? I don't see this structure used in any hypercall argument.
The only externally accessible place is start_info structure, where it
has explicit struct size already.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/5] xen/bitmap: fix bitmap_fill with zero-sized bitmap

2019-05-09 Thread Marek Marczykowski-Górecki
When bitmap_fill(..., 0) is called, do not try to write anything. Before
this patch, it tried to write almost LONG_MAX, surely overwriting
something.

Signed-off-by: Marek Marczykowski-Górecki 
Reviewed-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
 xen/include/xen/bitmap.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/bitmap.h b/xen/include/xen/bitmap.h
index fe3c720..0430c1c 100644
--- a/xen/include/xen/bitmap.h
+++ b/xen/include/xen/bitmap.h
@@ -126,6 +126,8 @@ static inline void bitmap_fill(unsigned long *dst, int 
nbits)
size_t nlongs = BITS_TO_LONGS(nbits);
 
switch (nlongs) {
+   case 0:
+   break;
default:
memset(dst, -1, (nlongs - 1) * sizeof(unsigned long));
/* fall through */
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/5] drivers/video: drop unused limits

2019-05-09 Thread Marek Marczykowski-Górecki
MAX_BPP, MAX_FONT_W, MAX_FONT_H are not used in the code at all.

Signed-off-by: Marek Marczykowski-Górecki 
Suggested-by: Jan Beulich 
Acked-by: Andrew Cooper 
---
 xen/drivers/video/lfb.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c
index d0c8c49..0475a68 100644
--- a/xen/drivers/video/lfb.c
+++ b/xen/drivers/video/lfb.c
@@ -12,9 +12,6 @@
 
 #define MAX_XRES 1900
 #define MAX_YRES 1200
-#define MAX_BPP 4
-#define MAX_FONT_W 8
-#define MAX_FONT_H 16
 
 struct lfb_status {
 struct lfb_prop lfbp;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 5/5] drivers/video: use vlfb_info consistently

2019-05-09 Thread Marek Marczykowski-Górecki
vlfb_info is an alias for vga_console_info.u.vesa_lfb, so this change is
purely cosmetic. But using the same name helps reading the code.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Andrew Cooper 
---
 xen/drivers/video/vesa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c
index f07d293..e9918a7 100644
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -49,7 +49,7 @@ void __init vesa_early_init(void)
 {
 unsigned int vram_vmode;
 
-vga_compat = !(vga_console_info.u.vesa_lfb.gbl_caps & 2);
+vga_compat = !(vlfb_info.gbl_caps & 2);
 
 if ( (vlfb_info.bits_per_pixel < 8) || (vlfb_info.bits_per_pixel > 32) )
 return;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 3/5] drivers/video: Drop framebuffer size constraints

2019-05-09 Thread Marek Marczykowski-Górecki
The limit 1900x1200 do not match real world devices (1900 looks like a
typo, should be 1920). But in practice the limits are arbitrary and do
not serve any real purpose. As discussed in "Increase framebuffer size
to todays standards" thread, drop them completely.

This fixes graphic console on device with 3840x2160 native resolution.

Signed-off-by: Marek Marczykowski-Górecki 
Suggested-by: Jan Beulich 
Acked-by: Andrew Cooper 
---
Cc: Olaf Hering 
---
 xen/drivers/video/lfb.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c
index 0475a68..5022195 100644
--- a/xen/drivers/video/lfb.c
+++ b/xen/drivers/video/lfb.c
@@ -10,9 +10,6 @@
 #include "lfb.h"
 #include "font.h"
 
-#define MAX_XRES 1900
-#define MAX_YRES 1200
-
 struct lfb_status {
 struct lfb_prop lfbp;
 
@@ -146,13 +143,6 @@ void lfb_carriage_return(void)
 
 int __init lfb_init(struct lfb_prop *lfbp)
 {
-if ( lfbp->width > MAX_XRES || lfbp->height > MAX_YRES )
-{
-printk(XENLOG_WARNING "Couldn't initialize a %ux%u framebuffer 
early.\n",
-   lfbp->width, lfbp->height);
-return -EINVAL;
-}
-
 lfb.lfbp = *lfbp;
 
 lfb.lbuf = xmalloc_bytes(lfb.lfbp.bytes_per_line);
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 0/5] Fixes for large framebuffer, placed above 4GB

2019-05-09 Thread Marek Marczykowski-Górecki
A bunch of fixes for booting Xen on Thinkpad P52, through grub2-efi +
multiboot2. Most of them can be applied independently.

---
cc: Andrew Cooper 
cc: George Dunlap 
cc: Ian Jackson 
cc: Jan Beulich 
cc: Julien Grall 
cc: Konrad Rzeszutek Wilk 
cc: Stefano Stabellini 
cc: Tim Deegan 
cc: Wei Liu 
cc: Olaf Hering 

Marek Marczykowski-Górecki (5):
  xen/bitmap: fix bitmap_fill with zero-sized bitmap
  drivers/video: drop unused limits
  drivers/video: Drop framebuffer size constraints
  xen: fix handling framebuffer located above 4GB
  drivers/video: use vlfb_info consistently

 xen/arch/x86/efi/efi-boot.h |  1 +
 xen/drivers/video/lfb.c | 13 -
 xen/drivers/video/vesa.c| 15 ++-
 xen/include/public/xen.h|  4 
 xen/include/xen/bitmap.h|  2 ++
 5 files changed, 17 insertions(+), 18 deletions(-)

base-commit: dc497635d93f6672f82727ad97a55205177be2aa
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 4/5] xen: fix handling framebuffer located above 4GB

2019-05-09 Thread Marek Marczykowski-Górecki
On some machines (for example Thinkpad P52), UEFI GOP reports
framebuffer located above 4GB (0x40 on that machine). This
address does not fit in {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base
field, which is 32bit. The overflow here cause all kind of memory
corruption when anything tries to write something on the screen,
starting with zeroing the whole framebuffer in vesa_init().

Fix this similar to how it's done in Linux: add ext_lfb_base field at
the end of the structure, to hold upper 32bits of the address. Since the
field is added at the end of the structure, it will work with older
Linux versions too (other than using possibly truncated address - no
worse than without this change). Thanks to ABI containing size of the
structure (start_info.console.dom0.info_size), Linux can detect when
this field is present and use it appropriately then.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
 - guard ext_lfb_base with #if __XEN_INTERFACE_VERSION__, but always
   include whe building Xen itself
 - add a helper function for lfb_base
---
 xen/arch/x86/efi/efi-boot.h |  1 +
 xen/drivers/video/vesa.c| 13 +
 xen/include/public/xen.h|  4 
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 5789d2c..7a13a30 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -550,6 +550,7 @@ static void __init 
efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
 vga_console_info.u.vesa_lfb.bytes_per_line =
 (mode_info->PixelsPerScanLine * bpp + 7) >> 3;
 vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase;
+vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase 
>> 32;
 vga_console_info.u.vesa_lfb.lfb_size =
 (gop->Mode->FrameBufferSize + 0x) >> 16;
 }
diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c
index c92497e..f07d293 100644
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -40,6 +40,11 @@ static int __init parse_font_height(const char *s)
 }
 custom_param("font", parse_font_height);
 
+static inline paddr_t lfb_base(void)
+{
+return (paddr_t)(vlfb_info.ext_lfb_base) << 32 | vlfb_info.lfb_base;
+}
+
 void __init vesa_early_init(void)
 {
 unsigned int vram_vmode;
@@ -97,15 +102,15 @@ void __init vesa_init(void)
 lfbp.text_columns = vlfb_info.width / font->width;
 lfbp.text_rows = vlfb_info.height / font->height;
 
-lfbp.lfb = lfb = ioremap(vlfb_info.lfb_base, vram_remap);
+lfbp.lfb = lfb = ioremap(lfb_base(), vram_remap);
 if ( !lfb )
 return;
 
 memset(lfb, 0, vram_remap);
 
-printk(XENLOG_INFO "vesafb: framebuffer at %#x, mapped to 0x%p, "
+printk(XENLOG_INFO "vesafb: framebuffer at %" PRIpaddr ", mapped to 0x%p, "
"using %uk, total %uk\n",
-   vlfb_info.lfb_base, lfb,
+   lfb_base(), lfb,
vram_remap >> 10, vram_total >> 10);
 printk(XENLOG_INFO "vesafb: mode is %dx%dx%u, linelength=%d, font %ux%u\n",
vlfb_info.width, vlfb_info.height,
@@ -167,7 +172,7 @@ void __init vesa_mtrr_init(void)
 
 /* Try and find a power of two to add */
 do {
-rc = mtrr_add(vlfb_info.lfb_base, size_total, type, 1);
+rc = mtrr_add(lfb_base(), size_total, type, 1);
 size_total >>= 1;
 } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) );
 }
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index ccdffc0..1752252 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -923,6 +923,10 @@ typedef struct dom0_vga_console_info {
 /* Mode attributes (offset 0x0, VESA command 0x4f01). */
 uint16_t mode_attrs;
 #endif
+#if __XEN_INTERFACE_VERSION__ >= 0x00040d00 || defined(__XEN__)
+/* high 32 bits of lfb_base */
+uint32_t ext_lfb_base;
+#endif
 } vesa_lfb;
 } u;
 } dom0_vga_console_info_t;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Anyone using blktap2?

2019-05-13 Thread Marek Marczykowski-Górecki
On Mon, May 13, 2019 at 04:34:14PM +0100, Wei Liu wrote:
> Hello
> 
> Seeing that you were the last people who changed blktap2 in a meaningful
> way: do you use it at all?
> 
> I'm thinking about dropping it (again).

Fine with me too.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 0/1] Fixes for large framebuffer, placed above 4GB

2019-05-16 Thread Marek Marczykowski-Górecki
A bunch of fixes for booting Xen on Thinkpad P52, through grub2-efi +
multiboot2. Most of them can be applied independently.

Changes in v3:
 - updated "xen: fix handling framebuffer located above 4GB"
 - dropped already applied patches

---
cc: Andrew Cooper 
cc: George Dunlap 
cc: Ian Jackson 
cc: Jan Beulich 
cc: Julien Grall 
cc: Konrad Rzeszutek Wilk 
cc: Stefano Stabellini 
cc: Tim Deegan 
cc: Wei Liu 
cc: Olaf Hering 

Marek Marczykowski-Górecki (1):
  xen: fix handling framebuffer located above 4GB

 xen/arch/x86/efi/efi-boot.h |  1 +
 xen/drivers/video/vesa.c| 14 +-
 xen/include/public/xen-compat.h |  2 +-
 xen/include/public/xen.h|  5 +
 4 files changed, 16 insertions(+), 6 deletions(-)

base-commit: 9dc8043ba18409e7a2057d8cccbf0ddaff71eb7e
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 1/1] xen: fix handling framebuffer located above 4GB

2019-05-16 Thread Marek Marczykowski-Górecki
On some machines (for example Thinkpad P52), UEFI GOP reports
framebuffer located above 4GB (0x40 on that machine). This
address does not fit in {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base
field, which is 32bit. The overflow here cause all kind of memory
corruption when anything tries to write something on the screen,
starting with zeroing the whole framebuffer in vesa_init().

Fix this similar to how it's done in Linux: add ext_lfb_base field at
the end of the structure, to hold upper 32bits of the address. Since the
field is added at the end of the structure, it will work with older
Linux versions too (other than using possibly truncated address - no
worse than without this change). Thanks to ABI containing size of the
structure (start_info.console.dom0.info_size), Linux can detect when
this field is present and use it appropriately then.

Since this change public interface and use __XEN_INTERFACE_VERSION__,
bump __XEN_LATEST_INTERFACE_VERSION__.

Note: if/when backporting this change to Xen <= 4.12, #if in xen.h needs
to be extended with " || defined(__XEN__)".

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
 - guard ext_lfb_base with #if __XEN_INTERFACE_VERSION__, but always
   include whe building Xen itself
 - add a helper function for lfb_base
Changes in v3:
 - add padding field
 - add parentheses around <<
 - unwrap format string per coding style, restore 0x
 - drop #if defined(__XEN__) needed only in backport
 - bump __XEN_LATEST_INTERFACE_VERSION__
---
 xen/arch/x86/efi/efi-boot.h |  1 +
 xen/drivers/video/vesa.c| 14 +-
 xen/include/public/xen-compat.h |  2 +-
 xen/include/public/xen.h|  5 +
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 5789d2c..7a13a30 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -550,6 +550,7 @@ static void __init 
efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
 vga_console_info.u.vesa_lfb.bytes_per_line =
 (mode_info->PixelsPerScanLine * bpp + 7) >> 3;
 vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase;
+vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase 
>> 32;
 vga_console_info.u.vesa_lfb.lfb_size =
 (gop->Mode->FrameBufferSize + 0x) >> 16;
 }
diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c
index 26d4962..9cf4bad 100644
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -40,6 +40,11 @@ static int __init parse_font_height(const char *s)
 }
 custom_param("font", parse_font_height);
 
+static inline paddr_t lfb_base(void)
+{
+return (paddr_t)((vlfb_info.ext_lfb_base) << 32) | vlfb_info.lfb_base;
+}
+
 void __init vesa_early_init(void)
 {
 unsigned int vram_vmode;
@@ -97,15 +102,14 @@ void __init vesa_init(void)
 lfbp.text_columns = vlfb_info.width / font->width;
 lfbp.text_rows = vlfb_info.height / font->height;
 
-lfbp.lfb = lfb = ioremap(vlfb_info.lfb_base, vram_remap);
+lfbp.lfb = lfb = ioremap(lfb_base(), vram_remap);
 if ( !lfb )
 return;
 
 memset(lfb, 0, vram_remap);
 
-printk(XENLOG_INFO "vesafb: framebuffer at %#x, mapped to 0x%p, "
-   "using %uk, total %uk\n",
-   vlfb_info.lfb_base, lfb,
+printk(XENLOG_INFO "vesafb: framebuffer at 0x%" PRIpaddr ", mapped to 
0x%p, using %uk, total %uk\n",
+   lfb_base(), lfb,
vram_remap >> 10, vram_total >> 10);
 printk(XENLOG_INFO "vesafb: mode is %dx%dx%u, linelength=%d, font %ux%u\n",
vlfb_info.width, vlfb_info.height,
@@ -167,7 +171,7 @@ void __init vesa_mtrr_init(void)
 
 /* Try and find a power of two to add */
 do {
-rc = mtrr_add(vlfb_info.lfb_base, size_total, type, 1);
+rc = mtrr_add(lfb_base(), size_total, type, 1);
 size_total >>= 1;
 } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) );
 }
diff --git a/xen/include/public/xen-compat.h b/xen/include/public/xen-compat.h
index 6fabca1..6708132 100644
--- a/xen/include/public/xen-compat.h
+++ b/xen/include/public/xen-compat.h
@@ -27,7 +27,7 @@
 #ifndef __XEN_PUBLIC_XEN_COMPAT_H__
 #define __XEN_PUBLIC_XEN_COMPAT_H__
 
-#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040a00
+#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040d00
 
 #if defined(__XEN__) || defined(__XEN_TOOLS__)
 /* Xen is built with matching headers and implements the latest interface. */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index ccdffc0..4676521 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -923,6 +923,11 @@ typedef struct dom0_vga_console_info {
 /* Mode attributes (offset 0x0, VESA command 0x4f01). */
 uint16_t mode_at

[Xen-devel] [PATCH v4] xen: fix handling framebuffer located above 4GB

2019-05-16 Thread Marek Marczykowski-Górecki
On some machines (for example Thinkpad P52), UEFI GOP reports
framebuffer located above 4GB (0x40 on that machine). This
address does not fit in {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base
field, which is 32bit. The overflow here cause all kind of memory
corruption when anything tries to write something on the screen,
starting with zeroing the whole framebuffer in vesa_init().

Fix this similar to how it's done in Linux: add ext_lfb_base field at
the end of the structure, to hold upper 32bits of the address. Since the
field is added at the end of the structure, it will work with older
Linux versions too (other than using possibly truncated address - no
worse than without this change). Thanks to ABI containing size of the
structure (start_info.console.dom0.info_size), Linux can detect when
this field is present and use it appropriately then.

Since this change public interface and use __XEN_INTERFACE_VERSION__,
bump __XEN_LATEST_INTERFACE_VERSION__.

Note: if/when backporting this change to Xen <= 4.12, #if in xen.h needs
to be extended with " || defined(__XEN__)".

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
 - guard ext_lfb_base with #if __XEN_INTERFACE_VERSION__, but always
   include whe building Xen itself
 - add a helper function for lfb_base
Changes in v3:
 - add padding field
 - add parentheses around <<
 - unwrap format string per coding style, restore 0x
 - drop #if defined(__XEN__) needed only in backport
 - bump __XEN_LATEST_INTERFACE_VERSION__
Changes in v4:
 - fix parentheses
 - fix padding
---
 xen/arch/x86/efi/efi-boot.h |  1 +
 xen/drivers/video/vesa.c| 14 +-
 xen/include/public/xen-compat.h |  2 +-
 xen/include/public/xen.h|  5 +
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 5789d2cb70..7a13a30bc0 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -550,6 +550,7 @@ static void __init 
efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
 vga_console_info.u.vesa_lfb.bytes_per_line =
 (mode_info->PixelsPerScanLine * bpp + 7) >> 3;
 vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase;
+vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase 
>> 32;
 vga_console_info.u.vesa_lfb.lfb_size =
 (gop->Mode->FrameBufferSize + 0x) >> 16;
 }
diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c
index 26d4962b0e..fd2cb1312d 100644
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -40,6 +40,11 @@ static int __init parse_font_height(const char *s)
 }
 custom_param("font", parse_font_height);
 
+static inline paddr_t lfb_base(void)
+{
+return ((paddr_t)vlfb_info.ext_lfb_base << 32) | vlfb_info.lfb_base;
+}
+
 void __init vesa_early_init(void)
 {
 unsigned int vram_vmode;
@@ -97,15 +102,14 @@ void __init vesa_init(void)
 lfbp.text_columns = vlfb_info.width / font->width;
 lfbp.text_rows = vlfb_info.height / font->height;
 
-lfbp.lfb = lfb = ioremap(vlfb_info.lfb_base, vram_remap);
+lfbp.lfb = lfb = ioremap(lfb_base(), vram_remap);
 if ( !lfb )
 return;
 
 memset(lfb, 0, vram_remap);
 
-printk(XENLOG_INFO "vesafb: framebuffer at %#x, mapped to 0x%p, "
-   "using %uk, total %uk\n",
-   vlfb_info.lfb_base, lfb,
+printk(XENLOG_INFO "vesafb: framebuffer at 0x%" PRIpaddr ", mapped to 
0x%p, using %uk, total %uk\n",
+   lfb_base(), lfb,
vram_remap >> 10, vram_total >> 10);
 printk(XENLOG_INFO "vesafb: mode is %dx%dx%u, linelength=%d, font %ux%u\n",
vlfb_info.width, vlfb_info.height,
@@ -167,7 +171,7 @@ void __init vesa_mtrr_init(void)
 
 /* Try and find a power of two to add */
 do {
-rc = mtrr_add(vlfb_info.lfb_base, size_total, type, 1);
+rc = mtrr_add(lfb_base(), size_total, type, 1);
 size_total >>= 1;
 } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) );
 }
diff --git a/xen/include/public/xen-compat.h b/xen/include/public/xen-compat.h
index 6fabca1889..6708132394 100644
--- a/xen/include/public/xen-compat.h
+++ b/xen/include/public/xen-compat.h
@@ -27,7 +27,7 @@
 #ifndef __XEN_PUBLIC_XEN_COMPAT_H__
 #define __XEN_PUBLIC_XEN_COMPAT_H__
 
-#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040a00
+#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040d00
 
 #if defined(__XEN__) || defined(__XEN_TOOLS__)
 /* Xen is built with matching headers and implements the latest interface. */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index ccdffc0ad1..cb2917e74b 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -922,6 +922,11 @@ typedef struct dom0_vga_console_info {
   

Re: [Xen-devel] [RFC PATCH v2 11/17] xenconsoled: add support for consoles using 'state' xenstore entry

2019-01-09 Thread Marek Marczykowski-Górecki
On Thu, Nov 01, 2018 at 05:21:39PM +, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("[RFC PATCH v2 11/17] xenconsoled: add 
> support for consoles using 'state' xenstore entry"):
> > Add support for standard xenbus initialization protocol using 'state'
> > xenstore entry. It will be necessary for secondary consoles.
> > For consoles supporting it, read 'state' entry on the frontend and
> > proceed accordingly - either init console or close it. When closing,
> > make sure all the in-transit data is flushed (both from shared ring and
> > from local buffer), if possible. This is especially important for
> > MiniOS-based qemu stubdomain, which closes console just after writing
> > device model state to it.
> > For consoles without 'state' field behavior is unchanged - on any watch
> > event try to connect console, as long as domain is alive.
> 
> I'm not opposed to the introduction of this state field.  The code
> looks plausible.
> 
> But:
> 
> Firstly, you have put the protocol description in your commit
> message (and it seems rather informal).  Can you please provide
> a comprehensive protocol specification in-tree ?  You need to patch
>docs/misc/console.txt
> I think.
> 
> I say `comprehensive' because your text is not particularly clearly
> about who is supposed to `flush' which data exactly when.  Nor what
> `flushing' means (does it ever mean discarding?)
> 
> Secondly: what about backwards compatibility ?  I think we need to at
> least think about the possibility that there are some guest frontends
> out there which may look for a `state' node and do something
> undesirable with it.  I think this possibility is remote but it should
> be mentioned in the commit message.

Note that this commit _does not_ invent any new protocol at all. It merely
add support for the protocol that is used by additional consoles. The
backend side was implemented by qemu only before, now I add support for
it to xenconsoled.

This is about (additional) consoles living in
/local/domain/$DOMID/device/console/$DEVID, not the special-kind-of-thing
living in /local/domain/$DOMID/console. Is there any protocol specification
for it already anywhere? I can't see it xen/include/public/io/ as it's
for other device types - console.h have only struct xencons_interface
declaration without any comment. I can't find it in other places either.
If there is one, it should be added to docs/misc/console.txt and/or
xen/include/public/io/console.h. Otherwise I can add some basic spec to
docs/misc/console.txt.

> What about the possibility that one or the other end of the connection
> may be replaced by a different implementation, so that the peer
> appears to gain or lose support for `state' ?

Actually, I'm doing similar thing here ;) previously xenconsoled didn't
know anything about 'state' field and it was one of the reason it
couldn't handle multiple consoles per domain.

> I'll be able to review the code more effectively when there is a
> formal protocol spec to compare it to...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-01-15 Thread Marek Marczykowski-Górecki
From: Simon Gaiser 

Stubdomains need to be given sufficient privilege over the guest which it
provides emulation for in order for PCI passthrough to work correctly.
When a HVM domain try to enable MSI, QEMU in stubdomain calls
PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
part of xc_domain_update_msi_irq. Allow for that as part of
PHYSDEVOP_map_pirq.

Based on 
https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
 by Eric Chanudet .

Signed-off-by: Simon Gaiser 
Signed-off-by: Marek Marczykowski-Górecki 
---
This is only one part of fixing MSI with QEMU in stubdomain. The other
part is allowing stubdomain to actually enable MSI in PCI config space.
QEMU does that through pcifront/back connected to the stubdomain (see
hw/xen/xen_pt_msi.c:msi_msix_enable()), but pciback by default refuse
write to that register.
Easy, less safe solution: enable permissive mode for the device.
Safer solution - enable access to this register for stubdomain only
(pciback patch that add such flag + libxl patch to set it for relevant
 devices)
The whole story:
https://www.qubes-os.org/news/2017/10/18/msi-support/

Any other ideas? Which one is preferred upstream?
---
 xen/arch/x86/irq.c | 23 +++
 xen/arch/x86/physdev.c |  9 +
 2 files changed, 32 insertions(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 8b44d6c..123ca69 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, int 
index, int *pirq_p,
 {
 case MAP_PIRQ_TYPE_MULTI_MSI:
 irq = create_irq(NUMA_NO_NODE);
+if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) &&
+ current->domain->target == d )
+{
+ret = irq_permit_access(current->domain, irq);
+if ( ret ) {
+dprintk(XENLOG_G_ERR,
+"dom%d: can't grant it's stubdom (%d) access to "
+"irq %d for msi: %d!\n",
+d->domain_id,
+current->domain->domain_id,
+irq,
+ret);
+return -EINVAL;
+}
+}
 }
 
 if ( irq < nr_irqs_gsi || irq >= nr_irqs )
@@ -2717,7 +2732,15 @@ int allocate_and_map_msi_pirq(struct domain *d, int 
index, int *pirq_p,
 case MAP_PIRQ_TYPE_MSI:
 if ( index == -1 )
 case MAP_PIRQ_TYPE_MULTI_MSI:
+{
+if ( current->domain->target == d &&
+ irq_deny_access(current->domain, irq) )
+dprintk(XENLOG_G_ERR,
+"dom%d: can't revoke stubdom's access to irq 
%d!\n",
+d->domain_id,
+irq);
 destroy_irq(irq);
+}
 break;
 }
 }
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 3a3c158..de59e39 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -164,6 +164,15 @@ int physdev_unmap_pirq(domid_t domid, int pirq)
 
 pcidevs_lock();
 spin_lock(&d->event_lock);
+if ( current->domain->target == d)
+{
+int irq = domain_pirq_to_irq(d, pirq);
+if ( irq <= 0 || irq_deny_access(current->domain, irq) )
+dprintk(XENLOG_G_ERR,
+"dom%d: can't revoke stubdom's access to irq %d!\n",
+d->domain_id,
+irq);
+}
 ret = unmap_domain_pirq(d, pirq);
 spin_unlock(&d->event_lock);
 pcidevs_unlock();
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 3/4] libxl: don't try to manipulate json config for stubdomain

2019-01-15 Thread Marek Marczykowski-Górecki
Stubdomain do not have it's own config file - its configuration is
derived from target domains. Do not try to manipulate it when attaching
PCI device.

This bug prevented starting HVM with stubdomain and PCI passthrough
device attached.

Signed-off-by: Marek Marczykowski-Górecki 
---
 tools/libxl/libxl_pci.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 1bde537..e974f55 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -152,14 +152,18 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 GCNEW(device);
 libxl__device_from_pcidev(gc, domid, pcidev, device);
 
-lock = libxl__lock_domain_userdata(gc, domid);
-if (!lock) {
-rc = ERROR_LOCK_FAIL;
-goto out;
-}
+/* Stubdomain config is derived from its target domain, it doesn't have
+   its own file */
+if (!libxl_is_stubdom(CTX, domid, NULL)) {
+lock = libxl__lock_domain_userdata(gc, domid);
+if (!lock) {
+rc = ERROR_LOCK_FAIL;
+goto out;
+}
 
-rc = libxl__get_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+rc = libxl__get_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
+}
 
 device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
  &pcidev_saved);
@@ -171,8 +175,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 rc = libxl__xs_transaction_start(gc, &t);
 if (rc) goto out;
 
-rc = libxl__set_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+if (lock) {
+rc = libxl__set_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
+}
 
 libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back));
 
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/4] libxl: attach PCI device to qemu only after setting pciback/pcifront

2019-01-15 Thread Marek Marczykowski-Górecki
When qemu is running in stubdomain, handling "pci-ins" command will fail
if pcifront is not initialized already. Fix this by sending such command
only after confirming that pciback/front is running.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
- Fixed code style since previous version.
---
 tools/libxl/libxl_pci.c |  9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 3b6b23c..1bde537 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcide
 {
 libxl_ctx *ctx = libxl__gc_owner(gc);
 unsigned int orig_vdev, pfunc_mask;
+char *be_path;
 libxl_device_pci *assigned;
 int num_assigned, i, rc;
 int stubdomid = 0;
@@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcide
 rc = do_pci_add(gc, stubdomid, &pcidev_s, 0);
 if ( rc )
 goto out;
+/* Wait for the device actually being connected, otherwise device model
+ * running there will fail to find the device. */
+be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0",
+ libxl__xs_get_dompath(gc, 0), stubdomid);
+rc = libxl__wait_for_backend(gc, be_path,
+ GCSPRINTF("%d", XenbusStateConnected));
+if (rc)
+goto out;
 }
 
 orig_vdev = pcidev->vdevfn & ~7U;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

2019-01-15 Thread Marek Marczykowski-Górecki
HVM domains use IOMMU and device model assistance for communicating with
PCI devices, xen-pcifront/pciback is used only in PV domains.
When HVM domain has device model in stubdomain, attaching xen-pciback to
the target domain itself is not only useless, but also may prevent
attaching xen-pciback to the stubdomain, effectively breaking PCI
passthrough.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
 - previously called "libxl: attach xen-pciback only to PV domains"
 - instead of excluding all HVMs, change the condition to what actually
   matters here - check if stubdomain is in use; this way xen-pciback is
   always in use (either for the target domain, or it's stubdomain),
   fixing PCI reset by xen-pciback concerns
---
 tools/libxl/libxl_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 87afa03..3b6b23c 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1106,7 +1106,7 @@ out:
 }
 }
 
-if (!starting)
+if (!starting && !libxl_get_stubdom_id(CTX, domid))
 rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
 else
 rc = 0;
@@ -1302,7 +1302,7 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao 
*ao, uint32_t domid,
 }
 }
 
-if (d_config->num_pcidevs > 0) {
+if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) {
 rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs,
 d_config->num_pcidevs);
 if (rc < 0) {
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 0/4] Fix PCI passthrough for HVM with stubdomain

2019-01-15 Thread Marek Marczykowski-Górecki
This is yet another attempt to upstream a set of PCI passthrough related fixed
carried in Qubes OS packages for a long time already.
Some of them were already discussed previously, see another thread with the
same subject back in October 2016.

As for "xen/x86: Allow stubdom access to irq created for msi" patch, this is
only one part of fixing MSI with QEMU in stubdomain. The other part is allowing
stubdomain to actually enable MSI in PCI config space.  QEMU does that through
pcifront/back connected to the stubdomain (see
hw/xen/xen_pt_msi.c:msi_msix_enable()), but pciback by default refuse
write to that register.
Easy, less safe solution: enable permissive mode for the device.
Safer solution - enable access to this register for stubdomain only:
 - pciback patch: 
https://github.com/QubesOS/qubes-linux-kernel/blob/master/0015-xen-pciback-add-attribute-to-allow-MSI-enable-flag-w.patch
 - libxl patch: 
https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.8/patch-stubdom-allow-msi-enable.patch

The whole story:
https://www.qubes-os.org/news/2017/10/18/msi-support/

Any other ideas? Which one is preferred upstream?

Changes in v2:
 - new "xen/x86: Allow stubdom access to irq created for msi" patch
 - applied review comments from v1

Marek Marczykowski-Górecki (3):
  libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
  libxl: attach PCI device to qemu only after setting pciback/pcifront
  libxl: don't try to manipulate json config for stubdomain

Simon Gaiser (1):
  xen/x86: Allow stubdom access to irq created for msi.

 tools/libxl/libxl_pci.c | 37 ++---
 xen/arch/x86/irq.c  | 23 +++
 xen/arch/x86/physdev.c  |  9 +
 3 files changed, 58 insertions(+), 11 deletions(-)

base-commit: 93a62c544e20ba9e141e411bbaae3d65259d13a3
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-01-16 Thread Marek Marczykowski-Górecki
On Wed, Jan 16, 2019 at 10:21:29AM +0100, Roger Pau Monné wrote:
> On Tue, Jan 15, 2019 at 04:36:31PM +0100, Marek Marczykowski-Górecki wrote:
> > From: Simon Gaiser 
> > 
> > Stubdomains need to be given sufficient privilege over the guest which it
> > provides emulation for in order for PCI passthrough to work correctly.
> > When a HVM domain try to enable MSI, QEMU in stubdomain calls
> > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
> > part of xc_domain_update_msi_irq. Allow for that as part of
> > PHYSDEVOP_map_pirq.
> 
> I see, that's not a problem AFAICT for PCI INTx because the IRQ in
> that case is known beforehand, and the stubdomain is given permissions
> over this IRQ by libxl__device_pci_add (there's a do_pci_add against
> the stubdomain).

Exactly.

> > 
> > Based on 
> > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
> >  by Eric Chanudet .
> > 
> > Signed-off-by: Simon Gaiser 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> > This is only one part of fixing MSI with QEMU in stubdomain. The other
> > part is allowing stubdomain to actually enable MSI in PCI config space.
> > QEMU does that through pcifront/back connected to the stubdomain (see
> > hw/xen/xen_pt_msi.c:msi_msix_enable()), but pciback by default refuse
> > write to that register.
> > Easy, less safe solution: enable permissive mode for the device.
> > Safer solution - enable access to this register for stubdomain only
> > (pciback patch that add such flag + libxl patch to set it for relevant
> >  devices)
> > The whole story:
> > https://www.qubes-os.org/news/2017/10/18/msi-support/
> > 
> > Any other ideas? Which one is preferred upstream?
> 
> IMO, and please correct me if I'm wrong, QEMU in the stubdomain will
> receive the PCI config space write to enable MSI, and since this
> stub-QEMU runs in PV mode I think it should use the PV way to enable
> MSI, ie: the same that Linux pcifront uses to enable MSI for
> passed-through devices.
> 
> Is this something that sounds sensible?

We've considered this option too. Let me quote Simon on that (from the
link above):

The enable command that pcifront sends is intended for the normal PV use
case where the device is passed to the VM itself (via pcifront) rather
than to the stub domain target. While the command is called enable_msi,
pciback does much more than simply setting the enable flag. It also
configures IRQ handling in the dom0 kernel, adapts the MSI masking, and
more. This makes sense in the PV case, but in the HVM case, the MSI
configuration is done by QEMU, so this most likely won’t work correctly.

> > ---
> >  xen/arch/x86/irq.c | 23 +++
> >  xen/arch/x86/physdev.c |  9 +
> >  2 files changed, 32 insertions(+)
> > 
> > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> > index 8b44d6c..123ca69 100644
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, int 
> > index, int *pirq_p,
> >  {
> >  case MAP_PIRQ_TYPE_MULTI_MSI:
> >  irq = create_irq(NUMA_NO_NODE);
> > +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) &&
> 
> This check is already performed below, maybe you could re-arrange the
> code as:
> 
> case MAP_PIRQ_TYPE_MULTI_MSI:
> irq = create_irq(NUMA_NO_NODE);
> }
> 
> if ( irq < nr_irqs_gsi || irq >= nr_irqs )
> {
> dprintk(XENLOG_G_ERR, "dom%d: can't create irq for msi!\n",
> d->domain_id);
> return -EINVAL;
> }
> if ( current->domain->target == d )
> ...
> 
> But I wonder whether it would be better to place the irq_permit_access
> in map_domain_pirq, together with the existing irq_permit_access that
> grant the target domain permissions over the irq.

That may be a good idea. Let me try that in v3. But I'll wait for a
feedback on libxl patches first.

> > + current->domain->target == d )
> > +{
> > +ret = irq_permit_access(current->domain, irq);
> > +if ( ret ) {
> > +dprintk(XENLOG_G_ERR,
> > +"dom%d: can't grant it's stubdom (%d) access 
> > to "
> > +"irq %d for msi: %d!\n",
> > +d->domain_id,
> > + 

Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-01-16 Thread Marek Marczykowski-Górecki
On Wed, Jan 16, 2019 at 01:20:04PM +0100, Roger Pau Monné wrote:
> On Wed, Jan 16, 2019 at 11:52:18AM +0100, Marek Marczykowski-Górecki wrote:
> > On Wed, Jan 16, 2019 at 10:21:29AM +0100, Roger Pau Monné wrote:
> > > On Tue, Jan 15, 2019 at 04:36:31PM +0100, Marek Marczykowski-Górecki 
> > > wrote:
> > > > From: Simon Gaiser 
> > > > 
> > > > Stubdomains need to be given sufficient privilege over the guest which 
> > > > it
> > > > provides emulation for in order for PCI passthrough to work correctly.
> > > > When a HVM domain try to enable MSI, QEMU in stubdomain calls
> > > > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
> > > > part of xc_domain_update_msi_irq. Allow for that as part of
> > > > PHYSDEVOP_map_pirq.
> > > 
> > > I see, that's not a problem AFAICT for PCI INTx because the IRQ in
> > > that case is known beforehand, and the stubdomain is given permissions
> > > over this IRQ by libxl__device_pci_add (there's a do_pci_add against
> > > the stubdomain).
> > 
> > Exactly.
> 
> I would maybe consider adding something like this to the commit
> message, so it's clear why PCI INTx works but not MSI interrupts.
> 
> > 
> > > > 
> > > > Based on 
> > > > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
> > > >  by Eric Chanudet .
> > > > 
> > > > Signed-off-by: Simon Gaiser 
> > > > Signed-off-by: Marek Marczykowski-Górecki 
> > > > 
> > > > ---
> > > > This is only one part of fixing MSI with QEMU in stubdomain. The other
> > > > part is allowing stubdomain to actually enable MSI in PCI config space.
> > > > QEMU does that through pcifront/back connected to the stubdomain (see
> > > > hw/xen/xen_pt_msi.c:msi_msix_enable()), but pciback by default refuse
> > > > write to that register.
> > > > Easy, less safe solution: enable permissive mode for the device.
> > > > Safer solution - enable access to this register for stubdomain only
> > > > (pciback patch that add such flag + libxl patch to set it for relevant
> > > >  devices)
> > > > The whole story:
> > > > https://www.qubes-os.org/news/2017/10/18/msi-support/
> > > > 
> > > > Any other ideas? Which one is preferred upstream?
> > > 
> > > IMO, and please correct me if I'm wrong, QEMU in the stubdomain will
> > > receive the PCI config space write to enable MSI, and since this
> > > stub-QEMU runs in PV mode I think it should use the PV way to enable
> > > MSI, ie: the same that Linux pcifront uses to enable MSI for
> > > passed-through devices.
> > > 
> > > Is this something that sounds sensible?
> > 
> > We've considered this option too. Let me quote Simon on that (from the
> > link above):
> > 
> > The enable command that pcifront sends is intended for the normal PV use
> > case where the device is passed to the VM itself (via pcifront) rather
> > than to the stub domain target. While the command is called enable_msi,
> > pciback does much more than simply setting the enable flag. It also
> > configures IRQ handling in the dom0 kernel, adapts the MSI masking, and
> > more. This makes sense in the PV case, but in the HVM case, the MSI
> > configuration is done by QEMU, so this most likely won’t work correctly.
> 
> Oh great, that's unfortunate. Both pciback functions end up calling
> into msi_capability_init in the Linux kernel, which does indeed more
> than just toggling the PCI config space enable bit.
> 
> OTOH adding a bypass to pciback so the stubdom is able to write to the
> PCI register in order to toggle the enable bit seems quite clumsy. Not
> to mention that you would be required to update Dom0 kernel in order to
> fix the issue.
> 
> Do you think it makes sense to add a domctl to enable/disable MSI(X)?
> 
> This way the bug could be fixed by just updating Xen (and the
> stubdomain).

Indeed in case of stubdomain, that would make sens, as other PCI passthrough
related operations already bypass pcifront/back anyway.

And I agree with Jan, that physdevop makes more sense, if going this way.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

2019-01-16 Thread Marek Marczykowski-Górecki
On Wed, Jan 16, 2019 at 05:47:19PM +0100, Roger Pau Monné wrote:
> On Tue, Jan 15, 2019 at 04:36:28PM +0100, Marek Marczykowski-Górecki wrote:
> > HVM domains use IOMMU and device model assistance for communicating with
> > PCI devices, xen-pcifront/pciback is used only in PV domains.
> 
> You still need pciback in order to reset the device when it's
> deassigned from the guest, so it's functionality is not only used by
> PV guests.

Right, I'll update the commit message to match v2 code.

> > When HVM domain has device model in stubdomain, attaching xen-pciback to
> > the target domain itself is not only useless, but also may prevent
> > attaching xen-pciback to the stubdomain, effectively breaking PCI
> > passthrough.
> 
> Right. When doing passthrough with a stubdomain you want the target
> domain to have the memory and IO regions mapped, and the stubdomain to
> handle the rest.
> 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> > Changes in v2:
> >  - previously called "libxl: attach xen-pciback only to PV domains"
> >  - instead of excluding all HVMs, change the condition to what actually
> >matters here - check if stubdomain is in use; this way xen-pciback is
> >always in use (either for the target domain, or it's stubdomain),
> >fixing PCI reset by xen-pciback concerns
> > ---
> >  tools/libxl/libxl_pci.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index 87afa03..3b6b23c 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -1106,7 +1106,7 @@ out:
> >  }
> >  }
> >  
> > -if (!starting)
> > +if (!starting && !libxl_get_stubdom_id(CTX, domid))
> 
> This change seems to assume that both libxl_domain_config for the
> target and the stubdomain will have the assigned pci devices in the
> pcidevs field. 

Not really. libxl__device_pci_add() calls do_pci_add() for both stubdomain
(if applicable) and target domain.

> Yet I cannot see where the stubdomain
> libxl_domain_config will get the pci devices from the target domain
> assigned, I've looked in libxl__spawn_stub_dm but there doesn't seem
> to be any copy from the target to the stubdom of the list of pci
> devices. I guess I'm missing something.
> 
> Thanks, Roger.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

2019-01-17 Thread Marek Marczykowski-Górecki
On Thu, Jan 17, 2019 at 10:21:34AM +0100, Roger Pau Monné wrote:
> OK. From an architectural PoV I think it would make more sense to copy
> the list of pci devices to the stubdom config in libxl__spawn_stub_dm,
> but I'm not that familiar with pci handling in libxl, so there might
> be a reason why things are done like this currently.

libxl would refuse to attach the same device to two domains and also
will perform some operations twice (as for example device reset). As you
can see even right now some things needs to be disabled for stubdomain
or target domain, because of this. If device would be included in the
config for both target domain and its stubdomain, the code would need
some more exceptions, which I think would be even worse.

Other thing is the current code attach PCI devices when device-model is
already running (regardless of its version or using stubdomain). I guess
qemu doesn't setup those devices otherwise (there is nothing on qemu
command line about it and none of libxl part generate such options). I'm
not really sure if that couldn't be done differently, but I'm kind of
afraid changing to much in PCI passthrough related code...

> The change LGTM, albeit I found the pci handling code quite hard to
> follow. I'm also not sure whether certain parts of the code are
> correct, for example the PCI INTx seems to be mapped to both the
> stubdomain and the target domain, when it's only the target domain the
> one that actually uses it.

That's true. Generally, I think stubdomain needs only config space
access for its own, other things should be done for the target domain.
This include mapping interrupts of any kind. Stubdomain is given
permission to them only to be able to map them to the target domain.

This also may be the reason why PCI devices are not included in
stubdomain's config, but only do_pci_add() is called. It may be that in
the past some more parts of the device setup was skipped this way.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-01-25 Thread Marek Marczykowski-Górecki
On Wed, Jan 16, 2019 at 11:52:21AM +0100, Marek Marczykowski-Górecki wrote:
> On Wed, Jan 16, 2019 at 10:21:29AM +0100, Roger Pau Monné wrote:
> > On Tue, Jan 15, 2019 at 04:36:31PM +0100, Marek Marczykowski-Górecki wrote:
> > > From: Simon Gaiser 
> > > 
> > > Stubdomains need to be given sufficient privilege over the guest which it
> > > provides emulation for in order for PCI passthrough to work correctly.
> > > When a HVM domain try to enable MSI, QEMU in stubdomain calls
> > > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
> > > part of xc_domain_update_msi_irq. Allow for that as part of
> > > PHYSDEVOP_map_pirq.
> > 
> > I see, that's not a problem AFAICT for PCI INTx because the IRQ in
> > that case is known beforehand, and the stubdomain is given permissions
> > over this IRQ by libxl__device_pci_add (there's a do_pci_add against
> > the stubdomain).
> 
> Exactly.
> 
> > > 
> > > Based on 
> > > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
> > >  by Eric Chanudet .
> > > 
> > > Signed-off-by: Simon Gaiser 
> > > Signed-off-by: Marek Marczykowski-Górecki 
> > > 
> > > ---
(...)
> > > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> > > index 8b44d6c..123ca69 100644
> > > --- a/xen/arch/x86/irq.c
> > > +++ b/xen/arch/x86/irq.c
> > > @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, 
> > > int index, int *pirq_p,
> > >  {
> > >  case MAP_PIRQ_TYPE_MULTI_MSI:
> > >  irq = create_irq(NUMA_NO_NODE);
> > > +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) &&
> > 
> > This check is already performed below, maybe you could re-arrange the
> > code as:
> > 
> > case MAP_PIRQ_TYPE_MULTI_MSI:
> > irq = create_irq(NUMA_NO_NODE);
> > }
> > 
> > if ( irq < nr_irqs_gsi || irq >= nr_irqs )
> > {
> > dprintk(XENLOG_G_ERR, "dom%d: can't create irq for msi!\n",
> > d->domain_id);
> > return -EINVAL;
> > }
> > if ( current->domain->target == d )
> > ...
> > 
> > But I wonder whether it would be better to place the irq_permit_access
> > in map_domain_pirq, together with the existing irq_permit_access that
> > grant the target domain permissions over the irq.
> 
> That may be a good idea. Let me try that in v3. But I'll wait for a
> feedback on libxl patches first.

That doesn't work, as map_domain_pirq() check irq access earlier.
Which bring be to a question, what should be rules guarding stubdomain
access to PHYSDEVOP_map_pirq? With this patch, stubdomain will be able
to create and map multiple irq (DoS possibility?), as only target domain
is validated in practice. Is that ok? If not, what additional limits
could be applied here?
In INTx case the problem doesn't apply, because toolstack grant access
to particular IRQ and no allocation happen on stubdomain request. But in
MSI case, it isn't that easy as IRQ number isn't known before (as
explained in the commit message).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Allow stubdom access to irq created for msi.

2019-01-25 Thread Marek Marczykowski-Górecki
On Fri, Jan 25, 2019 at 08:43:59PM +0100, Marek Marczykowski-Górecki wrote:
> On Wed, Jan 16, 2019 at 11:52:21AM +0100, Marek Marczykowski-Górecki wrote:
> > On Wed, Jan 16, 2019 at 10:21:29AM +0100, Roger Pau Monné wrote:
> > > On Tue, Jan 15, 2019 at 04:36:31PM +0100, Marek Marczykowski-Górecki 
> > > wrote:
> > > > From: Simon Gaiser 
> > > > 
> > > > Stubdomains need to be given sufficient privilege over the guest which 
> > > > it
> > > > provides emulation for in order for PCI passthrough to work correctly.
> > > > When a HVM domain try to enable MSI, QEMU in stubdomain calls
> > > > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
> > > > part of xc_domain_update_msi_irq. Allow for that as part of
> > > > PHYSDEVOP_map_pirq.
> > > 
> > > I see, that's not a problem AFAICT for PCI INTx because the IRQ in
> > > that case is known beforehand, and the stubdomain is given permissions
> > > over this IRQ by libxl__device_pci_add (there's a do_pci_add against
> > > the stubdomain).
> > 
> > Exactly.
> > 
> > > > 
> > > > Based on 
> > > > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
> > > >  by Eric Chanudet .
> > > > 
> > > > Signed-off-by: Simon Gaiser 
> > > > Signed-off-by: Marek Marczykowski-Górecki 
> > > > 
> > > > ---
> (...)
> > > > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> > > > index 8b44d6c..123ca69 100644
> > > > --- a/xen/arch/x86/irq.c
> > > > +++ b/xen/arch/x86/irq.c
> > > > @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, 
> > > > int index, int *pirq_p,
> > > >  {
> > > >  case MAP_PIRQ_TYPE_MULTI_MSI:
> > > >  irq = create_irq(NUMA_NO_NODE);
> > > > +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) &&
> > > 
> > > This check is already performed below, maybe you could re-arrange the
> > > code as:
> > > 
> > > case MAP_PIRQ_TYPE_MULTI_MSI:
> > > irq = create_irq(NUMA_NO_NODE);
> > > }
> > > 
> > > if ( irq < nr_irqs_gsi || irq >= nr_irqs )
> > > {
> > > dprintk(XENLOG_G_ERR, "dom%d: can't create irq for msi!\n",
> > > d->domain_id);
> > > return -EINVAL;
> > > }
> > > if ( current->domain->target == d )
> > > ...

Oh, and this won't fly either, as irq_permit_access() should happen only
when create_irq() was called, otherwise it will give access to arbitrary
irq.

> > > 
> > > But I wonder whether it would be better to place the irq_permit_access
> > > in map_domain_pirq, together with the existing irq_permit_access that
> > > grant the target domain permissions over the irq.
> > 
> > That may be a good idea. Let me try that in v3. But I'll wait for a
> > feedback on libxl patches first.
> 
> That doesn't work, as map_domain_pirq() check irq access earlier.
> Which bring be to a question, what should be rules guarding stubdomain
> access to PHYSDEVOP_map_pirq? With this patch, stubdomain will be able
> to create and map multiple irq (DoS possibility?), as only target domain
> is validated in practice. Is that ok? If not, what additional limits
> could be applied here?
> In INTx case the problem doesn't apply, because toolstack grant access
> to particular IRQ and no allocation happen on stubdomain request. But in
> MSI case, it isn't that easy as IRQ number isn't known before (as
> explained in the commit message).
> 



-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/4] libxl: attach PCI device to qemu only after setting pciback/pcifront

2019-01-25 Thread Marek Marczykowski-Górecki
On Thu, Jan 17, 2019 at 11:29:59AM +0100, Roger Pau Monné wrote:
> On Tue, Jan 15, 2019 at 04:36:29PM +0100, Marek Marczykowski-Górecki wrote:
> > When qemu is running in stubdomain, handling "pci-ins" command will fail
> > if pcifront is not initialized already. Fix this by sending such command
> > only after confirming that pciback/front is running.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> > Changes in v2:
> > - Fixed code style since previous version.
> > ---
> >  tools/libxl/libxl_pci.c |  9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index 3b6b23c..1bde537 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t 
> > domid, libxl_device_pci *pcide
> >  {
> >  libxl_ctx *ctx = libxl__gc_owner(gc);
> >  unsigned int orig_vdev, pfunc_mask;
> > +char *be_path;
> >  libxl_device_pci *assigned;
> >  int num_assigned, i, rc;
> >  int stubdomid = 0;
> > @@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t 
> > domid, libxl_device_pci *pcide
> >  rc = do_pci_add(gc, stubdomid, &pcidev_s, 0);
> >  if ( rc )
> >  goto out;
> > +/* Wait for the device actually being connected, otherwise device 
> > model
> > + * running there will fail to find the device. */
> > +be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0",
> > + libxl__xs_get_dompath(gc, 0), stubdomid);
> > +rc = libxl__wait_for_backend(gc, be_path,
> > + GCSPRINTF("%d", 
> > XenbusStateConnected));
> > +if (rc)
> > +goto out;
> 
> I think it would be better to use the async libxl functionality here,
> see libxl__xswait_start. I will leave for the toolstack maintainers to
> decide. Apart from that the change seems correct.

libxl__device_pci_add() is not async-aware right now and it looks like
converting it is quite a bit of work. I'd leave it out of scope for this
patch series...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 1/6] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

2019-01-25 Thread Marek Marczykowski-Górecki
HVM domains use IOMMU and device model assistance for communicating with
PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain.
But pciback serve also second function - it reset the device when it is
deassigned from the guest and for this reason pciback needs to be used
with HVM domain too.
When HVM domain has device model in stubdomain, attaching xen-pciback to
the target domain itself may prevent attaching xen-pciback to the
(PV) stubdomain, effectively breaking PCI passthrough.

Fix this by attaching pciback only to one domain: if PV stubdomain is in
use, let it be stubdomain (the commit prevents attaching device to target
HVM in this case); otherwise, attach it to the target domain.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
 - previously called "libxl: attach xen-pciback only to PV domains"
 - instead of excluding all HVMs, change the condition to what actually
   matters here - check if stubdomain is in use; this way xen-pciback is
   always in use (either for the target domain, or it's stubdomain),
   fixing PCI reset by xen-pciback concerns
Changes in v3:
 - adjust commit message
---
 tools/libxl/libxl_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 87afa03..3b6b23c 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1106,7 +1106,7 @@ out:
 }
 }
 
-if (!starting)
+if (!starting && !libxl_get_stubdom_id(CTX, domid))
 rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
 else
 rc = 0;
@@ -1302,7 +1302,7 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao 
*ao, uint32_t domid,
 }
 }
 
-if (d_config->num_pcidevs > 0) {
+if (d_config->num_pcidevs > 0 && !libxl_get_stubdom_id(CTX, domid)) {
 rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs,
 d_config->num_pcidevs);
 if (rc < 0) {
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 0/6] Fix PCI passthrough for HVM with stubdomain

2019-01-25 Thread Marek Marczykowski-Górecki
In this version, I add PHYSDEVOP_msi_msix_set_enable to allow stubdomain
enabling MSI after mapping it. This patch is rather RFC and probably will
need adjustments (see comments after commit message there), if physdevop will
be the way to go.

Related article:
https://www.qubes-os.org/news/2017/10/18/msi-support/

Changes in v2:
 - new "xen/x86: Allow stubdom access to irq created for msi" patch
 - applied review comments from v1
Changes is v3:
 - apply suggestions by Roger
 - add PHYSDEVOP_msi_msix_set_enable

Marek Marczykowski-Górecki (5):
  libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
  libxl: attach PCI device to qemu only after setting pciback/pcifront
  libxl: don't try to manipulate json config for stubdomain
  xen/x86: add PHYSDEVOP_msi_msix_set_enable
  tools/libxc: add wrapper for PHYSDEVOP_msi_msix_set_enable

Simon Gaiser (1):
  xen/x86: Allow stubdom access to irq created for msi.

 tools/libxc/include/xenctrl.h |  7 -
 tools/libxc/xc_physdev.c  | 21 -
 tools/libxl/libxl_pci.c   | 61 
 xen/arch/x86/irq.c| 23 ++-
 xen/arch/x86/msi.c| 16 +-
 xen/arch/x86/physdev.c| 33 +++-
 xen/include/asm-x86/msi.h |  1 +-
 xen/include/public/physdev.h  | 13 -
 8 files changed, 155 insertions(+), 20 deletions(-)

base-commit: 93a62c544e20ba9e141e411bbaae3d65259d13a3
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 4/6] xen/x86: Allow stubdom access to irq created for msi.

2019-01-25 Thread Marek Marczykowski-Górecki
From: Simon Gaiser 

Stubdomains need to be given sufficient privilege over the guest which it
provides emulation for in order for PCI passthrough to work correctly.
When a HVM domain try to enable MSI, QEMU in stubdomain calls
PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
part of xc_domain_update_msi_irq. Allow for that as part of
PHYSDEVOP_map_pirq.

This is not needed for PCI INTx, because IRQ in that case is known
beforehand and the stubdomain is given permissions over this IRQ by
libxl__device_pci_add (there's a do_pci_add against the stubdomain).

Based on 
https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
 by Eric Chanudet .

Signed-off-by: Simon Gaiser 
Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - extend commit message

With this patch, stubdomain will be able to create and map multiple irq
(DoS possibility?), as only target domain is validated in practice. Is
that ok? If not, what additional limits could be applied here?
In INTx case the problem doesn't apply, because toolstack grant access
to particular IRQ and no allocation happen on stubdomain request. But in
MSI case, it isn't that easy as IRQ number isn't known before (as
explained in the commit message).
---
 xen/arch/x86/irq.c | 23 +++
 xen/arch/x86/physdev.c |  9 +
 2 files changed, 32 insertions(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 8b44d6c..67c67d4 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, int 
index, int *pirq_p,
 {
 case MAP_PIRQ_TYPE_MULTI_MSI:
 irq = create_irq(NUMA_NO_NODE);
+if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) &&
+current->domain->target == d )
+{
+ret = irq_permit_access(current->domain, irq);
+if ( ret ) {
+dprintk(XENLOG_G_ERR,
+"dom%d: can't grant it's stubdom (%d) access to "
+"irq %d for msi: %d!\n",
+d->domain_id,
+current->domain->domain_id,
+irq,
+ret);
+return ret;
+}
+}
 }
 
 if ( irq < nr_irqs_gsi || irq >= nr_irqs )
@@ -2717,7 +2732,15 @@ int allocate_and_map_msi_pirq(struct domain *d, int 
index, int *pirq_p,
 case MAP_PIRQ_TYPE_MSI:
 if ( index == -1 )
 case MAP_PIRQ_TYPE_MULTI_MSI:
+{
+if ( current->domain->target == d &&
+irq_deny_access(current->domain, irq) )
+dprintk(XENLOG_G_ERR,
+"dom%d: can't revoke stubdom's access to irq 
%d!\n",
+d->domain_id,
+irq);
 destroy_irq(irq);
+}
 break;
 }
 }
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 3a3c158..de59e39 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -164,6 +164,15 @@ int physdev_unmap_pirq(domid_t domid, int pirq)
 
 pcidevs_lock();
 spin_lock(&d->event_lock);
+if ( current->domain->target == d)
+{
+int irq = domain_pirq_to_irq(d, pirq);
+if ( irq <= 0 || irq_deny_access(current->domain, irq) )
+dprintk(XENLOG_G_ERR,
+"dom%d: can't revoke stubdom's access to irq %d!\n",
+d->domain_id,
+irq);
+}
 ret = unmap_domain_pirq(d, pirq);
 spin_unlock(&d->event_lock);
 pcidevs_unlock();
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 5/6] xen/x86: add PHYSDEVOP_msi_msix_set_enable

2019-01-25 Thread Marek Marczykowski-Górecki
Allow device model running in stubdomain to enable/disable MSI(-X),
bypassing pciback. While pciback is still used to access config space
from within stubdomain, it refuse to write to
PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE in non-permissive mode. Which
is the right thing to do for PV domain (the main use case for pciback),
as PV domain should use XEN_PCI_OP_* commands for that. Unfortunately
those commands are not good for stubdomain use, as they configure MSI in
dom0's kernel too, which should not happen for HVM domain.

This new physdevop is allowed only for stubdomain controlling the domain
which own the device.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - new patch

This is rather RFC. Any suggestions for shorter name? Also, I'm not sure
if physdev_msi_msix_set_enable.flag is the best name/idea.

Should it be plugged into XSM? Any suggestions how exactly? New
function with XSM_DM_PRIV default action? Should it get target domain
only, or also machine_bdf?
---
 xen/arch/x86/msi.c   | 16 
 xen/arch/x86/physdev.c   | 24 
 xen/include/asm-x86/msi.h|  1 +
 xen/include/public/physdev.h | 13 +
 4 files changed, 54 insertions(+)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index babc414..9ba934c 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1474,6 +1474,22 @@ int pci_restore_msi_state(struct pci_dev *pdev)
 return 0;
 }
 
+int msi_msix_set_enable(struct pci_dev *pdev, int flag, int enable)
+{
+if ( !current->domain->target || pdev->domain != current->domain->target )
+return -EPERM;
+
+switch ( flag ) {
+case PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSI:
+msi_set_enable(pdev, enable);
+break;
+case PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSIX:
+msix_set_enable(pdev, enable);
+break;
+}
+return 0;
+}
+
 void __init early_msi_init(void)
 {
 if ( use_msi < 0 )
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index de59e39..822846a 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -671,6 +671,30 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_msi_msix_set_enable: {
+struct physdev_msi_msix_set_enable op;
+struct pci_dev *pdev;
+
+ret = -EFAULT;
+if ( copy_from_guest(&op, arg, 1) )
+break;
+
+ret = -EINVAL;
+if ( op.flag != PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSI &&
+op.flag != PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSIX )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(op.pci.seg, op.pci.bus, op.pci.devfn);
+if ( pdev )
+ret = msi_msix_set_enable(pdev, op.flag, !!op.enable);
+else
+ret = -ENODEV;
+pcidevs_unlock();
+break;
+
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 10387dc..080bf24 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -252,5 +252,6 @@ void guest_mask_msi_irq(struct irq_desc *, bool mask);
 void ack_nonmaskable_msi_irq(struct irq_desc *);
 void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector);
 void set_msi_affinity(struct irq_desc *, const cpumask_t *);
+int msi_msix_set_enable(struct pci_dev *pdev, int flag, int enable);
 
 #endif /* __ASM_MSI_H */
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index b6faf83..fd797c6 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -344,6 +344,19 @@ struct physdev_dbgp_op {
 typedef struct physdev_dbgp_op physdev_dbgp_op_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_dbgp_op_t);
 
+#define PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSI  0
+#define PHYSDEVOP_MSI_MSIX_SET_ENABLE_MSIX 1
+
+#define PHYSDEVOP_msi_msix_set_enable   32
+struct physdev_msi_msix_set_enable {
+/* IN */
+struct physdev_pci_device pci;
+uint8_t flag;
+uint8_t enable;
+};
+typedef struct physdev_msi_msix_set_enable physdev_msi_msix_set_enable_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_msi_msix_set_enable_t);
+
 /*
  * Notify that some PIRQ-bound event channels have been unmasked.
  * ** This command is obsolete since interface version 0x00030202 and is **
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 3/6] libxl: don't try to manipulate json config for stubdomain

2019-01-25 Thread Marek Marczykowski-Górecki
Stubdomain do not have it's own config file - its configuration is
derived from target domains. Do not try to manipulate it when attaching
PCI device.

This bug prevented starting HVM with stubdomain and PCI passthrough
device attached.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - skip libxl__dm_check_start too, as stubdomain is guaranteed to be
   running at this stage already
 - do not init d_config at all, as it is used only for json manipulation
---
 tools/libxl/libxl_pci.c | 48 ++
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 1bde537..8d159cf 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -120,10 +120,14 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 libxl_domain_config d_config;
 libxl_device_pci pcidev_saved;
 libxl__domain_userdata_lock *lock = NULL;
+bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL);
 
-libxl_domain_config_init(&d_config);
-libxl_device_pci_init(&pcidev_saved);
-libxl_device_pci_copy(CTX, &pcidev_saved, pcidev);
+/* Stubdomain doesn't have own config. */
+if (!is_stubdomain) {
+libxl_domain_config_init(&d_config);
+libxl_device_pci_init(&pcidev_saved);
+libxl_device_pci_copy(CTX, &pcidev_saved, pcidev);
+}
 
 be_path = libxl__domain_device_backend_path(gc, 0, domid, 0,
 LIBXL__DEVICE_KIND_PCI);
@@ -152,27 +156,33 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 GCNEW(device);
 libxl__device_from_pcidev(gc, domid, pcidev, device);
 
-lock = libxl__lock_domain_userdata(gc, domid);
-if (!lock) {
-rc = ERROR_LOCK_FAIL;
-goto out;
-}
+/* Stubdomain config is derived from its target domain, it doesn't have
+   its own file */
+if (!is_stubdomain) {
+lock = libxl__lock_domain_userdata(gc, domid);
+if (!lock) {
+rc = ERROR_LOCK_FAIL;
+goto out;
+}
 
-rc = libxl__get_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+rc = libxl__get_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
 
-device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
- &pcidev_saved);
+device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
+ &pcidev_saved);
 
-rc = libxl__dm_check_start(gc, &d_config, domid);
-if (rc) goto out;
+rc = libxl__dm_check_start(gc, &d_config, domid);
+if (rc) goto out;
+}
 
 for (;;) {
 rc = libxl__xs_transaction_start(gc, &t);
 if (rc) goto out;
 
-rc = libxl__set_domain_configuration(gc, domid, &d_config);
-if (rc) goto out;
+if (lock) {
+rc = libxl__set_domain_configuration(gc, domid, &d_config);
+if (rc) goto out;
+}
 
 libxl__xs_writev(gc, t, be_path, libxl__xs_kvs_of_flexarray(gc, back));
 
@@ -184,8 +194,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
uint32_t domid, libxl_d
 out:
 libxl__xs_transaction_abort(gc, &t);
 if (lock) libxl__unlock_domain_userdata(lock);
-libxl_device_pci_dispose(&pcidev_saved);
-libxl_domain_config_dispose(&d_config);
+if (!is_stubdomain) {
+libxl_device_pci_dispose(&pcidev_saved);
+libxl_domain_config_dispose(&d_config);
+}
 return rc;
 }
 
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 2/6] libxl: attach PCI device to qemu only after setting pciback/pcifront

2019-01-25 Thread Marek Marczykowski-Górecki
When qemu is running in stubdomain, handling "pci-ins" command will fail
if pcifront is not initialized already. Fix this by sending such command
only after confirming that pciback/front is running.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
- Fixed code style since previous version.
---
 tools/libxl/libxl_pci.c |  9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 3b6b23c..1bde537 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1191,6 +1191,7 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcide
 {
 libxl_ctx *ctx = libxl__gc_owner(gc);
 unsigned int orig_vdev, pfunc_mask;
+char *be_path;
 libxl_device_pci *assigned;
 int num_assigned, i, rc;
 int stubdomid = 0;
@@ -1245,6 +1246,14 @@ int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcide
 rc = do_pci_add(gc, stubdomid, &pcidev_s, 0);
 if ( rc )
 goto out;
+/* Wait for the device actually being connected, otherwise device model
+ * running there will fail to find the device. */
+be_path = libxl__sprintf(gc, "%s/backend/pci/%d/0",
+ libxl__xs_get_dompath(gc, 0), stubdomid);
+rc = libxl__wait_for_backend(gc, be_path,
+ GCSPRINTF("%d", XenbusStateConnected));
+if (rc)
+goto out;
 }
 
 orig_vdev = pcidev->vdevfn & ~7U;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 6/6] tools/libxc: add wrapper for PHYSDEVOP_msi_msix_set_enable

2019-01-25 Thread Marek Marczykowski-Górecki
Add libxc wrapper for PHYSDEVOP_msi_msix_set_enable introduced in
previous commit.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - new patch
---
 tools/libxc/include/xenctrl.h |  7 +++
 tools/libxc/xc_physdev.c  | 21 +
 2 files changed, 28 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 31cdda7..2b86f4c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1639,6 +1639,13 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_msi_msix_set_enable(xc_interface *xch,
+   int seg,
+   int bus,
+   int devfn,
+   int flag,
+   int enable);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libxc/xc_physdev.c b/tools/libxc/xc_physdev.c
index 460a8e7..3530cb8 100644
--- a/tools/libxc/xc_physdev.c
+++ b/tools/libxc/xc_physdev.c
@@ -111,3 +111,24 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 return rc;
 }
 
+int xc_physdev_msi_msix_set_enable(xc_interface *xch,
+   int seg,
+   int bus,
+   int devfn,
+   int flag,
+   int enable)
+{
+int rc;
+struct physdev_msi_msix_set_enable op;
+
+memset(&op, 0, sizeof(struct physdev_msi_msix_set_enable));
+op.pci.seg = seg;
+op.pci.bus = bus;
+op.pci.devfn = devfn;
+op.flag = flag;
+op.enable = enable;
+
+rc = do_physdev_op(xch, PHYSDEVOP_msi_msix_set_enable, &op, sizeof(op));
+
+return rc;
+}
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 4/6] xen/x86: Allow stubdom access to irq created for msi.

2019-01-28 Thread Marek Marczykowski-Górecki
On Mon, Jan 28, 2019 at 02:50:00PM +, Wei Liu wrote:
> On Sat, Jan 26, 2019 at 03:31:15AM +0100, Marek Marczykowski-Górecki wrote:
> > From: Simon Gaiser 
> > 
> > Stubdomains need to be given sufficient privilege over the guest which it
> > provides emulation for in order for PCI passthrough to work correctly.
> > When a HVM domain try to enable MSI, QEMU in stubdomain calls
> > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
> > part of xc_domain_update_msi_irq. Allow for that as part of
> > PHYSDEVOP_map_pirq.
> > 
> > This is not needed for PCI INTx, because IRQ in that case is known
> > beforehand and the stubdomain is given permissions over this IRQ by
> > libxl__device_pci_add (there's a do_pci_add against the stubdomain).
> > 
> > Based on 
> > https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch
> >  by Eric Chanudet .
> > 
> > Signed-off-by: Simon Gaiser 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> > Changes in v3:
> >  - extend commit message
> > 
> > With this patch, stubdomain will be able to create and map multiple irq
> > (DoS possibility?), as only target domain is validated in practice. Is
> > that ok? If not, what additional limits could be applied here?
> > In INTx case the problem doesn't apply, because toolstack grant access
> > to particular IRQ and no allocation happen on stubdomain request. But in
> > MSI case, it isn't that easy as IRQ number isn't known before (as
> > explained in the commit message).
> > ---
> >  xen/arch/x86/irq.c | 23 +++
> >  xen/arch/x86/physdev.c |  9 +
> >  2 files changed, 32 insertions(+)
> > 
> > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> > index 8b44d6c..67c67d4 100644
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2674,6 +2674,21 @@ int allocate_and_map_msi_pirq(struct domain *d, int 
> > index, int *pirq_p,
> >  {
> >  case MAP_PIRQ_TYPE_MULTI_MSI:
> >  irq = create_irq(NUMA_NO_NODE);
> > +if ( !(irq < nr_irqs_gsi || irq >= nr_irqs) &&
> > +current->domain->target == d )
> > +{
> > +ret = irq_permit_access(current->domain, irq);
> > +if ( ret ) {
> > +dprintk(XENLOG_G_ERR,
> > +"dom%d: can't grant it's stubdom (%d) access 
> > to "
> > +        "irq %d for msi: %d!\n",
> > +d->domain_id,
> > +current->domain->domain_id,
> > +irq,
> > +ret);
> > +return ret;
> 
> Don't you need to deallocate the irq before returning?

Yes, indeed.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 3/6] libxl: don't try to manipulate json config for stubdomain

2019-01-28 Thread Marek Marczykowski-Górecki
On Mon, Jan 28, 2019 at 02:41:15PM +, Wei Liu wrote:
> On Sat, Jan 26, 2019 at 03:31:14AM +0100, Marek Marczykowski-Górecki wrote:
> > Stubdomain do not have it's own config file - its configuration is
> > derived from target domains. Do not try to manipulate it when attaching
> > PCI device.
> > 
> 
> So if we add the same configuration to stubdom as well, what would
> happen? I guess libxl will try to attach the PCI devices to both the
> stubdom and DomU?

I'm not sure if I understand you here. You mean adding configuration file for
stubdomain, the one managed by
libxl__get_domain_configuration/libxl__set_domain_configuration? In
theory it would work just fine, but in practice I fear all kind of
desynchronization bugs, like adding device to target domain's config,
but not stubdomain's one in some failure handling case. We'd have 4
things to care about:
 - attaching device to target domain
 - attaching device to stubdomain
 - saving device to target domain's config
 - saving device to stubdomain's config

Handling all the failure cases properly will become quite complex,
especially if we add async callbacks into the mix.
Since stubdomain config is deterministically build based on target
domian's config, I don't think adding such complexity is a good idea.

> > This bug prevented starting HVM with stubdomain and PCI passthrough
> > device attached.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> > Changes in v3:
> >  - skip libxl__dm_check_start too, as stubdomain is guaranteed to be
> >running at this stage already
> >  - do not init d_config at all, as it is used only for json manipulation
> > ---
> >  tools/libxl/libxl_pci.c | 48 ++
> >  1 file changed, 30 insertions(+), 18 deletions(-)
> > 
> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index 1bde537..8d159cf 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -120,10 +120,14 @@ static int libxl__device_pci_add_xenstore(libxl__gc 
> > *gc, uint32_t domid, libxl_d
> >  libxl_domain_config d_config;
> >  libxl_device_pci pcidev_saved;
> >  libxl__domain_userdata_lock *lock = NULL;
> > +bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL);
> >  
> > -libxl_domain_config_init(&d_config);
> > -libxl_device_pci_init(&pcidev_saved);
> > -libxl_device_pci_copy(CTX, &pcidev_saved, pcidev);
> > +/* Stubdomain doesn't have own config. */
> > +if (!is_stubdomain) {
> > +libxl_domain_config_init(&d_config);
> > +libxl_device_pci_init(&pcidev_saved);
> > +libxl_device_pci_copy(CTX, &pcidev_saved, pcidev);
> > +}
> >  
> >  be_path = libxl__domain_device_backend_path(gc, 0, domid, 0,
> >  LIBXL__DEVICE_KIND_PCI);
> > @@ -152,27 +156,33 @@ static int libxl__device_pci_add_xenstore(libxl__gc 
> > *gc, uint32_t domid, libxl_d
> >  GCNEW(device);
> >  libxl__device_from_pcidev(gc, domid, pcidev, device);
> >  
> > -lock = libxl__lock_domain_userdata(gc, domid);
> > -if (!lock) {
> > -rc = ERROR_LOCK_FAIL;
> > -goto out;
> > -}
> > +/* Stubdomain config is derived from its target domain, it doesn't have
> > +   its own file */
> 
> Although comment style isn't specified in CODING_STYLE, I would like to
> fix this to 
> 
> /*
>  * Stubdom ...
>  * ... own file.
>  */

Ok.

> > +if (!is_stubdomain) {
> > +lock = libxl__lock_domain_userdata(gc, domid);
> > +if (!lock) {
> > +rc = ERROR_LOCK_FAIL;
> > +goto out;
> > +}
> >  
> > -rc = libxl__get_domain_configuration(gc, domid, &d_config);
> > -if (rc) goto out;
> > +rc = libxl__get_domain_configuration(gc, domid, &d_config);
> > +if (rc) goto out;
> >  
> > -device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
> > - &pcidev_saved);
> > +device_add_domain_config(gc, &d_config, &libxl__pcidev_devtype,
> > + &pcidev_saved);
> >  
> > -rc = libxl__dm_check_start(gc, &d_config, domid);
> > -if (rc) goto out;
> > +rc = libxl__dm_check_start(gc, &d_config, domid);
> > +if (rc) goto out;
> > +}
> >  
> >  for (;;) {
> >  

[Xen-devel] [PATCH v3 13/17] libxl: use vchan for QMP access with Linux stubdomain, non-async version

2019-01-28 Thread Marek Marczykowski-Górecki
Access to QMP of QEMU in Linux stubdomain is possible over vchan
connection. Add appropriate handling to synchronous API.
Since only one client can be connected to vchan server at the same time
and it is not enforced by the libxenvchan itself, additional client-side
locking is needed. Note that qemu supports only one simultaneous client
on a control socket anyway (but in UNIX socket case, it enforce it
server-side, so it doesn't add any extra limitation.

Ideally, I woudn't need (or want) this part, and would communicate with
stubdomain only with async API. But some libxl public functions that are
not async-compatible do need to call qmp commands (for example
libxl_domain_unpause). Alternative to this patch, would be return error,
breaking all such functions, and incrementally convert them to async
API.

Signed-off-by: Marek Marczykowski-Górecki 
---

Two TODOs here:
 - handle locking, similarly to previous patch
 - select() qmp_next() can now be triggered by malicious stubdomain
   without actually sending any data, which would evade timeout
   enforcing (as select() each time is called with full timeout); the
   easiest thing to do, would be to re-use timeout value from previous
   select() call, but that works only on Linux; any better idea?
   Note that even without using vchan, malicious qemu could cause
   qmp_next() to wait almost indefinitely - by writing one byte at a
   time and never finishing the message

Changes in v3:
 - new patch, in place of "libxl: access QMP socket via console for
   qemu-in-stubdomain"
---
 tools/libxl/libxl_qmp.c | 61 --
 1 file changed, 47 insertions(+), 14 deletions(-)

diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 45b9f74..19ce3ce 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -113,6 +113,7 @@ typedef struct callback_id_pair {
 
 struct libxl__qmp_handler {
 int qmp_fd;
+struct libxenvchan *vchan;
 bool connected;
 time_t timeout;
 /* wait_for_id will be used by the synchronous send function */
@@ -496,7 +497,10 @@ static void qmp_close(libxl__qmp_handler *qmp)
 callback_id_pair *pp = NULL;
 callback_id_pair *tmp = NULL;
 
-close(qmp->qmp_fd);
+if (qmp->vchan)
+libxenvchan_close(qmp->vchan);
+else
+close(qmp->qmp_fd);
 LIBXL_STAILQ_FOREACH(pp, &qmp->callback_list, next) {
 free(tmp);
 tmp = pp;
@@ -516,7 +520,7 @@ static int qmp_next(libxl__gc *gc, libxl__qmp_handler *qmp)
 
 do {
 fd_set rfds;
-int ret = 0;
+int ret = 1; /* used when select() is skipped */
 struct timeval timeout = {
 .tv_sec = qmp->timeout,
 .tv_usec = 0,
@@ -525,7 +529,8 @@ static int qmp_next(libxl__gc *gc, libxl__qmp_handler *qmp)
 FD_ZERO(&rfds);
 FD_SET(qmp->qmp_fd, &rfds);
 
-ret = select(qmp->qmp_fd + 1, &rfds, NULL, NULL, &timeout);
+if (!(qmp->vchan && libxenvchan_data_ready(qmp->vchan)))
+ret = select(qmp->qmp_fd + 1, &rfds, NULL, NULL, &timeout);
 if (ret == 0) {
 LOGD(ERROR, qmp->domid, "timeout");
 return -1;
@@ -536,7 +541,14 @@ static int qmp_next(libxl__gc *gc, libxl__qmp_handler *qmp)
 return -1;
 }
 
-rd = read(qmp->qmp_fd, qmp->buffer, QMP_RECEIVE_BUFFER_SIZE);
+if (qmp->vchan) {
+libxenvchan_wait(qmp->vchan);
+if (!libxenvchan_data_ready(qmp->vchan))
+continue;
+rd = libxenvchan_read(qmp->vchan, qmp->buffer, 
QMP_RECEIVE_BUFFER_SIZE);
+} else {
+rd = read(qmp->qmp_fd, qmp->buffer, QMP_RECEIVE_BUFFER_SIZE);
+}
 if (rd == 0) {
 LOGD(ERROR, qmp->domid, "Unexpected end of socket");
 return -1;
@@ -684,9 +696,15 @@ static int qmp_send(libxl__qmp_handler *qmp,
 goto out;
 }
 
-if (libxl_write_exactly(qmp->ctx, qmp->qmp_fd, buf, strlen(buf),
-"QMP command", "QMP socket"))
-goto out;
+if (qmp->vchan) {
+/* vchan->blocking == 1, so no need to wrap it in a loop */
+if (libxenvchan_write(qmp->vchan, buf, strlen(buf)) == -1)
+goto out;
+} else {
+if (libxl_write_exactly(qmp->ctx, qmp->qmp_fd, buf, strlen(buf),
+"QMP command", "QMP socket"))
+goto out;
+}
 
 rc = qmp->last_id_used;
 out:
@@ -798,19 +816,34 @@ libxl__qmp_handler *libxl__qmp_initialize(libxl__gc *gc, 
uint32_t domid)
 {
 int ret = 0;
 libxl__qmp_handler *qmp = NULL;
-char *qmp_socket;
+int dm_domid;
+char *qmp_path;
 
 qmp = qmp_init_handler(gc, domid);
 if (!qmp) return NULL;
 
-qmp_socket = GCSPRINTF(&

[Xen-devel] [PATCH v3 05/17] libxl: Handle Linux stubdomain specific QEMU options.

2019-01-28 Thread Marek Marczykowski-Górecki
From: Eric Shelton 

This patch creates an appropriate command line for the QEMU instance
running in a Linux-based stubdomain.

NOTE: a number of items are not currently implemented for Linux-based
stubdomains, such as:
- save/restore
- QMP socket
- graphics output (e.g., VNC)

Signed-off-by: Eric Shelton 

Simon:
 * fix disk path
 * fix cdrom path and "format"
 * pass downscript for network interfaces

Signed-off-by: Simon Gaiser 
[drop Qubes-specific parts]
Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
 - fix serial specified with serial=[ ... ] syntax
 - error out on multiple consoles (incompatible with stubdom)
 - drop erroneous chunk about cdrom
Changes in v3:
 - change to use libxl__stubdomain_is_linux instead of
   b_info->stubdomain_version
 - drop libxl__stubdomain_version_running, prefer
   libxl__stubdomain_is_linux_running introduced by previous patch
 - drop ifup/ifdown script - stubdomain will handle that with qemu
   events itself
 - slightly simplify -serial argument
 - add support for multiple serial consoles, do not ignore
   b_info.u.serial(_list)
 - add error checking for more than 26 emulated disks ("/dev/xvd%c"
   format string)
---
 tools/libxl/libxl_create.c   |  40 +++-
 tools/libxl/libxl_dm.c   | 176 +---
 tools/libxl/libxl_internal.h |   1 +-
 tools/libxl/libxl_mem.c  |   6 +-
 tools/libxl/libxl_types.idl  |   3 +-
 5 files changed, 171 insertions(+), 55 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index bb62542..13fc304 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -160,6 +160,31 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 }
 }
 
+if (b_info->type == LIBXL_DOMAIN_TYPE_HVM &&
+libxl_defbool_val(b_info->device_model_stubdomain)) {
+if (!b_info->stubdomain_kernel) {
+switch (b_info->device_model_version) {
+case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+b_info->stubdomain_kernel =
+libxl__abs_path(NOGC, "ioemu-stubdom.gz", 
libxl__xenfirmwaredir_path());
+b_info->stubdomain_ramdisk = NULL;
+break;
+case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+b_info->stubdomain_kernel =
+libxl__abs_path(NOGC,
+"stubdom-linux-kernel",
+libxl__xenfirmwaredir_path());
+b_info->stubdomain_ramdisk =
+libxl__abs_path(NOGC,
+"stubdom-linux-rootfs",
+libxl__xenfirmwaredir_path());
+break;
+default:
+abort();
+}
+}
+}
+
 if (!b_info->max_vcpus)
 b_info->max_vcpus = 1;
 if (!b_info->avail_vcpus.size) {
@@ -195,6 +220,17 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT)
 b_info->target_memkb = b_info->max_memkb;
 
+if (b_info->stubdomain_memkb == LIBXL_MEMKB_DEFAULT) {
+if (libxl_defbool_val(b_info->device_model_stubdomain)) {
+if (libxl__stubdomain_is_linux(b_info))
+b_info->stubdomain_memkb = LIBXL_LINUX_STUBDOM_MEM * 1024;
+else
+b_info->stubdomain_memkb = 28 * 1024; // MiniOS
+} else {
+b_info->stubdomain_memkb = 0; // no stubdomain
+}
+}
+
 libxl_defbool_setdefault(&b_info->claim_mode, false);
 
 libxl_defbool_setdefault(&b_info->localtime, false);
@@ -1538,7 +1574,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
 if (dcs->sdss.dm.guest_domid) {
 if (d_config->b_info.device_model_version
 == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
-libxl__qmp_initializations(gc, domid, d_config);
+if (!libxl_defbool_val(d_config->b_info.device_model_stubdomain)) {
+libxl__qmp_initializations(gc, domid, d_config);
+}
 }
 }
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 3601b14..1fa4fa8 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1169,6 +1169,7 @@ static int libxl__build_device_model_args_new(libxl__gc 
*gc,
 int i, connection, devid;
 uint64_t ram_size;
 const char *path, *chardev;
+bool is_stubdom = libxl_defbool_val(b_info->device_model_stubdomain);
 
 dm_args = flexarray_make(gc, 16, 1);
 dm_envs = flexarray_make(gc, 16, 1);
@@ -1179,30 +1180,33 @@ static int libxl__build_device_model_args_new(libxl__gc 
*gc,
   "-xen-domid",
   GCSPRINTF("%d", guest_domid), NULL);
 

[Xen-devel] [PATCH v3 09/17] tools/libvchan: notify server when client is connected

2019-01-28 Thread Marek Marczykowski-Górecki
Let the server know when the client is connected. Otherwise server will
notice only when client send some data.
This change does not break existing clients, as libvchan user should
handle spurious notifications anyway (for example acknowledge of remote
side reading the data).

Signed-off-by: Marek Marczykowski-Górecki 
---
I had this patch in Qubes for a long time and totally forgot it wasn't
upstream thing...
---
 tools/libvchan/init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libvchan/init.c b/tools/libvchan/init.c
index 180833d..50a64c1 100644
--- a/tools/libvchan/init.c
+++ b/tools/libvchan/init.c
@@ -447,6 +447,9 @@ struct libxenvchan *libxenvchan_client_init(struct 
xentoollog_logger *logger,
ctrl->ring->cli_live = 1;
ctrl->ring->srv_notify = VCHAN_NOTIFY_WRITE;
 
+/* wake up the server */
+xenevtchn_notify(ctrl->event, ctrl->event_port);
+
  out:
if (xs)
xs_daemon_close(xs);
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 08/17] xl: add stubdomain related options to xl config parser

2019-01-28 Thread Marek Marczykowski-Górecki
Signed-off-by: Marek Marczykowski-Górecki 
Reviewed-by: Jason Andryuk 
---
 docs/man/xl.cfg.5.pod.in | 23 +++
 tools/xl/xl_parse.c  |  7 +++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 3b92f39..f475196 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2694,10 +2694,25 @@ model which they were installed with.
 
 =item B
 
-Override the path to the binary to be used as the device-model. The
-binary provided here MUST be consistent with the
-B which you have specified. You should not
-normally need to specify this option.
+Override the path to the binary to be used as the device-model running in
+toolstack domain. The binary provided here MUST be consistent with the
+B which you have specified. You should not normally need
+to specify this option.
+
+=item B
+
+Override the path to the kernel image used as device-model stubdomain.
+The binary provided here MUST be consistent with the
+B which you have specified.
+In case of B it is expected to be MiniOS-based stubdomain
+image, in case of B it is expected to be Linux-based stubdomain
+kernel.
+
+=item B
+
+Override the path to the ramdisk image used as device-model stubdomain.
+The binary provided here is to be used by a kernel pointed by 
B.
+It is known to be used only by Linux-based stubdomain kernel.
 
 =item B
 
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 352cd21..77e4cf0 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2502,6 +2502,13 @@ skip_usbdev:
 xlu_cfg_replace_string(config, "device_model_user",
&b_info->device_model_user, 0);
 
+xlu_cfg_replace_string (config, "stubdomain_kernel",
+&b_info->stubdomain_kernel, 0);
+xlu_cfg_replace_string (config, "stubdomain_ramdisk",
+&b_info->stubdomain_ramdisk, 0);
+if (!xlu_cfg_get_long (config, "stubdomain_memory", &l, 0))
+b_info->stubdomain_memkb = l * 1024;
+
 #define parse_extra_args(type)\
 e = xlu_cfg_get_list_as_string_list(config, "device_model_args"#type, \
 &b_info->extra##type, 0);\
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 11/17] libxl: move xswait declaration up in libxl_internal.h

2019-01-28 Thread Marek Marczykowski-Górecki
It will be needed for qmp_ev_* over vchan.

No functional change.

Signed-off-by: Marek Marczykowski-Górecki 

---
Changes in v3:
 - new patch
---
 tools/libxl/libxl_internal.h | 108 ++--
 1 file changed, 54 insertions(+), 54 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a26d36e..0095835 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -358,6 +358,60 @@ struct libxl__ev_child {
 LIBXL_LIST_ENTRY(struct libxl__ev_child) entry;
 };
 
+/*- xswait: wait for a xenstore node to be suitable -*/
+
+typedef struct libxl__xswait_state libxl__xswait_state;
+
+/*
+ * rc describes the circumstances of this callback:
+ *
+ * rc==0
+ *
+ * The xenstore path (may have) changed.  It has been read for
+ * you.  The result is in data (allocated from the ao gc).
+ * data may be NULL, which means that the xenstore read gave
+ * ENOENT.
+ *
+ * If you are satisfied, you MUST call libxl__xswait_stop.
+ * Otherwise, xswait will continue waiting and watching and
+ * will call you back later.
+ *
+ * rc==ERROR_TIMEDOUT, rc==ERROR_ABORTED
+ *
+ * The specified timeout was reached.
+ * This has NOT been logged (except to the debug log).
+ * xswait will not continue (but calling libxl__xswait_stop is OK).
+ *
+ * rc!=0, !=ERROR_TIMEDOUT, !=ERROR_ABORTED
+ *
+ * Some other error occurred.
+ * This HAS been logged.
+ * xswait will not continue (but calling libxl__xswait_stop is OK).
+ *
+ * xswait.path may start with with '@', in which case no read is done
+ * and the callback will always get data==0.
+ */
+typedef void libxl__xswait_callback(libxl__egc *egc,
+  libxl__xswait_state *xswa, int rc, const char *data);
+
+struct libxl__xswait_state {
+/* caller must fill these in, and they must all remain valid */
+libxl__ao *ao;
+const char *what; /* for error msgs: noun phrase, what we're waiting for */
+const char *path;
+int timeout_ms; /* as for poll(2) */
+libxl__xswait_callback *callback;
+/* remaining fields are private to xswait */
+libxl__ev_time time_ev;
+libxl__ev_xswatch watch_ev;
+};
+
+void libxl__xswait_init(libxl__xswait_state*);
+void libxl__xswait_stop(libxl__gc*, libxl__xswait_state*); /*idempotent*/
+bool libxl__xswait_inuse(const libxl__xswait_state *ss);
+
+int libxl__xswait_start(libxl__gc*, libxl__xswait_state*);
+
 /*
  * QMP asynchronous calls
  *
@@ -1401,60 +1455,6 @@ _hidden const char *libxl__device_nic_devname(libxl__gc 
*gc,
 
 _hidden int libxl__get_domid(libxl__gc *gc, uint32_t *domid);
 
-/*- xswait: wait for a xenstore node to be suitable -*/
-
-typedef struct libxl__xswait_state libxl__xswait_state;
-
-/*
- * rc describes the circumstances of this callback:
- *
- * rc==0
- *
- * The xenstore path (may have) changed.  It has been read for
- * you.  The result is in data (allocated from the ao gc).
- * data may be NULL, which means that the xenstore read gave
- * ENOENT.
- *
- * If you are satisfied, you MUST call libxl__xswait_stop.
- * Otherwise, xswait will continue waiting and watching and
- * will call you back later.
- *
- * rc==ERROR_TIMEDOUT, rc==ERROR_ABORTED
- *
- * The specified timeout was reached.
- * This has NOT been logged (except to the debug log).
- * xswait will not continue (but calling libxl__xswait_stop is OK).
- *
- * rc!=0, !=ERROR_TIMEDOUT, !=ERROR_ABORTED
- *
- * Some other error occurred.
- * This HAS been logged.
- * xswait will not continue (but calling libxl__xswait_stop is OK).
- *
- * xswait.path may start with with '@', in which case no read is done
- * and the callback will always get data==0.
- */
-typedef void libxl__xswait_callback(libxl__egc *egc,
-  libxl__xswait_state *xswa, int rc, const char *data);
-
-struct libxl__xswait_state {
-/* caller must fill these in, and they must all remain valid */
-libxl__ao *ao;
-const char *what; /* for error msgs: noun phrase, what we're waiting for */
-const char *path;
-int timeout_ms; /* as for poll(2) */
-libxl__xswait_callback *callback;
-/* remaining fields are private to xswait */
-libxl__ev_time time_ev;
-libxl__ev_xswatch watch_ev;
-};
-
-void libxl__xswait_init(libxl__xswait_state*);
-void libxl__xswait_stop(libxl__gc*, libxl__xswait_state*); /*idempotent*/
-bool libxl__xswait_inuse(const libxl__xswait_state *ss);
-
-int libxl__xswait_start(libxl__gc*, libxl__xswait_state*);
-
 /*
  * libxl__ev_devstate - waits a given time for a device to
  * reach a given state.  Follows the libxl_ev_* conventions.
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 15/17] tools: add missing libxenvchan cflags

2019-01-28 Thread Marek Marczykowski-Górecki
libxenvchan.h include xenevtchn.h and xengnttab.h, so applications built
with it needs applicable -I in CFLAGS too.

Signed-off-by: Marek Marczykowski-Górecki 
---
 tools/Rules.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/Rules.mk b/tools/Rules.mk
index 12b3129..a7c6c21 100644
--- a/tools/Rules.mk
+++ b/tools/Rules.mk
@@ -160,7 +160,7 @@ SHDEPS_libxenstat  = $(SHLIB_libxenctrl) 
$(SHLIB_libxenstore)
 LDLIBS_libxenstat  = $(SHDEPS_libxenstat) 
$(XEN_LIBXENSTAT)/libxenstat$(libextension)
 SHLIB_libxenstat   = $(SHDEPS_libxenstat) -Wl,-rpath-link=$(XEN_LIBXENSTAT)
 
-CFLAGS_libxenvchan = -I$(XEN_LIBVCHAN)
+CFLAGS_libxenvchan = -I$(XEN_LIBVCHAN) $(CFLAGS_libxengnttab) 
$(CFLAGS_libxenevtchn)
 SHDEPS_libxenvchan = $(SHLIB_libxentoollog) $(SHLIB_libxenstore) 
$(SHLIB_libxenevtchn) $(SHLIB_libxengnttab)
 LDLIBS_libxenvchan = $(SHDEPS_libxenvchan) 
$(XEN_LIBVCHAN)/libxenvchan$(libextension)
 SHLIB_libxenvchan  = $(SHDEPS_libxenvchan) -Wl,-rpath-link=$(XEN_LIBVCHAN)
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 14/17] libxl: add save/restore support for qemu-xen in stubdomain

2019-01-28 Thread Marek Marczykowski-Górecki
Rely on a wrapper script in stubdomain to attach FD 3/4 of qemu to
relevant consoles.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - adjust for qmp_ev*
 - assume specific fdset id in qemu set in stubdomain
---
 tools/libxl/libxl_dm.c  | 23 +++
 tools/libxl/libxl_qmp.c | 25 -
 2 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index b9a53f3..ce65321 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1715,10 +1715,17 @@ static int libxl__build_device_model_args_new(libxl__gc 
*gc,
 }
 
 if (state->saved_state) {
-/* This file descriptor is meant to be used by QEMU */
-*dm_state_fd = open(state->saved_state, O_RDONLY);
-flexarray_append(dm_args, "-incoming");
-flexarray_append(dm_args, GCSPRINTF("fd:%d",*dm_state_fd));
+if (is_stubdom) {
+/* Linux stubdomain connects specific FD to STUBDOM_CONSOLE_RESTORE
+ */
+flexarray_append(dm_args, "-incoming");
+flexarray_append(dm_args, "fd:3");
+} else {
+/* This file descriptor is meant to be used by QEMU */
+*dm_state_fd = open(state->saved_state, O_RDONLY);
+flexarray_append(dm_args, "-incoming");
+flexarray_append(dm_args, GCSPRINTF("fd:%d",*dm_state_fd));
+}
 }
 for (i = 0; b_info->extra && b_info->extra[i] != NULL; i++)
 flexarray_append(dm_args, b_info->extra[i]);
@@ -2192,14 +2199,6 @@ void libxl__spawn_stub_dm(libxl__egc *egc, 
libxl__stub_dm_spawn_state *sdss)
 
 assert(libxl_defbool_val(guest_config->b_info.device_model_stubdomain));
 
-if (libxl__stubdomain_is_linux(&guest_config->b_info)) {
-if (d_state->saved_state) {
-LOG(ERROR, "Save/Restore not supported yet with Linux Stubdom.");
-ret = -1;
-goto out;
-}
-}
-
 sdss->pvqemu.guest_domid = 0;
 
 libxl_domain_create_info_init(&dm_config->c_info);
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 19ce3ce..7643f15 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -1328,6 +1328,7 @@ static void dm_stopped(libxl__egc *egc, libxl__ev_qmp *ev,
const libxl__json_object *response, int rc);
 static void dm_state_fd_ready(libxl__egc *egc, libxl__ev_qmp *ev,
   const libxl__json_object *response, int rc);
+static void dm_state_save_to_fdset(libxl__egc *egc, libxl__ev_qmp *ev, int 
fdset);
 static void dm_state_saved(libxl__egc *egc, libxl__ev_qmp *ev,
const libxl__json_object *response, int rc);
 
@@ -1360,10 +1361,17 @@ static void dm_stopped(libxl__egc *egc, libxl__ev_qmp 
*ev,
 EGC_GC;
 libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, qmp);
 const char *const filename = dsps->dm_savefile;
+uint32_t dm_domid = libxl_get_stubdom_id(CTX, dsps->domid);
 
 if (rc)
 goto error;
 
+if (dm_domid) {
+/* see Linux stubdom interface in docs/stubdom.txt */
+dm_state_save_to_fdset(egc, ev, 1);
+return;
+}
+
 ev->payload_fd = open(filename, O_WRONLY | O_CREAT, 0600);
 if (ev->payload_fd < 0) {
 LOGED(ERROR, ev->domid,
@@ -1394,7 +1402,6 @@ static void dm_state_fd_ready(libxl__egc *egc, 
libxl__ev_qmp *ev,
 EGC_GC;
 int fdset;
 const libxl__json_object *o;
-libxl__json_object *args = NULL;
 libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, qmp);
 
 close(ev->payload_fd);
@@ -1409,6 +1416,21 @@ static void dm_state_fd_ready(libxl__egc *egc, 
libxl__ev_qmp *ev,
 goto error;
 }
 fdset = libxl__json_object_get_integer(o);
+dm_state_save_to_fdset(egc, ev, fdset);
+return;
+
+error:
+assert(rc);
+libxl__remove_file(gc, dsps->dm_savefile);
+dsps->callback_device_model_done(egc, dsps, rc);
+}
+
+static void dm_state_save_to_fdset(libxl__egc *egc, libxl__ev_qmp *ev, int 
fdset)
+{
+EGC_GC;
+int rc;
+libxl__json_object *args = NULL;
+libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, qmp);
 
 ev->callback = dm_state_saved;
 
@@ -1426,6 +1448,7 @@ static void dm_state_fd_ready(libxl__egc *egc, 
libxl__ev_qmp *ev,
 
 error:
 assert(rc);
+/* TODO: only in non-stubdom case */
 libxl__remove_file(gc, dsps->dm_savefile);
 dsps->callback_device_model_done(egc, dsps, rc);
 }
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 03/17] libxl: fix qemu-trad cmdline for no sdl/vnc case

2019-01-28 Thread Marek Marczykowski-Górecki
When qemu is running in stubdomain, any attempt to initialize vnc/sdl
there will crash it (on failed attempt to load a keymap from a file). If
vfb is present, all those cases are skipped. But since
b053f0c4c9e533f3d97837cf897eb920b8355ed3 "libxl: do not start dom0 qemu
for stubdomain when not needed" it is possible to create a stubdomain
without vfb and contrary to the comment -vnc none do trigger VNC
initialization code (just skips exposing it externally).
Change the implicit SDL avoiding method to -nographics option, used when
none of SDL or VNC is enabled.

Signed-off-by: Marek Marczykowski-Górecki 
Reviewed-by: Jason Andryuk 
Acked-by: Ian Jackson 
Acked-by: Wei Liu 
---
Changes in v2:
 - typo in qemu option
Changes in v3:
 - add missing { }
---
 tools/libxl/libxl_dm.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index b245956..3601b14 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -715,14 +715,15 @@ static int libxl__build_device_model_args_old(libxl__gc 
*gc,
 if (libxl_defbool_val(vnc->findunused)) {
 flexarray_append(dm_args, "-vncunused");
 }
-} else
+} else if (!sdl) {
 /*
  * VNC is not enabled by default by qemu-xen-traditional,
- * however passing -vnc none causes SDL to not be
- * (unexpectedly) enabled by default. This is overridden by
- * explicitly passing -sdl below as required.
+ * however skipping -vnc causes SDL to be
+ * (unexpectedly) enabled by default. If undesired, disable graphics at
+ * all.
  */
-flexarray_append_pair(dm_args, "-vnc", "none");
+flexarray_append(dm_args, "-nographic");
+}
 
 if (sdl) {
 flexarray_append(dm_args, "-sdl");
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 02/17] Document ioemu Linux stubdomain protocol

2019-01-28 Thread Marek Marczykowski-Górecki
Add documentation for upcoming Linux stubdomain for qemu-upstream.

Signed-off-by: Marek Marczykowski-Górecki 
---
 docs/misc/stubdom.txt | 50 -
 1 file changed, 50 insertions(+)

diff --git a/docs/misc/stubdom.txt b/docs/misc/stubdom.txt
index 4c524f2..9c94c6b 100644
--- a/docs/misc/stubdom.txt
+++ b/docs/misc/stubdom.txt
@@ -75,6 +75,56 @@ Defined commands:
- "running" - success
 
 
+Toolstack to Linux ioemu stubdomain protocol
+
+
+This section describe communication protocol between toolstack and
+qemu-upstream running in Linux stubdomain. The protocol include
+expectations of both stubdomain, and qemu.
+
+Setup (done by toolstack, expected by stubdomain):
+ - Block devices for target domain are connected as PV disks to stubdomain,
+   according to configuration order, starting with xvda
+ - Network devices for target domain are connected as PV nics to stubdomain,
+   according to configuration order, starting with 0
+ - [not implemented] if graphics output is expected, VFB and VKB devices are 
set for stubdomain
+   (its backend is responsible for exposing them using appropriate protocol
+   like VNC or Spice)
+ - other target domain's devices are not connected at this point to stubdomain
+   (may be hot-plugged later)
+ - QEMU command line is stored in
+   /vm//image/dmargs xenstore dir, each argument as separate key
+   in form /vm//image/dmargs/NNN, where NNN is 0-padded argument
+   number
+ - target domain id is stored in /local/domain//target xenstore 
path
+?? - bios type is stored in /local/domain//hvmloader/bios
+ - stubdomain's console 0 is connected to qemu log file
+ - stubdomain's console 1 is connected to qemu save file (for saving state)
+ - stubdomain's console 2 is connected to qemu save file (for restoring state)
+ - next consoles are connected according to target guest's serial console 
configuration
+
+Environment exposed by stubdomain to qemu (needed to construct appropriate 
qemu command line and later interact with qmp):
+ - target domain's disks are available as /dev/xvd[a-z]
+ - console 2 (incoming domain state) is connected with FD 3
+ - console 1 (saving domain state) is added over QMP to qemu as "fdset-id 1" 
(done by stubdomain, toolstack doesn't need to care about it)
+ - nics are connected to relevant stubdomain PV vifs when available (qemu 
-netdev should specify ifname= explicitly)
+
+Startup:
+1. toolstack starts PV stubdomain with stubdom-linux-kernel kernel and 
stubdom-linux-initrd initrd
+2. stubdomain initialize relevant devices
+3. stubdomain starts qemu with requested command line, plus few stubdomain 
specific ones - including local qmp access options
+4. stubdomain starts vchan server on 
/local/domain//device-model//qmp-vchan, exposing qmp 
socket to the toolstack
+5. qemu signal readiness by writing "running" to 
/local/domain//device-model//state xenstore path
+6. now device model is considered running
+
+QEMU can be controlled using QMP over vchan at 
/local/domain//device-model//qmp-vchan. Only one 
simultaneous connection is supported and toolstack needs to ensure that.
+
+Limitations:
+ - PCI passthrough require permissive mode
+ - only one nic is supported
+ - at most 26 emulated disks are supported (more are still available as PV 
disks)
+ - graphics output (VNC/SDL/Spice) not supported
+
 
PV-GRUB
===
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain.

2019-01-28 Thread Marek Marczykowski-Górecki
General idea is to allow freely set device_model_version and
device_model_stubdomain_override and choose the right options based on this 
choice.
Also, allow to specific path to stubdomain kernel/ramdisk, for greater 
flexibility.

First two patches add documentation about expected toolstack-stubdomain-qemu
interface, both for MiniOS stubdomain and Linux stubdomain.

Initial version has no QMP support - in initial patches it is completely
disabled, which means no suspend/restore and no PCI passthrough.

Later patches add QMP over libvchan connection support. This means libxenlight
will be linked with libxenvchan.

Ideally talking to stubdomain would be added only to new libxl__qmp_ev* API,
which is written from start with the assumption of untrusted qemu. But
unfortunately some parts of libxl public API that calls into QMP, are not
async-aware and can't use libxl__qmp_ev* API. Example of such API is
libxl_domain_unpause(). Because of this, separate patch add support for QMP
over vchan also to the old API.

There is also a need for external locking access to vchan connection (one
server can handle only one client and libvchan does not verify this). Since I
haven't found any asynchronous locking primitives in libxl, for now I've added
flock() on a lock file, but this also needs to be converted to async version.

The actual stubdomain implementation is here:

https://github.com/marmarek/qubes-vmm-xen-stubdom-linux
(branch for-upstream, tag for-upstream-v3)

See readme there for build instructions.
Beware: building on Debian is dangerous, as it require installing "dracut",
which will remove initramfs-tools. You may end up with broken initrd on
your host.

Few comments/questions about the stubdomain code:

1. There are extra patches for qemu that are necessary to run it in stubdomain.
While it is desirable to upstream them, I think it can be done after merging
libxl part. Stubdomain's qemu build will in most cases be separate anyway, to
limit qemu's dependencies (so the stubdomain size).

2. By default Linux hvc-xen console frontend is unreliable for data transfer
(qemu state save/restore) - it drops data sent faster than client is reading
it. To fix it, console device needs to be switched into raw mode
(`stty raw /dev/hvc1`). Especially for restoring qemu state it is tricky, as it
would need to be done before opening the device, but stty (obviously) needs to
open the device first. To solve this problem, for now the repository contains
kernel patch which changes the default for all hvc consoles. Again, this isn't
practical problem, as the kernel for stubdomain is built separately. But it
would be nice to have something working with vanilla kernel. I see those 
options:
  - convert it to kernel cmdline parameter (hvc_console_raw=1 ?)
  - use channels instead of consoles (and on the kernel side change the default
to "raw" only for channels); while in theory better design, libxl part will
be more complex, as channels can be connected to sockets but not files, so
libxl would need to read/write to it exactly when qemu write/read the data,
not before/after as it is done now

Remaining parts for eliminating dom0's instance of qemu:
 - do not force QDISK backend for CDROM
 - multiple consoles support in xenconsoled

Changes in v2:
 - apply review comments by Jason Andryuk
Changes in v3:
 - rework qemu arguments handling (separate xenstore keys, instead of \x1b 
separator)
 - add QMP over libvchan, instead of console
 - add protocol documentation
 - a lot of minor changes, see individual patches for full changes list
 - split xenconsoled patches into separate series

Cc: Simon Gaiser 
Cc: Eric Shelton 
Cc: Ian Jackson 
Cc: Wei Liu 

Eric Shelton (1):
  libxl: Handle Linux stubdomain specific QEMU options.

Marek Marczykowski-Górecki (16):
  Document ioemu MiniOS stubdomain protocol
  Document ioemu Linux stubdomain protocol
  libxl: fix qemu-trad cmdline for no sdl/vnc case
  libxl: Allow running qemu-xen in stubdomain
  libxl: write qemu arguments into separate xenstore keys
  libxl: create vkb device only for guests with graphics output
  xl: add stubdomain related options to xl config parser
  tools/libvchan: notify server when client is connected
  libxl: typo fix in comment
  libxl: move xswait declaration up in libxl_internal.h
  libxl: use vchan for QMP access with Linux stubdomain, libxl__ev_qmp_* version
  libxl: use vchan for QMP access with Linux stubdomain, non-async version
  libxl: add save/restore support for qemu-xen in stubdomain
  tools: add missing libxenvchan cflags
  libxl: add locking for libvchan QMP connection
  libxl: require qemu in dom0 even if stubdomain is in use

 docs/man/xl.cfg.5.pod.in |  23 +-
 docs/misc/stubdom.txt| 103 +-
 tools/Rules.mk   |   6 +-
 tools/libvchan/init.c|   3 +-
 tools/libxl/Makefile |   2 +-
 tools/libxl/libxl_create.c   |  51 +++-
 tools/l

[Xen-devel] [PATCH v3 07/17] libxl: create vkb device only for guests with graphics output

2019-01-28 Thread Marek Marczykowski-Górecki
The forced vkb device is meant for better performance of qemu access
(at least according to ebbd2561b4cefb299f0f68a88b2788504223de18 "libxl:
Add a vkbd frontend/backend pair for HVM guests"), which isn't used if
there is no configured channel to actually access that keyboard.

One can still add vkb device manually if needed.

This is continuation of b053f0c4c9e533f3d97837cf897eb920b8355ed3 "libxl: do
not start dom0 qemu for stubdomain when not needed".

Signed-off-by: Marek Marczykowski-Górecki 
Reviewed-by: Jason Andryuk 
---
 tools/libxl/libxl_create.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 13fc304..736b93c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1439,9 +1439,13 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
 libxl__device_console_add(gc, domid, &console, state, &device);
 libxl__device_console_dispose(&console);
 
-libxl_device_vkb_init(&vkb);
-libxl__device_add(gc, domid, &libxl__vkb_devtype, &vkb);
-libxl_device_vkb_dispose(&vkb);
+if (libxl_defbool_val(d_config->b_info.u.hvm.vnc.enable)
+|| libxl_defbool_val(d_config->b_info.u.hvm.spice.enable)
+|| libxl_defbool_val(d_config->b_info.u.hvm.sdl.enable)) {
+libxl_device_vkb_init(&vkb);
+libxl__device_add(gc, domid, &libxl__vkb_devtype, &vkb);
+libxl_device_vkb_dispose(&vkb);
+}
 
 dcs->sdss.dm.guest_domid = domid;
 if (libxl_defbool_val(d_config->b_info.device_model_stubdomain))
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 10/17] libxl: typo fix in comment

2019-01-28 Thread Marek Marczykowski-Górecki
Signed-off-by: Marek Marczykowski-Górecki 
---
 tools/libxl/libxl_qmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 42c8ab8..a235095 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -1452,7 +1452,7 @@ static void dm_state_saved(libxl__egc *egc, libxl__ev_qmp 
*ev,
  *
  * - Allowed internal state transition:
  * disconnected -> connecting
- * connection   -> capability_negotiation
+ * connecting   -> capability_negotiation
  * capability_negotiation/connected -> waiting_reply
  * waiting_reply-> connected
  * any  -> broken
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 01/17] Document ioemu MiniOS stubdomain protocol

2019-01-28 Thread Marek Marczykowski-Górecki
Add documentation based on reverse-engineered toolstack-ioemu stubdomain
protocol.

Signed-off-by: Marek Marczykowski-Górecki 
---
 docs/misc/stubdom.txt | 53 -
 1 file changed, 53 insertions(+)

diff --git a/docs/misc/stubdom.txt b/docs/misc/stubdom.txt
index de7b6c7..4c524f2 100644
--- a/docs/misc/stubdom.txt
+++ b/docs/misc/stubdom.txt
@@ -23,6 +23,59 @@ and http://wiki.xen.org/wiki/Device_Model_Stub_Domains for 
more
 information on device model stub domains
 
 
+Toolstack to MiniOS ioemu stubdomain protocol
+-
+
+This section describe communication protocol between toolstack and
+qemu-traditional running in MiniOS stubdomain. The protocol include
+expectations of both qemu and stubdomain itself.
+
+Setup (done by toolstack, expected by stubdomain):
+ - Block devices for target domain are connected as PV disks to stubdomain,
+   according to configuration order, starting with xvda
+ - Network devices for target domain are connected as PV nics to stubdomain,
+   according to configuration order, starting with 0
+ - if graphics output is expected, VFB and VKB devices are set for stubdomain
+   (its backend is responsible for exposing them using appropriate protocol
+   like VNC or Spice)
+ - other target domain's devices are not connected at this point to stubdomain
+   (may be hot-plugged later)
+ - QEMU command line (space separated arguments) is stored in
+   /vm//image/dmargs xenstore path
+ - target domain id is stored in /local/domain//target xenstore 
path
+?? - bios type is stored in /local/domain//hvmloader/bios
+ - stubdomain's console 0 is connected to qemu log file
+ - stubdomain's console 1 is connected to qemu save file (for saving state)
+ - stubdomain's console 2 is connected to qemu save file (for restoring state)
+ - next consoles are connected according to target guest's serial console 
configuration
+
+Startup:
+1. PV stubdomain is started with ioemu-stubdom.gz kernel and no initrd
+2. stubdomain initialize relevant devices
+2. stubdoma signal readiness by writing "running" to 
/local/domain//device-model//state xenstore path
+3. now stubdomain is considered running
+
+Runtime control (hotplug etc):
+Toolstack can issue command through xenstore. The sequence is (from toolstack 
POV):
+1. Write parameter to 
/local/domain//device-model//parameter.
+2. Write command to 
/local/domain//device-model//command.
+3. Wait for command result in 
/local/domain//device-model//state (command specific 
value).
+4. Write "running" back to 
/local/domain//device-model//state.
+
+Defined commands:
+ - "pci-ins" - PCI hot plug, results:
+   - "pci-inserted" - success
+   - "pci-insert-failed" - failure
+ - "pci-rem" - PCI hot remove, results:
+   - "pci-removed" - success
+   - ??
+ - "save" - save domain state to console 1, results:
+   - "paused" - success
+ - "continue" - resume domain execution, after loading state from console 2 
(require -loadvm command argument), results:
+   - "running" - success
+
+
+
PV-GRUB
===
 
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 06/17] libxl: write qemu arguments into separate xenstore keys

2019-01-28 Thread Marek Marczykowski-Górecki
This allows using arguments with spaces, like -append, without
nominating any special "separator" character.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v3:
 - previous version of this patch "libxl: use \x1b to separate qemu
   arguments for linux stubdomain" used specific non-printable
   separator, but it was rejected as xenstore doesn't cope well with
   non-printable chars
---
 tools/libxl/libxl_dm.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 1fa4fa8..6cfc256 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -2047,6 +2047,40 @@ static int 
libxl__vfb_and_vkb_from_hvm_guest_config(libxl__gc *gc,
 return 0;
 }
 
+static int libxl__write_stub_linux_dmargs(libxl__gc *gc,
+int dm_domid, int guest_domid,
+char **args)
+{
+libxl_ctx *ctx = libxl__gc_owner(gc);
+int i;
+char *vm_path;
+char *path;
+struct xs_permissions roperm[2];
+xs_transaction_t t;
+
+roperm[0].id = 0;
+roperm[0].perms = XS_PERM_NONE;
+roperm[1].id = dm_domid;
+roperm[1].perms = XS_PERM_READ;
+
+vm_path = libxl__xs_read(gc, XBT_NULL, GCSPRINTF("/local/domain/%d/vm", 
guest_domid));
+path = GCSPRINTF("%s/image/dmargs", vm_path);
+
+retry_transaction:
+t = xs_transaction_start(ctx->xsh);
+xs_write(ctx->xsh, t, path, "", 0);
+xs_set_permissions(ctx->xsh, t, path, roperm, ARRAY_SIZE(roperm));
+i = 1;
+for (i=1; args[i] != NULL; i++)
+xs_write(ctx->xsh, t, GCSPRINTF("%s/%03d", path, i), args[i], 
strlen(args[i]));
+
+xs_set_permissions(ctx->xsh, t, GCSPRINTF("%s/rtc/timeoffset", vm_path), 
roperm, ARRAY_SIZE(roperm));
+if (!xs_transaction_end(ctx->xsh, t, 0))
+if (errno == EAGAIN)
+goto retry_transaction;
+return 0;
+}
+
 static int libxl__write_stub_dmargs(libxl__gc *gc,
 int dm_domid, int guest_domid,
 char **args)
@@ -2249,7 +2283,10 @@ void libxl__spawn_stub_dm(libxl__egc *egc, 
libxl__stub_dm_spawn_state *sdss)
 
 libxl__store_libxl_entry(gc, guest_domid, "dm-version",
 
libxl_device_model_version_to_string(dm_config->b_info.device_model_version));
-libxl__write_stub_dmargs(gc, dm_domid, guest_domid, args);
+if (libxl__stubdomain_is_linux(&guest_config->b_info))
+libxl__write_stub_linux_dmargs(gc, dm_domid, guest_domid, args);
+else
+libxl__write_stub_dmargs(gc, dm_domid, guest_domid, args);
 libxl__xs_printf(gc, XBT_NULL,
  GCSPRINTF("%s/image/device-model-domid",
libxl__xs_get_dompath(gc, guest_domid)),
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 12/17] libxl: use vchan for QMP access with Linux stubdomain, libxl__ev_qmp_* version

2019-01-28 Thread Marek Marczykowski-Górecki
Access to QMP of QEMU in Linux stubdomain is possible over vchan
connection. Add appropriate handling in libxl__ev_qmp_* API, keeping all the
asynchronous properties.
Since only one client can be connected to vchan server at the same time
and it is not enforced by the libxenvchan itself, additional client-side
locking is needed. Note that qemu supports only one simultaneous client
on a control socket anyway (but in UNIX socket case, it enforce it
server-side), so it doesn't add any extra limitation.

Regarding pipe to self:
Vchan issue notification about incoming data (or space for it)
only once - even if there is more data to read, FD returned by
libxenvchan_fd_for_select() will not be readable. Similar to buffer
space - there is one notification if more data can be written, but FD
isn't considered "writable" after libxenvchan_wait() call, even if in
fact there is a buffer space.
There are two situations where it is problematic:
 - some QMP message results in a user callback and processing further
   messages must stop (even if more data is already available in vchan
   buffer)
 - data is scheduled to write after a buffer space notification; this
   may result from either delayed libxl__ev_qmp_send() call, or internal
   state change

To avoid the above problems, use pipe to self to schedule vchan data
processing, even if vchan would not issue notification itself.

Signed-off-by: Marek Marczykowski-Górecki 
---
TODO:
 - handle locking for vchan access - a lockfile + flock comes to mind,
   but there is no async provisioning for it in libxl

Changes in v3:
 - new patch, in place of "libxl: access QMP socket via console for
   qemu-in-stubdomain"
---
 tools/Rules.mk   |   4 +-
 tools/libxl/Makefile |   2 +-
 tools/libxl/libxl_dm.c   |   2 +-
 tools/libxl/libxl_internal.h |   7 +-
 tools/libxl/libxl_qmp.c  | 293 +++-
 5 files changed, 301 insertions(+), 7 deletions(-)

diff --git a/tools/Rules.mk b/tools/Rules.mk
index 68f2ed7..12b3129 100644
--- a/tools/Rules.mk
+++ b/tools/Rules.mk
@@ -187,8 +187,8 @@ SHLIB_libblktapctl  =
 PKG_CONFIG_REMOVE += xenblktapctl
 endif
 
-CFLAGS_libxenlight = -I$(XEN_XENLIGHT) $(CFLAGS_libxenctrl) 
$(CFLAGS_xeninclude)
-SHDEPS_libxenlight = $(SHLIB_libxenctrl) $(SHLIB_libxenstore) 
$(SHLIB_libblktapctl)
+CFLAGS_libxenlight = -I$(XEN_XENLIGHT) $(CFLAGS_libxenctrl) 
$(CFLAGS_xeninclude) $(CFLAGS_libxenvchan)
+SHDEPS_libxenlight = $(SHLIB_libxenctrl) $(SHLIB_libxenstore) 
$(SHLIB_libblktapctl) $(SHLIB_libxenvchan)
 LDLIBS_libxenlight = $(SHDEPS_libxenlight) 
$(XEN_XENLIGHT)/libxenlight$(libextension)
 SHLIB_libxenlight  = $(SHDEPS_libxenlight) -Wl,-rpath-link=$(XEN_XENLIGHT)
 
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 6da342e..55b0b63 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -24,6 +24,7 @@ LIBXL_LIBS = $(LDLIBS_libxentoollog) $(LDLIBS_libxenevtchn) 
$(LDLIBS_libxenctrl)
 ifeq ($(CONFIG_LIBNL),y)
 LIBXL_LIBS += $(LIBNL3_LIBS)
 endif
+LIBXL_LIBS += $(LDLIBS_libxenvchan)
 
 CFLAGS_LIBXL += $(CFLAGS_libxentoollog)
 CFLAGS_LIBXL += $(CFLAGS_libxentoolcore)
@@ -35,6 +36,7 @@ CFLAGS_LIBXL += $(CFLAGS_libblktapctl)
 ifeq ($(CONFIG_LIBNL),y)
 CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
 endif
+CFLAGS_LIBXL += $(CFLAGS_libxenvchan)
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 6cfc256..b9a53f3 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1180,7 +1180,7 @@ static int libxl__build_device_model_args_new(libxl__gc 
*gc,
   "-xen-domid",
   GCSPRINTF("%d", guest_domid), NULL);
 
-/* There is currently no way to access the QMP socket in the stubdom */
+/* QMP access to qemu running in stubdomain is done over vchan, not local 
socket */
 if (!is_stubdom) {
 flexarray_append(dm_args, "-chardev");
 if (state->dm_monitor_fd >= 0) {
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 0095835..9a903a5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -509,6 +510,12 @@ struct libxl__ev_qmp {
 libxl__carefd *cfd;
 libxl__ev_fd efd;
 libxl__qmp_state state;
+/* pipe to wake itself if there is more data in vchan */
+libxl__carefd *pipe_cfd_read;
+libxl__carefd *pipe_cfd_write;
+libxl__ev_fd pipe_efd;
+struct libxenvchan *vchan;
+libxl__xswait_state xswait;
 int id;
 int next_id;/* next id to use */
 /* receive buffer */
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index a235095..45b9f74 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -1466,6 +1466,14 @@ static void dm_state_saved(libxl__egc *egc, 
libxl__ev_qmp *ev,
 
 /* prototypes */
 
+static i

[Xen-devel] [PATCH v3 16/17] libxl: add locking for libvchan QMP connection

2019-01-28 Thread Marek Marczykowski-Górecki
It is not safe for multiple clients to (even try to) connect to the same
vchan server at the same time. Contrary to QMP over local socket,
connection over vchan needs external locking. For now use flock() for
this. This is not ideal for async QMP API, as flock() will block the
whole thread while other thread/process talks to the same QEMU instance.
This may be a problem especially in cause of malicious QEMU, that could
stall the communication. But since libxl doesn't have asynchronous
locking primitives, keep flock() until something better can be used
instead.

Note that qemu will handle only one client at a time anyway, so this
does not introduce artificial limit here.

Signed-off-by: Marek Marczykowski-Górecki 
---
 tools/libxl/libxl_internal.h |  1 +-
 tools/libxl/libxl_qmp.c  | 58 +++--
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9a903a5..36b38fd 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -515,6 +515,7 @@ struct libxl__ev_qmp {
 libxl__carefd *pipe_cfd_write;
 libxl__ev_fd pipe_efd;
 struct libxenvchan *vchan;
+libxl__carefd *vchan_lock;
 libxl__xswait_state xswait;
 int id;
 int next_id;/* next id to use */
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 7643f15..260dc0a 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -114,6 +114,7 @@ typedef struct callback_id_pair {
 struct libxl__qmp_handler {
 int qmp_fd;
 struct libxenvchan *vchan;
+libxl__carefd *vchan_lock;
 bool connected;
 time_t timeout;
 /* wait_for_id will be used by the synchronous send function */
@@ -497,9 +498,10 @@ static void qmp_close(libxl__qmp_handler *qmp)
 callback_id_pair *pp = NULL;
 callback_id_pair *tmp = NULL;
 
-if (qmp->vchan)
+if (qmp->vchan) {
 libxenvchan_close(qmp->vchan);
-else
+libxl__carefd_close(qmp->vchan_lock);
+} else
 close(qmp->qmp_fd);
 LIBXL_STAILQ_FOREACH(pp, &qmp->callback_list, next) {
 free(tmp);
@@ -805,6 +807,40 @@ static void qmp_parameters_add_integer(libxl__gc *gc,
 qmp_parameters_common_add(gc, param, name, obj);
 }
 
+static libxl__carefd *qmp_vchan_lock(libxl__gc *gc, int domid)
+{
+libxl__carefd *cfd;
+char *lock_path;
+int fd, r;
+
+lock_path = GCSPRINTF("%s/qmp-libxl-%d.lock", libxl__run_dir_path(), 
domid);
+libxl__carefd_begin();
+fd = open(lock_path, O_RDWR | O_CREAT, 0644);
+cfd = libxl__carefd_opened(CTX, fd);
+if (!cfd) {
+LOGED(ERROR, domid, "QMP lock file open error");
+goto error;
+}
+
+/* Try to lock the file, retrying on EINTR */
+for (;;) {
+r = flock(fd, LOCK_EX);
+if (!r)
+break;
+if (errno != EINTR) {
+/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */
+LOGED(ERROR, domid,
+  "unexpected error while trying to lock %s, fd=%d, errno=%d",
+  lock_path, fd, errno);
+goto error;
+}
+}
+return cfd;
+error:
+libxl__carefd_close(cfd);
+return NULL;
+}
+
 #define QMP_PARAMETERS_SPRINTF(args, name, format, ...) \
 qmp_parameters_add_string(gc, args, name, GCSPRINTF(format, __VA_ARGS__))
 
@@ -825,10 +861,15 @@ libxl__qmp_handler *libxl__qmp_initialize(libxl__gc *gc, 
uint32_t domid)
 dm_domid = libxl_get_stubdom_id(CTX, domid);
 if (dm_domid) {
 qmp_path = DEVICE_MODEL_XS_PATH(gc, dm_domid, domid, "/qmp-vchan");
-/* TODO: add locking */
+qmp->vchan_lock = qmp_vchan_lock(gc, domid);
+if (!qmp->vchan_lock) {
+qmp_free_handler(qmp);
+return NULL;
+}
 qmp->vchan = libxenvchan_client_init(CTX->lg, dm_domid, qmp_path);
 if (!qmp->vchan) {
 LOGED(ERROR, domid, "QMP vchan connection failed: %s", 
strerror(errno));
+libxl__carefd_close(qmp->vchan_lock);
 qmp_free_handler(qmp);
 return NULL;
 }
@@ -1741,6 +1782,12 @@ static void qmp_vchan_watch_callback(libxl__egc *egc,
 int pipe_fd[2];
 
 if (!rc) {
+/* FIXME: convert to async locking */
+ev->vchan_lock = qmp_vchan_lock(gc, ev->domid);
+if (!ev->vchan_lock) {
+rc= ERROR_FAIL;
+goto error;
+}
 ev->vchan = libxenvchan_client_init(CTX->lg, dm_domid, 
ev->xswait.path);
 if (ev->vchan) {
 /* ok */
@@ -1775,6 +1822,8 @@ static void qmp_vchan_watch_callback(libxl__egc *egc,
 if (rc)
 goto error;
 } else if (errno == ENOENT) {
+libxl__carefd_close(ev->vchan_lock);
+ev->vchan_lock = NULL;
 /* not ready ye

[Xen-devel] [PATCH v3 17/17] libxl: require qemu in dom0 even if stubdomain is in use

2019-01-28 Thread Marek Marczykowski-Górecki
Until xenconsoled learns how to handle multiple consoles, this is needed
for save/restore support (qemu state is transferred over secondary
consoles).
Additionally, Linux-based stubdomain waits for all the backends to
initialize during boot. Lack of some console backends results in
stubdomain startup timeout.

This is a temporary patch until xenconsoled will be improved.

Signed-off-by: Marek Marczykowski-Górecki 
---
 tools/libxl/libxl_dm.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index ce65321..0fdc2f8 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -2438,7 +2438,11 @@ static void spawn_stub_launch_dm(libxl__egc *egc,
 }
 }
 
-need_qemu = libxl__need_xenpv_qemu(gc, dm_config);
+/*
+ * Until xenconsoled learns how to handle multiple consoles, require qemu
+ * in dom0, since the above code always adds at least 2 consoles.
+ */
+need_qemu = 1;
 
 for (i = 0; i < num_console; i++) {
 libxl__device device;
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 04/17] libxl: Allow running qemu-xen in stubdomain

2019-01-28 Thread Marek Marczykowski-Górecki
Do not prohibit anymore using stubdomain with qemu-xen.
To help distingushing MiniOS and Linux stubdomain, add helper inline
functions libxl__stubdomain_is_linux() and
libxl__stubdomain_is_linux_running(). Those should be used where really
the difference is about MiniOS/Linux, not qemu-xen/qemu-xen-traditional.

Signed-off-by: Marek Marczykowski-Górecki 

---
Changes in v3:
 - new patch, instead of "libxl: Add "stubdomain_version" to
 domain_build_info"
 - helper functions as suggested by Ian Jackson
---
 tools/libxl/libxl_create.c   |  9 -
 tools/libxl/libxl_internal.h | 17 +
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index a4e74a5..bb62542 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -160,15 +160,6 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 }
 }
 
-if (b_info->type == LIBXL_DOMAIN_TYPE_HVM &&
-b_info->device_model_version !=
-LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
-libxl_defbool_val(b_info->device_model_stubdomain)) {
-LOG(ERROR,
-"device model stubdomains require \"qemu-xen-traditional\"");
-return ERROR_INVAL;
-}
-
 if (!b_info->max_vcpus)
 b_info->max_vcpus = 1;
 if (!b_info->avail_vcpus.size) {
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 459f9bf..b8c698a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2195,6 +2195,23 @@ _hidden int 
libxl__device_model_version_running(libxl__gc *gc, uint32_t domid);
   /* Return the system-wide default device model */
 _hidden libxl_device_model_version libxl__default_device_model(libxl__gc *gc);
 
+static inline
+bool libxl__stubdomain_is_linux_running(libxl__gc *gc, uint32_t domid)
+{
+/* same logic as in libxl__stubdomain_is_linux */
+return libxl__device_model_version_running(gc, domid)
+== LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
+}
+
+static inline
+bool libxl__stubdomain_is_linux(libxl_domain_build_info *b_info)
+{
+/* right now qemu-tranditional implies MiniOS stubdomain and qemu-xen
+ * implies Linux stubdomain */
+return libxl_defbool_val(b_info->device_model_stubdomain) &&
+b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
+}
+
 #define DEVICE_MODEL_XS_PATH(gc, dm_domid, domid, fmt, _a...)  \
 libxl__sprintf(gc, "/local/domain/%u/device-model/%u" fmt, dm_domid,   \
domid, ##_a)
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 8/9] xen/gntdev.c: Convert to use vm_map_pages()

2019-07-28 Thread Marek Marczykowski-Górecki
On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote:
> Convert to use vm_map_pages() to map range of kernel
> memory to user vma.
> 
> map->count is passed to vm_map_pages() and internal API
> verify map->count against count ( count = vma_pages(vma))
> for page array boundary overrun condition.

This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages
will:
 - use map->pages starting at vma->vm_pgoff instead of 0
 - verify map->count against vma_pages()+vma->vm_pgoff instead of just
   vma_pages().

In practice, this breaks using a single gntdev FD for mapping multiple
grants.

It looks like vm_map_pages() is not a good fit for this code and IMO it
should be reverted.

> Signed-off-by: Souptick Joarder 
> Reviewed-by: Boris Ostrovsky 
> ---
>  drivers/xen/gntdev.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 5efc5ee..5d64262 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -1084,7 +1084,7 @@ static int gntdev_mmap(struct file *flip, struct 
> vm_area_struct *vma)
>   int index = vma->vm_pgoff;
>   int count = vma_pages(vma);
>   struct gntdev_grant_map *map;
> - int i, err = -EINVAL;
> + int err = -EINVAL;
>  
>   if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED))
>   return -EINVAL;
> @@ -1145,12 +1145,9 @@ static int gntdev_mmap(struct file *flip, struct 
> vm_area_struct *vma)
>   goto out_put_map;
>  
>   if (!use_ptemod) {
> - for (i = 0; i < count; i++) {
> - err = vm_insert_page(vma, vma->vm_start + i*PAGE_SIZE,
> - map->pages[i]);
> - if (err)
> - goto out_put_map;
> - }
> + err = vm_map_pages(vma, map->pages, map->count);
> + if (err)
> + goto out_put_map;
>   } else {
>  #ifdef CONFIG_X86
>   /*

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 8/9] xen/gntdev.c: Convert to use vm_map_pages()

2019-07-29 Thread Marek Marczykowski-Górecki
On Mon, Jul 29, 2019 at 02:02:54PM +0530, Souptick Joarder wrote:
> On Mon, Jul 29, 2019 at 1:35 PM Souptick Joarder  wrote:
> >
> > On Sun, Jul 28, 2019 at 11:36 PM Marek Marczykowski-Górecki
> >  wrote:
> > >
> > > On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote:
> > > > Convert to use vm_map_pages() to map range of kernel
> > > > memory to user vma.
> > > >
> > > > map->count is passed to vm_map_pages() and internal API
> > > > verify map->count against count ( count = vma_pages(vma))
> > > > for page array boundary overrun condition.
> > >
> > > This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages
> > > will:
> > >  - use map->pages starting at vma->vm_pgoff instead of 0
> >
> > The actual code ignores vma->vm_pgoff > 0 scenario and mapped
> > the entire map->pages[i]. Why the entire map->pages[i] needs to be mapped
> > if vma->vm_pgoff > 0 (in original code) ?

vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
basically (ab)using this parameter for "which grant reference to map".

> > are you referring to set vma->vm_pgoff = 0 irrespective of value passed
> > from user space ? If yes, using vm_map_pages_zero() is an alternate
> > option.

Yes, that should work.

> > >  - verify map->count against vma_pages()+vma->vm_pgoff instead of just
> > >vma_pages().
> >
> > In original code ->
> >
> > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > index 559d4b7f807d..469dfbd6cf90 100644
> > --- a/drivers/xen/gntdev.c
> > +++ b/drivers/xen/gntdev.c
> > @@ -1084,7 +1084,7 @@ static int gntdev_mmap(struct file *flip, struct
> > vm_area_struct *vma)
> > int index = vma->vm_pgoff;
> > int count = vma_pages(vma);
> >
> > Count is user passed value.
> >
> > struct gntdev_grant_map *map;
> > - int i, err = -EINVAL;
> > + int err = -EINVAL;
> > if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED))
> > return -EINVAL;
> > @@ -1145,12 +1145,9 @@ static int gntdev_mmap(struct file *flip,
> > struct vm_area_struct *vma)
> > goto out_put_map;
> > if (!use_ptemod) {
> > - for (i = 0; i < count; i++) {
> > - err = vm_insert_page(vma, vma->vm_start + i*PAGE_SIZE,
> > - map->pages[i]);
> >
> > and when count > i , we end up with trying to map memory outside
> > boundary of map->pages[i], which was not correct.
> 
> typo.
> s/count > i / count > map->count

gntdev_find_map_index verifies it. Specifically, it looks for a map matching
both index and count.

> >
> > - if (err)
> > - goto out_put_map;
> > - }
> > + err = vm_map_pages(vma, map->pages, map->count);
> > + if (err)
> > + goto out_put_map;
> >
> > With this commit, inside __vm_map_pages(), we have addressed this scenario.
> >
> > +static int __vm_map_pages(struct vm_area_struct *vma, struct page **pages,
> > + unsigned long num, unsigned long offset)
> > +{
> > + unsigned long count = vma_pages(vma);
> > + unsigned long uaddr = vma->vm_start;
> > + int ret, i;
> > +
> > + /* Fail if the user requested offset is beyond the end of the object */
> > + if (offset > num)
> > + return -ENXIO;
> > +
> > + /* Fail if the user requested size exceeds available object size */
> > + if (count > num - offset)
> > + return -ENXIO;
> >
> > By checking count > num -offset. (considering vma->vm_pgoff != 0 as well).
> > So we will never cross the boundary of map->pages[i].
> >
> >
> > >
> > > In practice, this breaks using a single gntdev FD for mapping multiple
> > > grants.
> >
> > How ?

gntdev uses vma->vm_pgoff to select which grant entry should be mapped.
map struct returned by gntdev_find_map_index() describes just the pages
to be mapped. Specifically map->pages[0] should be mapped at
vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE.

When trying to map grant with index (aka vma->vm_pgoff) > 1,
__vm_map_pages() will refuse to map it because it will expect map->count
to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly
vma_pages(vma).

> > > It looks like vm_map_pages() is not a good fit for this code and IMO it
> > > should be reverted.
> >
> > Did you hit any issue around this code in real time ?

Yes, relevant strace output:
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[

Re: [Xen-devel] [PATCH] xen: Avoid calling device suspend/resume callbacks

2019-07-30 Thread Marek Marczykowski-Górecki
On Tue, Jul 30, 2019 at 07:55:02AM +, Jan Beulich wrote:
> On 29.07.2019 19:06, Andrew Cooper wrote:
> > On 29/07/2019 16:57, Jan Beulich wrote:
> >> On 29.07.2019 17:41, Ross Lagerwall wrote:
> >>> When suspending/resuming or migrating under Xen, there isn't much need
> >>> for suspending and resuming all the attached devices since the Xen/QEMU
> >>> should correctly maintain the hardware state. Drop these calls and
> >>> replace with more specific calls to ensure Xen frontend devices are
> >>> properly reconnected.
> >> Is this true for the general pass-through case as well? While migration
> >> may not be (fully) compatible with pass-through, iirc save/restore is.
> > 
> > What gives you this impression?
> > 
> > Migration and save/restore are *literally* the same thing, except that
> > in one case you're piping the data to/from disk, and in the other you're
> > piping it to the destination and restoring it immediately.
> > 
> > If you look at the toolstack code, it is all in terms of reading/writing
> > an fd (including libxl's API) which is either a network socket or a
> > file, as chosen by xl.
> 
> Sure. The main difference is where the restore happens (by default):
> For live migration I expect this to be on a different host, whereas
> for a non-live restore I'd expect this to be the same host. And it
> is only the "same host" case where one can assume the same physical
> piece of hardware to be available again for passing through to this
> guest. In the "different host" case restore _may_ be possible, using
> identical hardware. (And yes, in the "same host" case restore may
> also be impossible, if the hardware meanwhile has been assigned to
> another guest. But as said, I'm talking about the default case here.)
> 
> >> Would qemu restore state of physical PCI devices?
> > 
> > What state would Qemu be in a position to know about, which isn't
> > already present in Qemu's datablob?
> 
> That's a valid (rhetorical) question, but not helping to answer mine.
> 
> > What we do with graphics cards is to merge Xens logdirty bitmap, with a
> > dirty list provided by the card itself.  This needs a device-specific
> > knowledge.  In addition, there is an opaque blob of data produced by the
> > source card, which is handed to the destination card.  That also lives
> > in the stream.
> > 
> > Intel's Scalable IOV spec is attempting to rationalise this by having a
> > standard ways of getting logdirty and "internal state" information out
> > of a device, but for the moment, it requires custom device-driver
> > specific code to do anything migration related with real hardware.
> 
> Which isn't very nice, since it doesn't scale well as a model.
> 
> > As for why its safe to do like this, the best argument is that this is
> > how all other vendors do migration, including KVM.  Xen is the
> > odd-one-out using the full S3 path.
> 
> So how do "all other vendors" deal with device specific state? So
> far I was under the impression that to deal with this is precisely
> why we use the S3 logic in the kernel.

FWIW in Qubes we specifically S3 domUs with PCI devices assigned just
before host S3 - to let the driver save the state and later restore it.
AFAIR in same cases (but it might be PV, not HVM then) not doing so was
breaking host S3 altogether.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 8/9] xen/gntdev.c: Convert to use vm_map_pages()

2019-07-30 Thread Marek Marczykowski-Górecki
On Tue, Jul 30, 2019 at 10:05:42AM -0400, Boris Ostrovsky wrote:
> On 7/30/19 2:03 AM, Souptick Joarder wrote:
> > On Mon, Jul 29, 2019 at 7:06 PM Marek Marczykowski-Górecki
> >  wrote:
> >> On Mon, Jul 29, 2019 at 02:02:54PM +0530, Souptick Joarder wrote:
> >>> On Mon, Jul 29, 2019 at 1:35 PM Souptick Joarder  
> >>> wrote:
> >>>> On Sun, Jul 28, 2019 at 11:36 PM Marek Marczykowski-Górecki
> >>>>  wrote:
> >>>>> On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote:
> >>>>>> Convert to use vm_map_pages() to map range of kernel
> >>>>>> memory to user vma.
> >>>>>>
> >>>>>> map->count is passed to vm_map_pages() and internal API
> >>>>>> verify map->count against count ( count = vma_pages(vma))
> >>>>>> for page array boundary overrun condition.
> >>>>> This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages
> >>>>> will:
> >>>>>  - use map->pages starting at vma->vm_pgoff instead of 0
> >>>> The actual code ignores vma->vm_pgoff > 0 scenario and mapped
> >>>> the entire map->pages[i]. Why the entire map->pages[i] needs to be mapped
> >>>> if vma->vm_pgoff > 0 (in original code) ?
> >> vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
> >> basically (ab)using this parameter for "which grant reference to map".
> >>
> >>>> are you referring to set vma->vm_pgoff = 0 irrespective of value passed
> >>>> from user space ? If yes, using vm_map_pages_zero() is an alternate
> >>>> option.
> >> Yes, that should work.
> > I prefer to use vm_map_pages_zero() to resolve both the issues. 
> > Alternatively
> > the patch can be reverted as you suggested. Let me know you opinion and wait
> > for feedback from others.
> >
> > Boris, would you like to give any feedback ?
> 
> vm_map_pages_zero() looks good to me. Marek, does it work for you?

Yes, replacing vm_map_pages() with vm_map_pages_zero() fixes the
problem for me.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 8/9] xen/gntdev.c: Convert to use vm_map_pages()

2019-07-30 Thread Marek Marczykowski-Górecki
On Tue, Jul 30, 2019 at 08:22:02PM +0530, Souptick Joarder wrote:
> On Tue, Jul 30, 2019 at 7:52 PM Marek Marczykowski-Górecki
>  wrote:
> >
> > On Tue, Jul 30, 2019 at 10:05:42AM -0400, Boris Ostrovsky wrote:
> > > On 7/30/19 2:03 AM, Souptick Joarder wrote:
> > > > On Mon, Jul 29, 2019 at 7:06 PM Marek Marczykowski-Górecki
> > > >  wrote:
> > > >> On Mon, Jul 29, 2019 at 02:02:54PM +0530, Souptick Joarder wrote:
> > > >>> On Mon, Jul 29, 2019 at 1:35 PM Souptick Joarder 
> > > >>>  wrote:
> > > >>>> On Sun, Jul 28, 2019 at 11:36 PM Marek Marczykowski-Górecki
> > > >>>>  wrote:
> > > >>>>> On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote:
> > > >>>>>> Convert to use vm_map_pages() to map range of kernel
> > > >>>>>> memory to user vma.
> > > >>>>>>
> > > >>>>>> map->count is passed to vm_map_pages() and internal API
> > > >>>>>> verify map->count against count ( count = vma_pages(vma))
> > > >>>>>> for page array boundary overrun condition.
> > > >>>>> This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages
> > > >>>>> will:
> > > >>>>>  - use map->pages starting at vma->vm_pgoff instead of 0
> > > >>>> The actual code ignores vma->vm_pgoff > 0 scenario and mapped
> > > >>>> the entire map->pages[i]. Why the entire map->pages[i] needs to be 
> > > >>>> mapped
> > > >>>> if vma->vm_pgoff > 0 (in original code) ?
> > > >> vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
> > > >> basically (ab)using this parameter for "which grant reference to map".
> > > >>
> > > >>>> are you referring to set vma->vm_pgoff = 0 irrespective of value 
> > > >>>> passed
> > > >>>> from user space ? If yes, using vm_map_pages_zero() is an alternate
> > > >>>> option.
> > > >> Yes, that should work.
> > > > I prefer to use vm_map_pages_zero() to resolve both the issues. 
> > > > Alternatively
> > > > the patch can be reverted as you suggested. Let me know you opinion and 
> > > > wait
> > > > for feedback from others.
> > > >
> > > > Boris, would you like to give any feedback ?
> > >
> > > vm_map_pages_zero() looks good to me. Marek, does it work for you?
> >
> > Yes, replacing vm_map_pages() with vm_map_pages_zero() fixes the
> > problem for me.
> 
> Marek, I can send a patch for the same if you are ok.
> We need to cc stable as this changes are available in 5.2.4.

Sounds good, thanks!

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 5/6] xen/x86: add PHYSDEVOP_msi_control

2019-08-05 Thread Marek Marczykowski-Górecki
On Fri, Jul 19, 2019 at 09:43:26AM +, Jan Beulich wrote:
> On 19.07.2019 11:02, Roger Pau Monné  wrote:
> > On Fri, Jul 19, 2019 at 08:04:45AM +, Jan Beulich wrote:
> >> On 18.07.2019 18:52, Roger Pau Monné  wrote:
> >>> On Thu, Jul 18, 2019 at 03:17:27PM +, Jan Beulich wrote:
> >>>> On 18.07.2019 15:46, Roger Pau Monné  wrote:
> >>>>> In fact I don't think INTx should be enabled when MSI(-X) is disabled,
> >>>>> QEMU already traps writes to the command register, and it will manage
> >>>>> INTx enabling/disabling by itself. I think the only check required is
> >>>>> that MSI(-X) cannot be enabled if INTx is also enabled. In the same
> >>>>> way both MSI caspabilities cannot be enabled simultaneously. The
> >>>>> function should not explicitly disable any of the other capabilities,
> >>>>> and just return -EBUSY if the caller attempts for example to enable
> >>>>> MSI while INTx or MSI-X is enabled.
> >>>>
> >>>> You do realize that pci_intx() only ever gets called for Xen
> >>>> internally used interrupts, i.e. mainly the serial console one?
> >>>
> >>> You will have to bear with me because I'm not sure I understand why
> >>> it does matter. Do you mean to point out that dom0 is the one in full
> >>> control of INTx, and thus Xen shouldn't care of whether INTx and
> >>> MSI(-X) are enabled at the same time?
> >>>
> >>> I still think that at least a warning should be printed if a caller
> >>> tries to enable MSI(-X) while INTx is also enabled, but unless there's
> >>> a reason to have both MSI(-X) and INTx enabled at the same time (maybe
> >>> a quirk for some hardware issue?) it shouldn't be allowed on this new
> >>> interface.
> >>
> >> I don't mind improvements to the current situation (i.e. such a
> >> warning may indeed make sense); I merely stated how things currently
> >> are. INTx treatment was completely left aside when MSI support was
> >> introduced into Xen.
> > 
> > In order to give Marek a more concise reply, would you agree to return
> > -EBUSY (or some error code) and print a warning message if the caller
> > attempts to enable MSI(-X) while INTx is also enabled?
> 
> As to returning an error - I think so, yes. I'm less sure about logging
> a message.

I'm trying to get it working and it isn't clear to me what should I
check for "INTx is also enabled". I assumed PCI_COMMAND_INTX_DISABLE
bit, but it looks like guest has no control over this bit, even in
permissive mode.  This means enabling MSI(-X) always fails because guest
has no way to set PCI_COMMAND_INTX_DISABLE bit before.

Should I check something different? Or change back to disabling/enabling
INTx as part of msi_control call? Or maybe allow guest/qemu to control
PCI_COMMAND_INTX_DISABLE bit? In that case, I'd have very similar
problem as with MSI - xen-pciback doesn't allow that, so I'd either need
to patch pciback (and I could move this whole patch into Linux
instead), or get around with a hypercall.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] preparations for 4.12.1

2019-08-05 Thread Marek Marczykowski-Górecki
On Mon, Aug 05, 2019 at 04:19:03PM +0100, Ian Jackson wrote:
> > >> "libxl: fix pci device re-assigning after domain reboot":
> > >> commit  c19434d9284e93e6f9aaec9a70f5f361adbfaba6
> 
> Thanks all.  I pushed this to staging-4.12 today.
> 
> There are a couple of other things that I say in git log that might
> warrant a backport, and for which I'm hoping I get replies quickly.

I'd like to propose "video: fix handling framebuffer located above 4GB"
9cf11fdcd91ff8e9cd038f8336cf21f0701e8b7b (see commit message about a
little change needed during backport).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] python: Adjust xc_physinfo wrapper for updated virt_caps bits

2019-08-05 Thread Marek Marczykowski-Górecki
On Mon, Aug 05, 2019 at 03:56:30PM +0100, Ian Jackson wrote:
> Ian Jackson writes ("Re: [PATCH v2] python: Adjust xc_physinfo wrapper for 
> updated virt_caps bits"):
> > Marek Marczykowski-Górecki writes ("[PATCH v2] python: Adjust xc_physinfo 
> > wrapper for updated virt_caps bits"):
> > > Commit f089fddd94 "xen: report PV capability in sysctl and use it in
> > > toolstack" changed meaning of virt_caps bit 1 - previously it was
> > > "directio", but was changed to "pv" and "directio" was moved to bit 2.
> > > Adjust python wrapper to use #defines for the bits values, and add
> > > reporting of both "pv_directio" and "hvm_directio".
> > > 
> > > Signed-off-by: Marek Marczykowski-Górecki 
> > > 
> > 
> > Acked-by: Ian Jackson 
> 
> I think this is a backport candidate for 4.12, since f089fddd94 was in
> 4.12.  Am I right ?

Yes, definitely.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 5/6] xen/x86: add PHYSDEVOP_msi_control

2019-08-06 Thread Marek Marczykowski-Górecki
On Tue, Aug 06, 2019 at 07:56:39AM +, Jan Beulich wrote:
> On 05.08.2019 15:44, Marek Marczykowski-Górecki  wrote:
> > On Fri, Jul 19, 2019 at 09:43:26AM +, Jan Beulich wrote:
> >> On 19.07.2019 11:02, Roger Pau Monné  wrote:
> >>> On Fri, Jul 19, 2019 at 08:04:45AM +, Jan Beulich wrote:
> >>>> On 18.07.2019 18:52, Roger Pau Monné  wrote:
> >>>>> On Thu, Jul 18, 2019 at 03:17:27PM +, Jan Beulich wrote:
> >>>>>> On 18.07.2019 15:46, Roger Pau Monné  wrote:
> >>>>>>> In fact I don't think INTx should be enabled when MSI(-X) is disabled,
> >>>>>>> QEMU already traps writes to the command register, and it will manage
> >>>>>>> INTx enabling/disabling by itself. I think the only check required is
> >>>>>>> that MSI(-X) cannot be enabled if INTx is also enabled. In the same
> >>>>>>> way both MSI caspabilities cannot be enabled simultaneously. The
> >>>>>>> function should not explicitly disable any of the other capabilities,
> >>>>>>> and just return -EBUSY if the caller attempts for example to enable
> >>>>>>> MSI while INTx or MSI-X is enabled.
> >>>>>>
> >>>>>> You do realize that pci_intx() only ever gets called for Xen
> >>>>>> internally used interrupts, i.e. mainly the serial console one?
> >>>>>
> >>>>> You will have to bear with me because I'm not sure I understand why
> >>>>> it does matter. Do you mean to point out that dom0 is the one in full
> >>>>> control of INTx, and thus Xen shouldn't care of whether INTx and
> >>>>> MSI(-X) are enabled at the same time?
> >>>>>
> >>>>> I still think that at least a warning should be printed if a caller
> >>>>> tries to enable MSI(-X) while INTx is also enabled, but unless there's
> >>>>> a reason to have both MSI(-X) and INTx enabled at the same time (maybe
> >>>>> a quirk for some hardware issue?) it shouldn't be allowed on this new
> >>>>> interface.
> >>>>
> >>>> I don't mind improvements to the current situation (i.e. such a
> >>>> warning may indeed make sense); I merely stated how things currently
> >>>> are. INTx treatment was completely left aside when MSI support was
> >>>> introduced into Xen.
> >>>
> >>> In order to give Marek a more concise reply, would you agree to return
> >>> -EBUSY (or some error code) and print a warning message if the caller
> >>> attempts to enable MSI(-X) while INTx is also enabled?
> >>
> >> As to returning an error - I think so, yes. I'm less sure about logging
> >> a message.
> > 
> > I'm trying to get it working and it isn't clear to me what should I
> > check for "INTx is also enabled". I assumed PCI_COMMAND_INTX_DISABLE
> > bit, but it looks like guest has no control over this bit, even in
> > permissive mode.  This means enabling MSI(-X) always fails because guest
> > has no way to set PCI_COMMAND_INTX_DISABLE bit before.
> 
> Well, the guest has no control, but in order to enable MSI{,-X} I'd
> have expected qemu or the Dom0 kernel to set this bit up front. 

qemu would do that, when running in dom0. But in PV stubdomain it talks
to pciback, which filters it out.

> If
> that's not the case, then of course neither checking nor logging a
> message is appropriate at this point in time. It may be worthwhile
> calling out this anomaly then in the description.

Ok, so I'll go back to setting PCI_COMMAND_INTX_DISABLE instead of just
verification.

Just to clarify: should I also clear PCI_COMMAND_INTX_DISABLE when
disabling MSI? Now I think yes, because nothing else would do that
otherwise, but I would like to double check.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 5/6] xen/x86: add PHYSDEVOP_msi_control

2019-08-06 Thread Marek Marczykowski-Górecki
On Tue, Aug 06, 2019 at 12:33:39PM +0200, Jan Beulich wrote:
> On 06.08.2019 11:46, Marek Marczykowski-Górecki  wrote:
> > On Tue, Aug 06, 2019 at 07:56:39AM +, Jan Beulich wrote:
> > > On 05.08.2019 15:44, Marek Marczykowski-Górecki  wrote:
> > > > On Fri, Jul 19, 2019 at 09:43:26AM +, Jan Beulich wrote:
> > > > > On 19.07.2019 11:02, Roger Pau Monné  wrote:
> > > > > > On Fri, Jul 19, 2019 at 08:04:45AM +, Jan Beulich wrote:
> > > > > > > On 18.07.2019 18:52, Roger Pau Monné  wrote:
> > > > > > > > On Thu, Jul 18, 2019 at 03:17:27PM +, Jan Beulich wrote:
> > > > > > > > > On 18.07.2019 15:46, Roger Pau Monné  wrote:
> > > > > > > > > > In fact I don't think INTx should be enabled when MSI(-X) 
> > > > > > > > > > is disabled,
> > > > > > > > > > QEMU already traps writes to the command register, and it 
> > > > > > > > > > will manage
> > > > > > > > > > INTx enabling/disabling by itself. I think the only check 
> > > > > > > > > > required is
> > > > > > > > > > that MSI(-X) cannot be enabled if INTx is also enabled. In 
> > > > > > > > > > the same
> > > > > > > > > > way both MSI caspabilities cannot be enabled 
> > > > > > > > > > simultaneously. The
> > > > > > > > > > function should not explicitly disable any of the other 
> > > > > > > > > > capabilities,
> > > > > > > > > > and just return -EBUSY if the caller attempts for example 
> > > > > > > > > > to enable
> > > > > > > > > > MSI while INTx or MSI-X is enabled.
> > > > > > > > > 
> > > > > > > > > You do realize that pci_intx() only ever gets called for Xen
> > > > > > > > > internally used interrupts, i.e. mainly the serial console 
> > > > > > > > > one?
> > > > > > > > 
> > > > > > > > You will have to bear with me because I'm not sure I understand 
> > > > > > > > why
> > > > > > > > it does matter. Do you mean to point out that dom0 is the one 
> > > > > > > > in full
> > > > > > > > control of INTx, and thus Xen shouldn't care of whether INTx and
> > > > > > > > MSI(-X) are enabled at the same time?
> > > > > > > > 
> > > > > > > > I still think that at least a warning should be printed if a 
> > > > > > > > caller
> > > > > > > > tries to enable MSI(-X) while INTx is also enabled, but unless 
> > > > > > > > there's
> > > > > > > > a reason to have both MSI(-X) and INTx enabled at the same time 
> > > > > > > > (maybe
> > > > > > > > a quirk for some hardware issue?) it shouldn't be allowed on 
> > > > > > > > this new
> > > > > > > > interface.
> > > > > > > 
> > > > > > > I don't mind improvements to the current situation (i.e. such a
> > > > > > > warning may indeed make sense); I merely stated how things 
> > > > > > > currently
> > > > > > > are. INTx treatment was completely left aside when MSI support was
> > > > > > > introduced into Xen.
> > > > > > 
> > > > > > In order to give Marek a more concise reply, would you agree to 
> > > > > > return
> > > > > > -EBUSY (or some error code) and print a warning message if the 
> > > > > > caller
> > > > > > attempts to enable MSI(-X) while INTx is also enabled?
> > > > > 
> > > > > As to returning an error - I think so, yes. I'm less sure about 
> > > > > logging
> > > > > a message.
> > > > 
> > > > I'm trying to get it working and it isn't clear to me what should I
> > > > check for "INTx is also enabled". I assumed PCI_COMMAND_INTX_DISABLE
> > > > bit, but it looks like guest has no control over this bit, even in
> > > > permissive mode.  This means enabling MSI(-X) always fails because guest
> > > > has no way to set PCI_COMMAND_INTX_DISABLE bit before.
> > > 
> > > Well, the guest has no control, but in order to enable MSI{,-X} I'd
> > > have expected qemu or the Dom0 kernel to set this bit up front.
> > 
> > qemu would do that, when running in dom0. But in PV stubdomain it talks
> > to pciback, which filters it out.
> 
> Filtering out the guest attempt is fine, but it should then set the
> bit while preparing MSI/MSI-X for the guest. (I'm not sure it would
> need to do so explicitly, or whether it couldn't rely on code
> elsewhere in the kernel doing so.)
...
> Well, I think I've made my position clear: So far we use pci_intx()
> only on devices used by Xen itself. I think this should remain to be
> that way. Devices in possession of domains (including Dom0) should
> be under Dom0's control in this regard.

The thing is, in case of using stubdomain, it's mostly stubdomain
handling it. Especially all the config space interception and applying
logic to it is done by qemu in stubdomain. Do you suggest duplicating /
moving that part to dom0 instead? What would be the point for stubdomain
then?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 5/6] xen/x86: add PHYSDEVOP_msi_control

2019-08-06 Thread Marek Marczykowski-Górecki
On Tue, Aug 06, 2019 at 02:05:48PM +0200, Jan Beulich wrote:
> On 06.08.2019 12:53, Marek Marczykowski-Górecki  wrote:
> > On Tue, Aug 06, 2019 at 12:33:39PM +0200, Jan Beulich wrote:
> > > On 06.08.2019 11:46, Marek Marczykowski-Górecki  wrote:
> > > > On Tue, Aug 06, 2019 at 07:56:39AM +, Jan Beulich wrote:
> > > > > On 05.08.2019 15:44, Marek Marczykowski-Górecki  wrote:
> > > > > > I'm trying to get it working and it isn't clear to me what should I
> > > > > > check for "INTx is also enabled". I assumed PCI_COMMAND_INTX_DISABLE
> > > > > > bit, but it looks like guest has no control over this bit, even in
> > > > > > permissive mode.  This means enabling MSI(-X) always fails because 
> > > > > > guest
> > > > > > has no way to set PCI_COMMAND_INTX_DISABLE bit before.
> > > > > 
> > > > > Well, the guest has no control, but in order to enable MSI{,-X} I'd
> > > > > have expected qemu or the Dom0 kernel to set this bit up front.
> > > > 
> > > > qemu would do that, when running in dom0. But in PV stubdomain it talks
> > > > to pciback, which filters it out.
> > > 
> > > Filtering out the guest attempt is fine, but it should then set the
> > > bit while preparing MSI/MSI-X for the guest. (I'm not sure it would
> > > need to do so explicitly, or whether it couldn't rely on code
> > > elsewhere in the kernel doing so.)
> > ...
> > > Well, I think I've made my position clear: So far we use pci_intx()
> > > only on devices used by Xen itself. I think this should remain to be
> > > that way. Devices in possession of domains (including Dom0) should
> > > be under Dom0's control in this regard.
> > 
> > The thing is, in case of using stubdomain, it's mostly stubdomain
> > handling it. Especially all the config space interception and applying
> > logic to it is done by qemu in stubdomain. Do you suggest duplicating /
> > moving that part to dom0 instead? What would be the point for stubdomain
> > then?
> 
> Nothing should be moved between components if possible (and if placed
> sensibly). But isn't stubdom (being a PV DomU) relying on pciback in
> Dom0 anyway? And hence doesn't the flow of events include
> pci_enable_msi{,x}() invoked by pciback? I'd have expected that to
> (also) take care of INTx.

This was discussed in v2 of this series[1], where you also responded.
Relevant part of Roger's email:

Oh great, that's unfortunate. Both pciback functions end up calling
into msi_capability_init in the Linux kernel, which does indeed more
than just toggling the PCI config space enable bit.

OTOH adding a bypass to pciback so the stubdom is able to write to the
PCI register in order to toggle the enable bit seems quite clumsy. Not
to mention that you would be required to update Dom0 kernel in order to
fix the issue.

Do you think it makes sense to add a domctl to enable/disable MSI(X)?

This way the bug could be fixed by just updating Xen (and the
stubdomain).

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01271.html

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.12 panic on Thinkpad W540 with UEFI mutiboot2, efi=no-rs workarounds it

2019-08-07 Thread Marek Marczykowski-Górecki
On Wed, Aug 07, 2019 at 05:33:05PM +0200, Jan Beulich wrote:
> On 07.08.2019 17:17, Marek Marczykowski-Górecki  wrote:
> > On Wed, Aug 07, 2019 at 04:45:43PM +0200, Jan Beulich wrote:
> > > On 07.08.2019 15:26, Marek Marczykowski-Górecki  wrote:
> > > > Hi,
> > > > 
> > > > Xen 4.12 crashes when booting on UEFI (with multiboot2) unless I disable
> > > > runtime services. The crash happens shortly after starting dom0 kernel.
> > > > Unfortunately I don't have serial console there, so the only log I have
> > > > is a photo of VGA console (attached). Below I retype part of the 
> > > > message:
> > > > 
> > > > (XEN) [ Xen-4.12.0-3.fc29  x86_64  debug=n   Not tainted ]
> > > > (XEN) CPU:0
> > > > (XEN) RIP:e008:[<00f6>] 00f6
> > > > (XEN) RFLAGS: 00010287   CONTEXT: hypervisor (d0v0)
> > > > ...
> > > > (XEN) Xen call trace:
> > > > (XEN)[<00f6>] 00f6
> > > > (XEN)[] flushtlb.c#pre_flush+0x3d/0x80
> > > > (XEN)[  ] efi_runtime_call+0x493/0xbd0
> > > > (XEN)[  ] efi_runtime_call+0x441/0xbd0
> > > > (XEN)[  ] vcpu_restore_fpu_nonlazy+0xe7/0x180
> > > > (XEN)[  ] do_platform_op+0/0x1880
> > > > (XEN)[  ] do_platform_op+0xb9c/0x1880
> > > > (XEN)[  ] do_platform_op+0xb9c/0x1880
> > > > (XEN)[  ] 
> > > > sched_credit2.c#csched2_schedule+0xcd0/0x13a0
> > > > (XEN)[  ] lstar_enter+0xae/0x120
> > > > (XEN)[  ] do_platform_op+0/0x1880
> > > > (XEN)[  ] pv_hypercall+0x152/0x220
> > > > (XEN)[  ] lstar_enter+0xae/0x120
> > > > (XEN)[  ] lstar_enter+0xa2/0x120
> > > > (XEN)[  ] lstar_enter+0xae/0x120
> > > > (XEN)[  ] lstar_enter+0xa2/0x120
> > > > (XEN)[  ] lstar_enter+0xae/0x120
> > > > (XEN)[  ] lstar_enter+0xa2/0x120
> > > > (XEN)[  ] lstar_enter+0xae/0x120
> > > > (XEN)[  ] lstar_enter+0xa2/0x120
> > > > (XEN)[  ] lstar_enter+0xae/0x120
> > > > (XEN)[  ] lstar_enter+0xa2/0x120
> > > > (XEN)[  ] lstar_enter+0xae/0x120
> > > > (XEN)[  ] lstar_enter+0x10c/0x120
> > > > (XEN)
> > > > (XEN)
> > > > (XEN) *
> > > > (XEN) Panic on CPU 0:
> > > > (XEN) FATAL TRAP: vector = 0 (divide error)
> > > > (XEN) [error_code=]
> > > > (XEN) *
> > > > 
> > > > Any idea? Lack of serial console doesn't help with collecting logs...
> > > 
> > > May I suggest that you repeat this with a debug hypervisor, such that
> > > the call trace becomes more usable/sensible?
> > 
> > Actually, I've already tried, but every other build I try, I get even
> > less useful call trace, for example debug unstable build:
> > 
> >  Xen call trace:
> > [<7b751c09>] 00007b751c09
> > [<8c2b0398edaa>] 8c2b0398edaa
> > ...
> >  GENERAL PROTECTION FAULT
> >  [error_code=]
> 
> But this makes reasonable sense: The faulting insn is "call *0x18(%rax)"
> and %rax is 6954b2b0, which points into a block of type
> EfiBootServicesData (and hence likely reused by the time of the crash
> for other purposes). Hence the "/mapbs" option of xen.efi might help
> here too, but iirc there's no equivalent for it in the MB2 case.

Oh, yes, indeed in Qubes we have /mapbs enabled by default with xen.efi.
I'd like to add it to MB2 case too. Is command line parsed at this point
(efi_exit_boot()) already? 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.12 panic on Thinkpad W540 with UEFI mutiboot2, efi=no-rs workarounds it

2019-08-07 Thread Marek Marczykowski-Górecki
On Wed, Aug 07, 2019 at 05:58:59PM +0200, Jan Beulich wrote:
> On 07.08.2019 17:51, Marek Marczykowski-Górecki  wrote:
> > On Wed, Aug 07, 2019 at 05:33:05PM +0200, Jan Beulich wrote:
> > > On 07.08.2019 17:17, Marek Marczykowski-Górecki  wrote:
> > > > Actually, I've already tried, but every other build I try, I get even
> > > > less useful call trace, for example debug unstable build:
> > > > 
> > > >   Xen call trace:
> > > >  [<7b751c09>] 7b751c09
> > > >  [<8c2b0398edaa>] 8c2b0398edaa
> > > > ...
> > > >   GENERAL PROTECTION FAULT
> > > >   [error_code=]
> > > 
> > > But this makes reasonable sense: The faulting insn is "call *0x18(%rax)"
> > > and %rax is 6954b2b0, which points into a block of type
> > > EfiBootServicesData (and hence likely reused by the time of the crash
> > > for other purposes). Hence the "/mapbs" option of xen.efi might help
> > > here too, but iirc there's no equivalent for it in the MB2 case.
> > 
> > Oh, yes, indeed in Qubes we have /mapbs enabled by default with xen.efi.
> > I'd like to add it to MB2 case too. Is command line parsed at this point
> > (efi_exit_boot()) already?
> 
> I'm struggling a little how to reply, considering that I've already
> said "iirc there's no equivalent for it in the MB2 case". So I guess
> I'm simply not understanding your question, or more specifically
> where you want to get with it.

/mapbs option is very specific to xen.efi. I'd like to add a means to
set map_bs variable in MB2 case too. I'm asking whether I can use
standard command line parser, or do I need something special extracting
it from MB2 structures directly.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  1   2   3   4   5   6   7   8   9   10   >