Re: [PATCH 0/2] Two sets of trivials

2022-06-15 Thread Klaus Jensen
On Jun 14 11:40, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> I've sent the 3 char set last month, but have updated
> it a little; I cleaned up a comment style that was already
> broken so checkpatch is happy.
> 
> The 'namesapce' is a new patch; it's amazing how many places
> make the same typo!
> 
> Dave
> 
> Dr. David Alan Gilbert (2):
>   Trivial: 3 char repeat typos
>   trivial typos: namesapce
> 
>  hw/9pfs/9p-xattr-user.c          | 8 ++++----
>  hw/acpi/nvdimm.c                 | 2 +-
>  hw/intc/openpic.c                | 2 +-
>  hw/net/imx_fec.c                 | 2 +-
>  hw/nvme/ctrl.c                   | 2 +-
>  hw/pci/pcie_aer.c                | 2 +-
>  hw/pci/shpc.c                    | 3 ++-
>  hw/ppc/spapr_caps.c              | 2 +-
>  hw/scsi/spapr_vscsi.c            | 2 +-
>  qapi/net.json                    | 2 +-
>  tools/virtiofsd/passthrough_ll.c | 2 +-
>  ui/input.c                       | 2 +-
>  12 files changed, 16 insertions(+), 15 deletions(-)
> 
> -- 
> 2.36.1
> 

Nice (and Thanks)!

Reviewed-by: Klaus Jensen 




Re: [PULL 00/10] Block jobs & NBD patches

2022-06-15 Thread Vladimir Sementsov-Ogievskiy

On 6/14/22 21:05, Richard Henderson wrote:

> On 6/14/22 03:29, Vladimir Sementsov-Ogievskiy wrote:
> > The following changes since commit debd0753663bc89c86f5462a53268f2e3f680f60:
> >
> >    Merge tag 'pull-testing-next-140622-1' of https://github.com/stsquad/qemu into staging (2022-06-13 21:10:57 -0700)
> >
> > are available in the Git repository at:
> >
> >    https://gitlab.com/vsementsov/qemu.git tags/pull-block-2022-06-14
> >
> > for you to fetch changes up to 5aef6747a250f545ff53ba7e1a3ed7a3d166011a:
> >
> >    MAINTAINERS: update Vladimir's address and repositories (2022-06-14 12:51:48 +0300)
> >
> > Block jobs & NBD patches
> >
> > - add new options for copy-before-write filter
> > - new trace points for NBD
> > - prefer unsigned type for some 'in_flight' fields
> > - update my addresses in MAINTAINERS (already in Stefan's tree, but
> >    I think it's OK to send it with this PULL)
> >
> > Note also, that I've recently updated my pgp key with new address and
> > new expire time.
> > Updated key is here:
> > https://keys.openpgp.org/search?q=vsementsov%40yandex-team.ru
>
> This introduces or exposes new timeouts:
>
> https://gitlab.com/qemu-project/qemu/-/pipelines/563590515/failures



It is not obvious from the logs which iotest hangs. But after excluding the
iotests that passed, it becomes clear that the problem is in the
copy-before-write iotest, which is added and then updated in this series.

Most probably it is the new timeout feature (patches 04-07) that doesn't
work. It still works for me locally. I'd be glad if someone could look
through it.

I think, for now, I'll just resend a pull request without these 4 patches.

Also, could/should I run all these test pipelines on gitlab by hand before 
sending a PULL request? Or can I rerun them on my qemu fork for debugging?


--
Best regards,
Vladimir



Re: [PATCH 2/5] tests/qemu-iotests: skip 108 when FUSE is not loaded

2022-06-15 Thread John Snow
On Tue, Jun 14, 2022 at 10:30 AM John Snow  wrote:
>
> On Tue, Jun 14, 2022 at 4:59 AM Daniel P. Berrangé  
> wrote:
> >
> > On Tue, Jun 14, 2022 at 06:46:35AM +0200, Thomas Huth wrote:
> > > On 14/06/2022 03.50, John Snow wrote:
> > > > In certain container environments we may not have FUSE at all, so skip
> > > > the test in this circumstance too.
> > > >
> > > > Signed-off-by: John Snow 
> > > > ---
> > > >   tests/qemu-iotests/108 | 6 ++
> > > >   1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
> > > > index 9e923d6a59f..e401c5e9933 100755
> > > > --- a/tests/qemu-iotests/108
> > > > +++ b/tests/qemu-iotests/108
> > > > @@ -60,6 +60,12 @@ if sudo -n losetup &>/dev/null; then
> > > >   else
> > > >   loopdev=false
> > > > +# Check for fuse support in the host environment:
> > > > +lsmod | grep fuse &>/dev/null;
> > >
> > > That doesn't work if fuse has been linked statically into the kernel. 
> > > Would
> > > it make sense to test for /sys/fs/fuse instead?
> > >
> > > (OTOH, we likely hardly won't run this on statically linked kernels 
> > > anyway,
> > > so it might not matter too much)
> >
> > But more importantly 'lsmod' may not be installed in our container
> > images. So checking /sys/fs/fuse avoids introducing a dep on the
> > 'kmod' package.
> >
> > >
> > > > +if [[ $? -ne 0 ]]; then
> > >
> > > I'd prefer single "[" instead of "[[" ... but since we're requiring bash
> > > anyway, it likely doesn't matter.
> >
> > Or
> >
> > if  test $? != 0 ; then
> >
> > >
> > > > +_notrun 'No Passwordless sudo nor FUSE kernel module'
> > > > +fi
> > > > +
> > > >   # QSD --export fuse will either yield "Parameter 'id' is missing"
> > > >   # or "Invalid parameter 'fuse'", depending on whether there is
> > > >   # FUSE support or not.
> > >
>
> Good suggestions, thanks!
>

I think I need to test against /dev/fuse instead, because /sys/fs/fuse
actually exists, but because of docker permissions (etc), FUSE isn't
actually usable from the child container.

I wound up with this:

# Check for usable FUSE in the host environment:
if test ! -c "/dev/fuse"; then
    _notrun 'No passwordless sudo nor usable /dev/fuse'
fi

Seems to work for my case here, at least, but I don't have a good
sense for how broadly flexible it might be. It might be nicer to
concoct some kind of NOP fuse mount instead, but I wasn't able to
figure out such a command quickly.
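
A slightly stricter variant, as a rough sketch only: the helper name
fuse_dev_usable is hypothetical and not part of qemu-iotests, and the extra
read/write permission test is an assumption that goes beyond the -c check
above. It still does not prove that a FUSE mount would succeed from inside
the container.

```shell
# Hypothetical helper (not part of qemu-iotests): succeed only if the given
# path is a character device that the current user may read and write.
# Defaults to /dev/fuse when no argument is given.
fuse_dev_usable() {
    local dev="${1:-/dev/fuse}"
    [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

# Report the result; in test 108 the failing branch would call _notrun.
if fuse_dev_usable; then
    echo "/dev/fuse looks usable"
else
    echo "/dev/fuse is missing or not accessible"
fi
```

Taking the device path as an argument keeps the check easy to exercise
against other nodes such as /dev/null.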

The next problem I have is actually related; test-qga (for the
Centos.x86_64 run) is failing because the guest agent is reading
/proc/self/mountinfo -- which contains entries for block devices that
are not visible in the current container scope. I think when QGA goes
to read info about these devices to populate a response, it chokes.
This might be a genuine bug in QGA if we want it to tolerate existing
inside of a container.

--js




Re: [PULL 00/10] Block jobs & NBD patches

2022-06-15 Thread Richard Henderson

On 6/15/22 02:47, Vladimir Sementsov-Ogievskiy wrote:
> Also, could/should I run all these test pipelines on gitlab by hand before sending a PULL
> request? Or can I rerun them on my qemu fork for debugging?


The first thing I'd try is make vm-build- and make 
docker-test-full@.

Either or both will reproduce the docker environment being used on gitlab.
If that fails to reproduce, it could be a difference in kernels, at which point I don't 
know how to advise.


It would be a good idea to run those test pipelines manually before the next 
PULL.


r~



Re: [PATCH 2/5] tests/qemu-iotests: skip 108 when FUSE is not loaded

2022-06-15 Thread Daniel P . Berrangé
On Wed, Jun 15, 2022 at 09:41:32AM -0400, John Snow wrote:
> On Tue, Jun 14, 2022 at 10:30 AM John Snow  wrote:
> >
> > On Tue, Jun 14, 2022 at 4:59 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Tue, Jun 14, 2022 at 06:46:35AM +0200, Thomas Huth wrote:
> > > > On 14/06/2022 03.50, John Snow wrote:
> > > > > In certain container environments we may not have FUSE at all, so skip
> > > > > the test in this circumstance too.
> > > > >
> > > > > Signed-off-by: John Snow 
> > > > > ---
> > > > >   tests/qemu-iotests/108 | 6 ++
> > > > >   1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
> > > > > index 9e923d6a59f..e401c5e9933 100755
> > > > > --- a/tests/qemu-iotests/108
> > > > > +++ b/tests/qemu-iotests/108
> > > > > @@ -60,6 +60,12 @@ if sudo -n losetup &>/dev/null; then
> > > > >   else
> > > > >   loopdev=false
> > > > > +# Check for fuse support in the host environment:
> > > > > +lsmod | grep fuse &>/dev/null;
> > > >
> > > > That doesn't work if fuse has been linked statically into the kernel. 
> > > > Would
> > > > it make sense to test for /sys/fs/fuse instead?
> > > >
> > > > (OTOH, we likely hardly won't run this on statically linked kernels 
> > > > anyway,
> > > > so it might not matter too much)
> > >
> > > But more importantly 'lsmod' may not be installed in our container
> > > images. So checking /sys/fs/fuse avoids introducing a dep on the
> > > 'kmod' package.
> > >
> > > >
> > > > > +if [[ $? -ne 0 ]]; then
> > > >
> > > > I'd prefer single "[" instead of "[[" ... but since we're requiring bash
> > > > anyway, it likely doesn't matter.
> > >
> > > Or
> > >
> > > if  test $? != 0 ; then
> > >
> > > >
> > > > > +_notrun 'No Passwordless sudo nor FUSE kernel module'
> > > > > +fi
> > > > > +
> > > > >   # QSD --export fuse will either yield "Parameter 'id' is 
> > > > > missing"
> > > > >   # or "Invalid parameter 'fuse'", depending on whether there is
> > > > >   # FUSE support or not.
> > > >
> >
> > Good suggestions, thanks!
> >
> 
> I think I need to test against /dev/fuse instead, because /sys/fs/fuse
> actually exists, but because of docker permissions (etc), FUSE isn't
> actually usable from the child container.
> 
> I wound up with this:
> 
> # Check for usable FUSE in the host environment:
> if test ! -c "/dev/fuse"; then
> _notrun 'No passwordless sudo nor usable /dev/fuse'
> fi
> 
> Seems to work for my case here, at least, but I don't have a good
> sense for how broadly flexible it might be. It might be nicer to
> concoct some kind of NOP fuse mount instead, but I wasn't able to
> figure out such a command quickly.
> 
> The next problem I have is actually related; test-qga (for the
> Centos.x86_64 run) is failing because the guest agent is reading
> /proc/self/mountinfo -- which contains entries for block devices that
> are not visible in the current container scope. I think when QGA goes
> to read info about these devices to populate a response, it chokes.
> This might be a genuine bug in QGA if we want it to tolerate existing
> inside of a container.

Yes, we should fix this. Even if you don't run QGA in a container,
someone might configure the systemd service to harden it, by
restricting what /dev it is able to see and thus trigger the
same issue.


With regards,
Daniel
-- 
|: https://berrange.com       -o-  https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org        -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org -o-  https://www.instagram.com/dberrange     :|




Re: [PATCH 2/5] tests/qemu-iotests: skip 108 when FUSE is not loaded

2022-06-15 Thread John Snow
On Wed, Jun 15, 2022 at 11:33 AM Daniel P. Berrangé  wrote:
>
> On Wed, Jun 15, 2022 at 09:41:32AM -0400, John Snow wrote:
> > On Tue, Jun 14, 2022 at 10:30 AM John Snow  wrote:
> > >
> > > On Tue, Jun 14, 2022 at 4:59 AM Daniel P. Berrangé  
> > > wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 06:46:35AM +0200, Thomas Huth wrote:
> > > > > On 14/06/2022 03.50, John Snow wrote:
> > > > > > In certain container environments we may not have FUSE at all, so 
> > > > > > skip
> > > > > > the test in this circumstance too.
> > > > > >
> > > > > > Signed-off-by: John Snow 
> > > > > > ---
> > > > > >   tests/qemu-iotests/108 | 6 ++
> > > > > >   1 file changed, 6 insertions(+)
> > > > > >
> > > > > > diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
> > > > > > index 9e923d6a59f..e401c5e9933 100755
> > > > > > --- a/tests/qemu-iotests/108
> > > > > > +++ b/tests/qemu-iotests/108
> > > > > > @@ -60,6 +60,12 @@ if sudo -n losetup &>/dev/null; then
> > > > > >   else
> > > > > >   loopdev=false
> > > > > > +# Check for fuse support in the host environment:
> > > > > > +lsmod | grep fuse &>/dev/null;
> > > > >
> > > > > That doesn't work if fuse has been linked statically into the kernel. 
> > > > > Would
> > > > > it make sense to test for /sys/fs/fuse instead?
> > > > >
> > > > > (OTOH, we likely hardly won't run this on statically linked kernels 
> > > > > anyway,
> > > > > so it might not matter too much)
> > > >
> > > > But more importantly 'lsmod' may not be installed in our container
> > > > images. So checking /sys/fs/fuse avoids introducing a dep on the
> > > > 'kmod' package.
> > > >
> > > > >
> > > > > > +if [[ $? -ne 0 ]]; then
> > > > >
> > > > > I'd prefer single "[" instead of "[[" ... but since we're requiring 
> > > > > bash
> > > > > anyway, it likely doesn't matter.
> > > >
> > > > Or
> > > >
> > > > if  test $? != 0 ; then
> > > >
> > > > >
> > > > > > +_notrun 'No Passwordless sudo nor FUSE kernel module'
> > > > > > +fi
> > > > > > +
> > > > > >   # QSD --export fuse will either yield "Parameter 'id' is 
> > > > > > missing"
> > > > > >   # or "Invalid parameter 'fuse'", depending on whether there is
> > > > > >   # FUSE support or not.
> > > > >
> > >
> > > Good suggestions, thanks!
> > >
> >
> > I think I need to test against /dev/fuse instead, because /sys/fs/fuse
> > actually exists, but because of docker permissions (etc), FUSE isn't
> > actually usable from the child container.
> >
> > I wound up with this:
> >
> > # Check for usable FUSE in the host environment:
> > if test ! -c "/dev/fuse"; then
> > _notrun 'No passwordless sudo nor usable /dev/fuse'
> > fi
> >
> > Seems to work for my case here, at least, but I don't have a good
> > sense for how broadly flexible it might be. It might be nicer to
> > concoct some kind of NOP fuse mount instead, but I wasn't able to
> > figure out such a command quickly.
> >
> > The next problem I have is actually related; test-qga (for the
> > Centos.x86_64 run) is failing because the guest agent is reading
> > /proc/self/mountinfo -- which contains entries for block devices that
> > are not visible in the current container scope. I think when QGA goes
> > to read info about these devices to populate a response, it chokes.
> > This might be a genuine bug in QGA if we want it to tolerate existing
> > inside of a container.
>
> Yes, we should fix this. Even if you don't run QGA in a container,
> someone might configure the systemd service to harden it, by
> restricting what /dev it is able to see and thus trigger the
> same issue.

Naive solution: if we try to look in /sys/dev/block/%u:%u and find
that we are unable to do so for whatever reason (ENOENT et al), just
skip that entry for the fsinfo returned to the caller.
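
To make the idea concrete, here is a shell illustration of the skip logic
only. The real change would live in the guest agent's C code; the function
name visible_mount_entries and its two optional arguments are hypothetical,
chosen so the filter can be pointed at test fixtures instead of the live
/proc and /sys trees.

```shell
# Print only those mountinfo entries whose block device is visible.
# $1: mountinfo-format file (defaults to /proc/self/mountinfo)
# $2: directory holding MAJOR:MINOR nodes (defaults to /sys/dev/block)
visible_mount_entries() {
    local mountinfo="${1:-/proc/self/mountinfo}"
    local sysblock="${2:-/sys/dev/block}"
    local id parent devno rest
    while read -r id parent devno rest; do
        # Field 3 of mountinfo is MAJOR:MINOR. If the matching node cannot
        # be seen from here, skip the entry instead of failing.
        if [ -e "$sysblock/$devno" ]; then
            printf '%s %s %s %s\n' "$id" "$parent" "$devno" "$rest"
        fi
    done < "$mountinfo"
}
```

Entries for devices hidden by the container or by systemd hardening simply
drop out of the result rather than aborting the whole query.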

Does it need to be fancier than that?

--js




[PULL 00/18] Block patches

2022-06-15 Thread Stefan Hajnoczi
The following changes since commit 8e6c70b9d4a1b1f3011805947925cfdb31642f7f:

  Merge tag 'kraxel-20220614-pull-request' of git://git.kraxel.org/qemu into staging (2022-06-14 06:21:46 -0700)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 99b969fbe105117f5af6060d3afef40ca39cc9c1:

  linux-aio: explain why max batch is checked in laio_io_unplug() (2022-06-15 16:43:42 +0100)


Pull request

This pull request includes an important aio=native I/O stall fix, the
experimental vfio-user server, the io_uring_register_ring_fd() optimization for
aio=io_uring, and an update to Vladimir Sementsov-Ogievskiy's maintainership
details.



Jagannathan Raman (14):
  qdev: unplug blocker for devices
  remote/machine: add HotplugHandler for remote machine
  remote/machine: add vfio-user property
  vfio-user: build library
  vfio-user: define vfio-user-server object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: IOMMU support for remote device
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: handle reset of remote device

Sam Li (1):
  Use io_uring_register_ring_fd() to skip fd operations

Stefan Hajnoczi (2):
  linux-aio: fix unbalanced plugged counter in laio_io_unplug()
  linux-aio: explain why max batch is checked in laio_io_unplug()

Vladimir Sementsov-Ogievskiy (1):
  MAINTAINERS: update Vladimir's address and repositories

 MAINTAINERS |  27 +-
 meson_options.txt   |   2 +
 qapi/misc.json  |  31 +
 qapi/qom.json   |  20 +-
 configure   |  17 +
 meson.build |  24 +-
 include/exec/memory.h   |   3 +
 include/hw/pci/msi.h|   1 +
 include/hw/pci/msix.h   |   1 +
 include/hw/pci/pci.h|  13 +
 include/hw/qdev-core.h  |  29 +
 include/hw/remote/iommu.h   |  40 +
 include/hw/remote/machine.h |   4 +
 include/hw/remote/vfio-user-obj.h   |   6 +
 block/io_uring.c|  12 +-
 block/linux-aio.c   |  10 +-
 hw/core/qdev.c  |  24 +
 hw/pci/msi.c|  49 +-
 hw/pci/msix.c   |  35 +-
 hw/pci/pci.c|  13 +
 hw/remote/iommu.c   | 131 
 hw/remote/machine.c |  88 ++-
 hw/remote/vfio-user-obj.c   | 958 
 softmmu/physmem.c   |   4 +-
 softmmu/qdev-monitor.c  |   4 +
 stubs/vfio-user-obj.c   |   6 +
 tests/qtest/fuzz/generic_fuzz.c |   9 +-
 .gitlab-ci.d/buildtest.yml  |   1 +
 .gitmodules |   3 +
 Kconfig.host|   4 +
 hw/remote/Kconfig   |   4 +
 hw/remote/meson.build   |   4 +
 hw/remote/trace-events  |  11 +
 scripts/meson-buildoptions.sh   |   4 +
 stubs/meson.build   |   1 +
 subprojects/libvfio-user|   1 +
 tests/docker/dockerfiles/centos8.docker |   2 +
 37 files changed, 1565 insertions(+), 31 deletions(-)
 create mode 100644 include/hw/remote/iommu.h
 create mode 100644 include/hw/remote/vfio-user-obj.h
 create mode 100644 hw/remote/iommu.c
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 100644 stubs/vfio-user-obj.c
 create mode 160000 subprojects/libvfio-user

-- 
2.36.1




[PULL 03/18] qdev: unplug blocker for devices

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Add blocker to prevent hot-unplug of devices

TYPE_VFIO_USER_SERVER, which is introduced shortly, attaches itself to a
PCIDevice on which it depends. If the attached PCIDevice gets removed
while the server is in use, it could cause the server to crash. To prevent
this, TYPE_VFIO_USER_SERVER adds an unplug blocker for the PCIDevice.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
c41ef80b7cc063314d629737bed2159e5713f2e0.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/qdev-core.h | 29 +
 hw/core/qdev.c | 24 
 softmmu/qdev-monitor.c |  4 
 3 files changed, 57 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 92c3d65208..98774e2835 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
 int instance_id_alias;
 int alias_required_for_version;
 ResettableState reset;
+GSList *unplug_blockers;
 };
 
 struct DeviceListener {
@@ -419,6 +420,34 @@ void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
 void qdev_machine_creation_done(void);
 bool qdev_machine_modified(void);
 
+/**
+ * qdev_add_unplug_blocker: Add an unplug blocker to a device
+ *
+ * @dev: Device to be blocked from unplug
+ * @reason: Reason for blocking
+ */
+void qdev_add_unplug_blocker(DeviceState *dev, Error *reason);
+
+/**
+ * qdev_del_unplug_blocker: Remove an unplug blocker from a device
+ *
+ * @dev: Device to be unblocked
+ * @reason: Pointer to the Error used with qdev_add_unplug_blocker.
+ *  Used as a handle to lookup the blocker for deletion.
+ */
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason);
+
+/**
+ * qdev_unplug_blocked: Confirm if a device is blocked from unplug
+ *
+ * @dev: Device to be tested
+ * @errp: Returns one of the reasons why the device is blocked,
+ *        if any
+ *
+ * Returns: true if device is blocked from unplug, false otherwise
+ */
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp);
+
 /**
  * GpioPolarity: Polarity of a GPIO line
  *
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 84f3019440..0806d8fcaa 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -468,6 +468,28 @@ char *qdev_get_dev_path(DeviceState *dev)
     return NULL;
 }
 
+void qdev_add_unplug_blocker(DeviceState *dev, Error *reason)
+{
+    dev->unplug_blockers = g_slist_prepend(dev->unplug_blockers, reason);
+}
+
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason)
+{
+    dev->unplug_blockers = g_slist_remove(dev->unplug_blockers, reason);
+}
+
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp)
+{
+    ERRP_GUARD();
+
+    if (dev->unplug_blockers) {
+        error_propagate(errp, error_copy(dev->unplug_blockers->data));
+        return true;
+    }
+
+    return false;
+}
+
 static bool device_get_realized(Object *obj, Error **errp)
 {
     DeviceState *dev = DEVICE(obj);
@@ -704,6 +726,8 @@ static void device_finalize(Object *obj)
 
 DeviceState *dev = DEVICE(obj);
 
+g_assert(!dev->unplug_blockers);
+
 QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
 QLIST_REMOVE(ngl, node);
 qemu_free_irqs(ngl->in, ngl->num_in);
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index bb5897fc76..4b0ef65780 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -899,6 +899,10 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 HotplugHandlerClass *hdc;
 Error *local_err = NULL;
 
+if (qdev_unplug_blocked(dev, errp)) {
+return;
+}
+
 if (dev->parent_bus && !qbus_is_hotpluggable(dev->parent_bus)) {
 error_setg(errp, QERR_BUS_NO_HOTPLUG, dev->parent_bus->name);
 return;
-- 
2.36.1




[PULL 02/18] Use io_uring_register_ring_fd() to skip fd operations

2022-06-15 Thread Stefan Hajnoczi
From: Sam Li 

Linux recently added a new io_uring(7) optimization API that QEMU
doesn't take advantage of yet. The liburing library that QEMU uses
has added a corresponding new API called io_uring_register_ring_fd().
When this API is called after creating the ring, the io_uring_submit()
library function passes a flag to the io_uring_enter(2) syscall
allowing it to skip the ring file descriptor fdget()/fdput()
operations. This saves some CPU cycles.

Signed-off-by: Sam Li 
Message-id: 20220531105011.111082-1-faithilike...@gmail.com
Signed-off-by: Stefan Hajnoczi 
---
 meson.build  |  1 +
 block/io_uring.c | 12 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 0c2e11ff07..9e65cc5367 100644
--- a/meson.build
+++ b/meson.build
@@ -1752,6 +1752,7 @@ config_host_data.set('CONFIG_LIBNFS', libnfs.found())
 config_host_data.set('CONFIG_LIBSSH', libssh.found())
 config_host_data.set('CONFIG_LINUX_AIO', libaio.found())
 config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found())
+config_host_data.set('CONFIG_LIBURING_REGISTER_RING_FD', cc.has_function('io_uring_register_ring_fd', prefix: '#include <liburing.h>', dependencies:linux_io_uring))
 config_host_data.set('CONFIG_LIBPMEM', libpmem.found())
 config_host_data.set('CONFIG_NUMA', numa.found())
 config_host_data.set('CONFIG_OPENGL', opengl.found())
diff --git a/block/io_uring.c b/block/io_uring.c
index 0b401512b9..d48e472e74 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -18,6 +18,7 @@
 #include "qapi/error.h"
 #include "trace.h"
 
+
 /* io_uring ring size */
 #define MAX_ENTRIES 128
 
@@ -434,8 +435,17 @@ LuringState *luring_init(Error **errp)
     }
 
     ioq_init(&s->io_q);
+#ifdef CONFIG_LIBURING_REGISTER_RING_FD
+    if (io_uring_register_ring_fd(&s->ring) < 0) {
+        /*
+         * Only warn about this error: we will fallback to the non-optimized
+         * io_uring operations.
+         */
+        warn_report("failed to register linux io_uring ring file descriptor");
+    }
+#endif
+
     return s;
-
 }
 
 void luring_cleanup(LuringState *s)
-- 
2.36.1




[PULL 04/18] remote/machine: add HotplugHandler for remote machine

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Allow hotplugging of PCI(e) devices to remote machine

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
d1e6cfa0afb528ad343758f9b1d918be0175c5e5.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/remote/machine.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 92d71d47bb..a97e53e250 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -20,6 +20,7 @@
 #include "qapi/error.h"
 #include "hw/pci/pci_host.h"
 #include "hw/remote/iohub.h"
+#include "hw/qdev-core.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -53,14 +54,19 @@ static void remote_machine_init(MachineState *machine)
 
 pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
  &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+
+qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
 
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
+HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
 mc->init = remote_machine_init;
 mc->desc = "Experimental remote machine";
+
+hc->unplug = qdev_simple_device_unplug_cb;
 }
 
 static const TypeInfo remote_machine = {
@@ -68,6 +74,10 @@ static const TypeInfo remote_machine = {
 .parent = TYPE_MACHINE,
 .instance_size = sizeof(RemoteMachineState),
 .class_init = remote_machine_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_HOTPLUG_HANDLER },
+{ }
+}
 };
 
 static void remote_machine_register_types(void)
-- 
2.36.1




[PULL 01/18] MAINTAINERS: update Vladimir's address and repositories

2022-06-15 Thread Stefan Hajnoczi
From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20220526115432.138384-1-vsement...@yandex-team.ru
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4cf6174f9f..5ba93348aa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2546,7 +2546,7 @@ F: scsi/*
 
 Block Jobs
 M: John Snow 
-M: Vladimir Sementsov-Ogievskiy 
+M: Vladimir Sementsov-Ogievskiy 
 L: qemu-block@nongnu.org
 S: Supported
 F: blockjob.c
@@ -2571,7 +2571,7 @@ F: block/aio_task.c
 F: util/qemu-co-shared-resource.c
 F: include/qemu/co-shared-resource.h
 T: git https://gitlab.com/jsnow/qemu.git jobs
-T: git https://src.openvz.org/scm/~vsementsov/qemu.git jobs
+T: git https://gitlab.com/vsementsov/qemu.git block
 
 Block QAPI, monitor, command line
 M: Markus Armbruster 
@@ -2592,7 +2592,7 @@ F: include/hw/cxl/
 
 Dirty Bitmaps
 M: Eric Blake 
-M: Vladimir Sementsov-Ogievskiy 
+M: Vladimir Sementsov-Ogievskiy 
 R: John Snow 
 L: qemu-block@nongnu.org
 S: Supported
@@ -2606,6 +2606,7 @@ F: util/hbitmap.c
 F: tests/unit/test-hbitmap.c
 F: docs/interop/bitmaps.rst
 T: git https://repo.or.cz/qemu/ericb.git bitmaps
+T: git https://gitlab.com/vsementsov/qemu.git block
 
 Character device backends
 M: Marc-André Lureau 
@@ -2816,16 +2817,17 @@ F: scripts/*.py
 F: tests/*.py
 
 Benchmark util
-M: Vladimir Sementsov-Ogievskiy 
+M: Vladimir Sementsov-Ogievskiy 
 S: Maintained
 F: scripts/simplebench/
-T: git https://src.openvz.org/scm/~vsementsov/qemu.git simplebench
+T: git https://gitlab.com/vsementsov/qemu.git simplebench
 
 Transactions helper
-M: Vladimir Sementsov-Ogievskiy 
+M: Vladimir Sementsov-Ogievskiy 
 S: Maintained
 F: include/qemu/transactions.h
 F: util/transactions.c
+T: git https://gitlab.com/vsementsov/qemu.git block
 
 QAPI
 M: Markus Armbruster 
@@ -3402,7 +3404,7 @@ F: block/iscsi-opts.c
 
 Network Block Device (NBD)
 M: Eric Blake 
-M: Vladimir Sementsov-Ogievskiy 
+M: Vladimir Sementsov-Ogievskiy 
 L: qemu-block@nongnu.org
 S: Maintained
 F: block/nbd*
@@ -3414,7 +3416,7 @@ F: docs/interop/nbd.txt
 F: docs/tools/qemu-nbd.rst
 F: tests/qemu-iotests/tests/*nbd*
 T: git https://repo.or.cz/qemu/ericb.git nbd
-T: git https://src.openvz.org/scm/~vsementsov/qemu.git nbd
+T: git https://gitlab.com/vsementsov/qemu.git block
 
 NFS
 M: Peter Lieven 
@@ -3499,13 +3501,13 @@ F: block/dmg.c
 parallels
 M: Stefan Hajnoczi 
 M: Denis V. Lunev 
-M: Vladimir Sementsov-Ogievskiy 
+M: Vladimir Sementsov-Ogievskiy 
 L: qemu-block@nongnu.org
 S: Supported
 F: block/parallels.c
 F: block/parallels-ext.c
 F: docs/interop/parallels.txt
-T: git https://src.openvz.org/scm/~vsementsov/qemu.git parallels
+T: git https://gitlab.com/vsementsov/qemu.git block
 
 qed
 M: Stefan Hajnoczi 
-- 
2.36.1




[PULL 08/18] vfio-user: instantiate vfio-user context

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Create a context with the vfio-user library to run a PCI device.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
a452871ac8c812ff96fc4f0ce6037f4769953fab.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/remote/vfio-user-obj.c | 82 +++
 1 file changed, 82 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index bc49adcc27..68f8a9dfa9 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -40,6 +40,9 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/notify.h"
+#include "sysemu/sysemu.h"
+#include "libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -73,8 +76,14 @@ struct VfuObject {
 char *device;
 
 Error *err;
+
+Notifier machine_done;
+
+vfu_ctx_t *vfu_ctx;
 };
 
+static void vfu_object_init_ctx(VfuObject *o, Error **errp);
+
 static bool vfu_object_auto_shutdown(void)
 {
 bool auto_shutdown = true;
@@ -107,6 +116,11 @@ static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
 {
 VfuObject *o = VFU_OBJECT(obj);
 
+if (o->vfu_ctx) {
+error_setg(errp, "vfu: Unable to set socket property - server busy");
+return;
+}
+
 qapi_free_SocketAddress(o->socket);
 
 o->socket = NULL;
@@ -122,17 +136,69 @@ static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
 }
 
 trace_vfu_prop("socket", o->socket->u.q_unix.path);
+
+vfu_object_init_ctx(o, errp);
 }
 
 static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
 {
 VfuObject *o = VFU_OBJECT(obj);
 
+if (o->vfu_ctx) {
+error_setg(errp, "vfu: Unable to set device property - server busy");
+return;
+}
+
 g_free(o->device);
 
 o->device = g_strdup(str);
 
 trace_vfu_prop("device", str);
+
+vfu_object_init_ctx(o, errp);
+}
+
+/*
+ * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
+ * properties. It also depends on devices instantiated in QEMU. These
+ * dependencies are not available during the instance_init phase of this
+ * object's life-cycle. As such, the server is initialized after the
+ * machine is setup. machine_init_done_notifier notifies TYPE_VFU_OBJECT
+ * when the machine is setup, and the dependencies are available.
+ */
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+VfuObject *o = container_of(notifier, VfuObject, machine_done);
+Error *err = NULL;
+
+vfu_object_init_ctx(o, &err);
+
+if (err) {
+error_propagate(&error_abort, err);
+}
+}
+
+static void vfu_object_init_ctx(VfuObject *o, Error **errp)
+{
+ERRP_GUARD();
+
+if (o->vfu_ctx || !o->socket || !o->device ||
+!phase_check(PHASE_MACHINE_READY)) {
+return;
+}
+
+if (o->err) {
+error_propagate(errp, o->err);
+o->err = NULL;
+return;
+}
+
+o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+o, VFU_DEV_TYPE_PCI);
+if (o->vfu_ctx == NULL) {
+error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
+return;
+}
 }
 
 static void vfu_object_init(Object *obj)
@@ -147,6 +213,12 @@ static void vfu_object_init(Object *obj)
TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
 return;
 }
+
+if (!phase_check(PHASE_MACHINE_READY)) {
+o->machine_done.notify = vfu_object_machine_done;
+qemu_add_machine_init_done_notifier(&o->machine_done);
+}
+
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -160,6 +232,11 @@ static void vfu_object_finalize(Object *obj)
 
 o->socket = NULL;
 
+if (o->vfu_ctx) {
+vfu_destroy_ctx(o->vfu_ctx);
+o->vfu_ctx = NULL;
+}
+
 g_free(o->device);
 
 o->device = NULL;
@@ -167,6 +244,11 @@ static void vfu_object_finalize(Object *obj)
 if (!k->nr_devs && vfu_object_auto_shutdown()) {
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 }
+
+if (o->machine_done.notify) {
+qemu_remove_machine_init_done_notifier(&o->machine_done);
+o->machine_done.notify = NULL;
+}
 }
 
 static void vfu_object_class_init(ObjectClass *klass, void *data)
-- 
2.36.1




[PULL 05/18] remote/machine: add vfio-user property

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Add a vfio-user property to the x-remote machine. It is a boolean that
indicates whether the machine supports the vfio-user protocol. The machine
configures the bus differently for the vfio-user and multiprocess protocols,
so this property tells it how to configure the bus.

This property should be short-lived. Once vfio-user fully replaces
multiprocess, this property can be removed.
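
The setter in this patch rejects changes once the machine exists. A minimal
standalone sketch of that rule follows; struct demo_machine and
demo_set_vfio_user() are hypothetical stand-ins, not QEMU APIs, and the
'created' flag only models what phase_check(PHASE_MACHINE_CREATED) reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the machine state; not RemoteMachineState. */
struct demo_machine {
    bool created;    /* models phase_check(PHASE_MACHINE_CREATED) */
    bool vfio_user;
};

/*
 * Set the flag only while the machine is still being configured.
 * Returns 0 on success, or -1 with *errmsg set once the machine exists.
 */
static int demo_set_vfio_user(struct demo_machine *m, bool value,
                              const char **errmsg)
{
    assert(m != NULL);

    if (m->created) {
        if (errmsg) {
            *errmsg = "Error enabling vfio-user - machine already created";
        }
        return -1;
    }

    m->vfio_user = value;
    return 0;
}
```

A rejected call leaves the previous value untouched, so the bus configuration
chosen at machine creation cannot drift afterwards.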

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: 5d51a152a419cbda35d070b8e49b772b60a7230a.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/remote/machine.h |  2 ++
 hw/remote/machine.c | 23 +++
 2 files changed, 25 insertions(+)

diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 2a2a33c4b2..8d0fa98d33 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -22,6 +22,8 @@ struct RemoteMachineState {
 
 RemotePCIHost *host;
 RemoteIOHubState iohub;
+
+bool vfio_user;
 };
 
 /* Used to pass to co-routine device and ioc. */
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index a97e53e250..9f3cdc55c3 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -58,6 +58,25 @@ static void remote_machine_init(MachineState *machine)
 qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
 
+static bool remote_machine_get_vfio_user(Object *obj, Error **errp)
+{
+RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+return s->vfio_user;
+}
+
+static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
+{
+RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+if (phase_check(PHASE_MACHINE_CREATED)) {
+error_setg(errp, "Error enabling vfio-user - machine already created");
+return;
+}
+
+s->vfio_user = value;
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -67,6 +86,10 @@ static void remote_machine_class_init(ObjectClass *oc, void *data)
 mc->desc = "Experimental remote machine";
 
 hc->unplug = qdev_simple_device_unplug_cb;
+
+object_class_property_add_bool(oc, "vfio-user",
+   remote_machine_get_vfio_user,
+   remote_machine_set_vfio_user);
 }
 
 static const TypeInfo remote_machine = {
-- 
2.36.1




[PULL 10/18] vfio-user: run vfio-user context

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Set up a handler to run the vfio-user context. The context is driven by
messages to the file descriptor associated with it: get the fd for
the context and hook the handler up to it.
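
The handler's loop retries on EINTR, treats ENOTCONN as a client hangup, and
stops on any other error. A minimal standalone sketch of that control flow
follows; demo_run_fn, demo_ctx_run() and the scripted fake are hypothetical
names used only for illustration, with the callback standing in for
vfu_run_ctx().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for vfu_run_ctx(). */
typedef int (*demo_run_fn)(void *ctx);

enum demo_run_result {
    DEMO_RUN_IDLE,    /* callback returned 0: nothing left to process */
    DEMO_RUN_HANGUP,  /* client closed the channel (ENOTCONN) */
    DEMO_RUN_ERROR    /* any other failure */
};

static enum demo_run_result demo_ctx_run(demo_run_fn run, void *ctx)
{
    assert(run != NULL);

    for (;;) {
        int ret = run(ctx);

        if (ret == 0) {
            return DEMO_RUN_IDLE;
        }
        if (ret < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted: retry */
            }
            return errno == ENOTCONN ? DEMO_RUN_HANGUP : DEMO_RUN_ERROR;
        }
        /* ret > 0: keep draining until the callback reports 0 */
    }
}

/* Scripted fake used only to exercise the loop. */
struct demo_script {
    const int *errnos;  /* errno to fail with per call; 0 means "return 0" */
    size_t len;
    size_t pos;
};

static int demo_scripted_run(void *ctx)
{
    struct demo_script *s = ctx;
    int e = s->pos < s->len ? s->errnos[s->pos++] : 0;

    if (e == 0) {
        return 0;
    }
    errno = e;
    return -1;
}
```

In the patch itself the hangup branch also emits VFU_CLIENT_HANGUP and
unparents the object; the sketch only reports which branch was taken.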

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: e934b0090529d448b6a7972b21dfc3d7421ce494.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 qapi/misc.json|  31 ++
 hw/remote/vfio-user-obj.c | 118 +-
 2 files changed, 148 insertions(+), 1 deletion(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index 45344483cd..27ef5a2b20 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -553,3 +553,34 @@
 ##
 { 'event': 'RTC_CHANGE',
   'data': { 'offset': 'int', 'qom-path': 'str' } }
+
+##
+# @VFU_CLIENT_HANGUP:
+#
+# Emitted when the client of a TYPE_VFIO_USER_SERVER closes the
+# communication channel
+#
+# @vfu-id: ID of the TYPE_VFIO_USER_SERVER object. It is the last component
+#  of @vfu-qom-path referenced below
+#
+# @vfu-qom-path: path to the TYPE_VFIO_USER_SERVER object in the QOM tree
+#
+# @dev-id: ID of attached PCI device
+#
+# @dev-qom-path: path to attached PCI device in the QOM tree
+#
+# Since: 7.1
+#
+# Example:
+#
+# <- { "event": "VFU_CLIENT_HANGUP",
+#  "data": { "vfu-id": "vfu1",
+#"vfu-qom-path": "/objects/vfu1",
+#"dev-id": "sas1",
+#"dev-qom-path": "/machine/peripheral/sas1" },
+#  "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+#
+##
+{ 'event': 'VFU_CLIENT_HANGUP',
+  'data': { 'vfu-id': 'str', 'vfu-qom-path': 'str',
+'dev-id': 'str', 'dev-qom-path': 'str' } }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 3ca6aa2b45..178bd6f8ed 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,6 +27,9 @@
  *
  * device - id of a device on the server, a required option. PCI devices
  *  alone are supported presently.
+ *
+ * notes - x-vfio-user-server could block IO and monitor during the
+ * initialization phase.
  */
 
 #include "qemu/osdep.h"
@@ -40,11 +43,14 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "qemu/timer.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -86,6 +92,8 @@ struct VfuObject {
 PCIDevice *pci_dev;
 
 Error *unplug_blocker;
+
+int vfu_poll_fd;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -164,6 +172,78 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
 vfu_object_init_ctx(o, errp);
 }
 
+static void vfu_object_ctx_run(void *opaque)
+{
+VfuObject *o = opaque;
+const char *vfu_id;
+char *vfu_path, *pci_dev_path;
+int ret = -1;
+
+while (ret != 0) {
+ret = vfu_run_ctx(o->vfu_ctx);
+if (ret < 0) {
+if (errno == EINTR) {
+continue;
+} else if (errno == ENOTCONN) {
+vfu_id = object_get_canonical_path_component(OBJECT(o));
+vfu_path = object_get_canonical_path(OBJECT(o));
+g_assert(o->pci_dev);
+pci_dev_path = object_get_canonical_path(OBJECT(o->pci_dev));
+ /* o->device is a required property and is non-NULL here */
+g_assert(o->device);
+qapi_event_send_vfu_client_hangup(vfu_id, vfu_path,
+  o->device, pci_dev_path);
+qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+o->vfu_poll_fd = -1;
+object_unparent(OBJECT(o));
+g_free(vfu_path);
+g_free(pci_dev_path);
+break;
+} else {
+VFU_OBJECT_ERROR(o, "vfu: Failed to run device %s - %s",
+ o->device, strerror(errno));
+break;
+}
+}
+}
+}
+
+static void vfu_object_attach_ctx(void *opaque)
+{
+VfuObject *o = opaque;
+GPollFD pfds[1];
+int ret;
+
+qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+
+pfds[0].fd = o->vfu_poll_fd;
+pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+
+retry_attach:
+ret = vfu_attach_ctx(o->vfu_ctx);
+if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+/**
+ * vfu_object_attach_ctx can block QEMU's main loop
+ * during attach - the monitor and other IO
+ * could be unresponsive during this time.
+ */
+(void)qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
+goto retry_attach;

[PULL 06/18] vfio-user: build library

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Add the libvfio-user library as a submodule and build it as a meson
subproject.

libvfio-user is distributed under the BSD 3-Clause license and
json-c under the MIT (Expat) license.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: c2adec87958b081d1dc8775d4aa05c897912f025.1655151679.git.jag.ra...@oracle.com

[Changed submodule URL to QEMU's libvfio-user mirror on GitLab. The QEMU
project mirrors its dependencies so that it can provide full source code
even in the event that its dependencies become unavailable. Note that
the mirror repo is manually updated, so please contact me to make newer
libvfio-user commits available. If I become a bottleneck we can set up a
cronjob.

Updated scripts/meson-buildoptions.sh to match the meson_options.txt
change. Failure to do so can result in scripts/meson-buildoptions.sh
being modified by the build system later on and you end up with a dirty
working tree.
--Stefan]

Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS |  1 +
 meson_options.txt   |  2 ++
 configure   | 17 +
 meson.build | 23 ++-
 .gitlab-ci.d/buildtest.yml  |  1 +
 .gitmodules |  3 +++
 Kconfig.host|  4 
 hw/remote/Kconfig   |  4 
 hw/remote/meson.build   |  2 ++
 scripts/meson-buildoptions.sh   |  4 
 subprojects/libvfio-user|  1 +
 tests/docker/dockerfiles/centos8.docker |  2 ++
 12 files changed, 63 insertions(+), 1 deletion(-)
 create mode 160000 subprojects/libvfio-user

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ba93348aa..d0fcaf0edb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3642,6 +3642,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: subprojects/libvfio-user
 
 EBPF:
 M: Jason Wang 
diff --git a/meson_options.txt b/meson_options.txt
index 0e8197386b..f3e2f22c1e 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -88,6 +88,8 @@ option('cfi_debug', type: 'boolean', value: 'false',
description: 'Verbose errors in case of CFI violation')
 option('multiprocess', type: 'feature', value: 'auto',
description: 'Out of process device emulation support')
+option('vfio_user_server', type: 'feature', value: 'disabled',
+   description: 'vfio-user server support')
 option('dbus_display', type: 'feature', value: 'auto',
description: '-display dbus support')
 option('tpm', type : 'feature', value : 'auto',
diff --git a/configure b/configure
index 4b12a8094c..c14e7f590a 100755
--- a/configure
+++ b/configure
@@ -315,6 +315,7 @@ meson_args=""
 ninja=""
 bindir="bin"
 skip_meson=no
+vfio_user_server="disabled"
 
 # The following Meson options are handled manually (still they
 # are included in the automatically generated help message)
@@ -909,6 +910,10 @@ for opt do
   ;;
   --disable-blobs) meson_option_parse --disable-install-blobs ""
   ;;
+  --enable-vfio-user-server) vfio_user_server="enabled"
+  ;;
+  --disable-vfio-user-server) vfio_user_server="disabled"
+  ;;
   --enable-tcmalloc) meson_option_parse --enable-malloc=tcmalloc tcmalloc
   ;;
   --enable-jemalloc) meson_option_parse --enable-malloc=jemalloc jemalloc
@@ -2132,6 +2137,17 @@ write_container_target_makefile() {
 
 
 
+##
+# check for vfio_user_server
+
+case "$vfio_user_server" in
+  enabled )
+if test "$git_submodules_action" != "ignore"; then
+  git_submodules="${git_submodules} subprojects/libvfio-user"
+fi
+;;
+esac
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -2672,6 +2688,7 @@ if test "$skip_meson" = no; then
   test "$slirp" != auto && meson_option_add "-Dslirp=$slirp"
   test "$smbd" != '' && meson_option_add "-Dsmbd=$smbd"
   test "$tcg" != enabled && meson_option_add "-Dtcg=$tcg"
+  test "$vfio_user_server" != auto && meson_option_add "-Dvfio_user_server=$vfio_user_server"
   run_meson() {
 NINJA=$ninja $meson setup --prefix "$prefix" "$@" $cross_arg "$PWD" "$source_path"
   }
diff --git a/meson.build b/meson.build
index 9e65cc5367..ca19ddc30c 100644
--- a/meson.build
+++ b/meson.build
@@ -308,6 +308,10 @@ multiprocess_allowed = get_option('multiprocess') \
   .require(targetos == 'linux', error_message: 'Multiprocess QEMU is supported only on Linux') \
   .allowed()
 
+vfio_user_server_allowed = get_option('vfio_user_server') \
+  .require(targetos == 'linux', error_message: 'vfio-user server is supported only on Linux') \
+  .allowed()
+
 have_tpm = get_option('tpm') \
   .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \
   .allowed()
@@ -2380,7 +2384,8 @@ 

[PULL 09/18] vfio-user: find and init PCI device

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Find the PCI device with the specified id. Initialize the device context
with the QEMU PCI device.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: 7798dbd730099b33fdd00c4c202cfe79e5c5c151.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/remote/vfio-user-obj.c | 67 +++
 1 file changed, 67 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 68f8a9dfa9..3ca6aa2b45 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -43,6 +43,8 @@
 #include "qemu/notify.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -80,6 +82,10 @@ struct VfuObject {
 Notifier machine_done;
 
 vfu_ctx_t *vfu_ctx;
+
+PCIDevice *pci_dev;
+
+Error *unplug_blocker;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -181,6 +187,9 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 {
 ERRP_GUARD();
+DeviceState *dev = NULL;
+vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
+int ret;
 
 if (o->vfu_ctx || !o->socket || !o->device ||
 !phase_check(PHASE_MACHINE_READY)) {
@@ -199,6 +208,53 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
 return;
 }
+
+dev = qdev_find_recursive(sysbus_get_default(), o->device);
+if (dev == NULL) {
+error_setg(errp, "vfu: Device %s not found", o->device);
+goto fail;
+}
+
+if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+error_setg(errp, "vfu: %s not a PCI device", o->device);
+goto fail;
+}
+
+o->pci_dev = PCI_DEVICE(dev);
+
+object_ref(OBJECT(o->pci_dev));
+
+if (pci_is_express(o->pci_dev)) {
+pci_type = VFU_PCI_TYPE_EXPRESS;
+}
+
+ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
+if (ret < 0) {
+error_setg(errp,
+   "vfu: Failed to attach PCI device %s to context - %s",
+   o->device, strerror(errno));
+goto fail;
+}
+
+error_setg(&o->unplug_blocker,
+   "vfu: %s for %s must be deleted before unplugging",
+   TYPE_VFU_OBJECT, o->device);
+qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+
+return;
+
+fail:
+vfu_destroy_ctx(o->vfu_ctx);
+if (o->unplug_blocker && o->pci_dev) {
+qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+error_free(o->unplug_blocker);
+o->unplug_blocker = NULL;
+}
+if (o->pci_dev) {
+object_unref(OBJECT(o->pci_dev));
+o->pci_dev = NULL;
+}
+o->vfu_ctx = NULL;
 }
 
 static void vfu_object_init(Object *obj)
@@ -241,6 +297,17 @@ static void vfu_object_finalize(Object *obj)
 
 o->device = NULL;
 
+if (o->unplug_blocker && o->pci_dev) {
+qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+error_free(o->unplug_blocker);
+o->unplug_blocker = NULL;
+}
+
+if (o->pci_dev) {
+object_unref(OBJECT(o->pci_dev));
+o->pci_dev = NULL;
+}
+
 if (!k->nr_devs && vfu_object_auto_shutdown()) {
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 }
-- 
2.36.1




[PULL 13/18] vfio-user: handle DMA mappings

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Define and register callbacks to manage the RAM regions used for
device DMA.
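
The two callbacks pair up: register records a region that the client mapped,
and unregister finds it again by its local address and drops it. A minimal
standalone sketch of that bookkeeping follows. It swaps QEMU's MemoryRegion
and AddressSpace machinery for a small fixed-size table, and every demo_*
name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 8

struct demo_region {
    uint64_t iova;  /* start address as seen by the device */
    size_t len;
    void *vaddr;    /* local mapping; NULL marks a free slot */
};

struct demo_dma_table {
    struct demo_region regions[DEMO_MAX_REGIONS];
};

static void demo_dma_init(struct demo_dma_table *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < DEMO_MAX_REGIONS; i++) {
        t->regions[i].iova = 0;
        t->regions[i].len = 0;
        t->regions[i].vaddr = NULL;
    }
}

/* Returns 1 if tracked, 0 if skipped (no local mapping), -1 if full. */
static int demo_dma_register(struct demo_dma_table *t, uint64_t iova,
                             size_t len, void *vaddr)
{
    if (!vaddr) {
        return 0;   /* mirrors the early return on !info->vaddr */
    }
    for (size_t i = 0; i < DEMO_MAX_REGIONS; i++) {
        if (!t->regions[i].vaddr) {
            t->regions[i].iova = iova;
            t->regions[i].len = len;
            t->regions[i].vaddr = vaddr;
            return 1;
        }
    }
    return -1;
}

/* Returns 1 if a region with this local address was removed, else 0. */
static int demo_dma_unregister(struct demo_dma_table *t, void *vaddr)
{
    for (size_t i = 0; vaddr && i < DEMO_MAX_REGIONS; i++) {
        if (t->regions[i].vaddr == vaddr) {
            t->regions[i].vaddr = NULL;
            return 1;
        }
    }
    return 0;
}

/* Translate a device address to a local pointer, or NULL if unmapped. */
static void *demo_dma_translate(const struct demo_dma_table *t, uint64_t addr)
{
    for (size_t i = 0; i < DEMO_MAX_REGIONS; i++) {
        const struct demo_region *r = &t->regions[i];

        if (r->vaddr && addr >= r->iova && addr - r->iova < r->len) {
            return (char *)r->vaddr + (addr - r->iova);
        }
    }
    return NULL;
}
```

The real callbacks add and remove subregions of the device's DMA address
space instead, so lookups go through QEMU's memory API rather than a linear
scan.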

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: faacbcd45c4d02c591f0dbfdc19041fbb3eae7eb.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/remote/machine.c   |  5 
 hw/remote/vfio-user-obj.c | 55 +++
 hw/remote/trace-events|  2 ++
 3 files changed, 62 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index cbb2add291..645b54343d 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -22,6 +22,7 @@
 #include "hw/remote/iohub.h"
 #include "hw/remote/iommu.h"
 #include "hw/qdev-core.h"
+#include "hw/remote/iommu.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -51,6 +52,10 @@ static void remote_machine_init(MachineState *machine)
 
 pci_host = PCI_HOST_BRIDGE(rem_host);
 
+if (s->vfio_user) {
+remote_iommu_setup(pci_host->bus);
+}
+
 remote_iohub_init(&s->iohub);
 
 pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index cef473cb98..7b21f77052 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -284,6 +284,54 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
 return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+AddressSpace *dma_as = NULL;
+MemoryRegion *subregion = NULL;
+g_autofree char *name = NULL;
+struct iovec *iov = &info->iova;
+
+if (!info->vaddr) {
+return;
+}
+
+name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
+   (uint64_t)info->vaddr);
+
+subregion = g_new0(MemoryRegion, 1);
+
+memory_region_init_ram_ptr(subregion, NULL, name,
+   iov->iov_len, info->vaddr);
+
+dma_as = pci_device_iommu_address_space(o->pci_dev);
+
+memory_region_add_subregion(dma_as->root, (hwaddr)iov->iov_base, subregion);
+
+trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+AddressSpace *dma_as = NULL;
+MemoryRegion *mr = NULL;
+ram_addr_t offset;
+
+mr = memory_region_from_host(info->vaddr, &offset);
+if (!mr) {
+return;
+}
+
+dma_as = pci_device_iommu_address_space(o->pci_dev);
+
+memory_region_del_subregion(dma_as->root, mr);
+
+object_unparent((OBJECT(mr)));
+
+trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -387,6 +435,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 goto fail;
 }
 
+ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+if (ret < 0) {
+error_setg(errp, "vfu: Failed to setup DMA handlers for %s",
+   o->device);
+goto fail;
+}
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884346..f945c7e33b 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
2.36.1




[PULL 07/18] vfio-user: define vfio-user-server object

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Define the vfio-user object, which is the remote process server for QEMU.
Set up the object initialization functions and properties necessary to
instantiate the object.
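
The object can only start its server once both properties are set and the
machine is ready, so in this series every property setter and the
machine-done notifier call the same guarded init function. A minimal
standalone sketch of that guard follows; struct demo_server and
demo_server_try_init() are hypothetical stand-ins, not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VfuObject; field names are illustrative only. */
struct demo_server {
    const char *socket_path;  /* models the 'socket' property */
    const char *device_id;    /* models the 'device' property */
    bool machine_ready;       /* models phase_check(PHASE_MACHINE_READY) */
    bool ctx_created;         /* models o->vfu_ctx != NULL */
};

/*
 * Try to create the context. Safe to call from any setter or notifier:
 * it does nothing until every input exists, and nothing again once the
 * context has been created. Returns true only on the call that creates it.
 */
static bool demo_server_try_init(struct demo_server *s)
{
    assert(s != NULL);

    if (s->ctx_created || !s->socket_path || !s->device_id ||
        !s->machine_ready) {
        return false;
    }

    s->ctx_created = true;
    return true;
}
```

Because the guard is order-independent, callers do not need to know which
property or event arrives last.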

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: e45a17001e9b38f451543a664ababdf860e5f2f2.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS |   1 +
 qapi/qom.json   |  20 +++-
 include/hw/remote/machine.h |   2 +
 hw/remote/machine.c |  27 +
 hw/remote/vfio-user-obj.c   | 210 
 hw/remote/meson.build   |   1 +
 hw/remote/trace-events  |   3 +
 7 files changed, 262 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d0fcaf0edb..cbac72e239 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3643,6 +3643,7 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
+F: hw/remote/vfio-user-obj.c
 
 EBPF:
 M: Jason Wang 
diff --git a/qapi/qom.json b/qapi/qom.json
index 6a653c6636..80dd419b39 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -734,6 +734,20 @@
 { 'struct': 'RemoteObjectProperties',
   'data': { 'fd': 'str', 'devid': 'str' } }
 
+##
+# @VfioUserServerProperties:
+#
+# Properties for x-vfio-user-server objects.
+#
+# @socket: socket to be used by the libvfio-user library
+#
+# @device: the ID of the device to be emulated at the server
+#
+# Since: 7.1
+##
+{ 'struct': 'VfioUserServerProperties',
+  'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -874,7 +888,8 @@
 'tls-creds-psk',
 'tls-creds-x509',
 'tls-cipher-suites',
-{ 'name': 'x-remote-object', 'features': [ 'unstable' ] }
+{ 'name': 'x-remote-object', 'features': [ 'unstable' ] },
+{ 'name': 'x-vfio-user-server', 'features': [ 'unstable' ] }
   ] }
 
 ##
@@ -938,7 +953,8 @@
   'tls-creds-psk':  'TlsCredsPskProperties',
   'tls-creds-x509': 'TlsCredsX509Properties',
   'tls-cipher-suites':  'TlsCredsProperties',
-  'x-remote-object':'RemoteObjectProperties'
+  'x-remote-object':'RemoteObjectProperties',
+  'x-vfio-user-server': 'VfioUserServerProperties'
   } }
 
 ##
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 8d0fa98d33..ac32fda387 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -24,6 +24,8 @@ struct RemoteMachineState {
 RemoteIOHubState iohub;
 
 bool vfio_user;
+
+bool auto_shutdown;
 };
 
 /* Used to pass to co-routine device and ioc. */
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 9f3cdc55c3..4d008ed721 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -77,6 +77,28 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
 s->vfio_user = value;
 }
 
+static bool remote_machine_get_auto_shutdown(Object *obj, Error **errp)
+{
+RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+return s->auto_shutdown;
+}
+
+static void remote_machine_set_auto_shutdown(Object *obj, bool value,
+ Error **errp)
+{
+RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+s->auto_shutdown = value;
+}
+
+static void remote_machine_instance_init(Object *obj)
+{
+RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+s->auto_shutdown = true;
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -90,12 +112,17 @@ static void remote_machine_class_init(ObjectClass *oc, void *data)
 object_class_property_add_bool(oc, "vfio-user",
remote_machine_get_vfio_user,
remote_machine_set_vfio_user);
+
+object_class_property_add_bool(oc, "auto-shutdown",
+   remote_machine_get_auto_shutdown,
+   remote_machine_set_auto_shutdown);
 }
 
 static const TypeInfo remote_machine = {
 .name = TYPE_REMOTE_MACHINE,
 .parent = TYPE_MACHINE,
 .instance_size = sizeof(RemoteMachineState),
+.instance_init = remote_machine_instance_init,
 .class_init = remote_machine_class_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_HOTPLUG_HANDLER },
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 0000000000..bc49adcc27
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,210 @@
+/**
+ * QEMU vfio-user-server server object
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ * -machine x

[PULL 15/18] vfio-user: handle device interrupts

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Forward the remote device's interrupts to the guest.
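
This patch adds PCI_BDF_TO_DEVFN next to the existing PCI_SLOT, PCI_FUNC and
PCI_BUILD_BDF macros in include/hw/pci/pci.h. A small standalone sketch of
how those macros compose follows. The DEMO_* macros are renamed copies with
fully parenthesized arguments, and struct demo_bdf and demo_split_bdf() are
hypothetical helpers that are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Renamed copies of the macros shown in include/hw/pci/pci.h. */
#define DEMO_PCI_SLOT(devfn)           (((devfn) >> 3) & 0x1f)
#define DEMO_PCI_FUNC(devfn)           ((devfn) & 0x07)
#define DEMO_PCI_BUILD_BDF(bus, devfn) (((bus) << 8) | (devfn))
#define DEMO_PCI_BDF_TO_DEVFN(x)       ((x) & 0xff)

struct demo_bdf {
    unsigned bus;
    unsigned slot;
    unsigned func;
};

/* Split a 16-bit bus/device/function value into its three fields. */
static struct demo_bdf demo_split_bdf(uint16_t bdf)
{
    struct demo_bdf out;
    unsigned devfn = DEMO_PCI_BDF_TO_DEVFN(bdf);

    out.bus = bdf >> 8;
    out.slot = DEMO_PCI_SLOT(devfn);
    out.func = DEMO_PCI_FUNC(devfn);
    return out;
}
```

For example, bus 1 with devfn 0x2a builds the BDF 0x12a, which splits back
into bus 1, slot 5, function 2.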

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Message-id: 9523479eaafe050677f4de2af5dd0df18c27cfd9.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   1 +
 include/hw/pci/msi.h  |   1 +
 include/hw/pci/msix.h |   1 +
 include/hw/pci/pci.h  |  13 +++
 include/hw/remote/vfio-user-obj.h |   6 ++
 hw/pci/msi.c  |  49 +++--
 hw/pci/msix.c |  35 ++-
 hw/pci/pci.c  |  13 +++
 hw/remote/machine.c   |  16 ++-
 hw/remote/vfio-user-obj.c | 167 ++
 stubs/vfio-user-obj.c |   6 ++
 hw/remote/trace-events|   1 +
 stubs/meson.build |   1 +
 13 files changed, 298 insertions(+), 12 deletions(-)
 create mode 100644 include/hw/remote/vfio-user-obj.h
 create mode 100644 stubs/vfio-user-obj.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 563259101b..aaa649a50d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3644,6 +3644,7 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: include/hw/remote/vfio-user-obj.h
 F: hw/remote/iommu.c
 F: include/hw/remote/iommu.h
 
diff --git a/include/hw/pci/msi.h b/include/hw/pci/msi.h
index 4087688486..58aa576215 100644
--- a/include/hw/pci/msi.h
+++ b/include/hw/pci/msi.h
@@ -43,6 +43,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector);
 void msi_send_message(PCIDevice *dev, MSIMessage msg);
 void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len);
 unsigned int msi_nr_vectors_allocated(const PCIDevice *dev);
+void msi_set_mask(PCIDevice *dev, int vector, bool mask, Error **errp);
 
 static inline bool msi_present(const PCIDevice *dev)
 {
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 4c4a60c739..4f1cda0ebe 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -36,6 +36,7 @@ void msix_clr_pending(PCIDevice *dev, int vector);
 int msix_vector_use(PCIDevice *dev, unsigned vector);
 void msix_vector_unuse(PCIDevice *dev, unsigned vector);
 void msix_unuse_all_vectors(PCIDevice *dev);
+void msix_set_mask(PCIDevice *dev, int vector, bool mask, Error **errp);
 
 void msix_notify(PCIDevice *dev, unsigned vector);
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 44dacfa224..b54b6ef88f 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -16,6 +16,7 @@ extern bool pci_available;
 #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
 #define PCI_FUNC(devfn) ((devfn) & 0x07)
 #define PCI_BUILD_BDF(bus, devfn) ((bus << 8) | (devfn))
+#define PCI_BDF_TO_DEVFN(x) ((x) & 0xff)
 #define PCI_BUS_MAX 256
 #define PCI_DEVFN_MAX   256
 #define PCI_SLOT_MAX32
@@ -127,6 +128,10 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
 pcibus_t addr, pcibus_t size, int type);
 typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
 
+typedef void MSITriggerFunc(PCIDevice *dev, MSIMessage msg);
+typedef MSIMessage MSIPrepareMessageFunc(PCIDevice *dev, unsigned vector);
+typedef MSIMessage MSIxPrepareMessageFunc(PCIDevice *dev, unsigned vector);
+
 typedef struct PCIIORegion {
 pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
 #define PCI_BAR_UNMAPPED (~(pcibus_t)0)
@@ -329,6 +334,14 @@ struct PCIDevice {
 /* Space to store MSIX table & pending bit array */
 uint8_t *msix_table;
 uint8_t *msix_pba;
+
+/* May be used by INTx or MSI during interrupt notification */
+void *irq_opaque;
+
+MSITriggerFunc *msi_trigger;
+MSIPrepareMessageFunc *msi_prepare_message;
+MSIxPrepareMessageFunc *msix_prepare_message;
+
 /* MemoryRegion container for msix exclusive BAR setup */
 MemoryRegion msix_exclusive_bar;
 /* Memory Regions for MSIX table and pending bit entries. */
diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
new file mode 100644
index 0000000000..87ab78b875
--- /dev/null
+++ b/include/hw/remote/vfio-user-obj.h
@@ -0,0 +1,6 @@
+#ifndef VFIO_USER_OBJ_H
+#define VFIO_USER_OBJ_H
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus);
+
+#endif
diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index 47d2b0f33c..5c471b9616 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -134,7 +134,7 @@ void msi_set_message(PCIDevice *dev, MSIMessage msg)
 pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data);
 }
 
-MSIMessage msi_get_message(PCIDevice *dev, unsigned int vector)
+static MSIMessage msi_prepare_message(PCIDevice *dev, unsigned int vector)
 {
 uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
 bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
@@ -159,6 +159,11 @@ MSIMessage msi_get_message(PCIDevice *dev,

[PULL 16/18] vfio-user: handle reset of remote device

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Add a handler to reset a remote device.
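
The handler resets the device for every reset reason except a lost
connection, which is already handled where the context is run. A minimal
standalone sketch of that dispatch follows; the enum values, struct
demo_device and demo_device_reset() are hypothetical names, and the counter
stands in for the call to qdev_reset_all().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reset reasons, modeled on the two cases the handler splits. */
enum demo_reset_type {
    DEMO_RESET_DEVICE,
    DEMO_RESET_LOST_CONN
};

struct demo_device {
    int reset_count;
};

static int demo_device_reset(struct demo_device *dev,
                             enum demo_reset_type type)
{
    assert(dev != NULL);

    /* Connection loss is cleaned up elsewhere, so skip the device reset. */
    if (type == DEMO_RESET_LOST_CONN) {
        return 0;
    }

    dev->reset_count++;   /* stands in for qdev_reset_all() */
    return 0;
}
```

Both paths return 0, so the library sees success whether or not the device
was actually reset.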

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: 112eeadf3bc4c6cdb100bc3f9a6fcfc20b467c1b.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/remote/vfio-user-obj.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 5ecdec06f6..c6cc53acf2 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -676,6 +676,20 @@ void vfu_object_set_bus_irq(PCIBus *pci_bus)
  max_bdf);
 }
 
+static int vfu_object_device_reset(vfu_ctx_t *vfu_ctx, vfu_reset_type_t type)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+
+/* vfu_object_ctx_run() handles lost connection */
+if (type == VFU_RESET_LOST_CONN) {
+return 0;
+}
+
+qdev_reset_all(DEVICE(o->pci_dev));
+
+return 0;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -795,6 +809,12 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 goto fail;
 }
 
+ret = vfu_setup_device_reset_cb(o->vfu_ctx, &vfu_object_device_reset);
+if (ret < 0) {
+error_setg(errp, "vfu: Failed to setup reset callback");
+goto fail;
+}
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(errp, "vfu: Failed to realize device %s- %s",
-- 
2.36.1




[PULL 12/18] vfio-user: IOMMU support for remote device

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Assign a separate address space to each device in the remote processes.
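
The core of this patch is a find-or-create lookup keyed by devfn, so each
device on the bus gets its own address space on first use. A minimal
standalone sketch of that lookup follows. It swaps the GHashTable for a
fixed 256-entry array and omits the mutex the patch uses, and every demo_*
name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_DEVFN_MAX 256

/* Hypothetical stand-in for RemoteIommuElem. */
struct demo_elem {
    int as_id;  /* stands in for the per-device AddressSpace */
};

struct demo_iommu {
    struct demo_elem *elem_by_devfn[DEMO_DEVFN_MAX];
    int next_as_id;
};

/* Return the element for devfn, creating it on first use; NULL on failure. */
static struct demo_elem *demo_iommu_find_add(struct demo_iommu *iommu,
                                             int devfn)
{
    assert(iommu != NULL);

    if (devfn < 0 || devfn >= DEMO_DEVFN_MAX) {
        return NULL;
    }
    if (!iommu->elem_by_devfn[devfn]) {
        struct demo_elem *elem = calloc(1, sizeof(*elem));

        if (!elem) {
            return NULL;
        }
        elem->as_id = iommu->next_as_id++;
        iommu->elem_by_devfn[devfn] = elem;
    }
    return iommu->elem_by_devfn[devfn];
}

/* Release every element created so far. */
static void demo_iommu_free(struct demo_iommu *iommu)
{
    for (int i = 0; i < DEMO_DEVFN_MAX; i++) {
        free(iommu->elem_by_devfn[i]);
        iommu->elem_by_devfn[i] = NULL;
    }
}
```

Repeated lookups for the same devfn return the same element, which is what
lets DMA callbacks and unplug handling agree on one address space per device.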

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: afe0b0a97582cdad42b5b25636a29c523265a10a.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   2 +
 include/hw/remote/iommu.h |  40 
 hw/remote/iommu.c | 131 ++
 hw/remote/machine.c   |  13 +++-
 hw/remote/meson.build |   1 +
 5 files changed, 186 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/remote/iommu.h
 create mode 100644 hw/remote/iommu.c

diff --git a/MAINTAINERS b/MAINTAINERS
index cbac72e239..563259101b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3644,6 +3644,8 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: hw/remote/iommu.c
+F: include/hw/remote/iommu.h
 
 EBPF:
 M: Jason Wang 
diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
new file mode 100644
index 0000000000..33b68a8f4b
--- /dev/null
+++ b/include/hw/remote/iommu.h
@@ -0,0 +1,40 @@
+/**
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_IOMMU_H
+#define REMOTE_IOMMU_H
+
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
+
+#ifndef INT2VOIDP
+#define INT2VOIDP(i) (void *)(uintptr_t)(i)
+#endif
+
+typedef struct RemoteIommuElem {
+MemoryRegion *mr;
+
+AddressSpace as;
+} RemoteIommuElem;
+
+#define TYPE_REMOTE_IOMMU "x-remote-iommu"
+OBJECT_DECLARE_SIMPLE_TYPE(RemoteIommu, REMOTE_IOMMU)
+
+struct RemoteIommu {
+Object parent;
+
+GHashTable *elem_by_devfn;
+
+QemuMutex lock;
+};
+
+void remote_iommu_setup(PCIBus *pci_bus);
+
+void remote_iommu_unplug_dev(PCIDevice *pci_dev);
+
+#endif
diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
new file mode 100644
index 0000000000..fd723d91f3
--- /dev/null
+++ b/hw/remote/iommu.c
@@ -0,0 +1,131 @@
+/**
+ * IOMMU for remote device
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/remote/iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "trace.h"
+
+/**
+ * IOMMU for TYPE_REMOTE_MACHINE - manages DMA address space isolation
+ * for remote machine. It is used by TYPE_VFIO_USER_SERVER.
+ *
+ * - Each TYPE_VFIO_USER_SERVER instance handles one PCIDevice on a PCIBus.
+ *   There is one RemoteIommu per PCIBus, so the RemoteIommu tracks multiple
+ *   PCIDevices by maintaining a ->elem_by_devfn mapping.
+ *
+ * - memory_region_init_iommu() is not used because vfio-user MemoryRegions
+ *   will be added to the elem->mr container instead. This is more natural
+ *   than implementing the IOMMUMemoryRegionClass APIs since vfio-user
+ *   provides something that is close to a full-fledged MemoryRegion and
+ *   not like an IOMMU mapping.
+ *
+ * - When a device is hot unplugged, the elem->mr reference is dropped so
+ *   all vfio-user MemoryRegions associated with this vfio-user server are
+ *   destroyed.
+ */
+
+static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
+  void *opaque, int devfn)
+{
+RemoteIommu *iommu = opaque;
+RemoteIommuElem *elem = NULL;
+
+qemu_mutex_lock(&iommu->lock);
+
+elem = g_hash_table_lookup(iommu->elem_by_devfn, INT2VOIDP(devfn));
+
+if (!elem) {
+elem = g_malloc0(sizeof(RemoteIommuElem));
+g_hash_table_insert(iommu->elem_by_devfn, INT2VOIDP(devfn), elem);
+}
+
+if (!elem->mr) {
+elem->mr = MEMORY_REGION(object_new(TYPE_MEMORY_REGION));
+memory_region_set_size(elem->mr, UINT64_MAX);
+address_space_init(&elem->as, elem->mr, NULL);
+}
+
+qemu_mutex_unlock(&iommu->lock);
+
+return &elem->as;
+}
+
+void remote_iommu_unplug_dev(PCIDevice *pci_dev)
+{
+AddressSpace *as = pci_device_iommu_address_space(pci_dev);
+RemoteIommuElem *elem = NULL;
+
+if (as == &address_space_memory) {
+return;
+}
+
+elem = container_of(as, RemoteIommuElem, as);
+
+address_space_destroy(&elem->as);
+
+object_unref(elem->mr);
+
+elem->mr = NULL;
+}
+
+static void remote_iommu_init(Object *obj)
+{
+RemoteIommu *iommu = REMOTE_IOMMU(obj);
+
+iommu->elem_by_devfn = g_hash_table_new_full(NULL, NULL, NULL, g_free);
+
+qemu_mutex_init(&iommu->lock);
+}
+
+static void remote_iommu_finalize(Object *obj)
+{
+RemoteIommu *iommu = REMOTE_IOMMU(obj);
+
+qemu_mutex_destroy(&iommu->lock);
+
+g_hash_table_destroy(iommu->elem_by_devfn);

[PULL 11/18] vfio-user: handle PCI config space accesses

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Define and register handlers for PCI config space accesses

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: be9d2ccf9b1d24e50dcd9c23404dbf284142cec7.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/remote/vfio-user-obj.c | 51 +++
 hw/remote/trace-events|  2 ++
 2 files changed, 53 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 178bd6f8ed..cef473cb98 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -46,6 +46,7 @@
 #include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
@@ -244,6 +245,45 @@ retry_attach:
 qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+ size_t count, loff_t offset,
+ const bool is_write)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+uint32_t pci_access_width = sizeof(uint32_t);
+size_t bytes = count;
+uint32_t val = 0;
+char *ptr = buf;
+int len;
+
+/*
+ * Writes to the BAR registers would trigger an update to the
+ * global Memory and IO AddressSpaces. But the remote device
+ * never uses the global AddressSpaces, therefore overlapping
+ * memory regions are not a problem
+ */
+while (bytes > 0) {
+len = (bytes > pci_access_width) ? pci_access_width : bytes;
+if (is_write) {
+memcpy(&val, ptr, len);
+pci_host_config_write_common(o->pci_dev, offset,
+ pci_config_size(o->pci_dev),
+ val, len);
+trace_vfu_cfg_write(offset, val);
+} else {
+val = pci_host_config_read_common(o->pci_dev, offset,
+  pci_config_size(o->pci_dev), len);
+memcpy(ptr, &val, len);
+trace_vfu_cfg_read(offset, val);
+}
+offset += len;
+ptr += len;
+bytes -= len;
+}
+
+return count;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -336,6 +376,17 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
TYPE_VFU_OBJECT, o->device);
 qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
 
+ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+   pci_config_size(o->pci_dev), &vfu_object_cfg_access,
+   VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+   NULL, 0, -1, 0);
+if (ret < 0) {
+error_setg(errp,
+   "vfu: Failed to setup config space handlers for %s- %s",
+   o->device, strerror(errno));
+goto fail;
+}
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0d96..2ef7884346 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
2.36.1
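
The chunking done by vfu_object_cfg_access() in the patch above can be
tried in isolation. The following is only a minimal sketch, not the QEMU
code: fake_cfg, fake_cfg_read(), fake_cfg_write() and cfg_access() are
hypothetical stand-ins for the device config space and for
pci_host_config_read_common()/pci_host_config_write_common(). It shows how
a request of arbitrary length is split into accesses of at most
sizeof(uint32_t) bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256
#define ACCESS_WIDTH sizeof(uint32_t)

/* Hypothetical stand-in for a device's PCI config space. */
static uint8_t fake_cfg[CFG_SIZE];
/* Counts how many individual accesses were issued (illustration only). */
static unsigned fake_access_count;

/* Hypothetical stand-in for pci_host_config_read_common(). */
static uint32_t fake_cfg_read(size_t offset, size_t len)
{
    uint32_t val = 0;

    assert(len <= ACCESS_WIDTH && offset + len <= CFG_SIZE);
    memcpy(&val, fake_cfg + offset, len);
    fake_access_count++;
    return val;
}

/* Hypothetical stand-in for pci_host_config_write_common(). */
static void fake_cfg_write(size_t offset, uint32_t val, size_t len)
{
    assert(len <= ACCESS_WIDTH && offset + len <= CFG_SIZE);
    memcpy(fake_cfg + offset, &val, len);
    fake_access_count++;
}

/* Split a request of 'count' bytes into accesses of at most 4 bytes. */
static size_t cfg_access(char *buf, size_t count, size_t offset, int is_write)
{
    size_t bytes = count;
    char *ptr = buf;

    while (bytes > 0) {
        size_t len = bytes > ACCESS_WIDTH ? ACCESS_WIDTH : bytes;
        uint32_t val = 0;

        if (is_write) {
            memcpy(&val, ptr, len);
            fake_cfg_write(offset, val, len);
        } else {
            val = fake_cfg_read(offset, len);
            memcpy(ptr, &val, len);
        }
        offset += len;
        ptr += len;
        bytes -= len;
    }

    return count;
}
```

With this sketch, a 6-byte request is served as one 4-byte access followed
by one 2-byte access, and a write followed by a read of the same range
returns the bytes that were written.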




[PULL 14/18] vfio-user: handle PCI BAR accesses

2022-06-15 Thread Stefan Hajnoczi
From: Jagannathan Raman 

Determine the BARs used by the PCI device and register handlers to
manage the access to the same.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
Message-id: 3373e10b5be5f42846f0632d4382466e1698c505.1655151679.git.jag.ra...@oracle.com
Signed-off-by: Stefan Hajnoczi 
---
 include/exec/memory.h   |   3 +
 hw/remote/vfio-user-obj.c   | 190 
 softmmu/physmem.c   |   4 +-
 tests/qtest/fuzz/generic_fuzz.c |   9 +-
 hw/remote/trace-events  |   3 +
 5 files changed, 203 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index f1c19451bc..a6a0f4d8ad 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2810,6 +2810,9 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
 hwaddr addr, const void *buf,
 hwaddr len);
 
+int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
+bool prepare_mmio_access(MemoryRegion *mr);
+
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
 if (is_write) {
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 7b21f77052..dd760a99e2 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -52,6 +52,7 @@
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
 #include "qemu/timer.h"
+#include "exec/memory.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -332,6 +333,193 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
 trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
 }
 
+static int vfu_object_mr_rw(MemoryRegion *mr, uint8_t *buf, hwaddr offset,
+hwaddr size, const bool is_write)
+{
+uint8_t *ptr = buf;
+bool release_lock = false;
+uint8_t *ram_ptr = NULL;
+MemTxResult result;
+int access_size;
+uint64_t val;
+
+if (memory_access_is_direct(mr, is_write)) {
+/**
+ * Some devices expose a PCI expansion ROM, which could be buffer
+ * based as compared to other regions which are primarily based on
+ * MemoryRegionOps. memory_region_find() would already check
+ * for buffer overflow, we don't need to repeat it here.
+ */
+ram_ptr = memory_region_get_ram_ptr(mr);
+
+if (is_write) {
+memcpy((ram_ptr + offset), buf, size);
+} else {
+memcpy(buf, (ram_ptr + offset), size);
+}
+
+return 0;
+}
+
+while (size) {
+/**
+ * The read/write logic used below is similar to the ones in
+ * flatview_read/write_continue()
+ */
+release_lock = prepare_mmio_access(mr);
+
+access_size = memory_access_size(mr, size, offset);
+
+if (is_write) {
+val = ldn_he_p(ptr, access_size);
+
+result = memory_region_dispatch_write(mr, offset, val,
+  size_memop(access_size),
+  MEMTXATTRS_UNSPECIFIED);
+} else {
+result = memory_region_dispatch_read(mr, offset, &val,
+ size_memop(access_size),
+ MEMTXATTRS_UNSPECIFIED);
+
+stn_he_p(ptr, access_size, val);
+}
+
+if (release_lock) {
+qemu_mutex_unlock_iothread();
+release_lock = false;
+}
+
+if (result != MEMTX_OK) {
+return -1;
+}
+
+size -= access_size;
+ptr += access_size;
+offset += access_size;
+}
+
+return 0;
+}
+
+static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
+hwaddr bar_offset, char * const buf,
+hwaddr len, const bool is_write)
+{
+MemoryRegionSection section = { 0 };
+uint8_t *ptr = (uint8_t *)buf;
+MemoryRegion *section_mr = NULL;
+uint64_t section_size;
+hwaddr section_offset;
+hwaddr size = 0;
+
+while (len) {
+section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
+ bar_offset, len);
+
+if (!section.mr) {
+warn_report("vfu: invalid address 0x%"PRIx64"", bar_offset);
+return size;
+}
+
+section_mr = section.mr;
+section_offset = section.offset_within_region;
+section_size = int128_get64(section.size);
+
+if (is_write && section_mr->readonly) {
+warn_report("vfu: attempting to write to readonly region in "
+"bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
+pci_bar, bar_offset,
+(bar_o

[PULL 17/18] linux-aio: fix unbalanced plugged counter in laio_io_unplug()

2022-06-15 Thread Stefan Hajnoczi
Every laio_io_plug() call has a matching laio_io_unplug() call. There is
a plugged counter that tracks the number of levels of plugging and
allows for nesting.

The plugged counter must reflect the balance between laio_io_plug() and
laio_io_unplug() calls accurately. Otherwise I/O stalls occur since
io_submit(2) calls are skipped while plugged.

Reported-by: Nikolay Tenev 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Stefano Garzarella 
Message-id: 20220609164712.1539045-2-stefa...@redhat.com
Cc: Stefano Garzarella 
Fixes: 68d7946648 ("linux-aio: add `dev_max_batch` parameter to laio_io_unplug()")
[Stefano Garzarella suggested adding a Fixes tag.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 block/linux-aio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 4c423fcccf..6078da7e42 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -363,8 +363,10 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s,
 uint64_t dev_max_batch)
 {
 assert(s->io_q.plugged);
+s->io_q.plugged--;
+
 if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
-(--s->io_q.plugged == 0 &&
+(!s->io_q.plugged &&
  !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
 ioq_submit(s);
 }
-- 
2.36.1
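
The imbalance fixed by the patch above comes from C's short-circuit
evaluation: in the old condition the decrement sat in the right-hand
operand of ||, so it was skipped whenever the max batch check on the left
was already true. The following is only a minimal sketch of that effect:
struct ioq, unplug_old() and unplug_new() are hypothetical simplifications
of the LinuxAioState queue fields and of laio_io_unplug(), and returning
true stands in for calling ioq_submit().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced model of the s->io_q fields used by the check. */
struct ioq {
    unsigned plugged;   /* nesting depth of plug calls */
    unsigned in_queue;  /* number of queued requests */
    bool blocked;
    bool pending;       /* true if the pending list is non-empty */
};

/*
 * Pre-fix logic: the decrement is part of the right-hand operand of ||,
 * so it does not run when in_queue >= max_batch.
 */
static bool unplug_old(struct ioq *q, unsigned max_batch)
{
    assert(q->plugged);
    if (q->in_queue >= max_batch ||
        (--q->plugged == 0 && !q->blocked && q->pending)) {
        return true; /* would submit */
    }
    return false;
}

/* Post-fix logic: the counter is always decremented first. */
static bool unplug_new(struct ioq *q, unsigned max_batch)
{
    assert(q->plugged);
    q->plugged--;
    if (q->in_queue >= max_batch ||
        (!q->plugged && !q->blocked && q->pending)) {
        return true; /* would submit */
    }
    return false;
}
```

Starting from plugged == 1 with a full batch, unplug_old() leaves plugged
at 1 while unplug_new() brings it back to 0. With the old logic, later
requests would still see the queue as plugged, which matches the stalls
described in the commit message.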




[PULL 18/18] linux-aio: explain why max batch is checked in laio_io_unplug()

2022-06-15 Thread Stefan Hajnoczi
It may not be obvious why laio_io_unplug() checks max batch. I discussed
this with Stefano and have added a comment summarizing the reason.

Cc: Stefano Garzarella 
Cc: Kevin Wolf 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Stefano Garzarella 
Message-id: 20220609164712.1539045-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/linux-aio.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 6078da7e42..9c2393a2f7 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -365,6 +365,12 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s,
 assert(s->io_q.plugged);
 s->io_q.plugged--;
 
+/*
+ * Why max batch checking is performed here:
+ * Another BDS may have queued requests with a higher dev_max_batch and
+ * therefore in_queue could now exceed our dev_max_batch. Re-check the max
+ * batch so we can honor our device's dev_max_batch.
+ */
 if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
 (!s->io_q.plugged &&
  !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
-- 
2.36.1




Re: [PATCH 2/5] tests/qemu-iotests: skip 108 when FUSE is not loaded

2022-06-15 Thread Daniel P . Berrangé
On Wed, Jun 15, 2022 at 11:48:02AM -0400, John Snow wrote:
> On Wed, Jun 15, 2022 at 11:33 AM Daniel P. Berrangé wrote:
> >
> > On Wed, Jun 15, 2022 at 09:41:32AM -0400, John Snow wrote:
> > > On Tue, Jun 14, 2022 at 10:30 AM John Snow  wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 4:59 AM Daniel P. Berrangé wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 06:46:35AM +0200, Thomas Huth wrote:
> > > > > > On 14/06/2022 03.50, John Snow wrote:
> > > > > > > > In certain container environments we may not have FUSE at all, so skip
> > > > > > > > the test in this circumstance too.
> > > > > > >
> > > > > > > Signed-off-by: John Snow 
> > > > > > > ---
> > > > > > >   tests/qemu-iotests/108 | 6 ++
> > > > > > >   1 file changed, 6 insertions(+)
> > > > > > >
> > > > > > > diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
> > > > > > > index 9e923d6a59f..e401c5e9933 100755
> > > > > > > --- a/tests/qemu-iotests/108
> > > > > > > +++ b/tests/qemu-iotests/108
> > > > > > > @@ -60,6 +60,12 @@ if sudo -n losetup &>/dev/null; then
> > > > > > >   else
> > > > > > >   loopdev=false
> > > > > > > +# Check for fuse support in the host environment:
> > > > > > > +lsmod | grep fuse &>/dev/null;
> > > > > >
> > > > > > > That doesn't work if fuse has been linked statically into the kernel. Would
> > > > > > > it make sense to test for /sys/fs/fuse instead?
> > > > > >
> > > > > > > (OTOH, we will likely hardly ever run this on statically linked kernels anyway,
> > > > > > > so it might not matter too much)
> > > > >
> > > > > But more importantly 'lsmod' may not be installed in our container
> > > > > images. So checking /sys/fs/fuse avoids introducing a dep on the
> > > > > 'kmod' package.
> > > > >
> > > > > >
> > > > > > > +if [[ $? -ne 0 ]]; then
> > > > > >
> > > > > > > I'd prefer single "[" instead of "[[" ... but since we're requiring bash
> > > > > > > anyway, it likely doesn't matter.
> > > > >
> > > > > Or
> > > > >
> > > > > if  test $? != 0 ; then
> > > > >
> > > > > >
> > > > > > > +_notrun 'No Passwordless sudo nor FUSE kernel module'
> > > > > > > +fi
> > > > > > > +
> > > > > > > >   # QSD --export fuse will either yield "Parameter 'id' is missing"
> > > > > > > >   # or "Invalid parameter 'fuse'", depending on whether there is
> > > > > > > >   # FUSE support or not.
> > > > > >
> > > >
> > > > Good suggestions, thanks!
> > > >
> > >
> > > I think I need to test against /dev/fuse instead, because /sys/fs/fuse
> > > actually exists, but because of docker permissions (etc), FUSE isn't
> > > actually usable from the child container.
> > >
> > > I wound up with this:
> > >
> > > # Check for usable FUSE in the host environment:
> > > if test ! -c "/dev/fuse"; then
> > > _notrun 'No passwordless sudo nor usable /dev/fuse'
> > > fi
> > >
> > > Seems to work for my case here, at least, but I don't have a good
> > > sense for how broadly flexible it might be. It might be nicer to
> > > concoct some kind of NOP fuse mount instead, but I wasn't able to
> > > figure out such a command quickly.
> > >
> > > The next problem I have is actually related; test-qga (for the
> > > Centos.x86_64 run) is failing because the guest agent is reading
> > > /proc/self/mountinfo -- which contains entries for block devices that
> > > are not visible in the current container scope. I think when QGA goes
> > > to read info about these devices to populate a response, it chokes.
> > > This might be a genuine bug in QGA if we want it to tolerate existing
> > > inside of a container.
> >
> > Yes, we should fix this. Even if you don't run QGA in a container,
> > someone might configure the systemd service to harden it, by
> > restricting what /dev it is able to see and thus trigger the
> > same issue.
> 
> Naive solution: if we try to look in /sys/dev/block/%u:%u and find
> that we are unable to do so for whatever reason (ENOENT et al), just
> skip that entry for the fsinfo returned to the caller.
> 
> Does it need to be fancier than that?

/sys stuff may be unfiltered, while /dev is restricted.

I've not looked at the QGA code for this, but conceptually I think
I would just identify where in the code errors hit, and ignore the
appropriate error conditions. The goal is to return as much info
as we reasonably can offer, given our execution environment
constraints.
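
To illustrate the idea (this is only a rough sketch, not the QGA code,
which I have not looked at): collect_visible() is a hypothetical helper
that probes each candidate path with stat() and silently skips the ones
that fail, whatever the errno, instead of failing the whole request.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: keep only the paths that can be stat()ed in the
 * current execution environment. Entries that fail (ENOENT, EACCES, ...)
 * are skipped rather than treated as a fatal error.
 *
 * 'out' must have room for 'n' entries. Returns the number of kept paths.
 */
static size_t collect_visible(const char *const *paths, size_t n,
                              const char **out)
{
    size_t kept = 0;

    assert(paths && out);

    for (size_t i = 0; i < n; i++) {
        struct stat st;

        if (stat(paths[i], &st) < 0) {
            /* Not visible from here, e.g. hidden by a container: skip. */
            continue;
        }
        out[kept++] = paths[i];
    }

    return kept;
}
```

The caller still gets every entry that is reachable, which is the
behaviour described above: return what can reasonably be offered and
drop the rest.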


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|