date:20220811

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

2022-08-11 Thread Marc-André Lureau

Hi

On Thu, Aug 11, 2022 at 10:52 AM Markus Armbruster 
wrote:

> Marc-André Lureau  writes:
>
> > Hi
> >
> > On Fri, Aug 5, 2022 at 12:12 PM Markus Armbruster 
> wrote:
> >
> >> marcandre.lur...@redhat.com writes:
> >>
> >> > From: Marc-André Lureau 
> >> >
> >> > This is just moving qapi-gen.py and related subdir to qemu-common, to
> >> > ease review and proceed step by step. The following patches will move
> >> > related necessary code, tests etc.
> >> >
> >> > Signed-off-by: Marc-André Lureau 
> >>
> >> As moved files tend to become low-level annoyances for a long time, I'd
> >> like to understand why you want to move them.  The commit message says
> >> "to ease review", which I suspect isn't the real reason.  Perhaps you
> >> explained all that elsewhere already, but I missed it.
> >>
> >>
> >>
> > The end goal is to split some projects, such as qemu-ga, to standalone
> > meson projects/subprojects. We will be able to build them independently
> > from the rest of QEMU, and later on perhaps handle them outside of QEMU
> > main repository. To achieve this, I first introduce a qemu-common
> > subproject, where qapi and common units are provided. You can check
> > https://gitlab.com/marcandre.lureau/qemu/-/commits/qga for a sneak peek
> at
> > current end result.
>
> I worry this move of the QAPI generator code into
> subjprojects/common/scripts/qapi/ will be followed by a move into its
> own subproject.
>

Do you mean: it could be moved again to another smaller subproject? not
really, see below

> Ignorant question: could we turn the QAPI generator into a subproject in
> place?
>

If it's just the generator, probably the target would then be a python
project (not meson), similar to python-qemu-qmp.

But I don't see much point, since it's not really a standalone python
module, it generates code, and that code needs most of what is in
qemu-common (see
https://gitlab.com/marcandre.lureau/qemu/-/tree/qga/subprojects/qemu-common).
It's best to have it together imho. Maybe we can consider a different
naming or to be more careful not to add stuff that is not strictly needed
by qapi?

(fwiw, it's a bit of a shame python-qemu-qmp didn't import git history from
qemu.. we did better with libslirp. If we ever move code in standalone
repositories again, we should be careful to keep history with it)

> > I said "to ease review and proceed step by step" simply because there are
> > no other changes: I don't move the rest of the qapi code & tests all
> > together, it's in the subsequent series.
>
> I'd recommend to provide a bit more context in the commit message, even
> if you copy it to several messages in a row.  Our future selves will
> likely be grateful.
>
>

-- 
Marc-André Lureau

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

2022-08-11 Thread Markus Armbruster

Marc-André Lureau  writes:

> Hi
>
> On Thu, Aug 11, 2022 at 10:52 AM Markus Armbruster 
> wrote:
>
>> Marc-André Lureau  writes:
>>
>> > Hi
>> >
>> > On Fri, Aug 5, 2022 at 12:12 PM Markus Armbruster 
>> wrote:
>> >
>> >> marcandre.lur...@redhat.com writes:
>> >>
>> >> > From: Marc-André Lureau 
>> >> >
>> >> > This is just moving qapi-gen.py and related subdir to qemu-common, to
>> >> > ease review and proceed step by step. The following patches will move
>> >> > related necessary code, tests etc.
>> >> >
>> >> > Signed-off-by: Marc-André Lureau 
>> >>
>> >> As moved files tend to become low-level annoyances for a long time, I'd
>> >> like to understand why you want to move them.  The commit message says
>> >> "to ease review", which I suspect isn't the real reason.  Perhaps you
>> >> explained all that elsewhere already, but I missed it.
>> >>
>> >>
>> >>
>> > The end goal is to split some projects, such as qemu-ga, to standalone
>> > meson projects/subprojects. We will be able to build them independently
>> > from the rest of QEMU, and later on perhaps handle them outside of QEMU
>> > main repository. To achieve this, I first introduce a qemu-common
>> > subproject, where qapi and common units are provided. You can check
>> > https://gitlab.com/marcandre.lureau/qemu/-/commits/qga for a sneak peek at
>> > current end result.
>>
>> I worry this move of the QAPI generator code into
>> subjprojects/common/scripts/qapi/ will be followed by a move into its
>> own subproject.
>>
>
> Do you mean: it could be moved again to another smaller subproject? not
> really, see below
>
>
>> Ignorant question: could we turn the QAPI generator into a subproject in
>> place?
>>
>
> If it's just the generator, probably the target would then be a python
> project (not meson), similar to python-qemu-qmp.
>
> But I don't see much point, since it's not really a standalone python
> module, it generates code, and that code needs most of what is in
> qemu-common (see
> https://gitlab.com/marcandre.lureau/qemu/-/tree/qga/subprojects/qemu-common).
> It's best to have it together imho. Maybe we can consider a different
> naming or to be more careful not to add stuff that is not strictly needed
> by qapi?

I had a look at subjprojects/qemu-common in your qga branch.  Contents:

* Subproject machinery

* Some common headers (glib-compat.h), but not others (qemu/osdep.h).  I
  guess it's whatever subjproject code needs.

  Is subprojects/qemu-common/include/block/nvme.h there by accident?

* Most of the QObject subsystem

  qobject/block-qdict.c is left behind.

* Most of the QAPI subsystem

  Some visitors left behind: opts, forward, string input / output.  Hmm,
  only the .c, the .h are in the subjproject.  Accident?

  A bit of HMP support left behind.

* Parts of util/ and include/qemu/

  Error reporting, key-value CLI, some C utilities, but not others
  (e.g. qemu/atomic.h, but not qemu/atomic128.h).  I guess it's again
  whatever subjproject code needs.

* Parts of the QAPI Schema subsystem

Aside: MAINTAINERS mostly not updated.

Your moves tear closely related code apart.  This is going to be a drag
for developers in general and maintainers in particular.

Ergonomics suffer when related code is in multiple places.  Having to
switch between directories and remember where is what will a constant
low-level pain.  Things that used to be simple & quick, like git-grep
qapi/*.c, become more involved.

Hurts even when merely consuming the subsystem: when I see #include
"qemu/foo.h", the straightforward include/qemu/foo.h may or may not do.
When it doesn't, I need to know where to look instead.

subprojects/qemu-common/include/ is a lot to type.  Sufficiently
powerful editors mitigate, but not completely.

When changes need to be applied to every instance of an abstraction,
it's easy to miss instances "elsewhere".  There's a reason the QAPI
visitors are all in one place.

The actual split seems somewhat arbitrary in places.  I suspect more
code will move over time.  Invalidating "what is where" knowledge.

I believe a serious think about other ways to accomplish your goals is
called for.

> (fwiw, it's a bit of a shame python-qemu-qmp didn't import git history from
> qemu.. we did better with libslirp. If we ever move code in standalone
> repositories again, we should be careful to keep history with it)

Yes, we should preserve history whenever practical.

[...]

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

2022-08-11 Thread Marc-André Lureau

On Thu, Aug 11, 2022 at 1:05 PM Markus Armbruster  wrote:

> Marc-André Lureau  writes:
>
> > Hi
> >
> > On Thu, Aug 11, 2022 at 10:52 AM Markus Armbruster 
> > wrote:
> >
> >> Marc-André Lureau  writes:
> >>
> >> > Hi
> >> >
> >> > On Fri, Aug 5, 2022 at 12:12 PM Markus Armbruster 
> >> wrote:
> >> >
> >> >> marcandre.lur...@redhat.com writes:
> >> >>
> >> >> > From: Marc-André Lureau 
> >> >> >
> >> >> > This is just moving qapi-gen.py and related subdir to qemu-common,
> to
> >> >> > ease review and proceed step by step. The following patches will
> move
> >> >> > related necessary code, tests etc.
> >> >> >
> >> >> > Signed-off-by: Marc-André Lureau 
> >> >>
> >> >> As moved files tend to become low-level annoyances for a long time,
> I'd
> >> >> like to understand why you want to move them.  The commit message
> says
> >> >> "to ease review", which I suspect isn't the real reason.  Perhaps you
> >> >> explained all that elsewhere already, but I missed it.
> >> >>
> >> >>
> >> >>
> >> > The end goal is to split some projects, such as qemu-ga, to standalone
> >> > meson projects/subprojects. We will be able to build them
> independently
> >> > from the rest of QEMU, and later on perhaps handle them outside of
> QEMU
> >> > main repository. To achieve this, I first introduce a qemu-common
> >> > subproject, where qapi and common units are provided. You can check
> >> > https://gitlab.com/marcandre.lureau/qemu/-/commits/qga for a sneak
> peek at
> >> > current end result.
> >>
> >> I worry this move of the QAPI generator code into
> >> subjprojects/common/scripts/qapi/ will be followed by a move into its
> >> own subproject.
> >>
> >
> > Do you mean: it could be moved again to another smaller subproject? not
> > really, see below
> >
> >
> >> Ignorant question: could we turn the QAPI generator into a subproject in
> >> place?
> >>
> >
> > If it's just the generator, probably the target would then be a python
> > project (not meson), similar to python-qemu-qmp.
> >
> > But I don't see much point, since it's not really a standalone python
> > module, it generates code, and that code needs most of what is in
> > qemu-common (see
> >
> https://gitlab.com/marcandre.lureau/qemu/-/tree/qga/subprojects/qemu-common
> ).
> > It's best to have it together imho. Maybe we can consider a different
> > naming or to be more careful not to add stuff that is not strictly needed
> > by qapi?
>
> I had a look at subjprojects/qemu-common in your qga branch.  Contents:
>
> * Subproject machinery
>
> * Some common headers (glib-compat.h), but not others (qemu/osdep.h).  I
>   guess it's whatever subjproject code needs.
>
>   Is subprojects/qemu-common/include/block/nvme.h there by accident?
>

It's a header shared with qemu-ga (the commit message explains)


>
> * Most of the QObject subsystem
>
>   qobject/block-qdict.c is left behind.
>

It's qemu block specific. Not needed to move at this point, it drags other
stuff iirc.


> * Most of the QAPI subsystem
>
>   Some visitors left behind: opts, forward, string input / output.  Hmm,
>   only the .c, the .h are in the subjproject.  Accident?
>

If they are not shared with qemu-ga, I didn't move them yet. They can stay
specific to qemu or specific subprojects, or we can decide to move them (if
that doesn't drag too much stuff along).


>
>   A bit of HMP support left behind.
>

Can you be more specific?


>
> * Parts of util/ and include/qemu/
>
>   Error reporting, key-value CLI, some C utilities, but not others
>   (e.g. qemu/atomic.h, but not qemu/atomic128.h).  I guess it's again
>   whatever subjproject code needs.
>
>
* Parts of the QAPI Schema subsystem
>
> Aside: MAINTAINERS mostly not updated.
>
>
That needs fixing.


> Your moves tear closely related code apart.  This is going to be a drag
> for developers in general and maintainers in particular.
>
> Ergonomics suffer when related code is in multiple places.  Having to
> switch between directories and remember where is what will a constant
> low-level pain.  Things that used to be simple & quick, like git-grep
> qapi/*.c, become more involved.
>
>
It's inevitable when a project grows. QEMU is already a very large project.
Over the years, we have structured the project, by moving things and
splitting in subdirectories. Imho, this is actually helpful in many ways,
and we got used to it easily hopefully.

Do you see fundamental reasons why qemu-ga or (qemu-img, qemu-nbd,
storage-daemon, virtiofsd, vhost-user-*, *helper, ivshmem* etc) need to be
tight to qemu code, release and management process? I am not saying it's
time to move them out of qemu project, but I believe it's helpful to have
code that is structured and can be compiled indepently.

And by having "standalone" subprojects, we can more easily evolve in new
directions, without touching the rest of the projects.

Hurts even when merely consuming the subsystem: when I see #include
> "qemu/foo.h", the straightforward include/qemu/foo.h may or may not do.
> When it d

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

2022-08-11 Thread Peter Maydell

On Thu, 11 Aug 2022 at 11:09, Marc-André Lureau
 wrote:
> On Thu, Aug 11, 2022 at 1:05 PM Markus Armbruster  wrote:
>> Your moves tear closely related code apart.  This is going to be a drag
>> for developers in general and maintainers in particular.
>>
>> Ergonomics suffer when related code is in multiple places.  Having to
>> switch between directories and remember where is what will a constant
>> low-level pain.  Things that used to be simple & quick, like git-grep
>> qapi/*.c, become more involved.
>>
>
> It's inevitable when a project grows. QEMU is already a very large project. 
> Over the years, we have structured the project, by moving things and 
> splitting in subdirectories. Imho, this is actually helpful in many ways, and 
> we got used to it easily hopefully.

I agree with this. But generally we've tried to do that by
having the subdirectory splitting and naming match the
structure of the codebase. The exception, which I strongly
dislike, is that contrib/ is a grabbag of random stuff.
subprojects/ is starting to feel like it is also turning
into "place where random stuff ends up"...

> Do you see fundamental reasons why qemu-ga or (qemu-img, qemu-nbd, 
> storage-daemon, virtiofsd, vhost-user-*, *helper, ivshmem* etc) need to be 
> tight to qemu code, release and management process? I am not saying it's time 
> to move them out of qemu project, but I believe it's helpful to have code 
> that is structured and can be compiled indepently.
>
> And by having "standalone" subprojects, we can more easily evolve in new 
> directions, without touching the rest of the projects.

As Markus says, your branch ends up moving most of qobject into
qemu-common/. We are never going to let that out of QEMU proper,
because we are never going to allow ourselves to be tied to API
compatibility with it as an external library. So anything that
needs qobject is never going to leave the QEMU codebase. Which
means that there's not much gain from shoving it into subproject/
IMHO.

If there's stuff that is reasonably independent then it may be
worth disentangling from our build process, but it has to be
actually independent, or made that way, first, I think. And,
as with libslirp, there needs to be a clear beneficiary who
would want to take that independent-sub-thingy and use it
entirely distinctly from QEMU.

thanks
-- PMM

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

2022-08-11 Thread Marc-André Lureau

Hi

On Thu, Aug 11, 2022 at 2:22 PM Peter Maydell 
wrote:

> On Thu, 11 Aug 2022 at 11:09, Marc-André Lureau
>  wrote:
> > On Thu, Aug 11, 2022 at 1:05 PM Markus Armbruster 
> wrote:
> >> Your moves tear closely related code apart.  This is going to be a drag
> >> for developers in general and maintainers in particular.
> >>
> >> Ergonomics suffer when related code is in multiple places.  Having to
> >> switch between directories and remember where is what will a constant
> >> low-level pain.  Things that used to be simple & quick, like git-grep
> >> qapi/*.c, become more involved.
> >>
> >
> > It's inevitable when a project grows. QEMU is already a very large
> project. Over the years, we have structured the project, by moving things
> and splitting in subdirectories. Imho, this is actually helpful in many
> ways, and we got used to it easily hopefully.
>
> I agree with this. But generally we've tried to do that by
> having the subdirectory splitting and naming match the
> structure of the codebase. The exception, which I strongly
> dislike, is that contrib/ is a grabbag of random stuff.
> subprojects/ is starting to feel like it is also turning
> into "place where random stuff ends up"...
>

Yes, most of contrib/* should probably be standalone projects. If only we
had some kind of common library/subproject :)

subproject/* is a meson *cough* convention (imposed location for
subprojects). If we don't want to depend on external released libraries,
that's just the way it is.


>
> > Do you see fundamental reasons why qemu-ga or (qemu-img, qemu-nbd,
> storage-daemon, virtiofsd, vhost-user-*, *helper, ivshmem* etc) need to be
> tight to qemu code, release and management process? I am not saying it's
> time to move them out of qemu project, but I believe it's helpful to have
> code that is structured and can be compiled indepently.
> >
> > And by having "standalone" subprojects, we can more easily evolve in new
> directions, without touching the rest of the projects.
>
> As Markus says, your branch ends up moving most of qobject into
> qemu-common/. We are never going to let that out of QEMU proper,
> because we are never going to allow ourselves to be tied to API
> compatibility with it as an external library. So anything that
>

Why is that? We do it with a lot of dependencies already, with stable APIs.

Furthermore, we don't "have to" be tied to a specific ABI/API, we can
continue to link statically and compile against a specific version. like
with libvfio-user today.

And at this point, I am _not_ proposing to have an extra "qemu-common"
repository. I don't think there are enough reasons to want that either.



> needs qobject is never going to leave the QEMU codebase. Which
>
means that there's not much gain from shoving it into subproject/
> IMHO.


(just to be extra clear, it's qobject not QOM we are talking about)

qobject is fundamental to all the QAPI related generated code. Why should
that remain tight to QEMU proper?


thanks

-- 
Marc-André Lureau

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

2022-08-11 Thread Markus Armbruster

Peter Maydell  writes:

> On Thu, 11 Aug 2022 at 11:09, Marc-André Lureau
>  wrote:
>> On Thu, Aug 11, 2022 at 1:05 PM Markus Armbruster  wrote:
>>> Your moves tear closely related code apart.  This is going to be a drag
>>> for developers in general and maintainers in particular.
>>>
>>> Ergonomics suffer when related code is in multiple places.  Having to
>>> switch between directories and remember where is what will a constant
>>> low-level pain.  Things that used to be simple & quick, like git-grep
>>> qapi/*.c, become more involved.
>>>
>>
>> It's inevitable when a project grows. QEMU is already a very large project. 
>> Over the years, we have structured the project, by moving things and 
>> splitting in subdirectories. Imho, this is actually helpful in many ways, 
>> and we got used to it easily hopefully.
>
> I agree with this. But generally we've tried to do that by
> having the subdirectory splitting and naming match the
> structure of the codebase.

Good: move a bunch of closely related files from . (tree root) and
include/ to new ./whatever/ and include/whatever/.  The improvement in
organization outweighs the pain of moving.

Not so good: split some files off ./whatever into ./else/where/, even
though they are closely related.

[...]

[PATCH v15 1/6] qmp: add QMP command x-query-virtio

2022-08-11 Thread Jonah Palmer

From: Laurent Vivier 

This new command lists all the instances of VirtIODevices with
their canonical QOM path and name.

[Jonah: @virtio_list duplicates information that already exists in
 the QOM composition tree. However, extracting necessary information
 from this tree seems to be a bit convoluted.

 Instead, we still create our own list of realized virtio devices
 but use @qmp_qom_get with the device's canonical QOM path to confirm
 that the device exists and is realized. If the device exists but
 is actually not realized, then we remove it from our list (for
 synchronicity to the QOM composition tree).

 Also, the QMP command @x-query-virtio is redundant as @qom-list
 and @qom-get are sufficient to search '/machine/' for realized
 virtio devices. However, @x-query-virtio is much more convenient
 in listing realized virtio devices.]

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/meson.build  |  2 ++
 hw/virtio/virtio-stub.c| 14 
 hw/virtio/virtio.c | 44 
 include/hw/virtio/virtio.h |  1 +
 qapi/meson.build   |  1 +
 qapi/qapi-schema.json  |  1 +
 qapi/virtio.json   | 68 ++
 tests/qtest/qmp-cmd-test.c |  1 +
 8 files changed, 132 insertions(+)
 create mode 100644 hw/virtio/virtio-stub.c
 create mode 100644 qapi/virtio.json

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 7e8877fd64..e16f1b22d4 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -60,4 +60,6 @@ virtio_ss.add_all(when: 'CONFIG_VIRTIO_PCI', if_true: 
virtio_pci_ss)
 specific_ss.add_all(when: 'CONFIG_VIRTIO', if_true: virtio_ss)
 softmmu_ss.add_all(when: 'CONFIG_VIRTIO', if_true: softmmu_virtio_ss)
 softmmu_ss.add(when: 'CONFIG_VIRTIO', if_false: files('vhost-stub.c'))
+softmmu_ss.add(when: 'CONFIG_VIRTIO', if_false: files('virtio-stub.c'))
 softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
+softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('virtio-stub.c'))
diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
new file mode 100644
index 00..05a81edc92
--- /dev/null
+++ b/hw/virtio/virtio-stub.c
@@ -0,0 +1,14 @@
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-virtio.h"
+
+static void *qmp_virtio_unsupported(Error **errp)
+{
+error_setg(errp, "Virtio is disabled");
+return NULL;
+}
+
+VirtioInfoList *qmp_x_query_virtio(Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..bdfa82e9c0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -13,12 +13,18 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qapi-commands-virtio.h"
+#include "qapi/qapi-commands-qom.h"
+#include "qapi/qapi-visit-virtio.h"
+#include "qapi/qmp/qjson.h"
 #include "cpu.h"
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
+#include "qom/object_interfaces.h"
 #include "hw/virtio/virtio.h"
 #include "migration/qemu-file-types.h"
 #include "qemu/atomic.h"
@@ -29,6 +35,9 @@
 #include "sysemu/runstate.h"
 #include "standard-headers/linux/virtio_ids.h"
 
+/* QAPI list of realized VirtIODevices */
+static QTAILQ_HEAD(, VirtIODevice) virtio_list;
+
 /*
  * The alignment to use between consumer and producer parts of vring.
  * x86 pagesize again. This is the default, used by transports like PCI
@@ -3698,6 +3707,7 @@ static void virtio_device_realize(DeviceState *dev, Error 
**errp)
 vdev->listener.commit = virtio_memory_listener_commit;
 vdev->listener.name = "virtio";
 memory_listener_register(&vdev->listener, vdev->dma_as);
+QTAILQ_INSERT_TAIL(&virtio_list, vdev, next);
 }
 
 static void virtio_device_unrealize(DeviceState *dev)
@@ -3712,6 +3722,7 @@ static void virtio_device_unrealize(DeviceState *dev)
 vdc->unrealize(dev);
 }
 
+QTAILQ_REMOVE(&virtio_list, vdev, next);
 g_free(vdev->bus_name);
 vdev->bus_name = NULL;
 }
@@ -3885,6 +3896,8 @@ static void virtio_device_class_init(ObjectClass *klass, 
void *data)
 vdc->stop_ioeventfd = virtio_device_stop_ioeventfd_impl;
 
 vdc->legacy_features |= VIRTIO_LEGACY_FEATURES;
+
+QTAILQ_INIT(&virtio_list);
 }
 
 bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
@@ -3895,6 +3908,37 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
 return virtio_bus_ioeventfd_enabled(vbus);
 }
 
+VirtioInfoList *qmp_x_query_virtio(Error **errp)
+{
+VirtioInfoList *list = NULL;
+VirtioInfoList *node;
+VirtIODevice *vdev;
+
+QTAILQ_FOREACH(vdev, &virtio_list, next) {
+DeviceState *dev = DEVICE(vdev);
+Error *err = NULL;
+QObject *obj = qmp_qom_get(dev->canonical_path, "realized", &err);
+
+if (err == NULL) {
+GString *is_realized = qobject_to_json_pretty(obj, true);
+/* v

[PATCH v15 0/6] hmp,qmp: Add commands to introspect virtio devices

2022-08-11 Thread Jonah Palmer

This series introduces new QMP/HMP commands to dump the status of a
virtio device at different levels.

[Jonah: Rebasing from previous patchset from Apr. 1 (v14). Original patches
 are by Laurent Vivier from May 2020.

 I sincerely apologize for the *massive* delay in getting this latest v15
 series out. It was a perferct storm of other more pressing issues, time off,
 and trying to understand why I was seeing this 30 feature bit in my PCI virtio
 devices. Please see patch 3/6 for more explanation on this issue.

 Rebase from v14 to v15 includes: adding the missing sign-off-by from the 
poster,
 renaming & moving all virtio device feature map definitions to 
hw/virtio/virtio.c,
 including brief descriptions for all status & feature bits, a new virtio device
 feature map defined for virtio-rng, and mappings for virtio/vhost-vsock,
 virtio-iommu, virtio-mem, and virtio transport features updated with their
 newest feature bits. Most of these changes can be found in patch 3/6.

 And again, sorry for the long wait on this.]


1. List available virtio devices in the machine

HMP Form:

info virtio

Example:

(qemu) info virtio
/machine/peripheral-anon/device[2]/virtio-backend [virtio-scsi]
/machine/peripheral/vsock0/virtio-backend [vhost-vsock]
/machine/peripheral/crypto0/virtio-backend [virtio-crypto]
/machine/peripheral-anon/device[1]/virtio-backend [virtio-net]
/machine/peripheral-anon/device[0]/virtio-backend [virtio-serial]

QMP Form:

{ 'command': 'x-query-virtio',
  'returns': ['VirtioInfo'],
  'features': [ 'unstable' ] }

Example:

-> { "execute": "x-query-virtio" }
<- { "return": [
 {
 "name": "virtio-scsi",
 "path": "/machine/peripheral-anon/device[2]/virtio-backend"
 },
 {
 "name": "vhost-vsock",
 "path": "/machine/peripheral/vsock0/virtio-backend"
 },
 {
 "name": "virtio-crypto",
 "path": "/machine/peripheral/crypto0/virtio-backend"
 },
 {
 "name": "virtio-net",
 "path": "/machine/peripheral-anon/device[1]/virtio-backend"
 },
 {
 "name": "virtio-serial",
 "path": "/machine/peripheral-anon/device[0]/virtio-backend"
 }
 ]
   }

2. Display status of a given virtio device

HMP Form:

info virtio-status 

Example:

(qemu) info virtio-status /machine/peripheral/vsock0/virtio-backend
/machine/peripheral/vsock0/virtio-backend:
  device_name: vhost-vsock (vhost)
  device_id:   19
  vhost_started:   true
  bus_name:(null)
  broken:  false
  disabled:false
  disable_legacy_check:false
  started: true
  use_started: true
  start_on_kick:   false
  use_guest_notifier_mask: true
  vm_running:  true
  num_vqs: 3
  queue_sel:   2
  isr: 0
  endianness:  little
  status:
VIRTIO_CONFIG_S_ACKNOWLEDGE: Valid virtio device found,
VIRTIO_CONFIG_S_DRIVER: Guest OS compatible with device,
VIRTIO_CONFIG_S_FEATURES_OK: Feature negotiation complete,
VIRTIO_CONFIG_S_DRIVER_OK: Driver setup and ready
  Guest features:
VIRTIO_RING_F_EVENT_IDX: Used & avail. event fields enabled,
VIRTIO_RING_F_INDIRECT_DESC: Indirect descriptors supported,
VIRTIO_F_VERSION_1: Device compliant for v1 spec (legacy)
  Host features:
VIRTIO_RING_F_EVENT_IDX: Used & avail. event fields enabled,
VIRTIO_RING_F_INDIRECT_DESC: Indirect descriptors supported,
VIRTIO_F_VERSION_1: Device compliant for v1 spec (legacy),
VIRTIO_F_ANY_LAYOUT: Device accepts arbitrary desc. layouts,
VIRTIO_F_NOTIFY_ON_EMPTY: Notify when device runs out of avail. 
descs. on VQ
VHOST_USER_F_PROTOCOL_FEATURES: Vhost-user protocol features 
negotiation supported
  Backend features:

  VHost:
  nvqs:   2
  vq_index:   0
  max_queues: 0
  n_mem_sections: 4
  n_tmp_sections: 4
  backend_cap:0
  log_enabled:false
  log_size:   0
  Features:
  VIRTIO_RING_F_EVENT_IDX: Used & avail. event fields enabled,
  VIRTIO_RING_F_INDIRECT_DESC: Indirect descriptors supported,

[PATCH v15 3/6] qmp: decode feature & status bits in virtio-status

2022-08-11 Thread Jonah Palmer

From: Laurent Vivier 

Display feature names instead of bitmaps for host, guest, and
backend for VirtIODevices.

Display status names instead of bitmaps for VirtIODevices.

Display feature names instead of bitmaps for backend, protocol,
acked, and features (hdev->features) for vhost devices.

Decode features according to device ID. Decode statuses
according to configuration status bitmap (config_status_map).
Decode vhost user protocol features according to vhost user
protocol bitmap (vhost_user_protocol_map).

Transport features are on the first line. Undecoded bits (if
any) are stored in a separate field.

[Jonah: Several changes made to this patch from prev. version (v14):
 - Moved all device features mappings to hw/virtio/virtio.c
 - Renamed device features mappings (less generic)
 - Generalized @FEATURE_ENTRY macro for all device mappings
 - Virtio device feature map definitions include descriptions of
   feature bits
 - Moved @VHOST_USER_F_PROTOCOL_FEATURES feature bit from transport
   feature map to vhost-user-supported device feature mappings
   (blk, fs, i2c, rng, net, gpu, input, scsi, vsock)
 - New feature bit added for virtio-vsock: @VIRTIO_VSOCK_F_SEQPACKET
 - New feature bit added for virtio-iommu: @VIRTIO_IOMMU_F_BYPASS_CONFIG
 - New feature bit added for virtio-mem: @VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
 - New virtio transport feature bit added: @VIRTIO_F_IN_ORDER
 - Added device feature map definition for virtio-rng
]

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---

 As mentioned in the cover letter, there's a bit of confusion for me here
 regarding a 30 feature bit that I've been seeing in all of the PCI virtio
 devices that I've defined when bringing up a guest. The only 30 feature
 bits that are defined are @VHOST_USER_F_PROTOCOL_FEATURES &
 @VIRTIO_F_BAD_FEATURE.

 Initially, I thought that this must be the vhost-user 'protocol-features'
 feature bit (since the 'bad-feature' bit would mean negotiation is broken),
 but I noticed that this bit was still being set on devices that don't even
 have vhost-user support (e.g. virtio-serial), nor was I using vhost-user on
 any of my virtio devices.

 On closer inspection, with a little printf debugging, I confirmed my
 suspicion that every PCI virtio device *is* in fact having this 'bad-feature'
 feature set. Futhermore, *none* of my devices actually set the
 'protocol-features' feature (which makes sense since I wasn't using
 vhost-user).

 What doesn't make sense to me though is why this 'bad-feature' bit is
 being set at all, as it implies that negotiation has failed for the device.

 In short, I've left out defining the 'bad-feature' feature bit in the
 transport features map to avoid confusion. However, I'm afraid that this may
 cause even more confusion since (1) non-vhost-user-supported devices will
 show 'unknown-dev-features' (e.g. virtio-serial) and (2) vhost-user-supported
 devices (e.g. virtio-net) *not* using vhost-user will show
 @VHOST_USER_F_PROTOCOL_FEATURES when in reality it's
 @VIRTIO_F_BAD_FEATURE.

 Please let me know how you would like me to address this. Thanks.

 hw/virtio/virtio.c | 643 -
 include/hw/virtio/vhost.h  |   3 +
 include/hw/virtio/virtio.h |   5 +
 qapi/virtio.json   | 251 +--
 4 files changed, 874 insertions(+), 28 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 3e0a484660..23bdc77a95 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -34,10 +34,433 @@
 #include "sysemu/dma.h"
 #include "sysemu/runstate.h"
 #include "standard-headers/linux/virtio_ids.h"
+#include "standard-headers/linux/vhost_types.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "standard-headers/linux/virtio_console.h"
+#include "standard-headers/linux/virtio_gpu.h"
+#include "standard-headers/linux/virtio_net.h"
+#include "standard-headers/linux/virtio_scsi.h"
+#include "standard-headers/linux/virtio_i2c.h"
+#include "standard-headers/linux/virtio_balloon.h"
+#include "standard-headers/linux/virtio_iommu.h"
+#include "standard-headers/linux/virtio_mem.h"
+#include "standard-headers/linux/virtio_vsock.h"
+#include CONFIG_DEVICES
 
 /* QAPI list of realized VirtIODevices */
 static QTAILQ_HEAD(, VirtIODevice) virtio_list;
 
+/*
+ * Maximum size of virtio device config space
+ */
+#define VHOST_USER_MAX_CONFIG_SIZE 256
+
+#define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \
+{ .virtio_bit = name, .feature_desc = desc }
+
+enum VhostUserProtocolFeature {
+VHOST_USER_PROTOCOL_F_MQ = 0,
+VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
+VHOST_USER_PROTOCOL_F_RARP = 2,
+VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
+VHOST_USER_PROTOCOL_F_NET_MTU = 4,
+VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
+VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
+VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
+VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
+VHOST_USER_PROTOCOL_F_CONFIG = 9,
+VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,

[PATCH v15 5/6] qmp: add QMP command x-query-virtio-queue-element

2022-08-11 Thread Jonah Palmer

From: Laurent Vivier 

This new command shows the information of a VirtQueue element.

[Note: Up until v10 of this patch series, virtio.json had many (15+)
 enums defined (e.g. decoded device features, statuses, etc.). In v10
 most of these enums were removed and replaced with string literals.
 By doing this we get (1) simpler schema, (2) smaller generated code,
 and (3) less maintenance burden for when new things are added (e.g.
 devices, device features, etc.).]

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-stub.c |   9 ++
 hw/virtio/virtio.c  | 154 +++
 qapi/virtio.json| 197 
 3 files changed, 360 insertions(+)

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index 13e5f93652..7ddb22cc5e 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -31,3 +31,12 @@ VirtQueueStatus *qmp_x_query_virtio_queue_status(const char 
*path,
 {
 return qmp_virtio_unsupported(errp);
 }
+
+VirtioQueueElement *qmp_x_query_virtio_queue_element(const char *path,
+ uint16_t queue,
+ bool has_index,
+ uint16_t index,
+ Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index dc73b82b38..c6e244a3c9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -823,6 +823,19 @@ static inline void vring_used_write(VirtQueue *vq, 
VRingUsedElem *uelem,
 address_space_cache_invalidate(&caches->used, pa, sizeof(VRingUsedElem));
 }
 
+/* Called within rcu_read_lock(). */
+static inline uint16_t vring_used_flags(VirtQueue *vq)
+{
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
+hwaddr pa = offsetof(VRingUsed, flags);
+
+if (!caches) {
+return 0;
+}
+
+return virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
+}
+
 /* Called within rcu_read_lock().  */
 static uint16_t vring_used_idx(VirtQueue *vq)
 {
@@ -4773,6 +4786,147 @@ VirtQueueStatus *qmp_x_query_virtio_queue_status(const 
char *path,
 return status;
 }
 
+static strList *qmp_decode_vring_desc_flags(uint16_t flags)
+{
+strList *list = NULL;
+strList *node;
+int i;
+
+struct {
+uint16_t flag;
+const char *value;
+} map[] = {
+{ VRING_DESC_F_NEXT, "next" },
+{ VRING_DESC_F_WRITE, "write" },
+{ VRING_DESC_F_INDIRECT, "indirect" },
+{ 1 << VRING_PACKED_DESC_F_AVAIL, "avail" },
+{ 1 << VRING_PACKED_DESC_F_USED, "used" },
+{ 0, "" }
+};
+
+for (i = 0; map[i].flag; i++) {
+if ((map[i].flag & flags) == 0) {
+continue;
+}
+node = g_malloc0(sizeof(strList));
+node->value = g_strdup(map[i].value);
+node->next = list;
+list = node;
+}
+
+return list;
+}
+
+VirtioQueueElement *qmp_x_query_virtio_queue_element(const char *path,
+ uint16_t queue,
+ bool has_index,
+ uint16_t index,
+ Error **errp)
+{
+VirtIODevice *vdev;
+VirtQueue *vq;
+VirtioQueueElement *element = NULL;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIO device", path);
+return NULL;
+}
+
+if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) {
+error_setg(errp, "Invalid virtqueue number %d", queue);
+return NULL;
+}
+vq = &vdev->vq[queue];
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+error_setg(errp, "Packed ring not supported");
+return NULL;
+} else {
+unsigned int head, i, max;
+VRingMemoryRegionCaches *caches;
+MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+MemoryRegionCache *desc_cache;
+VRingDesc desc;
+VirtioRingDescList *list = NULL;
+VirtioRingDescList *node;
+int rc; int ndescs;
+
+RCU_READ_LOCK_GUARD();
+
+max = vq->vring.num;
+
+if (!has_index) {
+head = vring_avail_ring(vq, vq->last_avail_idx % vq->vring.num);
+} else {
+head = vring_avail_ring(vq, index % vq->vring.num);
+}
+i = head;
+
+caches = vring_get_region_caches(vq);
+if (!caches) {
+error_setg(errp, "Region caches not initialized");
+return NULL;
+}
+if (caches->desc.len < max * sizeof(VRingDesc)) {
+error_setg(errp, "Cannot map descriptor ring");
+return NULL;
+}
+
+desc_cache = &caches->desc;
+

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

2022-08-11 Thread Daniel P . Berrangé

On Thu, Aug 11, 2022 at 02:50:01PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Aug 11, 2022 at 2:22 PM Peter Maydell 
> wrote:
> 
> > On Thu, 11 Aug 2022 at 11:09, Marc-André Lureau
> >  wrote:
> > > On Thu, Aug 11, 2022 at 1:05 PM Markus Armbruster 
> > wrote:
> > >> Your moves tear closely related code apart.  This is going to be a drag
> > >> for developers in general and maintainers in particular.
> > >>
> > >> Ergonomics suffer when related code is in multiple places.  Having to
> > >> switch between directories and remember where is what will a constant
> > >> low-level pain.  Things that used to be simple & quick, like git-grep
> > >> qapi/*.c, become more involved.
> > >>
> > >
> > > It's inevitable when a project grows. QEMU is already a very large
> > project. Over the years, we have structured the project, by moving things
> > and splitting in subdirectories. Imho, this is actually helpful in many
> > ways, and we got used to it easily hopefully.
> >
> > I agree with this. But generally we've tried to do that by
> > having the subdirectory splitting and naming match the
> > structure of the codebase. The exception, which I strongly
> > dislike, is that contrib/ is a grabbag of random stuff.
> > subprojects/ is starting to feel like it is also turning
> > into "place where random stuff ends up"...
> >
> 
> Yes, most of contrib/* should probably be standalone projects. If only we
> had some kind of common library/subproject :)
> 
> subproject/* is a meson *cough* convention (imposed location for
> subprojects). If we don't want to depend on external released libraries,
> that's just the way it is.
> 
> 
> >
> > > Do you see fundamental reasons why qemu-ga or (qemu-img, qemu-nbd,
> > storage-daemon, virtiofsd, vhost-user-*, *helper, ivshmem* etc) need to be
> > tight to qemu code, release and management process? I am not saying it's
> > time to move them out of qemu project, but I believe it's helpful to have
> > code that is structured and can be compiled indepently.
> > >
> > > And by having "standalone" subprojects, we can more easily evolve in new
> > directions, without touching the rest of the projects.
> >
> > As Markus says, your branch ends up moving most of qobject into
> > qemu-common/. We are never going to let that out of QEMU proper,
> > because we are never going to allow ourselves to be tied to API
> > compatibility with it as an external library. So anything that
> >
> 
> Why is that? We do it with a lot of dependencies already, with stable APIs.
> 
> Furthermore, we don't "have to" be tied to a specific ABI/API, we can
> continue to link statically and compile against a specific version. like
> with libvfio-user today.
> 
> And at this point, I am _not_ proposing to have an extra "qemu-common"
> repository. I don't think there are enough reasons to want that either.
> 
> 
> 
> > needs qobject is never going to leave the QEMU codebase. Which
> >
> means that there's not much gain from shoving it into subproject/
> > IMHO.
> 
> 
> (just to be extra clear, it's qobject not QOM we are talking about)
> 
> qobject is fundamental to all the QAPI related generated code. Why should
> that remain tight to QEMU proper?

Neither qobject nor QOM have ever been designed to be public APIs.
Though admittedly qobject is quite a bit simpler as an API, I'm
not convinced its current design is something we want to consider
public. As an example, just last month Markus proposed changing
QDict's implementation

https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00758.html


If we want external projects to be able to take advantage of QAPI,
the bare minimum we need to be public is a QAPI parser, from which
people can then build any code generators desired.

We don't neccessarily need the current QAPI C code generator. There
could be a new C generator that didn't use qobject, but instead used
some standard GLib types like GHashTable/GList instead of QDict/QList.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v15 4/6] qmp: add QMP commands for virtio/vhost queue-status

2022-08-11 Thread Jonah Palmer

From: Laurent Vivier 

These new commands show the internal status of a VirtIODevice's
VirtQueue and a vhost device's vhost_virtqueue (if active).

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-stub.c |  14 +++
 hw/virtio/virtio.c  | 103 
 qapi/virtio.json| 256 
 3 files changed, 373 insertions(+)

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index 0b432e8de7..13e5f93652 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -17,3 +17,17 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, 
Error **errp)
 {
 return qmp_virtio_unsupported(errp);
 }
+
+VirtVhostQueueStatus *qmp_x_query_virtio_vhost_queue_status(const char *path,
+uint16_t queue,
+Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
+
+VirtQueueStatus *qmp_x_query_virtio_queue_status(const char *path,
+ uint16_t queue,
+ Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 23bdc77a95..dc73b82b38 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -4670,6 +4670,109 @@ VirtioStatus *qmp_x_query_virtio_status(const char 
*path, Error **errp)
 return status;
 }
 
+VirtVhostQueueStatus *qmp_x_query_virtio_vhost_queue_status(const char *path,
+uint16_t queue,
+Error **errp)
+{
+VirtIODevice *vdev;
+VirtVhostQueueStatus *status;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIODevice", path);
+return NULL;
+}
+
+if (!vdev->vhost_started) {
+error_setg(errp, "Error: vhost device has not started yet");
+return NULL;
+}
+
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+struct vhost_dev *hdev = vdc->get_vhost(vdev);
+
+if (queue < hdev->vq_index || queue >= hdev->vq_index + hdev->nvqs) {
+error_setg(errp, "Invalid vhost virtqueue number %d", queue);
+return NULL;
+}
+
+status = g_new0(VirtVhostQueueStatus, 1);
+status->name = g_strdup(vdev->name);
+status->kick = hdev->vqs[queue].kick;
+status->call = hdev->vqs[queue].call;
+status->desc = (uintptr_t)hdev->vqs[queue].desc;
+status->avail = (uintptr_t)hdev->vqs[queue].avail;
+status->used = (uintptr_t)hdev->vqs[queue].used;
+status->num = hdev->vqs[queue].num;
+status->desc_phys = hdev->vqs[queue].desc_phys;
+status->desc_size = hdev->vqs[queue].desc_size;
+status->avail_phys = hdev->vqs[queue].avail_phys;
+status->avail_size = hdev->vqs[queue].avail_size;
+status->used_phys = hdev->vqs[queue].used_phys;
+status->used_size = hdev->vqs[queue].used_size;
+
+return status;
+}
+
+VirtQueueStatus *qmp_x_query_virtio_queue_status(const char *path,
+ uint16_t queue,
+ Error **errp)
+{
+VirtIODevice *vdev;
+VirtQueueStatus *status;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIODevice", path);
+return NULL;
+}
+
+if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) {
+error_setg(errp, "Invalid virtqueue number %d", queue);
+return NULL;
+}
+
+status = g_new0(VirtQueueStatus, 1);
+status->name = g_strdup(vdev->name);
+status->queue_index = vdev->vq[queue].queue_index;
+status->inuse = vdev->vq[queue].inuse;
+status->vring_num = vdev->vq[queue].vring.num;
+status->vring_num_default = vdev->vq[queue].vring.num_default;
+status->vring_align = vdev->vq[queue].vring.align;
+status->vring_desc = vdev->vq[queue].vring.desc;
+status->vring_avail = vdev->vq[queue].vring.avail;
+status->vring_used = vdev->vq[queue].vring.used;
+status->used_idx = vdev->vq[queue].used_idx;
+status->signalled_used = vdev->vq[queue].signalled_used;
+status->signalled_used_valid = vdev->vq[queue].signalled_used_valid;
+
+if (vdev->vhost_started) {
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+struct vhost_dev *hdev = vdc->get_vhost(vdev);
+
+/* check if vq index exists for vhost as well  */
+if (queue >= hdev->vq_index && queue < hdev->vq_index + hdev->nvqs) {
+status->has_last_avail_idx = true;
+
+int vhost_vq_index =
+hdev->vhost_ops->vhost_get_vq_index(hdev, queue);
+struct vhost_vring_state state = {
+.index = vhost_vq_index,
+};
+
+status->last_avail_idx =
+

[PATCH v15 2/6] qmp: add QMP command x-query-virtio-status

2022-08-11 Thread Jonah Palmer

From: Laurent Vivier 

This new command shows the status of a VirtIODevice, including
its corresponding vhost device's status (if active).

Next patch will improve output by decoding feature bits, including
vhost device's feature bits (backend, protocol, acked, and features).
Also will decode status bits of a VirtIODevice.

[Jonah: From patch v12; added a check to @virtio_device_find to ensure
 synchronicity between @virtio_list and the devices in the QOM
 composition tree.]

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-stub.c |   5 +
 hw/virtio/virtio.c  | 104 +++
 qapi/virtio.json| 222 
 3 files changed, 331 insertions(+)

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index 05a81edc92..0b432e8de7 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -12,3 +12,8 @@ VirtioInfoList *qmp_x_query_virtio(Error **errp)
 {
 return qmp_virtio_unsupported(errp);
 }
+
+VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index bdfa82e9c0..3e0a484660 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3939,6 +3939,110 @@ VirtioInfoList *qmp_x_query_virtio(Error **errp)
 return list;
 }
 
+static VirtIODevice *virtio_device_find(const char *path)
+{
+VirtIODevice *vdev;
+
+QTAILQ_FOREACH(vdev, &virtio_list, next) {
+DeviceState *dev = DEVICE(vdev);
+
+if (strcmp(dev->canonical_path, path) != 0) {
+continue;
+}
+
+Error *err = NULL;
+QObject *obj = qmp_qom_get(dev->canonical_path, "realized", &err);
+if (err == NULL) {
+GString *is_realized = qobject_to_json_pretty(obj, true);
+/* virtio device is NOT realized, remove it from list */
+if (!strncmp(is_realized->str, "false", 4)) {
+g_string_free(is_realized, true);
+qobject_unref(obj);
+QTAILQ_REMOVE(&virtio_list, vdev, next);
+return NULL;
+}
+g_string_free(is_realized, true);
+} else {
+/* virtio device doesn't exist in QOM tree */
+QTAILQ_REMOVE(&virtio_list, vdev, next);
+qobject_unref(obj);
+return NULL;
+}
+/* device exists in QOM tree & is realized */
+qobject_unref(obj);
+return vdev;
+}
+return NULL;
+}
+
+VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
+{
+VirtIODevice *vdev;
+VirtioStatus *status;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIODevice", path);
+return NULL;
+}
+
+status = g_new0(VirtioStatus, 1);
+status->name = g_strdup(vdev->name);
+status->device_id = vdev->device_id;
+status->vhost_started = vdev->vhost_started;
+status->guest_features = vdev->guest_features;
+status->host_features = vdev->host_features;
+status->backend_features = vdev->backend_features;
+
+switch (vdev->device_endian) {
+case VIRTIO_DEVICE_ENDIAN_LITTLE:
+status->device_endian = g_strdup("little");
+break;
+case VIRTIO_DEVICE_ENDIAN_BIG:
+status->device_endian = g_strdup("big");
+break;
+default:
+status->device_endian = g_strdup("unknown");
+break;
+}
+
+status->num_vqs = virtio_get_num_queues(vdev);
+status->status = vdev->status;
+status->isr = vdev->isr;
+status->queue_sel = vdev->queue_sel;
+status->vm_running = vdev->vm_running;
+status->broken = vdev->broken;
+status->disabled = vdev->disabled;
+status->use_started = vdev->use_started;
+status->started = vdev->started;
+status->start_on_kick = vdev->start_on_kick;
+status->disable_legacy_check = vdev->disable_legacy_check;
+status->bus_name = g_strdup(vdev->bus_name);
+status->use_guest_notifier_mask = vdev->use_guest_notifier_mask;
+status->has_vhost_dev = vdev->vhost_started;
+
+if (vdev->vhost_started) {
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+struct vhost_dev *hdev = vdc->get_vhost(vdev);
+
+status->vhost_dev = g_new0(VhostStatus, 1);
+status->vhost_dev->n_mem_sections = hdev->n_mem_sections;
+status->vhost_dev->n_tmp_sections = hdev->n_tmp_sections;
+status->vhost_dev->nvqs = hdev->nvqs;
+status->vhost_dev->vq_index = hdev->vq_index;
+status->vhost_dev->features = hdev->features;
+status->vhost_dev->acked_features = hdev->acked_features;
+status->vhost_dev->backend_features = hdev->backend_features;
+status->vhost_dev->protocol_features = hdev->protocol_features;
+status->vhost_dev->max_queues = hdev->max_queues;
+status->vhost_dev->backend_cap = hdev->backend_cap;
+

[PATCH v15 6/6] hmp: add virtio commands

2022-08-11 Thread Jonah Palmer

From: Laurent Vivier 

This patch implements the HMP versions of the virtio QMP commands.

[Jonah: Adjusted hmp monitor output format for features / statuses
with their descriptions.]

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
 hmp-commands-info.hx  |  70 ++
 include/monitor/hmp.h |   5 +
 monitor/hmp-cmds.c| 310 ++
 3 files changed, 385 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 188d9ece3b..58cfa86638 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -921,3 +921,73 @@ SRST
   ``stats``
 Show runtime-collected statistics
 ERST
+
+{
+.name  = "virtio",
+.args_type = "",
+.params= "",
+.help  = "List all available virtio devices",
+.cmd   = hmp_virtio_query,
+.flags = "p",
+},
+
+SRST
+  ``info virtio``
+List all available virtio devices
+ERST
+
+{
+.name  = "virtio-status",
+.args_type = "path:s",
+.params= "path",
+.help  = "Display status of a given virtio device",
+.cmd   = hmp_virtio_status,
+.flags = "p",
+},
+
+SRST
+  ``info virtio-status`` *path*
+Display status of a given virtio device
+ERST
+
+{
+.name  = "virtio-queue-status",
+.args_type = "path:s,queue:i",
+.params= "path queue",
+.help  = "Display status of a given virtio queue",
+.cmd   = hmp_virtio_queue_status,
+.flags = "p",
+},
+
+SRST
+  ``info virtio-queue-status`` *path* *queue*
+Display status of a given virtio queue
+ERST
+
+{
+.name  = "virtio-vhost-queue-status",
+.args_type = "path:s,queue:i",
+.params= "path queue",
+.help  = "Display status of a given vhost queue",
+.cmd   = hmp_vhost_queue_status,
+.flags = "p",
+},
+
+SRST
+  ``info virtio-vhost-queue-status`` *path* *queue*
+Display status of a given vhost queue
+ERST
+
+{
+.name   = "virtio-queue-element",
+.args_type  = "path:s,queue:i,index:i?",
+.params = "path queue [index]",
+.help   = "Display element of a given virtio queue",
+.cmd= hmp_virtio_queue_element,
+.flags  = "p",
+},
+
+SRST
+  ``info virtio-queue-element`` *path* *queue* [*index*]
+Display element of a given virtio queue
+ERST
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index a618eb1e4e..a9cf064ee8 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -95,6 +95,11 @@ void hmp_qom_list(Monitor *mon, const QDict *qdict);
 void hmp_qom_get(Monitor *mon, const QDict *qdict);
 void hmp_qom_set(Monitor *mon, const QDict *qdict);
 void hmp_info_qom_tree(Monitor *mon, const QDict *dict);
+void hmp_virtio_query(Monitor *mon, const QDict *qdict);
+void hmp_virtio_status(Monitor *mon, const QDict *qdict);
+void hmp_virtio_queue_status(Monitor *mon, const QDict *qdict);
+void hmp_vhost_queue_status(Monitor *mon, const QDict *qdict);
+void hmp_virtio_queue_element(Monitor *mon, const QDict *qdict);
 void object_add_completion(ReadLineState *rs, int nb_args, const char *str);
 void object_del_completion(ReadLineState *rs, int nb_args, const char *str);
 void device_add_completion(ReadLineState *rs, int nb_args, const char *str);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index c6cd6f91dd..0934dbd557 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -43,6 +43,8 @@
 #include "qapi/qapi-commands-stats.h"
 #include "qapi/qapi-commands-tpm.h"
 #include "qapi/qapi-commands-ui.h"
+#include "qapi/qapi-commands-virtio.h"
+#include "qapi/qapi-visit-virtio.h"
 #include "qapi/qapi-visit-net.h"
 #include "qapi/qapi-visit-migration.h"
 #include "qapi/qmp/qdict.h"
@@ -2472,3 +2474,311 @@ exit:
 exit_no_print:
 error_free(err);
 }
+
+static void hmp_virtio_dump_protocols(Monitor *mon,
+  VhostDeviceProtocols *pcol)
+{
+strList *pcol_list = pcol->protocols;
+while (pcol_list) {
+monitor_printf(mon, "\t%s", pcol_list->value);
+pcol_list = pcol_list->next;
+if (pcol_list != NULL) {
+monitor_printf(mon, ",\n");
+}
+}
+monitor_printf(mon, "\n");
+if (pcol->has_unknown_protocols) {
+monitor_printf(mon, "  unknown-protocols(0x%016"PRIx64")\n",
+   pcol->unknown_protocols);
+}
+}
+
+static void hmp_virtio_dump_status(Monitor *mon,
+   VirtioDeviceStatus *status)
+{
+strList *status_list = status->statuses;
+while (status_list) {
+monitor_printf(mon, "\t%s", status_list->value);
+status_list = status_list->next;
+if (status_list != NULL) {
+monitor_printf(mon, ",\n");
+}
+}
+monitor_printf(mon, "\n");
+if (status->has_unknown_statuses) {
+

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

2022-08-11 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> On Thu, Aug 11, 2022 at 02:50:01PM +0400, Marc-André Lureau wrote:
>> Hi
>> 
>> On Thu, Aug 11, 2022 at 2:22 PM Peter Maydell 
>> wrote:

[...]

>> > As Markus says, your branch ends up moving most of qobject into
>> > qemu-common/. We are never going to let that out of QEMU proper,
>> > because we are never going to allow ourselves to be tied to API
>> > compatibility with it as an external library. So anything that
>> >
>> 
>> Why is that? We do it with a lot of dependencies already, with stable APIs.
>> 
>> Furthermore, we don't "have to" be tied to a specific ABI/API, we can
>> continue to link statically and compile against a specific version. like
>> with libvfio-user today.
>> 
>> And at this point, I am _not_ proposing to have an extra "qemu-common"
>> repository. I don't think there are enough reasons to want that either.
>> 
>> 
>> 
>> > needs qobject is never going to leave the QEMU codebase. Which
>> > means that there's not much gain from shoving it into subproject/
>> > IMHO.
>> 
>> 
>> (just to be extra clear, it's qobject not QOM we are talking about)
>> 
>> qobject is fundamental to all the QAPI related generated code. Why should
>> that remain tight to QEMU proper?
>
> Neither qobject nor QOM have ever been designed to be public APIs.
> Though admittedly qobject is quite a bit simpler as an API, I'm
> not convinced its current design is something we want to consider
> public. As an example, just last month Markus proposed changing
> QDict's implementation
>
> https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00758.html
>
>
> If we want external projects to be able to take advantage of QAPI,
> the bare minimum we need to be public is a QAPI parser, from which
> people can then build any code generators desired.

Basically scripts/qapi/ less the code generators.

Not sure a subproject would be a good fit.

Shot from the hip: could the build process spit out something external
projects could consume?  It's how "consumables" are commonly delivered.
E.g. .so + a bunch of headers.  Sometimes that gets packaged.  Sometimes
it gets copied into the consuming tree ("vendored").

> We don't neccessarily need the current QAPI C code generator. There
> could be a new C generator that didn't use qobject, but instead used
> some standard GLib types like GHashTable/GList instead of QDict/QList.

Yes, that should be possible.

[PATCH v2 3/8] parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT flushing

2022-08-11 Thread Alexander Ivanov

It's too costly to write all the BAT to the disk. Let the flush function
write only dirty blocks.
Use parallels_set_bat_entry for setting a BAT entry and marking a relevant
block as dirty.
Move bdrv_co_flush call outside the locked area.

v2: Patch order was changed so the replacement is done in parallels_co_check.
Now we use a helper to set BAT entry and mark the block dirty.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 7f68f3cbc9..6879ea4597 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -428,7 +428,6 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
 int64_t size, prev_off, high_off;
 int ret;
 uint32_t i;
-bool flush_bat = false;
 
 size = bdrv_getlength(bs->file->bs);
 if (size < 0) {
@@ -467,9 +466,8 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
 res->corruptions++;
 if (fix & BDRV_FIX_ERRORS) {
 prev_off = 0;
-s->bat_bitmap[i] = 0;
+parallels_set_bat_entry(s, i, 0);
 res->corruptions_fixed++;
-flush_bat = true;
 continue;
 }
 }
@@ -485,15 +483,6 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 prev_off = off;
 }
 
-ret = 0;
-if (flush_bat) {
-ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0);
-if (ret < 0) {
-res->check_errors++;
-goto out;
-}
-}
-
 res->image_end_offset = high_off + s->cluster_size;
 if (size > res->image_end_offset) {
 int64_t count;
@@ -522,6 +511,12 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 
 out:
 qemu_co_mutex_unlock(&s->lock);
+
+ret = bdrv_co_flush(bs);
+if (ret < 0) {
+res->check_errors++;
+}
+
 return ret;
 }
 
-- 
2.34.1

[PATCH v2 0/8] parallels: Refactor the code of images checks and fix a bug

2022-08-11 Thread Alexander Ivanov

Fix image inflation when offset in BAT is out of image.

Replace whole BAT syncing by flushing only dirty blocks.

Move all the checks outside the main check function in
separate functions

Use WITH_QEMU_LOCK_GUARD for more clean code.

Alexander Ivanov (8):
  parallels: Out of image offset in BAT leads to image inflation
  parallels: Move BAT entry setting to a separate function
  parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT
flushing
  parallels: Move check of unclean image to a separate function
  parallels: Move check of cluster outside image to a separate function
  parallels: Move check of leaks to a separate function
  parallels: Move statistic collection to a separate function
  parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD

 block/parallels.c | 188 --
 1 file changed, 132 insertions(+), 56 deletions(-)

-- 
2.34.1

[PATCH v2 1/8] parallels: Out of image offset in BAT leads to image inflation

2022-08-11 Thread Alexander Ivanov

When an image is opened, data_end field in BDRVParallelsState
is setted as the biggest offset in the BAT plus cluster size.
If there is a corrupted offset pointing outside the image,
the image size increase accordingly. It potentially leads
to attempts to create a file size of petabytes.

Set the data_end field with the original file size if the image
was opened for checking and repairing purposes or raise an error.

v2: No changes.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..a76cf9d993 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 BDRVParallelsState *s = bs->opaque;
 ParallelsHeader ph;
 int ret, size, i;
+int64_t file_size;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 char *buf;
@@ -811,6 +812,22 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }
 
+file_size = bdrv_getlength(bs->file->bs);
+if (file_size < 0) {
+goto fail;
+}
+
+file_size >>= BDRV_SECTOR_BITS;
+if (s->data_end > file_size) {
+if (flags & BDRV_O_CHECK) {
+s->data_end = file_size;
+} else {
+error_setg(errp, "parallels: Offset in BAT is out of image");
+ret = -EINVAL;
+goto fail;
+}
+}
+
 if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
 /* Image was not closed correctly. The check is mandatory */
 s->header_unclean = true;
-- 
2.34.1

[PATCH v2 8/8] parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD

2022-08-11 Thread Alexander Ivanov

Replace the way we use mutex in parallels_co_check() for more clean code.

v2: Fix an incorrect usage of WITH_QEMU_LOCK_GUARD.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index d0364182bb..e124a8bb7d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -553,24 +553,22 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 BDRVParallelsState *s = bs->opaque;
 int ret;
 
-qemu_co_mutex_lock(&s->lock);
+WITH_QEMU_LOCK_GUARD(&s->lock) {
+parallels_check_unclean(bs, res, fix);
 
-parallels_check_unclean(bs, res, fix);
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+return ret;
+}
 
-ret = parallels_check_outside_image(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
-ret = parallels_check_leak(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+return ret;
+}
 
-parallels_collect_statistics(bs, res, fix);
+parallels_collect_statistics(bs, res, fix);
 
-out:
-qemu_co_mutex_unlock(&s->lock);
+}
 
 ret = bdrv_co_flush(bs);
 if (ret < 0) {
-- 
2.34.1

[PATCH v2 7/8] parallels: Move statistic collection to a separate function

2022-08-11 Thread Alexander Ivanov

v2: Move fragmentation counting code to this function too.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 54 +++
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 8737eadfb4..d0364182bb 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -518,48 +518,56 @@ static int parallels_check_leak(BlockDriverState *bs,
 return 0;
 }
 
-static int coroutine_fn parallels_co_check(BlockDriverState *bs,
-   BdrvCheckResult *res,
-   BdrvCheckMode fix)
+static void parallels_collect_statistics(BlockDriverState *bs,
+ BdrvCheckResult *res,
+ BdrvCheckMode fix)
 {
 BDRVParallelsState *s = bs->opaque;
-int64_t prev_off;
-int ret;
+int64_t off, prev_off;
 uint32_t i;
 
-
-qemu_co_mutex_lock(&s->lock);
-
-parallels_check_unclean(bs, res, fix);
-
-ret = parallels_check_outside_image(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
-ret = parallels_check_leak(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
 res->bfi.total_clusters = s->bat_size;
 res->bfi.compressed_clusters = 0; /* compression is not supported */
 
 prev_off = 0;
 for (i = 0; i < s->bat_size; i++) {
-int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
 if (off == 0) {
 prev_off = 0;
 continue;
 }
 
-res->bfi.allocated_clusters++;
-
 if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
 res->bfi.fragmented_clusters++;
 }
+
 prev_off = off;
+res->bfi.allocated_clusters++;
 }
+}
+
+static int coroutine_fn parallels_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res,
+   BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int ret;
+
+qemu_co_mutex_lock(&s->lock);
+
+parallels_check_unclean(bs, res, fix);
+
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+parallels_collect_statistics(bs, res, fix);
 
 out:
 qemu_co_mutex_unlock(&s->lock);
-- 
2.34.1

[PATCH v2 4/8] parallels: Move check of unclean image to a separate function

2022-08-11 Thread Alexander Ivanov

v2: Revert the condition with s->header_unclean.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 6879ea4597..c53b2810cf 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -419,6 +419,25 @@ static coroutine_fn int 
parallels_co_readv(BlockDriverState *bs,
 return ret;
 }
 
+static void parallels_check_unclean(BlockDriverState *bs,
+BdrvCheckResult *res,
+BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+
+if (!s->header_unclean) {
+return;
+}
+
+fprintf(stderr, "%s image was not closed correctly\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
+res->corruptions++;
+if (fix & BDRV_FIX_ERRORS) {
+/* parallels_close will do the job right */
+res->corruptions_fixed++;
+s->header_unclean = false;
+}
+}
 
 static int coroutine_fn parallels_co_check(BlockDriverState *bs,
BdrvCheckResult *res,
@@ -436,16 +455,8 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 }
 
 qemu_co_mutex_lock(&s->lock);
-if (s->header_unclean) {
-fprintf(stderr, "%s image was not closed correctly\n",
-fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
-res->corruptions++;
-if (fix & BDRV_FIX_ERRORS) {
-/* parallels_close will do the job right */
-res->corruptions_fixed++;
-s->header_unclean = false;
-}
-}
+
+parallels_check_unclean(bs, res, fix);
 
 res->bfi.total_clusters = s->bat_size;
 res->bfi.compressed_clusters = 0; /* compression is not supported */
-- 
2.34.1

[PATCH v2 5/8] parallels: Move check of cluster outside image to a separate function

2022-08-11 Thread Alexander Ivanov

v2: Move unrelated helper parallels_set_bat_entry creation to
a separate patch.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 48 ++-
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index c53b2810cf..12104ba5ad 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -439,6 +439,36 @@ static void parallels_check_unclean(BlockDriverState *bs,
 }
 }
 
+static int parallels_check_outside_image(BlockDriverState *bs,
+ BdrvCheckResult *res,
+ BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+uint32_t i;
+int64_t off, size;
+
+size = bdrv_getlength(bs->file->bs);
+if (size < 0) {
+res->check_errors++;
+return size;
+}
+
+for (i = 0; i < s->bat_size; i++) {
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (off > size) {
+fprintf(stderr, "%s cluster %u is outside image\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+res->corruptions++;
+if (fix & BDRV_FIX_ERRORS) {
+parallels_set_bat_entry(s, i, 0);
+res->corruptions_fixed++;
+}
+}
+}
+
+return 0;
+}
+
 static int coroutine_fn parallels_co_check(BlockDriverState *bs,
BdrvCheckResult *res,
BdrvCheckMode fix)
@@ -458,6 +488,11 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 
 parallels_check_unclean(bs, res, fix);
 
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
 res->bfi.total_clusters = s->bat_size;
 res->bfi.compressed_clusters = 0; /* compression is not supported */
 
@@ -470,19 +505,6 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 continue;
 }
 
-/* cluster outside the image */
-if (off > size) {
-fprintf(stderr, "%s cluster %u is outside image\n",
-fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
-res->corruptions++;
-if (fix & BDRV_FIX_ERRORS) {
-prev_off = 0;
-parallels_set_bat_entry(s, i, 0);
-res->corruptions_fixed++;
-continue;
-}
-}
-
 res->bfi.allocated_clusters++;
 if (off > high_off) {
 high_off = off;
-- 
2.34.1

[PATCH v2 2/8] parallels: Move BAT entry setting to a separate function

2022-08-11 Thread Alexander Ivanov

Will need to set BAT entry in multiple places.
Move the code of settings entries and marking relevant blocks dirty
to a separate helper parallels_set_bat_entry.

v2: A new patch - a part of a splitted patch.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index a76cf9d993..7f68f3cbc9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -165,6 +165,13 @@ static int64_t block_status(BDRVParallelsState *s, int64_t 
sector_num,
 return start_off;
 }
 
+static void parallels_set_bat_entry(BDRVParallelsState *s,
+uint32_t index, uint32_t offset)
+{
+s->bat_bitmap[index] = offset;
+bitmap_set(s->bat_dirty_bmap, bat_entry_off(index) / s->bat_dirty_block, 
1);
+}
+
 static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
  int nb_sectors, int *pnum)
 {
@@ -250,10 +257,9 @@ static int64_t allocate_clusters(BlockDriverState *bs, 
int64_t sector_num,
 }
 
 for (i = 0; i < to_allocate; i++) {
-s->bat_bitmap[idx + i] = cpu_to_le32(s->data_end / s->off_multiplier);
+parallels_set_bat_entry(s, idx + i,
+cpu_to_le32(s->data_end / s->off_multiplier));
 s->data_end += s->tracks;
-bitmap_set(s->bat_dirty_bmap,
-   bat_entry_off(idx + i) / s->bat_dirty_block, 1);
 }
 
 return bat2sect(s, idx) + sector_num % s->tracks;
-- 
2.34.1

[PATCH v2 6/8] parallels: Move check of leaks to a separate function

2022-08-11 Thread Alexander Ivanov

v2: No changes.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 85 +--
 1 file changed, 52 insertions(+), 33 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 12104ba5ad..8737eadfb4 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -469,14 +469,13 @@ static int parallels_check_outside_image(BlockDriverState 
*bs,
 return 0;
 }
 
-static int coroutine_fn parallels_co_check(BlockDriverState *bs,
-   BdrvCheckResult *res,
-   BdrvCheckMode fix)
+static int parallels_check_leak(BlockDriverState *bs,
+BdrvCheckResult *res,
+BdrvCheckMode fix)
 {
 BDRVParallelsState *s = bs->opaque;
-int64_t size, prev_off, high_off;
-int ret;
-uint32_t i;
+int64_t size, off, high_off, count;
+int i, ret;
 
 size = bdrv_getlength(bs->file->bs);
 if (size < 0) {
@@ -484,41 +483,16 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 return size;
 }
 
-qemu_co_mutex_lock(&s->lock);
-
-parallels_check_unclean(bs, res, fix);
-
-ret = parallels_check_outside_image(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
-res->bfi.total_clusters = s->bat_size;
-res->bfi.compressed_clusters = 0; /* compression is not supported */
-
 high_off = 0;
-prev_off = 0;
 for (i = 0; i < s->bat_size; i++) {
-int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
-if (off == 0) {
-prev_off = 0;
-continue;
-}
-
-res->bfi.allocated_clusters++;
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
 if (off > high_off) {
 high_off = off;
 }
-
-if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
-res->bfi.fragmented_clusters++;
-}
-prev_off = off;
 }
 
 res->image_end_offset = high_off + s->cluster_size;
 if (size > res->image_end_offset) {
-int64_t count;
 count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
 fprintf(stderr, "%s space leaked at the end of the image %" PRId64 
"\n",
 fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
@@ -536,11 +510,56 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 if (ret < 0) {
 error_report_err(local_err);
 res->check_errors++;
-goto out;
+return ret;
 }
 res->leaks_fixed += count;
 }
 }
+return 0;
+}
+
+static int coroutine_fn parallels_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res,
+   BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int64_t prev_off;
+int ret;
+uint32_t i;
+
+
+qemu_co_mutex_lock(&s->lock);
+
+parallels_check_unclean(bs, res, fix);
+
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+res->bfi.total_clusters = s->bat_size;
+res->bfi.compressed_clusters = 0; /* compression is not supported */
+
+prev_off = 0;
+for (i = 0; i < s->bat_size; i++) {
+int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (off == 0) {
+prev_off = 0;
+continue;
+}
+
+res->bfi.allocated_clusters++;
+
+if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
+res->bfi.fragmented_clusters++;
+}
+prev_off = off;
+}
 
 out:
 qemu_co_mutex_unlock(&s->lock);
-- 
2.34.1

[PATCH 1/4] hw/nvme: avoid unnecessary call to irq (de)assertion functions

2022-08-11 Thread Jinhao Fan

nvme_irq_assert() only does useful work when cq->irq_enabled is true.
nvme_irq_deassert() only works for pin-based interrupts. Avoid calls
into these functions if we are sure they will not do useful work.

This will be most useful when we use eventfd to send interrupts. We
can avoid the unnecessary overhead of signalling eventfd.

Signed-off-by: Jinhao Fan 
---
 hw/nvme/ctrl.c | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..bd3350d7e0 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1377,11 +1377,13 @@ static void nvme_post_cqes(void *opaque)
 QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
 }
 if (cq->tail != cq->head) {
-if (cq->irq_enabled && !pending) {
-n->cq_pending++;
-}
+if (cq->irq_enabled) {
+if (!pending) {
+n->cq_pending++;
+}
 
-nvme_irq_assert(n, cq);
+nvme_irq_assert(n, cq);
+}
 }
 }
 
@@ -4244,12 +4246,11 @@ static void nvme_cq_notifier(EventNotifier *e)
 
 nvme_update_cq_head(cq);
 
-if (cq->tail == cq->head) {
-if (cq->irq_enabled) {
-n->cq_pending--;
+if (cq->irq_enabled && cq->tail == cq->head) {
+n->cq_pending--;
+if (!msix_enabled(&n->parent_obj)) {
+nvme_irq_deassert(n, cq);
 }
-
-nvme_irq_deassert(n, cq);
 }
 
 nvme_post_cqes(cq);
@@ -4730,11 +4731,15 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_QUEUE_DEL;
 }
 
-if (cq->irq_enabled && cq->tail != cq->head) {
-n->cq_pending--;
-}
+if (cq->irq_enabled) {
+if (cq->tail != cq->head) {
+n->cq_pending--;
+}
 
-nvme_irq_deassert(n, cq);
+if (!msix_enabled(&n->parent_obj)) {
+nvme_irq_deassert(n, cq);
+}
+}
 trace_pci_nvme_del_cq(qid);
 nvme_free_cq(cq, n);
 return NVME_SUCCESS;
@@ -6918,12 +6923,11 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, 
int val)
 timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
 }
 
-if (cq->tail == cq->head) {
-if (cq->irq_enabled) {
-n->cq_pending--;
+if (cq->irq_enabled && cq->tail == cq->head) {
+n->cq_pending--;
+if (!msix_enabled(&n->parent_obj)) {
+nvme_irq_deassert(n, cq);
 }
-
-nvme_irq_deassert(n, cq);
 }
 } else {
 /* Submission queue doorbell write */
-- 
2.25.1

[PATCH 2/4] hw/nvme: add option to (de)assert irq with eventfd

2022-08-11 Thread Jinhao Fan

When the new option 'irq-eventfd' is turned on, the IO emulation code
signals an eventfd when it want to (de)assert an irq. The main loop
eventfd handler does the actual irq (de)assertion.  This paves the way
for iothread support since QEMU's interrupt emulation is not thread
safe.

Asserting and deasseting irq with eventfd has some performance
implications. For small queue depth it increases request latency but
for large queue depth it effectively coalesces irqs.

Comparision (KIOPS):

QD1   4  16  64
QEMU 38 123 210 329
irq-eventfd  32 106 240 364

Signed-off-by: Jinhao Fan 
---
 hw/nvme/ctrl.c | 89 --
 hw/nvme/nvme.h |  4 +++
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index bd3350d7e0..8a1c5ce3e1 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1338,6 +1338,54 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
 trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
 }
 
+static void nvme_assert_notifier_read(EventNotifier *e)
+{
+NvmeCQueue *cq = container_of(e, NvmeCQueue, assert_notifier);
+if (event_notifier_test_and_clear(e)) {
+nvme_irq_assert(cq->ctrl, cq);
+}
+}
+
+static void nvme_deassert_notifier_read(EventNotifier *e)
+{
+NvmeCQueue *cq = container_of(e, NvmeCQueue, deassert_notifier);
+if (event_notifier_test_and_clear(e)) {
+nvme_irq_deassert(cq->ctrl, cq);
+}
+}
+
+static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
+{
+int ret;
+
+ret = event_notifier_init(&cq->assert_notifier, 0);
+if (ret < 0) {
+goto fail_assert_handler;
+}
+
+event_notifier_set_handler(&cq->assert_notifier,
+nvme_assert_notifier_read);
+
+if (!msix_enabled(&n->parent_obj)) {
+ret = event_notifier_init(&cq->deassert_notifier, 0);
+if (ret < 0) {
+goto fail_deassert_handler;
+}
+
+event_notifier_set_handler(&cq->deassert_notifier,
+   nvme_deassert_notifier_read);
+}
+
+return;
+
+fail_deassert_handler:
+event_notifier_set_handler(&cq->deassert_notifier, NULL);
+event_notifier_cleanup(&cq->deassert_notifier);
+fail_assert_handler:
+event_notifier_set_handler(&cq->assert_notifier, NULL);
+event_notifier_cleanup(&cq->assert_notifier);
+}
+
 static void nvme_post_cqes(void *opaque)
 {
 NvmeCQueue *cq = opaque;
@@ -1382,7 +1430,23 @@ static void nvme_post_cqes(void *opaque)
 n->cq_pending++;
 }
 
-nvme_irq_assert(n, cq);
+if (unlikely(cq->first_io_cqe)) {
+/*
+ * Initilize event notifier when first cqe is posted. For 
irqfd 
+ * support we need to register the MSI message in KVM. We
+ * can not do this registration at CQ creation time because
+ * Linux's NVMe driver changes the MSI message after CQ 
creation.
+ */
+cq->first_io_cqe = false;
+
+nvme_init_irq_notifier(n, cq);
+}
+
+if (cq->assert_notifier.initialized) {
+event_notifier_set(&cq->assert_notifier);
+} else {
+nvme_irq_assert(n, cq);
+}
 }
 }
 }
@@ -4249,7 +4313,11 @@ static void nvme_cq_notifier(EventNotifier *e)
 if (cq->irq_enabled && cq->tail == cq->head) {
 n->cq_pending--;
 if (!msix_enabled(&n->parent_obj)) {
-nvme_irq_deassert(n, cq);
+if (cq->deassert_notifier.initialized) {
+event_notifier_set(&cq->deassert_notifier);
+} else {
+nvme_irq_deassert(n, cq);
+}
 }
 }
 
@@ -4706,6 +4774,14 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 event_notifier_set_handler(&cq->notifier, NULL);
 event_notifier_cleanup(&cq->notifier);
 }
+if (cq->assert_notifier.initialized) {
+event_notifier_set_handler(&cq->assert_notifier, NULL);
+event_notifier_cleanup(&cq->assert_notifier);
+}
+if (cq->deassert_notifier.initialized) {
+event_notifier_set_handler(&cq->deassert_notifier, NULL);
+event_notifier_cleanup(&cq->deassert_notifier);
+}
 if (msix_enabled(&n->parent_obj)) {
 msix_vector_unuse(&n->parent_obj, cq->vector);
 }
@@ -4737,6 +4813,7 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
 }
 
 if (!msix_enabled(&n->parent_obj)) {
+/* Do not use eventfd since this is always called in main loop */
 nvme_irq_deassert(n, cq);
 }
 }
@@ -4777,6 +4854,7 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, 
uint64_t dma_addr,
 }
 n->cq[cqid] = cq;
 cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+cq->first_io_cqe = cqid != 0;
 }
 
 static uint16_t nvme_create_cq(Nvme

[PATCH 4/4] hw/nvme: add MSI-x mask handlers for irqfd

2022-08-11 Thread Jinhao Fan

When irqfd is enabled, we bypass QEMU's irq emulation and let KVM to
directly assert the irq. However, KVM is not aware of the device's MSI-x
masking status. Add MSI-x mask bookkeeping in NVMe emulation and
detach the corresponding irqfd when the certain vector is masked.

Signed-off-by: Jinhao Fan 
---
 hw/nvme/ctrl.c   | 82 
 hw/nvme/nvme.h   |  2 ++
 hw/nvme/trace-events |  3 ++
 3 files changed, 87 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 63f988f2f9..ac5460c7c8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7478,10 +7478,84 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, 
uint8_t offset)
 
 return 0;
 }
+static int nvme_vector_unmask(PCIDevice *pci_dev, unsigned vector,
+   MSIMessage msg)
+{
+NvmeCtrl *n = NVME(pci_dev);
+int ret;
+
+trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
+
+for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+NvmeCQueue *cq = n->cq[i];
+/* 
+ * If this function is called, then irqfd must be available. Therefore,
+ * irqfd must be in use if cq->assert_notifier.initialized is true.
+ */
+if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
+if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
+ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
+   pci_dev);
+if (ret < 0) {
+return ret;
+}
+kvm_irqchip_commit_routes(kvm_state);
+cq->msg = msg;
+}
+
+ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+ &cq->assert_notifier,
+ NULL, cq->virq);
+if (ret < 0) {
+return ret;
+}
+}
+}
+
+return 0;
+}
+
+static void nvme_vector_mask(PCIDevice *pci_dev, unsigned vector)
+{
+NvmeCtrl *n = NVME(pci_dev);
+
+trace_pci_nvme_irq_mask(vector);
+
+for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+NvmeCQueue *cq = n->cq[i];
+if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
+kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+  &cq->assert_notifier,
+  cq->virq);
+}
+}
+}
+
+static void nvme_vector_poll(PCIDevice *pci_dev,
+ unsigned int vector_start,
+ unsigned int vector_end)
+{
+NvmeCtrl *n = NVME(pci_dev);
+
+trace_pci_nvme_irq_poll(vector_start, vector_end);
+
+for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+NvmeCQueue *cq = n->cq[i];
+if (cq && cq->vector >= vector_start && cq->vector <= vector_end 
+&& msix_is_masked(pci_dev, cq->vector) 
+&& cq->assert_notifier.initialized) {
+if (event_notifier_test_and_clear(&cq->assert_notifier)) {
+msix_set_pending(pci_dev, i);
+}
+}
+}
+}
 
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
+bool with_irqfd = msix_enabled(&n->parent_obj) &&
+  kvm_msi_via_irqfd_enabled();
 uint64_t bar_size;
 unsigned msix_table_offset, msix_pba_offset;
 int ret;
@@ -7534,6 +7608,13 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 }
 }
 
+if (with_irqfd) {
+msix_set_vector_notifiers(pci_dev,
+  nvme_vector_unmask,
+  nvme_vector_mask,
+  nvme_vector_poll);
+}
+
 nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
 
 if (n->params.cmb_size_mb) {
@@ -7781,6 +7862,7 @@ static void nvme_exit(PCIDevice *pci_dev)
 pcie_sriov_pf_exit(pci_dev);
 }
 
+msix_unset_vector_notifiers(pci_dev);
 msix_uninit(pci_dev, &n->bar0, &n->bar0);
 memory_region_del_subregion(&n->bar0, &n->iomem);
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 85fd9cd0e2..707a55ebfc 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -20,6 +20,7 @@
 
 #include "qemu/uuid.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
@@ -401,6 +402,7 @@ typedef struct NvmeCQueue {
 EventNotifier notifier;
 EventNotifier assert_notifier;
 EventNotifier deassert_notifier;
+MSIMessage  msg;
 boolfirst_io_cqe;
 boolioeventfd_enabled;
 QTAILQ_HEAD(, NvmeSQueue) sq_list;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..b11fcf4a65 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trac

[PATCH 3/4] hw/nvme: use irqfd to send interrupts

2022-08-11 Thread Jinhao Fan

Use KVM's irqfd to send interrupts when possible. This approach is
thread safe. Moreover, it does not have the inter-thread communication
overhead of plain event notifiers since handler callback are called
in the same system call as irqfd write.

Signed-off-by: Jinhao Fan 
---
 hw/nvme/ctrl.c | 50 --
 hw/nvme/nvme.h |  1 +
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8a1c5ce3e1..63f988f2f9 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -192,6 +192,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
@@ -1354,8 +1355,26 @@ static void nvme_deassert_notifier_read(EventNotifier *e)
 }
 }
 
+static int nvme_kvm_msix_vector_use(NvmeCtrl *n,
+NvmeCQueue *cq,
+uint32_t vector)
+{
+int ret;
+
+KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+ret = kvm_irqchip_add_msi_route(&c, vector, &n->parent_obj);
+if (ret < 0) {
+return ret;
+}
+kvm_irqchip_commit_route_changes(&c);
+cq->virq = ret;
+return 0;
+}
+
 static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
 {
+bool with_irqfd = msix_enabled(&n->parent_obj) &&
+  kvm_msi_via_irqfd_enabled();
 int ret;
 
 ret = event_notifier_init(&cq->assert_notifier, 0);
@@ -1363,8 +1382,21 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, 
NvmeCQueue *cq)
 goto fail_assert_handler;
 }
 
-event_notifier_set_handler(&cq->assert_notifier,
-nvme_assert_notifier_read);
+if (with_irqfd) {
+ret = nvme_kvm_msix_vector_use(n, cq, cq->vector);
+if (ret < 0) {
+goto fail_assert_handler;
+}
+ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+ &cq->assert_notifier, NULL, 
+ cq->virq);
+if (ret < 0) {
+goto fail_kvm;
+ }
+} else {
+event_notifier_set_handler(&cq->assert_notifier,
+   nvme_assert_notifier_read);
+}
 
 if (!msix_enabled(&n->parent_obj)) {
 ret = event_notifier_init(&cq->deassert_notifier, 0);
@@ -1381,6 +1413,12 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, 
NvmeCQueue *cq)
 fail_deassert_handler:
 event_notifier_set_handler(&cq->deassert_notifier, NULL);
 event_notifier_cleanup(&cq->deassert_notifier);
+if (with_irqfd) {
+kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &cq->assert_notifier,
+  cq->virq);
+fail_kvm:
+kvm_irqchip_release_virq(kvm_state, cq->virq);
+}
 fail_assert_handler:
 event_notifier_set_handler(&cq->assert_notifier, NULL);
 event_notifier_cleanup(&cq->assert_notifier);
@@ -4764,6 +4802,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest 
*req)
 
 static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 {
+bool with_irqfd = msix_enabled(&n->parent_obj) &&
+  kvm_msi_via_irqfd_enabled();
 uint16_t offset = (cq->cqid << 3) + (1 << 2);
 
 n->cq[cq->cqid] = NULL;
@@ -4775,6 +4815,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 event_notifier_cleanup(&cq->notifier);
 }
 if (cq->assert_notifier.initialized) {
+if (with_irqfd) {
+kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+  &cq->assert_notifier, 
+  cq->virq);
+kvm_irqchip_release_virq(kvm_state, cq->virq);
+}
 event_notifier_set_handler(&cq->assert_notifier, NULL);
 event_notifier_cleanup(&cq->assert_notifier);
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 759d0ecd7c..85fd9cd0e2 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -396,6 +396,7 @@ typedef struct NvmeCQueue {
 uint64_tdma_addr;
 uint64_tdb_addr;
 uint64_tei_addr;
+int virq;
 QEMUTimer   *timer;
 EventNotifier notifier;
 EventNotifier assert_notifier;
-- 
2.25.1

Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

2022-08-11 Thread Stefan Hajnoczi

On Tue, Jul 12, 2022 at 04:23:32PM +0200, Stefano Garzarella wrote:
> On Fri, Jul 08, 2022 at 05:17:30AM +0100, Stefan Hajnoczi wrote:
> > libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
> > high-performance disk I/O. It currently supports io_uring and
> > virtio-blk-vhost-vdpa with additional drivers under development.
> > 
> > One of the reasons for developing libblkio is that other applications
> > besides QEMU can use it. This will be particularly useful for
> > vhost-user-blk which applications may wish to use for connecting to
> > qemu-storage-daemon.
> > 
> > libblkio also gives us an opportunity to develop in Rust behind a C API
> > that is easy to consume from QEMU.
> > 
> > This commit adds io_uring and virtio-blk-vhost-vdpa BlockDrivers to QEMU
> > using libblkio. It will be easy to add other libblkio drivers since they
> > will share the majority of code.
> > 
> > For now I/O buffers are copied through bounce buffers if the libblkio
> > driver requires it. Later commits add an optimization for
> > pre-registering guest RAM to avoid bounce buffers.
> > 
> > The syntax is:
> > 
> >  --blockdev 
> > io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off
> > 
> > and:
> > 
> >  --blockdev 
> > virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> > MAINTAINERS   |   6 +
> > meson_options.txt |   2 +
> > qapi/block-core.json  |  37 +-
> > meson.build   |   9 +
> > block/blkio.c | 659 ++
> > tests/qtest/modules-test.c|   3 +
> > block/meson.build |   1 +
> > scripts/meson-buildoptions.sh |   3 +
> > 8 files changed, 718 insertions(+), 2 deletions(-)
> > create mode 100644 block/blkio.c
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 450abd0252..50f340d9ee 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3395,6 +3395,12 @@ L: qemu-block@nongnu.org
> > S: Maintained
> > F: block/vdi.c
> > 
> > +blkio
> > +M: Stefan Hajnoczi 
> > +L: qemu-block@nongnu.org
> > +S: Maintained
> > +F: block/blkio.c
> > +
> > iSCSI
> > M: Ronnie Sahlberg 
> > M: Paolo Bonzini 
> > diff --git a/meson_options.txt b/meson_options.txt
> > index 97c38109b1..b0b2e0c9b5 100644
> > --- a/meson_options.txt
> > +++ b/meson_options.txt
> > @@ -117,6 +117,8 @@ option('bzip2', type : 'feature', value : 'auto',
> >description: 'bzip2 support for DMG images')
> > option('cap_ng', type : 'feature', value : 'auto',
> >description: 'cap_ng support')
> > +option('blkio', type : 'feature', value : 'auto',
> > +   description: 'libblkio block device driver')
> > option('bpf', type : 'feature', value : 'auto',
> > description: 'eBPF support')
> > option('cocoa', type : 'feature', value : 'auto',
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 2173e7734a..aa63d5e9bd 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2951,11 +2951,15 @@
> > 'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
> > {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
> > {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
> > -'http', 'https', 'iscsi',
> > +'http', 'https',
> > +{ 'name': 'io_uring', 'if': 'CONFIG_BLKIO' },
> > +'iscsi',
> > 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 
> > 'parallels',
> > 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
> > { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> > -'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> > +'ssh', 'throttle', 'vdi', 'vhdx',
> > +{ 'name': 'virtio-blk-vhost-vdpa', 'if': 'CONFIG_BLKIO' },
> > +'vmdk', 'vpc', 'vvfat' ] }
> > 
> > ##
> > # @BlockdevOptionsFile:
> > @@ -3678,6 +3682,30 @@
> > '*debug': 'int',
> > '*logfile': 'str' } }
> > 
> > +##
> > +# @BlockdevOptionsIoUring:
> > +#
> > +# Driver specific block device options for the io_uring backend.
> > +#
> > +# @filename: path to the image file
> > +#
> > +# Since: 7.1
> > +##
> > +{ 'struct': 'BlockdevOptionsIoUring',
> > +  'data': { 'filename': 'str' } }
> > +
> > +##
> > +# @BlockdevOptionsVirtioBlkVhostVdpa:
> > +#
> > +# Driver specific block device options for the virtio-blk-vhost-vdpa 
> > backend.
> > +#
> > +# @path: path to the vhost-vdpa character device.
> > +#
> > +# Since: 7.1
> > +##
> > +{ 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
> > +  'data': { 'path': 'str' } }
> > +
> > ##
> > # @IscsiTransport:
> > #
> > @@ -4305,6 +4333,8 @@
> >'if': 'HAVE_HOST_BLOCK_DEVICE' },
> >   'http':   'BlockdevOptionsCurlHttp',
> >   'https':  'BlockdevOptionsCurlHttps',
> > +  'io_uring':   { 'type': 'BlockdevOptionsIoUring',
> > +

[PATCH] gluster: stop using .bdrv_needs_filename

2022-08-11 Thread Stefan Hajnoczi

The gluster protocol driver used to parse URIs (filenames) but was
extended with a richer JSON syntax in commit 6c7189bb29de
("block/gluster: add support for multiple gluster servers"). The gluster
drivers that have JSON parsing set .bdrv_needs_filename to false.

The gluster+unix and gluster+rdma drivers still to require a filename
even though the JSON parser is equipped to parse the same
volume/path/sockaddr details as the URI parser. Let's allow JSON parsing
for these drivers too.

Note that the gluster+rdma driver actually uses TCP because RDMA support
is not available, so the JSON server.type field must be "inet".

Drop .bdrv_needs_filename since both the filename and the JSON parsers
can handle gluster+unix and gluster+rdma. This change is in preparation
for eventually removing .bdrv_needs_filename across the entire codebase.

Cc: Prasanna Kumar Kalever 
Signed-off-by: Stefan Hajnoczi 
---
My motivation was to remove .bdrv_needs_filename across the entire
codebase, but my confidence in avoiding regressions is too low so I gave
up. There is too much magic around the filename. Most drivers don't want
a "filename" QDict entry although .bdrv_parse_filename() places the
filename string there temporarily. But file-posix does want a "filename"
QDict entry so it breaks when we remove .bdrv_needs_filename. I'm pretty
sure it's accidental complexity but it's hard to simplify without
breaking block drivers.

This gluster patch can be merged though, so I'm sending it.

 block/gluster.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index b60213ab80..bb1144cf6a 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1555,7 +1555,6 @@ static BlockDriver bdrv_gluster = {
 .format_name  = "gluster",
 .protocol_name= "gluster",
 .instance_size= sizeof(BDRVGlusterState),
-.bdrv_needs_filename  = false,
 .bdrv_file_open   = qemu_gluster_open,
 .bdrv_reopen_prepare  = qemu_gluster_reopen_prepare,
 .bdrv_reopen_commit   = qemu_gluster_reopen_commit,
@@ -1585,7 +1584,6 @@ static BlockDriver bdrv_gluster_tcp = {
 .format_name  = "gluster",
 .protocol_name= "gluster+tcp",
 .instance_size= sizeof(BDRVGlusterState),
-.bdrv_needs_filename  = false,
 .bdrv_file_open   = qemu_gluster_open,
 .bdrv_reopen_prepare  = qemu_gluster_reopen_prepare,
 .bdrv_reopen_commit   = qemu_gluster_reopen_commit,
@@ -1615,7 +1613,6 @@ static BlockDriver bdrv_gluster_unix = {
 .format_name  = "gluster",
 .protocol_name= "gluster+unix",
 .instance_size= sizeof(BDRVGlusterState),
-.bdrv_needs_filename  = true,
 .bdrv_file_open   = qemu_gluster_open,
 .bdrv_reopen_prepare  = qemu_gluster_reopen_prepare,
 .bdrv_reopen_commit   = qemu_gluster_reopen_commit,
@@ -1651,7 +1648,6 @@ static BlockDriver bdrv_gluster_rdma = {
 .format_name  = "gluster",
 .protocol_name= "gluster+rdma",
 .instance_size= sizeof(BDRVGlusterState),
-.bdrv_needs_filename  = true,
 .bdrv_file_open   = qemu_gluster_open,
 .bdrv_reopen_prepare  = qemu_gluster_reopen_prepare,
 .bdrv_reopen_commit   = qemu_gluster_reopen_commit,
-- 
2.37.1

Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

2022-08-11 Thread Stefan Hajnoczi

On Wed, Jul 13, 2022 at 02:05:18PM +0200, Hanna Reitz wrote:
> On 08.07.22 06:17, Stefan Hajnoczi wrote:
> > libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
> > high-performance disk I/O. It currently supports io_uring and
> > virtio-blk-vhost-vdpa with additional drivers under development.
> > 
> > One of the reasons for developing libblkio is that other applications
> > besides QEMU can use it. This will be particularly useful for
> > vhost-user-blk which applications may wish to use for connecting to
> > qemu-storage-daemon.
> > 
> > libblkio also gives us an opportunity to develop in Rust behind a C API
> > that is easy to consume from QEMU.
> > 
> > This commit adds io_uring and virtio-blk-vhost-vdpa BlockDrivers to QEMU
> > using libblkio. It will be easy to add other libblkio drivers since they
> > will share the majority of code.
> > 
> > For now I/O buffers are copied through bounce buffers if the libblkio
> > driver requires it. Later commits add an optimization for
> > pre-registering guest RAM to avoid bounce buffers.
> > 
> > The syntax is:
> > 
> >--blockdev 
> > io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off
> > 
> > and:
> > 
> >--blockdev 
> > virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   MAINTAINERS   |   6 +
> >   meson_options.txt |   2 +
> >   qapi/block-core.json  |  37 +-
> >   meson.build   |   9 +
> >   block/blkio.c | 659 ++
> >   tests/qtest/modules-test.c|   3 +
> >   block/meson.build |   1 +
> >   scripts/meson-buildoptions.sh |   3 +
> >   8 files changed, 718 insertions(+), 2 deletions(-)
> >   create mode 100644 block/blkio.c
> 
> [...]
> 
> > diff --git a/block/blkio.c b/block/blkio.c
> > new file mode 100644
> > index 00..7fbdbd7fae
> > --- /dev/null
> > +++ b/block/blkio.c
> > @@ -0,0 +1,659 @@
> 
> Not sure whether it’s necessary, but I would have expected a copyright
> header here.

Thanks for reminding me, I will add a header.

> 
> > +#include "qemu/osdep.h"
> > +#include 
> > +#include "block/block_int.h"
> > +#include "qapi/error.h"
> > +#include "qapi/qmp/qdict.h"
> > +#include "qemu/module.h"
> > +
> > +typedef struct BlkAIOCB {
> > +BlockAIOCB common;
> > +struct blkio_mem_region mem_region;
> > +QEMUIOVector qiov;
> > +struct iovec bounce_iov;
> > +} BlkioAIOCB;
> > +
> > +typedef struct {
> > +/* Protects ->blkio and request submission on ->blkioq */
> > +QemuMutex lock;
> > +
> > +struct blkio *blkio;
> > +struct blkioq *blkioq; /* this could be multi-queue in the future */
> > +int completion_fd;
> > +
> > +/* Polling fetches the next completion into this field */
> > +struct blkio_completion poll_completion;
> > +
> > +/* The value of the "mem-region-alignment" property */
> > +size_t mem_region_alignment;
> > +
> > +/* Can we skip adding/deleting blkio_mem_regions? */
> > +bool needs_mem_regions;
> > +} BDRVBlkioState;
> > +
> > +static void blkio_aiocb_complete(BlkioAIOCB *acb, int ret)
> > +{
> > +/* Copy bounce buffer back to qiov */
> > +if (acb->qiov.niov > 0) {
> > +qemu_iovec_from_buf(&acb->qiov, 0,
> > +acb->bounce_iov.iov_base,
> > +acb->bounce_iov.iov_len);
> > +qemu_iovec_destroy(&acb->qiov);
> > +}
> > +
> > +acb->common.cb(acb->common.opaque, ret);
> > +
> > +if (acb->mem_region.len > 0) {
> > +BDRVBlkioState *s = acb->common.bs->opaque;
> > +
> > +WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +blkio_free_mem_region(s->blkio, &acb->mem_region);
> > +}
> > +}
> > +
> > +qemu_aio_unref(&acb->common);
> > +}
> > +
> > +/*
> > + * Only the thread that calls aio_poll() invokes fd and poll handlers.
> > + * Therefore locks are not necessary except when accessing s->blkio.
> > + *
> > + * No locking is performed around blkioq_get_completions() although other
> > + * threads may submit I/O requests on s->blkioq. We're assuming there is no
> > + * inteference between blkioq_get_completions() and other s->blkioq APIs.
> > + */
> > +
> > +static void blkio_completion_fd_read(void *opaque)
> > +{
> > +BlockDriverState *bs = opaque;
> > +BDRVBlkioState *s = bs->opaque;
> > +struct blkio_completion completion;
> > +uint64_t val;
> > +ssize_t ret __attribute__((unused));
> 
> I’d prefer a `(void)ret;` over this attribute, not least because that line
> would give a nice opportunity to explain in a short comment why we ignore
> this return value that the compiler tells us not to ignore, but if you
> don’t, then this’ll be fine.

Okay, I'll use (void)ret; and add a comment.

> 
> > +
> > +/* Polling may have already fetched a completion */
> > +if (s->poll_completion.user_data != NULL) {
> > +completion =

Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

2022-08-11 Thread Stefan Hajnoczi

On Wed, Jul 27, 2022 at 09:33:40PM +0200, Kevin Wolf wrote:
> Am 08.07.2022 um 06:17 hat Stefan Hajnoczi geschrieben:
> > libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
> > high-performance disk I/O. It currently supports io_uring and
> > virtio-blk-vhost-vdpa with additional drivers under development.
> > 
> > One of the reasons for developing libblkio is that other applications
> > besides QEMU can use it. This will be particularly useful for
> > vhost-user-blk which applications may wish to use for connecting to
> > qemu-storage-daemon.
> > 
> > libblkio also gives us an opportunity to develop in Rust behind a C API
> > that is easy to consume from QEMU.
> > 
> > This commit adds io_uring and virtio-blk-vhost-vdpa BlockDrivers to QEMU
> > using libblkio. It will be easy to add other libblkio drivers since they
> > will share the majority of code.
> > 
> > For now I/O buffers are copied through bounce buffers if the libblkio
> > driver requires it. Later commits add an optimization for
> > pre-registering guest RAM to avoid bounce buffers.
> > 
> > The syntax is:
> > 
> >   --blockdev 
> > io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off
> > 
> > and:
> > 
> >   --blockdev 
> > virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> The subject line implies only io_uring, but you actually add vhost-vdpa
> support, too. I think the subject line should be changed.
> 
> I think it would also make sense to already implement support for
> vhost-user-blk on the QEMU side even if support isn't compiled in
> libblkio by default and opening vhost-user-blk images would therefore
> always fail with a default build.
> 
> But then you could run QEMU with a custom build of libblkio to make use
> of it without patching QEMU. This is probably useful for getting libvirt
> support for using a storage daemon implemented without having to wait
> for another QEMU release. (Peter, do you have any opinion on this?)

vhost-user-blk is now supported in all builds of libblkio. I'll add it.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

[PATCH v15 1/6] qmp: add QMP command x-query-virtio

[PATCH v15 0/6] hmp,qmp: Add commands to introspect virtio devices

[PATCH v15 3/6] qmp: decode feature & status bits in virtio-status

[PATCH v15 5/6] qmp: add QMP command x-query-virtio-queue-element

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

[PATCH v15 4/6] qmp: add QMP commands for virtio/vhost queue-status

[PATCH v15 2/6] qmp: add QMP command x-query-virtio-status

[PATCH v15 6/6] hmp: add virtio commands

Re: [PATCH v2 11/15] qemu-common: move scripts/qapi

[PATCH v2 3/8] parallels: Replace bdrv_co_pwrite_sync by bdrv_co_flush for BAT flushing

[PATCH v2 0/8] parallels: Refactor the code of images checks and fix a bug

[PATCH v2 1/8] parallels: Out of image offset in BAT leads to image inflation

[PATCH v2 8/8] parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD

[PATCH v2 7/8] parallels: Move statistic collection to a separate function

[PATCH v2 4/8] parallels: Move check of unclean image to a separate function

[PATCH v2 5/8] parallels: Move check of cluster outside image to a separate function

[PATCH v2 2/8] parallels: Move BAT entry setting to a separate function

[PATCH v2 6/8] parallels: Move check of leaks to a separate function

[PATCH 1/4] hw/nvme: avoid unnecessary call to irq (de)assertion functions

[PATCH 2/4] hw/nvme: add option to (de)assert irq with eventfd

[PATCH 4/4] hw/nvme: add MSI-x mask handlers for irqfd

[PATCH 3/4] hw/nvme: use irqfd to send interrupts

Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

[PATCH] gluster: stop using .bdrv_needs_filename

Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

32 matches

Site Navigation

Mail list logo

Footer information