On Fri, Apr 04, 2025 at 12:55:09PM +0200, David Hildenbrand wrote: > On 04.04.25 12:00, David Hildenbrand wrote: > > On 04.04.25 06:36, Halil Pasic wrote: > > > On Thu, 3 Apr 2025 16:28:31 +0200 > > > David Hildenbrand <da...@redhat.com> wrote: > > > > > > > > Sorry I have to have a look at that discussion. Maybe it will answer > > > > > some my questions. > > > > > > > > Yes, I think so. > > > > > > > > > > Let's fix it without affecting existing setups for now by properly > > > > > > ignoring the non-existing queues, so the indicator bits will match > > > > > > the queue indexes. > > > > > > > > > > Just one question. My understanding is that the crux is that Linux > > > > > and QEMU (or the driver and the device) disagree at which index > > > > > reporting_vq is actually sitting. Is that right? > > > > > > > > I thought I made it clear: this is only about the airq indicator bit. > > > > That's where both disagree. > > > > > > > > Not the actual queue index (see above). > > > > > > I did some more research including having a look at that discussion. Let > > > me try to sum up how did we end up here. > > > > Let me add some more details after digging as well: > > > > > > > > Before commit a229989d975e ("virtio: don't allocate vqs when names[i] = > > > NULL") the kernel behavior used to be in spec, but QEMU and possibly > > > other hypervisor were out of spec and things did not work. > > > > It all started with VIRTIO_BALLOON_F_FREE_PAGE_HINT. Before that, > > we only had the single optional VIRTIO_BALLOON_F_STATS_VQ queue at the very > > end. So there was no possibility for holes "in-between". > > > > In the Linux driver, we created the stats queue only if the feature bit > > VIRTIO_BALLOON_F_STATS_VQ was actually around: > > > > nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2; > > err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL); > > > > That changed with VIRTIO_BALLOON_F_FREE_PAGE_HINT, because we would > > unconditionally create 4 queues. QEMU always supported the first 3 queues > > unconditionally, but old QEMU did obviously not support the (new) > > VIRTIO_BALLOON_F_FREE_PAGE_HINT queue. > > > > 390x didn't particularly like getting queried for non-existing > > queues. [1] So the fix was not for a hypervisor that was out of spec, but > > because quering non-existing queues didn't work. > > > > The fix implied that if VIRTIO_BALLOON_F_STATS_VQ is missing, suddenly the > > queue > > index of VIRTIO_BALLOON_F_FREE_PAGE_HINT changed as well. > > > > Again, as QEMU always implemented the 3 first queues unconditionally, this > > was > > not a problem. > > > > [1] > > https://lore.kernel.org/all/c6746307-fae5-7652-af8d-19f560fc3...@de.ibm.com/#t > > > > > > > > Possibly because of the complexity of fixing the hypervisor(s) commit > > > a229989d975e ("virtio: don't allocate vqs when names[i] = NULL") opted > > > for changing the guest side so that it does not fit the spec but fits > > > the hypervisor(s). It unfortunately also broke notifiers (for the with > > > holes) scenario for virtio-ccw only. > > > > Yes, it broke the notifiers. > > > > But note that everything was in spec at that point, because we only > > documented > > "free_page_vq == 3" in the spec *2 years later*, in 2020: > > > > commit 38448268eba0c105200d131c3f7f660129a4d673 > > Author: Alexander Duyck <alexander.h.du...@linux.intel.com> > > Date: Tue Aug 25 07:45:02 2020 -0700 > > > > content: Document balloon feature free page hints > > Free page hints allow the balloon driver to provide information on > > what > > pages are not currently in use so that we can avoid the cost of > > copying > > them in migration scenarios. Add a feature description for free page > > hints > > describing basic functioning and requirements. > > At that point, what we documented in the spec *did not match reality* in > > Linux. QEMU was fully compatible, because VIRTIO_BALLOON_F_STATS_VQ is > > unconditionally set. > > > > > > QEMU and Linux kept using that queue index assignment model, and the spec > > was wrong (out of sync?) at that point. The spec got more wrong with > > > > commit d917d4a8d552c003e046b0e3b1b529d98f7e695b > > Author: Alexander Duyck <alexander.h.du...@linux.intel.com> > > Date: Tue Aug 25 07:45:17 2020 -0700 > > > > content: Document balloon feature free page reporting > > Free page reporting is a feature that allows the guest to proactively > > report unused pages to the host. By making use of this feature is is > > possible to reduce the overall memory footprint of the guest in cases > > where > > some significant portion of the memory is idle. Add documentation for > > the > > free page reporting feature describing the functionality and > > requirements. > > > > Where we documented VIRTIO_BALLOON_F_REPORTING after the changes were added > > to > > QEMU+Linux implementation, so the spec did not reflect reality. > > > > I'll note also cloud-hypervisor [2] today follows that model. > > > > In particular, it *only* supports VIRTIO_BALLOON_F_REPORTING, turning > > the queue index of VIRTIO_BALLOON_F_REPORTING into *2* instead of documented > > in the spec to be *4*. > > > > So in reality, we can see VIRTIO_BALLOON_F_REPORTING to be either 2/3/4, > > depending > > on the availability of the other two features/queues. > > > > [2] > > https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/virtio-devices/src/balloon.rs > > > > > > > > > > Now we had another look at this, and have concluded that fixing the > > > hypervisor(s) and fixing the kernel, and making sure that the fixed > > > kernel can tolerate the old broken hypervisor(s) is way to complicated > > > if possible at all. So we decided to give the spec a reality check and > > > fix the notifier bit assignment for virtio-ccw which is broken beyond > > > doubt if we accept that the correct virtqueue index is the one that the > > > hypervisor(s) use and not the one that the spec says they should use. > > > > In case of virtio-balloon, it's unfortunate that it went that way, but the > > spec simply did not / does not reflect reality when it was added to the > > spec. > > > > > > > > With the spec fixed, the whole notion of "holes" will be something that > > > does not make sense any more. With that the merit of the kernel interface > > > virtio_find_vqs() supporting "holes" is quite questionable. Now we need > > > it because the drivers within the Linux kernel still think of the queues > > > in terms of the current spec, i.e. they try to have the "holes" as > > > mandated by the spec, and the duty of making it work with the broken > > > device implementations falls to the transports. > > > > > > > Right, the "holes" only exist in the input array. > > > > > Under the assumption that the spec is indeed going to be fixed: > > For virito-balloon, we should probably do the following: > > From 38e340c2bb53c2a7cc7c675f5dfdd44ecf7701d9 Mon Sep 17 00:00:00 2001 > From: David Hildenbrand <da...@redhat.com> > Date: Fri, 4 Apr 2025 12:53:16 +0200 > Subject: [PATCH] virtio-balloon: Fix queue index assignment for > non-existing queues > > Signed-off-by: David Hildenbrand <da...@redhat.com> > --- > device-types/balloon/description.tex | 22 ++++++++++++++++------ > 1 file changed, 16 insertions(+), 6 deletions(-) > > diff --git a/device-types/balloon/description.tex > b/device-types/balloon/description.tex > index a1d9603..a7396ff 100644 > --- a/device-types/balloon/description.tex > +++ b/device-types/balloon/description.tex > @@ -16,6 +16,21 @@ \subsection{Device ID}\label{sec:Device Types / Memory > Balloon Device / Device I > 5 > \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / > Virtqueues} > + > +\begin{description} > +\item[inflateq] Exists unconditionally. > +\item[deflateq] Exists unconditionally. > +\item[statsq] Only exists if VIRTIO_BALLOON_F_STATS_VQ is set. > +\item[free_page_vq] Only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set. > +\item[reporting_vq] Only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set. > +\end{description} > + > +\begin{note} > +Virtqueue indexes are assigned sequentially for existing queues, starting > +with index 0; consequently, if a virtqueue does not exist, it does not get > +an index assigned. Assuming all virtqueues exist for a device, the indexes > +are: > + > \begin{description} > \item[0] inflateq > \item[1] deflateq > @@ -23,12 +38,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Memory > Balloon Device / Virtque > \item[3] free_page_vq > \item[4] reporting_vq > \end{description} > - > - statsq only exists if VIRTIO_BALLOON_F_STATS_VQ is set. > - > - free_page_vq only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set. > - > - reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set. > +\end{note} > \subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / > Feature bits} > \begin{description} > -- > 2.48.1 > > > If something along these lines sounds reasonable, I can send a proper patch > to the > proper audience.
Indeed, but do we want to add a note about previous spec versions saying something different? Maybe, with a hint how devices following old spec can be detected? > -- > Cheers, > > David / dhildenb