Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2021 at 11:02:39AM +0800, Jason Wang wrote:
> On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > Sorry for reviving this ancient thread. I was kind of lost as to the
> > > conclusion it ended up with. I have the following questions,
> > >
> > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > support will be completely dropped from the table, is my understanding
> > > correct? Actually we're interested in supporting virtio v0.95 guests for
> > > x86, which is backed by the spec at
> > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not
> > > sure if there's a request/need to support older legacy virtio versions
> > > beyond that.
> >
> > I personally feel it's less work to add in kernel than try to
> > work around it in userspace. Jason feels differently.
> > Maybe post the patches and this will prove to Jason it's not
> > too terrible?
> 
> That's one way. Other than the config access before setting features,
> we need to deal with other things:
>
> 1) VIRTIO_F_ORDER_PLATFORM
> 2) there could be a parent device that only supports 1.0 devices
>
> And a lot of other stuff summarized in spec section 7.4, which seems
> like no easy task. Various vDPA parent drivers were written under the
> assumption that only modern devices are supported.
> 
> Thanks

Limiting things to x86 will likely address most issues though, won't it?

> >
> > > 2. suppose some form of legacy guest support needs to be there, how do we
> > > deal with the bogus assumption below in vdpa_get_config() in the short
> > > term? It looks like one intuitive fix is to move the vdpa_set_features
> > > call out of vdpa_get_config() to vdpa_set_config().
> > >
> > > /*
> > >  * Config accesses aren't supposed to trigger before features are
> > > set.
> > >  * If it does happen we assume a legacy guest.
> > >  */
> > > if (!vdev->features_valid)
> > > vdpa_set_features(vdev, 0);
> > > ops->get_config(vdev, offset, buf, len);
> > >
> > > I can post a patch to fix 2) if there's consensus already reached.
> > >
> > > Thanks,
> > > -Siwei
> >
> > I'm not sure how important it is to change that.
> > In any case it only affects transitional devices, right?
> > Legacy only should not care ...
> >
> >
> > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > >
> > > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > Detecting it isn't enough though, we will need a new ioctl to 
> > > > > > > > > notify
> > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > know what the use
> > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > directly? Is there a
> > > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > BTW a good API could be
> > > > > > >
> > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > >
> > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > >
> > > > > > Actually, I wonder whether it's a good time to just not support
> > > > > > legacy drivers
> > > > > > for vDPA. Consider:
> > > > > >
> > > > > > 1) Its definition is non-normative
> > > > > > 2) A lot of burden in the code
> > > > > >
> > > > > > So qemu can still present the legacy device, since the config
> > > > > > space or other
> > > > > > state that is presented by vhost-vDPA is not expected to be
> > > > > > accessed by the
> > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > in this
> > > > > > case?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > Overall I would be fine with this approach but we need to avoid 
> > > > > breaking
> > > > > working userspace, qemu releases with vdpa support are out there and
> > > > > seem to work for people. Any changes need to take that into account
> > > > > and document compatibility concerns.
> > > >
> > > >
> > > > Agree, let me check.
> > > >
> > > >
> > > > >   I note that any hardware
> > > > > implementation is already broken for legacy except on platforms with
> > > > > strong ordering which might be helpful in reducing the scope.
> > > >
> > > >
> > > > Yes.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > >
> > > >
> >
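As a rough illustration of the guard being debated above, here is a toy user-space model of the vdpa_get_config() behavior — not kernel code; the struct and function names (model_vdpa, model_get_config, etc.) are invented for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy user-space model of the vdpa_get_config() guard quoted above --
 * not kernel code; names are invented for illustration.
 */
struct model_vdpa {
	bool features_valid;	/* set once features are negotiated */
	uint64_t features;
	uint8_t config[8];	/* stand-in for device config space */
};

static void model_set_features(struct model_vdpa *v, uint64_t features)
{
	v->features = features;
	v->features_valid = true;
}

/*
 * Config accesses aren't supposed to trigger before features are set.
 * If one does, assume a legacy guest and negotiate zero features --
 * the assumption questioned above for transitional devices.
 */
static void model_get_config(struct model_vdpa *v, unsigned off,
			     void *buf, size_t len)
{
	if (!v->features_valid)
		model_set_features(v, 0);
	memcpy(buf, v->config + off, len);
}
```

The point of the model: a single early config read silently negotiates features = 0 on the driver's behalf, which is exactly what the proposed move into vdpa_set_config() would avoid.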

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter

2021-12-13 Thread Christoph Hellwig
On Sun, Dec 12, 2021 at 06:48:05AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:05 AM Vivek Goyal  wrote:
> >
> > On Thu, Dec 09, 2021 at 07:38:28AM +0100, Christoph Hellwig wrote:
> > > While using the MC-safe copy routines is rather pointless on a virtual 
> > > device
> > > like virtiofs,
> >
> > I was wondering about that. Is it completely pointless?
> >
> > Typically we are just mapping host page cache into qemu address space.
> > That shows as virtiofs device pfn in guest and that pfn is mapped into
> > guest application address space in mmap() call.
> >
> > Given that on the host it's DRAM, I would not expect a machine check on
> > the load side, so there was no need to use the machine-check-safe variant.
> 
> That's a broken assumption, DRAM experiences multi-bit ECC errors.
> Machine checks, data aborts, etc existed before PMEM.

So the conclusion here is that we should always use the mc safe variant?
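To make the trade-off above concrete, here is a toy model of the machine-check-safe copy semantics — not the kernel implementation; "poison" is simulated by a sentinel offset, and the return convention (bytes NOT copied, 0 on full success) mirrors what the kernel's copy_mc helpers do:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model of copy_mc semantics: a real copy_mc_to_kernel() survives a
 * machine check mid-copy and reports how much could not be copied; a
 * plain memcpy() would take the fault unrecovered. Here the poisoned
 * byte is simulated by poison_off (SIZE_MAX = no poison).
 */
static size_t model_copy_mc(void *dst, const void *src, size_t len,
			    size_t poison_off)
{
	size_t n = len < poison_off ? len : poison_off;

	memcpy(dst, src, n);	/* copy up to the first poisoned byte */
	return len - n;		/* remainder that could not be copied */
}
```

The performance question in the thread is whether paying for this recoverable-copy path on every read is worth it on hosts where poison consumption is rare.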


Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods

2021-12-13 Thread Christoph Hellwig
On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal  wrote:
> > Going forward, I am wondering whether virtiofs should use the flushcache
> > version as well. What if the host filesystem is using DAX and mapping
> > persistent memory pfns directly into the qemu address space? I have never
> > tested that.
> >
> > Right now we are relying on applications to do fsync/msync on virtiofs
> > for data persistence.
> 
> This sounds like it would need coordination with a paravirtualized
> driver that can indicate whether the host side is pmem or not, like
> the virtio_pmem driver. However, if the guest sends any fsync/msync
> you would still need to go explicitly cache flush any dirty page
> because you can't necessarily trust that the guest did that already.

Do we?  The application can't really know what backend it is on, so
it sounds like the current virtiofs implementation doesn't really, does it?


Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods

2021-12-13 Thread Christoph Hellwig
On Sun, Dec 12, 2021 at 06:39:16AM -0800, Dan Williams wrote:
> > /* flag to check if device supports synchronous flush */
> > DAXDEV_SYNC,
> > +   /* do not use uncached operations to write data */
> > +   DAXDEV_CACHED,
> > +   /* do not use mcsafe operations to read data */
> > +   DAXDEV_NOMCSAFE,
> 
> Linus did not like the mcsafe name, and this brings it back. Let's
> flip the polarity to positively indicate which routine to use, and to
> match the 'nofault' style which says "copy and handle faults".
> 
> /* do not leave the caches dirty after writes */
> DAXDEV_NOCACHE
> 
> /* handle CPU fetch exceptions during reads */
> DAXDEV_NOMC
> 
> ...and then flip the use cases around.

Sure, we can do that.  But let's finish the discussion on whether we
actually need the virtiofs special casing, as it seems pretty fishy in
many aspects.
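The polarity flip proposed above can be sketched in plain C. The flag names mirror Dan's proposal (DAXDEV_NOCACHE, DAXDEV_NOMC); the surrounding helpers and MODEL_ prefixes are invented for illustration, not the actual dax core API:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Positive-polarity per-device flags: each bit names the extra work a
 * routine must do, matching the 'nofault' naming style.
 */
enum model_dax_flags {
	MODEL_DAXDEV_SYNC    = 1 << 0, /* supports synchronous flush */
	MODEL_DAXDEV_NOCACHE = 1 << 1, /* do not leave caches dirty after writes */
	MODEL_DAXDEV_NOMC    = 1 << 2, /* handle CPU fetch exceptions during reads */
};

static bool model_use_flushcache(unsigned flags)
{
	return flags & MODEL_DAXDEV_NOCACHE;
}

static bool model_use_mc_safe(unsigned flags)
{
	return flags & MODEL_DAXDEV_NOMC;
}
```

With this polarity, a pmem-style device opts in to both behaviors while a device that sets neither bit falls through to the plain cached copies — the default the original negative-polarity flags had to name explicitly.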


Re: [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()

2021-12-13 Thread Christoph Hellwig
On Sun, Dec 12, 2021 at 06:22:20AM -0800, Dan Williams wrote:
> > - * Use the 'no check' versions of copy_from_iter_flushcache() and
> > - * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
> > + * Use the 'no check' versions of _copy_from_iter_flushcache() and
> > + * _copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
> >   * checking, both file offset and device offset, is handled by
> >   * dax_iomap_actor()
> >   */
> 
> This comment change does not make sense since it is saying why pmem is
> using the "_" versions. However, I assume this whole comment goes away
> in a later patch.

It does not go away in this series.


Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-13 Thread Jason Wang
On Mon, Dec 13, 2021 at 4:07 PM Michael S. Tsirkin  wrote:
>
> On Mon, Dec 13, 2021 at 11:02:39AM +0800, Jason Wang wrote:
> > On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > > Sorry for reviving this ancient thread. I was kind of lost as to the
> > > > conclusion it ended up with. I have the following questions,
> > > >
> > > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > > support will be completely dropped from the table, is my understanding
> > > > correct? Actually we're interested in supporting virtio v0.95 guests for
> > > > x86, which is backed by the spec at
> > > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not
> > > > sure if there's a request/need to support older legacy virtio versions
> > > > beyond that.
> > >
> > > I personally feel it's less work to add in kernel than try to
> > > work around it in userspace. Jason feels differently.
> > > Maybe post the patches and this will prove to Jason it's not
> > > too terrible?
> >
> > That's one way. Other than the config access before setting features,
> > we need to deal with other things:
> >
> > 1) VIRTIO_F_ORDER_PLATFORM
> > 2) there could be a parent device that only supports 1.0 devices
> >
> > And a lot of other stuff summarized in spec section 7.4, which seems
> > like no easy task. Various vDPA parent drivers were written under the
> > assumption that only modern devices are supported.
> >
> > Thanks
>
> Limiting things to x86 will likely address most issues though, won't it?

For the ordering, yes. But it means we need to introduce a config
option for legacy logic?

And we need to deal with, as you said in another thread, kick before DRIVER_OK:

E.g. we had things like this:

if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
!(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) {
ret = ifcvf_request_irq(adapter);
if (ret) {

Similar issues could be found in other parents.

We also need to consider whether we should encourage the vendor to
implement the logic.

I think we can try and see how hard it is.

Thanks

>
> > >
> > > > 2. suppose some form of legacy guest support needs to be there, how do
> > > > we deal with the bogus assumption below in vdpa_get_config() in the
> > > > short term? It looks like one intuitive fix is to move the
> > > > vdpa_set_features call out of vdpa_get_config() to vdpa_set_config().
> > > >
> > > > /*
> > > >  * Config accesses aren't supposed to trigger before features 
> > > > are
> > > > set.
> > > >  * If it does happen we assume a legacy guest.
> > > >  */
> > > > if (!vdev->features_valid)
> > > > vdpa_set_features(vdev, 0);
> > > > ops->get_config(vdev, offset, buf, len);
> > > >
> > > > I can post a patch to fix 2) if there's consensus already reached.
> > > >
> > > > Thanks,
> > > > -Siwei
> > >
> > > I'm not sure how important it is to change that.
> > > In any case it only affects transitional devices, right?
> > > Legacy only should not care ...
> > >
> > >
> > > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > > >
> > > > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > > Detecting it isn't enough though, we will need a new ioctl 
> > > > > > > > > > to notify
> > > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > > know what the use
> > > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > > directly? Is there a
> > > > > > > > > case QEMU can't do with dedicate ioctls later if there's 
> > > > > > > > > indeed
> > > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > > BTW a good API could be
> > > > > > > >
> > > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > >
> > > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > > >
> > > > > > > Actually, I wonder whether it's a good time to just not support
> > > > > > > legacy drivers
> > > > > > > for vDPA. Consider:
> > > > > > >
> > > > > > > 1) Its definition is non-normative
> > > > > > > 2) A lot of burden in the code
> > > > > > >
> > > > > > > So qemu can still present the legacy device, since the config
> > > > > > > space or other
> > > > > > > state that is presented by vhost-vDPA is not expected to be
> > > > > > > accessed by the
> > > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > > in this
> > > > > > > case?
> > > > > > >
> > > > > > > Thanks
> >
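One way a vDPA parent could tolerate the legacy "kick before DRIVER_OK" behavior discussed in this thread is to record early kicks and replay them once DRIVER_OK arrives. The sketch below is a purely illustrative user-space model — it is not the ifcvf driver, and all model_* names are invented; only the DRIVER_OK bit value matches VIRTIO_CONFIG_S_DRIVER_OK:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_DRIVER_OK  0x4	/* value of VIRTIO_CONFIG_S_DRIVER_OK */
#define MODEL_MAX_VQS    4

struct model_dev {
	unsigned char status;
	bool pending_kick[MODEL_MAX_VQS];  /* kicks seen before DRIVER_OK */
	int kicks_delivered[MODEL_MAX_VQS];
};

static void model_kick(struct model_dev *d, int vq)
{
	if (!(d->status & MODEL_DRIVER_OK)) {
		d->pending_kick[vq] = true;	/* too early: remember it */
		return;
	}
	d->kicks_delivered[vq]++;		/* normal path */
}

static void model_set_status(struct model_dev *d, unsigned char status)
{
	bool was_ok = d->status & MODEL_DRIVER_OK;

	d->status = status;
	if (!was_ok && (status & MODEL_DRIVER_OK)) {
		/* this is where a real parent would request IRQs, then
		 * replay any kicks a legacy guest sent too early */
		for (int i = 0; i < MODEL_MAX_VQS; i++)
			if (d->pending_kick[i]) {
				d->pending_kick[i] = false;
				d->kicks_delivered[i]++;
			}
	}
}
```

This is one of the items that makes legacy support "not an easy task": every parent that sets up IRQs at the DRIVER_OK transition would need similar deferral logic.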

Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2021 at 04:57:38PM +0800, Jason Wang wrote:
> On Mon, Dec 13, 2021 at 4:07 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Dec 13, 2021 at 11:02:39AM +0800, Jason Wang wrote:
> > > On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > > > Sorry for reviving this ancient thread. I was kind of lost as to the
> > > > > conclusion it ended up with. I have the following questions,
> > > > >
> > > > > 1. legacy guest support: from the past conversations it doesn't seem
> > > > > the support will be completely dropped from the table, is my
> > > > > understanding correct? Actually we're interested in supporting virtio
> > > > > v0.95 guests for x86, which is backed by the spec at
> > > > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm
> > > > > not sure if there's a request/need to support older legacy virtio
> > > > > versions beyond that.
> > > >
> > > > I personally feel it's less work to add in kernel than try to
> > > > work around it in userspace. Jason feels differently.
> > > > Maybe post the patches and this will prove to Jason it's not
> > > > too terrible?
> > >
> > > That's one way. Other than the config access before setting features,
> > > we need to deal with other things:
> > >
> > > 1) VIRTIO_F_ORDER_PLATFORM
> > > 2) there could be a parent device that only supports 1.0 devices
> > >
> > > And a lot of other stuff summarized in spec section 7.4, which seems
> > > like no easy task. Various vDPA parent drivers were written under the
> > > assumption that only modern devices are supported.
> > >
> > > Thanks
> >
> > Limiting things to x86 will likely address most issues though, won't it?
> 
> For the ordering, yes. But it means we need to introduce a config
> option for legacy logic?
> And we need to deal with, as you said in another thread, kick before 
> DRIVER_OK:
> 
> E.g. we had things like this:
> 
> if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> !(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) {
> ret = ifcvf_request_irq(adapter);
> if (ret) {
> 
> Similar issues could be found in other parents.

The DRIVER_OK thing is mostly an issue for block, where it
expects to access the disk directly during probe.

> We also need to consider whether we should encourage the vendor to
> implement the logic.
> 
> I think we can try and see how hard it is.
> 
> Thanks

Right. My point exactly.

> >
> > > >
> > > > > 2. suppose some form of legacy guest support needs to be there, how
> > > > > do we deal with the bogus assumption below in vdpa_get_config() in
> > > > > the short term? It looks like one intuitive fix is to move the
> > > > > vdpa_set_features call out of vdpa_get_config() to vdpa_set_config().
> > > > >
> > > > > /*
> > > > >  * Config accesses aren't supposed to trigger before features 
> > > > > are
> > > > > set.
> > > > >  * If it does happen we assume a legacy guest.
> > > > >  */
> > > > > if (!vdev->features_valid)
> > > > > vdpa_set_features(vdev, 0);
> > > > > ops->get_config(vdev, offset, buf, len);
> > > > >
> > > > > I can post a patch to fix 2) if there's consensus already reached.
> > > > >
> > > > > Thanks,
> > > > > -Siwei
> > > >
> > > > I'm not sure how important it is to change that.
> > > > In any case it only affects transitional devices, right?
> > > > Legacy only should not care ...
> > > >
> > > >
> > > > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > > > >
> > > > > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > > > Detecting it isn't enough though, we will need a new 
> > > > > > > > > > > ioctl to notify
> > > > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > > > know what the use
> > > > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > > > directly? Is there a
> > > > > > > > > > case QEMU can't do with dedicate ioctls later if there's 
> > > > > > > > > > indeed
> > > > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > > > BTW a good API could be
> > > > > > > > >
> > > > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > >
> > > > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > > > >
> > > > > > > > Actually, I wonder whether it's a good time to just not
> > > > > > > > support legacy drivers
> > > > > > > > for vDPA. Consider:
> > > > > > > >
> > > > > > > >

Re: [PATCH] Bluetooth: virtio_bt: fix device removal

2021-12-13 Thread Michael S. Tsirkin
On Thu, Dec 09, 2021 at 04:22:58PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 25, 2021 at 09:02:01PM +0100, Marcel Holtmann wrote:
> > Hi Michael,
> > 
> > > Device removal is clearly out of virtio spec: it attempts to remove
> > > unused buffers from a VQ before invoking device reset. To fix, make
> > > open/close NOPs and do all cleanup/setup in probe/remove.
> > 
> > so the virtbt_{open,close} as NOPs is not really what a driver is
> > supposed to be doing. These are transport enable/disable callbacks from
> > the BT
> > Core towards the driver. It maps to a device being enabled/disabled by
> > something like bluetoothd for example. So if disabled, I expect that no
> > resources/queues are in use.
> > 
> > Maybe I misunderstand the virtio spec in that regard, but I would like
> > to keep this fundamental concept of a Bluetooth driver. It does work
> > with all other transports like USB, SDIO, UART etc.
> > 
> > > The cost here is a single skb wasted on an unused bt device - which
> > > seems modest.
> > 
> > There should be no buffer used if the device is powered off. We also don’t
> > have any USB URBs in-flight if the transport is not active.
> > 
> > > NB: with this fix in place driver still suffers from a race condition if
> > > an interrupt triggers while device is being reset. Work on a fix for
> > > that issue is in progress.
> > 
> > In the virtbt_close() callback we should deactivate all interrupts.
> > 
> > Regards
> > 
> > Marcel
> 
> So Marcel, do I read it right that you are working on a fix
> and I can drop this patch for now?

ping

> -- 
> MST


Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods

2021-12-13 Thread Vivek Goyal
On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal  wrote:
> >
> > On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> > > These methods indirect the actual DAX read/write path.  In the end pmem
> > > uses magic flush and mc safe variants and fuse and dcssblk use plain
> > > ones while device mapper just redirects to the underlying device.
> > >
> > > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> > > special variants, then use them everywhere as they fall back to the plain
> > > ones on s390 anyway and remove an indirect call from the read/write path
> > > as well as a lot of boilerplate code.
> > >
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > >  drivers/dax/super.c   | 36 ++--
> > >  drivers/md/dm-linear.c| 20 -
> > >  drivers/md/dm-log-writes.c| 80 ---
> > >  drivers/md/dm-stripe.c| 20 -
> > >  drivers/md/dm.c   | 50 --
> > >  drivers/nvdimm/pmem.c | 20 -
> > >  drivers/s390/block/dcssblk.c  | 14 --
> > >  fs/dax.c  |  5 ---
> > >  fs/fuse/virtio_fs.c   | 19 +
> > >  include/linux/dax.h   |  9 ++--
> > >  include/linux/device-mapper.h |  4 --
> > >  11 files changed, 37 insertions(+), 240 deletions(-)
> > >
> >
> > [..]
> > > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > > index 5c03a0364a9bb..754319ce2a29b 100644
> > > --- a/fs/fuse/virtio_fs.c
> > > +++ b/fs/fuse/virtio_fs.c
> > > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct 
> > > dax_device *dax_dev, pgoff_t pgoff,
> > >   return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> > >  }
> > >
> > > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > > -pgoff_t pgoff, void *addr,
> > > -size_t bytes, struct iov_iter *i)
> > > -{
> > > - return copy_from_iter(addr, bytes, i);
> > > -}
> > > -
> > > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > > -pgoff_t pgoff, void *addr,
> > > -size_t bytes, struct iov_iter *i)
> > > -{
> > > - return copy_to_iter(addr, bytes, i);
> > > -}
> > > -
> > >  static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > >pgoff_t pgoff, size_t nr_pages)
> > >  {
> > > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct 
> > > dax_device *dax_dev,
> > >
> > >  static const struct dax_operations virtio_fs_dax_ops = {
> > >   .direct_access = virtio_fs_direct_access,
> > > - .copy_from_iter = virtio_fs_copy_from_iter,
> > > - .copy_to_iter = virtio_fs_copy_to_iter,
> > >   .zero_page_range = virtio_fs_zero_page_range,
> > >  };
> > >
> > > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device 
> > > *vdev, struct virtio_fs *fs)
> > >   fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> > >   if (IS_ERR(fs->dax_dev))
> > >   return PTR_ERR(fs->dax_dev);
> > > -
> > > + set_dax_cached(fs->dax_dev);
> >
> > Looks good to me from virtiofs point of view.
> >
> > Reviewed-by: Vivek Goyal 
> >
> > Going forward, I am wondering whether virtiofs should use the flushcache
> > version as well. What if the host filesystem is using DAX and mapping
> > persistent memory pfns directly into the qemu address space? I have never
> > tested that.
> >
> > Right now we are relying on applications to do fsync/msync on virtiofs
> > for data persistence.
> 
> This sounds like it would need coordination with a paravirtualized
> driver that can indicate whether the host side is pmem or not, like
> the virtio_pmem driver.

Agreed. Let me check the details of virtio_pmem driver.

> However, if the guest sends any fsync/msync
> you would still need to go explicitly cache flush any dirty page
> because you can't necessarily trust that the guest did that already.

So host dax functionality will already take care of that, IIUC, right?
I see a dax_flush() call in dax_writeback_one(). I am assuming that will
take care of flushing dirty pages when the guest issues fsync()/msync().
So we probably don't have to do anything extra here.

I think qemu should map files using MAP_SYNC in this case though.
Any reads/writes to virtiofs files will turn into host file load/store
operations. So flushcache in the guest makes more sense with MAP_SYNC,
which should make sure any filesystem metadata has already persisted
after fault completion. And later the guest can do writes followed by a
flush and ensure data persists too.

IOW, I probably only need to do the following.

- In the virtiofs virtual device, add a notion of the kind of dax window
  or memory it supports. So maybe some kind of "writethrough" property of
  the virtiofs dax cache.

- Use this property in the virtiofs driver to dec
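The fsync/msync reliance described in this exchange can be seen in a minimal POSIX sketch: a store through a MAP_SHARED mapping is only durable after an explicit msync(). On a DAX file mapped with MAP_SYNC (Linux-specific, requires MAP_SHARED_VALIDATE), stores plus a CPU cache flush would persist without that call — the case a "writethrough" dax-window property would advertise. The helper name and path handling here are invented for illustration:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write msg through a shared mapping and make it durable with msync(). */
static int model_write_and_sync(const char *path, const char *msg)
{
	int fd = open(path, O_RDWR | O_CREAT, 0600);
	size_t len = strlen(msg) + 1;

	if (fd < 0 || ftruncate(fd, (off_t)len) != 0)
		return -1;

	char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			 fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	memcpy(map, msg, len);		/* store through the mapping */
	int rc = msync(map, len, MS_SYNC);	/* make it durable */
	munmap(map, len);
	close(fd);
	return rc;
}
```

With MAP_SYNC semantics the memcpy() followed by a cache flush would already be persistent, which is why flushcache in the guest only clearly pays off when qemu maps the files that way.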

Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter

2021-12-13 Thread Dan Williams
On Mon, Dec 13, 2021 at 12:20 AM Christoph Hellwig  wrote:
>
> On Sun, Dec 12, 2021 at 06:48:05AM -0800, Dan Williams wrote:
> > On Fri, Dec 10, 2021 at 6:05 AM Vivek Goyal  wrote:
> > >
> > > On Thu, Dec 09, 2021 at 07:38:28AM +0100, Christoph Hellwig wrote:
> > > > While using the MC-safe copy routines is rather pointless on a virtual 
> > > > device
> > > > like virtiofs,
> > >
> > > I was wondering about that. Is it completely pointless?
> > >
> > > Typically we are just mapping host page cache into qemu address space.
> > > That shows as virtiofs device pfn in guest and that pfn is mapped into
> > > guest application address space in mmap() call.
> > >
> > > Given that on the host it's DRAM, I would not expect a machine check
> > > on the load side, so there was no need to use the machine-check-safe
> > > variant.
> >
> > That's a broken assumption, DRAM experiences multi-bit ECC errors.
> > Machine checks, data aborts, etc existed before PMEM.
>
> So the conclusion here is that we should always use the mc safe variant?

The proposal is one of the following:

1/ any paths not currently using the mc safe variant should continue
not using it to avoid the performance regression on older platforms,
i.e. drop this patch.

2/ add plumbing to switch to mcsafe variant, but only on newer
platforms, incremental new patch

3/ always use the mc safe variant, keep this patch

We could go with 3/ and see who screams, because 3/ is smallest
ongoing maintenance burden. However, I feel like 1/ is the path of
least resistance until the platforms with the need to do 'careful'
copying age out of use.


Re: [PATCH] virtio_mmio: pm: Add notification handlers for restore and freeze

2021-12-13 Thread kernel test robot
Hi Mikhail,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.16-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Mikhail-Golubev/virtio_mmio-pm-Add-notification-handlers-for-restore-and-freeze/20211214-001636
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 136057256686de39cc3a07c2e39ef6bc43003ff6
config: m68k-randconfig-r033-20211213 (https://download.01.org/0day-ci/archive/20211214/202112140201.xecfvzym-...@intel.com/config)
compiler: m68k-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/86cd610f19003c4d848c13e5e00e38e9bc41f54e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikhail-Golubev/virtio_mmio-pm-Add-notification-handlers-for-restore-and-freeze/20211214-001636
        git checkout 86cd610f19003c4d848c13e5e00e38e9bc41f54e
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=m68k SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/virtio/virtio_mmio.c: In function 'virtio_mmio_freeze':
>> drivers/virtio/virtio_mmio.c:770:16: error: implicit declaration of function 
>> 'virtio_device_freeze'; did you mean 'virtio_device_ready'? 
>> [-Werror=implicit-function-declaration]
 770 | return virtio_device_freeze(&vm_dev->vdev);
 |^~~~
 |virtio_device_ready
   drivers/virtio/virtio_mmio.c: In function 'virtio_mmio_restore':
>> drivers/virtio/virtio_mmio.c:778:16: error: implicit declaration of function 
>> 'virtio_device_restore'; did you mean 'virtio_mmio_restore'? 
>> [-Werror=implicit-function-declaration]
 778 | return virtio_device_restore(&vm_dev->vdev);
 |^
 |virtio_mmio_restore
   cc1: some warnings being treated as errors


vim +770 drivers/virtio/virtio_mmio.c

   764  
   765  static int __maybe_unused virtio_mmio_freeze(struct device *dev)
   766  {
   767  struct platform_device *pdev = to_platform_device(dev);
   768  struct virtio_mmio_device *vm_dev = platform_get_drvdata(pdev);
   769  
 > 770  return virtio_device_freeze(&vm_dev->vdev);
   771  }
   772  
   773  static int __maybe_unused virtio_mmio_restore(struct device *dev)
   774  {
   775  struct platform_device *pdev = to_platform_device(dev);
   776  struct virtio_mmio_device *vm_dev = platform_get_drvdata(pdev);
   777  
 > 778  return virtio_device_restore(&vm_dev->vdev);
   779  }
   780  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH v1 1/7] vdpa: Provide interface to read driver features

2021-12-13 Thread Si-Wei Liu




On 12/12/2021 5:38 AM, Eli Cohen wrote:

On Thu, Dec 09, 2021 at 02:29:17PM -0800, Si-Wei Liu wrote:


On 12/8/2021 10:47 PM, Eli Cohen wrote:

On Wed, Dec 08, 2021 at 03:57:21PM -0800, Si-Wei Liu wrote:

On 12/8/2021 12:14 PM, Eli Cohen wrote:

Provide an interface to read the negotiated features. This is needed
when building the netlink message in vdpa_dev_net_config_fill().

Also fix the implementation of vdpa_dev_net_config_fill() to use the
negotiated features instead of the device features.

Are we sure the use of device feature is a bug rather than expected
behavior?

I think so, since in vdpa_dev_net_config_fill() we're returning the
current configuration, so you don't want to mislead the user with
wrong information.

You seem to imply vdpa_dev_net_config_fill() should return the
current/running driver config rather than the initial setting of the device
side before feature negotiation kicks in. This seems to work for the running
link status, which is propagated to the configuration space, but how do you
infer the other untrackable running config the driver is operating with,
such as mtu, the effective value of which is not written back to the config
space or propagated back to the vdpa backend?


I would think MTU should be configured by devlink before adding the
device and reading should also be done using devlink. This is not
supported currently and I am trying to have this supported although at
least for ConnectX 6DX it might require to add firmware support.
Thanks for keeping me posted on MTU support. But again, what you said is 
device side capability/config that is made static once vdpa is created.


What I'm concerned about is the running driver config that is set by the 
guest. For example, the effective number of queues set via "ethtool -L" 
in the guest will show up in "vdpa dev show config".


However, the effective MTU value set by the guest's "ip link set mtu ..." 
command won't show up in "vdpa dev show config". This is in contrast with 
the other vdpa configs, which is why I think it'd be quite confusing and 
misleading to show them all with mixed nomenclature in the same place.


-Siwei



How do users determine the configured number of queue pairs at any
point in the case of a guest/driver that doesn't support multiqueue, or when
the device is being reset (where actual_features == 0)?

I would think if you read during reset (which is really a short period
of time), you get what there is at the moment.

I would stress the case of a guest not supporting multi-queue. For example,
guest firmware might only support a single queue without a control queue, which
is not considered transient. Since the validity of the value depends on
feature negotiation, how does the host admin user infer that feature
negotiation is complete? What is the administrative value if they have to
keep track of the change without knowing the effective negotiation status?

So you want to know how many VQ pairs the device supports? I think it is
appropriate to return this information in VDPA_ATTR_DEV_MAX_VQS.
Currently the returned value is not the device capability but rather the
device's configured value.
Does that make sense to you?


Thanks,
-Siwei

Maybe we should differentiate
the static device config from the driver/device running state here. I thought
the change to vdpa_dev_net_config_fill() in this patch would break the vdpa tool
user's expectation that the max_vqp config is a fixed value set at vdpa
creation time.

That was not intended to be fixed AFAIU.


Perhaps worth adding another attribute to represent the running state
(cur_max_qps) based on the negotiated features.


You can get that information by running e.g. ethtool -l ens1.


-Siwei


To make APIs clearer, make the following name changes to struct
vdpa_config_ops so they better describe their operations:

get_features -> get_device_features
set_features -> set_driver_features

Finally, add get_driver_features to return the negotiated features and
add implementation to all the upstream drivers.

Signed-off-by: Eli Cohen 
---
drivers/vdpa/alibaba/eni_vdpa.c| 16 
drivers/vdpa/ifcvf/ifcvf_main.c| 16 
drivers/vdpa/mlx5/net/mlx5_vnet.c  | 16 
drivers/vdpa/vdpa.c|  2 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c   | 21 +++--
drivers/vdpa/vdpa_user/vduse_dev.c | 16 
drivers/vdpa/virtio_pci/vp_vdpa.c  | 16 
drivers/vhost/vdpa.c   |  2 +-
drivers/virtio/virtio_vdpa.c   |  2 +-
include/linux/vdpa.h   | 14 +-
10 files changed, 87 insertions(+), 34 deletions(-)

diff --git a/drivers/vdpa/alibaba/eni_vdpa.c b/drivers/vdpa/alibaba/eni_vdpa.c
index 3f788794571a..ac0975660f4d 100644
--- a/drivers/vdpa/alibaba/eni_vdpa.c
+++ b/drivers/vdpa/alibaba/eni_vdpa.c
@@ -58,7 +58,7 @@ static struct virtio_pci_legacy_device *vdpa_to_ldev(struct 
vdpa_device *vdpa)
return &eni_vdpa->

Re: [GIT PULL] vhost: cleanups and fixes

2021-12-13 Thread pr-tracker-bot
The pull request you sent on Sun, 12 Dec 2021 17:59:51 -0500:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5472f14a37421d1bca3dddf33cabd3bd6dbefbbc

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [PATCH v2 07/10] vdpa: Add support for returning device configuration information

2021-12-13 Thread Si-Wei Liu




On 12/13/2021 6:42 AM, Eli Cohen wrote:

Add netlink attribute to store flags indicating current state of the
device.
In addition, introduce a flag to indicate whether control virtqueue is
used.

This indication can be retrieved by:

vdpa dev config show vdpa-a
vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 1
mtu 1500 ctrl_vq yes

Signed-off-by: Eli Cohen 


V1 -> V2
Patch was changed to return only an indication of ctrl VQ
---
  drivers/vdpa/vdpa.c   | 17 +
  include/uapi/linux/vdpa.h |  8 
  2 files changed, 25 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 7b7bef7673b4..130a8d4aeaed 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -787,6 +787,19 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, 
struct netlink_callba
return msg->len;
  }
  
+static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device *vdev,

+struct sk_buff *msg,
+struct virtio_net_config *config,
+u64 features)
+{
+   if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
+   return 0;
+
+   /* currently the only flag can be returned */
+   return nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FLAGS,
+BIT_ULL(VDPA_DEV_ATTR_CVQ), VDPA_ATTR_PAD);
+}
+
  static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
   struct sk_buff *msg, u64 features,
   const struct virtio_net_config *config)
@@ -805,6 +818,7 @@ static int vdpa_dev_net_config_fill(struct vdpa_device 
*vdev, struct sk_buff *ms
struct virtio_net_config config = {};
u64 features;
u16 val_u16;
+   int err;
  
  	vdpa_get_config(vdev, 0, &config, sizeof(config));
  
@@ -821,6 +835,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms

return -EMSGSIZE;
  
  	features = vdev->config->get_driver_features(vdev);

+   err = vdpa_dev_net_ctrl_vq_fill(vdev, msg, &config, features);
+   if (err)
+   return err;
  
  	return vdpa_dev_net_mq_config_fill(vdev, msg, features, &config);

  }
diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
index a252f06f9dfd..23b854e3e5e2 100644
--- a/include/uapi/linux/vdpa.h
+++ b/include/uapi/linux/vdpa.h
@@ -20,9 +20,16 @@ enum vdpa_command {
VDPA_CMD_DEV_CONFIG_GET,/* can dump */
  };
  
+enum {

+   VDPA_DEV_ATTR_CVQ,
+};
+
  enum vdpa_attr {
VDPA_ATTR_UNSPEC,
  
+	/* Pad attribute for 64b alignment */

+   VDPA_ATTR_PAD = VDPA_ATTR_UNSPEC,
+
/* bus name (optional) + dev name together make the parent device 
handle */
VDPA_ATTR_MGMTDEV_BUS_NAME, /* string */
VDPA_ATTR_MGMTDEV_DEV_NAME, /* string */
@@ -34,6 +41,7 @@ enum vdpa_attr {
VDPA_ATTR_DEV_MAX_VQS,  /* u32 */
VDPA_ATTR_DEV_MAX_VQ_SIZE,  /* u16 */
VDPA_ATTR_DEV_MIN_VQ_SIZE,  /* u16 */
+   VDPA_ATTR_DEV_FLAGS,/* u64 */
Adding new attr here would break existing userspace that is compiled 
with the older value. Need to append to the end.


-Siwei
  
  	VDPA_ATTR_DEV_NET_CFG_MACADDR,		/* binary */

VDPA_ATTR_DEV_NET_STATUS,   /* u8 */




Re: [PATCH v2 09/10] vdpa: Support reporting max device virtqueues

2021-12-13 Thread Si-Wei Liu




On 12/13/2021 6:42 AM, Eli Cohen wrote:

Add a max_supported_vqs field to struct vdpa_mgmt_dev. Upstream drivers
need to fill this value according to the device capabilities.

This value is reported back in a netlink message when showing a device.

Example:

$ vdpa dev show
vdpa-a: type network mgmtdev auxiliary/mlx5_core.sf.1 vendor_id  \
max_vqp 3 max_vq_size 256 max_supported_vqs 256

Signed-off-by: Eli Cohen 
---
  drivers/vdpa/vdpa.c   | 2 ++
  include/linux/vdpa.h  | 1 +
  include/uapi/linux/vdpa.h | 1 +
  3 files changed, 4 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 130a8d4aeaed..b9dd693146be 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -695,6 +695,8 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff 
*msg, u32 portid, u32 seq
goto msg_err;
if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
goto msg_err;
+   if (nla_put_u16(msg, VDPA_ATTR_DEV_MAX_DEV_VQS, 
vdev->mdev->max_supported_vqs))
+   goto msg_err;
  
  	genlmsg_end(msg, hdr);

return 0;
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 9245dfbf1b08..72ea8ad7ba21 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -444,6 +444,7 @@ struct vdpa_mgmt_dev {
const struct vdpa_mgmtdev_ops *ops;
const struct virtio_device_id *id_table;
u64 config_attr_mask;
+   u16 max_supported_vqs;
struct list_head list;
  };
  
diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h

index 23b854e3e5e2..1b758d77e228 100644
--- a/include/uapi/linux/vdpa.h
+++ b/include/uapi/linux/vdpa.h
@@ -42,6 +42,7 @@ enum vdpa_attr {
VDPA_ATTR_DEV_MAX_VQ_SIZE,  /* u16 */
VDPA_ATTR_DEV_MIN_VQ_SIZE,  /* u16 */
VDPA_ATTR_DEV_FLAGS,/* u64 */
+   VDPA_ATTR_DEV_MAX_DEV_VQS,  /* u16 */


Adding new attr here would break existing userspace that is compiled 
with the older value. Need to append to the end.


-Siwei
  
  	VDPA_ATTR_DEV_NET_CFG_MACADDR,		/* binary */

VDPA_ATTR_DEV_NET_STATUS,   /* u8 */




Re: [PATCH v2 05/10] vdpa/mlx5: Fix config_attr_mask assignment

2021-12-13 Thread Si-Wei Liu




On 12/13/2021 6:42 AM, Eli Cohen wrote:

Fix VDPA_ATTR_DEV_NET_CFG_MACADDR assignment to be explicit 64 bit
assignment.

No issue was seen since the value is well below the 64-bit max value.
Nevertheless it needs to be fixed.

Fixes: a007d940040c ("vdpa/mlx5: Support configuration of MAC")
Acked-by: Jason Wang 

Reviewed-by: Si-Wei Liu 

Signed-off-by: Eli Cohen 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 98aed4b36c3f..4a2149f70f1e 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2669,7 +2669,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
mgtdev->mgtdev.ops = &mdev_ops;
mgtdev->mgtdev.device = mdev->device;
mgtdev->mgtdev.id_table = id_table;
-   mgtdev->mgtdev.config_attr_mask = (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR);
+   mgtdev->mgtdev.config_attr_mask = 
BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR);
mgtdev->madev = madev;
  
  	err = vdpa_mgmtdev_register(&mgtdev->mgtdev);
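As a side note on why the explicit 64-bit form matters, here is a generic C illustration (the macro below is a userspace stand-in, not the kernel's BIT_ULL itself): a plain `1 << n` is an `int` shift, so it cannot safely produce bits at position 31 and above, whereas shifting a `1ULL` constant can. In this particular patch the value is small, as the commit message notes, but the 64-bit form is future-proof.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT_ULL(): shift a 64-bit one.
 * A plain (1 << n) shifts a 32-bit int, which is undefined for n >= 31
 * and can silently truncate when assigned into a u64 mask. */
#define MY_BIT_ULL(n) (1ULL << (n))

static uint64_t mask_with_high_bit(void)
{
        return MY_BIT_ULL(40);   /* fine: 64-bit shift, bit 40 is set */
}
```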




Re: [PATCH] virtio_mmio: pm: Add notification handlers for restore and freeze

2021-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2021 at 05:00:08PM +0100, Mikhail Golubev wrote:
> From: Yurii Danilovskyi 
> 
> Handle restore and freeze notifications from the PM core. Expose
> these to individual virtio drivers that can quiesce and resume vq
> operations.
> 
> Signed-off-by: Yurii Danilovskyi 
> Signed-off-by: Mikhail Golubev 
> ---
>  drivers/virtio/virtio_mmio.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 56128b9c46eb..af79f8c2a2f2 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -762,6 +762,24 @@ static void vm_unregister_cmdline_devices(void)
>  
>  #endif
>  
> +static int __maybe_unused virtio_mmio_freeze(struct device *dev)
> +{
> + struct platform_device *pdev = to_platform_device(dev);
> + struct virtio_mmio_device *vm_dev = platform_get_drvdata(pdev);
> +
> + return virtio_device_freeze(&vm_dev->vdev);
> +}
> +
> +static int __maybe_unused virtio_mmio_restore(struct device *dev)
> +{
> + struct platform_device *pdev = to_platform_device(dev);
> + struct virtio_mmio_device *vm_dev = platform_get_drvdata(pdev);
> +
> + return virtio_device_restore(&vm_dev->vdev);
> +}
> +
> +static SIMPLE_DEV_PM_OPS(virtio_mmio_pm_ops, virtio_mmio_freeze, 
> virtio_mmio_restore);
> +
>  /* Platform driver */
>  
>  static const struct of_device_id virtio_mmio_match[] = {
> @@ -785,6 +803,7 @@ static struct platform_driver virtio_mmio_driver = {
>   .name   = "virtio-mmio",
>   .of_match_table = virtio_mmio_match,
>   .acpi_match_table = ACPI_PTR(virtio_mmio_acpi_match),
> + .pm = &virtio_mmio_pm_ops,
>   },
>  };

All this code needs to be within ifdef CONFIG_PM_SLEEP.  Alternatively,
tweak virtio.h to expose virtio_device_freeze/virtio_device_restore
unconditionally.
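A minimal userspace sketch of the first option, guarding the PM callbacks with CONFIG_PM_SLEEP; the struct and function bodies below are simplified stand-ins for the kernel symbols in the patch, not the real virtio API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct dev_pm_ops. */
struct dev_pm_ops { int (*freeze)(void); int (*restore)(void); };

#define CONFIG_PM_SLEEP 1   /* pretend the kconfig option is enabled */

#ifdef CONFIG_PM_SLEEP
/* Both handlers and the ops table only exist when sleep support is built in,
 * so the driver still compiles when CONFIG_PM_SLEEP is off. */
static int virtio_mmio_freeze(void)  { return 0; }  /* would call virtio_device_freeze() */
static int virtio_mmio_restore(void) { return 0; }  /* would call virtio_device_restore() */
static const struct dev_pm_ops virtio_mmio_pm_ops = {
        .freeze  = virtio_mmio_freeze,
        .restore = virtio_mmio_restore,
};
#define VIRTIO_MMIO_PM_OPS (&virtio_mmio_pm_ops)
#else
#define VIRTIO_MMIO_PM_OPS ((const struct dev_pm_ops *)NULL)  /* no PM ops */
#endif
```

The alternative MST mentions, exposing virtio_device_freeze/virtio_device_restore unconditionally in virtio.h, avoids the ifdef in every transport driver at the cost of carrying the symbols in all builds.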


>  
> -- 
> 2.34.1
> 
> 
> -- 



Re: [PATCH] Bluetooth: virtio_bt: fix device removal

2021-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2021 at 05:44:13AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 09, 2021 at 04:22:58PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 25, 2021 at 09:02:01PM +0100, Marcel Holtmann wrote:
> > > Hi Michael,
> > > 
> > > > Device removal is clearly out of virtio spec: it attempts to remove
> > > > unused buffers from a VQ before invoking device reset. To fix, make
> > > > open/close NOPs and do all cleanup/setup in probe/remove.
> > > 
> > > so having virtbt_{open,close} as NOPs is not really what a driver is supposed
> > > to be doing. These are transport enable/disable callbacks from the BT
> > > Core towards the driver. It maps to a device being enabled/disabled by
> > > something like bluetoothd for example. So if disabled, I expect that no
> > > resources/queues are in use.
> > > 
> > > Maybe I misunderstand the virtio spec in that regard, but I would like
> > > to keep this fundamental concept of a Bluetooth driver. It does work
> > > with all other transports like USB, SDIO, UART etc.
> > > 
> > > > The cost here is a single skb wasted on an unused bt device - which
> > > > seems modest.
> > > 
> > > There should be no buffer used if the device is powered off. We also don’t
> > > have any USB URBs in-flight if the transport is not active.
> > > 
> > > > NB: with this fix in place driver still suffers from a race condition if
> > > > an interrupt triggers while device is being reset. Work on a fix for
> > > > that issue is in progress.
> > > 
> > > In the virtbt_close() callback we should deactivate all interrupts.
> > > 
> > > Regards
> > > 
> > > Marcel
> > 
> > So Marcel, do I read it right that you are working on a fix
> > and I can drop this patch for now?
> 
> ping


If I don't hear otherwise I'll queue my version - it might not
be ideal but it at least does not violate the spec.
We can work on not allocating/freeing buffers later
as appropriate.

> > -- 
> > MST


Re: [PATCH v2 04/10] vdpa: Allow to configure max data virtqueues

2021-12-13 Thread Si-Wei Liu




On 12/13/2021 6:42 AM, Eli Cohen wrote:

Add netlink support to configure the max virtqueue pairs for a device.
At least one pair is required. The maximum is dictated by the device.

Signed-off-by: Eli Cohen 
---
  drivers/vdpa/vdpa.c  | 14 +-
  include/linux/vdpa.h |  1 +
  2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 5749cf0a1500..7b7bef7673b4 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -555,7 +555,8 @@ vdpa_nl_cmd_mgmtdev_get_dumpit(struct sk_buff *msg, struct 
netlink_callback *cb)
  }
  
  #define VDPA_DEV_NET_ATTRS_MASK ((1 << VDPA_ATTR_DEV_NET_CFG_MACADDR) | \

-(1 << VDPA_ATTR_DEV_NET_CFG_MTU))
+(1 << VDPA_ATTR_DEV_NET_CFG_MTU) | \
+(1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
  
  static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *info)

  {
@@ -581,6 +582,17 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff 
*skb, struct genl_info *i
nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU]);
config.mask |= (1 << VDPA_ATTR_DEV_NET_CFG_MTU);
}
+   if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]) {
+   config.net.max_vq_pairs =
+   nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]);
+   if (!config.net.max_vq_pairs) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "At least one pair of VQs is 
required");
+   err = -EINVAL;
+   goto err;

Should directly return -EINVAL, vdpa_dev_mutex is not taken.


+   }
+   config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
+   }
  
  	/* Skip checking capability if user didn't prefer to configure any

 * device networking attributes. It is likely that user might have used
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 9cc4291a79b3..9245dfbf1b08 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -101,6 +101,7 @@ struct vdpa_dev_set_config {
struct {
u8 mac[ETH_ALEN];
u16 mtu;
+   u16 max_vq_pairs;
} net;
u64 mask;
  };




Re: [PATCH v1 3/7] vdpa: Allow to configure max data virtqueues

2021-12-13 Thread Si-Wei Liu



On 12/13/2021 5:07 AM, Eli Cohen wrote:

On Mon, Dec 13, 2021 at 08:44:53AM +0200, Eli Cohen wrote:

On Fri, Dec 10, 2021 at 10:29:43AM +0800, Jason Wang wrote:

On Fri, Dec 10, 2021 at 5:51 AM Si-Wei Liu  wrote:



On 12/8/2021 9:36 PM, Jason Wang wrote:

On Thu, Dec 9, 2021 at 8:25 AM Si-Wei Liu  wrote:


On 12/8/2021 12:14 PM, Eli Cohen wrote:

Add netlink support to configure the max virtqueue pairs for a device.
At least one pair is required. The maximum is dictated by the device.

Example:

$ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 5

Not this patch, but I think there should be a mega attribute defined
ahead to specify the virtio class/type to create vdpa instance with.
Like the existing ones e.g. mac_addr and mtu, max_vqp is also vdpa net
specific attribute.

Probably, but this only works for the case when a single parent is
allowed to create different types of devices. It looks to me the
current model to have a per type parent.

Yes, that is the current situation and implementation, although the
model does allow multi-type parent through
VDPA_ATTR_MGMTDEV_SUPPORTED_CLASSES.

Right, so I agree with you, we need to specify the class first.


Or maybe fail device creation if the parent device does not support net
class.

BTW, we already have the mechanism in place to enforce that. If a device
does not support configuration of max_vqp, it will not set
VDPA_ATTR_DEV_NET_CFG_MACADDR in its config_attr_mask so you will not be
able add it if you attempt to specify max_vqp upon creation.
Yes, but that is another level of validation, down at the specific class 
of vdpa config. It should be completely fine for a user not to specify 
max_vqp or a mac address when the vdpa is created. Though they may not 
create the expected vdpa class in the first place if the class validation 
is missing.


Having said, I guess it would be great if users who want to create vdpa 
can get known of the parent's supported class and capability ahead 
through some mgmtdev command, like what I suggested earlier:


$ vdpa mgmtdev capab show
pci/:41:00.2:
  class: net
 mac: off
 mtu: on
 max_mtu: 9000
 max_vqp: 1
auxiliary/mlx5_core.sf.1:
  class: net
 mac: on
 mtu: off
 max_mtu: 1500
 max_vqp: 256


Regards,
-Siwei




You check first the supported class with "vdpa mgmtdev show" and if net
is supported you may use max_vqp.


Regardless, even with a
single-type parent, so far there's no validation done on vdpa creation
against which class/type the parent can support. The max_vqp attribute is
essentially vdpa-network-device only, but command line users don't have
a means to infer it is structured this way, and/or which vdpa parent may
support this config attribute. That said, were the command line usage
structured like below, I would have less worry.

$ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 net.max_vqp 5

It might be easier to specify class as a dedicated parameter, since we
may want to specify mtu and mac.

Thanks


Alternatively, if the goal is to keep the attribute flat, we may show
the mapping for the parent capability and the supported class:

$ vdpa mgmtdev capab show
pci/:41:00.2:
class: net
   mac: off
   mtu: on
   max_mtu: 9000
   max_vqp: 1
auxiliary/mlx5_core.sf.1:
class: net
   mac: on
   mtu: off
   max_mtu: 1500
   max_vqp: 256

Thanks,
-Siwei


Thanks


-Siwei


Signed-off-by: Eli Cohen 
---
v0 -> v1:
1. fix typo
2. move max_vq_pairs to net specific struct

drivers/vdpa/vdpa.c  | 14 +-
include/linux/vdpa.h |  1 +
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index c37d384c0f33..3bf016e03512 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -480,7 +480,8 @@ vdpa_nl_cmd_mgmtdev_get_dumpit(struct sk_buff *msg, struct 
netlink_callback *cb)
}

#define VDPA_DEV_NET_ATTRS_MASK ((1 << VDPA_ATTR_DEV_NET_CFG_MACADDR) | \
-  (1 << VDPA_ATTR_DEV_NET_CFG_MTU))
+  (1 << VDPA_ATTR_DEV_NET_CFG_MTU) | \
+  (1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP))

static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct 
genl_info *info)
{
@@ -506,6 +507,17 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff 
*skb, struct genl_info *i
nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU]);
config.mask |= (1 << VDPA_ATTR_DEV_NET_CFG_MTU);
}
+ if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]) {
+ config.net.max_vq_pairs =
+ nla_get_u16(nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP]);
+ if (!config.net.max_vq_pairs) {
+ NL_SET_ERR_MSG_MOD(info->extack,
+"At least one pair of VQs is 
required");
+ err = -EINVAL;
+ goto err;
+ }
+  

Re: [PATCH v2 09/10] vdpa: Support reporting max device virtqueues

2021-12-13 Thread Si-Wei Liu




On 12/13/2021 6:42 AM, Eli Cohen wrote:

Add a max_supported_vqs field to struct vdpa_mgmt_dev. Upstream drivers
need to fill this value according to the device capabilities.

This value is reported back in a netlink message when showing a device.

Example:

$ vdpa dev show
vdpa-a: type network mgmtdev auxiliary/mlx5_core.sf.1 vendor_id  \
max_vqp 3 max_vq_size 256 max_supported_vqs 256

Signed-off-by: Eli Cohen 
---
  drivers/vdpa/vdpa.c   | 2 ++
  include/linux/vdpa.h  | 1 +
  include/uapi/linux/vdpa.h | 1 +
  3 files changed, 4 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 130a8d4aeaed..b9dd693146be 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -695,6 +695,8 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff 
*msg, u32 portid, u32 seq
goto msg_err;
if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
goto msg_err;
+   if (nla_put_u16(msg, VDPA_ATTR_DEV_MAX_DEV_VQS, 
vdev->mdev->max_supported_vqs))
+   goto msg_err;
What is the default value if the vendor driver doesn't expose this value? 
And it doesn't seem like something to stash on struct vdpa_mgmt_dev; 
only some vdpa network vendor drivers may need to expose it, and if 
not exposed we can assume it's 32767 per the spec. A generic 
vdpa op (get_device_capability?) to query device capability might be 
better I guess (define a VDPA_CAPAB_NET_MAX_VQS subtype to get it).


-Siwei

  
  	genlmsg_end(msg, hdr);

return 0;
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 9245dfbf1b08..72ea8ad7ba21 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -444,6 +444,7 @@ struct vdpa_mgmt_dev {
const struct vdpa_mgmtdev_ops *ops;
const struct virtio_device_id *id_table;
u64 config_attr_mask;
+   u16 max_supported_vqs;
struct list_head list;
  };
  
diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h

index 23b854e3e5e2..1b758d77e228 100644
--- a/include/uapi/linux/vdpa.h
+++ b/include/uapi/linux/vdpa.h
@@ -42,6 +42,7 @@ enum vdpa_attr {
VDPA_ATTR_DEV_MAX_VQ_SIZE,  /* u16 */
VDPA_ATTR_DEV_MIN_VQ_SIZE,  /* u16 */
VDPA_ATTR_DEV_FLAGS,/* u64 */
+   VDPA_ATTR_DEV_MAX_DEV_VQS,  /* u16 */
  
  	VDPA_ATTR_DEV_NET_CFG_MACADDR,		/* binary */

VDPA_ATTR_DEV_NET_STATUS,   /* u8 */




Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-13 Thread Si-Wei Liu



On 12/12/2021 7:02 PM, Jason Wang wrote:

On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin  wrote:

On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:

Sorry for reviving this ancient thread. I was kinda lost for the conclusion
it ended up with. I have the following questions,

1. legacy guest support: from the past conversations it doesn't seem the
support will be completely dropped from the table, is my understanding
correct? Actually we're interested in supporting virtio v0.95 guest for x86,
which is backed by the spec at
https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
if there's request/need to support older legacy virtio versions beyond that.

I personally feel it's less work to add in kernel than try to
work around it in userspace. Jason feels differently.
Maybe post the patches and this will prove to Jason it's not
too terrible?

That's one way, other than the config access before setting features,
we need to deal with other stuffs:

1) VIRTIO_F_ORDER_PLATFORM
2) there could be a parent device that only support 1.0 device
We do want to involve the vendor's support for a legacy (or transitional) 
device datapath. Otherwise it'd be too difficult to emulate/translate in 
software/QEMU. The above two might not be an issue if the vendor claims 
0.95 support in the virtqueue and ring layout; plus, limiting to x86 support 
(LE with strong memory ordering) seems to simplify a lot of these requirements. 
I don't think emulating a legacy device model on top of a 1.0 vdpa parent 
for the dataplane would be a good idea, either.




And a lot of other stuff summarized in spec 7.4 which seems not an
easy task. Various vDPA parent drivers were written under the
assumption that only modern devices are supported.
If some of these vDPA vendors do provide 0.95 support, especially a 
datapath and ring layout that satisfy the transitional device 
model defined in section 7.4, I guess we can scope the initial support 
to these vendor drivers and x86 only. Let me know if I missed something else.


Thanks,
-Siwei




Thanks


2. suppose some form of legacy guest support needs to be there, how do we
deal with the bogus assumption below in vdpa_get_config() in the short term?
It looks one of the intuitive fix is to move the vdpa_set_features call out
of vdpa_get_config() to vdpa_set_config().

 /*
  * Config accesses aren't supposed to trigger before features are
set.
  * If it does happen we assume a legacy guest.
  */
 if (!vdev->features_valid)
 vdpa_set_features(vdev, 0);
 ops->get_config(vdev, offset, buf, len);

I can post a patch to fix 2) if there's consensus already reached.
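A rough userspace model of that proposed move, with stand-in names rather than the real kernel API: the config-read path stops forcing features to zero, leaving negotiation state untouched for the caller to handle explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct vdpa_device; only the fields relevant here. */
struct vdpa_device_model { bool features_valid; uint64_t features; };

static void model_set_features(struct vdpa_device_model *vdev, uint64_t f)
{
        vdev->features = f;
        vdev->features_valid = true;
}

/* Current behavior: a config read silently negotiates zero features. */
static void model_get_config_current(struct vdpa_device_model *vdev)
{
        if (!vdev->features_valid)
                model_set_features(vdev, 0);  /* the bogus legacy-guest assumption */
        /* ...ops->get_config(vdev, offset, buf, len) would run here... */
}

/* Proposed behavior: a config read has no feature side effect at all. */
static void model_get_config_proposed(struct vdpa_device_model *vdev)
{
        (void)vdev;
        /* ...ops->get_config(...) only; features are set elsewhere... */
}
```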

Thanks,
-Siwei

I'm not sure how important it is to change that.
In any case it only affects transitional devices, right?
Legacy only should not care ...



On 3/2/2021 2:53 AM, Jason Wang wrote:

On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:

On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:

On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:

On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:

Detecting it isn't enough though, we will need a new ioctl to notify
the kernel that it's a legacy guest. Ugh :(

Well, although I think adding an ioctl is doable, may I
know what the use
case there will be for kernel to leverage such info
directly? Is there a
case QEMU can't do with dedicate ioctls later if there's indeed
differentiation (legacy v.s. modern) needed?

BTW a good API could be

#define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
#define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)

we did it per vring but maybe that was a mistake ...

Actually, I wonder whether it's good time to just not support
legacy driver
for vDPA. Consider:

1) Its definition is non-normative
2) A lot of burden of code

So qemu can still present the legacy device since the config
space or other
stuffs that is presented by vhost-vDPA is not expected to be
accessed by
guest directly. Qemu can do the endian conversion when necessary
in this
case?

Thanks


Overall I would be fine with this approach but we need to avoid breaking
working userspace, qemu releases with vdpa support are out there and
seem to work for people. Any changes need to take that into account
and document compatibility concerns.


Agree, let me check.



   I note that any hardware
implementation is already broken for legacy except on platforms with
strong ordering which might be helpful in reducing the scope.


Yes.

Thanks







Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-13 Thread Si-Wei Liu



On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:

On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:

Sorry for reviving this ancient thread. I was kinda lost for the conclusion
it ended up with. I have the following questions,

1. legacy guest support: from the past conversations it doesn't seem the
support will be completely dropped from the table, is my understanding
correct? Actually we're interested in supporting virtio v0.95 guest for x86,
which is backed by the spec at
https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
if there's request/need to support older legacy virtio versions beyond that.

I personally feel it's less work to add in kernel than try to
work around it in userspace. Jason feels differently.
Maybe post the patches and this will prove to Jason it's not
too terrible?
I suppose if the vdpa vendor supports 0.95 at the datapath and ring 
layout level and is limited to x86 only, there should be an easy way out. I 
checked with Eli and other Mellanox/NVIDIA folks about hardware/firmware 
level 0.95 support; it seems all the ingredients have been there since 
the DPDK days. The only major limitation is in the vDPA software: the 
current vdpa core assumes VIRTIO_F_ACCESS_PLATFORM for a few DMA setup 
ops, which is virtio 1.0 only.
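The feature gate itself is trivial to model. This is a generic sketch, not the vdpa core's actual code; bit 33 is VIRTIO_F_ACCESS_PLATFORM per the virtio spec, but the function name is made up. A 0.95 guest only negotiates bits 0..31 and can never set it, so a core that keys its DMA setup on that bit takes the non-IOMMU path for every legacy guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VIRTIO_F_ACCESS_PLATFORM is feature bit 33 in virtio 1.x; legacy
 * (0.95) drivers only negotiate bits 0..31 and can never set it. */
#define F_ACCESS_PLATFORM_BIT 33ULL

static bool uses_platform_dma(uint64_t negotiated_features)
{
        return negotiated_features & (1ULL << F_ACCESS_PLATFORM_BIT);
}
```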





2. suppose some form of legacy guest support needs to be there, how do we
deal with the bogus assumption below in vdpa_get_config() in the short term?
It looks one of the intuitive fix is to move the vdpa_set_features call out
of vdpa_get_config() to vdpa_set_config().

     /*
  * Config accesses aren't supposed to trigger before features are
set.
  * If it does happen we assume a legacy guest.
  */
     if (!vdev->features_valid)
     vdpa_set_features(vdev, 0);
     ops->get_config(vdev, offset, buf, len);

I can post a patch to fix 2) if there's consensus already reached.

Thanks,
-Siwei

I'm not sure how important it is to change that.
In any case it only affects transitional devices, right?
Legacy only should not care ...
Yes, I'd like to distinguish a legacy driver (suppose it is 0.95) from 
a modern one in a transitional device model, rather than being legacy 
only. That way a vdpa parent supporting both v0.95 and v1.0 can serve both 
types of guests without having to reconfigure. Or are you suggesting that 
limiting to legacy only at vdpa creation time would simplify the 
implementation a lot?


Thanks,
-Siwei




On 3/2/2021 2:53 AM, Jason Wang wrote:

On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:

On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:

On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:

On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:

Detecting it isn't enough though, we will need a new ioctl to notify
the kernel that it's a legacy guest. Ugh :(

Well, although I think adding an ioctl is doable, may I
know what the use
case there will be for kernel to leverage such info
directly? Is there a
case QEMU can't do with dedicate ioctls later if there's indeed
differentiation (legacy v.s. modern) needed?

BTW a good API could be

#define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
#define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)

we did it per vring but maybe that was a mistake ...

Actually, I wonder whether it's good time to just not support
legacy driver
for vDPA. Consider:

1) Its definition is non-normative
2) A lot of burden in the code

So qemu can still present the legacy device, since the config
space or other state presented by vhost-vDPA is not expected to be
accessed by the guest directly. Qemu can do the endian conversion when
necessary in this case?

Thanks


Overall I would be fine with this approach but we need to avoid breaking
working userspace, qemu releases with vdpa support are out there and
seem to work for people. Any changes need to take that into account
and document compatibility concerns.


Agree, let me check.



   I note that any hardware
implementation is already broken for legacy except on platforms with
strong ordering which might be helpful in reducing the scope.


Yes.

Thanks






___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-13 Thread Jason Wang
On Tue, Dec 14, 2021 at 10:00 AM Si-Wei Liu  wrote:
>
>
>
> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> >> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> >> it ended up with. I have the following questions,
> >>
> >> 1. legacy guest support: from the past conversations it doesn't seem the
> >> support will be completely dropped from the table, is my understanding
> >> correct? Actually we're interested in supporting virtio v0.95 guest for 
> >> x86,
> >> which is backed by the spec at
> >> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> >> if there's request/need to support wilder legacy virtio versions earlier
> >> beyond.
> > I personally feel it's less work to add in kernel than try to
> > work around it in userspace. Jason feels differently.
> > Maybe post the patches and this will prove to Jason it's not
> > too terrible?
> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> layout level and is limited to x86 only, there should be easy way out.

Note that though I tried to mandate a 1.0 device when writing the code,
the core vdpa doesn't mandate it, and we already have one parent
based on the 0.95 spec, the eni_vdpa:

1) it depends on X86 (so no endian and ordering issues)
2) it has various subtle things like it can't work well without
mrg_rxbuf features negotiated since the device assumes a fixed vnet
header length.
3) it can only be used by legacy drivers in the guest (no VERSION_1
since the device mandates a 4096 alignment which doesn't comply with
1.0)

So it's a proof of 0.95 parent support in the vDPA core.

And we had a modern only parent, that is the vp_vdpa parent (though
it's not hard to add legacy support).

So for all the other vendors, assume they have full support for
transitional devices on x86. As discussed, we need to handle:

1) config access before features
2) kick before driver_ok

Anything else? If not, it looks easier to do them in userspace.
The only advantage of doing it in the kernel is to make it work for
virtio-vdpa. But virtio-vdpa doesn't need transitional devices.
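The two transitional-device gaps listed above (a config access before FEATURES_OK and a kick before DRIVER_OK) could be absorbed by a userspace shim; a minimal model follows. Everything here, including the buffering policy and names, is an assumption of this sketch, not any real vhost-vdpa API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_DRIVER_OK   0x04
#define STATUS_FEATURES_OK 0x08

struct shim {
	uint8_t status;
	bool pending_kick;	/* kick seen before DRIVER_OK */
	int kicks_delivered;
};

/* Legacy guests may read config before feature negotiation; a shim
 * could serve such reads from a cached copy instead of forwarding. */
static bool config_read_forwardable(const struct shim *s)
{
	return s->status & STATUS_FEATURES_OK;	/* else serve from cache */
}

static void kick(struct shim *s)
{
	if (!(s->status & STATUS_DRIVER_OK)) {
		s->pending_kick = true;	/* defer until DRIVER_OK */
		return;
	}
	s->kicks_delivered++;
}

static void set_status(struct shim *s, uint8_t status)
{
	s->status = status;
	if ((status & STATUS_DRIVER_OK) && s->pending_kick) {
		s->pending_kick = false;
		s->kicks_delivered++;	/* replay the deferred kick */
	}
}
```

The point of the sketch is that both behaviours are pure ordering fixes, which is why they are candidates for userspace rather than the kernel.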

> I
> checked with Eli and other Mellanox/NVDIA folks for hardware/firmware
> level 0.95 support, it seems all the ingredient had been there already
> dated back to the DPDK days. The only major thing limiting is in the
> vDPA software that the current vdpa core has the assumption around
> VIRTIO_F_ACCESS_PLATFORM for a few DMA setup ops, which is virtio 1.0 only.

The code doesn't have such an assumption, or is there anything I missed?
Or did you mean vhost-vdpa, which tries to talk with the IOMMU layer
directly? That should be ok since the host IOMMU is hidden from the guest
anyway.

>
> >
> >> 2. suppose some form of legacy guest support needs to be there, how do we
> >> deal with the bogus assumption below in vdpa_get_config() in the short 
> >> term?
> >> It looks one of the intuitive fix is to move the vdpa_set_features call out
> >> of vdpa_get_config() to vdpa_set_config().
> >>
> >>  /*
> >>   * Config accesses aren't supposed to trigger before features are set.
> >>   * If it does happen we assume a legacy guest.
> >>   */
> >>  if (!vdev->features_valid)
> >>  vdpa_set_features(vdev, 0);
> >>  ops->get_config(vdev, offset, buf, len);
> >>
> >> I can post a patch to fix 2) if there's consensus already reached.
> >>
> >> Thanks,
> >> -Siwei
> > I'm not sure how important it is to change that.
> > In any case it only affects transitional devices, right?
> > Legacy only should not care ...
> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against
> the modern one in a transitional device model rather than being legacy
> only. That way a v0.95 and v1.0 supporting vdpa parent can support both
> types of guests without having to reconfigure.

I think this is how a transitional device is expected to work.

Thanks

> Or are you suggesting
> limit to legacy only at the time of vdpa creation would simplify the
> implementation a lot?
>
> Thanks,
> -Siwei
>
> >
> >> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
>  On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> >> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>  Detecting it isn't enough though, we will need a new ioctl to notify
>  the kernel that it's a legacy guest. Ugh :(
> >>> Well, although I think adding an ioctl is doable, may I
> >>> know what the use
> >>> case there will be for kernel to leverage such info
> >>> directly? Is there a
> >>> case QEMU can't do with dedicate ioctls later if there's indeed
> >>> differenti

Re: [PATCH] virtio-net: make copy len check in xdp_linearize_page

2021-12-13 Thread Jason Wang
On Mon, Dec 13, 2021 at 5:14 PM 孙蒙恩  wrote:
>
> Jason Wang  于2021年12月13日周一 15:49写道:
> >
> > On Mon, Dec 13, 2021 at 12:50 PM  wrote:
> > >
> > > From: mengensun 
> > >
> > > xdp_linearize_page assumes the ring elem size is smaller than the
> > > page size when copying the first ring elem, but an elem may be
> > > bigger than the page size.
> > >
> > > add_recvbuf_mergeable may add a hole to the ring elem; the hole size
> > > is not fixed, according to EWMA.
> >
> > The logic is to try to avoid dropping packets in this case, so I
> > wonder if it's better to "fix" the add_recvbuf_mergeable().
>

Adding lists back.

> turning to XDP generic is difficult for me; here we can "fix"
> add_recvbuf_mergeable like the following:
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 36a4b7c195d5..06ce8bb10b47 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1315,6 +1315,7 @@ static int add_recvbuf_mergeable(struct virtnet_info 
> *vi,
> alloc_frag->offset += hole;
> }
> +   len = min(len, PAGE_SIZE - room);
> sg_init_one(rq->sg, buf, len);
> ctx = mergeable_len_to_ctx(len, headroom);

Then the truesize here is wrong.

> err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
>
> it seems to be a rule that the length of an elem given to the vring is
> always smaller than or equal to PAGE_SIZE

It aims to be consistent with what EWMA tries to do:

len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
rq->min_buf_len, PAGE_SIZE - hdr_len);
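To see what the quoted clamp guarantees, here is a userspace model: the posted buffer length is always hdr_len plus something in [min_buf_len, PAGE_SIZE - hdr_len], so it can reach a full PAGE_SIZE but never exceed it. The constants below are illustrative, not taken from real hardware.

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/* userspace stand-in for the kernel's clamp_t(unsigned int, ...) */
static unsigned int clamp_uint(unsigned int v, unsigned int lo,
			       unsigned int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* models: len = hdr_len + clamp_t(..., ewma, min_buf_len,
 *                                 PAGE_SIZE - hdr_len); */
static unsigned int mergeable_buf_len(unsigned int avg_pkt_len,
				      unsigned int min_buf_len,
				      unsigned int hdr_len)
{
	return hdr_len +
	       clamp_uint(avg_pkt_len, min_buf_len, PAGE_SIZE - hdr_len);
}
```

This is why a buffer posted under EWMA can be exactly PAGE_SIZE, which is the case xdp_linearize_page() must tolerate.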

Thanks

>
> >
> > Or another idea is to switch to use XDP generic here where we can use
> > skb_linearize() which should be more robust and we can drop the
> > xdp_linearize_page() logic completely.
> >
> > Thanks
> >
> > >
> > > so, fix it by checking the copy len; if the check fails, just drop
> > > the whole frame rather than dirtying the memory after the page.
> > >
> > > Signed-off-by: mengensun 
> > > Reviewed-by: MengLong Dong 
> > > Reviewed-by: ZhengXiong Jiang 
> > > ---
> > >  drivers/net/virtio_net.c | 6 +-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 36a4b7c195d5..844bdbd67ff7 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -662,8 +662,12 @@ static struct page *xdp_linearize_page(struct 
> > > receive_queue *rq,
> > >int page_off,
> > >unsigned int *len)
> > >  {
> > > -   struct page *page = alloc_page(GFP_ATOMIC);
> > > +   struct page *page;
> > >
> > > +   if (*len > PAGE_SIZE - page_off)
> > > +   return NULL;
> > > +
> > > +   page = alloc_page(GFP_ATOMIC);
> > > if (!page)
> > > return NULL;
> > >
> > > --
> > > 2.27.0
> > >
> >
>
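The check the patch adds can be modelled standalone. A sketch under the assumption of 4 KiB pages; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u

/* models the added guard: if (*len > PAGE_SIZE - page_off) return NULL; */
static bool can_linearize(unsigned int len, unsigned int page_off)
{
	return len <= PAGE_SIZE - page_off;
}
```

In other words, a buffer of exactly PAGE_SIZE can never be linearized once any page offset (e.g. XDP headroom) is in play, which is the frame the patch now drops up front instead of overrunning the page.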


Re: [PATCH v2 00/10] Allow for configuring max number of virtqueue pairs

2021-12-13 Thread Jason Wang
On Mon, Dec 13, 2021 at 10:43 PM Eli Cohen  wrote:
>
> Allow the user to configure the max number of virtqueue pairs for a vdpa
> instance. The user can then control the actual number of virtqueue pairs
> using ethtool.
>
> Example, set number of VQPs to 2:
> $ ethtool -L ens1 combined 2
>
> A user can check the max supported virtqueues for a management device by
> running:
>
> vdpa dev show
> vdpa-a: type network mgmtdev auxiliary/mlx5_core.sf.1 vendor_id  \
> max_vqp 3 max_vq_size 256 max_supported_vqs 256

I think the max_supported_vqs should be an odd number since it should
contain the control vq.

>
> and refer to this value when adding a device.
>
> To create a device with a max of 5 VQPs:
> vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 5

A question, consider the parent support both net and block, if user use

vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1

How do we know it's a net or block device? Or do you expect some block
specific attributes to be added?

Thanks

>
> V1 -> V2:
> 1. Do not return the index of the control VQ. Instead return an
>indication if Ctrl VQ was negotiated.
> 2. Do not return config information if FEATURES_OK is not set, to avoid
>reporting out of sync information.
> 3. Minor fixes as described by the individual patches
> 4. Add patches to return the max virtqueues a device is capable of
>supporting.
>
> Eli Cohen (10):
>   vdpa: Provide interface to read driver features
>   vdpa/mlx5: Distribute RX virtqueues in RQT object
>   vdpa: Read device configuration only if FEATURES_OK
>   vdpa: Allow to configure max data virtqueues
>   vdpa/mlx5: Fix config_attr_mask assignment
>   vdpa/mlx5: Support configuring max data virtqueue pairs
>   vdpa: Add support for returning device configuration information
>   vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps()
>   vdpa: Support reporting max device virtqueues
>   vdpa/mlx5: Configure max supported virtqueues
>
>  drivers/vdpa/alibaba/eni_vdpa.c| 16 +++--
>  drivers/vdpa/ifcvf/ifcvf_main.c| 16 +++--
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 94 +-
>  drivers/vdpa/vdpa.c| 42 -
>  drivers/vdpa/vdpa_sim/vdpa_sim.c   | 21 +--
>  drivers/vdpa/vdpa_user/vduse_dev.c | 16 +++--
>  drivers/vdpa/virtio_pci/vp_vdpa.c  | 16 +++--
>  drivers/vhost/vdpa.c   |  2 +-
>  drivers/virtio/virtio_vdpa.c   |  2 +-
>  include/linux/vdpa.h   | 16 +++--
>  include/uapi/linux/vdpa.h  |  9 +++
>  11 files changed, 178 insertions(+), 72 deletions(-)
>
> --
> 2.33.1
>



Re: [PATCH v2 01/10] vdpa: Provide interface to read driver features

2021-12-13 Thread Jason Wang
On Mon, Dec 13, 2021 at 10:43 PM Eli Cohen  wrote:
>
> Provide an interface to read the negotiated features. This is needed
> when building the netlink message in vdpa_dev_net_config_fill().
>
> Also fix the implementation of vdpa_dev_net_config_fill() to use the
> negotiated features instead of the device features.
>
> To make APIs clearer, make the following name changes to struct
> vdpa_config_ops so they better describe their operations:
>
> get_features -> get_device_features
> set_features -> set_driver_features
>
> Finally, add get_driver_features to return the negotiated features and
> add implementation to all the upstream drivers.
>
> Signed-off-by: Eli Cohen 
> -
> v1 -> v2:
> 1. Fix to call vp_modern_get_driver_features() in
>vp_vdpa_get_driver_features()
> 2. Improve callback function description
> ---

Acked-by: Jason Wang 



Re: [PATCH v2 03/10] vdpa: Read device configuration only if FEATURES_OK

2021-12-13 Thread Jason Wang
On Mon, Dec 13, 2021 at 10:43 PM Eli Cohen  wrote:
>
> Avoid reading device configuration during feature negotiation. Read
> device status and verify that VIRTIO_CONFIG_S_FEATURES_OK is set.
> Otherwise, return -EAGAIN.
>
> Signed-off-by: Eli Cohen 



> ---
>  drivers/vdpa/vdpa.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 42d71d60d5dc..5749cf0a1500 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -819,8 +819,15 @@ vdpa_dev_config_fill(struct vdpa_device *vdev, struct 
> sk_buff *msg, u32 portid,
>  {
> u32 device_id;
> void *hdr;
> +   u8 status;
> int err;
>
> +   status = vdev->config->get_status(vdev);
> +   if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
> +   NL_SET_ERR_MSG_MOD(extack, "Features negotiation not 
> completed");
> +   return -EAGAIN;
> +   }
> +
> hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags,
>   VDPA_CMD_DEV_CONFIG_GET);
> if (!hdr)

It looks to me we need to synchronize this with set_status() and
cf_mutex (set_config).

Or simply return all the config space (the maximum set of features)

Thanks

> --
> 2.33.1
>



Re: [PATCH v2 07/10] vdpa: Add support for returning device configuration information

2021-12-13 Thread Jason Wang
On Mon, Dec 13, 2021 at 10:43 PM Eli Cohen  wrote:
>
> Add netlink attribute to store flags indicating current state of the
> device.
> In addition, introduce a flag to indicate whether control virtqueue is
> used.
>
> This indication can be retrieved by:
>
> vdpa dev config show vdpa-a
> vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 1
> mtu 1500 ctrl_vq yes

I think the cvq is kind of duplicated with the driver features?

Thanks

>
> Signed-off-by: Eli Cohen 
>
> 
> V1 -> V2
> Patch was changed to return only an indication of ctrl VQ
> ---
>  drivers/vdpa/vdpa.c   | 17 +
>  include/uapi/linux/vdpa.h |  8 
>  2 files changed, 25 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 7b7bef7673b4..130a8d4aeaed 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -787,6 +787,19 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff 
> *msg, struct netlink_callba
> return msg->len;
>  }
>
> +static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device *vdev,
> +struct sk_buff *msg,
> +struct virtio_net_config *config,
> +u64 features)
> +{
> +   if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
> +   return 0;
> +
> +   /* currently the only flag can be returned */
> +   return nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FLAGS,
> +BIT_ULL(VDPA_DEV_ATTR_CVQ), VDPA_ATTR_PAD);
> +}
> +
>  static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
>struct sk_buff *msg, u64 features,
>const struct virtio_net_config *config)
> @@ -805,6 +818,7 @@ static int vdpa_dev_net_config_fill(struct vdpa_device 
> *vdev, struct sk_buff *ms
> struct virtio_net_config config = {};
> u64 features;
> u16 val_u16;
> +   int err;
>
> vdpa_get_config(vdev, 0, &config, sizeof(config));
>
> @@ -821,6 +835,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device 
> *vdev, struct sk_buff *ms
> return -EMSGSIZE;
>
> features = vdev->config->get_driver_features(vdev);
> +   err = vdpa_dev_net_ctrl_vq_fill(vdev, msg, &config, features);
> +   if (err)
> +   return err;
>
> return vdpa_dev_net_mq_config_fill(vdev, msg, features, &config);
>  }
> diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
> index a252f06f9dfd..23b854e3e5e2 100644
> --- a/include/uapi/linux/vdpa.h
> +++ b/include/uapi/linux/vdpa.h
> @@ -20,9 +20,16 @@ enum vdpa_command {
> VDPA_CMD_DEV_CONFIG_GET,/* can dump */
>  };
>
> +enum {
> +   VDPA_DEV_ATTR_CVQ,
> +};
> +
>  enum vdpa_attr {
> VDPA_ATTR_UNSPEC,
>
> +   /* Pad attribute for 64b alignment */
> +   VDPA_ATTR_PAD = VDPA_ATTR_UNSPEC,
> +
> /* bus name (optional) + dev name together make the parent device 
> handle */
> VDPA_ATTR_MGMTDEV_BUS_NAME, /* string */
> VDPA_ATTR_MGMTDEV_DEV_NAME, /* string */
> @@ -34,6 +41,7 @@ enum vdpa_attr {
> VDPA_ATTR_DEV_MAX_VQS,  /* u32 */
> VDPA_ATTR_DEV_MAX_VQ_SIZE,  /* u16 */
> VDPA_ATTR_DEV_MIN_VQ_SIZE,  /* u16 */
> +   VDPA_ATTR_DEV_FLAGS,/* u64 */
>
> VDPA_ATTR_DEV_NET_CFG_MACADDR,  /* binary */
> VDPA_ATTR_DEV_NET_STATUS,   /* u8 */
> --
> 2.33.1
>



RE: [PATCH v2 00/10] Allow for configuring max number of virtqueue pairs

2021-12-13 Thread Parav Pandit via Virtualization


> From: Jason Wang 
> Sent: Tuesday, December 14, 2021 8:47 AM
> 
> A question, consider the parent support both net and block, if user use
> 
> vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1
> 
> How do we know it's a net or block device? Or do you expect some block
> specific attributes to be added?
The user should pass the virtio device id of type net/blk etc. when creating
the device.
No such in-kernel driver exists so far; when one is added, the netlink
interface should be enhanced to accept the optional device id.


Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-13 Thread Michael S. Tsirkin
On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> 
> 
> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > Sorry for reviving this ancient thread. I was kinda lost for the 
> > > conclusion
> > > it ended up with. I have the following questions,
> > > 
> > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > support will be completely dropped from the table, is my understanding
> > > correct? Actually we're interested in supporting virtio v0.95 guest for 
> > > x86,
> > > which is backed by the spec at
> > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> > > if there's request/need to support wilder legacy virtio versions earlier
> > > beyond.
> > I personally feel it's less work to add in kernel than try to
> > work around it in userspace. Jason feels differently.
> > Maybe post the patches and this will prove to Jason it's not
> > too terrible?
> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> layout level and is limited to x86 only, there should be easy way out.

Note a subtle difference: what matters is that the guest, not the host, is
x86. That matters for emulators which might reorder memory accesses.
I guess this enforcement belongs in QEMU then?

> I
> checked with Eli and other Mellanox/NVDIA folks for hardware/firmware level
> 0.95 support, it seems all the ingredient had been there already dated back
> to the DPDK days. The only major thing limiting is in the vDPA software that
> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> a few DMA setup ops, which is virtio 1.0 only.
> 
> > 
> > > 2. suppose some form of legacy guest support needs to be there, how do we
> > > deal with the bogus assumption below in vdpa_get_config() in the short 
> > > term?
> > > It looks one of the intuitive fix is to move the vdpa_set_features call 
> > > out
> > > of vdpa_get_config() to vdpa_set_config().
> > > 
> > >      /*
> > >       * Config accesses aren't supposed to trigger before features are set.
> > >       * If it does happen we assume a legacy guest.
> > >       */
> > >      if (!vdev->features_valid)
> > >          vdpa_set_features(vdev, 0);
> > >      ops->get_config(vdev, offset, buf, len);
> > > 
> > > I can post a patch to fix 2) if there's consensus already reached.
> > > 
> > > Thanks,
> > > -Siwei
> > I'm not sure how important it is to change that.
> > In any case it only affects transitional devices, right?
> > Legacy only should not care ...
> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> modern one in a transitional device model rather than being legacy only.
> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> guests without having to reconfigure. Or are you suggesting limit to legacy
> only at the time of vdpa creation would simplify the implementation a lot?
> 
> Thanks,
> -Siwei


I don't know for sure. Take a look at the work Halil was doing
to try and support transitional devices with BE guests.


> > 
> > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > Detecting it isn't enough though, we will need a new ioctl to 
> > > > > > > > > notify
> > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > know what the use
> > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > directly? Is there a
> > > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > BTW a good API could be
> > > > > > > 
> > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > 
> > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > > Actually, I wonder whether it's good time to just not support
> > > > > > legacy driver
> > > > > > for vDPA. Consider:
> > > > > > 
> > > > > > 1) It's definition is no-normative
> > > > > > 2) A lot of budren of codes
> > > > > > 
> > > > > > So qemu can still present the legacy device since the config
> > > > > > space or other
> > > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > > accessed by
> > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > in this
> > > > > > case?
> > > > > > 
> > > > > > Thanks
> > > > > > 
> > > > > Overall I would be fi

Re: [PATCH] virtio-net: make copy len check in xdp_linearize_page

2021-12-13 Thread Jason Wang
On Tue, Dec 14, 2021 at 11:48 AM 孙蒙恩  wrote:
>
> Jason Wang  于2021年12月14日周二 11:13写道:
> >
> > On Mon, Dec 13, 2021 at 5:14 PM 孙蒙恩  wrote:
> > >
> > > Jason Wang  于2021年12月13日周一 15:49写道:
> > > >
> > > > On Mon, Dec 13, 2021 at 12:50 PM  wrote:
> > > > >
> > > > > From: mengensun 
> > > > >
> > > > > xdp_linearize_page asume ring elem size is smaller then page size
> > > > > when copy the first ring elem, but, there may be a elem size bigger
> > > > > then page size.
> > > > >
> > > > > add_recvbuf_mergeable may add a hole to ring elem, the hole size is
> > > > > not sure, according EWMA.
> > > >
> > > > The logic is to try to avoid dropping packets in this case, so I
> > > > wonder if it's better to "fix" the add_recvbuf_mergeable().
> > >
> >
> > Adding lists back.
> >
> > > turn to XDP generic is so difficulty for me, here can "fix" the
> > > add_recvbuf_mergeable link follow:
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 36a4b7c195d5..06ce8bb10b47 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1315,6 +1315,7 @@ static int add_recvbuf_mergeable(struct 
> > > virtnet_info *vi,
> > > alloc_frag->offset += hole;
> > > }
> > > +   len = min(len, PAGE_SIZE - room);
> > > sg_init_one(rq->sg, buf, len);
> > > ctx = mergeable_len_to_ctx(len, headroom);
> >
> > Then the truesize here is wrong.
> many thanks!! I realized I was wrong immediately after clicking the
> "send" button. Now, this problem seems to be not only about the *hole*
> but also the EWMA: EWMA will give a buf len between min_buf_len and
> PAGE_SIZE, and when switching from no-xdp to xdp there may be some bufs
> already in the ring, filled before the xdp attach. xdp_linearize_page
> assumes the buf size is PAGE_SIZE - VIRTIO_XDP_HEADROOM and copies "len"
> from the buf, while the buf may be **PAGE_SIZE**

So the issue I see is that though add_recvbuf_mergeable() tries to
make the buffer size PAGE_SIZE, XDP might require more headroom,
which makes a single page insufficient.

>
> because we have no idea when the user will attach an xdp prog, I have no
> idea except to make all the bufs have a "header hole" of
> VIRTIO_XDP_HEADROOM (128), but it seems too expensive for a no-xdp
> virtio eth to always leave 128 bytes unused.

That's a requirement for XDP header adjustment so far.

>
> this problem was found by code review; in real tests, there didn't seem
> to be many such big frames. so here we can just "fix" xdp_linearize_page,
> making it try its best not to drop frames for now?

I think you can reproduce the issue by keeping sending large frames to
guest and try to attach XDP in the middle.

>
> btw, there seems to be no way to fix this thoroughly, unless we drain all
> the bufs in the ring and flush them all to "xdp buff" when attaching the
> xdp prog.
>
> is that some mistake I have made again? #^_^

I see two ways to solve this:

1) reserve more room (but not headroom) to make sure PAGE_SIZE can
work in the case of linearizing
2) switch to use XDP generic

2) looks much easier; you may refer to tun_xdp_one() to see how it
works, it's as simple as calling do_xdp_generic()

Thanks

>
> >
> >
> > > err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > >
> > > it seems a rule that, length of elem giving to vring is away smaller
> > > or equall then PAGE_SIZE
> >
> > It aims to be consistent to what EWMA tries to do:
> >
> > len = hdr_len + clamp_t(unsigned int, 
> > ewma_pkt_len_read(avg_pkt_len),
> > rq->min_buf_len, PAGE_SIZE - hdr_len);
> >
> > Thanks
> >
> > >
> > > >
> > > > Or another idea is to switch to use XDP generic here where we can use
> > > > skb_linearize() which should be more robust and we can drop the
> > > > xdp_linearize_page() logic completely.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > so, fix it by check copy len,if checked failed, just dropped the
> > > > > whole frame, not make the memory dirty after the page.
> > > > >
> > > > > Signed-off-by: mengensun 
> > > > > Reviewed-by: MengLong Dong 
> > > > > Reviewed-by: ZhengXiong Jiang 
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 6 +-
> > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 36a4b7c195d5..844bdbd67ff7 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -662,8 +662,12 @@ static struct page *xdp_linearize_page(struct 
> > > > > receive_queue *rq,
> > > > >int page_off,
> > > > >unsigned int *len)
> > > > >  {
> > > > > -   struct page *page = alloc_page(GFP_ATOMIC);
> > > > > +   struct page *page;
> > > > >
> > > > > +   if (*len > PAGE_SIZE - page_off)
> > > > > +

Re: [PATCH v2 00/10] Allow for configuring max number of virtqueue pairs

2021-12-13 Thread Jason Wang
On Tue, Dec 14, 2021 at 12:11 PM Parav Pandit  wrote:
>
>
> > From: Jason Wang 
> > Sent: Tuesday, December 14, 2021 8:47 AM
> >
> > A question, consider the parent support both net and block, if user use
> >
> > vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1
> >
> > How do we know it's a net or block device? Or do you expect some block
> > specific attributes to be added?
> User should pass the virtio device id of type net/blk etc when creating the 
> device.
> No such in-kernel driver exists so far, so when it is added netlink interface 
> should be enhanced to accept the optional device id.

So the plan is to reject adding the dev without specifying the device
type? Looks fine to me.

Thanks



RE: [PATCH v2 00/10] Allow for configuring max number of virtqueue pairs

2021-12-13 Thread Parav Pandit via Virtualization



> From: Jason Wang 
> Sent: Tuesday, December 14, 2021 11:47 AM
> 
> On Tue, Dec 14, 2021 at 12:11 PM Parav Pandit  wrote:
> >
> >
> > > From: Jason Wang 
> > > Sent: Tuesday, December 14, 2021 8:47 AM
> > >
> > > A question, consider the parent support both net and block, if user
> > > use
> > >
> > > vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1
> > >
> > > How do we know it's a net or block device? Or do you expect some
> > > block specific attributes to be added?
> > User should pass the virtio device id of type net/blk etc when creating the
> device.
> > No such in-kernel driver exists so far, so when it is added netlink 
> > interface
> should be enhanced to accept the optional device id.
> 
> So the plan is to reject the dev adding without specifying the device type?
> Looks fine to me.
Yes, this enhancement will apply only to mgmt. devices that support 
multiple device types.
This ensures backward compatibility and also spares the user from having to 
specify this for mgmt. devices that support a single device type.