Re: [PATCH] iommu: Explicitly include correct DT includes

2023-08-07 Thread Robin Murphy

On 14/07/2023 6:46 pm, Rob Herring wrote:

The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
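
As a minimal sketch of the cleanup pattern the series applies (hypothetical
file, not taken from the patch itself):

/* before: of_device.h happens to drag in of.h and platform_device.h */
#include <linux/of_device.h>

/* after: include exactly what the file uses ... */
#include <linux/of.h>
#include <linux/platform_device.h>

/* ... and forward-declare types that are only used through pointers */
struct device_node;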


Thanks Rob; FWIW,

Acked-by: Robin Murphy 

I guess you're hoping for Joerg to pick this up? However, I wouldn't
foresee any major conflicts if you do need to take it through the OF tree.


Cheers,
Robin.


Signed-off-by: Rob Herring 
---
  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c | 2 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c| 1 -
  drivers/iommu/arm/arm-smmu/qcom_iommu.c  | 3 +--
  drivers/iommu/ipmmu-vmsa.c   | 1 -
  drivers/iommu/sprd-iommu.c   | 1 +
  drivers/iommu/tegra-smmu.c   | 2 +-
  drivers/iommu/virtio-iommu.c | 2 +-
  7 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
index b5b14108e086..bb89d49adf8d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
@@ -3,7 +3,7 @@
   * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
   */
  
-#include 
+#include 
  #include 
  #include 
  
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index a86acd76c1df..d6d1a2a55cc0 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -29,7 +29,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index a503ed758ec3..cc3f68a3516c 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -22,8 +22,7 @@
  #include 
  #include 
  #include 
-#include 
-#include 
+#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 9f64c5c9f5b9..0aeedd3e1494 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -17,7 +17,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 39e34fdeccda..51144c232474 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 1cbf063ccf14..e445f80d0226 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -9,7 +9,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 3551ed057774..17dcd826f5c2 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -13,7 +13,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include 
  #include 



Re: [PATCH] iommu: Explicitly include correct DT includes

2023-08-07 Thread Joerg Roedel
On Fri, Jul 14, 2023 at 11:46:39AM -0600, Rob Herring wrote:
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c | 2 +-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c| 1 -
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c  | 3 +--
>  drivers/iommu/ipmmu-vmsa.c   | 1 -
>  drivers/iommu/sprd-iommu.c   | 1 +
>  drivers/iommu/tegra-smmu.c   | 2 +-
>  drivers/iommu/virtio-iommu.c | 2 +-
>  7 files changed, 5 insertions(+), 7 deletions(-)

Applied, thanks.


Re: [PATCH v4 45/48] mm: shrinker: make global slab shrink lockless

2023-08-07 Thread Dave Chinner via Virtualization
On Mon, Aug 07, 2023 at 07:09:33PM +0800, Qi Zheng wrote:
> The shrinker_rwsem is a global read-write lock in shrinkers subsystem,
> which protects most operations such as slab shrink, registration and
> unregistration of shrinkers, etc. This can easily cause problems in the
> following cases.

> This commit uses the refcount+RCU method [5] proposed by Dave Chinner
> to re-implement the lockless global slab shrink. The memcg slab shrink is
> handled in the subsequent patch.

> ---
>  include/linux/shrinker.h | 17 ++
>  mm/shrinker.c| 70 +---
>  2 files changed, 68 insertions(+), 19 deletions(-)

There's no documentation in the code explaining how the lockless
shrinker algorithm works. It's left to the reader to work out how
this all goes together...

> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index eb342994675a..f06225f18531 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -4,6 +4,8 @@
>  
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #define SHRINKER_UNIT_BITS   BITS_PER_LONG
>  
> @@ -87,6 +89,10 @@ struct shrinker {
>   int seeks;  /* seeks to recreate an obj */
>   unsigned flags;
>  
> + refcount_t refcount;
> + struct completion done;
> + struct rcu_head rcu;

What does the refcount protect, why do we need the completion, etc?
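
Piecing it together from the rest of the series (annotation mine, not the
patch's own documentation), the roles appear to be:

/*
 * refcount: pins the shrinker while shrink_slab() runs do_shrink_slab()
 *           outside the RCU read-side section, so the shrinker cannot be
 *           freed underneath it; shrinker_try_get() failing means that
 *           teardown has already begun.
 * done:     shrinker_free() waits on this completion; the final
 *           shrinker_put() fires it, guaranteeing no walker still holds
 *           a reference before the shrinker is unlinked and freed.
 * rcu:      defers the actual free so a concurrent lockless list walker
 *           can still safely step over the entry.
 */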

> +
>   void *private_data;
>  
>   /* These are for internal use */
> @@ -120,6 +126,17 @@ struct shrinker *shrinker_alloc(unsigned int flags, 
> const char *fmt, ...);
>  void shrinker_register(struct shrinker *shrinker);
>  void shrinker_free(struct shrinker *shrinker);
>  
> +static inline bool shrinker_try_get(struct shrinker *shrinker)
> +{
> + return refcount_inc_not_zero(&shrinker->refcount);
> +}
> +
> +static inline void shrinker_put(struct shrinker *shrinker)
> +{
> + if (refcount_dec_and_test(&shrinker->refcount))
> + complete(&shrinker->done);
> +}
> +
>  #ifdef CONFIG_SHRINKER_DEBUG
>  extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
> const char *fmt, ...);
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index 1911c06b8af5..d318f5621862 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -2,6 +2,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "internal.h"
> @@ -577,33 +578,42 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, 
> struct mem_cgroup *memcg,
>   if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>   return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>  
> - if (!down_read_trylock(&shrinker_rwsem))
> - goto out;
> -
> - list_for_each_entry(shrinker, &shrinker_list, list) {
> + rcu_read_lock();
> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
>   struct shrink_control sc = {
>   .gfp_mask = gfp_mask,
>   .nid = nid,
>   .memcg = memcg,
>   };
>  
> + if (!shrinker_try_get(shrinker))
> + continue;
> +
> + /*
> +  * We can safely unlock the RCU lock here since we already
> +  * hold the refcount of the shrinker.
> +  */
> + rcu_read_unlock();
> +
>   ret = do_shrink_slab(&sc, shrinker, priority);
>   if (ret == SHRINK_EMPTY)
>   ret = 0;
>   freed += ret;
> +
>   /*
> -  * Bail out if someone want to register a new shrinker to
> -  * prevent the registration from being stalled for long periods
> -  * by parallel ongoing shrinking.
> +  * This shrinker may be deleted from shrinker_list and freed
> +  * after the shrinker_put() below, but this shrinker is still
> +  * used for the next traversal. So it is necessary to hold the
> +  * RCU lock first to prevent this shrinker from being freed,
> +  * which also ensures that the next shrinker that is traversed
> +  * will not be freed (even if it is deleted from shrinker_list
> +  * at the same time).
>*/

This comment really should be at the head of the function,
describing the algorithm used within the function itself. i.e. how
reference counts are used w.r.t. the rcu_read_lock() usage to
guarantee existence of the shrinker and the validity of the list
walk.

I'm not going to remember all these little details when I look at
this code in another 6 months time, and having to work it out from
first principles every time I look at the code will waste a lot
of time...
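
For reference, the overall shape of the walk being reviewed (a condensed
sketch of the hunks above, with my comments):

rcu_read_lock();
list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
        struct shrink_control sc = { .gfp_mask = gfp_mask,
                                     .nid = nid, .memcg = memcg, };

        if (!shrinker_try_get(shrinker))   /* zero refcount: being freed */
                continue;
        rcu_read_unlock();                 /* the reference now pins the
                                              shrinker, so RCU can drop  */

        freed += do_shrink_slab(&sc, shrinker, priority);  /* may sleep */

        rcu_read_lock();                   /* re-enter RCU *before* the
                                              put, so shrinker->list.next
                                              stays valid for next step  */
        shrinker_put(shrinker);
}
rcu_read_unlock();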

-Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH v4 44/48] mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}

2023-08-07 Thread Dave Chinner via Virtualization
On Mon, Aug 07, 2023 at 07:09:32PM +0800, Qi Zheng wrote:
> Currently, we maintain two linear arrays per node per memcg, which are
> shrinker_info::map and shrinker_info::nr_deferred. And we need to resize
> them when the shrinker_nr_max is exceeded, that is, allocate a new array,
> and then copy the old array to the new array, and finally free the old
> array by RCU.
> 
> For shrinker_info::map, we do set_bit() under the RCU lock, so we may set
> the value into the old map which is about to be freed. This may cause the
> value set to be lost. The current solution is not to copy the old map when
> resizing, but to set all the corresponding bits in the new map to 1. This
> solves the data loss problem, but brings the overhead of more pointless
> loops while doing memcg slab shrink.
> 
> For shrinker_info::nr_deferred, we will only modify it under the read lock
> of shrinker_rwsem, so it will not run concurrently with the resizing. But
> after we make memcg slab shrink lockless, there will be the same data loss
> problem as shrinker_info::map, and we can't work around it like the map.
> 
> For such resizable arrays, the most straightforward idea is to change it
> to xarray, like we did for list_lru [1]. We need to do xa_store() in the
> list_lru_add()-->set_shrinker_bit(), but this will cause memory
> allocation, and the list_lru_add() doesn't accept failure. A possible
> solution is to pre-allocate, but the location of pre-allocation is not
> well determined.

So you implemented a two-level array that preallocates leaf
nodes to work around it? It's remarkably complex for what it does;
I can't help but think a radix tree using a special holder for
nr_deferred values of zero would end up being simpler...
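
For readers following along, the layout the patch introduces (shape
inferred from the diff; annotation mine):

/*
 *  shrinker_info
 *      ->unit[]            resizable array of pointers; reallocated and
 *                          copied when shrinker_nr_max grows
 *            |
 *            v
 *  shrinker_info_unit      fixed-size leaf that never moves once
 *      ->map               allocated: SHRINKER_UNIT_BITS bits in the
 *      ->nr_deferred[]     bitmap plus one atomic_long_t per bit
 *
 * Growing only copies the pointer array; the leaves (and any bit or
 * nr_deferred slot being updated concurrently) stay in place, which is
 * what avoids the lost-update problem of the old flat arrays.
 */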

> Therefore, this commit chooses to introduce a secondary array for
> shrinker_info::{map, nr_deferred}, so that we only need to copy this
> secondary array every time the size is resized. Then even if we get the
> old secondary array under the RCU lock, the found map and nr_deferred are
> also true, so no data is lost.

I don't understand what you are trying to describe here. If we get
the old array, then don't we get either a stale nr_deferred value,
or the update we do gets lost because the next shrinker lookup will
> find the new array and so the deferred value stored to the old one
is never seen again?

> 
> [1]. 
> https://lore.kernel.org/all/20220228122126.37293-13-songmuc...@bytedance.com/
> 
> Signed-off-by: Qi Zheng 
> Reviewed-by: Muchun Song 
> ---
.
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index a27779ed3798..1911c06b8af5 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -12,15 +12,50 @@ DECLARE_RWSEM(shrinker_rwsem);
>  #ifdef CONFIG_MEMCG
>  static int shrinker_nr_max;
>  
> -/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
> -static inline int shrinker_map_size(int nr_items)
> +static inline int shrinker_unit_size(int nr_items)
>  {
> - return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long));
> + return (DIV_ROUND_UP(nr_items, SHRINKER_UNIT_BITS) * sizeof(struct 
> shrinker_info_unit *));
>  }
>  
> -static inline int shrinker_defer_size(int nr_items)
> +static inline void shrinker_unit_free(struct shrinker_info *info, int start)
>  {
> - return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t));
> + struct shrinker_info_unit **unit;
> + int nr, i;
> +
> + if (!info)
> + return;
> +
> + unit = info->unit;
> + nr = DIV_ROUND_UP(info->map_nr_max, SHRINKER_UNIT_BITS);
> +
> + for (i = start; i < nr; i++) {
> + if (!unit[i])
> + break;
> +
> + kvfree(unit[i]);
> + unit[i] = NULL;
> + }
> +}
> +
> +static inline int shrinker_unit_alloc(struct shrinker_info *new,
> +struct shrinker_info *old, int nid)
> +{
> + struct shrinker_info_unit *unit;
> + int nr = DIV_ROUND_UP(new->map_nr_max, SHRINKER_UNIT_BITS);
> + int start = old ? DIV_ROUND_UP(old->map_nr_max, SHRINKER_UNIT_BITS) : 0;
> + int i;
> +
> + for (i = start; i < nr; i++) {
> + unit = kvzalloc_node(sizeof(*unit), GFP_KERNEL, nid);

A unit is 576 bytes. Why is this using kvzalloc_node()?

> + if (!unit) {
> + shrinker_unit_free(new, start);
> + return -ENOMEM;
> + }
> +
> + new->unit[i] = unit;
> + }
> +
> + return 0;
>  }
>  
>  void free_shrinker_info(struct mem_cgroup *memcg)
> @@ -32,6 +67,7 @@ void free_shrinker_info(struct mem_cgroup *memcg)
>   for_each_node(nid) {
>   pn = memcg->nodeinfo[nid];
>   info = rcu_dereference_protected(pn->shrinker_info, true);
> + shrinker_unit_free(info, 0);
>   kvfree(info);
>   rcu_assign_pointer(pn->shrinker_info, NULL);
>   }

Why is this safe? The info and maps are looked up by RCU, so why is
freeing them without a RCU grace period safe?

Re: [PATCH v4 45/48] mm: shrinker: make global slab shrink lockless

2023-08-07 Thread Dave Chinner via Virtualization
On Mon, Aug 07, 2023 at 07:09:33PM +0800, Qi Zheng wrote:
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index eb342994675a..f06225f18531 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -4,6 +4,8 @@
>  
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #define SHRINKER_UNIT_BITS   BITS_PER_LONG
>  
> @@ -87,6 +89,10 @@ struct shrinker {
>   int seeks;  /* seeks to recreate an obj */
>   unsigned flags;
>  
> + refcount_t refcount;
> + struct completion done;
> + struct rcu_head rcu;

Documentation, please. What does the refcount protect, what does the
completion provide, etc.

> +
>   void *private_data;
>  
>   /* These are for internal use */
> @@ -120,6 +126,17 @@ struct shrinker *shrinker_alloc(unsigned int flags, 
> const char *fmt, ...);
>  void shrinker_register(struct shrinker *shrinker);
>  void shrinker_free(struct shrinker *shrinker);
>  
> +static inline bool shrinker_try_get(struct shrinker *shrinker)
> +{
> + return refcount_inc_not_zero(&shrinker->refcount);
> +}
> +
> +static inline void shrinker_put(struct shrinker *shrinker)
> +{
> + if (refcount_dec_and_test(&shrinker->refcount))
> + complete(&shrinker->done);
> +}
> +
>  #ifdef CONFIG_SHRINKER_DEBUG
>  extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
> const char *fmt, ...);
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index 1911c06b8af5..d318f5621862 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -2,6 +2,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "internal.h"
> @@ -577,33 +578,42 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, 
> struct mem_cgroup *memcg,
>   if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>   return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>  
> - if (!down_read_trylock(&shrinker_rwsem))
> - goto out;
> -
> - list_for_each_entry(shrinker, &shrinker_list, list) {
> + rcu_read_lock();
> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
>   struct shrink_control sc = {
>   .gfp_mask = gfp_mask,
>   .nid = nid,
>   .memcg = memcg,
>   };
>  
> + if (!shrinker_try_get(shrinker))
> + continue;
> +
> + /*
> +  * We can safely unlock the RCU lock here since we already
> +  * hold the refcount of the shrinker.
> +  */
> + rcu_read_unlock();
> +
>   ret = do_shrink_slab(&sc, shrinker, priority);
>   if (ret == SHRINK_EMPTY)
>   ret = 0;
>   freed += ret;
> +
>   /*
> -  * Bail out if someone want to register a new shrinker to
> -  * prevent the registration from being stalled for long periods
> -  * by parallel ongoing shrinking.
> +  * This shrinker may be deleted from shrinker_list and freed
> +  * after the shrinker_put() below, but this shrinker is still
> +  * used for the next traversal. So it is necessary to hold the
> +  * RCU lock first to prevent this shrinker from being freed,
> +  * which also ensures that the next shrinker that is traversed
> +  * will not be freed (even if it is deleted from shrinker_list
> +  * at the same time).
>*/

This needs to be moved to the head of the function, and document
the whole list walk, get, put and completion parts of the algorithm
that make it safe. There's more to this than "we hold a reference
count", especially the tricky "we might see the shrinker before it
is fully initialised" case...


.
>  void shrinker_free(struct shrinker *shrinker)
>  {
>   struct dentry *debugfs_entry = NULL;
> @@ -686,9 +712,18 @@ void shrinker_free(struct shrinker *shrinker)
>   if (!shrinker)
>   return;
>  
> + if (shrinker->flags & SHRINKER_REGISTERED) {
> + shrinker_put(shrinker);
> + wait_for_completion(&shrinker->done);
> + }

Needs a comment explaining why we need to wait here...
> +
>   down_write(&shrinker_rwsem);
>   if (shrinker->flags & SHRINKER_REGISTERED) {
> - list_del(&shrinker->list);
> + /*
> +  * Lookups on the shrinker are over and will fail in the future,
> +  * so we can now remove it from the lists and free it.
> +  */

... rather than here, after the wait has been done and has provided the
guarantee that no shrinker is running or will run again...
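
Spelled out, the ordering the hunk relies on (my reading; the final free
presumably still goes through the rcu_head added to struct shrinker):

shrinker_put(shrinker);                 /* drop the registration ref */
wait_for_completion(&shrinker->done);   /* fires on the last walker's put;
                                           afterwards nothing holds, or
                                           can take, a reference */
down_write(&shrinker_rwsem);
list_del(&shrinker->list);              /* now safe: lookups are over */
up_write(&shrinker_rwsem);
/* ... followed by an RCU-deferred free, so a walker caught mid list-step
   never touches freed memory */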

-Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-07 Thread Jason Wang
On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo  wrote:
>
> On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo  
> wrote:
> > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin"  
> > wrote:
> > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo 
> > > >  wrote:
> > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" 
> > > > >  wrote:
> > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig 
> > > > > > >  wrote:
> > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so 
> > > > > > > > > on.
> > > > > > > > > There are NOP for non-dma so passing the dma device is 
> > > > > > > > > harmless.
> > > > > > > >
> > > > > > > > Yes, please.
> > > > > > >
> > > > > > >
> > > > > > > I am not sure I got this fully.
> > > > > > >
> > > > > > > Are you mean this:
> > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> > > > > > >
> > > > > > > Then the driver must do dma operation(map and sync) by these 
> > > > > > > virtio_dma_* APIs.
> > > > > > > No care the device is non-dma device or dma device.
> > > > > >
> > > > > > yes
> > > > > >
> > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio 
> > > > > > > device.
> > > > > >
> > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > >
> > > > > YES.
> > > > >
> > > > > We discussed it. They voted 'no'.
> > > > >
> > > > > http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org
> > > >
> > > >
> > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > >
> > > > Let me briefly summarize:
> > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for 
> > > > AF_XDP and
> > > > the driver layer, we need to support these APIs. The current conclusion 
> > > > of
> > > > AF_XDP is no.
> > > >
> > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly 
> > > > inside
> > > > driver. This idea seems to be inconsistent with the framework design of 
> > > > DMA. The
> > > > conclusion is no.
> > > >
> > > > 3. We noticed that if the virtio device supports 
> > > > VIRTIO_F_ACCESS_PLATFORM, it
> > > > uses DMA API. And this type of device is the future direction, so we 
> > > > only
> > > > support DMA premapped for this type of virtio device. The problem with 
> > > > this
> > > > solution is that virtqueue_dma_dev() only returns dev in some cases, 
> > > > because
> > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.

Could you explain the issue a little bit more?

E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
virtqueue_dma_dev() only return dev in some cases?

Thanks

>Otherwise NULL is returned.
> > > > This option is currently NO.
> > > >
> > > > So I'm wondering what should I do, from a DMA point of view, is there 
> > > > any
> > > > solution in case of using DMA API?
> > > >
> > > > Thank you
> > >
> > >
> > > I think it's ok at this point, Christoph just asked you
> > > to add wrappers for map/unmap for use in virtio code.
> > > Seems like a cosmetic change, shouldn't be hard.
> >
> > Yes, that is not hard, I have this code.
> >
> > But you mean that the wrappers are just used for the virtio driver code?
> > And we also offer the API virtqueue_dma_dev() at the same time?
> > Then the driver will have two choices to do DMA.
> >
> > Is that so?
>
> Ping.
>
> Thanks
>
> >
> >
> > > Otherwise I haven't seen significant comments.
> > >
> > >
> > > Christoph do I summarize what you are saying correctly?
> > > --
> > > MST
> > >
> >
>


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-07 Thread Jason Wang
On Mon, Jul 31, 2023 at 2:30 PM Jason Wang  wrote:
>
> On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > They really shouldn't - any NIC that takes forever to
> > > > program will create issues in the networking stack.
> > >
> > > Unfortunately, it's not rare as the device/cvq could be implemented
> > > via firmware or software.
> >
> > Currently that means one either has sane firmware with a scheduler that
> > can meet deadlines, or loses ability to report errors back.
> >
> > > > But if they do they can always set this flag too.
> > >
> > > This may have false negatives and may confuse the management.
> > >
> > > Maybe we can extend the networking core to allow some device specific
> > > configurations to be done with device specific lock without rtnl. For
> > > example, split the set_channels to
> > >
> > > pre_set_channels
> > > set_channels
> > > post_set_channels
> > >
> > > The device specific part could be done in pre and post without a rtnl 
> > > lock?
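
For concreteness, one hypothetical shape of that split (the pre/post
callback names are invented here; no such ethtool ops exist today):

struct ethtool_ops {
        /* slow, device-specific preparation: no rtnl held, may sleep
           and may fail */
        int  (*pre_set_channels)(struct net_device *dev,
                                 struct ethtool_channels *ch);
        /* the existing op: quick core update, still under rtnl */
        int  (*set_channels)(struct net_device *dev,
                             struct ethtool_channels *ch);
        /* slow, device-specific commit/cleanup: again without rtnl */
        void (*post_set_channels)(struct net_device *dev,
                                  struct ethtool_channels *ch);
};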
> > >
> > > Thanks
> >
> >
> > Would the benefit be that errors can be reported to userspace then?
> > Then maybe.  I think you will have to show how this works for at least
> > one card besides virtio.
>
> Even for virtio, this seems not easy, as e.g. the
> virtnet_send_command() and netif_set_real_num_tx_queues() need to
> appear to be atomic to the networking core.
>
> I wonder if we can re-consider the way of a timeout here and choose a
> sane value as a start.

Michael, any more input on this?

Thanks

>
> Thanks
>
> >
> >
> > --
> > MST
> >


Re: [PATCH v4 46/48] mm: shrinker: make memcg slab shrink lockless

2023-08-07 Thread Dave Chinner via Virtualization
On Mon, Aug 07, 2023 at 07:09:34PM +0800, Qi Zheng wrote:
> Like global slab shrink, this commit also uses refcount+RCU method to make
> memcg slab shrink lockless.

This patch does random code cleanups amongst the actual RCU changes.
Can you please move the cleanups to a separate patch to reduce the
noise in this one?

> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index d318f5621862..fee6f62904fb 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -107,6 +107,12 @@ static struct shrinker_info 
> *shrinker_info_protected(struct mem_cgroup *memcg,
>lockdep_is_held(&shrinker_rwsem));
>  }
>  
> +static struct shrinker_info *shrinker_info_rcu(struct mem_cgroup *memcg,
> +int nid)
> +{
> + return rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
> +}

This helper doesn't add value. It doesn't tell me that
rcu_read_lock() needs to be held when it is called, for one...

>  static int expand_one_shrinker_info(struct mem_cgroup *memcg, int new_size,
>   int old_size, int new_nr_max)
>  {
> @@ -198,7 +204,7 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, 
> int shrinker_id)
>   struct shrinker_info_unit *unit;
>  
>   rcu_read_lock();
> - info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
> + info = shrinker_info_rcu(memcg, nid);

... whilst the original code here was obviously correct.

>   unit = info->unit[shriner_id_to_index(shrinker_id)];
>   if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
>   /* Pairs with smp mb in shrink_slab() */
> @@ -211,7 +217,7 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, 
> int shrinker_id)
>  
>  static DEFINE_IDR(shrinker_idr);
>  
> -static int prealloc_memcg_shrinker(struct shrinker *shrinker)
> +static int shrinker_memcg_alloc(struct shrinker *shrinker)

Cleanups in a separate patch.

> @@ -253,10 +258,15 @@ static long xchg_nr_deferred_memcg(int nid, struct 
> shrinker *shrinker,
>  {
>   struct shrinker_info *info;
>   struct shrinker_info_unit *unit;
> + long nr_deferred;
>  
> - info = shrinker_info_protected(memcg, nid);
> + rcu_read_lock();
> + info = shrinker_info_rcu(memcg, nid);
>   unit = info->unit[shriner_id_to_index(shrinker->id)];
> - return 
> atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
> + nr_deferred = 
> atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
> + rcu_read_unlock();
> +
> + return nr_deferred;
>  }

This adds two rcu_read_lock() sections to every call to
do_shrink_slab(). It's not at all clear from any of the other code
that do_shrink_slab() now has internal rcu_read_lock() sections...

> @@ -464,18 +480,23 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, 
> int nid,
>   if (!mem_cgroup_online(memcg))
>   return 0;
>  
> - if (!down_read_trylock(&shrinker_rwsem))
> - return 0;
> -
> - info = shrinker_info_protected(memcg, nid);
> +again:
> + rcu_read_lock();
> + info = shrinker_info_rcu(memcg, nid);
>   if (unlikely(!info))
>   goto unlock;
>  
> - for (; index < shriner_id_to_index(info->map_nr_max); index++) {
> + if (index < shriner_id_to_index(info->map_nr_max)) {
>   struct shrinker_info_unit *unit;
>  
>   unit = info->unit[index];
>  
> + /*
> +  * The shrinker_info_unit will not be freed, so we can
> +  * safely release the RCU lock here.
> +  */
> + rcu_read_unlock();

Why - what guarantees that the shrinker_info_unit exists at this
point? We hold no reference to it, we hold no reference to any
shrinker, etc. What provides this existence guarantee?

> +
>   for_each_set_bit(offset, unit->map, SHRINKER_UNIT_BITS) {
>   struct shrink_control sc = {
>   .gfp_mask = gfp_mask,
> @@ -485,12 +506,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, 
> int nid,
>   struct shrinker *shrinker;
>   int shrinker_id = calc_shrinker_id(index, offset);
>  
> + rcu_read_lock();
>   shrinker = idr_find(&shrinker_idr, shrinker_id);
> - if (unlikely(!shrinker || !(shrinker->flags & 
> SHRINKER_REGISTERED))) {
> - if (!shrinker)
> - clear_bit(offset, unit->map);
> + if (unlikely(!shrinker || !shrinker_try_get(shrinker))) 
> {
> + clear_bit(offset, unit->map);
> + rcu_read_unlock();
>   continue;
>   }
> + rcu_read_unlock();
>  
>   /* Call non-slab shri

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-07 Thread Xuan Zhuo
On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang  wrote:
> On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo  wrote:
> >
> > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo  
> > wrote:
> > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin"  
> > > wrote:
> > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo 
> > > > >  wrote:
> > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" 
> > > > > >  wrote:
> > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig 
> > > > > > > >  wrote:
> > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin 
> > > > > > > > > wrote:
> > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and 
> > > > > > > > > > so on.
> > > > > > > > > > There are NOP for non-dma so passing the dma device is 
> > > > > > > > > > harmless.
> > > > > > > > >
> > > > > > > > > Yes, please.
> > > > > > > >
> > > > > > > >
> > > > > > > > I am not sure I got this fully.
> > > > > > > >
> > > > > > > > Are you mean this:
> > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> > > > > > > >
> > > > > > > > Then the driver must do dma operation(map and sync) by these 
> > > > > > > > virtio_dma_* APIs.
> > > > > > > > No care the device is non-dma device or dma device.
> > > > > > >
> > > > > > > yes
> > > > > > >
> > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio 
> > > > > > > > device.
> > > > > > >
> > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > >
> > > > > > YES.
> > > > > >
> > > > > > We discussed it. They voted 'no'.
> > > > > >
> > > > > > http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org
> > > > >
> > > > >
> > > > > Hi guys, this topic is stuck again. How should I proceed with this 
> > > > > work?
> > > > >
> > > > > Let me briefly summarize:
> > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for 
> > > > > AF_XDP and
> > > > > the driver layer, we need to support these APIs. The current 
> > > > > conclusion of
> > > > > AF_XDP is no.
> > > > >
> > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly 
> > > > > inside
> > > > > driver. This idea seems to be inconsistent with the framework design 
> > > > > of DMA. The
> > > > > conclusion is no.
> > > > >
> > > > > 3. We noticed that if the virtio device supports 
> > > > > VIRTIO_F_ACCESS_PLATFORM, it
> > > > > uses DMA API. And this type of device is the future direction, so we 
> > > > > only
> > > > > support DMA premapped for this type of virtio device. The problem 
> > > > > with this
> > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, 
> > > > > because
> > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
>
> Could you explain the issue a little bit more?
>
> E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
> virtqueue_dma_dev() only return dev in some cases?

The behavior of virtqueue_dma_dev() is not related to AF_XDP.

The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
ACCESS_PLATFORM then it MUST return NULL.

In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
we can enable AF_XDP. If not, we return error to AF_XDP.
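
In code terms, the gating described above might look like this (a sketch;
only virtqueue_dma_dev() and xsk_pool_dma_map() are existing interfaces,
the surrounding variables are illustrative):

struct device *dma_dev = virtqueue_dma_dev(vq);

if (!dma_dev)                   /* no VIRTIO_F_ACCESS_PLATFORM */
        return -EOPNOTSUPP;     /* reject AF_XDP for this virtqueue */

/* with a real DMA device, the AF_XDP core can map the umem itself */
err = xsk_pool_dma_map(pool, dma_dev, 0);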

Thanks




>
> Thanks
>
> >Otherwise NULL is returned.
> > > > > This option is currently NO.
> > > > >
> > > > > So I'm wondering what should I do, from a DMA point of view, is there 
> > > > > any
> > > > > solution in case of using DMA API?
> > > > >
> > > > > Thank you
> > > >
> > > >
> > > > I think it's ok at this point, Christoph just asked you
> > > > to add wrappers for map/unmap for use in virtio code.
> > > > Seems like a cosmetic change, shouldn't be hard.
> > >
> > > Yes, that is not hard, I have this code.
> > >
> > > But you mean that the wrappers are just used for the virtio driver code?
> > > And we also offer the API virtqueue_dma_dev() at the same time?
> > > Then the driver will have two choices to do DMA.
> > >
> > > Is that so?
> >
> > Ping.
> >
> > Thanks
> >
> > >
> > >
> > > > Otherwise I haven't seen significant comments.
> > > >
> > > >
> > > > Christoph do I summarize what you are saying correctly?
> > > > --
> > > > MST
> > > >
> > >
> >
>

Re: [PATCH 1/2] vdpa/mlx5: Fix mr->initialized semantics

2023-08-07 Thread Jason Wang
On Thu, Aug 3, 2023 at 7:40 PM Dragos Tatulea  wrote:
>
> On Thu, 2023-08-03 at 16:03 +0800, Jason Wang wrote:
> > On Thu, Aug 3, 2023 at 1:13 AM Dragos Tatulea  wrote:
> > >
> > > The mr->initialized flag is shared between the control vq and data vq
> > > part of the mr init/uninit. But if the control vq and data vq get placed
> > > in different ASIDs, it can happen that initializing the control vq will
> > > prevent the data vq mr from being initialized.
> > >
> > > This patch consolidates the control and data vq init parts into their
> > > own init functions. The mr->initialized will now be used for the data vq
> > > only. The control vq currently doesn't need a flag.
> > >
> > > The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
> > > split into data and control vq functions which are now also ASID aware.
> > >
> > > Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for
> > > control and data")
> > > Signed-off-by: Dragos Tatulea 
> > > Reviewed-by: Eugenio Pérez 
> > > Reviewed-by: Gal Pressman 
> > > ---
> > >  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
> > >  drivers/vdpa/mlx5/core/mr.c| 97 +-
> > >  2 files changed, 71 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > index 25fc4120b618..a0420be5059f 100644
> > > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > @@ -31,6 +31,7 @@ struct mlx5_vdpa_mr {
> > > struct list_head head;
> > > unsigned long num_directs;
> > > unsigned long num_klms;
> > > +   /* state of dvq mr */
> > > bool initialized;
> > >
> > > /* serialize mkey creation and destruction */
> > > diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> > > index 03e543229791..4ae14a248a4b 100644
> > > --- a/drivers/vdpa/mlx5/core/mr.c
> > > +++ b/drivers/vdpa/mlx5/core/mr.c
> > > @@ -489,60 +489,103 @@ static void destroy_user_mr(struct mlx5_vdpa_dev
> > > *mvdev, struct mlx5_vdpa_mr *mr
> > > }
> > >  }
> > >
> > > -void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
> > > +static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev, 
> > > unsigned
> > > int asid)
> > > +{
> > > +   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
> > > +   return;
> > > +
> > > +   prune_iotlb(mvdev);
> > > +}
> > > +
> > > +static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, 
> > > unsigned
> > > int asid)
> > >  {
> > > struct mlx5_vdpa_mr *mr = &mvdev->mr;
> > >
> > > -   mutex_lock(&mr->mkey_mtx);
> > > +   if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
> > > +   return;
> > > +
> > > if (!mr->initialized)
> > > -   goto out;
> > > +   return;
> > >
> > > -   prune_iotlb(mvdev);
> > > if (mr->user_mr)
> > > destroy_user_mr(mvdev, mr);
> > > else
> > > destroy_dma_mr(mvdev, mr);
> > >
> > > mr->initialized = false;
> > > -out:
> > > +}
> > > +
> > > +static void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, 
> > > unsigned
> > > int asid)
> > > +{
> > > +   struct mlx5_vdpa_mr *mr = &mvdev->mr;
> > > +
> > > +   mutex_lock(&mr->mkey_mtx);
> > > +
> > > +   _mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
> > > +   _mlx5_vdpa_destroy_cvq_mr(mvdev, asid);
> > > +
> > > mutex_unlock(&mr->mkey_mtx);
> > >  }
> > >
> > > -static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
> > > -   struct vhost_iotlb *iotlb, unsigned int
> > > asid)
> > > +void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
> > > +{
> > > +   mlx5_vdpa_destroy_mr_asid(mvdev, mvdev-
> > > >group2asid[MLX5_VDPA_CVQ_GROUP]);
> > > +   mlx5_vdpa_destroy_mr_asid(mvdev, mvdev-
> > > >group2asid[MLX5_VDPA_DATAVQ_GROUP]);
> > > +}
> > > +
> > > +static int _mlx5_vdpa_create_cvq_mr(struct mlx5_vdpa_dev *mvdev,
> > > +   struct vhost_iotlb *iotlb,
> > > +   unsigned int asid)
> > > +{
> > > +   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
> > > +   return 0;
> > > +
> > > +   return dup_iotlb(mvdev, iotlb);
> >
> > This worries me as conceptually, there should be no difference between
> > dvq mr and cvq mr. The virtqueue should be loosely coupled with mr.
> >
> Are you worried by the changes in this patch or about the possibility of 
> having
>
> The reason for this change is that I noticed if you create one mr in one asid
> you could be blocked out from creating another one in a different asid due to
> mr->initialized being true. To me that seemed problematic. Is it not?

My feeling is that mr.c should be device agnostic. It needs to know
nothing about the device details to work. But this patch seems to
break the layer.

>
> > One example is 

Re: [PATCH 1/2] vdpa/mlx5: Fix mr->initialized semantics

2023-08-07 Thread Jason Wang
On Fri, Aug 4, 2023 at 1:58 AM Si-Wei Liu  wrote:
>
>
>
> On 8/3/2023 1:03 AM, Jason Wang wrote:
> > On Thu, Aug 3, 2023 at 1:13 AM Dragos Tatulea  wrote:
> >> The mr->initialized flag is shared between the control vq and data vq
> >> part of the mr init/uninit. But if the control vq and data vq get placed
> >> in different ASIDs, it can happen that initializing the control vq will
> >> prevent the data vq mr from being initialized.
> >>
> >> This patch consolidates the control and data vq init parts into their
> >> own init functions. The mr->initialized will now be used for the data vq
> >> only. The control vq currently doesn't need a flag.
> >>
> >> The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
> >> split into data and control vq functions which are now also ASID aware.
> >>
> >> Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for 
> >> control and data")
> >> Signed-off-by: Dragos Tatulea 
> >> Reviewed-by: Eugenio Pérez 
> >> Reviewed-by: Gal Pressman 
> >> ---
> >>   drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
> >>   drivers/vdpa/mlx5/core/mr.c| 97 +-
> >>   2 files changed, 71 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
> >> b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >> index 25fc4120b618..a0420be5059f 100644
> >> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >> @@ -31,6 +31,7 @@ struct mlx5_vdpa_mr {
> >>  struct list_head head;
> >>  unsigned long num_directs;
> >>  unsigned long num_klms;
> >> +   /* state of dvq mr */
> >>  bool initialized;
> >>
> >>  /* serialize mkey creation and destruction */
> >> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> >> index 03e543229791..4ae14a248a4b 100644
> >> --- a/drivers/vdpa/mlx5/core/mr.c
> >> +++ b/drivers/vdpa/mlx5/core/mr.c
> >> @@ -489,60 +489,103 @@ static void destroy_user_mr(struct mlx5_vdpa_dev 
> >> *mvdev, struct mlx5_vdpa_mr *mr
> >>  }
> >>   }
> >>
> >> -void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
> >> +static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev, 
> >> unsigned int asid)
> >> +{
> >> +   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
> >> +   return;
> >> +
> >> +   prune_iotlb(mvdev);
> >> +}
> >> +
> >> +static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, 
> >> unsigned int asid)
> >>   {
> >>  struct mlx5_vdpa_mr *mr = &mvdev->mr;
> >>
> >> -   mutex_lock(&mr->mkey_mtx);
> >> +   if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
> >> +   return;
> >> +
> >>  if (!mr->initialized)
> >> -   goto out;
> >> +   return;
> >>
> >> -   prune_iotlb(mvdev);
> >>  if (mr->user_mr)
> >>  destroy_user_mr(mvdev, mr);
> >>  else
> >>  destroy_dma_mr(mvdev, mr);
> >>
> >>  mr->initialized = false;
> >> -out:
> >> +}
> >> +
> >> +static void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, 
> >> unsigned int asid)
> >> +{
> >> +   struct mlx5_vdpa_mr *mr = &mvdev->mr;
> >> +
> >> +   mutex_lock(&mr->mkey_mtx);
> >> +
> >> +   _mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
> >> +   _mlx5_vdpa_destroy_cvq_mr(mvdev, asid);
> >> +
> >>  mutex_unlock(&mr->mkey_mtx);
> >>   }
> >>
> >> -static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
> >> -   struct vhost_iotlb *iotlb, unsigned int 
> >> asid)
> >> +void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
> >> +{
> >> +   mlx5_vdpa_destroy_mr_asid(mvdev, 
> >> mvdev->group2asid[MLX5_VDPA_CVQ_GROUP]);
> >> +   mlx5_vdpa_destroy_mr_asid(mvdev, 
> >> mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]);
> >> +}
> >> +
> >> +static int _mlx5_vdpa_create_cvq_mr(struct mlx5_vdpa_dev *mvdev,
> >> +   struct vhost_iotlb *iotlb,
> >> +   unsigned int asid)
> >> +{
> >> +   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
> >> +   return 0;
> >> +
> >> +   return dup_iotlb(mvdev, iotlb);
> > This worries me as conceptually, there should be no difference between
> > dvq mr and cvq mr. The virtqueue should be loosely coupled with mr.
> >
> > One example is that, if we only do dup_iotlb() but not try to create
> > dma mr here, we will break virtio-vdpa:
> For this case, I guess we may need another way to support virtio-vdpa
> 1:1 mapping rather than overloading virtio device reset semantics, see:
>
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>
>  > Conceptually, the address mapping is not a part of the abstraction for
>  > a virtio device now. So resetting the memory mapping during virtio
>  > device reset seems wrong.
>
> where we want to keep memory mapping intact across virtio device reset
> for best live migration latency.

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-07 Thread Jason Wang
On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo  wrote:
>
> On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang  wrote:
> > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo  wrote:
> > >
> > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo  
> > > wrote:
> > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" 
> > > >  wrote:
> > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo 
> > > > > >  wrote:
> > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" 
> > > > > > >  wrote:
> > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig 
> > > > > > > > >  wrote:
> > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. 
> > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and 
> > > > > > > > > > > so on.
> > > > > > > > > > > There are NOP for non-dma so passing the dma device is 
> > > > > > > > > > > harmless.
> > > > > > > > > >
> > > > > > > > > > Yes, please.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am not sure I got this fully.
> > > > > > > > >
> > > > > > > > > Are you mean this:
> > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> > > > > > > > >
> > > > > > > > > Then the driver must do dma operation(map and sync) by these 
> > > > > > > > > virtio_dma_* APIs.
> > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > >
> > > > > > > > yes
> > > > > > > >
> > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio 
> > > > > > > > > device.
> > > > > > > >
> > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > >
> > > > > > > YES.
> > > > > > >
> > > > > > > We discussed it. They voted 'no'.
> > > > > > >
> > > > > > > http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org
> > > > > >
> > > > > >
> > > > > > Hi guys, this topic is stuck again. How should I proceed with this 
> > > > > > work?
> > > > > >
> > > > > > Let me briefly summarize:
> > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for 
> > > > > > AF_XDP and
> > > > > > the driver layer, we need to support these APIs. The current 
> > > > > > conclusion of
> > > > > > AF_XDP is no.
> > > > > >
> > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly 
> > > > > > inside
> > > > > > driver. This idea seems to be inconsistent with the framework 
> > > > > > design of DMA. The
> > > > > > conclusion is no.
> > > > > >
> > > > > > 3. We noticed that if the virtio device supports 
> > > > > > VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > uses DMA API. And this type of device is the future direction, so 
> > > > > > we only
> > > > > > support DMA premapped for this type of virtio device. The problem 
> > > > > > with this
> > > > > > solution is that virtqueue_dma_dev() only returns dev in some 
> > > > > > cases, because
> > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> >
> > Could you explain the issue a little bit more?
> >
> > E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
> > virtqueue_dma_dev() only return dev in some cases?
>
> The behavior of virtqueue_dma_dev() is not related to AF_XDP.
>
> The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> ACCESS_PLATFORM then it MUST return NULL.
>
> In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> we can enable AF_XDP. If not, we return error to AF_XDP.

Yes, as discussed, just having wrappers in the virtio_ring and doing
the switch there. Then can virtio-net use them without worrying about
DMA details?
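
A sketch of what such a wrapper could look like (signature and helper
names are mine, not settled API):

dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *vq, void *ptr,
                                          size_t size,
                                          enum dma_data_direction dir,
                                          unsigned long attrs)
{
        /* Without ACCESS_PLATFORM the ring uses physical addresses, so
         * the wrapper degrades to a NOP translation and the caller never
         * needs to know whether a real DMA device exists. */
        if (!virtqueue_use_dma_api(vq))         /* hypothetical helper */
                return (dma_addr_t)virt_to_phys(ptr);

        return dma_map_single_attrs(virtqueue_dma_dev(vq), ptr, size, dir,
                                    attrs);
}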

Thanks

>
> Thanks
>
>
>
>
> >
> > Thanks
> >
> > >Otherwise NULL is returned.
> > > > > > This option is currently NO.
> > > > > >
> > > > > > So I'm wondering what should I do, from a DMA point of view, is 
> > > > > > there any
> > > > > > solution in case of using DMA API?
> > > > > >
> > > > > > Thank you
> > > > >
> > > > >
> > > > > I think it's ok at this point, Christoph just asked you
> > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > Seems like a cosmetic change, shouldn't be hard.
> > > >
> > > > Yes, that is not hard, I have this code.
> > > >
> > > > But you mean that the wrappers are just used for the virtio driver code?
> > > > And we also offer the API virtqueue_dma_dev() at the same time?
> > > > Then the driver will have two choices to do DMA.
> > > >
> > > > Is that so?
> > >
> > > Ping.
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > > Otherwise I haven't seen significant comments.
> > > > >
> > > > >
> > > > > Christoph do I summarize what you are saying correctly?
> > > > > --
> > > > > MST
> > > >

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-07 Thread Xuan Zhuo
On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang  wrote:
> On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo  wrote:
> >
> > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang  wrote:
> > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo 
> > > >  wrote:
> > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" 
> > > > >  wrote:
> > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" 
> > > > > > > >  wrote:
> > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig 
> > > > > > > > > >  wrote:
> > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. 
> > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync 
> > > > > > > > > > > > and so on.
> > > > > > > > > > > > There are NOP for non-dma so passing the dma device is 
> > > > > > > > > > > > harmless.
> > > > > > > > > > >
> > > > > > > > > > > Yes, please.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > >
> > > > > > > > > > Are you mean this:
> > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> > > > > > > > > >
> > > > > > > > > > Then the driver must do dma operation(map and sync) by 
> > > > > > > > > > these virtio_dma_* APIs.
> > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > >
> > > > > > > > > yes
> > > > > > > > >
> > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio 
> > > > > > > > > > device.
> > > > > > > > >
> > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > >
> > > > > > > > YES.
> > > > > > > >
> > > > > > > > We discussed it. They voted 'no'.
> > > > > > > >
> > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org
> > > > > > >
> > > > > > >
> > > > > > > Hi guys, this topic is stuck again. How should I proceed with 
> > > > > > > this work?
> > > > > > >
> > > > > > > Let me briefly summarize:
> > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, 
> > > > > > > for AF_XDP and
> > > > > > > the driver layer, we need to support these APIs. The current 
> > > > > > > conclusion of
> > > > > > > AF_XDP is no.
> > > > > > >
> > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API 
> > > > > > > uniformly inside
> > > > > > > driver. This idea seems to be inconsistent with the framework 
> > > > > > > design of DMA. The
> > > > > > > conclusion is no.
> > > > > > >
> > > > > > > 3. We noticed that if the virtio device supports 
> > > > > > > VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > uses DMA API. And this type of device is the future direction, so 
> > > > > > > we only
> > > > > > > support DMA premapped for this type of virtio device. The problem 
> > > > > > > with this
> > > > > > > solution is that virtqueue_dma_dev() only returns dev in some 
> > > > > > > cases, because
> > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > >
> > > Could you explain the issue a little bit more?
> > >
> > > E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
> > > virtqueue_dma_dev() only return dev in some cases?
> >
> > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> >
> > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > ACCESS_PLATFORM then it MUST return NULL.
> >
> > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > we can enable AF_XDP. If not, we return error to AF_XDP.
>
> Yes, as discussed, just having wrappers in the virtio_ring and doing
> the switch there. Then can virtio-net use them without worrying about
> DMA details?


Yes. In the virtio drivers, we can use the wrappers. That is ok.

But we also need to support virtqueue_dma_dev() for AF_XDP, because
AF_XDP will not use the wrappers.

So is that ok for you?

Thanks.

>
> Thanks
>
> >
> > Thanks
> >
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >Otherwise NULL is returned.
> > > > > > > This option is currently NO.
> > > > > > >
> > > > > > > So I'm wondering what should I do, from a DMA point of view, is 
> > > > > > > there any
> > > > > > > solution in case of using DMA API?
> > > > > > >
> > > > > > > Thank you
> > > > > >
> > > > > >
> > > > > > I think it's ok at this point, Christoph just asked you
> > > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > > Seems like a cosmetic change, shouldn't be hard.
> > > > >
> > > > > Yes, that

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-07 Thread Jason Wang
On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo  wrote:
>
> On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang  wrote:
> > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo  
> > wrote:
> > >
> > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang  wrote:
> > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo 
> > > > >  wrote:
> > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" 
> > > > > >  wrote:
> > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo 
> > > > > > > >  wrote:
> > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" 
> > > > > > > > >  wrote:
> > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. 
> > > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync 
> > > > > > > > > > > > > and so on.
> > > > > > > > > > > > > There are NOP for non-dma so passing the dma device 
> > > > > > > > > > > > > is harmless.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, please.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > >
> > > > > > > > > > > Are you mean this:
> > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> > > > > > > > > > >
> > > > > > > > > > > Then the driver must do dma operation(map and sync) by 
> > > > > > > > > > > these virtio_dma_* APIs.
> > > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > > >
> > > > > > > > > > yes
> > > > > > > > > >
> > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for 
> > > > > > > > > > > virtio device.
> > > > > > > > > >
> > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > >
> > > > > > > > > YES.
> > > > > > > > >
> > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > >
> > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi guys, this topic is stuck again. How should I proceed with 
> > > > > > > > this work?
> > > > > > > >
> > > > > > > > Let me briefly summarize:
> > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, 
> > > > > > > > for AF_XDP and
> > > > > > > > the driver layer, we need to support these APIs. The current 
> > > > > > > > conclusion of
> > > > > > > > AF_XDP is no.
> > > > > > > >
> > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API 
> > > > > > > > uniformly inside
> > > > > > > > driver. This idea seems to be inconsistent with the framework 
> > > > > > > > design of DMA. The
> > > > > > > > conclusion is no.
> > > > > > > >
> > > > > > > > 3. We noticed that if the virtio device supports 
> > > > > > > > VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > uses DMA API. And this type of device is the future direction, 
> > > > > > > > so we only
> > > > > > > > support DMA premapped for this type of virtio device. The 
> > > > > > > > problem with this
> > > > > > > > solution is that virtqueue_dma_dev() only returns dev in some 
> > > > > > > > cases, because
> > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > >
> > > > Could you explain the issue a little bit more?
> > > >
> > > > E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
> > > > virtqueue_dma_dev() only return dev in some cases?
> > >
> > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > >
> > > The return value of virtqueue_dma_dev() is meant to be used with the
> > > DMA APIs. So it can return the dma dev when virtio is used with
> > > ACCESS_PLATFORM. If virtio is without ACCESS_PLATFORM, then it MUST
> > > return NULL.
> > >
> > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > we can enable AF_XDP. If not, we return error to AF_XDP.
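
A minimal sketch of these semantics, assuming the usual virtio/DMA headers
(the feature check and the use of the transport's parent device as the DMA
device are assumptions for illustration, not the patch's actual code):

	struct device *virtqueue_dma_dev(struct virtqueue *vq)
	{
		/* The ring only goes through the DMA API when the device
		 * negotiated VIRTIO_F_ACCESS_PLATFORM. */
		if (virtio_has_feature(vq->vdev, VIRTIO_F_ACCESS_PLATFORM))
			return vq->vdev->dev.parent;

		/* Without ACCESS_PLATFORM the ring bypasses the DMA API,
		 * so there is no DMA device to hand out. */
		return NULL;
	}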
> >
> > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > the switch there. Then can virtio-net use them without worrying about
> > DMA details?
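
A wrapper of the kind suggested here might look like the following sketch
(the name and signature are illustrative assumptions, not the series' final
API):

	static void virtqueue_dma_sync_single_for_cpu(struct virtqueue *vq,
						      dma_addr_t addr, size_t size,
						      enum dma_data_direction dir)
	{
		struct device *dev = virtqueue_dma_dev(vq);

		/* NULL means the ring never went through the DMA API, so
		 * the sync is a harmless no-op, as discussed above. */
		if (!dev)
			return;

		dma_sync_single_for_cpu(dev, addr, size, dir);
	}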
>
>
> Yes. In the virtio drivers, we can use the wrappers. That is ok.
>
> > But we also need to support virtqueue_dma_dev() for AF_XDP, because the
> AF_XDP will not use the wrappers.
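
The reason the wrappers cannot cover AF_XDP is that the XSK core DMA-maps
the UMEM itself, against a plain struct device, via its xsk_pool_dma_map()
helper. A sketch of how virtio-net could gate AF_XDP on the queue exposing
such a device (the function name here is hypothetical):

	static int virtnet_xsk_pool_enable(struct virtqueue *vq,
					   struct xsk_buff_pool *pool)
	{
		struct device *dma_dev = virtqueue_dma_dev(vq);

		/* No DMA device means no ACCESS_PLATFORM: the XSK core
		 * cannot premap the UMEM, so AF_XDP is refused. */
		if (!dma_dev)
			return -EOPNOTSUPP;

		/* The XSK core, not the driver, maps the UMEM chunks. */
		return xsk_pool_dma_map(pool, dma_dev, 0);
	}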

You mean AF_XDP core or other? Could you give me an example?

Thanks

>
> So that is ok for you?
>
> Thanks.
>
> >
> > Thanks
> >
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > > > > > Otherwise NULL is returned.
> > > > > > > > This option is currently NO.
> > > > > > > >
> > > > > > > > So I'm wondering what should I do, from a 

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-07 Thread Xuan Zhuo
On Tue, 8 Aug 2023 11:49:08 +0800, Jason Wang  wrote:
> On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo  wrote:
> >
> > On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang  wrote:
> > > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang  
> > > > wrote:
> > > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo 
> > > > > >  wrote:
> > > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" 
> > > > > > >  wrote:
> > > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo 
> > > > > > > > >  wrote:
> > > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" 
> > > > > > > > > >  wrote:
> > > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. 
> > > > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > > > > Well I think we can add wrappers like 
> > > > > > > > > > > > > > virtio_dma_sync and so on.
> > > > > > > > > > > > > > They are NOPs for non-DMA, so passing the dma device 
> > > > > > > > > > > > > > is harmless.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, please.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > > >
> > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> > > > > > > > > > > >
> > > > > > > > > > > > Then the driver must do the DMA operations (map and sync) via
> > > > > > > > > > > > these virtio_dma_* APIs,
> > > > > > > > > > > > regardless of whether the device is a non-DMA device or a DMA device.
> > > > > > > > > > >
> > > > > > > > > > > yes
> > > > > > > > > > >
> > > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for 
> > > > > > > > > > > > virtio device.
> > > > > > > > > > >
> > > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > > >
> > > > > > > > > > YES.
> > > > > > > > > >
> > > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > > >
> > > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi guys, this topic is stuck again. How should I proceed with 
> > > > > > > > > this work?
> > > > > > > > >
> > > > > > > > > Let me briefly summarize:
> > > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is 
> > > > > > > > > that, for AF_XDP and
> > > > > > > > > the driver layer, we need to support these APIs. The current 
> > > > > > > > > conclusion of
> > > > > > > > > AF_XDP is no.
> > > > > > > > >
> > > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API 
> > > > > > > > > uniformly inside
> > > > > > > > > driver. This idea seems to be inconsistent with the framework 
> > > > > > > > > design of DMA. The
> > > > > > > > > conclusion is no.
> > > > > > > > >
> > > > > > > > > 3. We noticed that if the virtio device supports 
> > > > > > > > > VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > > uses DMA API. And this type of device is the future 
> > > > > > > > > direction, so we only
> > > > > > > > > support DMA premapped for this type of virtio device. The 
> > > > > > > > > problem with this
> > > > > > > > > solution is that virtqueue_dma_dev() only returns dev in some 
> > > > > > > > > cases, because
> > > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > > >
> > > > > Could you explain the issue a little bit more?
> > > > >
> > > > > E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
> > > > > virtqueue_dma_dev() only return dev in some cases?
> > > >
> > > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > > >
> > > > The return value of virtqueue_dma_dev() is meant to be used with the
> > > > DMA APIs. So it can return the dma dev when virtio is used with
> > > > ACCESS_PLATFORM. If virtio is without ACCESS_PLATFORM, then it MUST
> > > > return NULL.
> > > >
> > > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > > we can enable AF_XDP. If not, we return error to AF_XDP.
> > >
> > > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > > the switch there. Then can virtio-net use them without worrying about
> > > DMA details?
> >
> >
> > Yes. In the virtio drivers, we can use the wrappers. That is ok.
> >
> > But we also need to support virtqueue_dma_dev() for AF_XDP, because the
> > AF_XDP will not use the wrappers.
>
> You mean AF_XDP core or other? Could you give me an example?


Yes. The AF_XDP c

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-07 Thread Jason Wang
On Tue, Aug 8, 2023 at 11:57 AM Xuan Zhuo  wrote:
>
> On Tue, 8 Aug 2023 11:49:08 +0800, Jason Wang  wrote:
> > On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo  
> > wrote:
> > >
> > > On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang  wrote:
> > > > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang  
> > > > > wrote:
> > > > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" 
> > > > > > > >  wrote:
> > > > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo 
> > > > > > > > > >  wrote:
> > > > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael 
> > > > > > > > > > > > > > S. Tsirkin wrote:
> > > > > > > > > > > > > > > Well I think we can add wrappers like 
> > > > > > > > > > > > > > > virtio_dma_sync and so on.
> > > > > > > > > > > > > > > They are NOPs for non-DMA, so passing the dma 
> > > > > > > > > > > > > > > device is harmless.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, please.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> > > > > > > > > > > > >
> > > > > > > > > > > > > Then the driver must do the DMA operations (map and sync)
> > > > > > > > > > > > > via these virtio_dma_* APIs,
> > > > > > > > > > > > > regardless of whether the device is a non-DMA device or a DMA device.
> > > > > > > > > > > >
> > > > > > > > > > > > yes
> > > > > > > > > > > >
> > > > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for 
> > > > > > > > > > > > > virtio device.
> > > > > > > > > > > >
> > > > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > > > >
> > > > > > > > > > > YES.
> > > > > > > > > > >
> > > > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > > > >
> > > > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi guys, this topic is stuck again. How should I proceed 
> > > > > > > > > > with this work?
> > > > > > > > > >
> > > > > > > > > > Let me briefly summarize:
> > > > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is 
> > > > > > > > > > that, for AF_XDP and
> > > > > > > > > > the driver layer, we need to support these APIs. The 
> > > > > > > > > > current conclusion of
> > > > > > > > > > AF_XDP is no.
> > > > > > > > > >
> > > > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API 
> > > > > > > > > > uniformly inside
> > > > > > > > > > driver. This idea seems to be inconsistent with the 
> > > > > > > > > > framework design of DMA. The
> > > > > > > > > > conclusion is no.
> > > > > > > > > >
> > > > > > > > > > 3. We noticed that if the virtio device supports 
> > > > > > > > > > VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > > > uses DMA API. And this type of device is the future 
> > > > > > > > > > direction, so we only
> > > > > > > > > > support DMA premapped for this type of virtio device. The 
> > > > > > > > > > problem with this
> > > > > > > > > > solution is that virtqueue_dma_dev() only returns dev in 
> > > > > > > > > > some cases, because
> > > > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > > > >
> > > > > > Could you explain the issue a little bit more?
> > > > > >
> > > > > > E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
> > > > > > virtqueue_dma_dev() only return dev in some cases?
> > > > >
> > > > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > > > >
> > > > > The return value of virtqueue_dma_dev() is meant to be used with the
> > > > > DMA APIs. So it can return the dma dev when virtio is used with
> > > > > ACCESS_PLATFORM. If virtio is without ACCESS_PLATFORM, then it MUST
> > > > > return NULL.
> > > > >
> > > > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > > > we can enable AF_XDP. If not, we return error to AF_XDP.
> > > >
> > > > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > > > the switch there. Then can virtio-net use them without worrying about
> > > > DMA detai

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-07 Thread Xuan Zhuo
On Tue, 8 Aug 2023 11:59:04 +0800, Jason Wang  wrote:
> On Tue, Aug 8, 2023 at 11:57 AM Xuan Zhuo  wrote:
> >
> > On Tue, 8 Aug 2023 11:49:08 +0800, Jason Wang  wrote:
> > > On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang  
> > > > wrote:
> > > > > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang  
> > > > > > wrote:
> > > > > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo 
> > > > > > > >  wrote:
> > > > > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" 
> > > > > > > > >  wrote:
> > > > > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. 
> > > > > > > > > > > > Tsirkin"  wrote:
> > > > > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph 
> > > > > > > > > > > > > > Hellwig  wrote:
> > > > > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael 
> > > > > > > > > > > > > > > S. Tsirkin wrote:
> > > > > > > > > > > > > > > > Well I think we can add wrappers like 
> > > > > > > > > > > > > > > > virtio_dma_sync and so on.
> > > > > > > > > > > > > > > > They are NOPs for non-DMA, so passing the dma 
> > > > > > > > > > > > > > > > device is harmless.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, please.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Then the driver must do the DMA operations (map and sync)
> > > > > > > > > > > > > > via these virtio_dma_* APIs,
> > > > > > > > > > > > > > regardless of whether the device is a non-DMA device or a DMA device.
> > > > > > > > > > > > >
> > > > > > > > > > > > > yes
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs 
> > > > > > > > > > > > > > for virtio device.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > > > > >
> > > > > > > > > > > > YES.
> > > > > > > > > > > >
> > > > > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > > > > >
> > > > > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi guys, this topic is stuck again. How should I proceed 
> > > > > > > > > > > with this work?
> > > > > > > > > > >
> > > > > > > > > > > Let me briefly summarize:
> > > > > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is 
> > > > > > > > > > > that, for AF_XDP and
> > > > > > > > > > > the driver layer, we need to support these APIs. The 
> > > > > > > > > > > current conclusion of
> > > > > > > > > > > AF_XDP is no.
> > > > > > > > > > >
> > > > > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API 
> > > > > > > > > > > uniformly inside
> > > > > > > > > > > driver. This idea seems to be inconsistent with the 
> > > > > > > > > > > framework design of DMA. The
> > > > > > > > > > > conclusion is no.
> > > > > > > > > > >
> > > > > > > > > > > 3. We noticed that if the virtio device supports 
> > > > > > > > > > > VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > > > > uses DMA API. And this type of device is the future 
> > > > > > > > > > > direction, so we only
> > > > > > > > > > > support DMA premapped for this type of virtio device. The 
> > > > > > > > > > > problem with this
> > > > > > > > > > > solution is that virtqueue_dma_dev() only returns dev in 
> > > > > > > > > > > some cases, because
> > > > > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > > > > >
> > > > > > > Could you explain the issue a little bit more?
> > > > > > >
> > > > > > > E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
> > > > > > > virtqueue_dma_dev() only return dev in some cases?
> > > > > >
> > > > > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > > > > >
> > > > > > The return value of virtqueue_dma_dev() is meant to be used with the
> > > > > > DMA APIs. So it can return the dma dev when virtio is used with
> > > > > > ACCESS_PLATFORM. If virtio is without ACCESS_PLATFORM, then it MUST
> > > > > > return NULL.
> > > > > >
> > > > > > In the virtio-net driver, if the virtqueue_dma_dev

Re: [PATCH] virtio_ring: fix avail_wrap_counter in virtqueue_add_packed

2023-08-07 Thread Jason Wang
On Tue, Aug 8, 2023 at 1:11 PM Yuan Yao  wrote:
>
> In the current packed virtqueue implementation, the avail_wrap_counter won't
> flip in the case when the driver supplies a descriptor chain with a
> length equal to the queue size; total_sg == vq->packed.vring.num.
>
> Let’s assume the following situation:
> vq->packed.vring.num=4
> vq->packed.next_avail_idx: 1
> vq->packed.avail_wrap_counter: 0
>
> Then the driver adds a descriptor chain containing 4 descriptors.
>
> We expect the following result with avail_wrap_counter flipped:
> vq->packed.next_avail_idx: 1
> vq->packed.avail_wrap_counter: 1
>
> But, the current implementation gives the following result:
> vq->packed.next_avail_idx: 1
> vq->packed.avail_wrap_counter: 0
>
> To reproduce the bug, you can set a packed queue size as small as
> possible, so that the driver is more likely to provide a descriptor
> chain with a length equal to the packed queue size. For example, run
> QEMU with the following command:
> sudo qemu-system-x86_64 \
> -enable-kvm \
> -nographic \
> -kernel "path/to/kernel_image" \
> -m 1G \
> -drive file="path/to/rootfs",if=none,id=disk \
> -device virtio-blk,drive=disk \
> -drive file="path/to/disk_image",if=none,id=rwdisk \
> -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
> indirect_desc=off \
> -append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
>
> Inside the VM, create a directory and mount the rwdisk device on it. The
> rwdisk will hang and mount operation will not complete.
>
> This commit fixes the wrap counter error by flipping
> packed.avail_wrap_counter when the start of the descriptor chain equals the
> end of the descriptor chain (head == i).
>
> Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support")
> Signed-off-by: Yuan Yao 
> ---
>
>  drivers/virtio/virtio_ring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c5310eaf8b46..da1150d127c2 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1461,7 +1461,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
> *_vq,
> }
> }
>
> -   if (i < head)
> +   if (i <= head)
> vq->packed.avail_wrap_counter ^= 1;
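
For concreteness, a hand trace of the failing case from the commit message
(vring.num = 4, head = next_avail_idx = 1, adding a 4-descriptor chain):

	/*
	 *   desc 0: i = 1 -> 2
	 *   desc 1: i = 2 -> 3
	 *   desc 2: i = 3 -> 0   (index wraps, avail_used_flags flips)
	 *   desc 3: i = 0 -> 1
	 *
	 * The loop ends with i == head, so the old test "i < head" is false
	 * and avail_wrap_counter never flips. Since BUG_ON(total_sg == 0)
	 * rules out empty chains, i == head can only mean a full wrap, which
	 * is exactly what "i <= head" catches.
	 */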

Would it be better to move the flipping to the place where we flip
avail_used_flags?

if ((unlikely(++i >= vq->packed.vring.num))) {
i = 0;
vq->packed.avail_used_flags ^=
1 << VRING_PACKED_DESC_F_AVAIL |
1 << VRING_PACKED_DESC_F_USED;
}
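
Spelled out, that suggestion would make the wrap branch look roughly like
this (a sketch of the proposed refactoring, not a tested patch):

	if ((unlikely(++i >= vq->packed.vring.num))) {
		i = 0;
		vq->packed.avail_used_flags ^=
			1 << VRING_PACKED_DESC_F_AVAIL |
			1 << VRING_PACKED_DESC_F_USED;
		/* The index wraps at most once per add, so flipping here is
		 * equivalent to the post-loop "i <= head" check and covers
		 * the full-ring (i == head) case naturally. */
		vq->packed.avail_wrap_counter ^= 1;
	}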

Thanks

>
> /* We're using some buffers from the free list. */
> --
> 2.41.0.640.ga95def55d0-goog
>


Re: [PATCH] virtio_ring: fix avail_wrap_counter in virtqueue_add_packed

2023-08-07 Thread Michael S. Tsirkin
On Tue, Aug 08, 2023 at 01:43:02PM +0800, Jason Wang wrote:
> On Tue, Aug 8, 2023 at 1:11 PM Yuan Yao  wrote:
> >
> > In the current packed virtqueue implementation, the avail_wrap_counter won't
> > flip in the case when the driver supplies a descriptor chain with a
> > length equal to the queue size; total_sg == vq->packed.vring.num.
> >
> > Let’s assume the following situation:
> > vq->packed.vring.num=4
> > vq->packed.next_avail_idx: 1
> > vq->packed.avail_wrap_counter: 0
> >
> > Then the driver adds a descriptor chain containing 4 descriptors.
> >
> > We expect the following result with avail_wrap_counter flipped:
> > vq->packed.next_avail_idx: 1
> > vq->packed.avail_wrap_counter: 1
> >
> > But, the current implementation gives the following result:
> > vq->packed.next_avail_idx: 1
> > vq->packed.avail_wrap_counter: 0
> >
> > To reproduce the bug, you can set a packed queue size as small as
> > possible, so that the driver is more likely to provide a descriptor
> > chain with a length equal to the packed queue size. For example, run
> > QEMU with the following command:
> > sudo qemu-system-x86_64 \
> > -enable-kvm \
> > -nographic \
> > -kernel "path/to/kernel_image" \
> > -m 1G \
> > -drive file="path/to/rootfs",if=none,id=disk \
> > -device virtio-blk,drive=disk \
> > -drive file="path/to/disk_image",if=none,id=rwdisk \
> > -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
> > indirect_desc=off \
> > -append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
> >
> > Inside the VM, create a directory and mount the rwdisk device on it. The
> > rwdisk will hang and mount operation will not complete.
> >
> > This commit fixes the wrap counter error by flipping
> > packed.avail_wrap_counter when the start of the descriptor chain equals the
> > end of the descriptor chain (head == i).
> >
> > Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support")
> > Signed-off-by: Yuan Yao 
> > ---
> >
> >  drivers/virtio/virtio_ring.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c5310eaf8b46..da1150d127c2 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1461,7 +1461,7 @@ static inline int virtqueue_add_packed(struct 
> > virtqueue *_vq,
> > }
> > }
> >
> > -   if (i < head)
> > +   if (i <= head)
> > vq->packed.avail_wrap_counter ^= 1;
> 
> Would it be better to move the flipping to the place where we flip
> avail_used_flags?

I think I prefer this patch for stable; refactoring can
be done on top.

> if ((unlikely(++i >= vq->packed.vring.num))) {
> i = 0;
> vq->packed.avail_used_flags ^=
> 1 << VRING_PACKED_DESC_F_AVAIL |
> 1 << VRING_PACKED_DESC_F_USED;
> }
> 
> Thanks
> 
> >
> > /* We're using some buffers from the free list. */
> > --
> > 2.41.0.640.ga95def55d0-goog
> >


Re: [PATCH] virtio_ring: fix avail_wrap_counter in virtqueue_add_packed

2023-08-07 Thread Jason Wang
On Tue, Aug 8, 2023 at 1:59 PM Michael S. Tsirkin  wrote:
>
> On Tue, Aug 08, 2023 at 01:43:02PM +0800, Jason Wang wrote:
> > On Tue, Aug 8, 2023 at 1:11 PM Yuan Yao  wrote:
> > >
> > > In the current packed virtqueue implementation, the avail_wrap_counter won't
> > > flip in the case when the driver supplies a descriptor chain with a
> > > length equal to the queue size; total_sg == vq->packed.vring.num.
> > >
> > > Let’s assume the following situation:
> > > vq->packed.vring.num=4
> > > vq->packed.next_avail_idx: 1
> > > vq->packed.avail_wrap_counter: 0
> > >
> > > Then the driver adds a descriptor chain containing 4 descriptors.
> > >
> > > We expect the following result with avail_wrap_counter flipped:
> > > vq->packed.next_avail_idx: 1
> > > vq->packed.avail_wrap_counter: 1
> > >
> > > But, the current implementation gives the following result:
> > > vq->packed.next_avail_idx: 1
> > > vq->packed.avail_wrap_counter: 0
> > >
> > > To reproduce the bug, you can set a packed queue size as small as
> > > possible, so that the driver is more likely to provide a descriptor
> > > chain with a length equal to the packed queue size. For example, run
> > > QEMU with the following command:
> > > sudo qemu-system-x86_64 \
> > > -enable-kvm \
> > > -nographic \
> > > -kernel "path/to/kernel_image" \
> > > -m 1G \
> > > -drive file="path/to/rootfs",if=none,id=disk \
> > > -device virtio-blk,drive=disk \
> > > -drive file="path/to/disk_image",if=none,id=rwdisk \
> > > -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
> > > indirect_desc=off \
> > > -append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
> > >
> > > Inside the VM, create a directory and mount the rwdisk device on it. The
> > > rwdisk will hang and mount operation will not complete.
> > >
> > > This commit fixes the wrap counter error by flipping
> > > packed.avail_wrap_counter when the start of the descriptor chain equals the
> > > end of the descriptor chain (head == i).
> > >
> > > Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support")
> > > Signed-off-by: Yuan Yao 
> > > ---
> > >
> > >  drivers/virtio/virtio_ring.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index c5310eaf8b46..da1150d127c2 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -1461,7 +1461,7 @@ static inline int virtqueue_add_packed(struct 
> > > virtqueue *_vq,
> > > }
> > > }
> > >
> > > -   if (i < head)
> > > +   if (i <= head)
> > > vq->packed.avail_wrap_counter ^= 1;
> >
> > Would it be better to move the flipping to the place where we flip
> > avail_used_flags?
>
> I think I prefer this patch for stable; refactoring can
> be done on top.

Ok. So

Acked-by: Jason Wang 

Thanks

>
> > if ((unlikely(++i >= vq->packed.vring.num))) {
> > i = 0;
> > vq->packed.avail_used_flags ^=
> > 1 << VRING_PACKED_DESC_F_AVAIL |
> > 1 << VRING_PACKED_DESC_F_USED;
> > }
> >
> > Thanks
> >
> > >
> > > /* We're using some buffers from the free list. */
> > > --
> > > 2.41.0.640.ga95def55d0-goog
> > >
>


Re: [PATCH] virtio_ring: fix avail_wrap_counter in virtqueue_add_packed

2023-08-07 Thread Jason Wang
On Tue, Aug 8, 2023 at 1:59 PM Yuan Yao  wrote:
>
> Since there is a check for a 0-length chain in the code, BUG_ON(total_sg == 0), we
> won’t get a 0-length chain in practice. So I think using (i <= head) keeps the
> commit as small as possible.

Ok, I offered my ack in the other mail.

Thanks

>
>
> Best regards
>
>
>
> On Tue, Aug 8, 2023 at 2:43 PM Jason Wang  wrote:
>>
>> On Tue, Aug 8, 2023 at 1:11 PM Yuan Yao  wrote:
>> >
>> > In the current packed virtqueue implementation, the avail_wrap_counter won't
>> > flip in the case when the driver supplies a descriptor chain with a
>> > length equal to the queue size; total_sg == vq->packed.vring.num.
>> >
>> > Let’s assume the following situation:
>> > vq->packed.vring.num=4
>> > vq->packed.next_avail_idx: 1
>> > vq->packed.avail_wrap_counter: 0
>> >
>> > Then the driver adds a descriptor chain containing 4 descriptors.
>> >
>> > We expect the following result with avail_wrap_counter flipped:
>> > vq->packed.next_avail_idx: 1
>> > vq->packed.avail_wrap_counter: 1
>> >
>> > But, the current implementation gives the following result:
>> > vq->packed.next_avail_idx: 1
>> > vq->packed.avail_wrap_counter: 0
>> >
>> > To reproduce the bug, you can set a packed queue size as small as
>> > possible, so that the driver is more likely to provide a descriptor
>> > chain with a length equal to the packed queue size. For example, run
>> > QEMU with the following command:
>> > sudo qemu-system-x86_64 \
>> > -enable-kvm \
>> > -nographic \
>> > -kernel "path/to/kernel_image" \
>> > -m 1G \
>> > -drive file="path/to/rootfs",if=none,id=disk \
>> > -device virtio-blk,drive=disk \
>> > -drive file="path/to/disk_image",if=none,id=rwdisk \
>> > -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
>> > indirect_desc=off \
>> > -append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
>> >
>> > Inside the VM, create a directory and mount the rwdisk device on it. The
>> > rwdisk will hang and mount operation will not complete.
>> >
>> > This commit fixes the wrap counter error by flipping
>> > packed.avail_wrap_counter when the start of the descriptor chain equals the
>> > end of the descriptor chain (head == i).
>> >
>> > Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support")
>> > Signed-off-by: Yuan Yao 
>> > ---
>> >
>> >  drivers/virtio/virtio_ring.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> > index c5310eaf8b46..da1150d127c2 100644
>> > --- a/drivers/virtio/virtio_ring.c
>> > +++ b/drivers/virtio/virtio_ring.c
>> > @@ -1461,7 +1461,7 @@ static inline int virtqueue_add_packed(struct 
>> > virtqueue *_vq,
>> > }
>> > }
>> >
>> > -   if (i < head)
>> > +   if (i <= head)
>> > vq->packed.avail_wrap_counter ^= 1;
>>
>> Would it be better to move the flipping to the place where we flip
>> avail_used_flags?
>>
>> if ((unlikely(++i >= vq->packed.vring.num))) {
>> i = 0;
>> vq->packed.avail_used_flags ^=
>> 1 << VRING_PACKED_DESC_F_AVAIL |
>> 1 << VRING_PACKED_DESC_F_USED;
>> }
>>
>> Thanks
>>
>> >
>> > /* We're using some buffers from the free list. */
>> > --
>> > 2.41.0.640.ga95def55d0-goog
>> >
>>
