Re: [PATCH v2 2/2] kunit: enable hardware acceleration when available

2024-11-05 Thread Kristina Martsenko
On 02/11/2024 12:09, Tamir Duberstein wrote:
> Use KVM or HVF if supported by the QEMU binary and available on the
> system.
> 
> This produces a nice improvement on my Apple M3 Pro running macOS 14.7:
> 
> Before:
> ./tools/testing/kunit/kunit.py exec --arch arm64
> [HH:MM:SS] Elapsed time: 10.145s
> 
> After:
> ./tools/testing/kunit/kunit.py exec --arch arm64
> [HH:MM:SS] Elapsed time: 1.773s
> 
> Signed-off-by: Tamir Duberstein 
> ---
>  tools/testing/kunit/kunit_kernel.py   | 3 +++
>  tools/testing/kunit/qemu_configs/arm64.py | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/kunit/kunit_kernel.py 
> b/tools/testing/kunit/kunit_kernel.py
> index 
> 61931c4926fd6645f2c62dd13f9842a432ec4167..3146acb884ecf0bcff94d5938535aabd4486fe82
>  100644
> --- a/tools/testing/kunit/kunit_kernel.py
> +++ b/tools/testing/kunit/kunit_kernel.py
> @@ -123,6 +123,9 @@ class 
> LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
>   '-append', ' '.join(params + 
> [self._kernel_command_line]),
>   '-no-reboot',
>   '-nographic',
> + '-accel', 'kvm',
> + '-accel', 'hvf',
> + '-accel', 'tcg',
>   '-serial', self._serial] + 
> self._extra_qemu_params
>   # Note: shlex.join() does what we want, but requires python 
> 3.8+.
>   print('Running tests with:\n$', ' '.join(shlex.quote(arg) for 
> arg in qemu_command))
> diff --git a/tools/testing/kunit/qemu_configs/arm64.py 
> b/tools/testing/kunit/qemu_configs/arm64.py
> index 
> d3ff27024755411441f910799be30399295c9541..5c44d3a87e6dd2cd6b086138186a277a1473585b
>  100644
> --- a/tools/testing/kunit/qemu_configs/arm64.py
> +++ b/tools/testing/kunit/qemu_configs/arm64.py
> @@ -9,4 +9,4 @@ CONFIG_SERIAL_AMBA_PL011_CONSOLE=y''',
>  qemu_arch='aarch64',
>  kernel_path='arch/arm64/boot/Image.gz',
>  kernel_command_line='console=ttyAMA0',
> -extra_qemu_params=['-machine', 'virt', '-cpu', 
> 'max,pauth-impdef=on'])
> +extra_qemu_params=['-machine', 'virt', '-cpu', 
> 'max'])

Would it be possible to keep 'pauth-impdef=on' for TCG emulation? Otherwise
performance regresses by about 20%.

Before this patch:
./tools/testing/kunit/kunit.py exec --arch=arm64 --cross_compile=aarch64-linux-
[11:03:38] Elapsed time: 15.494s

After this patch:
./tools/testing/kunit/kunit.py exec --arch=arm64 --cross_compile=aarch64-linux-
[11:10:47] Elapsed time: 19.099s

Thanks,
Kristina




Re: [PATCH RESEND v13 00/12] Initial Marvell PXA1908 support

2024-11-05 Thread Rob Herring (Arm)


On Mon, 04 Nov 2024 17:37:02 +0100, Duje Mihanović wrote:
> Hello,
> 
> This series adds initial support for the Marvell PXA1908 SoC and
> "samsung,coreprimevelte", a smartphone using the SoC.
> 
> USB works and the phone can boot a rootfs from an SD card, but there are
> some warnings in the dmesg:
> 
> During SMP initialization:
> [0.006519] CPU features: SANITY CHECK: Unexpected variation in 
> SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU1: 0x00
> [0.006542] CPU features: Unsupported CPU feature variation detected.
> [0.006589] CPU1: Booted secondary processor 0x01 [0x410fd032]
> [0.010710] Detected VIPT I-cache on CPU2
> [0.010716] CPU features: SANITY CHECK: Unexpected variation in 
> SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU2: 0x00
> [0.010758] CPU2: Booted secondary processor 0x02 [0x410fd032]
> [0.014849] Detected VIPT I-cache on CPU3
> [0.014855] CPU features: SANITY CHECK: Unexpected variation in 
> SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU3: 0x00
> [0.014895] CPU3: Booted secondary processor 0x03 [0x410fd032]
> 
> SMMU probing fails:
> [0.101798] arm-smmu c001.iommu: probing hardware configuration...
> [0.101809] arm-smmu c001.iommu: SMMUv1 with:
> [0.101816] arm-smmu c001.iommu: no translation support!
> 
> A 3.14 based Marvell tree is available on GitHub
> acorn-marvell/brillo_pxa_kernel, and a Samsung one on GitHub
> CoderCharmander/g361f-kernel.
> 
> Andreas Färber attempted to upstream support for this SoC in 2017:
> https://lore.kernel.org/lkml/20170222022929.10540-1-afaer...@suse.de/
> 
> Signed-off-by: Duje Mihanović 
> 
> Changes in v13:
> - Better describe the hardware in bindings/arm commit message
> - Rebase on v6.12-rc1
> - Link to v12: 
> https://lore.kernel.org/r/20240823-pxa1908-lkml-v12-0-cc3ada51b...@skole.hr
> 
> Changes in v12:
> - Rebase on v6.11-rc4
> - Fix schmitt properties in accordance with 78d8815031fb ("dt-bindings: 
> pinctrl: pinctrl-single: fix schmitt related properties")
> - Drop a few redundant includes in clock drivers
> - Link to v11: 
> https://lore.kernel.org/r/20240730-pxa1908-lkml-v11-0-21dbb3e28...@skole.hr
> 
> Changes in v11:
> - Rebase on v6.11-rc1 (conflict with DTS Makefile), no changes
> - Link to v10: 
> https://lore.kernel.org/r/20240424-pxa1908-lkml-v10-0-36cdfb584...@skole.hr
> 
> Changes in v10:
> - Update trailers
> - Rebase on v6.9-rc5
> - Clock driver changes:
>   - Add a couple of forgotten clocks in APBC
> - The clocks are thermal_clk, ipc_clk, ssp0_clk, ssp2_clk and swjtag
> - The IDs and register offsets were already present, but I forgot to
>   actually register them
>   - Split each controller block into own file
>   - Drop unneeded -of in clock driver filenames
>   - Simplify struct pxa1908_clk_unit
>   - Convert to platform driver
>   - Add module metadata
> - DTS changes:
>   - Properly name pinctrl nodes
>   - Drop pinctrl #size-cells, #address-cells, ranges and #gpio-size-cells
>   - Fix pinctrl input-schmitt configuration
> - Link to v9: 
> https://lore.kernel.org/20240402-pxa1908-lkml-v9-0-25a003e83...@skole.hr
> 
> Changes in v9:
> - Update trailers and rebase on v6.9-rc2, no changes
> - Link to v8: 
> https://lore.kernel.org/20240110-pxa1908-lkml-v8-0-fea768a59...@skole.hr
> 
> Changes in v8:
> - Drop SSPA patch
> - Drop broken-cd from eMMC node
> - Specify S-Boot hardcoded initramfs location in device tree
> - Add ARM PMU node
> - Correct inverted modem memory base and size
> - Update trailers
> - Rebase on next-20240110
> - Link to v7: 
> https://lore.kernel.org/20231102-pxa1908-lkml-v7-0-cabb1a0cb...@skole.hr
>   and https://lore.kernel.org/20231102152033.5511-1-duje.mihano...@skole.hr
> 
> Changes in v7:
> - Suppress SND_MMP_SOC_SSPA on ARM64
> - Update trailers
> - Rebase on v6.6-rc7
> - Link to v6: 
> https://lore.kernel.org/r/20231010-pxa1908-lkml-v6-0-b2fe09240...@skole.hr
> 
> Changes in v6:
> - Address maintainer comments:
>   - Add "marvell,pxa1908-padconf" binding to pinctrl-single driver
> - Drop GPIO patch as it's been pulled
> - Update trailers
> - Rebase on v6.6-rc5
> - Link to v5: 
> https://lore.kernel.org/r/20230812-pxa1908-lkml-v5-0-a5d51937e...@skole.hr
> 
> Changes in v5:
> - Address maintainer comments:
>   - Move *_NR_CLKS to clock driver from dt binding file
> - Allocate correct number of clocks for each block instead of blindly
>   allocating 50 for each
> - Link to v4: 
> https://lore.kernel.org/r/20230807-pxa1908-lkml-v4-0-cb387d73b...@skole.hr
> 
> Changes in v4:
> - Address maintainer comments:
>   - Relicense clock binding file to BSD-2
> - Add pinctrl-names to SD card node
> - Add vgic registers to GIC node
> - Rebase on v6.5-rc5
> - Link to v3: 
> https://lore.kernel.org/r/20230804-pxa1908-lkml-v3-0-8e48fca37...@skole.hr
> 
> Changes in v3:
> - Address maintainer comments:
>   - Drop GPIO dynamic allocation patch
>   - Move clock register o

Re: [PATCH net-next v11 06/23] ovpn: introduce the ovpn_peer object

2024-11-05 Thread Sabrina Dubroca
2024-10-30, 21:47:58 +0100, Antonio Quartulli wrote:
> On 30/10/2024 17:37, Sabrina Dubroca wrote:
> > 2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
> > > +static void ovpn_peer_release(struct ovpn_peer *peer)
> > > +{
> > > + ovpn_bind_reset(peer, NULL);
> > > +
> > > + dst_cache_destroy(&peer->dst_cache);
> > 
> > Is it safe to destroy the cache at this time? In the same function, we
> > use rcu to free the peer, but AFAICT the dst_cache will be freed
> > immediately:
> > 
> > void dst_cache_destroy(struct dst_cache *dst_cache)
> > {
> > [...]
> > free_percpu(dst_cache->cache);
> > }
> > 
> > (probably no real issue because ovpn_udp_send_skb gets called while we
> > hold a reference to the peer?)
> 
> Right.
> That was my assumption: release happens on refcnt = 0 only, therefore no
> field should be in use anymore.
> Anything that may still be in use will have its own refcounter.

My worry is that code changes over time, assumptions are forgotten,
and we end up with code that was a bit odd but safe not being safe
anymore.

> > 
> > > + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
> > > + kfree_rcu(peer, rcu);
> > > +}
> > 
> > 
> > [...]
> > > +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
> > > +  enum ovpn_del_peer_reason reason)
> > > + __must_hold(&peer->ovpn->lock)
> > > +{
> > > + struct ovpn_peer *tmp;
> > > +
> > > + tmp = rcu_dereference_protected(peer->ovpn->peer,
> > > + lockdep_is_held(&peer->ovpn->lock));
> > > + if (tmp != peer) {
> > > + DEBUG_NET_WARN_ON_ONCE(1);
> > > + if (tmp)
> > > + ovpn_peer_put(tmp);
> > 
> > Does peer->ovpn->peer need to be set to NULL here as well? Or is it
> > going to survive this _put?
> 
> First of all consider that this is truly something that we don't expect to
> happen (hence the WARN_ON).
> If this is happening it's because we are trying to delete a peer that is not
> the one we are connected to (unexplainable scenario in p2p mode).
>
> Still, should we hit this case (I truly can't see how), I'd say "leave
> everything as is - maybe this call was just a mistake".

Yeah, true, let's leave it. Thanks.

-- 
Sabrina




[PATCH v3 6/9] vhost: Add kthread support in function vhost_worker_destroy()

2024-11-05 Thread Cindy Lu
The function vhost_worker_destroy() will use struct vhost_task_fn and
selects the different mode based on the value of inherit_owner.

Signed-off-by: Cindy Lu 
---
 drivers/vhost/vhost.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8b7ddfb33c61..c17dc01febcc 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -718,12 +718,14 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 static void vhost_worker_destroy(struct vhost_dev *dev,
 struct vhost_worker *worker)
 {
-   if (!worker)
+   if (!worker && !worker->fn)
return;
 
WARN_ON(!llist_empty(&worker->work_list));
xa_erase(&dev->worker_xa, worker->id);
-   vhost_task_stop(worker->vtsk);
+   worker->fn->stop(dev->inherit_owner ? (void *)worker->vtsk :
+ (void *)worker->task);
+   kfree(worker->fn);
kfree(worker);
 }
 
-- 
2.45.0




[PATCH v3 4/9] vhost: Add kthread support in function vhost_worker_create

2024-11-05 Thread Cindy Lu
Restored the previous functions kthread_wakeup and kthread_stop.
Also add a new structure, vhost_task_fn. The function vhost_worker_create
Will initializes this structure based on the value of inherit_owner.

Signed-off-by: Cindy Lu 
---
 drivers/vhost/vhost.c | 71 ---
 drivers/vhost/vhost.h |  6 
 2 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e40cef3a1fa5..603b146fccc1 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -741,43 +741,92 @@ static void vhost_workers_free(struct vhost_dev *dev)
xa_destroy(&dev->worker_xa);
 }
 
+static int vhost_task_wakeup_fn(void *vtsk)
+{
+   vhost_task_wake((struct vhost_task *)vtsk);
+   return 0;
+}
+static int vhost_kthread_wakeup_fn(void *p)
+{
+   return wake_up_process((struct task_struct *)p);
+}
+static int vhost_task_stop_fn(void *vtsk)
+{
+   vhost_task_stop((struct vhost_task *)vtsk);
+   return 0;
+}
+static int vhost_kthread_stop_fn(void *k)
+{
+   return kthread_stop((struct task_struct *)k);
+}
+
 static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 {
struct vhost_worker *worker;
-   struct vhost_task *vtsk;
+   struct vhost_task *vtsk = NULL;
+   struct task_struct *task = NULL;
char name[TASK_COMM_LEN];
int ret;
u32 id;
 
+   /* Allocate resources for the worker */
worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
if (!worker)
return NULL;
 
+   worker->fn = kzalloc(sizeof(struct vhost_task_fn), GFP_KERNEL_ACCOUNT);
+   if (!worker->fn) {
+   kfree(worker);
+   return NULL;
+   }
+
worker->dev = dev;
snprintf(name, sizeof(name), "vhost-%d", current->pid);
 
-   vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
-worker, name);
-   if (!vtsk)
-   goto free_worker;
-
mutex_init(&worker->mutex);
init_llist_head(&worker->work_list);
worker->kcov_handle = kcov_common_handle();
-   worker->vtsk = vtsk;
 
-   vhost_task_start(vtsk);
+   if (dev->inherit_owner) {
+   /* Create and start a vhost task */
+   vtsk = vhost_task_create(vhost_run_work_list,
+vhost_worker_killed, worker, name);
+   if (!vtsk)
+   goto free_worker;
+
+   worker->vtsk = vtsk;
+   worker->fn->wakeup = vhost_task_wakeup_fn;
+   worker->fn->stop = vhost_task_stop_fn;
+
+   vhost_task_start(vtsk);
+   } else {
+   /* Create and start a kernel thread */
+   task = kthread_create(vhost_run_work_kthread_list, worker,
+ "vhost-%d", current->pid);
+   if (IS_ERR(task)) {
+   ret = PTR_ERR(task);
+   goto free_worker;
+   }
+   worker->task = task;
+   worker->fn->wakeup = vhost_kthread_wakeup_fn;
+   worker->fn->stop = vhost_kthread_stop_fn;
+
+   wake_up_process(task);
+   /* Attach to the vhost cgroup */
+   ret = vhost_attach_cgroups(dev);
+   if (ret)
+   goto stop_worker;
+   }
 
ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL);
if (ret < 0)
goto stop_worker;
worker->id = id;
-
return worker;
-
 stop_worker:
-   vhost_task_stop(vtsk);
+   worker->fn->stop(dev->inherit_owner ? (void *)vtsk : (void *)task);
 free_worker:
+   kfree(worker->fn);
kfree(worker);
return NULL;
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index c650c4506c70..ebababa4e340 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -25,8 +25,13 @@ struct vhost_work {
vhost_work_fn_t fn;
unsigned long   flags;
 };
+struct vhost_task_fn {
+   int (*wakeup)(void *task);
+   int (*stop)(void *task);
+};
 
 struct vhost_worker {
+   struct task_struct  *task;
struct vhost_task   *vtsk;
struct vhost_dev*dev;
/* Used to serialize device wide flushing with worker swapping. */
@@ -36,6 +41,7 @@ struct vhost_worker {
u32 id;
int attachment_cnt;
boolkilled;
+   struct vhost_task_fn *fn;
 };
 
 /* Poll a file (eventfd or socket) */
-- 
2.45.0




[PATCH v3 9/9] vhost: Expose the modparam inherit_owner_default

2024-11-05 Thread Cindy Lu
Expose the inherit_owner_default modparam by module_param().

Signed-off-by: Cindy Lu 
---
 drivers/vhost/vhost.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 70c793b63905..1a4ccf4f7316 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -43,6 +43,9 @@ module_param(max_iotlb_entries, int, 0444);
 MODULE_PARM_DESC(max_iotlb_entries,
"Maximum number of iotlb entries. (default: 2048)");
 static bool inherit_owner_default = true;
+module_param(inherit_owner_default, bool, 0444);
+MODULE_PARM_DESC(inherit_owner_default,
+"Set vhost_task mode as the default(default: Y)");
 
 enum {
VHOST_MEMORY_F_LOG = 0x1,
-- 
2.45.0




Re: [PATCH v2 2/2] kunit: enable hardware acceleration when available

2024-11-05 Thread David Gow
On Sat, 2 Nov 2024 at 20:10, Tamir Duberstein  wrote:
>
> Use KVM or HVF if supported by the QEMU binary and available on the
> system.
>
> This produces a nice improvement on my Apple M3 Pro running macOS 14.7:
>
> Before:
> ./tools/testing/kunit/kunit.py exec --arch arm64
> [HH:MM:SS] Elapsed time: 10.145s
>
> After:
> ./tools/testing/kunit/kunit.py exec --arch arm64
> [HH:MM:SS] Elapsed time: 1.773s
>
> Signed-off-by: Tamir Duberstein 
> ---

Thanks a lot.

I finally managed to dig up an arm64 machine and test this, and I can
reproduce the performance improvement.

Reviewed-by: David Gow 

Cheers,
-- David

>  tools/testing/kunit/kunit_kernel.py   | 3 +++
>  tools/testing/kunit/qemu_configs/arm64.py | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/kunit/kunit_kernel.py 
> b/tools/testing/kunit/kunit_kernel.py
> index 
> 61931c4926fd6645f2c62dd13f9842a432ec4167..3146acb884ecf0bcff94d5938535aabd4486fe82
>  100644
> --- a/tools/testing/kunit/kunit_kernel.py
> +++ b/tools/testing/kunit/kunit_kernel.py
> @@ -123,6 +123,9 @@ class 
> LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
> '-append', ' '.join(params + 
> [self._kernel_command_line]),
> '-no-reboot',
> '-nographic',
> +   '-accel', 'kvm',
> +   '-accel', 'hvf',
> +   '-accel', 'tcg',
> '-serial', self._serial] + 
> self._extra_qemu_params
> # Note: shlex.join() does what we want, but requires python 
> 3.8+.
> print('Running tests with:\n$', ' '.join(shlex.quote(arg) for 
> arg in qemu_command))
> diff --git a/tools/testing/kunit/qemu_configs/arm64.py 
> b/tools/testing/kunit/qemu_configs/arm64.py
> index 
> d3ff27024755411441f910799be30399295c9541..5c44d3a87e6dd2cd6b086138186a277a1473585b
>  100644
> --- a/tools/testing/kunit/qemu_configs/arm64.py
> +++ b/tools/testing/kunit/qemu_configs/arm64.py
> @@ -9,4 +9,4 @@ CONFIG_SERIAL_AMBA_PL011_CONSOLE=y''',
>qemu_arch='aarch64',
>kernel_path='arch/arm64/boot/Image.gz',
>kernel_command_line='console=ttyAMA0',
> -  extra_qemu_params=['-machine', 'virt', '-cpu', 
> 'max,pauth-impdef=on'])
> +  extra_qemu_params=['-machine', 'virt', '-cpu', 
> 'max'])
>
> --
> 2.47.0
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH v2 1/2] kunit: add fallback for os.sched_getaffinity

2024-11-05 Thread David Gow
On Sat, 2 Nov 2024 at 20:10, Tamir Duberstein  wrote:
>
> Python 3.13 added os.process_cpu_count as a cross-platform alternative
> for the Linux-only os.sched_getaffinity. Use it when it's available and
> provide a fallback when it's not.
>
> This allows kunit to run on macOS.
>
> Signed-off-by: Tamir Duberstein 
> ---

Looks plausible enough to me. Thanks very much!

Reviewed-by: David Gow 

Cheers,
-- David

>  tools/testing/kunit/kunit.py | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
> index 
> bc74088c458aee20b1a21fdeb9f3cb01ab20fec4..3a8cbb868ac559f68d047e38be92f7c64a3314ea
>  100755
> --- a/tools/testing/kunit/kunit.py
> +++ b/tools/testing/kunit/kunit.py
> @@ -303,7 +303,16 @@ def massage_argv(argv: Sequence[str]) -> Sequence[str]:
> return list(map(massage_arg, argv))
>
>  def get_default_jobs() -> int:
> -   return len(os.sched_getaffinity(0))
> +   if sys.version_info >= (3, 13):
> +   if (ncpu := os.process_cpu_count()) is not None:
> +   return ncpu
> +   raise RuntimeError("os.process_cpu_count() returned None")
> +# See 
> https://github.com/python/cpython/blob/b61fece/Lib/os.py#L1175-L1186.
> +   if sys.platform != "darwin":
> +   return len(os.sched_getaffinity(0))
> +   if (ncpu := os.cpu_count()) is not None:
> +   return ncpu
> +   raise RuntimeError("os.cpu_count() returned None")
>
>  def add_common_opts(parser: argparse.ArgumentParser) -> None:
> parser.add_argument('--build_dir',
>
> --
> 2.47.0
>


smime.p7s
Description: S/MIME Cryptographic Signature


[PATCH v3 8/9] vhost_scsi: Add check for inherit_owner status

2024-11-05 Thread Cindy Lu
The vhost_scsi VHOST_NEW_WORKER requires the inherit_owner
setting to be true. So we need to implement a check for this.

Signed-off-by: Cindy Lu 
---
 drivers/vhost/scsi.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 006ffacf1c56..05290298b5ab 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -2083,6 +2083,11 @@ vhost_scsi_ioctl(struct file *f,
return -EFAULT;
return vhost_scsi_set_features(vs, features);
case VHOST_NEW_WORKER:
+   /*vhost-scsi VHOST_NEW_WORKER requires inherit_owner to be 
true*/
+   if (vs->dev.inherit_owner != true)
+   return -EFAULT;
+
+   fallthrough;
case VHOST_FREE_WORKER:
case VHOST_ATTACH_VRING_WORKER:
case VHOST_GET_VRING_WORKER:
-- 
2.45.0




Re: [PATCH v3 7/9] vhost: Add new UAPI to support change to task mode

2024-11-05 Thread Jason Wang
On Tue, Nov 5, 2024 at 3:28 PM Cindy Lu  wrote:
>
> Add a new UAPI to enable setting the vhost device to task mode.
> The userspace application can use VHOST_SET_INHERIT_FROM_OWNER
> to configure the mode if necessary.
> This setting must be applied before VHOST_SET_OWNER, as the worker
> will be created in the VHOST_SET_OWNER function
>
> Signed-off-by: Cindy Lu 
> ---
>  drivers/vhost/vhost.c  | 15 ++-
>  include/uapi/linux/vhost.h |  2 ++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c17dc01febcc..70c793b63905 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2274,8 +2274,9 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int 
> ioctl, void __user *argp)
>  {
> struct eventfd_ctx *ctx;
> u64 p;
> -   long r;
> +   long r = 0;
> int i, fd;
> +   bool inherit_owner;
>
> /* If you are not the owner, you can become one */
> if (ioctl == VHOST_SET_OWNER) {
> @@ -2332,6 +2333,18 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int 
> ioctl, void __user *argp)
> if (ctx)
> eventfd_ctx_put(ctx);
> break;
> +   case VHOST_SET_INHERIT_FROM_OWNER:
> +   /*inherit_owner can only be modified before owner is set*/
> +   if (vhost_dev_has_owner(d))
> +   break;
> +
> +   if (copy_from_user(&inherit_owner, argp,
> +  sizeof(inherit_owner))) {
> +   r = -EFAULT;
> +   break;
> +   }
> +   d->inherit_owner = inherit_owner;
> +   break;

Is there any case that we need to switch from owner back to kthread?
If not I would choose a more simplified API that is just
VHOST_INHERIT_OWNER.

> default:
> r = -ENOIOCTLCMD;
> break;
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index b95dd84eef2d..1e192038633d 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -235,4 +235,6 @@
>   */
>  #define VHOST_VDPA_GET_VRING_SIZE  _IOWR(VHOST_VIRTIO, 0x82,   \
>   struct vhost_vring_state)
> +
> +#define VHOST_SET_INHERIT_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, bool)
>  #endif
> --
> 2.45.0

Thanks

>




Testing Quality Call notes - 2024-10-31

2024-11-05 Thread Laura Nao
Hello,

KernelCI is hosting a bi-weekly call on Thursday to discuss improvements
to existing upstream tests, the development of new tests to increase 
kernel testing coverage, and the enablement of these tests in KernelCI. 
In recent months, we at Collabora have focused on various kernel areas,
assessing the tests already available upstream and contributing patches 
to make them easily runnable in CIs.

Below is a list of the tests we've been working on and their latest
status updates, as discussed in the last meeting held on 2024-10-31:

*Boot time test*

- Sent v2:
  https://lore.kernel.org/all/20241018101439.20849-1-laura@collabora.com/
- Changes in v2 driven by feedback received in the LPC session

*ACPI probe kselftest*

- RFCv2:
  https://lore.kernel.org/all/20240308144933.337107-1-laura@collabora.com/
- Related upstream discussion on probe status tracking through printks:
  
https://lore.kernel.org/lkml/dc7af563-530d-4e1f-bcbb-90bcfc2fe11a@stanley.mountain/

Please reply to this thread if you'd like to join the call or discuss
any of the topics further. We look forward to collaborating with the
community to improve upstream tests and expand coverage to more areas of 
interest within the kernel.

Best regards,

Laura Nao



Re: [PATCH net-next v2 4/4] selftests: hsr: Add test for VLAN

2024-11-05 Thread MD Danish Anwar



On 31/10/24 8:11 pm, Paolo Abeni wrote:
> On 10/24/24 12:30, MD Danish Anwar wrote:
>> @@ -183,9 +232,21 @@ trap cleanup_all_ns EXIT
>>  setup_hsr_interfaces 0
>>  do_complete_ping_test
>>  
>> +# Run VLAN Test
>> +if $vlan; then
>> +setup_vlan_interfaces
>> +hsr_vlan_ping
>> +fi
>> +
>>  setup_ns ns1 ns2 ns3
>>  
>>  setup_hsr_interfaces 1
>>  do_complete_ping_test
>>  
>> +# Run VLAN Test
>> +if $vlan; then
>> +setup_vlan_interfaces
>> +hsr_vlan_ping
>> +fi
> 
> The new tests should be enabled by default. Indeed ideally the test
> script should be able to run successfully on kernel not supporting such
> feature; you could cope with that looking for the hsr exposed feature
> and skipping the vlan test when the relevant feature is not present.
> 

Sure Paolo, I will make the new tests enabled by default. During the
test I will check the exposed feature `ethtool -k hsr1 | grep
"vlan-challenged"` and if vlan is not supported, skip the vlan test.

Below will be my API to run VLAN tests,

run_vlan_tests() {
vlan_challenged_hsr1=$(ip net exec "$ns1" ethtool -k hsr1 | grep
"vlan-challenged" | awk '{print $2}')
vlan_challenged_hsr2=$(ip net exec "$ns1" ethtool -k hsr2 | grep
"vlan-challenged" | awk '{print $2}')
vlan_challenged_hsr3=$(ip net exec "$ns1" ethtool -k hsr3 | grep
"vlan-challenged" | awk '{print $2}')

if [[ "$vlan_challenged_hsr1" = "off" || "$vlan_challenged_hsr2" =
"off" || "$vlan_challenged_hsr3" = "off" ]]; then
setup_vlan_interfaces
hsr_vlan_ping
else
echo "INFO: Not Running VLAN tests as the device does not 
support VLAN"
fi
}

I will call this function after the ping test.
Let me know if this looks okay to you.

Thanks for the review.

> Cheers,
> 
> Paolo
> 

-- 
Thanks and Regards,
Danish



Re: [syzbot] [lvs?] possible deadlock in start_sync_thread

2024-11-05 Thread Pablo Neira Ayuso
Hi,

I am Cc'ing SHARED MEMORY COMMUNICATIONS (SMC) SOCKETS maintainers.

Similar issue already reported by syzkaller here:

https://lore.kernel.org/netdev/ZyIgRmJUbnZpzXNV@calendula/T/#mf1f03a65108226102d8567c9fb6bab98c072444c

related to smc->clcsock_release_lock.

I think this is a false possible lockdep considers smc->clcsock_release_lock
is a lock of the same class sk_lock-AF_INET.

Could you please advise?

Thanks.

On Fri, Nov 01, 2024 at 05:20:27PM -0700, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:6c52d4da1c74 Merge tag 'for-linus' of git://git.kernel.org..
> git tree:   upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=1288963058
> kernel config:  https://syzkaller.appspot.com/x/.config?x=672325e7ab17fdf7
> dashboard link: https://syzkaller.appspot.com/bug?extid=e929093395ec65f969c7
> compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
> 2.40
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1788e18798
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14a4f2a798
> 
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/70526f6a5c28/disk-6c52d4da.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/8ca3cd20d331/vmlinux-6c52d4da.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/9c4393fc9a08/bzImage-6c52d4da.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+e929093395ec65f96...@syzkaller.appspotmail.com
> 
> ==
> WARNING: possible circular locking dependency detected
> 6.12.0-rc5-syzkaller-00181-g6c52d4da1c74 #0 Not tainted
> --
> syz-executor158/5839 is trying to acquire lock:
> 8fcd3448 (rtnl_mutex){+.+.}-{3:3}, at: start_sync_thread+0xdc/0x2dc0 
> net/netfilter/ipvs/ip_vs_sync.c:1761
> 
> but task is already holding lock:
> 888034ac8aa8 (&smc->clcsock_release_lock){+.+.}-{3:3}, at: 
> smc_setsockopt+0x1c3/0xe50 net/smc/af_smc.c:3056
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&smc->clcsock_release_lock){+.+.}-{3:3}:
>lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
>__mutex_lock_common kernel/locking/mutex.c:608 [inline]
>__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
>smc_switch_to_fallback+0x35/0xdb0 net/smc/af_smc.c:902
>smc_sendmsg+0x11f/0x530 net/smc/af_smc.c:2771
>sock_sendmsg_nosec net/socket.c:729 [inline]
>__sock_sendmsg+0x221/0x270 net/socket.c:744
>__sys_sendto+0x39b/0x4f0 net/socket.c:2214
>__do_sys_sendto net/socket.c:2226 [inline]
>__se_sys_sendto net/socket.c: [inline]
>__x64_sys_sendto+0xde/0x100 net/socket.c:
>do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> -> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
>lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
>lock_sock_nested+0x48/0x100 net/core/sock.c:3611
>do_ip_setsockopt+0x1a2d/0x3cd0 net/ipv4/ip_sockglue.c:1078
>ip_setsockopt+0x63/0x100 net/ipv4/ip_sockglue.c:1417
>do_sock_setsockopt+0x3af/0x720 net/socket.c:2334
>__sys_setsockopt+0x1a2/0x250 net/socket.c:2357
>__do_sys_setsockopt net/socket.c:2366 [inline]
>__se_sys_setsockopt net/socket.c:2363 [inline]
>__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2363
>do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> -> #0 (rtnl_mutex){+.+.}-{3:3}:
>check_prev_add kernel/locking/lockdep.c:3161 [inline]
>check_prevs_add kernel/locking/lockdep.c:3280 [inline]
>validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
>__lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202
>lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
>__mutex_lock_common kernel/locking/mutex.c:608 [inline]
>__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
>start_sync_thread+0xdc/0x2dc0 net/netfilter/ipvs/ip_vs_sync.c:1761
>do_ip_vs_set_ctl+0x442/0x13d0 net/netfilter/ipvs/ip_vs_ctl.c:2732
>nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
>smc_setsockopt+0x275/0xe50 net/smc/af_smc.c:3064
>do_sock_setsockopt+0x3af/0x720 net/socket.c:2334
>__sys_setsockopt+0x1a2/0x250 net/socket.c:2357
>__do_sys_setsockopt net/socket.c:2366 [inline]
>__se_sys_setsockopt net/socket.c:2363 [inline]
>__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2363
>do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>do_syscall_64+0xf3/0x230 arch/

Re: [PATCH bpf-next 1/2] libbpf: Add missing per-arch include path

2024-11-05 Thread Kexy Biscuit

On 9/27/2024 9:13 PM, Björn Töpel wrote:

From: Björn Töpel 

libbpf does not include the per-arch tools include path, e.g.
tools/arch/riscv/include. Some architectures depend those files to
build properly.

Include tools/arch/$(SUBARCH)/include in the libbpf build.

Fixes: 6d74d178fe6e ("tools: Add riscv barrier implementation")
Signed-off-by: Björn Töpel 
---
  tools/lib/bpf/Makefile | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 1b22f0f37288..857a5f7b413d 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -61,7 +61,8 @@ ifndef VERBOSE
  endif
  
  INCLUDES = -I$(or $(OUTPUT),.) \

-  -I$(srctree)/tools/include -I$(srctree)/tools/include/uapi
+  -I$(srctree)/tools/include -I$(srctree)/tools/include/uapi \
+  -I$(srctree)/tools/arch/$(SRCARCH)/include
  
  export prefix libdir src obj
  


base-commit: db5ca265e3334b48c4e3fa07eef79e8bc578c430


This fixes building of bpf tools, thanks! You can add the following tags...

Reported-by: Andreas Schwab 
Closes: https://lore.kernel.org/all/mvmo74441tr@suse.de/
Tested-by: Kexy Biscuit 
--
Best Regards,
Kexy Biscuit



[syzbot] [acpi?] [nvdimm?] KASAN: vmalloc-out-of-bounds Read in acpi_nfit_ctl (2)

2024-11-05 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:2e1b3cc9d7f7 Merge tag 'arm-fixes-6.12-2' of git://git.ker..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12418e3058
kernel config:  https://syzkaller.appspot.com/x/.config?x=11254d3590b16717
dashboard link: https://syzkaller.appspot.com/bug?extid=7534f060ebda6b8b51b3
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
2.40
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12170f4058
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16418e3058

Downloadable assets:
disk image (non-bootable): 
https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-2e1b3cc9.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/2f2588b04ae9/vmlinux-2e1b3cc9.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/2c9324cf16df/bzImage-2e1b3cc9.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+7534f060ebda6b8b5...@syzkaller.appspotmail.com

==
BUG: KASAN: vmalloc-out-of-bounds in cmd_to_func drivers/acpi/nfit/core.c:416 
[inline]
BUG: KASAN: vmalloc-out-of-bounds in acpi_nfit_ctl+0x20e8/0x24a0 
drivers/acpi/nfit/core.c:459
Read of size 4 at addr c9e0e038 by task syz-executor229/5316

CPU: 0 UID: 0 PID: 5316 Comm: syz-executor229 Not tainted 
6.12.0-rc6-syzkaller-00077-g2e1b3cc9d7f7 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 cmd_to_func drivers/acpi/nfit/core.c:416 [inline]
 acpi_nfit_ctl+0x20e8/0x24a0 drivers/acpi/nfit/core.c:459
 __nd_ioctl drivers/nvdimm/bus.c:1186 [inline]
 nd_ioctl+0x1844/0x1fd0 drivers/nvdimm/bus.c:1264
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb399ccda79
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7ffcf6cb8d88 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX:  RCX: 7fb399ccda79
RDX: 2180 RSI: c008640a RDI: 0003
RBP: 7fb399d405f0 R08: 0006 R09: 0006
R10: 0006 R11: 0246 R12: 0001
R13: 431bde82d7b634db R14: 0001 R15: 0001
 

The buggy address belongs to the virtual mapping at
 [c9e0e000, c9e1) created by:
 __nd_ioctl drivers/nvdimm/bus.c:1169 [inline]
 nd_ioctl+0x1594/0x1fd0 drivers/nvdimm/bus.c:1264

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping: index:0x8880401b9a80 
pfn:0x401b9
flags: 0x4fff000(node=1|zone=1|lastcpupid=0x7ff)
raw: 04fff000  dead0122 
raw: 8880401b9a80  0001 
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 
0x2cc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), pid 5316, tgid 5316 
(syz-executor229), ts 69039468240, free_ts 68666765389
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
 prep_new_page mm/page_alloc.c:1545 [inline]
 get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457
 __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733
 alloc_pages_bulk_noprof+0x729/0xd40 mm/page_alloc.c:4681
 alloc_pages_bulk_array_mempolicy_noprof+0x8ea/0x1600 mm/mempolicy.c:2556
 vm_area_alloc_pages mm/vmalloc.c:3542 [inline]
 __vmalloc_area_node mm/vmalloc.c:3646 [inline]
 __vmalloc_node_range_noprof+0x752/0x13f0 mm/vmalloc.c:3828
 __vmalloc_node_noprof mm/vmalloc.c:3893 [inline]
 vmalloc_noprof+0x79/0x90 mm/vmalloc.c:3926
 __nd_ioctl drivers/nvdimm/bus.c:1169 [inline]
 nd_ioctl+0x1594/0x1fd0 drivers/nvdimm/bus.c:1264
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5312 tgid 5312 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1108 [inline]
 free_unref_page+0xcfb/0xf20 mm/page_alloc.c:2638
 __folio_put+

Re: [PATCH v5 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

2024-11-05 Thread Reinette Chatre
Hi Maciej,

On 10/29/24 6:00 AM, Maciej Wieczor-Retman wrote:
> Sub-NUMA Cluster divides CPUs sharing an L3 cache into separate NUMA
> nodes. Systems may support splitting into either two, three or four
> nodes.
> 
> When SNC mode is enabled the effective amount of L3 cache available
> for allocation is divided by the number of nodes per L3.
> 
> Detect which SNC mode is active by comparing the number of CPUs
> that share a cache with CPU0, with the number of CPUs on node0.
> 
> Co-developed-by: Tony Luck 
> Signed-off-by: Tony Luck 
> Signed-off-by: Maciej Wieczor-Retman 
> ---

...

> diff --git a/tools/testing/selftests/resctrl/resctrl.h 
> b/tools/testing/selftests/resctrl/resctrl.h
> index 2dda56084588..851b37c9c38a 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -43,6 +44,8 @@
>  
>  #define DEFAULT_SPAN (250 * MB)
>  
> +#define MAX_SNC  4
> +
>  /*
>   * user_params:  User supplied parameters
>   * @cpu: CPU number to which the benchmark will be bound to
> @@ -120,6 +123,7 @@ extern volatile int *value_sink;
>  
>  extern char llc_occup_path[1024];
>  
> +int snc_nodes_per_l3_cache(void);
>  int get_vendor(void);
>  bool check_resctrlfs_support(void);
>  int filter_dmesg(void);
> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c 
> b/tools/testing/selftests/resctrl/resctrl_tests.c
> index ecbb7605a981..4b84d6199a36 100644
> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> @@ -118,11 +118,17 @@ static bool test_vendor_specific_check(const struct 
> resctrl_test *test)
>  
>  static void run_single_test(const struct resctrl_test *test, const struct 
> user_params *uparams)
>  {
> - int ret;
> + int ret, snc_mode;
>  
>   if (test->disabled)
>   return;
>  
> + snc_mode = snc_nodes_per_l3_cache();
> + if (snc_mode > 1)
> + ksft_print_msg("SNC-%d mode discovered.\n", snc_mode);
> + else if (snc_unreliable)
> + ksft_print_msg("SNC detection unreliable due to offline CPUs. 
> Test results may not be accurate if SNC enabled.\n");

As I understand none of the tests can be considered reliable if SNC detection 
is unreliable
so skipping the test can be done here in a central spot without duplicating it 
in each test.

> +
>   if (!test_vendor_specific_check(test)) {
>   ksft_test_result_skip("Hardware does not support %s\n", 
> test->name);
>   return;
> diff --git a/tools/testing/selftests/resctrl/resctrlfs.c 
> b/tools/testing/selftests/resctrl/resctrlfs.c
> index 250c320349a7..d6384f404d95 100644
> --- a/tools/testing/selftests/resctrl/resctrlfs.c
> +++ b/tools/testing/selftests/resctrl/resctrlfs.c
> @@ -13,6 +13,8 @@
>  
>  #include "resctrl.h"
>  
> +int snc_unreliable;
> +
>  static int find_resctrl_mount(char *buffer)
>  {
>   FILE *mounts;
> @@ -156,6 +158,93 @@ int get_domain_id(const char *resource, int cpu_no, int 
> *domain_id)
>   return 0;
>  }
>  
> +/*
> + * Count number of CPUs in a /sys bitmap
> + */
> +static unsigned int count_sys_bitmap_bits(char *name)
> +{
> + FILE *fp = fopen(name, "r");
> + int count = 0, c;
> +
> + if (!fp)
> + return 0;
> +
> + while ((c = fgetc(fp)) != EOF) {
> + if (!isxdigit(c))
> + continue;
> + switch (c) {
> + case 'f':
> + count++;
> + case '7': case 'b': case 'd': case 'e':
> + count++;
> + case '3': case '5': case '6': case '9': case 'a': case 'c':
> + count++;
> + case '1': case '2': case '4': case '8':
> + count++;
> + }
> + }
> + fclose(fp);
> +
> + return count;
> +}
> +
> +static bool cpus_offline_empty(void)
> +{
> + char offline_cpus_str[64];
> + FILE *fp;
> +
> + fp = fopen("/sys/devices/system/cpu/offline", "r");
> + if (fscanf(fp, "%s", offline_cpus_str) < 0) {
> + if (!errno) {
> + fclose(fp);
> + return 1;
> + }
> + ksft_perror("Could not read /sys/devices/system/cpu/offline");
> + }
> +
> + fclose(fp);
> +
> + return 0;
> +}
> +
> +/*
> + * Detect SNC by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0.
> + * If some CPUs are offline the numbers may not be exact multiples of each
> + * other. Any offline CPUs on node0 will be also gone from shared_cpu_map of
> + * CPU0 but offline CPUs from other nodes will only make the cache_cpus value
> + * lower. Still try to get the ratio right by preventing the second 
> possibility.

Similar to v4 this "try to get the ratio right" is unnecessary because of the 
explicit
check for offline CPU

Re: [PATCH v5 2/2] selftests/resctrl: Adjust SNC support messages

2024-11-05 Thread Reinette Chatre
Hi Maciej,

On 10/29/24 6:00 AM, Maciej Wieczor-Retman wrote:
> Resctrl selftest prints a message on test failure that Sub-Numa
> Clustering (SNC) could be enabled and points the user to check their BIOS
> settings. No actual check is performed before printing that message so
> it is not very accurate in pinpointing a problem.
> 
> Figuring out if SNC is enabled is only one part of the problem, the
> others being whether the detected SNC mode is reliable and whether the
> kernel supports SNC in resctrl.

Starting to sound like previous patch ...

> 
> When there is SNC support for kernel's resctrl subsystem and SNC is
> enabled then sub node files are created for each node in the resctrlfs.
> The sub node files exist in each regular node's L3 monitoring directory.
> The reliable path to check for existence of sub node files is
> /sys/fs/resctrl/mon_data/mon_L3_00/mon_sub_L3_00.

... this is about previous patch ...

> 
> To check if SNC detection is reliable one can check the
> /sys/devices/system/cpu/offline file. If it's empty, it means all cores
> are operational and the ratio should be calculated correctly. If it has
> any contents, it means the detected SNC mode can't be trusted and should
> be disabled.

... also about previous patch ...

> 
> Add helpers for all operations mentioned above.

Not done in this patch.

> 
> Detect SNC mode once and let other tests inherit that information.

Not done in this patch.

> 
> Add messages to alert the user when SNC detection could return incorrect
> results. Correct old messages to account for kernel support of SNC in
> resctrl.

Sounds like description of what earlier version of this patch did ...

> 
> Signed-off-by: Maciej Wieczor-Retman 
> ---
> Changelog v5:
> - Move all resctrlfs.c code from this patch to 1/2. (Reinette)

Please update the changelog to match the changes made to the patch.

> - Remove kernel support check and error message from CAT since it can't
>   be happen.
> - Remove snc checks in CAT since snc doesn't affect it here.
> - Skip MBM, MBA and CMT tests if snc is unreliable.
> 
> Changelog v4:
> - Change messages at the end of tests and at the start of
>   run_single_test. (Reinette)
> - Add messages at the end of CAT since it can also fail due to enabled
>   SNC + lack of kernel support.
> - Remove snc_mode global variable. (Reinette)
> - Fix wrong description of snc_kernel_support(). (Reinette)
> - Move call to cpus_offline_empty() into snc_nodes_per_l3_cache() so the
>   whole detection flow is in one place as discussed. (Reinette)
> 
> Changelog v3:
> - Change snc_ways() to snc_nodes_per_l3_cache(). (Reinette)
> - Add printing the discovered SNC mode. (Reinette)
> - Change method of kernel support discovery from cache sizes to
>   existance of sub node files.
> - Check if SNC detection is unreliable.
> - Move SNC detection to only the first run_single_test() instead on
>   error at the end of test runs.
> - Add global value to remind user at the end of relevant tests if SNC
>   detection was found to be unreliable.
> - Redo the patch message after the changes.
> 
> Changelog v2:
> - Move snc_ways() checks from individual tests into
>   snc_kernel_support().
> - Write better comment for snc_kernel_support().
> 
>  tools/testing/selftests/resctrl/cmt_test.c |  8 ++--
>  tools/testing/selftests/resctrl/mba_test.c |  8 +++-
>  tools/testing/selftests/resctrl/mbm_test.c | 10 +++---
>  tools/testing/selftests/resctrl/resctrl.h  |  3 +++
>  4 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cmt_test.c 
> b/tools/testing/selftests/resctrl/cmt_test.c
> index 0c045080d808..1470bd64d158 100644
> --- a/tools/testing/selftests/resctrl/cmt_test.c
> +++ b/tools/testing/selftests/resctrl/cmt_test.c
> @@ -133,6 +133,10 @@ static int cmt_run_test(const struct resctrl_test *test, 
> const struct user_param
>   ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
>   if (ret)
>   return ret;
> +
> + if ((get_vendor() == ARCH_INTEL) && snc_unreliable)
> + ksft_exit_skip("Sub-NUMA Clustering could not be detected 
> properly. Skipping...\n");
> +

This (inconsistent) duplication can be moved to run_single_test().

>   ksft_print_msg("Cache size :%lu\n", cache_total_size);
>  
>   count_of_bits = count_bits(long_mask);
> @@ -175,8 +179,8 @@ static int cmt_run_test(const struct resctrl_test *test, 
> const struct user_param
>   goto out;
>  
>   ret = check_results(¶m, span, n);
> - if (ret && (get_vendor() == ARCH_INTEL))
> - ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA 
> Clustering is enabled. Check BIOS configuration.\n");
> + if (ret && (get_vendor() == ARCH_INTEL) && !snc_kernel_support())
> + ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but 
> it is enabled on the system.\n");
>  
>  out:
>   free(span_str);
> diff --git a/tools/testing/selftests/resctrl/mba_te

[PATCH v3 0/9] vhost: Add support of kthread API

2024-11-05 Thread Cindy Lu
In commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"),
The vhost now use vhost_task and workers working as a child of the owner thread,
which aligns with containerization principles. However, this change has caused
confusion for some legacy userspace applications. 
Therefore, we are reintroducing support for the kthread API.

In this patch, we introduce a module_param that allows users to select the
operating mode. Additionally, a new UAPI is implemented to enable
userspace applications to set their desired mode

Changelog v2: 
 1. Change the module_param's name to enforce_inherit_owner, and the default 
value is true.
 2. Change the UAPI's name to VHOST_SET_INHERIT_FROM_OWNER.
 
Changelog v3: 
 1. Change the module_param's name to inherit_owner_default, and the default 
value is true.
 2. Add a structure for task function; the worker will select a different mode 
based on the value inherit_owner.
 3. device will have their own inherit_owner in struct vhost_dev
 4. Address other comments
 
Tested with QEMU.

Cindy Lu (9):
  vhost: Add a new parameter to allow user select kthread
  vhost: Add the vhost_worker to support kthread
  vhost: Add the cgroup related function
  vhost: Add kthread support in function vhost_worker_create
  vhost: Add kthread support in function vhost_worker_queue()
  vhost: Add kthread support in function vhost_worker_destroy()
  vhost: Add new UAPI to support change to task mode
  vhost_scsi: Add check for inherit_owner status
  vhost: Expose the modparam inherit_owner_default

 drivers/vhost/scsi.c   |   5 +
 drivers/vhost/vhost.c  | 194 ++---
 drivers/vhost/vhost.h  |   7 ++
 include/uapi/linux/vhost.h |   2 +
 4 files changed, 193 insertions(+), 15 deletions(-)

-- 
2.45.0




Re: [PATCH v3 1/9] vhost: Add a new parameter to allow user select kthread

2024-11-05 Thread Jason Wang
On Tue, Nov 5, 2024 at 3:27 PM Cindy Lu  wrote:
>
> The vhost now uses vhost_task and workers as a child of the owner thread.
> While this aligns with containerization principles,it confuses some legacy
> userspace app, Therefore, we are reintroducing kthread API support.
>
> Introduce a new parameter to enable users to choose between
> kthread and task mode. This will be exposed by module_param() later.
>
> Signed-off-by: Cindy Lu 
> ---
>  drivers/vhost/vhost.c | 2 ++
>  drivers/vhost/vhost.h | 1 +
>  2 files changed, 3 insertions(+)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9ac25d08f473..eff6acbbb63b 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -41,6 +41,7 @@ static int max_iotlb_entries = 2048;
>  module_param(max_iotlb_entries, int, 0444);
>  MODULE_PARM_DESC(max_iotlb_entries,
> "Maximum number of iotlb entries. (default: 2048)");
> +static bool inherit_owner_default = true;

I wonder how management can make a decision for this value.

Thanks




Re: [PATCH v3 4/9] vhost: Add kthread support in function vhost_worker_create

2024-11-05 Thread Jason Wang
On Tue, Nov 5, 2024 at 3:27 PM Cindy Lu  wrote:
>
> Restored the previous functions kthread_wakeup and kthread_stop.
> Also add a new structure, vhost_task_fn. The function vhost_worker_create
> Will initializes this structure based on the value of inherit_owner.
>
> Signed-off-by: Cindy Lu 
> ---
>  drivers/vhost/vhost.c | 71 ---
>  drivers/vhost/vhost.h |  6 
>  2 files changed, 66 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e40cef3a1fa5..603b146fccc1 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -741,43 +741,92 @@ static void vhost_workers_free(struct vhost_dev *dev)
> xa_destroy(&dev->worker_xa);
>  }
>
> +static int vhost_task_wakeup_fn(void *vtsk)
> +{
> +   vhost_task_wake((struct vhost_task *)vtsk);
> +   return 0;
> +}

Let's have a newline between two functions.

> +static int vhost_kthread_wakeup_fn(void *p)
> +{
> +   return wake_up_process((struct task_struct *)p);
> +}
> +static int vhost_task_stop_fn(void *vtsk)
> +{
> +   vhost_task_stop((struct vhost_task *)vtsk);
> +   return 0;
> +}
> +static int vhost_kthread_stop_fn(void *k)
> +{
> +   return kthread_stop((struct task_struct *)k);
> +}
> +
>  static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>  {
> struct vhost_worker *worker;
> -   struct vhost_task *vtsk;
> +   struct vhost_task *vtsk = NULL;
> +   struct task_struct *task = NULL;
> char name[TASK_COMM_LEN];
> int ret;
> u32 id;
>
> +   /* Allocate resources for the worker */
> worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
> if (!worker)
> return NULL;
>
> +   worker->fn = kzalloc(sizeof(struct vhost_task_fn), 
> GFP_KERNEL_ACCOUNT);
> +   if (!worker->fn) {
> +   kfree(worker);
> +   return NULL;
> +   }
> +
> worker->dev = dev;
> snprintf(name, sizeof(name), "vhost-%d", current->pid);
>
> -   vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
> -worker, name);
> -   if (!vtsk)
> -   goto free_worker;
> -
> mutex_init(&worker->mutex);
> init_llist_head(&worker->work_list);
> worker->kcov_handle = kcov_common_handle();
> -   worker->vtsk = vtsk;
>
> -   vhost_task_start(vtsk);
> +   if (dev->inherit_owner) {
> +   /* Create and start a vhost task */
> +   vtsk = vhost_task_create(vhost_run_work_list,
> +vhost_worker_killed, worker, name);
> +   if (!vtsk)
> +   goto free_worker;
> +
> +   worker->vtsk = vtsk;
> +   worker->fn->wakeup = vhost_task_wakeup_fn;
> +   worker->fn->stop = vhost_task_stop_fn;
> +
> +   vhost_task_start(vtsk);
> +   } else {
> +   /* Create and start a kernel thread */
> +   task = kthread_create(vhost_run_work_kthread_list, worker,
> + "vhost-%d", current->pid);
> +   if (IS_ERR(task)) {
> +   ret = PTR_ERR(task);
> +   goto free_worker;
> +   }
> +   worker->task = task;
> +   worker->fn->wakeup = vhost_kthread_wakeup_fn;
> +   worker->fn->stop = vhost_kthread_stop_fn;
> +
> +   wake_up_process(task);
> +   /* Attach to the vhost cgroup */
> +   ret = vhost_attach_cgroups(dev);
> +   if (ret)
> +   goto stop_worker;
> +   }
>
> ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, 
> GFP_KERNEL);
> if (ret < 0)
> goto stop_worker;
> worker->id = id;
> -
> return worker;
> -
>  stop_worker:
> -   vhost_task_stop(vtsk);
> +   worker->fn->stop(dev->inherit_owner ? (void *)vtsk : (void *)task);
>  free_worker:
> +   kfree(worker->fn);
> kfree(worker);
> return NULL;
>  }
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index c650c4506c70..ebababa4e340 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -25,8 +25,13 @@ struct vhost_work {
> vhost_work_fn_t fn;
> unsigned long   flags;
>  };
> +struct vhost_task_fn {
> +   int (*wakeup)(void *task);

Let's have comments to explain the semantics of each operation.

> +   int (*stop)(void *task);
> +};

I think the goal is to reduce if/else, so while at this, let's
introduce more ops. For example the create_worker one?

>
>  struct vhost_worker {
> +   struct task_struct  *task;
> struct vhost_task   *vtsk;
> struct vhost_dev*dev;
> /* Used to serialize device wide flushing with worker swapping. */
> @@ -36,6 +41,7 @@ struct vhos

Re: [PATCH v3 5/9] vhost: Add kthread support in function vhost_worker_queue()

2024-11-05 Thread Jason Wang
On Tue, Nov 5, 2024 at 3:27 PM Cindy Lu  wrote:
>
> The function vhost_worker_queue() uses vhost_task_fn and
> selects the different mode based on the value of inherit_owner.
>
> Signed-off-by: Cindy Lu 
> ---
>  drivers/vhost/vhost.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 603b146fccc1..8b7ddfb33c61 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -238,13 +238,18 @@ EXPORT_SYMBOL_GPL(vhost_poll_stop);
>  static void vhost_worker_queue(struct vhost_worker *worker,
>struct vhost_work *work)
>  {
> +   if (!worker && !worker->fn)
> +   return;
> +
> if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
> /* We can only add the work to the list after we're
>  * sure it was not in the list.
>  * test_and_set_bit() implies a memory barrier.
>  */
> llist_add(&work->node, &worker->work_list);
> -   vhost_task_wake(worker->vtsk);
> +   worker->fn->wakeup(worker->dev->inherit_owner ?
> +  (void *)worker->vtsk :
> +  (void *)worker->task);

Logically, it looks better to introduce the ops before introducing
back the kthread behaviour?

Thanks

> }
>  }
>
> --
> 2.45.0
>




Re: [PATCH net v6] ipv6: Fix soft lockups in fib6_select_path under high next hop churn

2024-11-05 Thread Paolo Abeni
On 10/31/24 15:10, David Ahern wrote:
> On 10/31/24 4:13 AM, Paolo Abeni wrote:
>> Given the issue is long-standing, and the fix is somewhat invasive, I
>> suggest steering this patch on net-next.
> 
> FWIW, I think net-next is best.

Should I count the above as a formal ack? :-P

FWIW, I went through the patch as thoroughly as I could and LGTM, but it
does not apply (anymore?) to net-next.

@Omid: could you please rebase it on top of net-next and resend (with a
proper net-next tag)?

Thanks!

Paolo




Re: [PATCH net-next v11 19/23] ovpn: implement key add/get/del/swap via netlink

2024-11-05 Thread Sabrina Dubroca
2024-10-29, 11:47:32 +0100, Antonio Quartulli wrote:
> This change introduces the netlink commands needed to add, get, delete
> and swap keys for a specific peer.
> 
> Userspace is expected to use these commands to create, inspect (non
> sensible data only), destroy and rotate session keys for a specific

nit: s/sensible/sensitive/

> +int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
> +enum ovpn_key_slot slot,
> +struct ovpn_key_config *keyconf)
> +{
[...]
> +
> + rcu_read_lock();
> + ks = rcu_dereference(cs->slots[idx]);
> + if (!ks || (ks && !ovpn_crypto_key_slot_hold(ks))) {
> + rcu_read_unlock();
> + return -ENOENT;
> + }
> + rcu_read_unlock();

You could stay under rcu_read_lock a little bit longer and avoid
taking a reference just to release it immediately.

> + keyconf->cipher_alg = ovpn_aead_crypto_alg(ks);
> + keyconf->key_id = ks->key_id;
> +
> + ovpn_crypto_key_slot_put(ks);
> +
> + return 0;
> +}


[...]
>  int ovpn_nl_key_get_doit(struct sk_buff *skb, struct genl_info *info)
>  {
[...]
> + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
> +   OVPN_A_KEYCONF_PEER_ID))
> + return -EINVAL;
> +
> + peer_id = nla_get_u32(attrs[OVPN_A_KEYCONF_PEER_ID]);
> +
> + peer = ovpn_peer_get_by_id(ovpn, peer_id);
> + if (!peer) {
> + NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +"cannot find peer with id %u", 0);

   peer_id?

> + return -ENOENT;
> + }
> +
> + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
> +   OVPN_A_KEYCONF_SLOT))
> + return -EINVAL;

Move this check before ovpn_peer_get_by_id? We're leaking a reference
on the peer.


> +
> + slot = nla_get_u32(attrs[OVPN_A_KEYCONF_SLOT]);
> +
> + ret = ovpn_crypto_config_get(&peer->crypto, slot, &keyconf);
> + if (ret < 0) {
> + NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +"cannot extract key from slot %u for 
> peer %u",
> +slot, peer_id);
> + goto err;
> + }
> +
> + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> + if (!msg) {
> + ret = -ENOMEM;
> + goto err;
> + }
> +
> + ret = ovpn_nl_send_key(msg, info, peer->id, slot, &keyconf,
> +info->snd_portid, info->snd_seq, 0);

info->snd_portid and info->snd_seq can be extracted from info directly
in ovpn_nl_send_key since there's no other caller, and flags=0 can be
skipped as well.

> + if (ret < 0) {
> + nlmsg_free(msg);
> + goto err;
> + }
> +
> + ret = genlmsg_reply(msg, info);
> +err:
> + ovpn_peer_put(peer);
> + return ret;
>  }



[...]
>  int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info)
>  {
> - return -EOPNOTSUPP;
> + struct nlattr *attrs[OVPN_A_KEYCONF_MAX + 1];
> + struct ovpn_struct *ovpn = info->user_ptr[0];
> + enum ovpn_key_slot slot;
> + struct ovpn_peer *peer;
> + u32 peer_id;
> + int ret;
> +
> + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_KEYCONF))
> + return -EINVAL;
> +
> + ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX,
> +info->attrs[OVPN_A_KEYCONF],
> +ovpn_keyconf_nl_policy, info->extack);
> + if (ret)
> + return ret;
> +
> + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
> +   OVPN_A_KEYCONF_PEER_ID))
> + return -EINVAL;
> +
> + if (ret)
> + return ret;

leftover?


> + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
> +   OVPN_A_KEYCONF_SLOT))
> + return -EINVAL;

-- 
Sabrina



Re: [PATCH net-next v11 20/23] ovpn: kill key and notify userspace in case of IV exhaustion

2024-11-05 Thread Sabrina Dubroca
2024-10-29, 11:47:33 +0100, Antonio Quartulli wrote:
> +int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id)
> +{
[...]
> +
> + nla_nest_end(msg, k_attr);
> + genlmsg_end(msg, hdr);
> +
> + genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg,
> + 0, OVPN_NLGRP_PEERS, GFP_ATOMIC);
> +

Is openvpn meant to support moving the device to a different netns? In
that case I'm not sure the netns the ovpn netdevice is in is the right
one, the userspace client will be in the encap socket's netns instead
of the netdevice's?

(same thing in the next patch)

-- 
Sabrina



Re: [PATCH v3 7/9] vhost: Add new UAPI to support change to task mode

2024-11-05 Thread Stefano Garzarella

On Tue, Nov 05, 2024 at 03:25:26PM +0800, Cindy Lu wrote:

Add a new UAPI to enable setting the vhost device to task mode.
The userspace application can use VHOST_SET_INHERIT_FROM_OWNER
to configure the mode if necessary.
This setting must be applied before VHOST_SET_OWNER, as the worker
will be created in the VHOST_SET_OWNER function

Signed-off-by: Cindy Lu 
---
drivers/vhost/vhost.c  | 15 ++-
include/uapi/linux/vhost.h |  2 ++
2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c17dc01febcc..70c793b63905 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2274,8 +2274,9 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *argp)
{
struct eventfd_ctx *ctx;
u64 p;
-   long r;
+   long r = 0;


I don't know if something is missing in this patch, but I am confused:

`r` is set few lines below...


int i, fd;
+   bool inherit_owner;

/* If you are not the owner, you can become one */
if (ioctl == VHOST_SET_OWNER) {

...

/* You must be the owner to do anything else */
r = vhost_dev_check_owner(d);
if (r)
goto done;

So, why we are now initializing it to 0?


@@ -2332,6 +2333,18 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *argp)
if (ctx)
eventfd_ctx_put(ctx);
break;
+   case VHOST_SET_INHERIT_FROM_OWNER:
+   /*inherit_owner can only be modified before owner is set*/
+   if (vhost_dev_has_owner(d))


And here, how this check can be false, if at the beginning of the
function we call vhost_dev_check_owner()?

Maybe your intention was to add this code before the
`vhost_dev_check_owner()` call, so this should explain why initialize
`r` to 0, but I'm not sure.


+   break;


Should we return an error (e.g. -EPERM) in this case?


+
+   if (copy_from_user(&inherit_owner, argp,
+  sizeof(inherit_owner))) {
+   r = -EFAULT;
+   break;
+   }
+   d->inherit_owner = inherit_owner;
+   break;
default:
r = -ENOIOCTLCMD;
break;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index b95dd84eef2d..1e192038633d 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -235,4 +235,6 @@
 */
#define VHOST_VDPA_GET_VRING_SIZE   _IOWR(VHOST_VIRTIO, 0x82,   \
  struct vhost_vring_state)
+


Please add a documentation here, this is UAPI, so the user should
know what this ioctl does based on the parameter.

Thanks,
Stefano


+#define VHOST_SET_INHERIT_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, bool)
#endif
--
2.45.0






Re: [PATCH net-next v6 6/7] selftests: net: Add busy_poll_test

2024-11-05 Thread Joe Damato
On Mon, Nov 04, 2024 at 07:50:21PM -0800, Stanislav Fomichev wrote:
> On 11/04, Joe Damato wrote:
> > Add an epoll busy poll test using netdevsim.
> > 
> > This test is comprised of:
> >   - busy_poller (via busy_poller.c)
> >   - busy_poll_test.sh which loads netdevsim, sets up network namespaces,
> > and runs busy_poller to receive data and socat to send data.
> > 
> > The selftest tests two different scenarios:
> >   - busy poll (the pre-existing version in the kernel)
> >   - busy poll with suspend enabled (what this series adds)
> > 
> > The data transmit is a 1MiB temporary file generated from /dev/urandom
> > and the test is considered passing if the md5sum of the input file to
> > socat matches the md5sum of the output file from busy_poller.
> > 
> > netdevsim was chosen instead of veth due to netdevsim's support for
> > netdev-genl.
> > 
> > For now, this test uses the functionality that netdevsim provides. In the
> > future, perhaps netdevsim can be extended to emulate device IRQs to more
> > thoroughly test all pre-existing kernel options (like defer_hard_irqs)
> > and suspend.
> > 
> > Signed-off-by: Joe Damato 
> > Co-developed-by: Martin Karsten 
> > Signed-off-by: Martin Karsten 
> > ---

[...]

> > +
> > +static void run_poller(void)
> > +{
> > +   struct epoll_event events[cfg_max_events];
> > +   struct epoll_params epoll_params = {0};
> > +   struct sockaddr_in server_addr;
> > +   int i, epfd, nfds;
> > +   ssize_t readlen;
> > +   int outfile_fd;
> > +   char buf[1024];
> > +   int sockfd;
> > +   int conn;
> > +   int val;
> 
> [..]
> 
> > +   outfile_fd = open(cfg_outfile, O_WRONLY | O_CREAT, 0644);
> > +   if (outfile_fd == -1)
> > +   error(1, errno, "unable to open outfile: %s", cfg_outfile);
> 
> Any reason you're not printing to stdout? And then redirect it to a file
> in the shell script if needed. Lets you save some code on open/close
> and flag parsing :-p But I guess can keep it since you already have it
> all working.

No reason in particular; I thought about this while writing it, but
ended up adding it as a flag in case others come along to extend
this test in some capacity.

> Acked-by: Stanislav Fomichev 

Thanks for the ack!



[PATCH] vdpa/mlx5: Fix error path during device add

2024-11-05 Thread Dragos Tatulea
In the error recovery path of mlx5_vdpa_dev_add(), the cleanup is
executed and at the end put_device() is called which ends up calling
mlx5_vdpa_free(). This function will execute the same cleanup all over
again. Most resources support being cleaned up twice, but the recent
mlx5_vdpa_destroy_mr_resources() doesn't.

This change drops the explicit cleanup from within the
mlx5_vdpa_dev_add() and lets mlx5_vdpa_free() do its work.

This issue was discovered while trying to add 2 vdpa devices with the
same name:
$> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.2
$> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.3

... yields the following dump:

  BUG: kernel NULL pointer dereference, address: 00b8
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x) - not-present page
  PGD 0 P4D 0
  Oops: Oops:  [#1] SMP
  CPU: 4 UID: 0 PID: 2811 Comm: vdpa Not tainted 6.12.0-rc6 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:destroy_workqueue+0xe/0x2a0
  Code: ...
  RSP: 0018:88814920b9a8 EFLAGS: 00010282
  RAX:  RBX: 888105c1 RCX: 
  RDX: 0001 RSI: 888100400168 RDI: 
  RBP:  R08: 888100120c00 R09: 828578c0
  R10:  R11:  R12: 
  R13: 888131fd99a0 R14:  R15: 888105c10580
  FS:  7fdfa6b4f740() GS:88852ca0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 00b8 CR3: 00018db09006 CR4: 00372eb0
  DR0:  DR1:  DR2: 
  DR3:  DR6: fffe0ff0 DR7: 0400
  Call Trace:
   
   ? __die+0x20/0x60
   ? page_fault_oops+0x150/0x3e0
   ? exc_page_fault+0x74/0x130
   ? asm_exc_page_fault+0x22/0x30
   ? destroy_workqueue+0xe/0x2a0
   mlx5_vdpa_destroy_mr_resources+0x2b/0x40 [mlx5_vdpa]
   mlx5_vdpa_free+0x45/0x150 [mlx5_vdpa]
   vdpa_release_dev+0x1e/0x50 [vdpa]
   device_release+0x31/0x90
   kobject_put+0x8d/0x230
   mlx5_vdpa_dev_add+0x328/0x8b0 [mlx5_vdpa]
   vdpa_nl_cmd_dev_add_set_doit+0x2b8/0x4c0 [vdpa]
   genl_family_rcv_msg_doit+0xd0/0x120
   genl_rcv_msg+0x180/0x2b0
   ? __vdpa_alloc_device+0x1b0/0x1b0 [vdpa]
   ? genl_family_rcv_msg_dumpit+0xf0/0xf0
   netlink_rcv_skb+0x54/0x100
   genl_rcv+0x24/0x40
   netlink_unicast+0x1fc/0x2d0
   netlink_sendmsg+0x1e4/0x410
   __sock_sendmsg+0x38/0x60
   ? sockfd_lookup_light+0x12/0x60
   __sys_sendto+0x105/0x160
   ? __count_memcg_events+0x53/0xe0
   ? handle_mm_fault+0x100/0x220
   ? do_user_addr_fault+0x40d/0x620
   __x64_sys_sendto+0x20/0x30
   do_syscall_64+0x4c/0x100
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7fdfa6c66b57
  Code: ...
  RSP: 002b:7ffeace22998 EFLAGS: 0202 ORIG_RAX: 002c
  RAX: ffda RBX: 55a498608350 RCX: 7fdfa6c66b57
  RDX: 006c RSI: 55a498608350 RDI: 0003
  RBP: 7ffeace229c0 R08: 7fdfa6d35200 R09: 000c
  R10:  R11: 0202 R12: 55a4986082a0
  R13:  R14:  R15: 7ffeace233f3
   
  Modules linked in: ...
  CR2: 00b8

Fixes: 62111654481d ("vdpa/mlx5: Postpone MR deletion")
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 21 +
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index dee019977716..5f581e71e201 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3963,28 +3963,28 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
mvdev->vdev.dma_dev = &mdev->pdev->dev;
err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
if (err)
-   goto err_mpfs;
+   goto err_alloc;
 
err = mlx5_vdpa_init_mr_resources(mvdev);
if (err)
-   goto err_res;
+   goto err_alloc;
 
if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
err = mlx5_vdpa_create_dma_mr(mvdev);
if (err)
-   goto err_mr_res;
+   goto err_alloc;
}
 
err = alloc_fixed_resources(ndev);
if (err)
-   goto err_mr;
+   goto err_alloc;
 
ndev->cvq_ent.mvdev = mvdev;
INIT_WORK(&ndev->cvq_ent.work, mlx5_cvq_kick_handler);
mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_wq");
if (!mvdev->wq) {
err = -ENOMEM;
-   goto err_res2;
+   goto err_alloc;
}
 
mvdev->vdev.mdev = &mgtdev->mgtdev;
@@ -4010,17 +4010,6 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
_vdpa_unregister_device(&mvde

Re: [PATCH v2 2/2] kunit: enable hardware acceleration when available

2024-11-05 Thread Tamir Duberstein
On Tue, Nov 5, 2024 at 8:36 AM Kristina Martsenko
 wrote:
>
> On 02/11/2024 12:09, Tamir Duberstein wrote:
> > Use KVM or HVF if supported by the QEMU binary and available on the
> > system.
> >
> > This produces a nice improvement on my Apple M3 Pro running macOS 14.7:
> >
> > Before:
> > ./tools/testing/kunit/kunit.py exec --arch arm64
> > [HH:MM:SS] Elapsed time: 10.145s
> >
> > After:
> > ./tools/testing/kunit/kunit.py exec --arch arm64
> > [HH:MM:SS] Elapsed time: 1.773s
> >
> > Signed-off-by: Tamir Duberstein 
> > ---
> >  tools/testing/kunit/kunit_kernel.py   | 3 +++
> >  tools/testing/kunit/qemu_configs/arm64.py | 2 +-
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/kunit/kunit_kernel.py 
> > b/tools/testing/kunit/kunit_kernel.py
> > index 
> > 61931c4926fd6645f2c62dd13f9842a432ec4167..3146acb884ecf0bcff94d5938535aabd4486fe82
> >  100644
> > --- a/tools/testing/kunit/kunit_kernel.py
> > +++ b/tools/testing/kunit/kunit_kernel.py
> > @@ -123,6 +123,9 @@ class 
> > LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
> >   '-append', ' '.join(params + 
> > [self._kernel_command_line]),
> >   '-no-reboot',
> >   '-nographic',
> > + '-accel', 'kvm',
> > + '-accel', 'hvf',
> > + '-accel', 'tcg',
> >   '-serial', self._serial] + 
> > self._extra_qemu_params
> >   # Note: shlex.join() does what we want, but requires python 
> > 3.8+.
> >   print('Running tests with:\n$', ' '.join(shlex.quote(arg) for 
> > arg in qemu_command))
> > diff --git a/tools/testing/kunit/qemu_configs/arm64.py 
> > b/tools/testing/kunit/qemu_configs/arm64.py
> > index 
> > d3ff27024755411441f910799be30399295c9541..5c44d3a87e6dd2cd6b086138186a277a1473585b
> >  100644
> > --- a/tools/testing/kunit/qemu_configs/arm64.py
> > +++ b/tools/testing/kunit/qemu_configs/arm64.py
> > @@ -9,4 +9,4 @@ CONFIG_SERIAL_AMBA_PL011_CONSOLE=y''',
> >  qemu_arch='aarch64',
> >  kernel_path='arch/arm64/boot/Image.gz',
> >  kernel_command_line='console=ttyAMA0',
> > -extra_qemu_params=['-machine', 'virt', '-cpu', 
> > 'max,pauth-impdef=on'])
> > +extra_qemu_params=['-machine', 'virt', '-cpu', 
> > 'max'])
>
> Would it be possible to keep 'pauth-impdef=on' for TCG emulation? Otherwise
> performance regresses by about 20%.
>
> Before this patch:
> ./tools/testing/kunit/kunit.py exec --arch=arm64 
> --cross_compile=aarch64-linux-
> [11:03:38] Elapsed time: 15.494s
>
> After this patch:
> ./tools/testing/kunit/kunit.py exec --arch=arm64 
> --cross_compile=aarch64-linux-
> [11:10:47] Elapsed time: 19.099s
>
> Thanks,
> Kristina

Hi Kristina, thanks for pointing that out. I'm able to reproduce the
regression. I poked around and I can't find a way to enable
`pauth-impdef` only for TCG, and the problem is that enabling it
globally produces:

tools/testing/kunit/kunit.py exec --arch arm64 --make_options LLVM=1
--raw_output=all
[15:34:05] Starting KUnit Kernel (1/1)...
Running tests with:
$ qemu-system-aarch64 -nodefaults -m 1024 -kernel
.kunit/arch/arm64/boot/Image.gz -append 'kunit.enable=1
console=ttyAMA0 kunit_shutdown=reboot' -no-reboot -nographic -accel
kvm -accel hvf -accel tcg -serial stdio -machine virt -cpu
max,pauth-impdef=on
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
qemu-system-aarch64: falling back to HVF
qemu-system-aarch64: can't apply global max-arm-cpu.pauth-impdef=on:
Property 'max-arm-cpu.pauth-impdef' not found

This behavior is at least somewhat intentional[0]. I have filed a bug
with qemu[1]. If someone can conceive of a way to achieve this, I'd be
delighted to send a v3.

Link: 
https://gitlab.com/qemu-project/qemu/-/commit/92d6528dbb20c6aec4022dfd63c7ffee44f19f77
[0]
Link: https://gitlab.com/qemu-project/qemu/-/issues/2656 [1]



Re: [PATCH net-next v11 15/23] ovpn: implement keepalive mechanism

2024-11-05 Thread Sabrina Dubroca
2024-10-29, 11:47:28 +0100, Antonio Quartulli wrote:
> @@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret)
>   goto drop;
>   }
>  
> + /* keep track of last received authenticated packet for keepalive */
> + peer->last_recv = ktime_get_real_seconds();

It doesn't look like we're locking the peer here so that should be a
WRITE_ONCE() (and READ_ONCE(peer->last_recv) for all reads).

> +
>   /* point to encapsulated IP packet */
>   __skb_pull(skb, payload_offset);
>  
> @@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret)
>   goto drop;
>   }
>  
> + if (ovpn_is_keepalive(skb)) {
> + net_dbg_ratelimited("%s: ping received from peer %u\n",
> + peer->ovpn->dev->name, peer->id);
> + goto drop;

To help with debugging connectivity issues, maybe keepalives shouldn't
be counted as drops? (consume_skb instead of kfree_skb, and not
incrementing rx_dropped)
The packet was successfully received and did all it had to do.

> + }
> +
>   net_info_ratelimited("%s: unsupported protocol received from 
> peer %u\n",
>peer->ovpn->dev->name, peer->id);
>   goto drop;
> @@ -221,6 +257,10 @@ void ovpn_encrypt_post(void *data, int ret)
>   /* no transport configured yet */
>   goto err;
>   }
> +
> + /* keep track of last sent packet for keepalive */
> + peer->last_sent = ktime_get_real_seconds();

And another WRITE_ONCE() here (also paired with READ_ONCE() on the
read side).


> +static int ovpn_peer_del_nolock(struct ovpn_peer *peer,
> + enum ovpn_del_peer_reason reason)
> +{
> + switch (peer->ovpn->mode) {
> + case OVPN_MODE_MP:

I think it would be nice to add

lockdep_assert_held(&peer->ovpn->peers->lock);

> + return ovpn_peer_del_mp(peer, reason);
> + case OVPN_MODE_P2P:

and here

lockdep_assert_held(&peer->ovpn->lock);

(I had to check that ovpn_peer_del_nolock is indeed called with those
locks held since they're taken by ovpn_peer_keepalive_work_{mp,p2p},
adding these assertions would make it clear that ovpn_peer_del_nolock
is not an unsafe version of ovpn_peer_del)

> + return ovpn_peer_del_p2p(peer, reason);
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
>  /**
>   * ovpn_peers_free - free all peers in the instance
>   * @ovpn: the instance whose peers should be released
> @@ -830,3 +871,150 @@ void ovpn_peers_free(struct ovpn_struct *ovpn)
>   ovpn_peer_unhash(peer, OVPN_DEL_PEER_REASON_TEARDOWN);
>   spin_unlock_bh(&ovpn->peers->lock);
>  }
> +
> +static time64_t ovpn_peer_keepalive_work_single(struct ovpn_peer *peer,
> + time64_t now)
> +{
> + time64_t next_run1, next_run2, delta;
> + unsigned long timeout, interval;
> + bool expired;
> +
> + spin_lock_bh(&peer->lock);
> + /* we expect both timers to be configured at the same time,
> +  * therefore bail out if either is not set
> +  */
> + if (!peer->keepalive_timeout || !peer->keepalive_interval) {
> + spin_unlock_bh(&peer->lock);
> + return 0;
> + }
> +
> + /* check for peer timeout */
> + expired = false;
> + timeout = peer->keepalive_timeout;
> + delta = now - peer->last_recv;

I'm not sure that's always > 0 if we finish decrypting a packet just
as the workqueue starts:

  ovpn_peer_keepalive_work
now = ...

   ovpn_decrypt_post
 peer->last_recv = ...

  ovpn_peer_keepalive_work_single
delta: now < peer->last_recv



> + if (delta < timeout) {
> + peer->keepalive_recv_exp = now + timeout - delta;

I'd shorten that to

peer->keepalive_recv_exp = peer->last_recv + timeout;

it's a bit more readable to my eyes and avoids risks of wrapping
values.

So I'd probably get rid of delta and go with:

last_recv = READ_ONCE(peer->last_recv)
if (now < last_recv + timeout) {
peer->keepalive_recv_exp = last_recv + timeout;
next_run1 = peer->keepalive_recv_exp;
} else if ...

> + next_run1 = peer->keepalive_recv_exp;
> + } else if (peer->keepalive_recv_exp > now) {
> + next_run1 = peer->keepalive_recv_exp;
> + } else {
> + expired = true;
> + }

[...]
> + /* check for peer keepalive */
> + expired = false;
> + interval = peer->keepalive_interval;
> + delta = now - peer->last_sent;
> + if (delta < interval) {
> + peer->keepalive_xmit_exp = now + interval - delta;
> + next_run2 = peer->keepalive_xmit_exp;

and same here

-- 
Sabrina



Re: qrtr/mhi: NULL-deref with in-kernel pd-mapper

2024-11-05 Thread Chris Lew




On 11/4/2024 9:08 PM, Johan Hovold wrote:

On Mon, Nov 04, 2024 at 04:26:15PM -0800, Chris Lew wrote:

On 11/1/2024 8:01 AM, Johan Hovold wrote:



[8.825593] Unable to handle kernel NULL pointer dereference at virtual
address 0034
.



[9.002030] CPU: 10 UID: 0 PID: 11 Comm: kworker/u48:0 Not tainted 
6.12.0-rc5 #4
[9.029550] Hardware name: Qualcomm CRD, BIOS 
6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
[9.029552] Workqueue: qrtr_ns_handler qrtr_ns_worker [qrtr]
[9.061350] pstate: a145 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[9.061353] pc : mhi_gen_tre+0x44/0x224 [mhi]
[9.106931] lr : mhi_gen_tre+0x40/0x224 [mhi]
[9.106934] sp : 8000800fb7d0
[9.106935] x29: 8000800fb7d0 x28: 6db7852bd000 x27: 800082490188
[9.120382] dwc3 a00.usb: Adding to iommu group 5
[9.133750]
[9.133752] x26:  x25: 6db783e65080 x24: 80008248ff88
[9.133754] x23:  x22: 80008248ff80 x21: 8000800fb890
[9.133756] x20:  x19: 0002 x18: 0005cf20
[9.133758] x17: 0028 x16: 
[9.172738]  x15: a5834131fbd0
[9.172741] x14: a5834137caf0 x13: ce30 x12: 6db7808bc028
[9.172743] x11: a58341993000 x10:  x9 : cf3f2b90
[9.172745] x8 : 94e5072b x7 : 000404ce x6 : a5834162cfb0
[9.172747] x5 : 008b x4 : a583419cddf0 x3 : 0007
[9.172750] x2 : 
[9.192697]  x1 : 000a x0 : 6db7808bb700
[9.192700] Call trace:
[9.192701]  mhi_gen_tre+0x44/0x224 [mhi]
[9.192704]  mhi_queue+0x74/0x194 [mhi]
[9.192706]  mhi_queue_skb+0x5c/0x8c [mhi]
[9.210985]  qcom_mhi_qrtr_send+0x6c/0x160 [qrtr_mhi]
[9.210989]  qrtr_node_enqueue+0xd0/0x4a0 [qrtr]
[9.210992]  qrtr_bcast_enqueue+0x78/0xe8 [qrtr]
[9.225530]  qrtr_sendmsg+0x15c/0x33c [qrtr]
[9.225532]  sock_sendmsg+0xc0/0xec
[9.240436]  kernel_sendmsg+0x30/0x40
[9.240438]  service_announce_new+0xbc/0x1c4 [qrtr]
[9.240440]  qrtr_ns_worker+0x714/0x794 [qrtr]
[9.240441]  process_one_work+0x210/0x614
[9.254527]  worker_thread+0x23c/0x378
[9.254529]  kthread+0x124/0x128
[9.254531]  ret_from_fork+0x10/0x20
[9.254534] Code: aa0003f9 aa1b03e0 94001a4d f9401b14 (3940d280)
[9.267369] ---[ end trace  ]---
[9.267371] Kernel panic - not syncing: Oops: Fatal exception in interrupt


Thanks for reporting this.


Thanks for taking a look, Chris.


I'm not sure the in-kernel pd-mapper should be affecting this path. I
think this is for WLAN since it is the mhi qrtr and I'm not aware of
WLAN needing to listen to the pd-mapper framework.


This function is called for both the WWAN and WLAN on this machine, and
it seems like the modem is typically probed first and around the time
when I saw the NULL-deref.

[8.802728] mhi-pci-generic 0005:01:00.0: mhi_gen_tre - buf_info = 
800080d75000, offsetof(buf_info->used) = 0x34
...
[9.980638] ath12k_pci 0004:01:00.0: mhi_gen_tre - buf_info = 800081d35000, 
offsetof(buf_info->used) = 0x34
  

The offset seems to be mapped back to
linux/drivers/bus/mhi/host/main.c:1220, I had some extra debug configs
enabled so not sure the offset is still valid.

WARN_ON(buf_info->used);
buf_info->pre_mapped = info->pre_mapped;

This looks like the null pointer would happen if qrtr tried to send
before mhi_channel_prepare() is called.


I didn't look closely at the code, but I can confirm that the offset of
buf_info->used is indeed 0x34, which could indicate that it's the

buf_info = buf_ring->wp;

pointer that was NULL.


I think we have a patch that might fix this, let me dig it up and send
it out.


Would that patch still help?


https://lore.kernel.org/lkml/20241104-qrtr_mhi-v1-1-79adf7e3b...@quicinc.com/



Yea this is the exact patch I had in mind, didnt realize the patch was 
already sent a while back.



I naively tried adding a sleep after registering the endpoint, but that
is at least not sufficient to trigger the NULL-deref.



Looking at the callstack, this is broadcasting a NEW_SERVER notification 
from qrtr_ns. I think you can force this by starting and stopping some 
qmi service with the added sleep. Do you have tqftpserv or diag-router 
in your environment? Those will open qmi services, so starting and 
stopping those will cause the new_server broadcast in qrtr_ns.



Johan




Re: [PATCH net-next v7 06/12] selftests: ncdevmem: Switch to AF_INET6

2024-11-05 Thread Stanislav Fomichev
On 11/04, Stanislav Fomichev wrote:
> On 11/04, Jakub Kicinski wrote:
> > On Mon,  4 Nov 2024 10:14:24 -0800 Stanislav Fomichev wrote:
> > > -static int configure_flow_steering(void)
> > > +static int configure_flow_steering(struct sockaddr_in6 *server_sin)
> > >  {
> > > - return run_command("sudo ethtool -N %s flow-type tcp4 %s %s dst-ip %s 
> > > %s %s dst-port %s queue %d >&2",
> > > + const char *type = "tcp6";
> > > + const char *server_addr;
> > > + char buf[256];
> > > +
> > > + inet_ntop(AF_INET6, &server_sin->sin6_addr, buf, sizeof(buf));
> > > + server_addr = buf;
> > > +
> > > + if (IN6_IS_ADDR_V4MAPPED(&server_sin->sin6_addr)) {
> > > + type = "tcp4";
> > > + server_addr = strrchr(server_addr, ':') + 1;
> > > + }
> > > +
> > > + return run_command("sudo ethtool -N %s flow-type %s %s %s dst-ip %s %s 
> > > %s dst-port %s queue %d >&2",
> > >  ifname,
> > > +type,
> > >  client_ip ? "src-ip" : "",
> > >  client_ip ?: "",
> > > -server_ip,
> > > +server_addr,
> > >  client_ip ? "src-port" : "",
> > >  client_ip ? port : "",
> > >  port, start_queue);
> > 
> > nit: I think this generate a truncation warning, not sure if it's easy
> > to fix:
> > 
> > ncdevmem.c:259:28: warning: ‘%s’ directive output may be truncated writing 
> > up to 255 bytes into a region of size between 209 and 215 
> > [-Wformat-truncation=]
> >   259 | return run_command("sudo ethtool -N %s flow-type %s %s %s 
> > dst-ip %s %s %s dst-port %s queue %d >&2",
> >   |
> > ^~~~
> > 
> > maybe make buf smaller? 🤔️
> 
> Are you having some extra -W arguments to make it trigger on the build
> bot? Wonder why I don't see the warning locally.. (will try to pass
> -Wformat-truncation)

It is indeed -Wformat-truncation but only gcc complains about it, not clang.
Setting buf size to 40 calms it down.



[PATCH v4 2/2] selftests: tmpfs: Add kselftest support to tmpfs

2024-11-05 Thread Shivam Chaudhary
Replace direct error handling with 'ksft_test_result_*',
'ksft_print_msg' and KSFT_SKIP  macros for better reporting.

Test logs:

Before change:

- Without root
 error: unshare, errno 1

- With root
 No, output

After change:

- Without root
 TAP version 13
 1..1
  ok 1 # SKIP This test needs root to run

- With root
 TAP version 13
1..1
  unshare(): Creat new mount namespace: Success.
  mount(): Root filesystem private mount: Success
  mount(): Mounting tmpfs on /tmp: Success
  openat(): Open first temporary file: Success
  linkat(): Linking the temporary file: Success
  openat(): Opening the second temporary file: Success
  ok 1 Test : Success
  Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Shivam Chaudhary 
---

 .../selftests/tmpfs/bug-link-o-tmpfile.c  | 66 ++-
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c 
b/tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c
index cdab1e8c0392..f2e6a5b20698 100644
--- a/tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c
+++ b/tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c
@@ -42,39 +42,71 @@ int main(void)
 
if (unshare(CLONE_NEWNS) == -1) {
if (errno == ENOSYS || errno == EPERM) {
-   fprintf(stderr, "error: unshare, errno %d\n", errno);
-   return 4;
+   ksft_print_msg("unshare() error: unshare, errno %d\n", 
errno);
+   exit(KSFT_SKIP);
+
+   }
+   else{
+   fprintf(stderr, "unshare() error: unshare, errno %d\n", 
errno);
+   return 1;
+
}
-   fprintf(stderr, "error: unshare, errno %d\n", errno);
-   return 1;
+   } 
+   
+   else {
+   ksft_print_msg("unshare(): Creat new mount namespace: 
Success.\n");
+
}
-   if (mount(NULL, "/", NULL, MS_PRIVATE|MS_REC, NULL) == -1) {
-   fprintf(stderr, "error: mount '/', errno %d\n", errno);
-   return 1;
+
+
+
+   if (mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL) == -1) {
+   ksft_print_msg("mount() error: Root filesystem private mount: 
Fail %d\n", errno);
+   exit(KSFT_SKIP);
+   } else {
+   ksft_print_msg("mount(): Root filesystem private mount: 
Success\n");
}
 
+
/* Our heroes: 1 root inode, 1 O_TMPFILE inode, 1 permanent inode. */
if (mount(NULL, "/tmp", "tmpfs", 0, "nr_inodes=3") == -1) {
-   fprintf(stderr, "error: mount tmpfs, errno %d\n", errno);
-   return 1;
+   ksft_print_msg("mount() error: Mounting tmpfs on /tmp: Fail 
%d\n", errno);
+   exit(KSFT_SKIP);
+   } else {
+   ksft_print_msg("mount(): Mounting tmpfs on /tmp: Success\n");
}
 
-   fd = openat(AT_FDCWD, "/tmp", O_WRONLY|O_TMPFILE, 0600);
+
+   fd = openat(AT_FDCWD, "/tmp", O_WRONLY | O_TMPFILE, 0600);
if (fd == -1) {
-   fprintf(stderr, "error: open 1, errno %d\n", errno);
-   return 1;
+   ksft_print_msg("openat() error: Open first temporary file: Fail 
%d\n", errno);
+   exit(KSFT_SKIP);
+   } else {
+   ksft_print_msg("openat(): Open first temporary file: 
Success\n");
}
+
+
if (linkat(fd, "", AT_FDCWD, "/tmp/1", AT_EMPTY_PATH) == -1) {
-   fprintf(stderr, "error: linkat, errno %d\n", errno);
-   return 1;
+   ksft_print_msg("linkat() error: Linking the temporary file: 
Fail %d\n", errno);
+   /* Ensure fd is closed on failure */
+   close(fd); 
+   exit(KSFT_SKIP);
+   } else {
+   ksft_print_msg("linkat(): Linking the temporary file: 
Success\n");
}
close(fd);
 
-   fd = openat(AT_FDCWD, "/tmp", O_WRONLY|O_TMPFILE, 0600);
+
+   fd = openat(AT_FDCWD, "/tmp", O_WRONLY | O_TMPFILE, 0600);
if (fd == -1) {
-   fprintf(stderr, "error: open 2, errno %d\n", errno);
-   return 1;
+   ksft_print_msg("openat() error: Opening the second temporary 
file: Fail %d\n", errno);
+   exit(KSFT_SKIP);
+   } else {
+   ksft_print_msg("openat(): Opening the second temporary file: 
Success\n");
}
 
+ksft_test_result_pass("Test : Success\n");
+   ksft_exit_pass();
return 0;
 }
+
-- 
2.45.2




[PATCH v4 0/2] kselftest: tmpfs: Add ksft macros and skip if no root

2024-11-05 Thread Shivam Chaudhary


This version 4 patch series replace direct error handling methods with ksft
macros, which provide better reporting.Currently, when the tmpfs test runs,
it does not display any output if it passes,and if it fails
(particularly when not run as root),it simply exits without any warning or 
message.

This series of patch adds:

1. Add 'ksft_print_header()' and 'ksft_set_plan()'
   to structure test outputs more effectively.

2. Skip if not run as root.

3. Replace direct error handling with 'ksft_test_result_*',
   'ksft_print_msg' and KSFT_SKIP  macros for better reporting.

v3->v4:
 - Start a patchset
 - Split patch into smaller pathes to make it easy to review.
  Patch1 Replace  'ksft_test_result_skip' with 'KSFT_SKIP' during root run 
check.
  Patch2 Replace  'ksft_test_result_fail' with 'KSFT_SKIP' where fail does not 
make sense,
 or failure could be due to not unsupported APIs with appropriate 
warnings.
  

v3: https://lore.kernel.org/all/20241028185756.111832-1-cvam0...@gmail.com/

v2->v3:
- Remove extra ksft_set_plan()
- Remove function for unshare()
- Fix the comment style
v2: https://lore.kernel.org/all/20241026191621.2860376-1-cvam0...@gmail.com/

v1->v2:
- Make the commit message more clear.
v1: https://lore.kernel.org/all/20241024200228.1075840-1-cvam0...@gmail.com/T/#u


thanks 
Shivam

Shivam Chaudhary (2):
  selftests:tmpfs: Add Skip test if not run as root
  selftests:tmpfs: Add kselftest support to tmpfs

 .../selftests/tmpfs/bug-link-o-tmpfile.c  | 79 +++
 1 file changed, 62 insertions(+), 17 deletions(-)

-- 
2.45.2



Re: [PATCH net 3/4] virtio_net: Sync rss config to device when virtnet_probe

2024-11-05 Thread Joe Damato
On Mon, Nov 04, 2024 at 04:57:05PM +0800, Philo Lu wrote:
> During virtnet_probe, default rss configuration is initialized, but was
> not committed to the device. This patch fix this by sending rss command
> after device ready in virtnet_probe. Otherwise, the actual rss
> configuration used by device can be different with that read by user
> from driver, which may confuse the user.
> 
> If the command committing fails, driver rss will be disabled.
> 
> Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
> Signed-off-by: Philo Lu 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index acc3e5dc112e..59d9fdf562e0 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -6584,6 +6584,15 @@ static int virtnet_probe(struct virtio_device *vdev)
>  
>   virtio_device_ready(vdev);
>  
> + if (vi->has_rss || vi->has_rss_hash_report) {
> + if (!virtnet_commit_rss_command(vi)) {
> + dev_warn(&vdev->dev, "RSS disabled because committing 
> failed.\n");
> + dev->hw_features &= ~NETIF_F_RXHASH;
> + vi->has_rss_hash_report = false;
> + vi->has_rss = false;
> + }
> + }
> +
>   virtnet_set_queues(vi, vi->curr_queue_pairs);
>  
>   /* a random MAC address has been assigned, notify the device.

Acked-by: Joe Damato 



Re: [PATCH net 2/4] virtio_net: Add hash_key_length check

2024-11-05 Thread Joe Damato
On Mon, Nov 04, 2024 at 04:57:04PM +0800, Philo Lu wrote:
> Add hash_key_length check in virtnet_probe() to avoid possible out of
> bound errors when setting/reading the hash key.
> 
> Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
> Signed-off-by: Philo Lu 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 75c1ff4efd13..acc3e5dc112e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -6451,6 +6451,12 @@ static int virtnet_probe(struct virtio_device *vdev)
>   if (vi->has_rss || vi->has_rss_hash_report) {
>   vi->rss_key_size =
>   virtio_cread8(vdev, offsetof(struct virtio_net_config, 
> rss_max_key_size));
> + if (vi->rss_key_size > VIRTIO_NET_RSS_MAX_KEY_SIZE) {
> + dev_err(&vdev->dev, "rss_max_key_size=%u exceeds the 
> limit %u.\n",
> + vi->rss_key_size, VIRTIO_NET_RSS_MAX_KEY_SIZE);
> + err = -EINVAL;
> + goto free;
> + }

I agree that an out of bounds error could occur and a check here
is needed.

I have no idea if returning -EINVAL from probe is the correct
solution (vs say using min()) as I am just a casual observer of
virtio_net and not a maintainer.

Acked-by: Joe Damato 



Re: [PATCH net 1/4] virtio_net: Support dynamic rss indirection table size

2024-11-05 Thread Joe Damato
On Mon, Nov 04, 2024 at 04:57:03PM +0800, Philo Lu wrote:
> When reading/writing virtio_net_ctrl_rss, we get the indirection table
> size from vi->rss_indir_table_size, which is initialized in
> virtnet_probe(). However, the actual size of indirection_table was set
> as VIRTIO_NET_RSS_MAX_TABLE_LEN=128. This collision may cause issues if
> the vi->rss_indir_table_size exceeds 128.
> 
> This patch instead uses dynamic indirection table, allocated with
> vi->rss after vi->rss_indir_table_size initialized. And free it in
> virtnet_remove().
> 
> In virtnet_commit_rss_command(), sgs for rss is initialized differently
> with hash_report. So indirection_table is not used if !vi->has_rss, and
> then we don't need to alloc indirection_table for hash_report only uses.
> 
> Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
> Signed-off-by: Philo Lu 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c | 39 ++-
>  1 file changed, 34 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 869586c17ffd..75c1ff4efd13 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -368,15 +368,16 @@ struct receive_queue {
>   * because table sizes may be differ according to the device configuration.
>   */
>  #define VIRTIO_NET_RSS_MAX_KEY_SIZE 40
> -#define VIRTIO_NET_RSS_MAX_TABLE_LEN128
>  struct virtio_net_ctrl_rss {
>   u32 hash_types;
>   u16 indirection_table_mask;
>   u16 unclassified_queue;
> - u16 indirection_table[VIRTIO_NET_RSS_MAX_TABLE_LEN];
> + u16 hash_cfg_reserved; /* for HASH_CONFIG (see virtio_net_hash_config 
> for details) */
>   u16 max_tx_vq;
>   u8 hash_key_length;
>   u8 key[VIRTIO_NET_RSS_MAX_KEY_SIZE];
> +
> + u16 *indirection_table;
>  };
>  
>  /* Control VQ buffers: protected by the rtnl lock */
> @@ -512,6 +513,25 @@ static struct sk_buff *virtnet_skb_append_frag(struct 
> sk_buff *head_skb,
>  struct page *page, void *buf,
>  int len, int truesize);
>  
> +static int rss_indirection_table_alloc(struct virtio_net_ctrl_rss *rss, u16 
> indir_table_size)
> +{
> + if (!indir_table_size) {
> + rss->indirection_table = NULL;
> + return 0;
> + }
> +
> + rss->indirection_table = kmalloc_array(indir_table_size, sizeof(u16), 
> GFP_KERNEL);
> + if (!rss->indirection_table)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static void rss_indirection_table_free(struct virtio_net_ctrl_rss *rss)
> +{
> + kfree(rss->indirection_table);
> +}
> +
>  static bool is_xdp_frame(void *ptr)
>  {
>   return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> @@ -3828,11 +3848,15 @@ static bool virtnet_commit_rss_command(struct 
> virtnet_info *vi)
>   /* prepare sgs */
>   sg_init_table(sgs, 4);
>  
> - sg_buf_size = offsetof(struct virtio_net_ctrl_rss, indirection_table);
> + sg_buf_size = offsetof(struct virtio_net_ctrl_rss, hash_cfg_reserved);
>   sg_set_buf(&sgs[0], &vi->rss, sg_buf_size);
>  
> - sg_buf_size = sizeof(uint16_t) * (vi->rss.indirection_table_mask + 1);
> - sg_set_buf(&sgs[1], vi->rss.indirection_table, sg_buf_size);
> + if (vi->has_rss) {
> + sg_buf_size = sizeof(uint16_t) * vi->rss_indir_table_size;
> + sg_set_buf(&sgs[1], vi->rss.indirection_table, sg_buf_size);
> + } else {
> + sg_set_buf(&sgs[1], &vi->rss.hash_cfg_reserved, 
> sizeof(uint16_t));
> + }
>  
>   sg_buf_size = offsetof(struct virtio_net_ctrl_rss, key)
>   - offsetof(struct virtio_net_ctrl_rss, max_tx_vq);
> @@ -6420,6 +6444,9 @@ static int virtnet_probe(struct virtio_device *vdev)
>   virtio_cread16(vdev, offsetof(struct virtio_net_config,
>   rss_max_indirection_table_length));
>   }
> + err = rss_indirection_table_alloc(&vi->rss, vi->rss_indir_table_size);
> + if (err)
> + goto free;
>  
>   if (vi->has_rss || vi->has_rss_hash_report) {
>   vi->rss_key_size =
> @@ -6674,6 +6701,8 @@ static void virtnet_remove(struct virtio_device *vdev)
>  
>   remove_vq_common(vi);
>  
> + rss_indirection_table_free(&vi->rss);
> +
>   free_netdev(vi->dev);
>  }
>

I'm not an expert on virtio, so I don't feel comfortable giving a
Reviewed-by, but this does seem to fix a potential out of bounds
access in virtnet_init_default_rss if rss_indir_table_size were
larger than VIRTIO_NET_RSS_MAX_TABLE_LEN (128).

Acked-by: Joe Damato 



[PATCH v4 1/2] selftests:tmpfs: Add Skip test if not run as root

2024-11-05 Thread Shivam Chaudhary
Add skip test if  not run as root, with an appropriate Warning.

Add 'ksft_print_header()' and 'ksft_set_plan()' to structure test
outputs more effectively.

Test logs :

Before change:

- Without root
 error: unshare, errno 1

- With root
 No, output

After change:

- Without root
TAP version 13
1..1

- With root
TAP version 13
1..1

Signed-off-by: Shivam Chaudhary 
---
  
 tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c 
b/tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c
index b5c3ddb90942..cdab1e8c0392 100644
--- a/tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c
+++ b/tools/testing/selftests/tmpfs/bug-link-o-tmpfile.c
@@ -23,10 +23,23 @@
 #include 
 #include 
 
+#include "../kselftest.h"
+
 int main(void)
 {
int fd;
 
+   /* Setting up kselftest framework */
+   ksft_print_header();
+   ksft_set_plan(1);
+
+   /* Check if test is run as root */
+   if (geteuid()) {
+   ksft_print_msg("Skip : Need to run as root");
+   exit(KSFT_SKIP);
+
+   }
+
if (unshare(CLONE_NEWNS) == -1) {
if (errno == ENOSYS || errno == EPERM) {
fprintf(stderr, "error: unshare, errno %d\n", errno);
-- 
2.45.2



Re: [PATCH net 4/4] virtio_net: Update rss when set queue

2024-11-05 Thread Joe Damato
On Mon, Nov 04, 2024 at 04:57:06PM +0800, Philo Lu wrote:
> RSS configuration should be updated with queue number. In particular, it
> should be updated when (1) rss enabled and (2) default rss configuration
> is used without user modification.
> 
> During rss command processing, device updates queue_pairs using
> rss.max_tx_vq. That is, the device updates queue_pairs together with
> rss, so we can skip the sperate queue_pairs update
> (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly.
> 
> Also remove the `vi->has_rss ?` check when setting vi->rss.max_tx_vq,
> because this is not used in the other hash_report case.
> 
> Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
> Signed-off-by: Philo Lu 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c | 65 +++-
>  1 file changed, 51 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 59d9fdf562e0..189afad3ffaa 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3394,15 +3394,59 @@ static void virtnet_ack_link_announce(struct 
> virtnet_info *vi)
>   dev_warn(&vi->dev->dev, "Failed to ack link announce.\n");
>  }
>  
> +static bool virtnet_commit_rss_command(struct virtnet_info *vi);
> +
> +static void virtnet_rss_update_by_qpairs(struct virtnet_info *vi, u16 
> queue_pairs)
> +{
> + u32 indir_val = 0;
> + int i = 0;
> +
> + for (; i < vi->rss_indir_table_size; ++i) {
> + indir_val = ethtool_rxfh_indir_default(i, queue_pairs);
> + vi->rss.indirection_table[i] = indir_val;
> + }
> + vi->rss.max_tx_vq = queue_pairs;
> +}
> +
>  static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
>  {
>   struct virtio_net_ctrl_mq *mq __free(kfree) = NULL;
> - struct scatterlist sg;
> + struct virtio_net_ctrl_rss old_rss;
>   struct net_device *dev = vi->dev;
> + struct scatterlist sg;
>  
>   if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
>   return 0;
>  
> + /* Firstly check if we need update rss. Do updating if both (1) rss 
> enabled and
> +  * (2) no user configuration.
> +  *
> +  * During rss command processing, device updates queue_pairs using 
> rss.max_tx_vq. That is,
> +  * the device updates queue_pairs together with rss, so we can skip the 
> sperate queue_pairs
> +  * update (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly.
> +  */
> + if (vi->has_rss && !netif_is_rxfh_configured(dev)) {

Does there need to be an error case when:

vi->has_rss && netif_is_rxfh_configured(dev)

to return EINVAL? I noted that other drivers don't let users adjust
the queue count and return error in this case.


> + memcpy(&old_rss, &vi->rss, sizeof(old_rss));
> + if (rss_indirection_table_alloc(&vi->rss, 
> vi->rss_indir_table_size)) {
> + vi->rss.indirection_table = old_rss.indirection_table;
> + return -ENOMEM;
> + }
> +
> + virtnet_rss_update_by_qpairs(vi, queue_pairs);
> +
> + if (!virtnet_commit_rss_command(vi)) {
> + /* restore ctrl_rss if commit_rss_command failed */
> + rss_indirection_table_free(&vi->rss);
> + memcpy(&vi->rss, &old_rss, sizeof(old_rss));
> +
> + dev_warn(&dev->dev, "Fail to set num of queue pairs to 
> %d, because committing RSS failed\n",
> +  queue_pairs);
> + return -EINVAL;
> + }
> + rss_indirection_table_free(&old_rss);
> + goto succ;
> + }
> +



Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long

2024-11-05 Thread Stanislav Fomichev
On 11/05, Mina Almasry wrote:
> On Wed, Oct 30, 2024 at 8:07 AM Stanislav Fomichev  
> wrote:
> >
> > On 10/30, Mina Almasry wrote:
> > > On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev  
> > > wrote:
> > > >
> > > > On 10/29, Mina Almasry wrote:
> > > > > Check we're going to free a reasonable number of frags in token_count
> > > > > before starting the loop, to prevent looping too long.
> > > > >
> > > > > Also minor code cleanups:
> > > > > - Flip checks to reduce indentation.
> > > > > - Use sizeof(*tokens) everywhere for consistentcy.
> > > > >
> > > > > Cc: Yi Lai 
> > > > >
> > > > > Signed-off-by: Mina Almasry 
> > > > >
> > > > > ---
> > > > >  net/core/sock.c | 46 --
> > > > >  1 file changed, 28 insertions(+), 18 deletions(-)
> > > > >
> > > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > > index 7f398bd07fb7..8603b8d87f2e 100644
> > > > > --- a/net/core/sock.c
> > > > > +++ b/net/core/sock.c
> > > > > @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock 
> > > > > *sk, int bytes)
> > > > >
> > > > >  #ifdef CONFIG_PAGE_POOL
> > > > >
> > > > > -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED 
> > > > > in
> > > > > +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED 
> > > > > in
> > > > >   * 1 syscall. The limit exists to limit the amount of memory the 
> > > > > kernel
> > > > > - * allocates to copy these tokens.
> > > > > + * allocates to copy these tokens, and to prevent looping over the 
> > > > > frags for
> > > > > + * too long.
> > > > >   */
> > > > > -#define MAX_DONTNEED_TOKENS 128
> > > > > +#define MAX_DONTNEED_FRAGS 1024
> > > > >
> > > > >  static noinline_for_stack int
> > > > >  sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int 
> > > > > optlen)
> > > > > @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, 
> > > > > sockptr_t optval, unsigned int optlen)
> > > > >   unsigned int num_tokens, i, j, k, netmem_num = 0;
> > > > >   struct dmabuf_token *tokens;
> > > > >   netmem_ref netmems[16];
> > > > > + u64 num_frags = 0;
> > > > >   int ret = 0;
> > > > >
> > > > >   if (!sk_is_tcp(sk))
> > > > >   return -EBADF;
> > > > >
> > > > > - if (optlen % sizeof(struct dmabuf_token) ||
> > > > > - optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
> > > > > + if (optlen % sizeof(*tokens) ||
> > > > > + optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS)
> > > > >   return -EINVAL;
> > > > >
> > > > > - tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL);
> > > > > + num_tokens = optlen / sizeof(*tokens);
> > > > > + tokens = kvmalloc_array(num_tokens, sizeof(*tokens), 
> > > > > GFP_KERNEL);
> > > > >   if (!tokens)
> > > > >   return -ENOMEM;
> > > > >
> > > > > - num_tokens = optlen / sizeof(struct dmabuf_token);
> > > > >   if (copy_from_sockptr(tokens, optval, optlen)) {
> > > > >   kvfree(tokens);
> > > > >   return -EFAULT;
> > > > >   }
> > > > >
> > > > > + for (i = 0; i < num_tokens; i++) {
> > > > > + num_frags += tokens[i].token_count;
> > > > > + if (num_frags > MAX_DONTNEED_FRAGS) {
> > > > > + kvfree(tokens);
> > > > > + return -E2BIG;
> > > > > + }
> > > > > + }
> > > > > +
> > > > >   xa_lock_bh(&sk->sk_user_frags);
> > > > >   for (i = 0; i < num_tokens; i++) {
> > > > >   for (j = 0; j < tokens[i].token_count; j++) {
> > > > >   netmem_ref netmem = (__force 
> > > > > netmem_ref)__xa_erase(
> > > > >   &sk->sk_user_frags, 
> > > > > tokens[i].token_start + j);
> > > > >
> > > > > - if (netmem &&
> > > > > - !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
> > > > > - netmems[netmem_num++] = netmem;
> > > > > - if (netmem_num == ARRAY_SIZE(netmems)) {
> > > > > - 
> > > > > xa_unlock_bh(&sk->sk_user_frags);
> > > > > - for (k = 0; k < netmem_num; k++)
> > > > > - 
> > > > > WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
> > > > > - netmem_num = 0;
> > > > > - xa_lock_bh(&sk->sk_user_frags);
> > > > > - }
> > > > > - ret++;
> > > >
> > > > [..]
> > > >
> > > > > + if (!netmem || 
> > > > > WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> > > > > + continue;
> > > >
> > > > Any reason we are not returning explicit error to the callers here?
> > > > That probably needs some mechanism to signal which particular one failed
> > > > so the users can restart?
> > >
> > > Only bec

Re: [PATCH v5 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

2024-11-05 Thread Reinette Chatre
Hi Maciej,

On 10/29/24 6:00 AM, Maciej Wieczor-Retman wrote:
> diff --git a/tools/testing/selftests/resctrl/resctrl.h 
> b/tools/testing/selftests/resctrl/resctrl.h
> index 2dda56084588..851b37c9c38a 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -43,6 +44,8 @@
>  
>  #define DEFAULT_SPAN (250 * MB)
>  
> +#define MAX_SNC  4
> +

fyi, it seems that 6 is the new max:
https://lore.kernel.org/all/20241031220213.17991-1-tony.l...@intel.com/

Reinette




Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long

2024-11-05 Thread Mina Almasry
On Wed, Oct 30, 2024 at 8:07 AM Stanislav Fomichev  wrote:
>
> On 10/30, Mina Almasry wrote:
> > On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev  
> > wrote:
> > >
> > > On 10/29, Mina Almasry wrote:
> > > > Check we're going to free a reasonable number of frags in token_count
> > > > before starting the loop, to prevent looping too long.
> > > >
> > > > Also minor code cleanups:
> > > > - Flip checks to reduce indentation.
> > > > - Use sizeof(*tokens) everywhere for consistentcy.
> > > >
> > > > Cc: Yi Lai 
> > > >
> > > > Signed-off-by: Mina Almasry 
> > > >
> > > > ---
> > > >  net/core/sock.c | 46 --
> > > >  1 file changed, 28 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > index 7f398bd07fb7..8603b8d87f2e 100644
> > > > --- a/net/core/sock.c
> > > > +++ b/net/core/sock.c
> > > > @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, 
> > > > int bytes)
> > > >
> > > >  #ifdef CONFIG_PAGE_POOL
> > > >
> > > > -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in
> > > > +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in
> > > >   * 1 syscall. The limit exists to limit the amount of memory the kernel
> > > > - * allocates to copy these tokens.
> > > > + * allocates to copy these tokens, and to prevent looping over the 
> > > > frags for
> > > > + * too long.
> > > >   */
> > > > -#define MAX_DONTNEED_TOKENS 128
> > > > +#define MAX_DONTNEED_FRAGS 1024
> > > >
> > > >  static noinline_for_stack int
> > > >  sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int 
> > > > optlen)
> > > > @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t 
> > > > optval, unsigned int optlen)
> > > >   unsigned int num_tokens, i, j, k, netmem_num = 0;
> > > >   struct dmabuf_token *tokens;
> > > >   netmem_ref netmems[16];
> > > > + u64 num_frags = 0;
> > > >   int ret = 0;
> > > >
> > > >   if (!sk_is_tcp(sk))
> > > >   return -EBADF;
> > > >
> > > > - if (optlen % sizeof(struct dmabuf_token) ||
> > > > - optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
> > > > + if (optlen % sizeof(*tokens) ||
> > > > + optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS)
> > > >   return -EINVAL;
> > > >
> > > > - tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL);
> > > > + num_tokens = optlen / sizeof(*tokens);
> > > > + tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
> > > >   if (!tokens)
> > > >   return -ENOMEM;
> > > >
> > > > - num_tokens = optlen / sizeof(struct dmabuf_token);
> > > >   if (copy_from_sockptr(tokens, optval, optlen)) {
> > > >   kvfree(tokens);
> > > >   return -EFAULT;
> > > >   }
> > > >
> > > > + for (i = 0; i < num_tokens; i++) {
> > > > + num_frags += tokens[i].token_count;
> > > > + if (num_frags > MAX_DONTNEED_FRAGS) {
> > > > + kvfree(tokens);
> > > > + return -E2BIG;
> > > > + }
> > > > + }
> > > > +
> > > >   xa_lock_bh(&sk->sk_user_frags);
> > > >   for (i = 0; i < num_tokens; i++) {
> > > >   for (j = 0; j < tokens[i].token_count; j++) {
> > > >   netmem_ref netmem = (__force 
> > > > netmem_ref)__xa_erase(
> > > >   &sk->sk_user_frags, tokens[i].token_start 
> > > > + j);
> > > >
> > > > - if (netmem &&
> > > > - !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
> > > > - netmems[netmem_num++] = netmem;
> > > > - if (netmem_num == ARRAY_SIZE(netmems)) {
> > > > - xa_unlock_bh(&sk->sk_user_frags);
> > > > - for (k = 0; k < netmem_num; k++)
> > > > - 
> > > > WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
> > > > - netmem_num = 0;
> > > > - xa_lock_bh(&sk->sk_user_frags);
> > > > - }
> > > > - ret++;
> > >
> > > [..]
> > >
> > > > + if (!netmem || 
> > > > WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> > > > + continue;
> > >
> > > Any reason we are not returning explicit error to the callers here?
> > > That probably needs some mechanism to signal which particular one failed
> > > so the users can restart?
> >
> > Only because I can't think of a simple way to return an array of frags
> > failed to DONTNEED to the user.
>
> I'd expect the call to return as soon as it hits the invalid frag
> entry (plus the number of entries that it successfully refilled up to
> the invalid one). But too late I guess.
>
> > Also, this error should be 

Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long

2024-11-05 Thread Mina Almasry
On Tue, Nov 5, 2024 at 1:46 PM Stanislav Fomichev  wrote:
> > > > Also, the information is useless to the user. If the user sees 'frag
> > > > 128 failed to free'. There is nothing really the user can do to
> > > > recover at runtime. Only usefulness that could come is for the user to
> > > > log the error. We already WARN_ON_ONCE on the error the user would not
> > > > be able to trigger.
> > >
> > > I'm thinking from the pow of user application. It might have bugs as
> > > well and try to refill something that should not have been refilled.
> > > Having info about which particular token has failed (even just for
> > > the logging purposes) might have been nice.
> >
> > Yeah, it may have been nice. On the flip side it complicates calling
> > sock_devmem_dontneed(). The userspace need to count the freed frags in
> > its input, remove them, skip the leaked one, and re-call the syscall.
> > On the flipside the userspace gets to know the id of the frag that
> > leaked but the usefulness of the information is slightly questionable
> > for me. :shrug:
>
> Right, because I was gonna suggest for this patch, instead of having
> a separate extra loop that returns -E2BIG (since this loop is basically
> mostly cycles wasted assuming most of the calls are gonna be well behaved),
> can we keep num_frags freed as we go and stop and return once
> we reach MAX_DONTNEED_FRAGS?
>
> for (i = 0; i < num_tokens; i++) {
> for (j ...) {
> netmem_ref netmem ...
> ...
> }
> num_frags += tokens[i].token_count;
> if (num_frags > MAX_DONTNEED_FRAGS)
> return ret;
> }
>
> Or do you still find it confusing because userspace has to handle that?

Ah, I don't think this will work, because it creates this scenario:

- user calls SO_DEVMEM_DONTNEED passing 1030 tokens.
- Kernel returns 500 freed.
- User doesn't know whether:
(a)  The remaining 530 tokens are all attached to the last
tokens.count and that's why the kernel returned early, or
(b) the kernel leaked 530 tokens because it could not find any of them
in the sk_user_frags.

In (a) the user is supposed to recall SO_DEVMEM_DONTNEED on the
remaining 530 tokens, but in (b) the user is not supposed to do that
(the tokens have leaked and there is nothing the user can do to
recover).

The current interface is more simple. The kernel either returns an
error (nothing has been freed): recall SO_DEVMEM_DONTNEED on all the
tokens after resolving the error, or,

the kernel returns a positive value which means all the tokens have
been freed (or unrecoverably leaked), and the userspace must not call
SO_DEVMEM_DONTNEED on this batch again.


--
Thanks,
Mina



Re: [PATCH v8 2/3] modpost: Produce extended MODVERSIONS information

2024-11-05 Thread Matthew Maurer
On Fri, Nov 1, 2024 at 2:10 PM Luis Chamberlain  wrote:
>
> On Thu, Oct 31, 2024 at 01:00:28PM -0700, Matthew Maurer wrote:
> > > The question is, if only extended moversions are used, what new tooling
> > > requirements are there? Can you test using only extended modversions?
> > >
> > >   Luis
> >
> > I'm not sure precisely what you're asking for. Do you want:
> > 1. A kconfig that suppresses the emission of today's MODVERSIONS
> > format?
>
> Yes that's right, a brave new world, and with the warning of that.

OK, I can send another revision with a suppression config, perhaps
CONFIG_NO_BASIC_MODVERSIONS

>
>
> > This would be fairly easy to do, but I was leaving it enabled
> > for compatibility's sake, at least until extended modversions become
> > more common. This way existing `kmod` tools and kernels would continue
> > to be able to load new-style modules.
>
> Sure, understood why we'd have both.
>
> > 2. libkmod support for parsing the new format? I can do that fairly
> > easily too, but wanted the format actually decided on and accepted
> > before I started modifying things that read modversions.
>
> This is implied, what I'd like is for an A vs B comparison to be able to
> be done on even without rust modules, so that we can see if really
> libkmod changes are all that's needed. Does boot fail without a new
> libkmod for this? If so the Kconfig should specificy that for this new
> brave new world.

libkmod changes are not needed for boot - the userspace tools do not
examine this data for anything inline with boot at the moment, libkmod
only looks at it for kmod_module_get_versions, and modprobe only looks
at that with --show-modversions or --dump-modversions, which are not
normally part of boot.

With the code as is, the only change will be that if a module with
EXTENDED_MODVERSIONS set contains an over-length symbol (which
wouldn't have been possible before), the overlong symbol's modversion
data will not appear in --show-modversions. After patching `libkmod`
in a follow-up patch, long symbols would appear as well. If booted
against an old kernel, long symbols will not have their CRCs in the
list to be checked. However, the old kernel could not export these
symbols, so it will fail to resolve the symbol and fail the load
regardless.

If we add and enable NO_BASIC_MODVERSIONS like you suggested above,
today's --show-modversions will claim there is no modversions data.
Applying a libkmod patch will result in modversions info being
displayed by that command again. If booted against a new kernel,
everything will be fine. If booted against an old kernel, it will
behave as though there is no modversions information.

>
>
> If a distribution can leverage just one format, why would they not
> consider it if they can ensure the proper tooling is in place. We
> haven't itemized the differences in practice and this could help
> with this. One clear difference so far is the kabi stuff, but that's

The kabi stuff is at least partially decoupled - you can (and it
sounds like from the responses to Sami's change, occasionally might
want to) enable debug symbol based modversions even without extended
modversions. You can also enable extended modversions without the
debug symbol based modversions, though there are less clear use-cases
for that.

> just evaluating one way of doing things so far, I suspect we'll get
> more review on that from Petr soon.
>
>   Luis



Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long

2024-11-05 Thread Stanislav Fomichev
On 11/05, Mina Almasry wrote:
> On Tue, Nov 5, 2024 at 1:46 PM Stanislav Fomichev  
> wrote:
> > > > > Also, the information is useless to the user. If the user sees 'frag
> > > > > 128 failed to free'. There is nothing really the user can do to
> > > > > recover at runtime. Only usefulness that could come is for the user to
> > > > > log the error. We already WARN_ON_ONCE on the error the user would not
> > > > > be able to trigger.
> > > >
> > > > I'm thinking from the pow of user application. It might have bugs as
> > > > well and try to refill something that should not have been refilled.
> > > > Having info about which particular token has failed (even just for
> > > > the logging purposes) might have been nice.
> > >
> > > Yeah, it may have been nice. On the flip side it complicates calling
> > > sock_devmem_dontneed(). The userspace need to count the freed frags in
> > > its input, remove them, skip the leaked one, and re-call the syscall.
> > > On the flipside the userspace gets to know the id of the frag that
> > > leaked but the usefulness of the information is slightly questionable
> > > for me. :shrug:
> >
> > Right, because I was gonna suggest for this patch, instead of having
> > a separate extra loop that returns -E2BIG (since this loop is basically
> > mostly cycles wasted assuming most of the calls are gonna be well behaved),
> > can we keep num_frags freed as we go and stop and return once
> > we reach MAX_DONTNEED_FRAGS?
> >
> > for (i = 0; i < num_tokens; i++) {
> > for (j ...) {
> > netmem_ref netmem ...
> > ...
> > }
> > num_frags += tokens[i].token_count;
> > if (num_frags > MAX_DONTNEED_FRAGS)
> > return ret;
> > }
> >
> > Or do you still find it confusing because userspace has to handle that?
> 
> Ah, I don't think this will work, because it creates this scenario:
> 
> - user calls SO_DEVMEM_DONTNEED passing 1030 tokens.
> - Kernel returns 500 freed.
> - User doesn't know whether:
> (a)  The remaining 530 tokens are all attached to the last
> tokens.count and that's why the kernel returned early, or
> (b) the kernel leaked 530 tokens because it could not find any of them
> in the sk_user_frags.
> 
> In (a) the user is supposed to recall SO_DEVMEM_DONTNEED on the
> remaining 530 tokens, but in (b) the user is not supposed to do that
> (the tokens have leaked and there is nothing the user can do to
> recover).

I kinda feel like people will still write code against internal limits anyway?
At least that's what we did with the internal version of your code: you
know that you can't return more than 128 tokens per call
so you don't even try. If you get an error, or ret != the expected
length, you kill the connection. It seems like there is no graceful
recovery from that?

Regarding your (a) vs (b) example, you can try to call DONTNEED another
time for both cases and either get non-zero and make some progress
or get 0 and give up?

> The current interface is more simple. The kernel either returns an
> error (nothing has been freed): recall SO_DEVMEM_DONTNEED on all the
> tokens after resolving the error, or,
> 
> the kernel returns a positive value which means all the tokens have
> been freed (or unrecoverably leaked), and the userspace must not call
> SO_DEVMEM_DONTNEED on this batch again.

Totally agree that it's more simple. But my worry is that we now
essentially waste a bunch of cpu looping over and testing for the
condition that's not gonna happed in a well-behaved applications.
But maybe I'm over blowing it, idk.

(I'm gonna wait for you to respin before formally sending acks because
 it's not clear which series goes where...)



Re: [PATCH v3] selftests: add new kallsyms selftests

2024-11-05 Thread Luis Chamberlain
On Tue, Oct 29, 2024 at 09:24:30AM -0700, Sami Tolvanen wrote:
> Hi Luis,
> 
> On Mon, Oct 21, 2024 at 12:33 PM Luis Chamberlain  wrote:
> >
> > diff --git a/lib/tests/module/gen_test_kallsyms.sh 
> > b/lib/tests/module/gen_test_kallsyms.sh
> > new file mode 100755
> > index ..e85f10dc11bd
> > --- /dev/null
> > +++ b/lib/tests/module/gen_test_kallsyms.sh
> > @@ -0,0 +1,128 @@
> [..]
> > +gen_template_module_data_b()
> > +{
> > +   printf "\nextern int auto_test_a_%010d;\n\n" 28
> > +   echo "static int auto_runtime_test(void)"
> > +   echo "{"
> > +   printf "\nreturn auto_test_a_%010d;\n" 28
> > +   echo "}"
> > +}
> 
> FYI, I get a warning when loading test_kallsyms_b because the init
> function return value is >0:

This fixes it.

>From b776d662c8e05d67c7879d0f6f5208dd431d900a Mon Sep 17 00:00:00 2001
From: Luis Chamberlain 
Date: Wed, 6 Nov 2024 00:17:21 +
Subject: [PATCH] tests/module/gen_test_kallsyms.sh: use 0 value for variables

Use 0 for the values as we use them for the return value on init
to keep the test modules simple. This fixes a splat reported

do_init_module: 'test_kallsyms_b'->init suspiciously returned 255, it should 
follow 0/-E convention
do_init_module: loading module anyway...
CPU: 5 UID: 0 PID: 1873 Comm: modprobe Not tainted 6.12.0-rc3+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.08-1 09/18/2024
Call Trace:
 
 dump_stack_lvl+0x56/0x80
 do_init_module.cold+0x21/0x26
 init_module_from_file+0x88/0xf0
 idempotent_init_module+0x108/0x300
 __x64_sys_finit_module+0x5a/0xb0
 do_syscall_64+0x4b/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f4f3a718839
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff>
RSP: 002b:7fff97d1a9e8 EFLAGS: 0246 ORIG_RAX: 0139
RAX: ffda RBX: 55b94001ab90 RCX: 7f4f3a718839
RDX:  RSI: 55b910e68a10 RDI: 0004
RBP:  R08: 7f4f3a7f1b20 R09: 55b94001c5b0
R10: 0040 R11: 0246 R12: 55b910e68a10
R13: 0004 R14: 55b94001ad60 R15: 
 
do_init_module: 'test_kallsyms_b'->init suspiciously returned 255, it should 
follow 0/-E convention
do_init_module: loading module anyway...
CPU: 1 UID: 0 PID: 1884 Comm: modprobe Not tainted 6.12.0-rc3+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.08-1 09/18/2024
Call Trace:
 
 dump_stack_lvl+0x56/0x80
 do_init_module.cold+0x21/0x26
 init_module_from_file+0x88/0xf0
 idempotent_init_module+0x108/0x300
 __x64_sys_finit_module+0x5a/0xb0
 do_syscall_64+0x4b/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ffaa5d18839

Reported-by: Sami Tolvanen 
Signed-off-by: Luis Chamberlain 
---
 lib/tests/module/gen_test_kallsyms.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/tests/module/gen_test_kallsyms.sh 
b/lib/tests/module/gen_test_kallsyms.sh
index ae5966f1f904..3f2c626350ad 100755
--- a/lib/tests/module/gen_test_kallsyms.sh
+++ b/lib/tests/module/gen_test_kallsyms.sh
@@ -32,7 +32,7 @@ gen_num_syms()
PREFIX=$1
NUM=$2
for i in $(seq 1 $NUM); do
-   printf "int auto_test_%s_%010d = 0xff;\n" $PREFIX $i
+   printf "int auto_test_%s_%010d = 0;\n" $PREFIX $i
printf "EXPORT_SYMBOL_GPL(auto_test_%s_%010d);\n" $PREFIX $i
done
echo
-- 
2.45.2




[PATCH] tests/module/gen_test_kallsyms.sh: use 0 value for variables

2024-11-05 Thread Luis Chamberlain
Use 0 for the values as we use them for the return value on init
to keep the test modules simple. This fixes a splat reported

do_init_module: 'test_kallsyms_b'->init suspiciously returned 255, it should 
follow 0/-E convention
do_init_module: loading module anyway...
CPU: 5 UID: 0 PID: 1873 Comm: modprobe Not tainted 6.12.0-rc3+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.08-1 09/18/2024
Call Trace:
 
 dump_stack_lvl+0x56/0x80
 do_init_module.cold+0x21/0x26
 init_module_from_file+0x88/0xf0
 idempotent_init_module+0x108/0x300
 __x64_sys_finit_module+0x5a/0xb0
 do_syscall_64+0x4b/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f4f3a718839
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff>
RSP: 002b:7fff97d1a9e8 EFLAGS: 0246 ORIG_RAX: 0139
RAX: ffda RBX: 55b94001ab90 RCX: 7f4f3a718839
RDX:  RSI: 55b910e68a10 RDI: 0004
RBP:  R08: 7f4f3a7f1b20 R09: 55b94001c5b0
R10: 0040 R11: 0246 R12: 55b910e68a10
R13: 0004 R14: 55b94001ad60 R15: 
 
do_init_module: 'test_kallsyms_b'->init suspiciously returned 255, it should 
follow 0/-E convention
do_init_module: loading module anyway...
CPU: 1 UID: 0 PID: 1884 Comm: modprobe Not tainted 6.12.0-rc3+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.08-1 09/18/2024
Call Trace:
 
 dump_stack_lvl+0x56/0x80
 do_init_module.cold+0x21/0x26
 init_module_from_file+0x88/0xf0
 idempotent_init_module+0x108/0x300
 __x64_sys_finit_module+0x5a/0xb0
 do_syscall_64+0x4b/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ffaa5d18839

Reported-by: Sami Tolvanen 
Signed-off-by: Luis Chamberlain 
---
 lib/tests/module/gen_test_kallsyms.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/tests/module/gen_test_kallsyms.sh 
b/lib/tests/module/gen_test_kallsyms.sh
index ae5966f1f904..3f2c626350ad 100755
--- a/lib/tests/module/gen_test_kallsyms.sh
+++ b/lib/tests/module/gen_test_kallsyms.sh
@@ -32,7 +32,7 @@ gen_num_syms()
PREFIX=$1
NUM=$2
for i in $(seq 1 $NUM); do
-   printf "int auto_test_%s_%010d = 0xff;\n" $PREFIX $i
+   printf "int auto_test_%s_%010d = 0;\n" $PREFIX $i
printf "EXPORT_SYMBOL_GPL(auto_test_%s_%010d);\n" $PREFIX $i
done
echo
-- 
2.45.2




Re: [PATCH net-next] selftests: net: include lib/sh/*.sh with lib.sh

2024-11-05 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Mon, 04 Nov 2024 12:34:26 +0100 you wrote:
> Recently, the net/lib.sh file has been modified to include defer.sh from
> net/lib/sh/ directory. The Makefile from net/lib has been modified
> accordingly, but not the ones from the sub-targets using net/lib.sh.
> 
> Because of that, the new file is not installed as expected when
> installing the Forwarding, MPTCP, and Netfilter targets, e.g.
> 
> [...]

Here is the summary with links:
  - [net-next] selftests: net: include lib/sh/*.sh with lib.sh
https://git.kernel.org/netdev/net-next/c/f72aa1b27628

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





Re: [PATCH net 4/4] virtio_net: Update rss when set queue

2024-11-05 Thread Philo Lu




On 2024/11/6 04:31, Joe Damato wrote:

On Mon, Nov 04, 2024 at 04:57:06PM +0800, Philo Lu wrote:

RSS configuration should be updated with queue number. In particular, it
should be updated when (1) rss enabled and (2) default rss configuration
is used without user modification.

During rss command processing, device updates queue_pairs using
rss.max_tx_vq. That is, the device updates queue_pairs together with
rss, so we can skip the sperate queue_pairs update
(VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly.

Also remove the `vi->has_rss ?` check when setting vi->rss.max_tx_vq,
because this is not used in the other hash_report case.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu 
Signed-off-by: Xuan Zhuo 
---
  drivers/net/virtio_net.c | 65 +++-
  1 file changed, 51 insertions(+), 14 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 59d9fdf562e0..189afad3ffaa 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3394,15 +3394,59 @@ static void virtnet_ack_link_announce(struct 
virtnet_info *vi)
dev_warn(&vi->dev->dev, "Failed to ack link announce.\n");
  }
  
+static bool virtnet_commit_rss_command(struct virtnet_info *vi);

+
+static void virtnet_rss_update_by_qpairs(struct virtnet_info *vi, u16 
queue_pairs)
+{
+   u32 indir_val = 0;
+   int i = 0;
+
+   for (; i < vi->rss_indir_table_size; ++i) {
+   indir_val = ethtool_rxfh_indir_default(i, queue_pairs);
+   vi->rss.indirection_table[i] = indir_val;
+   }
+   vi->rss.max_tx_vq = queue_pairs;
+}
+
  static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
  {
struct virtio_net_ctrl_mq *mq __free(kfree) = NULL;
-   struct scatterlist sg;
+   struct virtio_net_ctrl_rss old_rss;
struct net_device *dev = vi->dev;
+   struct scatterlist sg;
  
  	if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))

return 0;
  
+	/* Firstly check if we need update rss. Do updating if both (1) rss enabled and

+* (2) no user configuration.
+*
+* During rss command processing, device updates queue_pairs using 
rss.max_tx_vq. That is,
+* the device updates queue_pairs together with rss, so we can skip the 
sperate queue_pairs
+* update (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly.
+*/
+   if (vi->has_rss && !netif_is_rxfh_configured(dev)) {


Does there need to be an error case when:

vi->has_rss && netif_is_rxfh_configured(dev)

to return EINVAL? I noted that other drivers don't let users adjust
the queue count and return error in this case.



In fact, there are 2 possible cases if users have adjusted rss, 
depending on the total queue pairs used in the indirection table (x), 
and the requested new queue count (y).


Case A: If y < x, it's illegal and will be rejected by
ethtool_check_max_channel().
Case B: If x <= y, we only adjust the queue number without touching the
rss configuration set by users.

So I don't think it necessary to add the check (if the above processing 
is agreed).


Thanks for your review, Joe.

--
Philo




[PATCH net-next v7] ipv6: Fix soft lockups in fib6_select_path under high next hop churn

2024-11-05 Thread Omid Ehtemam-Haghighi
Soft lockups have been observed on a cluster of Linux-based edge routers
located in a highly dynamic environment. Using the `bird` service, these
routers continuously update BGP-advertised routes due to frequently
changing nexthop destinations, while also managing significant IPv6
traffic. The lockups occur during the traversal of the multipath
circular linked-list in the `fib6_select_path` function, particularly
while iterating through the siblings in the list. The issue typically
arises when the nodes of the linked list are unexpectedly deleted
concurrently on a different core—indicated by their 'next' and
'previous' elements pointing back to the node itself and their reference
count dropping to zero. This results in an infinite loop, leading to a
soft lockup that triggers a system panic via the watchdog timer.

Apply RCU primitives in the problematic code sections to resolve the
issue. Where necessary, update the references to fib6_siblings to
annotate or use the RCU APIs.

Include a test script that reproduces the issue. The script
periodically updates the routing table while generating a heavy load
of outgoing IPv6 traffic through multiple iperf3 clients. It
consistently induces infinite soft lockups within a couple of minutes.

Kernel log:

 0 [bd13003e8d30] machine_kexec at 8ceaf3eb
 1 [bd13003e8d90] __crash_kexec at 8d0120e3
 2 [bd13003e8e58] panic at 8cef65d4
 3 [bd13003e8ed8] watchdog_timer_fn at 8d05cb03
 4 [bd13003e8f08] __hrtimer_run_queues at 8cfec62f
 5 [bd13003e8f70] hrtimer_interrupt at 8cfed756
 6 [bd13003e8fd0] __sysvec_apic_timer_interrupt at 8cea01af
 7 [bd13003e8ff0] sysvec_apic_timer_interrupt at 8df1b83d
--  --
 8 [bd13003d3708] asm_sysvec_apic_timer_interrupt at 8e000ecb
[exception RIP: fib6_select_path+299]
RIP: 8ddafe7b  RSP: bd13003d37b8  RFLAGS: 0287
RAX: 975850b43600  RBX: 975850b40200  RCX: 
RDX: 3fff  RSI: 51d383e4  RDI: 975850b43618
RBP: bd13003d3800   R8:    R9: 975850b40200
R10:   R11:   R12: bd13003d3830
R13: 975850b436a8  R14: 975850b43600  R15: 0007
ORIG_RAX:   CS: 0010  SS: 0018
 9 [bd13003d3808] ip6_pol_route at 8ddb030c
10 [bd13003d3888] ip6_pol_route_input at 8ddb068c
11 [bd13003d3898] fib6_rule_lookup at 8ddf02b5
12 [bd13003d3928] ip6_route_input at 8ddb0f47
13 [bd13003d3a18] ip6_rcv_finish_core.constprop.0 at 8dd950d0
14 [bd13003d3a30] ip6_list_rcv_finish.constprop.0 at 8dd96274
15 [bd13003d3a98] ip6_sublist_rcv at 8dd96474
16 [bd13003d3af8] ipv6_list_rcv at 8dd96615
17 [bd13003d3b60] __netif_receive_skb_list_core at 8dc16fec
18 [bd13003d3be0] netif_receive_skb_list_internal at 8dc176b3
19 [bd13003d3c50] napi_gro_receive at 8dc565b9
20 [bd13003d3c80] ice_receive_skb at c087e4f5 [ice]
21 [bd13003d3c90] ice_clean_rx_irq at c0881b80 [ice]
22 [bd13003d3d20] ice_napi_poll at c088232f [ice]
23 [bd13003d3d80] __napi_poll at 8dc18000
24 [bd13003d3db8] net_rx_action at 8dc18581
25 [bd13003d3e40] __do_softirq at 8df352e9
26 [bd13003d3eb0] run_ksoftirqd at 8ceffe47
27 [bd13003d3ec0] smpboot_thread_fn at 8cf36a30
28 [bd13003d3ee8] kthread at 8cf2b39f
29 [bd13003d3f28] ret_from_fork at 8ce5fa64
30 [bd13003d3f50] ret_from_fork_asm at 8ce03cbb

Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Reported-by: Adrian Oliver 
Signed-off-by: Omid Ehtemam-Haghighi 
Cc: David S. Miller 
Cc: David Ahern 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: Shuah Khan 
Cc: Ido Schimmel 
Cc: Kuniyuki Iwashima 
Cc: Simon Horman 
Cc: Omid Ehtemam-Haghighi 
Cc: net...@vger.kernel.org
Cc: linux-kselft...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
v6 -> v7: 
* Rebased on top of 'net-next'

v5 -> v6:
* Adjust the comment line lengths in the test script to a maximum of
  80 characters
* Change memory allocation in inet6_rt_notify from gfp_any() to 
GFP_ATOMIC for
  atomic allocation in non-blocking contexts, as suggested by Ido 
Schimmel
* NOTE: I have executed the test script on both bare-metal servers and
  virtualized environments such as QEMU and vng. In the case of 
bare-metal, it
  consistently triggers a soft lockup in under a minute on unpatched 
kernels.
  For the virtualized environments, an unpatched kernel compiled with 
the
  Ubuntu 24.04 configuration also triggers a soft lockup, though it 
takes
  longer; however, it did not trigger a soft lockup on kernels compiled 
with
  configuration

Re: [PATCH net-next v11 00/23] Introducing OpenVPN Data Channel Offload

2024-11-05 Thread Sergey Ryazanov

Hi Antonio,

On 29.10.2024 12:47, Antonio Quartulli wrote:

Notable changes from v10:
* extended commit message of 23/23 with brief description of the output
* Link to v10: 
https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777...@openvpn.net

Please note that some patches were already reviewed by Andre Lunn,
Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag
since no major code modification has happened since the review.

The latest code can also be found at:

https://github.com/OpenVPN/linux-kernel-ovpn


As I promised many months ago I am starting publishing some nit picks 
regarding the series. The review was started when series was V3 
"rebasing" the review to every next version to publish it at once. But I 
lost this race to the new version releasing velocity :) So, I am going 
to publish it patch-by-patch.


Anyway you and all participants have done a great progress toward making 
accelerator part of the kernel. Most of considerable things already 
resolved so do not wait me please to finish picking every nit.


Regarding "big" topics I have only two concerns: link creation using 
RTNL and a switch statement usage. In the corresponding thread, I asked 
Jiri to clarify that "should" regarding .newlink implementation. Hope he 
will have a chance to find a time to reply.


For the 'switch' statement, I see a repeating pattern of handling 
mode-or family-specific cases like this:


int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
{
  switch (ovpn->mode) {
  case OVPN_MODE_MP:
return ovpn_peer_add_mp(ovpn, peer);
  case OVPN_MODE_P2P:
return ovpn_peer_add_p2p(ovpn, peer);
  default:
return -EOPNOTSUPP;
  }
}

or

void ovpn_encrypt_post(void *data, int ret)
{
  ...
  switch (peer->sock->sock->sk->sk_protocol) {
  case IPPROTO_UDP:
ovpn_udp_send_skb(peer->ovpn, peer, skb);
break;
  case IPPROTO_TCP:
ovpn_tcp_send_skb(peer, skb);
break;
  default:
/* no transport configured yet */
goto err;
  }
  ...
}

or

void ovpn_peer_keepalive_work(...)
{
  ...
  switch (ovpn->mode) {
  case OVPN_MODE_MP:
next_run = ovpn_peer_keepalive_work_mp(ovpn, now);
break;
  case OVPN_MODE_P2P:
next_run = ovpn_peer_keepalive_work_p2p(ovpn, now);
break;
  }
  ...
}

Did you consider to implement mode specific operations as a set of 
operations like this:


ovpn_ops {
  int (*peer_add)(struct ovpn_struct *ovpn, struct ovpn_peer *peer);
  int (*peer_del)(struct ovpn_peer *peer, enum ovpn_del_peer_reason 
reason);

  void (*send_skb)(struct ovpn_peer *peer, struct sk_buff *skb);
  time64_t (*keepalive_work)(...);
};

Initialize them during the interface creation and invoke these 
operations indirectly. E.g.


int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
{
  return ovpn->ops->peer_add(ovpn, peer);
}

void ovpn_encrypt_post(void *data, int ret)
{
  ...
  ovpn->ops->send_skb(peer, skb);
  ...
}

void ovpn_peer_keepalive_work(...)
{
  ...
  next_run = ovpn->ops->keepalive_work(ovpn, now);
  ...
}

Anyway the module has all these option values in advance during the 
network interface creation phase and I believe replacing 'switch' 
statements with indirect calls can make code easy to read.


--
Sergey



Re: [PATCH v8 2/3] modpost: Produce extended MODVERSIONS information

2024-11-05 Thread Luis Chamberlain
On Tue, Nov 05, 2024 at 04:26:51PM -0800, Matthew Maurer wrote:
> On Fri, Nov 1, 2024 at 2:10 PM Luis Chamberlain  wrote:
> >
> > On Thu, Oct 31, 2024 at 01:00:28PM -0700, Matthew Maurer wrote:
> > > > The question is, if only extended moversions are used, what new tooling
> > > > requirements are there? Can you test using only extended modversions?
> > > >
> > > >   Luis
> > >
> > > I'm not sure precisely what you're asking for. Do you want:
> > > 1. A kconfig that suppresses the emission of today's MODVERSIONS
> > > format?
> >
> > Yes that's right, a brave new world, and with the warning of that.
> 
> OK, I can send another revision with a suppression config, perhaps
> CONFIG_NO_BASIC_MODVERSIONS

Great.

> > > This would be fairly easy to do, but I was leaving it enabled
> > > for compatibility's sake, at least until extended modversions become
> > > more common. This way existing `kmod` tools and kernels would continue
> > > to be able to load new-style modules.
> >
> > Sure, understood why we'd have both.
> >
> > > 2. libkmod support for parsing the new format? I can do that fairly
> > > easily too, but wanted the format actually decided on and accepted
> > > before I started modifying things that read modversions.
> >
> > This is implied, what I'd like is for an A vs B comparison to be able to
> > be done on even without rust modules, so that we can see if really
> > libkmod changes are all that's needed. Does boot fail without a new
> > libkmod for this? If so the Kconfig should specificy that for this new
> > brave new world.
> 
> libkmod changes are not needed for boot - the userspace tools do not
> examine this data for anything inline with boot at the moment, libkmod
> only looks at it for kmod_module_get_versions, and modprobe only looks
> at that with --show-modversions or --dump-modversions, which are not
> normally part of boot.
> 
> With the code as is, the only change will be that if a module with
> EXTENDED_MODVERSIONS set contains an over-length symbol (which
> wouldn't have been possible before), the overlong symbol's modversion
> data will not appear in --show-modversions. After patching `libkmod`
> in a follow-up patch, long symbols would appear as well. If booted
> against an old kernel, long symbols will not have their CRCs in the
> list to be checked. However, the old kernel could not export these
> symbols, so it will fail to resolve the symbol and fail the load
> regardless.

Thanks for checking all this. It is exactly what I was looking for.
All this should be part of the cover letter and Kconfig documentation.

> If we add and enable NO_BASIC_MODVERSIONS like you suggested above,
> today's --show-modversions will claim there is no modversions data.
> Applying a libkmod patch will result in modversions info being
> displayed by that command again. If booted against a new kernel,
> everything will be fine.

*This* is is the sort of information I was also looking for and I think
it would be good to make it clear for the upcoming NO_BASIC_MODVERSIONS.

> If booted against an old kernel, it will
> behave as though there is no modversions information.

Huh? This I don't get. If you have the new libkmod and boot
an old kernel, that should just not break becauase well, long
symbols were not ever supported properly anyway, so no regression.

I'm not quite sure I understood your last comment here though,
can you clarify what you meant?

Anyway, so now that this is all cleared up, the next question I have
is, let's compare a NO_BASIC_MODVERSIONS world now, given that the
userspace requirements aren't large at all, what actual benefits does
using this new extended mod versions have? Why wouldn't a distro end
up preferring this for say a future release for all modules?

  Luis



Re: [PATCH] vdpa/mlx5: Fix error path during device add

2024-11-05 Thread Jason Wang
On Wed, Nov 6, 2024 at 2:52 AM Dragos Tatulea  wrote:
>
> In the error recovery path of mlx5_vdpa_dev_add(), the cleanup is
> executed and at the end put_device() is called which ends up calling
> mlx5_vdpa_free(). This function will execute the same cleanup all over
> again. Most resources support being cleaned up twice, but the recent
> mlx5_vdpa_destroy_mr_resources() doesn't.
>
> This change drops the explicit cleanup from within the
> mlx5_vdpa_dev_add() and lets mlx5_vdpa_free() do its work.
>
> This issue was discovered while trying to add 2 vdpa devices with the
> same name:
> $> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.2
> $> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.3
>
> ... yields the following dump:
>
>   BUG: kernel NULL pointer dereference, address: 00b8
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x) - not-present page
>   PGD 0 P4D 0
>   Oops: Oops:  [#1] SMP
>   CPU: 4 UID: 0 PID: 2811 Comm: vdpa Not tainted 6.12.0-rc6 #1
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>   RIP: 0010:destroy_workqueue+0xe/0x2a0
>   Code: ...
>   RSP: 0018:88814920b9a8 EFLAGS: 00010282
>   RAX:  RBX: 888105c1 RCX: 
>   RDX: 0001 RSI: 888100400168 RDI: 
>   RBP:  R08: 888100120c00 R09: 828578c0
>   R10:  R11:  R12: 
>   R13: 888131fd99a0 R14:  R15: 888105c10580
>   FS:  7fdfa6b4f740() GS:88852ca0() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2: 00b8 CR3: 00018db09006 CR4: 00372eb0
>   DR0:  DR1:  DR2: 
>   DR3:  DR6: fffe0ff0 DR7: 0400
>   Call Trace:
>
>? __die+0x20/0x60
>? page_fault_oops+0x150/0x3e0
>? exc_page_fault+0x74/0x130
>? asm_exc_page_fault+0x22/0x30
>? destroy_workqueue+0xe/0x2a0
>mlx5_vdpa_destroy_mr_resources+0x2b/0x40 [mlx5_vdpa]
>mlx5_vdpa_free+0x45/0x150 [mlx5_vdpa]
>vdpa_release_dev+0x1e/0x50 [vdpa]
>device_release+0x31/0x90
>kobject_put+0x8d/0x230
>mlx5_vdpa_dev_add+0x328/0x8b0 [mlx5_vdpa]
>vdpa_nl_cmd_dev_add_set_doit+0x2b8/0x4c0 [vdpa]
>genl_family_rcv_msg_doit+0xd0/0x120
>genl_rcv_msg+0x180/0x2b0
>? __vdpa_alloc_device+0x1b0/0x1b0 [vdpa]
>? genl_family_rcv_msg_dumpit+0xf0/0xf0
>netlink_rcv_skb+0x54/0x100
>genl_rcv+0x24/0x40
>netlink_unicast+0x1fc/0x2d0
>netlink_sendmsg+0x1e4/0x410
>__sock_sendmsg+0x38/0x60
>? sockfd_lookup_light+0x12/0x60
>__sys_sendto+0x105/0x160
>? __count_memcg_events+0x53/0xe0
>? handle_mm_fault+0x100/0x220
>? do_user_addr_fault+0x40d/0x620
>__x64_sys_sendto+0x20/0x30
>do_syscall_64+0x4c/0x100
>entry_SYSCALL_64_after_hwframe+0x4b/0x53
>   RIP: 0033:0x7fdfa6c66b57
>   Code: ...
>   RSP: 002b:7ffeace22998 EFLAGS: 0202 ORIG_RAX: 002c
>   RAX: ffda RBX: 55a498608350 RCX: 7fdfa6c66b57
>   RDX: 006c RSI: 55a498608350 RDI: 0003
>   RBP: 7ffeace229c0 R08: 7fdfa6d35200 R09: 000c
>   R10:  R11: 0202 R12: 55a4986082a0
>   R13:  R14:  R15: 7ffeace233f3
>
>   Modules linked in: ...
>   CR2: 00b8
>
> Fixes: 62111654481d ("vdpa/mlx5: Postpone MR deletion")
> Signed-off-by: Dragos Tatulea 

Acked-by: Jason Wang 

Thanks




Re: [PATCH v3 7/9] vhost: Add new UAPI to support change to task mode

2024-11-05 Thread Michael S. Tsirkin
On Tue, Nov 05, 2024 at 03:25:26PM +0800, Cindy Lu wrote:
> index b95dd84eef2d..1e192038633d 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -235,4 +235,6 @@
>   */
>  #define VHOST_VDPA_GET_VRING_SIZE_IOWR(VHOST_VIRTIO, 0x82,   \
> struct vhost_vring_state)
> +
> +#define VHOST_SET_INHERIT_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, bool)


set with no get is also not great.

>  #endif
> -- 
> 2.45.0




Re: [PATCH v3 7/9] vhost: Add new UAPI to support change to task mode

2024-11-05 Thread Michael S. Tsirkin
On Tue, Nov 05, 2024 at 03:25:26PM +0800, Cindy Lu wrote:
> Add a new UAPI to enable setting the vhost device to task mode.
> The userspace application can use VHOST_SET_INHERIT_FROM_OWNER
> to configure the mode if necessary.
> This setting must be applied before VHOST_SET_OWNER, as the worker
> will be created in the VHOST_SET_OWNER function
> 
> Signed-off-by: Cindy Lu 
> ---
>  drivers/vhost/vhost.c  | 15 ++-
>  include/uapi/linux/vhost.h |  2 ++
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c17dc01febcc..70c793b63905 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2274,8 +2274,9 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int 
> ioctl, void __user *argp)
>  {
>   struct eventfd_ctx *ctx;
>   u64 p;
> - long r;
> + long r = 0;
>   int i, fd;
> + bool inherit_owner;
>  
>   /* If you are not the owner, you can become one */
>   if (ioctl == VHOST_SET_OWNER) {
> @@ -2332,6 +2333,18 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int 
> ioctl, void __user *argp)
>   if (ctx)
>   eventfd_ctx_put(ctx);
>   break;
> + case VHOST_SET_INHERIT_FROM_OWNER:
> + /*inherit_owner can only be modified before owner is set*/

bad coding style

> + if (vhost_dev_has_owner(d))
> + break;

is this silently failing? should return EBUSY or something like this.

> +
> + if (copy_from_user(&inherit_owner, argp,
> +sizeof(inherit_owner))) {

not good, 


> + r = -EFAULT;
> + break;
> + }
> + d->inherit_owner = inherit_owner;




> + break;
>   default:
>   r = -ENOIOCTLCMD;
>   break;



This means any task can break out of jail
and steal root group system time by setting inherit_owner to 0
even if system is configured to inherit by default.

we need a modparam to block this.


> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index b95dd84eef2d..1e192038633d 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -235,4 +235,6 @@
>   */
>  #define VHOST_VDPA_GET_VRING_SIZE_IOWR(VHOST_VIRTIO, 0x82,   \
> struct vhost_vring_state)
> +
> +#define VHOST_SET_INHERIT_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, bool)

do not put bool in interfaces. something like u8 and validate it is 0 or
1.

>  #endif
> -- 
> 2.45.0