Re: [PATCH] module: Refine kmemleak scanned areas

2024-09-09 Thread Vincent Donnefort
On Sat, Sep 07, 2024 at 03:12:13PM +0100, Catalin Marinas wrote:
> On Fri, Sep 06, 2024 at 04:38:56PM +0100, Vincent Donnefort wrote:
> > commit ac3b43283923 ("module: replace module_layout with module_memory")
> > introduced a set of memory regions for the module layout sharing the
> > same attributes but didn't update the kmemleak scanned areas which
> > intended to limit kmemleak scan to sections containing writable data.
> > This means sections such as .text and .rodata are scanned by kmemleak.
> > 
> > Refine the scanned areas for modules by limiting it to MOD_TEXT and
> > MOD_INIT_TEXT mod_mem regions.
> > 
> > CC: Song Liu 
> > CC: Catalin Marinas 
> > Signed-off-by: Vincent Donnefort 
> > 
> > diff --git a/kernel/module/debug_kmemleak.c b/kernel/module/debug_kmemleak.c
> > index 12a569d361e8..b4cc03842d70 100644
> > --- a/kernel/module/debug_kmemleak.c
> > +++ b/kernel/module/debug_kmemleak.c
> > @@ -12,19 +12,9 @@
> >  void kmemleak_load_module(const struct module *mod,
> >   const struct load_info *info)
> >  {
> > -   unsigned int i;
> > -
> > -   /* only scan the sections containing data */
> > -   kmemleak_scan_area(mod, sizeof(struct module), GFP_KERNEL);
> > -
> > -   for (i = 1; i < info->hdr->e_shnum; i++) {
> > -   /* Scan all writable sections that's not executable */
> > -   if (!(info->sechdrs[i].sh_flags & SHF_ALLOC) ||
> > -   !(info->sechdrs[i].sh_flags & SHF_WRITE) ||
> > -   (info->sechdrs[i].sh_flags & SHF_EXECINSTR))
> > -   continue;
> > -
> > -   kmemleak_scan_area((void *)info->sechdrs[i].sh_addr,
> > -  info->sechdrs[i].sh_size, GFP_KERNEL);
> > +   /* only scan writable, non-executable sections */
> > +   for_each_mod_mem_type(type) {
> > +   if (type != MOD_DATA && type != MOD_INIT_DATA)
> > +   kmemleak_no_scan(mod->mem[type].base);
> > }
> >  }
> 
> I lost track of how module memory allocation works. Is struct module
> still scanned after this change?

That section being RW, it will be part of the MOD_DATA vmalloc and scanned.

-- 
Vincent

> 
> -- 
> Catalin



Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth

2024-09-09 Thread Ilpo Järvinen
On Fri, 6 Sep 2024, Reinette Chatre wrote:
> On 9/6/24 1:44 AM, Ilpo Järvinen wrote:
> > On Thu, 5 Sep 2024, Reinette Chatre wrote:
> > > On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
> > > > On Wed, 4 Sep 2024, Reinette Chatre wrote:
> > > > > On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
> > > > > > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > > > > > On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
> > > > > > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > > > > > > 
> > > > > > > > > The MBA test incrementally throttles memory bandwidth, each
> > > > > > > > > time
> > > > > > > > > followed by a comparison between the memory bandwidth observed
> > > > > > > > > by the performance counters and resctrl respectively.
> > > > > > > > > 
> > > > > > > > > While a comparison between performance counters and resctrl is
> > > > > > > > > generally appropriate, they do not have an identical view of
> > > > > > > > > memory bandwidth. For example RAS features or memory
> > > > > > > > > performance
> > > > > > > > > features that generate memory traffic may drive accesses that
> > > > > > > > > are
> > > > > > > > > counted differently by performance counters and MBM
> > > > > > > > > respectively,
> > > > > > > > > for instance generating "overhead" traffic which is not
> > > > > > > > > counted
> > > > > > > > > against any specific RMID. As a ratio, this different view of
> > > > > > > > > memory
> > > > > > > > > bandwidth becomes more apparent at low memory bandwidths.
> > > > > > > > 
> > > > > > > > Interesting.
> > > > > > > > 
> > > > > > > > I did some time back prototype with a change to MBM test such
> > > > > > > > that
> > > > > > > > instead
> > > > > > > > of using once=false I changed fill_buf to be able to run N
> > > > > > > > passes
> > > > > > > > through
> > > > > > > > the buffer which allowed me to know how many reads were
> > > > > > > > performed by
> > > > > > > > the
> > > > > > > > benchmark. This yielded numerical difference between all those 3
> > > > > > > > values
> > > > > > > > (# of reads, MBM, perf) which also varied from arch to another
> > > > > > > > so it
> > > > > > > > didn't end up making an usable test.
> > > > > > > > 
> > > > > > > > I guess I now have an explanation for at least a part of the
> > > > > > > > differences.
> > > > > > > > 
> > > > > > > > > It is not practical to enable/disable the various features
> > > > > > > > > that
> > > > > > > > > may generate memory bandwidth to give performance counters and
> > > > > > > > > resctrl an identical view. Instead, do not compare performance
> > > > > > > > > counters and resctrl view of memory bandwidth when the memory
> > > > > > > > > bandwidth is low.
> > > > > > > > > 
> > > > > > > > > Bandwidth throttling behaves differently across platforms
> > > > > > > > > so it is not appropriate to drop measurement data simply based
> > > > > > > > > on the throttling level. Instead, use a threshold of 750MiB
> > > > > > > > > that has been observed to support adequate comparison between
> > > > > > > > > performance counters and resctrl.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Reinette Chatre 
> > > > > > > > > ---
> > > > > > > > >  tools/testing/selftests/resctrl/mba_test.c | 7 +++
> > > > > > > > >  tools/testing/selftests/resctrl/resctrl.h  | 6 ++
> > > > > > > > >  2 files changed, 13 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > > > b/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > > > index cad473b81a64..204b9ac4b108 100644
> > > > > > > > > --- a/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > > > +++ b/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > > > @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long
> > > > > > > > > *bw_imc,
> > > > > > > > > unsigned long *bw_resc)
> > > > > > > > >   avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS 
> > > > > > > > > - 1);
> > > > > > > > >   avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> > > > > > > > > + if (avg_bw_imc < THROTTLE_THRESHOLD ||
> > > > > > > > > avg_bw_resc <
> > > > > > > > > THROTTLE_THRESHOLD) {
> > > > > > > > > + ksft_print_msg("Bandwidth below
> > > > > > > > > threshold (%d
> > > > > > > > > MiB).
> > > > > > > > > Dropping results from MBA schemata %u.\n",
> > > > > > > > > + THROTTLE_THRESHOLD,
> > > > > > > > > + ALLOCATION_MAX -
> > > > > > > > > ALLOCATION_STEP *
> > > > > > > > > allocation);
> > > > > > > > 
> > > > > > > > The second one too should be %d.
> > > > > > > > 
> > > > > > > 
> > > > > > > hmmm ... I intended to have it be consistent with the
> > > > > > > ksft_print_msg()
> > > > > > > that
> > > > > > > follows. Perhaps allocation can be made unsigned instead?
> > > > > > 
> > > > > > If you go that way, then it would be good to make the related
> > > > > > defines

Re: [PATCH] module: Refine kmemleak scanned areas

2024-09-09 Thread Catalin Marinas
On Mon, Sep 09, 2024 at 08:40:34AM +0100, Vincent Donnefort wrote:
> On Sat, Sep 07, 2024 at 03:12:13PM +0100, Catalin Marinas wrote:
> > On Fri, Sep 06, 2024 at 04:38:56PM +0100, Vincent Donnefort wrote:
> > > commit ac3b43283923 ("module: replace module_layout with module_memory")
> > > introduced a set of memory regions for the module layout sharing the
> > > same attributes but didn't update the kmemleak scanned areas which
> > > intended to limit kmemleak scan to sections containing writable data.
> > > This means sections such as .text and .rodata are scanned by kmemleak.
> > > 
> > > Refine the scanned areas for modules by limiting it to MOD_TEXT and
> > > MOD_INIT_TEXT mod_mem regions.
> > > 
> > > CC: Song Liu 
> > > CC: Catalin Marinas 
> > > Signed-off-by: Vincent Donnefort 
> > > 
> > > diff --git a/kernel/module/debug_kmemleak.c 
> > > b/kernel/module/debug_kmemleak.c
> > > index 12a569d361e8..b4cc03842d70 100644
> > > --- a/kernel/module/debug_kmemleak.c
> > > +++ b/kernel/module/debug_kmemleak.c
> > > @@ -12,19 +12,9 @@
> > >  void kmemleak_load_module(const struct module *mod,
> > > const struct load_info *info)
> > >  {
> > > - unsigned int i;
> > > -
> > > - /* only scan the sections containing data */
> > > - kmemleak_scan_area(mod, sizeof(struct module), GFP_KERNEL);
> > > -
> > > - for (i = 1; i < info->hdr->e_shnum; i++) {
> > > - /* Scan all writable sections that's not executable */
> > > - if (!(info->sechdrs[i].sh_flags & SHF_ALLOC) ||
> > > - !(info->sechdrs[i].sh_flags & SHF_WRITE) ||
> > > - (info->sechdrs[i].sh_flags & SHF_EXECINSTR))
> > > - continue;
> > > -
> > > - kmemleak_scan_area((void *)info->sechdrs[i].sh_addr,
> > > -info->sechdrs[i].sh_size, GFP_KERNEL);
> > > + /* only scan writable, non-executable sections */
> > > + for_each_mod_mem_type(type) {
> > > + if (type != MOD_DATA && type != MOD_INIT_DATA)
> > > + kmemleak_no_scan(mod->mem[type].base);
> > >   }
> > >  }
> > 
> > I lost track of how module memory allocation works. Is struct module
> > still scanned after this change?
> 
> That section being RW, it will be part of the MOD_DATA vmalloc and scanned.

Ah, makes sense. I'm fine with this patch, it simplifies the code now
that we have mod->mem[type]. I wouldn't say it's a fix, though no
backporting needed.

Reviewed-by: Catalin Marinas 



[PATCH net-next] page_pool: add a test module for page_pool

2024-09-09 Thread Yunsheng Lin
The testing is done by ensuring that the page allocated from
the page_pool instance is pushed into a ptr_ring instance in
a kthread/napi binded to a specified cpu, and a kthread/napi
binded to a specified cpu will pop the page from the ptr_ring
and free it back to the page_pool.

Signed-off-by: Yunsheng Lin 
---
 tools/testing/selftests/net/Makefile  |   3 +
 .../testing/selftests/net/page_pool/Makefile  |  18 +
 .../selftests/net/page_pool/page_pool_test.c  | 433 ++
 tools/testing/selftests/net/test_page_pool.sh | 175 +++
 4 files changed, 629 insertions(+)
 create mode 100644 tools/testing/selftests/net/page_pool/Makefile
 create mode 100644 tools/testing/selftests/net/page_pool/page_pool_test.c
 create mode 100755 tools/testing/selftests/net/test_page_pool.sh

diff --git a/tools/testing/selftests/net/Makefile 
b/tools/testing/selftests/net/Makefile
index 27362e40eb37..4d4ddd853ef8 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,6 +6,8 @@ CFLAGS += -I../../../../usr/include/ $(KHDR_INCLUDES)
 # Additional include paths needed by kselftest.h
 CFLAGS += -I../
 
+TEST_GEN_MODS_DIR := page_pool
+
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh \
  rtnetlink.sh xfrm_policy.sh test_blackhole_dev.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
@@ -96,6 +98,7 @@ TEST_PROGS += fdb_flush.sh
 TEST_PROGS += fq_band_pktlimit.sh
 TEST_PROGS += vlan_hw_filter.sh
 TEST_PROGS += bpf_offload.py
+TEST_PROGS += test_page_pool.sh
 
 TEST_FILES := settings
 TEST_FILES += in_netns.sh lib.sh net_helper.sh setup_loopback.sh setup_veth.sh
diff --git a/tools/testing/selftests/net/page_pool/Makefile 
b/tools/testing/selftests/net/page_pool/Makefile
new file mode 100644
index ..4380a70d6391
--- /dev/null
+++ b/tools/testing/selftests/net/page_pool/Makefile
@@ -0,0 +1,18 @@
+PAGE_POOL_TEST_DIR := $(realpath $(dir $(abspath $(lastword 
$(MAKEFILE_LIST)
+KDIR ?= $(abspath $(PAGE_POOL_TEST_DIR)/../../../../..)
+
+ifeq ($(V),1)
+Q =
+else
+Q = @
+endif
+
+MODULES = page_pool_test.ko
+
+obj-m += page_pool_test.o
+
+all:
+   +$(Q)make -C $(KDIR) M=$(PAGE_POOL_TEST_DIR) modules
+
+clean:
+   +$(Q)make -C $(KDIR) M=$(PAGE_POOL_TEST_DIR) clean
diff --git a/tools/testing/selftests/net/page_pool/page_pool_test.c 
b/tools/testing/selftests/net/page_pool/page_pool_test.c
new file mode 100644
index ..475b64f21b78
--- /dev/null
+++ b/tools/testing/selftests/net/page_pool/page_pool_test.c
@@ -0,0 +1,433 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Test module for page_pool
+ *
+ * Copyright (C) 2024 Yunsheng Lin 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct ptr_ring ptr_ring;
+static int nr_objs = 512;
+static atomic_t nthreads;
+static struct completion wait;
+static struct page_pool *test_pool;
+static struct device *dev;
+static u64 dma_mask = DMA_BIT_MASK(64);
+
+static int nr_test = 200;
+module_param(nr_test, int, 0);
+MODULE_PARM_DESC(nr_test, "number of iterations to test");
+
+static bool test_frag;
+module_param(test_frag, bool, 0);
+MODULE_PARM_DESC(test_frag, "use frag API for testing");
+
+static bool test_dma;
+module_param(test_dma, bool, 0);
+MODULE_PARM_DESC(test_dma, "enable dma mapping for testing");
+
+static bool test_napi;
+module_param(test_napi, bool, 0);
+MODULE_PARM_DESC(test_napi, "use NAPI softirq for testing");
+
+static bool test_direct;
+module_param(test_direct, bool, 0);
+MODULE_PARM_DESC(test_direct, "enable direct recycle for testing");
+
+static int test_alloc_len = 2048;
+module_param(test_alloc_len, int, 0);
+MODULE_PARM_DESC(test_alloc_len, "alloc len for testing");
+
+static int test_push_cpu;
+module_param(test_push_cpu, int, 0);
+MODULE_PARM_DESC(test_push_cpu, "test cpu for pushing page");
+
+static int test_pop_cpu;
+module_param(test_pop_cpu, int, 0);
+MODULE_PARM_DESC(test_pop_cpu, "test cpu for popping page");
+
+static void page_pool_test_dev_release(struct device *dev)
+{
+   kfree(dev);
+}
+
+static struct page_pool *page_pool_test_create(void)
+{
+   struct page_pool_params page_pool_params = {
+   .pool_size = nr_objs,
+   .flags = 0,
+   .nid = cpu_to_mem(test_push_cpu),
+   };
+   int ret;
+
+   if (test_dma) {
+   dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   dev->release = page_pool_test_dev_release;
+   dev->dma_mask = &dma_mask;
+   device_initialize(dev);
+
+   ret = dev_set_name(dev, "page_pool_dev");
+   if (ret) {
+   pr_err("page_pool_test dev_set_name() failed: %d\n",
+  ret);
+   goto err_out;
+   }
+
+   ret = dma_set_mask_and_coherent

Re: [PATCH vhost v2 0/7] vdpa/mlx5: Optimze MKEY operations

2024-09-09 Thread Dragos Tatulea



On 30.08.24 12:58, Dragos Tatulea wrote:
> This series improves the time of .set_map() operations by parallelizing
> the MKEY creation and deletion for direct MKEYs. Looking at the top
> level MKEY creation/deletion functions, the following improvement can be
> seen:
> 
> |---+-|
> | operation | improvement |
> |---+-|
> | create_user_mr()  | 3-5x|
> | destroy_user_mr() | 8x  |
> |---+-|
> 
> The last part of the series introduces lazy MKEY deletion which
> postpones the MKEY deletion to a later point in a workqueue.
> 
> As this series and the previous ones were targeting live migration,
> we can also observe improvements on this front:
> 
> |---+--+--|
> | Stage | Downtime #1 (ms) | Downtime #2 (ms) |
> |---+--+--|
> | Baseline  | 3140 | 3630 |
> | Parallel MKEY ops | 1200 | 2000 |
> | Deferred deletion | 1014 | 1253 |
> |---+--+--|
> 
> Test configuration: 256 GB VM, 32 CPUs x 2 threads per core, 4 x mlx5
> vDPA devices x 32 VQs (16 VQPs)
> 
> This series must be applied on top of the parallel VQ suspend/resume
> series [0].
> 
> [0] https://lore.kernel.org/all/20240816090159.1967650-1-dtatu...@nvidia.com/
> 
> ---
> v2:
> - Swapped flex array usage for plain zero length array in first patch.
> - Updated code to use Scope-Based Cleanup Helpers where appropriate
>   (only second patch).
> - Added macro define for MTT alignment in first patch.
> - Improved commit messages/comments based on review comments.
> - Removed extra newlines.
Gentle ping for the remaining patches in v2.

Thanks,
Dragos



Re: [PATCH mlx5-vhost v2 01/10] net/mlx5: Support throttled commands from async API

2024-09-09 Thread Dragos Tatulea



On 16.08.24 11:01, Dragos Tatulea wrote:
> Currently, commands that qualify as throttled can't be used via the
> async API. That's due to the fact that the throttle semaphore can sleep
> but the async API can't.
> 
> This patch allows throttling in the async API by using the tentative
> variant of the semaphore and upon failure (semaphore at 0) returns EBUSY
> to signal to the caller that they need to wait for the completion of
> previously issued commands.
> 
> Furthermore, make sure that the semaphore is released in the callback.
> 
> Signed-off-by: Dragos Tatulea 
> Cc: Leon Romanovsky 
> Reviewed-by: Tariq Toukan 
Same reminder as in v1: Tariq is the maintainer for mlx5 so his review
also counts as Acked-by.

Thanks,
Dragos

> ---
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++-
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> index 20768ef2e9d2..f69c977c1569 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> @@ -1882,10 +1882,12 @@ static int cmd_exec(struct mlx5_core_dev *dev, void 
> *in, int in_size, void *out,
>  
>   throttle_op = mlx5_cmd_is_throttle_opcode(opcode);
>   if (throttle_op) {
> - /* atomic context may not sleep */
> - if (callback)
> - return -EINVAL;
> - down(&dev->cmd.vars.throttle_sem);
> + if (callback) {
> + if (down_trylock(&dev->cmd.vars.throttle_sem))
> + return -EBUSY;
> + } else {
> + down(&dev->cmd.vars.throttle_sem);
> + }
>   }
>  
>   pages_queue = is_manage_pages(in);
> @@ -2091,10 +2093,19 @@ static void mlx5_cmd_exec_cb_handler(int status, void 
> *_work)
>  {
>   struct mlx5_async_work *work = _work;
>   struct mlx5_async_ctx *ctx;
> + struct mlx5_core_dev *dev;
> + u16 opcode;
>  
>   ctx = work->ctx;
> - status = cmd_status_err(ctx->dev, status, work->opcode, work->op_mod, 
> work->out);
> + dev = ctx->dev;
> + opcode = work->opcode;
> + status = cmd_status_err(dev, status, work->opcode, work->op_mod, 
> work->out);
>   work->user_callback(status, work);
> + /* Can't access "work" from this point on. It could have been freed in
> +  * the callback.
> +  */
> + if (mlx5_cmd_is_throttle_opcode(opcode))
> + up(&dev->cmd.vars.throttle_sem);
>   if (atomic_dec_and_test(&ctx->num_inflight))
>   complete(&ctx->inflight_done);
>  }




Re: [PATCH] module: Refine kmemleak scanned areas

2024-09-09 Thread Vincent Donnefort
On Mon, Sep 09, 2024 at 09:52:57AM +0100, Catalin Marinas wrote:
> On Mon, Sep 09, 2024 at 08:40:34AM +0100, Vincent Donnefort wrote:
> > On Sat, Sep 07, 2024 at 03:12:13PM +0100, Catalin Marinas wrote:
> > > On Fri, Sep 06, 2024 at 04:38:56PM +0100, Vincent Donnefort wrote:
> > > > commit ac3b43283923 ("module: replace module_layout with module_memory")
> > > > introduced a set of memory regions for the module layout sharing the
> > > > same attributes but didn't update the kmemleak scanned areas which
> > > > intended to limit kmemleak scan to sections containing writable data.
> > > > This means sections such as .text and .rodata are scanned by kmemleak.
> > > > 
> > > > Refine the scanned areas for modules by limiting it to MOD_TEXT and
> > > > MOD_INIT_TEXT mod_mem regions.
> > > > 
> > > > CC: Song Liu 
> > > > CC: Catalin Marinas 
> > > > Signed-off-by: Vincent Donnefort 
> > > > 
> > > > diff --git a/kernel/module/debug_kmemleak.c 
> > > > b/kernel/module/debug_kmemleak.c
> > > > index 12a569d361e8..b4cc03842d70 100644
> > > > --- a/kernel/module/debug_kmemleak.c
> > > > +++ b/kernel/module/debug_kmemleak.c
> > > > @@ -12,19 +12,9 @@
> > > >  void kmemleak_load_module(const struct module *mod,
> > > >   const struct load_info *info)
> > > >  {
> > > > -   unsigned int i;
> > > > -
> > > > -   /* only scan the sections containing data */
> > > > -   kmemleak_scan_area(mod, sizeof(struct module), GFP_KERNEL);
> > > > -
> > > > -   for (i = 1; i < info->hdr->e_shnum; i++) {
> > > > -   /* Scan all writable sections that's not executable */
> > > > -   if (!(info->sechdrs[i].sh_flags & SHF_ALLOC) ||
> > > > -   !(info->sechdrs[i].sh_flags & SHF_WRITE) ||
> > > > -   (info->sechdrs[i].sh_flags & SHF_EXECINSTR))
> > > > -   continue;
> > > > -
> > > > -   kmemleak_scan_area((void *)info->sechdrs[i].sh_addr,
> > > > -  info->sechdrs[i].sh_size, 
> > > > GFP_KERNEL);
> > > > +   /* only scan writable, non-executable sections */
> > > > +   for_each_mod_mem_type(type) {
> > > > +   if (type != MOD_DATA && type != MOD_INIT_DATA)
> > > > +   kmemleak_no_scan(mod->mem[type].base);
> > > > }
> > > >  }
> > > 
> > > I lost track of how module memory allocation works. Is struct module
> > > still scanned after this change?
> > 
> > That section being RW, it will be part of the MOD_DATA vmalloc and scanned.
> 
> Ah, makes sense. I'm fine with this patch, it simplifies the code now
> that we have mod->mem[type]. I wouldn't say it's a fix, though no
> backporting needed.

Agreed, it's "fixing" because it was scanning unecessary regions, but it's not
worth any backport.

Aside, judging by the size of this function, I am not sure it's worth to keep
a separated file, but the patch that introduced it didn't really explain why so
I kept it that way.

> 
> Reviewed-by: Catalin Marinas 

Cheers!

-- 
Vincent



Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark

2024-09-09 Thread Ilpo Järvinen
On Fri, 6 Sep 2024, Reinette Chatre wrote:
> On 9/6/24 3:00 AM, Ilpo Järvinen wrote:
> > On Thu, 5 Sep 2024, Reinette Chatre wrote:
> > > On 9/5/24 5:10 AM, Ilpo Järvinen wrote:
> > > > On Wed, 4 Sep 2024, Reinette Chatre wrote:
> > > > > On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
> > > > > > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > > > > > On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
> > > > > > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > > 
> > > > > > > > > @@ -699,111 +639,80 @@ int resctrl_val(const struct
> > > > > > > > > resctrl_test
> > > > > > > > > *test,
> > > > > > > > >   return ret;
> > > > > > > > >   }
> > > > > > > > >  -/*
> > > > > > > > > -  * If benchmark wasn't successfully started by child,
> > > > > > > > > then
> > > > > > > > > child
> > > > > > > > > should
> > > > > > > > > -  * kill parent, so save parent's pid
> > > > > > > > > -  */
> > > > > > > > >   ppid = getpid();
> > > > > > > > >  -if (pipe(pipefd)) {
> > > > > > > > > - ksft_perror("Unable to create pipe");
> > > > > > > > > + /* Taskset test to specified CPU. */
> > > > > > > > > + ret = taskset_benchmark(ppid, uparams->cpu,
> > > > > > > > > &old_affinity);
> > > > > > > > 
> > > > > > > > Previously only CPU affinity for bm_pid was set but now it's set
> > > > > > > > before
> > > > > > > > fork(). Quickly checking the Internet, it seems that CPU
> > > > > > > > affinity
> > > > > > > > gets
> > > > > > > > inherited on fork() so now both processes will have the same
> > > > > > > > affinity
> > > > > > > > which might make the other process to interfere with the
> > > > > > > > measurement.
> > > > > > > 
> > > > > > > Setting the affinity is intended to ensure that the buffer
> > > > > > > preparation
> > > > > > > occurs in the same topology as where the runtime portion will run.
> > > > > > > This preparation is done before the work to be measured starts.
> > > > > > > 
> > > > > > > This does tie in with the association with the resctrl group and I
> > > > > > > will elaborate more below ...
> > > > > > 
> > > > > > Okay, that's useful to retain but thinking this further, now we're
> > > > > > also
> > > > > > going do non-trivial amount of work in between the setup and the
> > > > > > test by
> > > > > 
> > > > > Could you please elaborate how the amount of work during setup can be
> > > > > an
> > > > > issue? I have been focused on the measurements that are done
> > > > > afterwards
> > > > > that do have clear boundaries from what I can tell.
> > > > 
> > > > Well, you used it as a justification: "Setting the affinity is intended
> > > > to ensure that the buffer preparation occurs in the same topology as
> > > > where
> > > > the runtime portion will run." So I assumed you had some expectations
> > > > about
> > > > "preparations" done outside of those "clear boundaries" but now you seem
> > > > to take entirely opposite stance?
> > > 
> > > I do not follow you here. With the "clear boundaries" I meant the
> > > measurements and associated variables that have  a clear start/init and
> > > stop/read while the other task sleeps. Yes, preparations are done outside
> > > of this since that should not be measured.
> > 
> > Of course the preparations are not measured (at least not after this
> > patch :-)).
> > 
> > But that's not what I meant. You said the preparations are to be done
> > using the same topology as the test but if it's outside of the measurement
> > period, the topology during preparations only matters if you make some
> > hidden assumption that some state carries from preparations to the actual
> > test. If no state carry-over is assumed, the topology during preparations
> > is not important. So which way it is? Is the topology during preparations
> > important or not?
> 
> Topology during preparations is important.
> 
> In the CMT test this is more relevant with the transition to using
> memflush = false. The preparation phase and measure phase uses the same
> cache alloc configuration and just as importantly, the same monitoring
> configuration. During preparation the cache portion that will be
> used during measurement will be filled with the contents of the buffer.
> During measurement it will be the same cache portion into which any new reads
> will be allocated and measured. In fact, the preparation phase will thus form
> part of the measurement. If preparation was done with different
> configuration, then I see a problem whether memflush = true as well as when
> memflush = false. In the case of memflush = true it will have the familiar
> issue of the test needing to "guess" the workload settle time. In the case
> of memflush = false the buffer will remain allocated into the cache portion
> used during preparation phase, when the workload runs it will read the
> data from a cache portion that does not belong to it and since it does
> not need to allocate into its own cache portion its LLC occupancy c

[PATCH] ieee802154: Fix build error

2024-09-09 Thread Jinjie Ruan
If REGMAP_SPI is m and IEEE802154_MCR20A is y,

mcr20a.c:(.text+0x3ed6c5b): undefined reference to 
`__devm_regmap_init_spi'
ld: mcr20a.c:(.text+0x3ed6cb5): undefined reference to 
`__devm_regmap_init_spi'

Select REGMAP_SPI for IEEE802154_MCR20A to fix it.

Fixes: 8c6ad9cc5157 ("ieee802154: Add NXP MCR20A IEEE 802.15.4 transceiver 
driver")
Signed-off-by: Jinjie Ruan 
---
 drivers/net/ieee802154/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ieee802154/Kconfig b/drivers/net/ieee802154/Kconfig
index 95da876c5613..1075e24b11de 100644
--- a/drivers/net/ieee802154/Kconfig
+++ b/drivers/net/ieee802154/Kconfig
@@ -101,6 +101,7 @@ config IEEE802154_CA8210_DEBUGFS
 
 config IEEE802154_MCR20A
tristate "MCR20A transceiver driver"
+   select REGMAP_SPI
depends on IEEE802154_DRIVERS && MAC802154
depends on SPI
help
-- 
2.34.1




[PATCH 0/7] net: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Jinjie Ruan
As commit cbe16f35bee6 ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")
said, reqeust_irq() and then disable_irq() is unsafe.

And the code below is subobtimal:
 irq_set_status_flags(irq, IRQ_NOAUTOEN);
 request_irq(dev, irq...);

IRQF_NO_AUTOEN flag can be used by drivers to request_irq(). It prevents
the automatic enabling of the requested interrupt in the same safe way.
With that the usage can be simplified and corrected.

Only compile-tested.

Jinjie Ruan (7):
  net: apple: bmac: Use IRQF_NO_AUTOEN flag in request_irq()
  net: enetc: Use IRQF_NO_AUTOEN flag in request_irq()
  nfp: Use IRQF_NO_AUTOEN flag in request_irq()
  net: ieee802154: mcr20a: Use IRQF_NO_AUTOEN flag in request_irq()
  wifi: p54: Use IRQF_NO_AUTOEN flag in request_irq()
  wifi: mwifiex: Use IRQF_NO_AUTOEN flag in request_irq()
  wifi: wl1251: Use IRQF_NO_AUTOEN flag in request_irq()

 drivers/net/ethernet/apple/bmac.c   | 3 +--
 drivers/net/ethernet/freescale/enetc/enetc.c| 3 +--
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 5 ++---
 drivers/net/ieee802154/mcr20a.c | 5 +
 drivers/net/wireless/intersil/p54/p54spi.c  | 4 +---
 drivers/net/wireless/marvell/mwifiex/main.c | 4 ++--
 drivers/net/wireless/ti/wl1251/sdio.c   | 4 ++--
 7 files changed, 10 insertions(+), 18 deletions(-)

-- 
2.34.1




[PATCH 1/7] net: apple: bmac: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Jinjie Ruan
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.

Signed-off-by: Jinjie Ruan 
---
 drivers/net/ethernet/apple/bmac.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/apple/bmac.c 
b/drivers/net/ethernet/apple/bmac.c
index 292b1f9cd9e7..785f4b4ff758 100644
--- a/drivers/net/ethernet/apple/bmac.c
+++ b/drivers/net/ethernet/apple/bmac.c
@@ -1317,7 +1317,7 @@ static int bmac_probe(struct macio_dev *mdev, const 
struct of_device_id *match)
 
timer_setup(&bp->tx_timeout, bmac_tx_timeout, 0);
 
-   ret = request_irq(dev->irq, bmac_misc_intr, 0, "BMAC-misc", dev);
+   ret = request_irq(dev->irq, bmac_misc_intr, IRQF_NO_AUTOEN, 
"BMAC-misc", dev);
if (ret) {
printk(KERN_ERR "BMAC: can't get irq %d\n", dev->irq);
goto err_out_iounmap_rx;
@@ -1336,7 +1336,6 @@ static int bmac_probe(struct macio_dev *mdev, const 
struct of_device_id *match)
/* Mask chip interrupts and disable chip, will be
 * re-enabled on open()
 */
-   disable_irq(dev->irq);
pmac_call_feature(PMAC_FTR_BMAC_ENABLE, macio_get_of_node(bp->mdev), 0, 
0);
 
if (register_netdev(dev) != 0) {
-- 
2.34.1




[PATCH 3/7] nfp: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Jinjie Ruan
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.

Signed-off-by: Jinjie Ruan 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 182ba0a8b095..6e0929af0f72 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -821,14 +821,13 @@ nfp_net_prepare_vector(struct nfp_net *nn, struct 
nfp_net_r_vector *r_vec,
 
snprintf(r_vec->name, sizeof(r_vec->name),
 "%s-rxtx-%d", nfp_net_name(nn), idx);
-   err = request_irq(r_vec->irq_vector, r_vec->handler, 0, r_vec->name,
- r_vec);
+   err = request_irq(r_vec->irq_vector, r_vec->handler, IRQF_NO_AUTOEN,
+ r_vec->name, r_vec);
if (err) {
nfp_net_napi_del(&nn->dp, r_vec);
nn_err(nn, "Error requesting IRQ %d\n", r_vec->irq_vector);
return err;
}
-   disable_irq(r_vec->irq_vector);
 
irq_set_affinity_hint(r_vec->irq_vector, &r_vec->affinity_mask);
 
-- 
2.34.1




[PATCH 2/7] net: enetc: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Jinjie Ruan
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.

Fixes: bbb96dc7fa1a ("enetc: Factor out the traffic start/stop procedures")
Signed-off-by: Jinjie Ruan 
---
 drivers/net/ethernet/freescale/enetc/enetc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index 5c45f42232d3..f04f42ea60c0 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2305,12 +2305,11 @@ static int enetc_setup_irqs(struct enetc_ndev_priv 
*priv)
 
snprintf(v->name, sizeof(v->name), "%s-rxtx%d",
 priv->ndev->name, i);
-   err = request_irq(irq, enetc_msix, 0, v->name, v);
+   err = request_irq(irq, enetc_msix, IRQF_NO_AUTOEN, v->name, v);
if (err) {
dev_err(priv->dev, "request_irq() failed!\n");
goto irq_err;
}
-   disable_irq(irq);
 
v->tbier_base = hw->reg + ENETC_BDR(TX, 0, ENETC_TBIER);
v->rbier = hw->reg + ENETC_BDR(RX, i, ENETC_RBIER);
-- 
2.34.1




[PATCH 4/7] net: ieee802154: mcr20a: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Jinjie Ruan
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.

Fixes: 8c6ad9cc5157 ("ieee802154: Add NXP MCR20A IEEE 802.15.4 transceiver 
driver")
Signed-off-by: Jinjie Ruan 
---
 drivers/net/ieee802154/mcr20a.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ieee802154/mcr20a.c b/drivers/net/ieee802154/mcr20a.c
index 433fb5839203..020d392a98b6 100644
--- a/drivers/net/ieee802154/mcr20a.c
+++ b/drivers/net/ieee802154/mcr20a.c
@@ -1302,16 +1302,13 @@ mcr20a_probe(struct spi_device *spi)
irq_type = IRQF_TRIGGER_FALLING;
 
ret = devm_request_irq(&spi->dev, spi->irq, mcr20a_irq_isr,
-  irq_type, dev_name(&spi->dev), lp);
+  irq_type | IRQF_NO_AUTOEN, dev_name(&spi->dev), 
lp);
if (ret) {
dev_err(&spi->dev, "could not request_irq for mcr20a\n");
ret = -ENODEV;
goto free_dev;
}
 
-   /* disable_irq by default and wait for starting hardware */
-   disable_irq(spi->irq);
-
ret = ieee802154_register_hw(hw);
if (ret) {
dev_crit(&spi->dev, "ieee802154_register_hw failed\n");
-- 
2.34.1




[PATCH 5/7] wifi: p54: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Jinjie Ruan
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.

Fixes:
Signed-off-by: Jinjie Ruan 
---
 drivers/net/wireless/intersil/p54/p54spi.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/intersil/p54/p54spi.c 
b/drivers/net/wireless/intersil/p54/p54spi.c
index d33a994906a7..27f44a9f0bc1 100644
--- a/drivers/net/wireless/intersil/p54/p54spi.c
+++ b/drivers/net/wireless/intersil/p54/p54spi.c
@@ -624,7 +624,7 @@ static int p54spi_probe(struct spi_device *spi)
gpio_direction_input(p54spi_gpio_irq);
 
ret = request_irq(gpio_to_irq(p54spi_gpio_irq),
- p54spi_interrupt, 0, "p54spi",
+ p54spi_interrupt, IRQF_NO_AUTOEN, "p54spi",
  priv->spi);
if (ret < 0) {
dev_err(&priv->spi->dev, "request_irq() failed");
@@ -633,8 +633,6 @@ static int p54spi_probe(struct spi_device *spi)
 
irq_set_irq_type(gpio_to_irq(p54spi_gpio_irq), IRQ_TYPE_EDGE_RISING);
 
-   disable_irq(gpio_to_irq(p54spi_gpio_irq));
-
INIT_WORK(&priv->work, p54spi_work);
init_completion(&priv->fw_comp);
INIT_LIST_HEAD(&priv->tx_pending);
-- 
2.34.1




[PATCH 6/7] wifi: mwifiex: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Jinjie Ruan
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable when request IRQ.

Signed-off-by: Jinjie Ruan 
---
 drivers/net/wireless/marvell/mwifiex/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c 
b/drivers/net/wireless/marvell/mwifiex/main.c
index d99127dc466e..6c60a4c21a31 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1633,7 +1633,8 @@ static void mwifiex_probe_of(struct mwifiex_adapter 
*adapter)
}
 
ret = devm_request_irq(dev, adapter->irq_wakeup,
-  mwifiex_irq_wakeup_handler, IRQF_TRIGGER_LOW,
+  mwifiex_irq_wakeup_handler,
+  IRQF_TRIGGER_LOW | IRQF_NO_AUTOEN,
   "wifi_wake", adapter);
if (ret) {
dev_err(dev, "Failed to request irq_wakeup %d (%d)\n",
@@ -1641,7 +1642,6 @@ static void mwifiex_probe_of(struct mwifiex_adapter 
*adapter)
goto err_exit;
}
 
-   disable_irq(adapter->irq_wakeup);
if (device_init_wakeup(dev, true)) {
dev_err(dev, "fail to init wakeup for mwifiex\n");
goto err_exit;
-- 
2.34.1




[PATCH 7/7] wifi: wl1251: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Jinjie Ruan
As commit cbe16f35bee6 ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")
said, the code below is subobtimal. IRQF_NO_AUTOEN flag can be used by
drivers to request_irq(). It prevents the automatic enabling of the
requested interrupt in the same safe way. With that the usage can be
simplified and corrected.

irq_set_status_flags(irq, IRQ_NOAUTOEN);
request_irq(dev, irq...);

Signed-off-by: Jinjie Ruan 
---
 drivers/net/wireless/ti/wl1251/sdio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ti/wl1251/sdio.c 
b/drivers/net/wireless/ti/wl1251/sdio.c
index c705081249d6..b45050243129 100644
--- a/drivers/net/wireless/ti/wl1251/sdio.c
+++ b/drivers/net/wireless/ti/wl1251/sdio.c
@@ -233,8 +233,8 @@ static int wl1251_sdio_probe(struct sdio_func *func,
}
 
if (wl->irq) {
-   irq_set_status_flags(wl->irq, IRQ_NOAUTOEN);
-   ret = request_irq(wl->irq, wl1251_line_irq, 0, "wl1251", wl);
+   ret = request_irq(wl->irq, wl1251_line_irq, IRQF_NO_AUTOEN,
+ "wl1251", wl);
if (ret < 0) {
wl1251_error("request_irq() failed: %d", ret);
goto disable;
-- 
2.34.1




Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-09 Thread Paolo Bonzini

On 9/9/24 07:30, Yan Zhao wrote:

On Thu, Sep 05, 2024 at 05:43:17PM +0800, Yan Zhao wrote:

On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:

On Wed, Sep 04, 2024, Yan Zhao wrote:

On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:

On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:

Sean Christopherson  writes:


On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:

FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
Silver 4410Y".


Has this been reproduced on any other hardware besides SPR?  I.e. did we stumble
on another hardware issue?


Very possible, as according to Yan Zhao this doesn't reproduce on at
least "Coffee Lake-S". Let me try to grab some random hardware around
and I'll be back with my observations.


Update some new findings from my side:

BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys range
from 0xfd00 to 0xfe00.

On "Sapphire Rapids XCC":

1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
correctly.
i.e.
if (gfn >= 0xfd000 && gfn < 0xfe000) {
return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
}
return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

2. If KVM forces this fb_map range to be UC+IPAT, installer failes to show / gdm
restarts endlessly. (though on Coffee Lake-S, installer/gdm can launch
correctly in this case).

3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called to set
this fb_map range as WC, with
iosys_map_set_vaddr_iomem(&iter_io->dmap, ioremap_wc(mem->bus.offset, 
mem->size));

However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), pfns for
this fb_map has been reserved as uc- by ioremap().
Then, the ioremap_wc() during starting GDM will only map guest PAT with UC-.

So, with KVM setting WB (no IPAT) to this fb_map range, the effective
memory type is UC- and installer/gdm restarts endlessly.

4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest bochs driver
to call ioremap_wc() instead in bochs_hw_init(), gdm can launch correctly.
(didn't verify the installer's case as I can't update the driver in that 
case).

The reason is that the ioremap_wc() called during starting GDM will no 
longer
meet conflict and can map guest PAT as WC.


Huh.  The upside of this is that it sounds like there's nothing broken with WC
or self-snoop.

Considering a different perspective, the fb_map range is used as frame buffer
(vram), with the guest writing to this range and the host reading from it.
If the issue were related to self-snooping, we would expect the VNC window to
display distorted data. However, the observed behavior is that the GDM window
shows up correctly for a sec and restarts over and over.

So, do you think we can simply fix this issue by calling ioremap_wc() for the
frame buffer/vram range in bochs driver, as is commonly done in other gpu
drivers?

--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
 if (pci_request_region(pdev, 0, "bochs-drm") != 0)
 DRM_WARN("Cannot request framebuffer, boot fb still 
active?\n");

-   bochs->fb_map = ioremap(addr, size);
+   bochs->fb_map = ioremap_wc(addr, size);
 if (bochs->fb_map == NULL) {
 DRM_ERROR("Cannot map framebuffer\n");
 return -ENOMEM;


While this is a fix for future kernels, it doesn't change the result for 
VMs already in existence.


I don't think there's an alternative to putting this behind a quirk.

Paolo




Re: [PATCH net-next 0/3] lan743x: This series of patches are for lan743x driver testing

2024-09-09 Thread Andrew Lunn
> I am currently working on this and would rework as soon as possible.
> The feedback that you provided is highly helpful and I will remodel the 
> implementation with these points in mind.
> Hopefully you can see that in the next version.

Great.

Don't worry too much about link speeds you cannot test yourself. If
your tests happen to fail on a 10G card, i would expect the Maintainer
of the 10G card to debug if its the driver for the card or the test
which is broken, and then help fix the test if its the test.

Andrew



Re: [PATCH v3] virtio_pmem: Check device status before requesting flush

2024-09-09 Thread Pankaj Gupta
+CC MST

> If a pmem device is in a bad status, the driver side could wait for
> host ack forever in virtio_pmem_flush(), causing the system to hang.
>
> So add a status check in the beginning of virtio_pmem_flush() to return
> early if the device is not activated.
>
> Signed-off-by: Philip Chen 

Looks good to me.

Acked-by: Pankaj Gupta  ---
> v3:
> - Fix a typo in the comment (s/acticated/activated/)
>
> v2:
> - Remove change id from the patch description
> - Add more details to the patch description
>
>  drivers/nvdimm/nd_virtio.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> index 35c8fbbba10e..f55d60922b87 100644
> --- a/drivers/nvdimm/nd_virtio.c
> +++ b/drivers/nvdimm/nd_virtio.c
> @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> unsigned long flags;
> int err, err1;
>
> +   /*
> +* Don't bother to submit the request to the device if the device is
> +* not activated.
> +*/
> +   if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
> +   dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> +   return -EIO;
> +   }
> +
> might_sleep();
> req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> if (!req_data)



Re: [PATCH 0/7] net: Use IRQF_NO_AUTOEN flag in request_irq()

2024-09-09 Thread Kalle Valo
Jinjie Ruan  writes:

> As commit cbe16f35bee6 ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")
> said, reqeust_irq() and then disable_irq() is unsafe.
>
> And the code below is subobtimal:
>irq_set_status_flags(irq, IRQ_NOAUTOEN);
>request_irq(dev, irq...);
>
> IRQF_NO_AUTOEN flag can be used by drivers to request_irq(). It prevents
> the automatic enabling of the requested interrupt in the same safe way.
> With that the usage can be simplified and corrected.
>
> Only compile-tested.
>
> Jinjie Ruan (7):
>   net: apple: bmac: Use IRQF_NO_AUTOEN flag in request_irq()
>   net: enetc: Use IRQF_NO_AUTOEN flag in request_irq()
>   nfp: Use IRQF_NO_AUTOEN flag in request_irq()
>   net: ieee802154: mcr20a: Use IRQF_NO_AUTOEN flag in request_irq()
>   wifi: p54: Use IRQF_NO_AUTOEN flag in request_irq()
>   wifi: mwifiex: Use IRQF_NO_AUTOEN flag in request_irq()
>   wifi: wl1251: Use IRQF_NO_AUTOEN flag in request_irq()
>
>  drivers/net/ethernet/apple/bmac.c   | 3 +--
>  drivers/net/ethernet/freescale/enetc/enetc.c| 3 +--
>  drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 5 ++---
>  drivers/net/ieee802154/mcr20a.c | 5 +
>  drivers/net/wireless/intersil/p54/p54spi.c  | 4 +---
>  drivers/net/wireless/marvell/mwifiex/main.c | 4 ++--
>  drivers/net/wireless/ti/wl1251/sdio.c   | 4 ++--
>  7 files changed, 10 insertions(+), 18 deletions(-)

Wireless patches go to wireless-next, please submit them in a separate
patchset.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [PATCH net] selftests: net: csum: Fix checksums for packets with non-zero padding

2024-09-09 Thread Sean Anderson
On 9/6/24 22:05, Willem de Bruijn wrote:
> Sean Anderson wrote:
>> Padding is not included in UDP and TCP checksums. Therefore, reduce the
>> length of the checksummed data to include only the data in the IP
>> payload. This fixes spurious reported checksum failures like
>> 
>> rx: pkt: sport=33000 len=26 csum=0xc850 verify=0xf9fe
>> pkt: bad csum
> 
> Are you using this test as receiver for other input?
> 
> The packet builder in the test doesn't generate these, does it?

It's added by the MAC before transmission.

This is permitted by the standard, but in this case it actually appears
to be due to the MAC using 32-bit reads for the data and not masking off
the end. Not sure whether this is a bug in the driver/device, since
technically we may leak up to 3 bytes of memory.

That said, it would be a nice enhancement to generate packets with
non-zero padding as well, since they are an interesting edge case.

> Just trying to understand if this is a bug fix or a new use for
> csum.c as receiver.

Bug fix.

>> Technically it is possible for there to be trailing bytes after the UDP
>> data but before the Ethernet padding (e.g. if sizeof(ip) + sizeof(udp) +
>> udp.len < ip.len). However, we don't generate such packets.
> 
> More likely is that L3 and L4 length fields agree, as both are
> generated at the sender, but that some trailer is attached in the
> network. Such as a timestamp trailer.

Yes, as noted above we don't generate packets with differing L3 and L4
lengths.

>> Fixes: 91a7de85600d ("selftests/net: add csum offload test")
>> Signed-off-by: Sean Anderson 
>> ---
>> Found while testing for this very bug in hardware checksum offloads.
>> 
>>  tools/testing/selftests/net/lib/csum.c | 16 ++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/testing/selftests/net/lib/csum.c 
>> b/tools/testing/selftests/net/lib/csum.c
>> index b9f3fc3c3426..e0a34e5e8dd5 100644
>> --- a/tools/testing/selftests/net/lib/csum.c
>> +++ b/tools/testing/selftests/net/lib/csum.c
>> @@ -654,10 +654,16 @@ static int recv_verify_packet_ipv4(void *nh, int len)
>>  {
>>  struct iphdr *iph = nh;
>>  uint16_t proto = cfg_encap ? IPPROTO_UDP : cfg_proto;
>> +uint16_t ip_len;
>>  
>>  if (len < sizeof(*iph) || iph->protocol != proto)
>>  return -1;
>>  
>> +ip_len = ntohs(iph->tot_len);
>> +if (ip_len > len || ip_len < sizeof(*iph))
>> +return -1;
>> +
>> +len = ip_len;
>>  iph_addr_p = &iph->saddr;
>>  if (proto == IPPROTO_TCP)
>>  return recv_verify_packet_tcp(iph + 1, len - sizeof(*iph));
>> @@ -669,16 +675,22 @@ static int recv_verify_packet_ipv6(void *nh, int len)
>>  {
>>  struct ipv6hdr *ip6h = nh;
>>  uint16_t proto = cfg_encap ? IPPROTO_UDP : cfg_proto;
>> +uint16_t ip_len;
> 
> nit: payload_len, as it never includes sizeof ipv6hdr

OK

--Sean

>>  if (len < sizeof(*ip6h) || ip6h->nexthdr != proto)
>>  return -1;
>>  
>> +ip_len = ntohs(ip6h->payload_len);
>> +if (ip_len > len - sizeof(*ip6h))
>> +return -1;
>> +
>> +len = ip_len;
>>  iph_addr_p = &ip6h->saddr;
>>  
>>  if (proto == IPPROTO_TCP)
>> -return recv_verify_packet_tcp(ip6h + 1, len - sizeof(*ip6h));
>> +return recv_verify_packet_tcp(ip6h + 1, len);
>>  else
>> -return recv_verify_packet_udp(ip6h + 1, len - sizeof(*ip6h));
>> +return recv_verify_packet_udp(ip6h + 1, len);
>>  }
>>  
>>  /* return whether auxdata includes TP_STATUS_CSUM_VALID */
>> -- 
>> 2.35.1.1320.gc452695387.dirty
>> 
> 
> 



Re: [PATCH net] selftests: net: csum: Fix checksums for packets with non-zero padding

2024-09-09 Thread Willem de Bruijn
Sean Anderson wrote:
> On 9/6/24 22:05, Willem de Bruijn wrote:
> > Sean Anderson wrote:
> >> Padding is not included in UDP and TCP checksums. Therefore, reduce the
> >> length of the checksummed data to include only the data in the IP
> >> payload. This fixes spurious reported checksum failures like
> >> 
> >> rx: pkt: sport=33000 len=26 csum=0xc850 verify=0xf9fe
> >> pkt: bad csum
> > 
> > Are you using this test as receiver for other input?
> > 
> > The packet builder in the test doesn't generate these, does it?
> 
> It's added by the MAC before transmission.
> 
> This is permitted by the standard, but in this case it actually appears
> to be due to the MAC using 32-bit reads for the data and not masking off
> the end. Not sure whether this is a bug in the driver/device, since
> technically we may leak up to 3 bytes of memory.
> 
> That said, it would be a nice enhancement to generate packets with
> non-zero padding as well, since they are an interesting edge case.

Thanks for that context.
 
> > Just trying to understand if this is a bug fix or a new use for
> > csum.c as receiver.
> 
> Bug fix.
> 
> >> Technically it is possible for there to be trailing bytes after the UDP
> >> data but before the Ethernet padding (e.g. if sizeof(ip) + sizeof(udp) +
> >> udp.len < ip.len). However, we don't generate such packets.
> > 
> > More likely is that L3 and L4 length fields agree, as both are
> > generated at the sender, but that some trailer is attached in the
> > network. Such as a timestamp trailer.
> 
> Yes, as noted above we don't generate packets with differing L3 and L4
> lengths.
> 
> >> Fixes: 91a7de85600d ("selftests/net: add csum offload test")
> >> Signed-off-by: Sean Anderson 

Reviewed-by: Willem de Bruijn 

> >> ---
> >> Found while testing for this very bug in hardware checksum offloads.
> >> 
> >>  tools/testing/selftests/net/lib/csum.c | 16 ++--
> >>  1 file changed, 14 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/tools/testing/selftests/net/lib/csum.c 
> >> b/tools/testing/selftests/net/lib/csum.c
> >> index b9f3fc3c3426..e0a34e5e8dd5 100644
> >> --- a/tools/testing/selftests/net/lib/csum.c
> >> +++ b/tools/testing/selftests/net/lib/csum.c
> >> @@ -654,10 +654,16 @@ static int recv_verify_packet_ipv4(void *nh, int len)
> >>  {
> >>struct iphdr *iph = nh;
> >>uint16_t proto = cfg_encap ? IPPROTO_UDP : cfg_proto;
> >> +  uint16_t ip_len;
> >>  
> >>if (len < sizeof(*iph) || iph->protocol != proto)
> >>return -1;
> >>  
> >> +  ip_len = ntohs(iph->tot_len);
> >> +  if (ip_len > len || ip_len < sizeof(*iph))
> >> +  return -1;
> >> +
> >> +  len = ip_len;
> >>iph_addr_p = &iph->saddr;
> >>if (proto == IPPROTO_TCP)
> >>return recv_verify_packet_tcp(iph + 1, len - sizeof(*iph));
> >> @@ -669,16 +675,22 @@ static int recv_verify_packet_ipv6(void *nh, int len)
> >>  {
> >>struct ipv6hdr *ip6h = nh;
> >>uint16_t proto = cfg_encap ? IPPROTO_UDP : cfg_proto;
> >> +  uint16_t ip_len;
> > 
> > nit: payload_len, as it never includes sizeof ipv6hdr
> 
> OK
> 
> --Sean
> 
> >>if (len < sizeof(*ip6h) || ip6h->nexthdr != proto)
> >>return -1;
> >>  
> >> +  ip_len = ntohs(ip6h->payload_len);
> >> +  if (ip_len > len - sizeof(*ip6h))
> >> +  return -1;
> >> +
> >> +  len = ip_len;
> >>iph_addr_p = &ip6h->saddr;
> >>  
> >>if (proto == IPPROTO_TCP)
> >> -  return recv_verify_packet_tcp(ip6h + 1, len - sizeof(*ip6h));
> >> +  return recv_verify_packet_tcp(ip6h + 1, len);
> >>else
> >> -  return recv_verify_packet_udp(ip6h + 1, len - sizeof(*ip6h));
> >> +  return recv_verify_packet_udp(ip6h + 1, len);
> >>  }
> >>  
> >>  /* return whether auxdata includes TP_STATUS_CSUM_VALID */
> >> -- 
> >> 2.35.1.1320.gc452695387.dirty
> >> 
> > 
> > 





Re: [PATCH net] selftests: net: csum: Fix checksums for packets with non-zero padding

2024-09-09 Thread Eric Dumazet
On Mon, Sep 9, 2024 at 5:02 PM Sean Anderson  wrote:
>
> On 9/6/24 22:05, Willem de Bruijn wrote:
> > Sean Anderson wrote:
> >> Padding is not included in UDP and TCP checksums. Therefore, reduce the
> >> length of the checksummed data to include only the data in the IP
> >> payload. This fixes spurious reported checksum failures like
> >>
> >> rx: pkt: sport=33000 len=26 csum=0xc850 verify=0xf9fe
> >> pkt: bad csum
> >
> > Are you using this test as receiver for other input?
> >
> > The packet builder in the test doesn't generate these, does it?
>
> It's added by the MAC before transmission.
>
> This is permitted by the standard, but in this case it actually appears
> to be due to the MAC using 32-bit reads for the data and not masking off
> the end. Not sure whether this is a bug in the driver/device, since
> technically we may leak up to 3 bytes of memory.

This seems to be a bug in the driver.

A call to skb_put_padto(skb, ETH_ZLEN) should be added.

>
> That said, it would be a nice enhancement to generate packets with
> non-zero padding as well, since they are an interesting edge case.
>
> > Just trying to understand if this is a bug fix or a new use for
> > csum.c as receiver.
>
> Bug fix.
>
> >> Technically it is possible for there to be trailing bytes after the UDP
> >> data but before the Ethernet padding (e.g. if sizeof(ip) + sizeof(udp) +
> >> udp.len < ip.len). However, we don't generate such packets.
> >
> > More likely is that L3 and L4 length fields agree, as both are
> > generated at the sender, but that some trailer is attached in the
> > network. Such as a timestamp trailer.
>
> Yes, as noted above we don't generate packets with differing L3 and L4
> lengths.
>
> >> Fixes: 91a7de85600d ("selftests/net: add csum offload test")
> >> Signed-off-by: Sean Anderson 
> >> ---
> >> Found while testing for this very bug in hardware checksum offloads.
> >>
> >>  tools/testing/selftests/net/lib/csum.c | 16 ++--
> >>  1 file changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/testing/selftests/net/lib/csum.c 
> >> b/tools/testing/selftests/net/lib/csum.c
> >> index b9f3fc3c3426..e0a34e5e8dd5 100644
> >> --- a/tools/testing/selftests/net/lib/csum.c
> >> +++ b/tools/testing/selftests/net/lib/csum.c
> >> @@ -654,10 +654,16 @@ static int recv_verify_packet_ipv4(void *nh, int len)
> >>  {
> >>  struct iphdr *iph = nh;
> >>  uint16_t proto = cfg_encap ? IPPROTO_UDP : cfg_proto;
> >> +uint16_t ip_len;
> >>
> >>  if (len < sizeof(*iph) || iph->protocol != proto)
> >>  return -1;
> >>
> >> +ip_len = ntohs(iph->tot_len);
> >> +if (ip_len > len || ip_len < sizeof(*iph))
> >> +return -1;
> >> +
> >> +len = ip_len;
> >>  iph_addr_p = &iph->saddr;
> >>  if (proto == IPPROTO_TCP)
> >>  return recv_verify_packet_tcp(iph + 1, len - sizeof(*iph));
> >> @@ -669,16 +675,22 @@ static int recv_verify_packet_ipv6(void *nh, int len)
> >>  {
> >>  struct ipv6hdr *ip6h = nh;
> >>  uint16_t proto = cfg_encap ? IPPROTO_UDP : cfg_proto;
> >> +uint16_t ip_len;
> >
> > nit: payload_len, as it never includes sizeof ipv6hdr
>
> OK
>
> --Sean
>
> >>  if (len < sizeof(*ip6h) || ip6h->nexthdr != proto)
> >>  return -1;
> >>
> >> +ip_len = ntohs(ip6h->payload_len);
> >> +if (ip_len > len - sizeof(*ip6h))
> >> +return -1;
> >> +
> >> +len = ip_len;
> >>  iph_addr_p = &ip6h->saddr;
> >>
> >>  if (proto == IPPROTO_TCP)
> >> -return recv_verify_packet_tcp(ip6h + 1, len - sizeof(*ip6h));
> >> +return recv_verify_packet_tcp(ip6h + 1, len);
> >>  else
> >> -return recv_verify_packet_udp(ip6h + 1, len - sizeof(*ip6h));
> >> +return recv_verify_packet_udp(ip6h + 1, len);
> >>  }
> >>
> >>  /* return whether auxdata includes TP_STATUS_CSUM_VALID */
> >> --
> >> 2.35.1.1320.gc452695387.dirty
> >>
> >
> >



Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-09 Thread Sean Christopherson
On Mon, Sep 09, 2024, Paolo Bonzini wrote:
> On 9/9/24 07:30, Yan Zhao wrote:
> > On Thu, Sep 05, 2024 at 05:43:17PM +0800, Yan Zhao wrote:
> > > On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:
> > > > On Wed, Sep 04, 2024, Yan Zhao wrote:
> > > > > On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> > > > > > On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > > > > > > Sean Christopherson  writes:
> > > > > > > 
> > > > > > > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > > > > > > > > FWIW, I use QEMU-9.0 from the same C10S 
> > > > > > > > > (qemu-kvm-9.0.0-7.el10.x86_64)
> > > > > > > > > but I don't think it matters in this case. My CPU is 
> > > > > > > > > "Intel(R) Xeon(R)
> > > > > > > > > Silver 4410Y".
> > > > > > > > 
> > > > > > > > Has this been reproduced on any other hardware besides SPR?  
> > > > > > > > I.e. did we stumble
> > > > > > > > on another hardware issue?
> > > > > > > 
> > > > > > > Very possible, as according to Yan Zhao this doesn't reproduce on 
> > > > > > > at
> > > > > > > least "Coffee Lake-S". Let me try to grab some random hardware 
> > > > > > > around
> > > > > > > and I'll be back with my observations.
> > > > > > 
> > > > > > Update some new findings from my side:
> > > > > > 
> > > > > > BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys 
> > > > > > range
> > > > > > from 0xfd00 to 0xfe00.
> > > > > > 
> > > > > > On "Sapphire Rapids XCC":
> > > > > > 
> > > > > > 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can 
> > > > > > launch
> > > > > > correctly.
> > > > > > i.e.
> > > > > > if (gfn >= 0xfd000 && gfn < 0xfe000) {
> > > > > > return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | 
> > > > > > VMX_EPT_IPAT_BIT;
> > > > > > }
> > > > > > return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > > > > 
> > > > > > 2. If KVM forces this fb_map range to be UC+IPAT, installer failes 
> > > > > > to show / gdm
> > > > > > restarts endlessly. (though on Coffee Lake-S, installer/gdm can 
> > > > > > launch
> > > > > > correctly in this case).
> > > > > > 
> > > > > > 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is 
> > > > > > called to set
> > > > > > this fb_map range as WC, with
> > > > > > iosys_map_set_vaddr_iomem(&iter_io->dmap, 
> > > > > > ioremap_wc(mem->bus.offset, mem->size));
> > > > > > 
> > > > > > However, during 
> > > > > > bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), pfns for
> > > > > > this fb_map has been reserved as uc- by ioremap().
> > > > > > Then, the ioremap_wc() during starting GDM will only map guest 
> > > > > > PAT with UC-.
> > > > > > 
> > > > > > So, with KVM setting WB (no IPAT) to this fb_map range, the 
> > > > > > effective
> > > > > > memory type is UC- and installer/gdm restarts endlessly.
> > > > > > 
> > > > > > 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest 
> > > > > > bochs driver
> > > > > > to call ioremap_wc() instead in bochs_hw_init(), gdm can launch 
> > > > > > correctly.
> > > > > > (didn't verify the installer's case as I can't update the 
> > > > > > driver in that case).
> > > > > > 
> > > > > > The reason is that the ioremap_wc() called during starting GDM 
> > > > > > will no longer
> > > > > > meet conflict and can map guest PAT as WC.
> > > > 
> > > > Huh.  The upside of this is that it sounds like there's nothing broken 
> > > > with WC
> > > > or self-snoop.
> > > Considering a different perspective, the fb_map range is used as frame 
> > > buffer
> > > (vram), with the guest writing to this range and the host reading from it.
> > > If the issue were related to self-snooping, we would expect the VNC 
> > > window to
> > > display distorted data. However, the observed behavior is that the GDM 
> > > window
> > > shows up correctly for a sec and restarts over and over.
> > > 
> > > So, do you think we can simply fix this issue by calling ioremap_wc() for 
> > > the
> > > frame buffer/vram range in bochs driver, as is commonly done in other gpu
> > > drivers?
> > > 
> > > --- a/drivers/gpu/drm/tiny/bochs.c
> > > +++ b/drivers/gpu/drm/tiny/bochs.c
> > > @@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
> > >  if (pci_request_region(pdev, 0, "bochs-drm") != 0)
> > >  DRM_WARN("Cannot request framebuffer, boot fb still 
> > > active?\n");
> > > 
> > > -   bochs->fb_map = ioremap(addr, size);
> > > +   bochs->fb_map = ioremap_wc(addr, size);
> > >  if (bochs->fb_map == NULL) {
> > >  DRM_ERROR("Cannot map framebuffer\n");
> > >  return -ENOMEM;
> 
> While this is a fix for future kernels, it doesn't change the result for VMs
> already in existence.

I would prefer to bottom out on exactly whether or not the SPR/CLX behavior is
working as intended.  Maybe the ~8x slowdown is just a side effect of any Intel

Re: [PATCH 06/15] riscv: migrate to the generic rule for built-in DTB

2024-09-09 Thread Conor Dooley
On Thu, Sep 05, 2024 at 08:47:42AM +0900, Masahiro Yamada wrote:
> Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.
> 
> To keep consistency across architectures, this commit also renames
> CONFIG_BUILTIN_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.
> 
> Signed-off-by: Masahiro Yamada 

Acked-by: Conor Dooley 


signature.asc
Description: PGP signature


Re: [PATCH net] selftests: net: csum: Fix checksums for packets with non-zero padding

2024-09-09 Thread Willem de Bruijn
Eric Dumazet wrote:
> On Mon, Sep 9, 2024 at 5:02 PM Sean Anderson  wrote:
> >
> > On 9/6/24 22:05, Willem de Bruijn wrote:
> > > Sean Anderson wrote:
> > >> Padding is not included in UDP and TCP checksums. Therefore, reduce the
> > >> length of the checksummed data to include only the data in the IP
> > >> payload. This fixes spurious reported checksum failures like
> > >>
> > >> rx: pkt: sport=33000 len=26 csum=0xc850 verify=0xf9fe
> > >> pkt: bad csum
> > >
> > > Are you using this test as receiver for other input?
> > >
> > > The packet builder in the test doesn't generate these, does it?
> >
> > It's added by the MAC before transmission.
> >
> > This is permitted by the standard, but in this case it actually appears
> > to be due to the MAC using 32-bit reads for the data and not masking off
> > the end. Not sure whether this is a bug in the driver/device, since
> > technically we may leak up to 3 bytes of memory.
> 
> This seems to be a bug in the driver.
> 
> A call to skb_put_padto(skb, ETH_ZLEN) should be added.

In which case this test detecting it may be nice to have, for lack of
a more targeted test.



Re: [PATCH net-next] page_pool: add a test module for page_pool

2024-09-09 Thread Mina Almasry
On Mon, Sep 9, 2024 at 2:25 AM Yunsheng Lin  wrote:
>
> The testing is done by ensuring that the page allocated from
> the page_pool instance is pushed into a ptr_ring instance in
> a kthread/napi binded to a specified cpu, and a kthread/napi
> binded to a specified cpu will pop the page from the ptr_ring
> and free it back to the page_pool.
>
> Signed-off-by: Yunsheng Lin 

It seems this test is has a correctness part and a performance part.
For the performance test, Jesper has out of tree tests for the
page_pool:
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c

I have these rebased on top of net-next and use them to verify devmem
& memory-provider performance:
https://github.com/mina/linux/commit/07fd1c04591395d15d83c07298b4d37f6b56157f

My preference here (for the performance part) is to upstream the
out-of-tree tests that Jesper (and probably others) are using, rather
than adding a new performance test that is not as battle-hardened.

--
Thanks,
Mina



Re: [PATCH v6 1/2] selftests: Rename sigaltstack to generic signal

2024-09-09 Thread Shuah Khan

On 9/8/24 23:16, Dev Jain wrote:


On 9/7/24 01:29, Shuah Khan wrote:

On 9/4/24 23:56, Dev Jain wrote:


On 9/4/24 22:35, Shuah Khan wrote:

On 9/3/24 22:52, Dev Jain wrote:


On 9/4/24 03:14, Shuah Khan wrote:

On 8/30/24 10:29, Dev Jain wrote:


On 8/27/24 17:16, Dev Jain wrote:


On 8/27/24 17:14, Shuah Khan wrote:

On 8/22/24 06:14, Dev Jain wrote:

Rename sigaltstack to generic signal directory, to allow adding more
signal tests in the future.


Sorry - I think I mentioned I don't like this test renamed. Why are you sending
this rename still included in the patch series?


I am not renaming the test, just the directory. The directory name
is changed to signal, and I have retained the name of the test -
sas.c.


Gentle ping: I guess there was a misunderstanding; in v5, I was
also changing the name of the test, to which you objected, and
I agreed. But, we need to change the name of the directory since
the new test has no relation to the current directory name,
"sigaltstack". The patch description explains that the directory
should be generically named.



Right. You are no longer changing the test name. You are still
changing the directory name. The problem I mentioned stays the
same. Any fixes to the existing tests in this directory can no
longer auto applied to stables releases.


I understand your point, but commit baa489fabd01 (selftests/vm: rename
selftests/vm to selftests/mm) is also present. That was a lot bigger change;
sigaltstack contains just one test currently, whose fixes possibly would have
to be backported, so I guess it should not be that much of a big problem?





So who does the backports whenevenr something changes? You are adding
work where as the automated process would just work without this
change. It doesn't matter if there is another test that changed
the name.


Other than the desire to rename the directory to generic, what
other value does this change bring?


Do you have an alternative suggestion as to where I should put my new test then;
I do not see what is the value of creating another directory to just include
my test. This will unnecessarily clutter the selftests/ directory with
directories containing single tests. And, putting this in "sigaltstack" is just
wrong since this test has no relation with sigaltstack.



If this new test has no relation to sigaltstack, then why are you changing
and renaming the sigaltstack directory?


Because the functionality I am testing is of signals, and signals are a superset
of sigaltstack. Still, I can think of a compromise, if semantically you want to
consider the new test as not testing signals, but a specific syscall "sigaction"
and its interaction with blocking of signals, how about naming the new directory 
"sigaction"?

Adding a new directory is much better
than going down a path that is more confusing and adding backport overhead.



Okay - they are related except that you view signalstack as a subset
of signals. I saw Mark's response as well saying sigaction isn't
a good name for this.

Rename usually wipe out git history as well based on what have seen
in the past.

My main concern is backports. Considering sigstack hasn't changed
2021 (as Mark's email), let's rename it.

I am reluctantly agreeing to the rename as it seems to make sense
in this case.


Thanks! I guess there is no update required from my side, and you can
pull this series?




I can pull this with x86v maintainer ack.

Or to go through x86 tree:

Acked-by: Shuah Khan 

thanks,
-- Shuah




[PATCH RFC 0/3] Verify bias functionality for pinctrl_paris driver through new gpio test

2024-09-09 Thread Nícolas F . R . A . Prado
This series was motivated by the regression fixed by 166bf8af9122
("pinctrl: mediatek: common-v2: Fix broken bias-disable for
PULL_PU_PD_RSEL_TYPE"). A bug was introduced in the pinctrl_paris driver
which prevented certain pins from having their bias configured.

Running this test on the mt8195-tomato platform with the test plan
included below[1] shows the test passing with the fix applied, but failing
without the fix:

With fix:
  $ ./gpio-setget-config.py
  TAP version 13
  # Using test plan file: ./google,tomato.yaml
  1..3
  ok 1 pinctrl_paris.34.pull-up
  ok 2 pinctrl_paris.34.pull-down
  ok 3 pinctrl_paris.34.disabled
  # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Without fix:
  $ ./gpio-setget-config.py
  TAP version 13
  # Using test plan file: ./google,tomato.yaml
  1..3
  # Bias doesn't match: Expected pull-up, read pull-down.
  not ok 1 pinctrl_paris.34.pull-up
  ok 2 pinctrl_paris.34.pull-down
  # Bias doesn't match: Expected disabled, read pull-down.
  not ok 3 pinctrl_paris.34.disabled
  # Totals: pass:1 fail:2 xfail:0 xpass:0 skip:0 error:0

In order to achieve this, the first patch exposes bias configuration
through the GPIO API in the pinctrl_paris driver, patch 2 extends the
gpio-mockup-cdev utility for use by patch 3, and patch 3 introduces a
new GPIO kselftest that takes a test plan in YAML, which can be tailored
per-platform to specify the configurations to test, and sets and gets
back each pin configuration to verify that they match and thus that the
driver is behaving as expected.

Since the GPIO uAPI only allows setting the pin configuration, getting
it back is done through pinconf-pins in the pinctrl debugfs folder.

The test currently only verifies bias but it would be easy to extend to
verify other pin configurations.

The test plan YAML file can be customized for each use-case and is
platform-dependant. For that reason, only an example is included in
patch 3 and the user is supposed to provide their test plan. That said,
the aim is to collect test plans for ease of use at [2].

[1] This is the test plan used for mt8195-tomato:

- label: "pinctrl_paris"
  tests:
  # Pin 34 has type MTK_PULL_PU_PD_RSEL_TYPE and is unused.
  # Setting bias to MTK_PULL_PU_PD_RSEL_TYPE pins was fixed by
  # 166bf8af9122 ("pinctrl: mediatek: common-v2: Fix broken bias-disable for 
PULL_PU_PD_RSEL_TYPE")
  - pin: 34
bias: "pull-up"
  - pin: 34
bias: "pull-down"
  - pin: 34
bias: "disabled"

[2] https://github.com/kernelci/platform-test-parameters

Signed-off-by: Nícolas F. R. A. Prado 
---
Nícolas F. R. A. Prado (3):
  pinctrl: mediatek: paris: Expose more configurations to GPIO set_config
  selftest: gpio: Add wait flag to gpio-mockup-cdev
  selftest: gpio: Add a new set-get config test

 drivers/pinctrl/mediatek/pinctrl-paris.c   |  20 +--
 tools/testing/selftests/gpio/Makefile  |   2 +-
 tools/testing/selftests/gpio/gpio-mockup-cdev.c|  14 +-
 .../gpio-set-get-config-example-test-plan.yaml |  15 ++
 .../testing/selftests/gpio/gpio-set-get-config.py  | 183 +
 5 files changed, 220 insertions(+), 14 deletions(-)
---
base-commit: 6a7917c89f219f09b1d88d09f376000914a52763
change-id: 20240906-kselftest-gpio-set-get-config-6e5bb670c1a5

Best regards,
-- 
Nícolas F. R. A. Prado 




[PATCH RFC 1/3] pinctrl: mediatek: paris: Expose more configurations to GPIO set_config

2024-09-09 Thread Nícolas F . R . A . Prado
Currently the set_config callback in the gpio_chip registered by the
pinctrl_paris driver only supports PIN_CONFIG_INPUT_DEBOUNCE, despite
many other configurations already being implemented and available
through the pinctrl API for configuration of pins by the Devicetree and
other drivers.

Expose all configurations currently implemented through the GPIO API so
they can also be set from userspace, which is particularly useful to
allow testing them from userspace.

Signed-off-by: Nícolas F. R. A. Prado 
---
 drivers/pinctrl/mediatek/pinctrl-paris.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/pinctrl/mediatek/pinctrl-paris.c 
b/drivers/pinctrl/mediatek/pinctrl-paris.c
index e12316c42698..668f8055a544 100644
--- a/drivers/pinctrl/mediatek/pinctrl-paris.c
+++ b/drivers/pinctrl/mediatek/pinctrl-paris.c
@@ -255,10 +255,9 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev,
return err;
 }
 
-static int mtk_pinconf_set(struct pinctrl_dev *pctldev, unsigned int pin,
+static int mtk_pinconf_set(struct mtk_pinctrl *hw, unsigned int pin,
   enum pin_config_param param, u32 arg)
 {
-   struct mtk_pinctrl *hw = pinctrl_dev_get_drvdata(pctldev);
const struct mtk_pin_desc *desc;
int err = -ENOTSUPP;
u32 reg;
@@ -795,7 +794,7 @@ static int mtk_pconf_group_set(struct pinctrl_dev *pctldev, 
unsigned group,
int i, ret;
 
for (i = 0; i < num_configs; i++) {
-   ret = mtk_pinconf_set(pctldev, grp->pin,
+   ret = mtk_pinconf_set(hw, grp->pin,
  pinconf_to_config_param(configs[i]),
  pinconf_to_config_argument(configs[i]));
if (ret < 0)
@@ -937,18 +936,19 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, 
unsigned int offset,
 {
struct mtk_pinctrl *hw = gpiochip_get_data(chip);
const struct mtk_pin_desc *desc;
-   u32 debounce;
+   enum pin_config_param param = pinconf_to_config_param(config);
+   u32 arg = pinconf_to_config_argument(config);
 
desc = (const struct mtk_pin_desc *)&hw->soc->pins[offset];
 
-   if (!hw->eint ||
-   pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE ||
-   desc->eint.eint_n == EINT_NA)
-   return -ENOTSUPP;
+   if (param == PIN_CONFIG_INPUT_DEBOUNCE) {
+   if (!hw->eint || desc->eint.eint_n == EINT_NA)
+   return -ENOTSUPP;
 
-   debounce = pinconf_to_config_argument(config);
+   return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, arg);
+   }
 
-   return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, debounce);
+   return mtk_pinconf_set(hw, offset, param, arg);
 }
 
 static int mtk_build_gpiochip(struct mtk_pinctrl *hw)

-- 
2.46.0




[PATCH RFC 2/3] selftest: gpio: Add wait flag to gpio-mockup-cdev

2024-09-09 Thread Nícolas F . R . A . Prado
Add a -w flag to the gpio-mockup-cdev utility that causes the program to
wait until a signal is received before exiting, even when its behavior
is to retrieve the GPIO value of the line. This allows using this
utility to keep a GPIO line configured even when in input mode, which
will be relied on in other tests.

Signed-off-by: Nícolas F. R. A. Prado 
---
 tools/testing/selftests/gpio/gpio-mockup-cdev.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/gpio/gpio-mockup-cdev.c 
b/tools/testing/selftests/gpio/gpio-mockup-cdev.c
index d1640f44f8ac..f674dcafa60a 100644
--- a/tools/testing/selftests/gpio/gpio-mockup-cdev.c
+++ b/tools/testing/selftests/gpio/gpio-mockup-cdev.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CONSUMER   "gpio-mockup-cdev"
 
@@ -95,6 +96,7 @@ static void usage(char *prog)
printf("   (default is to leave bias unchanged):\n");
printf("-l: set line active low (default is active high)\n");
printf("-s: set line value (default is to get line value)\n");
+   printf("-w: wait even in get mode\n");
printf("-u: uAPI version to use (default is 2)\n");
exit(-1);
 }
@@ -120,13 +122,14 @@ int main(int argc, char *argv[])
unsigned int offset, val = 0, abiv;
uint32_t flags_v1;
uint64_t flags_v2;
+   bool wait = false;
 
abiv = 2;
ret = 0;
flags_v1 = GPIOHANDLE_REQUEST_INPUT;
flags_v2 = GPIO_V2_LINE_FLAG_INPUT;
 
-   while ((opt = getopt(argc, argv, "lb:s:u:")) != -1) {
+   while ((opt = getopt(argc, argv, "lb:s:u:w")) != -1) {
switch (opt) {
case 'l':
flags_v1 |= GPIOHANDLE_REQUEST_ACTIVE_LOW;
@@ -150,10 +153,14 @@ int main(int argc, char *argv[])
flags_v1 |= GPIOHANDLE_REQUEST_OUTPUT;
flags_v2 &= ~GPIO_V2_LINE_FLAG_INPUT;
flags_v2 |= GPIO_V2_LINE_FLAG_OUTPUT;
+   wait = true;
break;
case 'u':
abiv = atoi(optarg);
break;
+   case 'w':
+   wait = true;
+   break;
default:
usage(argv[0]);
}
@@ -183,9 +190,10 @@ int main(int argc, char *argv[])
return lfd;
}
 
-   if (flags_v2 & GPIO_V2_LINE_FLAG_OUTPUT) {
+   if (wait)
wait_signal();
-   } else {
+
+   if (flags_v2 & GPIO_V2_LINE_FLAG_INPUT) {
if (abiv == 1)
ret = get_value_v1(lfd);
else

-- 
2.46.0




[PATCH RFC 3/3] selftest: gpio: Add a new set-get config test

2024-09-09 Thread Nícolas F . R . A . Prado
Add a new kselftest that sets a configuration to a GPIO line and then
gets it back to verify that it was correctly carried out by the driver.

Setting a configuration is done through the GPIO uAPI, but retrieving it
is done through the debugfs interface since that is the only place where
it can be retrieved from userspace.

The test reads the test plan from a YAML file, which includes the chips
and pin settings to set and validate.

Signed-off-by: Nícolas F. R. A. Prado 
---
 tools/testing/selftests/gpio/Makefile  |   2 +-
 .../gpio-set-get-config-example-test-plan.yaml |  15 ++
 .../testing/selftests/gpio/gpio-set-get-config.py  | 183 +
 3 files changed, 199 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/gpio/Makefile 
b/tools/testing/selftests/gpio/Makefile
index e0884390447d..bdfeb0c9aadd 100644
--- a/tools/testing/selftests/gpio/Makefile
+++ b/tools/testing/selftests/gpio/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 
-TEST_PROGS := gpio-mockup.sh gpio-sim.sh
+TEST_PROGS := gpio-mockup.sh gpio-sim.sh gpio-set-get-config.py
 TEST_FILES := gpio-mockup-sysfs.sh
 TEST_GEN_PROGS_EXTENDED := gpio-mockup-cdev gpio-chip-info gpio-line-name
 CFLAGS += -O2 -g -Wall $(KHDR_INCLUDES)
diff --git 
a/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml 
b/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml
new file mode 100644
index ..3b749be3c8dc
--- /dev/null
+++ b/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+# Top-level contains a list of the GPIO chips that will be tested. Each one is
+# chosen based on the GPIO chip's info label.
+- label: "gpiochip_device_label"
+  # For each GPIO chip, multiple pin configurations can be tested, which are
+  # listed under 'tests'
+  tests:
+  # pin indicates the pin number to test
+  - pin: 34
+# bias can be 'pull-up', 'pull-down', 'disabled'
+bias: "pull-up"
+  - pin: 34
+bias: "pull-down"
+  - pin: 34
+bias: "disabled"
diff --git a/tools/testing/selftests/gpio/gpio-set-get-config.py 
b/tools/testing/selftests/gpio/gpio-set-get-config.py
new file mode 100755
index ..6f1444c8d46b
--- /dev/null
+++ b/tools/testing/selftests/gpio/gpio-set-get-config.py
@@ -0,0 +1,183 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Collabora Ltd
+
+#
+# This test validates GPIO pin configuration. It takes a test plan in YAML (see
+# gpio-set-get-config-example-test-plan.yaml) and sets and gets back each pin
+# configuration described in the plan and checks that they match in order to
+# validate that they are being applied correctly.
+#
+# When the file name for the test plan is not provided through --test-plan, it
+# will be guessed based on the platform ID (DT compatible or DMI).
+#
+
+import time
+import os
+import sys
+import argparse
+import re
+import subprocess
+import glob
+import signal
+
+import yaml
+
+# Allow ksft module to be imported from different directory
+this_dir = os.path.dirname(os.path.realpath(__file__))
+sys.path.append(os.path.join(this_dir, "../kselftest/"))
+
+import ksft
+
+
+def config_pin(chip_dev, pin_config):
+flags = []
+if pin_config.get("bias"):
+flags += f"-b {pin_config['bias']}".split()
+flags += ["-w", chip_dev, str(pin_config["pin"])]
+gpio_mockup_cdev_path = os.path.join(this_dir, "gpio-mockup-cdev")
+return subprocess.Popen([gpio_mockup_cdev_path] + flags)
+
+
+def get_bias_debugfs(chip_debugfs_path, pin):
+with open(os.path.join(chip_debugfs_path, "pinconf-pins")) as f:
+for l in f:
+m = re.match(rf"pin {pin}.*bias (?P(pull )?\w+)", l)
+if m:
+return m.group("bias")
+
+
+def check_config_pin(chip, chip_debugfs_dir, pin_config):
+test_passed = True
+
+if pin_config.get("bias"):
+bias = get_bias_debugfs(chip_debugfs_dir, pin_config["pin"])
+# Convert "pull up" / "pull down" to "pull-up" / "pull-down"
+bias = bias.replace(" ", "-")
+if bias != pin_config["bias"]:
+ksft.print_msg(
+f"Bias doesn't match: Expected {pin_config['bias']}, read 
{bias}."
+)
+test_passed = False
+
+ksft.test_result(
+test_passed,
+f"{chip['label']}.{pin_config['pin']}.{pin_config['bias']}",
+)
+
+
+def get_devfs_chip_file(chip_dict):
+gpio_chip_info_path = os.path.join(this_dir, 'gpio-chip-info')
+for f in glob.glob("/dev/gpiochip*"):
+proc = subprocess.run(
+f"{gpio_chip_info_path} {f} label".split(), capture_output=True, 
text=True
+)
+if proc.returncode:
+ksft.print_msg(f"Error opening gpio device {f}: {proc.returncode}")
+ksft.exit_fail()
+
+if chip_dict["label"] in proc.stdout:
+return f
+
+
+def get_debugfs_chip_dir(chip):
+pinct

Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize performance and improve readability

2024-09-09 Thread Thomas Gleixner
On Wed, Sep 04 2024 at 21:41, Zhen Lei wrote:

> Currently, there are multiple instances where several nodes are extracted
> from one list and added to another list. One by one extraction, and then
> one by one splicing, not only low efficiency, readability is also poor.
> The work can be done well with hlist_cut_number() and hlist_splice_init(),
> which move the entire sublist at once.
>
> When the number of nodes expected to be moved is less than or equal to 0,
> or the source list is empty, hlist_cut_number() safely returns 0. The
> splicing is performed only when the return value of hlist_cut_number() is
> greater than 0.
>
> For two calls to hlist_cut_number() in __free_object(), the result is
> obviously positive, the check of the return value is omitted.

Sure but hlist_cut_number() suffers from the same problem as the current
code. If is a massive cache line chase as you actually have to walk the
list to figure out where to cut it off.

All related functions have this problem and all of this code is very
strict about boundaries. Instead of accurately doing the refill, purge
etc. we should look into proper batch mode mechanisms. Let me think
about it.

Thanks,

tglx





Re: [PATCH] i2c: virtio: Constify struct i2c_algorithm and struct virtio_device_id

2024-09-09 Thread Andi Shyti
Hi Christophe,

On Sun, Sep 08, 2024 at 08:52:07AM GMT, Christophe JAILLET wrote:
> 'struct i2c_algorithm' and 'struct virtio_device_id' are not modified in
> this driver.
> 
> Constifying this structure moves some data to a read-only section, so
> increase overall security, especially when the structure holds some
> function pointers, which is the case for struct i2c_algorithm.
> 
> On a x86_64, with allmodconfig:
> Before:
> ==
>text  data bss dec hex filename
>6663   568  1672471c4f drivers/i2c/busses/i2c-virtio.o
> 
> After:
> =
>text  data bss dec hex filename
>6735   472  1672231c37 drivers/i2c/busses/i2c-virtio.o
> 
> Signed-off-by: Christophe JAILLET 
> --
> Compile tested only

Makes sense to me... if this works, you could pioneer a sequence
of simiar changes :-)

Merged to i2c/i2c-host.

Thanks,
Andi



[PATCH net v3 0/2] bpf: devmap: provide rxq after redirect

2024-09-09 Thread Florian Kauer
rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.

Add bugfix and related selftest.

Signed-off-by: Florian Kauer 
---
Changes in v3:
- initialize skel to NULL, thanks Stanislav
- Link to v2: 
https://lore.kernel.org/r/20240906-devel-koalo-fix-ingress-ifindex-v2-0-4caa12c64...@linutronix.de

Changes in v2:
- changed fixes tag
- added selftest
- Link to v1: 
https://lore.kernel.org/r/20240905-devel-koalo-fix-ingress-ifindex-v1-1-d12a0d74c...@linutronix.de

---
Florian Kauer (2):
  bpf: devmap: provide rxq after redirect
  bpf: selftests: send packet to devmap redirect XDP

 kernel/bpf/devmap.c|  11 +-
 .../selftests/bpf/prog_tests/xdp_devmap_attach.c   | 114 +++--
 2 files changed, 115 insertions(+), 10 deletions(-)
---
base-commit: 8e69c96df771ab469cec278edb47009351de4da6
change-id: 20240905-devel-koalo-fix-ingress-ifindex-b9293d471db6

Best regards,
-- 
Florian Kauer 




Re: [PATCH] i2c: virtio: Constify struct i2c_algorithm and struct virtio_device_id

2024-09-09 Thread Christophe JAILLET

Le 09/09/2024 à 20:54, Andi Shyti a écrit :

Hi Christophe,

On Sun, Sep 08, 2024 at 08:52:07AM GMT, Christophe JAILLET wrote:

'struct i2c_algorithm' and 'struct virtio_device_id' are not modified in
this driver.

Constifying this structure moves some data to a read-only section, so
increase overall security, especially when the structure holds some
function pointers, which is the case for struct i2c_algorithm.

On a x86_64, with allmodconfig:
Before:
==
text   data bss dec hex filename
6663568  1672471c4f drivers/i2c/busses/i2c-virtio.o

After:
=
text   data bss dec hex filename
6735472  1672231c37 drivers/i2c/busses/i2c-virtio.o

Signed-off-by: Christophe JAILLET 
--
Compile tested only


Makes sense to me... if this works, you could pioneer a sequence
of simiar changes :-)


Hi,

I already did some other ones [1]. A few more still need to be checked.

I'll finish it after ending struct regulator_desc that I'm currently 
looking at. [2]



[1]: https://lore.kernel.org/all/?q=%22Constify+struct+i2c_algorithm%22
[2]: https://lore.kernel.org/all/?q=%22Constify+struct+regulator_desc%22



Merged to i2c/i2c-host.


Thanks.
CJ



Thanks,
Andi







[PATCH net v3 1/2] bpf: devmap: provide rxq after redirect

2024-09-09 Thread Florian Kauer
rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.

This is particularly bad since accessing ingress_ifindex, e.g.

SEC("xdp")
int prog(struct xdp_md *pkt)
{
return bpf_redirect_map(&dev_redirect_map, 0, 0);
}

SEC("xdp/devmap")
int prog_after_redirect(struct xdp_md *pkt)
{
bpf_printk("ifindex %i", pkt->ingress_ifindex);
return XDP_PASS;
}

depends on access to rxq, so a NULL pointer gets dereferenced:

<1>[  574.475170] BUG: kernel NULL pointer dereference, address: 

<1>[  574.475188] #PF: supervisor read access in kernel mode
<1>[  574.475194] #PF: error_code(0x) - not-present page
<6>[  574.475199] PGD 0 P4D 0
<4>[  574.475207] Oops: Oops:  [#1] PREEMPT SMP NOPTI
<4>[  574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 
6.11.0-rc5-reduced-00859-g780801200300 #23
<4>[  574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, 
BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
<4>[  574.475231] Workqueue: mld mld_ifc_work
<4>[  574.475247] RIP: 
0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
<4>[  574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc 
cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 <48> 8b 
52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b
<4>[  574.475263] RSP: 0018:a62440280c98 EFLAGS: 00010206
<4>[  574.475269] RAX: a62440280cd8 RBX: 0001 RCX: 

<4>[  574.475274] RDX:  RSI: a62440549048 RDI: 
a62440280ce0
<4>[  574.475278] RBP: a62440280c98 R08: 0002 R09: 
0001
<4>[  574.475281] R10: a05dc8b98000 R11: a05f577fca40 R12: 
a05dcab24000
<4>[  574.475285] R13: a62440280ce0 R14: a62440549048 R15: 
a62440549000
<4>[  574.475289] FS:  () GS:a05f4f70() 
knlGS:
<4>[  574.475294] CS:  0010 DS:  ES:  CR0: 80050033
<4>[  574.475298] CR2:  CR3: 00025522e000 CR4: 
00f50ef0
<4>[  574.475303] PKRU: 5554
<4>[  574.475306] Call Trace:
<4>[  574.475313]  
<4>[  574.475318]  ? __die+0x23/0x70
<4>[  574.475329]  ? page_fault_oops+0x180/0x4c0
<4>[  574.475339]  ? skb_pp_cow_data+0x34c/0x490
<4>[  574.475346]  ? kmem_cache_free+0x257/0x280
<4>[  574.475357]  ? exc_page_fault+0x67/0x150
<4>[  574.475368]  ? asm_exc_page_fault+0x26/0x30
<4>[  574.475381]  ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
<4>[  574.475386]  bq_xmit_all+0x158/0x420
<4>[  574.475397]  __dev_flush+0x30/0x90
<4>[  574.475407]  veth_poll+0x216/0x250 [veth]
<4>[  574.475421]  __napi_poll+0x28/0x1c0
<4>[  574.475430]  net_rx_action+0x32d/0x3a0
<4>[  574.475441]  handle_softirqs+0xcb/0x2c0
<4>[  574.475451]  do_softirq+0x40/0x60
<4>[  574.475458]  
<4>[  574.475461]  
<4>[  574.475464]  __local_bh_enable_ip+0x66/0x70
<4>[  574.475471]  __dev_queue_xmit+0x268/0xe40
<4>[  574.475480]  ? selinux_ip_postroute+0x213/0x420
<4>[  574.475491]  ? alloc_skb_with_frags+0x4a/0x1d0
<4>[  574.475502]  ip6_finish_output2+0x2be/0x640
<4>[  574.475512]  ? nf_hook_slow+0x42/0xf0
<4>[  574.475521]  ip6_finish_output+0x194/0x300
<4>[  574.475529]  ? __pfx_ip6_finish_output+0x10/0x10
<4>[  574.475538]  mld_sendpack+0x17c/0x240
<4>[  574.475548]  mld_ifc_work+0x192/0x410
<4>[  574.475557]  process_one_work+0x15d/0x380
<4>[  574.475566]  worker_thread+0x29d/0x3a0
<4>[  574.475573]  ? __pfx_worker_thread+0x10/0x10
<4>[  574.475580]  ? __pfx_worker_thread+0x10/0x10
<4>[  574.475587]  kthread+0xcd/0x100
<4>[  574.475597]  ? __pfx_kthread+0x10/0x10
<4>[  574.475606]  ret_from_fork+0x31/0x50
<4>[  574.475615]  ? __pfx_kthread+0x10/0x10
<4>[  574.475623]  ret_from_fork_asm+0x1a/0x30
<4>[  574.475635]  
<4>[  574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm 
x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core
<4>[  574.475662] CR2: 
<4>[  574.475668] ---[ end trace  ]---

Therefore, provide it to the program by setting rxq properly.

Fixes: cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk 
enqueue")
Reviewed-by: Toke Høiland-Jørgensen 
Signed-off-by: Florian Kauer 
---
 kernel/bpf/devmap.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 9e0e3b0a18e4..7878be18e9d2 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -333,9 +333,11 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, 
void *key,
 
 static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
struct xdp_frame **frames, int n,
-   struct net_device *dev)
+   struct net_device *tx_dev,
+   struct net_device *rx_dev)
 {
-   struct xdp_txq_info txq =

[PATCH net v3 2/2] bpf: selftests: send packet to devmap redirect XDP

2024-09-09 Thread Florian Kauer
The current xdp_devmap_attach test attaches a program
that redirects to another program via devmap.

It is, however, never executed, so do that to catch
any bugs that might occur during execution.

Also, execute the same for a veth pair so that we
also cover the non-generic path.

Warning: Running this without the bugfix in this series
will likely crash your system.

Signed-off-by: Florian Kauer 
---
 .../selftests/bpf/prog_tests/xdp_devmap_attach.c   | 114 +++--
 1 file changed, 108 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c 
b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
index ce6812558287..3da45f719736 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
+#include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include "test_xdp_devmap_helpers.skel.h"
@@ -17,7 +20,7 @@ static void test_xdp_with_devmap_helpers(void)
.ifindex = IFINDEX_LO,
};
__u32 len = sizeof(info);
-   int err, dm_fd, map_fd;
+   int err, dm_fd, dm_fd_redir, map_fd;
__u32 idx = 0;
 
 
@@ -25,14 +28,11 @@ static void test_xdp_with_devmap_helpers(void)
if (!ASSERT_OK_PTR(skel, "test_xdp_with_devmap_helpers__open_and_load"))
return;
 
-   dm_fd = bpf_program__fd(skel->progs.xdp_redir_prog);
-   err = bpf_xdp_attach(IFINDEX_LO, dm_fd, XDP_FLAGS_SKB_MODE, NULL);
+   dm_fd_redir = bpf_program__fd(skel->progs.xdp_redir_prog);
+   err = bpf_xdp_attach(IFINDEX_LO, dm_fd_redir, XDP_FLAGS_SKB_MODE, NULL);
if (!ASSERT_OK(err, "Generic attach of program with 8-byte devmap"))
goto out_close;
 
-   err = bpf_xdp_detach(IFINDEX_LO, XDP_FLAGS_SKB_MODE, NULL);
-   ASSERT_OK(err, "XDP program detach");
-
dm_fd = bpf_program__fd(skel->progs.xdp_dummy_dm);
map_fd = bpf_map__fd(skel->maps.dm_ports);
err = bpf_prog_get_info_by_fd(dm_fd, &info, &len);
@@ -47,6 +47,23 @@ static void test_xdp_with_devmap_helpers(void)
ASSERT_OK(err, "Read devmap entry");
ASSERT_EQ(info.id, val.bpf_prog.id, "Match program id to devmap entry 
prog_id");
 
+   /* send a packet to trigger any potential bugs in there */
+   char data[10] = {};
+   DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+   .data_in = &data,
+   .data_size_in = 10,
+   .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
+   .repeat = 1,
+   );
+   err = bpf_prog_test_run_opts(dm_fd_redir, &opts);
+   ASSERT_OK(err, "XDP test run");
+
+   /* wait for the packets to be flushed */
+   kern_sync_rcu();
+
+   err = bpf_xdp_detach(IFINDEX_LO, XDP_FLAGS_SKB_MODE, NULL);
+   ASSERT_OK(err, "XDP program detach");
+
/* can not attach BPF_XDP_DEVMAP program to a device */
err = bpf_xdp_attach(IFINDEX_LO, dm_fd, XDP_FLAGS_SKB_MODE, NULL);
if (!ASSERT_NEQ(err, 0, "Attach of BPF_XDP_DEVMAP program"))
@@ -124,6 +141,88 @@ static void test_xdp_with_devmap_frags_helpers(void)
test_xdp_with_devmap_frags_helpers__destroy(skel);
 }
 
+static void test_xdp_with_devmap_helpers_veth(void)
+{
+   struct test_xdp_with_devmap_helpers *skel = NULL;
+   struct bpf_prog_info info = {};
+   struct bpf_devmap_val val = {};
+   struct nstoken *nstoken = NULL;
+   __u32 len = sizeof(info);
+   int err, dm_fd, dm_fd_redir, map_fd, ifindex_dst;
+   __u32 idx = 0;
+
+   SYS(out_close, "ip netns add testns");
+   nstoken = open_netns("testns");
+   if (!ASSERT_OK_PTR(nstoken, "setns"))
+   goto out_close;
+
+   SYS(out_close, "ip link add veth_src type veth peer name veth_dst");
+   SYS(out_close, "ip link set dev veth_src up");
+   SYS(out_close, "ip link set dev veth_dst up");
+
+   val.ifindex = if_nametoindex("veth_src");
+   ifindex_dst = if_nametoindex("veth_dst");
+   if (!ASSERT_NEQ(val.ifindex, 0, "val.ifindex") ||
+   !ASSERT_NEQ(ifindex_dst, 0, "ifindex_dst"))
+   goto out_close;
+
+   skel = test_xdp_with_devmap_helpers__open_and_load();
+   if (!ASSERT_OK_PTR(skel, "test_xdp_with_devmap_helpers__open_and_load"))
+   return;
+
+   dm_fd_redir = bpf_program__fd(skel->progs.xdp_redir_prog);
+   err = bpf_xdp_attach(val.ifindex, dm_fd_redir, XDP_FLAGS_DRV_MODE, 
NULL);
+   if (!ASSERT_OK(err, "Attach of program with 8-byte devmap"))
+   goto out_close;
+
+   dm_fd = bpf_program__fd(skel->progs.xdp_dummy_dm);
+   map_fd = bpf_map__fd(skel->maps.dm_ports);
+   err = bpf_prog_get_info_by_fd(dm_fd, &info, &len);
+   if (!ASSERT_OK(err, "bpf_prog_get_info_by_fd"))
+   goto out_close;
+
+   val.bpf_prog.fd = dm_fd;
+   err

Re: [PATCH] ieee802154: Fix build error

2024-09-09 Thread Stefan Schmidt
Hello Jinjie.

On Mon, 09 Sep 2024 21:17:40 +0800, Jinjie Ruan wrote:
> If REGMAP_SPI is m and IEEE802154_MCR20A is y,
> 
>   mcr20a.c:(.text+0x3ed6c5b): undefined reference to 
> `__devm_regmap_init_spi'
>   ld: mcr20a.c:(.text+0x3ed6cb5): undefined reference to 
> `__devm_regmap_init_spi'
> 
> Select REGMAP_SPI for IEEE802154_MCR20A to fix it.
> 
> [...]

Applied, thanks!

[1/1] ieee802154: Fix build error
  commit: addf89774e48c992316449ffab4f29c2309ebefb

Best regards,
Stefan Schmidt 



[PATCH bpf-next] selftests/bpf: convert test_xdp_features.sh to test_progs

2024-09-09 Thread eBPF Foundation
test_xdp_features.sh is a shell script allowing to test that xdp features
advertised by an interface are indeed delivered. The test works by starting
two instance of the same program, both attaching specific xdp programs to
each side of a veth link, and then make those programs manage packets and
collect stats to check whether tested XDP feature is indeed delivered or
not. However this test is not integrated in test_progs framework and so can
not run automatically in CI.

Rewrite test_xdp_features to integrate it in test_progs so it can run
automatically in CI. The main changes brought by the rewrite are the
following:
- instead of running to separated processes (each one managing either the
  tester veth or the DUT vet), run a single process
- slightly change testing direction (v0 is the tester in local namespace,
  v1 is the Device Under Test in remote namespace)
- group all tests previously managed by test_xdp_features as subtests (one
  per tested XDP feature). As a consequence, run only once some steps
  instead of once per subtest (eg: starting/stopping the udp server). On
  the contrary, make sure that each subtest properly cleans up its state
  (ie detach xdp programs, reset test stats, etc)
- since there is now a single process, get rid of the "control" tcp channel
  used to configure DUT. Configuring the DUT now only consists in switching
  to DUT network namespace and run the relevant commands
- since there is no more control channel, get rid of TLVs, keep only the
  CMD_ECHO packet type, and set it as a magic
- simplify network setup: use only ipv6 instead of both ipv4 and ipv6,
  force static neighbours instead of waiting for autoconfiguration, do not
  force gro (fetch xdp features only once xdp programs are loaded instead)

The existing XDP programs are reused, with some minor changes:
- tester and dut stats maps are converted to global variables for easier
  usage
- programs do not use TLV struct anymore but the magic replacing the echo
  command
- avoid to accidentally make tests pass: drop packets instead of forwarding
  them to userspace when they do not match the expected payload

Signed-off-by: Alexis Lothoré (eBPF Foundation) 
---
The xdp_features rewrite has been tested in a x86_64 qemu environment on my
machine and in CI. In my environment, the test takes a bit less than 2s to
execute.

  # ./test_progs -a xdp_features
  #561/1   xdp_features/XDP_PASS:OK
  #561/2   xdp_features/XDP_DROP:OK
  #561/3   xdp_features/XDP_ABORTED:OK
  #561/4   xdp_features/XDP_TX:OK
  #561/5   xdp_features/XDP_REDIRECT:OK
  #561/6   xdp_features/XDP_NDO_XMIT:OK
  #561 xdp_features:OK
  Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
---
 tools/testing/selftests/bpf/.gitignore |   1 -
 tools/testing/selftests/bpf/Makefile   |  10 +-
 .../selftests/bpf/prog_tests/xdp_features.c| 446 +
 tools/testing/selftests/bpf/progs/xdp_features.c   |  49 +-
 tools/testing/selftests/bpf/test_xdp_features.sh   | 107 ---
 tools/testing/selftests/bpf/xdp_features.c | 718 -
 tools/testing/selftests/bpf/xdp_features.h |  17 +-
 7 files changed, 462 insertions(+), 886 deletions(-)

diff --git a/tools/testing/selftests/bpf/.gitignore 
b/tools/testing/selftests/bpf/.gitignore
index e6533b3400de..93bf35213042 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -48,4 +48,3 @@ xskxceiver
 xdp_redirect_multi
 xdp_synproxy
 xdp_hw_metadata
-xdp_features
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 04716a5e43f1..db4a802c3e06 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -145,8 +145,7 @@ TEST_PROGS := test_kmod.sh \
test_bpftool.sh \
test_bpftool_metadata.sh \
test_doc_build.sh \
-   test_xsk.sh \
-   test_xdp_features.sh
+   test_xsk.sh
 
 TEST_PROGS_EXTENDED := with_addr.sh \
with_tunnels.sh ima_setup.sh verify_sig_setup.sh \
@@ -157,7 +156,7 @@ TEST_GEN_PROGS_EXTENDED = \
flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
-   xdp_features bpf_test_no_cfi.ko
+   bpf_test_no_cfi.ko
 
 TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi
 
@@ -519,7 +518,6 @@ test_subskeleton_lib.skel.h-deps := 
test_subskeleton_lib2.bpf.o test_subskeleton
 test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o
 xsk_xdp_progs.skel.h-deps := xsk_xdp_progs.bpf.o
 xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o
-xdp_features.skel.h-deps := xdp_features.bpf.o
 
 LINKED_BPF_OBJS := $(foreach skel,$(LINKED_SKELS),$($(skel)-deps))
 LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(LINKED_BPF_OBJS))
@@ -787,10 +785,6 @@ $(OUTPUT)/xdp_hw_metadata: xdp_hw_metadata.c 
$(OUTPUT

[PATCH] mailbox, remoteproc: omap2+: fix compile testing

2024-09-09 Thread Arnd Bergmann
From: Arnd Bergmann 

Selecting CONFIG_OMAP2PLUS_MBOX while compile testing
causes a build failure:

WARNING: unmet direct dependencies detected for OMAP2PLUS_MBOX
  Depends on [n]: MAILBOX [=y] && (ARCH_OMAP2PLUS || ARCH_K3)
  Selected by [m]:
  - TI_K3_M4_REMOTEPROC [=m] && REMOTEPROC [=y] && (ARCH_K3 || COMPILE_TEST 
[=y])

Using 'select' to force-enable another subsystem is generally
a mistake and causes problems such as this one, so change the
three drivers that link against this driver to use 'depends on'
instead, and ensure the driver itself can be compile tested
regardless of the platform.

When compile-testing without CONFIG_TI_SCI_PROTOCOL=m, there
is a chance for a link failure, so add a careful dependency
on that.

arm-linux-gnueabi-ld: drivers/remoteproc/ti_k3_m4_remoteproc.o: in function 
`k3_m4_rproc_probe':
ti_k3_m4_remoteproc.c:(.text.k3_m4_rproc_probe+0x76): undefined reference to 
`devm_ti_sci_get_by_phandle'

Fixes: ebcf9008a895 ("remoteproc: k3-m4: Add a remoteproc driver for M4F 
subsystem")
Signed-off-by: Arnd Bergmann 
---
 drivers/mailbox/Kconfig|  2 +-
 drivers/mailbox/omap-mailbox.c |  2 +-
 drivers/remoteproc/Kconfig | 10 --
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 4eed97295927..ecaf78beb934 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -73,7 +73,7 @@ config ARMADA_37XX_RWTM_MBOX
 
 config OMAP2PLUS_MBOX
tristate "OMAP2+ Mailbox framework support"
-   depends on ARCH_OMAP2PLUS || ARCH_K3
+   depends on ARCH_OMAP2PLUS || ARCH_K3 || COMPILE_TEST
help
  Mailbox implementation for OMAP family chips with hardware for
  interprocessor communication involving DSP, IVA1.0 and IVA2 in
diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c
index 7a87424657a1..6797770474a5 100644
--- a/drivers/mailbox/omap-mailbox.c
+++ b/drivers/mailbox/omap-mailbox.c
@@ -603,7 +603,7 @@ static struct platform_driver omap_mbox_driver = {
.driver = {
.name = "omap-mailbox",
.pm = &omap_mbox_pm_ops,
-   .of_match_table = of_match_ptr(omap_mailbox_of_match),
+   .of_match_table = omap_mailbox_of_match,
},
 };
 module_platform_driver(omap_mbox_driver);
diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 0f0862e20a93..62f8548fb46a 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -330,8 +330,7 @@ config STM32_RPROC
 config TI_K3_DSP_REMOTEPROC
tristate "TI K3 DSP remoteproc support"
depends on ARCH_K3
-   select MAILBOX
-   select OMAP2PLUS_MBOX
+   depends on OMAP2PLUS_MBOX
help
  Say m here to support TI's C66x and C71x DSP remote processor
  subsystems on various TI K3 family of SoCs through the remote
@@ -343,8 +342,8 @@ config TI_K3_DSP_REMOTEPROC
 config TI_K3_M4_REMOTEPROC
tristate "TI K3 M4 remoteproc support"
depends on ARCH_K3 || COMPILE_TEST
-   select MAILBOX
-   select OMAP2PLUS_MBOX
+   depends on TI_SCI_PROTOCOL || (COMPILE_TEST && TI_SCI_PROTOCOL=n)
+   depends on OMAP2PLUS_MBOX
help
  Say m here to support TI's M4 remote processor subsystems
  on various TI K3 family of SoCs through the remote processor
@@ -356,8 +355,7 @@ config TI_K3_M4_REMOTEPROC
 config TI_K3_R5_REMOTEPROC
tristate "TI K3 R5 remoteproc support"
depends on ARCH_K3
-   select MAILBOX
-   select OMAP2PLUS_MBOX
+   depends on OMAP2PLUS_MBOX
help
  Say m here to support TI's R5F remote processor subsystems
  on various TI K3 family of SoCs through the remote processor
-- 
2.39.2




Re: [PATCH v2 1/1] lib/llist_kunit.c: add KUnit tests for llist

2024-09-09 Thread Artur Alves Cavalcante de Barros

On 9/5/24 5:51 PM, Rae Moar wrote:

On Tue, Sep 3, 2024 at 5:40 PM Artur Alves  wrote:


Add KUnit tests for the llist data structure. They test the vast
majority of methods and macros defined in include/linux/llist.h.

These are inspired by the existing tests for the 'list' doubly
linked in lib/list-test.c [1]. Each test case (llist_test_x)
tests the behaviour of the llist function/macro 'x'.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/list-test.c?h=v6.11-rc6

Signed-off-by: Artur Alves 


Hello!

Thanks for creating this new test! It looks really good and is passing
all the tests.

My main comment is that this patch is causing some checkpatch warnings:

   WARNING: Prefer a maximum 75 chars per line (possible unwrapped
commit description?)
   #13:
   [1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/list-test.c?h=v6.11-rc6

It's probably fine to ignore this warning as it is a link. But I might
remove the link because it is not absolutely necessary in this case.

   WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
   #58:
   new file mode 100644

   ERROR: that open brace { should be on the previous line
   #306: FILE: lib/llist_kunit.c:249:
   +static void llist_test_for_each_safe(struct kunit *test)
   +{

   ERROR: that open brace { should be on the previous line
   #325: FILE: lib/llist_kunit.c:268:
   +static void llist_test_for_each_entry(struct kunit *test)
   +{

   ERROR: that open brace { should be on the previous line
   #346: FILE: lib/llist_kunit.c:289:
   +static void llist_test_for_each_entry_safe(struct kunit *test)
   +{

These checkpatch errors are mistaken since the open brace should be
where it is. I believe this is getting picked up because of the
"for_each" in the test name. This was fixed for me by rewriting the
test names: from "llist_test_for_each_safe" -> to
"llist_test_safe_for_each", and so on for the other tests with errors.
Sorry it's a pain to change this but I think it is a better fix than
changing the checkpatch script.


---
  lib/Kconfig.debug   |  11 ++
  lib/tests/Makefile  |   1 +
  lib/tests/llist_kunit.c | 361 
  3 files changed, 373 insertions(+)
  create mode 100644 lib/tests/llist_kunit.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a30c03a66172..b2725daccc52 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2813,6 +2813,17 @@ config USERCOPY_KUNIT_TEST
   on the copy_to/from_user infrastructure, making sure basic
   user/kernel boundary testing is working.

+config LLIST_KUNIT_TEST
+   tristate "KUnit tests for lib/llist" if !KUNIT_ALL_TESTS
+   depends on KUNIT
+   default KUNIT_ALL_TESTS
+   help
+ This option builds the "llist_kunit" test module that
+ helps to verify the correctness of the functions and
+ macros defined in ().


Also, I think I would prefer if this description was a bit tweaked.
Saying it builds the "module" is confusing since this test might be
run built-in instead. So maybe something more similar to :

This builds the llist (lock-less list) KUnit test suite. It tests the
API and basic functionality of the macros and functions defined in
.


+
+ If unsure, say N.
+
  config TEST_UDELAY
 tristate "udelay test driver"
 help
diff --git a/lib/tests/Makefile b/lib/tests/Makefile
index c6a14cc8663e..8d7c40a73110 100644
--- a/lib/tests/Makefile
+++ b/lib/tests/Makefile
@@ -34,4 +34,5 @@ CFLAGS_stackinit_kunit.o += $(call cc-disable-warning, 
switch-unreachable)
  obj-$(CONFIG_STACKINIT_KUNIT_TEST) += stackinit_kunit.o
  obj-$(CONFIG_STRING_KUNIT_TEST) += string_kunit.o
  obj-$(CONFIG_STRING_HELPERS_KUNIT_TEST) += string_helpers_kunit.o
+obj-$(CONFIG_LLIST_KUNIT_TEST) += llist_kunit.o

diff --git a/lib/tests/llist_kunit.c b/lib/tests/llist_kunit.c
new file mode 100644
index ..f273c0d175c7
--- /dev/null
+++ b/lib/tests/llist_kunit.c
@@ -0,0 +1,361 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test for the Kernel lock-less linked-list structure.
+ *
+ * Author: Artur Alves 
+ */
+
+#include 
+#include 
+
+#define ENTRIES_SIZE 3
+
+struct llist_test_struct {
+   int data;
+   struct llist_node node;
+};
+
+static void llist_test_init_llist(struct kunit *test)
+{
+   /* test if the llist is correctly initialized */
+   struct llist_head llist1 = LLIST_HEAD_INIT(llist1);
+   LLIST_HEAD(llist2);
+   struct llist_head llist3, *llist4, *llist5;
+
+   KUNIT_EXPECT_TRUE(test, llist_empty(&llist1));
+
+   KUNIT_EXPECT_TRUE(test, llist_empty(&llist2));
+
+   init_llist_head(&llist3);
+   KUNIT_EXPECT_TRUE(test, llist_empty(&llist3));
+
+   llist4 = kzalloc(sizeof(*llist4), GFP_KERNEL | __GFP_NOFAIL);
+   init_llist_head(llist4);
+   KUNIT_EXPECT_TRUE(test, llist_empty(llist4));
+   kfree(llist4);
+
+   llist5 = kmalloc(sizeof(*llist5), GFP_K

Re: [PATCH bpf-next] selftests/bpf: convert test_xdp_features.sh to test_progs

2024-09-09 Thread Maxime Chevallier
Hi Alexis,

On Mon, 09 Sep 2024 22:02:07 +0200
Alexis Lothoré (eBPF Foundation)  wrote:

> test_xdp_features.sh is a shell script allowing to test that xdp features
> advertised by an interface are indeed delivered. The test works by starting
> two instance of the same program, both attaching specific xdp programs to
> each side of a veth link, and then make those programs manage packets and
> collect stats to check whether tested XDP feature is indeed delivered or
> not. However this test is not integrated in test_progs framework and so can
> not run automatically in CI.
> 
> Rewrite test_xdp_features to integrate it in test_progs so it can run
> automatically in CI. The main changes brought by the rewrite are the
> following:
> - instead of running to separated processes (each one managing either the
>   tester veth or the DUT vet), run a single process
> - slightly change testing direction (v0 is the tester in local namespace,
>   v1 is the Device Under Test in remote namespace)
> - group all tests previously managed by test_xdp_features as subtests (one
>   per tested XDP feature). As a consequence, run only once some steps
>   instead of once per subtest (eg: starting/stopping the udp server). On
>   the contrary, make sure that each subtest properly cleans up its state
>   (ie detach xdp programs, reset test stats, etc)
> - since there is now a single process, get rid of the "control" tcp channel
>   used to configure DUT. Configuring the DUT now only consists in switching
>   to DUT network namespace and run the relevant commands
> - since there is no more control channel, get rid of TLVs, keep only the
>   CMD_ECHO packet type, and set it as a magic
> - simplify network setup: use only ipv6 instead of both ipv4 and ipv6,
>   force static neighbours instead of waiting for autoconfiguration, do not
>   force gro (fetch xdp features only once xdp programs are loaded instead)
> 
> The existing XDP programs are reused, with some minor changes:
> - tester and dut stats maps are converted to global variables for easier
>   usage
> - programs do not use TLV struct anymore but the magic replacing the echo
>   command
> - avoid to accidentally make tests pass: drop packets instead of forwarding
>   them to userspace when they do not match the expected payload
> 
> Signed-off-by: Alexis Lothoré (eBPF Foundation) 

I'm far for having reviewed the whole patch, but I do have one comment
below :)

[...]

> +static void *run_dut_echo_thread(void *arg)
> +{
> + struct test_data *t = (struct test_data *)arg;
> + __u32 magic;
> +
> + while (!t->quit_dut_echo_thread) {
> + struct sockaddr_storage addr;
> + socklen_t addrlen;
> + size_t n;
> +
> + n = recvfrom(t->echo_server_sock, &magic, sizeof(magic),
> +  MSG_WAITALL, (struct sockaddr *)&addr, &addrlen);
> + if (n != sizeof(magic)) {
> + usleep(LOOP_DELAY_US);
> + continue;
> + }
> +
> + if (htonl(magic) != CMD_ECHO)
> + continue;

Shouldn't it be ntohl here ? The former code used the ntohs helper for
that command, and you're sending the magic in send_echo_msg with a
htonl() so I guess here you might want to convert the value back to
host endianness.

Thanks,

Maxime



Re: [RFC 00/31] objtool, livepatch: Livepatch module generation

2024-09-09 Thread Josh Poimboeuf
On Sat, Sep 07, 2024 at 10:04:25PM -0700, Song Liu wrote:
> I think gcc doesn't complain, but clang does:
> 
> $ cat ttt.c
> static inline void ret(void)
> {
>   return;
> }
> 
> int main(void)
> {
>   return 0;
> }

Ah...  That's probably why the kernel adds "__maybe_unused" to its
inline macro (which the tools don't have).

Does this fix?

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index d1740a724eb2..74fec9f97339 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -100,7 +100,7 @@ static inline unsigned long __sym_last(struct symbol *s)
 }
 
 INTERVAL_TREE_DEFINE(struct symbol, node, unsigned long, __subtree_last,
-__sym_start, __sym_last, static inline,
+__sym_start, __sym_last, static inline __maybe_unused,
 __sym)
 
 #define __sym_for_each(_iter, _tree, _start, _end) \



Re: [PATCH] module: Refine kmemleak scanned areas

2024-09-09 Thread Luis Chamberlain
On Mon, Sep 09, 2024 at 11:23:56AM +0100, Vincent Donnefort wrote:
> On Mon, Sep 09, 2024 at 09:52:57AM +0100, Catalin Marinas wrote:
> > On Mon, Sep 09, 2024 at 08:40:34AM +0100, Vincent Donnefort wrote:
> > > On Sat, Sep 07, 2024 at 03:12:13PM +0100, Catalin Marinas wrote:
> > > > On Fri, Sep 06, 2024 at 04:38:56PM +0100, Vincent Donnefort wrote:
> > > > > commit ac3b43283923 ("module: replace module_layout with 
> > > > > module_memory")
> > > > > introduced a set of memory regions for the module layout sharing the
> > > > > same attributes but didn't update the kmemleak scanned areas which
> > > > > intended to limit kmemleak scan to sections containing writable data.
> > > > > This means sections such as .text and .rodata are scanned by kmemleak.
> > > > > 
> > > > > Refine the scanned areas for modules by limiting it to MOD_TEXT and
> > > > > MOD_INIT_TEXT mod_mem regions.
> > > > > 
> > > > > CC: Song Liu 
> > > > > CC: Catalin Marinas 
> > > > > Signed-off-by: Vincent Donnefort 
> > > > > 
> > > > > diff --git a/kernel/module/debug_kmemleak.c 
> > > > > b/kernel/module/debug_kmemleak.c
> > > > > index 12a569d361e8..b4cc03842d70 100644
> > > > > --- a/kernel/module/debug_kmemleak.c
> > > > > +++ b/kernel/module/debug_kmemleak.c
> > > > > @@ -12,19 +12,9 @@
> > > > >  void kmemleak_load_module(const struct module *mod,
> > > > > const struct load_info *info)
> > > > >  {
> > > > > - unsigned int i;
> > > > > -
> > > > > - /* only scan the sections containing data */
> > > > > - kmemleak_scan_area(mod, sizeof(struct module), GFP_KERNEL);
> > > > > -
> > > > > - for (i = 1; i < info->hdr->e_shnum; i++) {
> > > > > - /* Scan all writable sections that's not executable */
> > > > > - if (!(info->sechdrs[i].sh_flags & SHF_ALLOC) ||
> > > > > - !(info->sechdrs[i].sh_flags & SHF_WRITE) ||
> > > > > - (info->sechdrs[i].sh_flags & SHF_EXECINSTR))
> > > > > - continue;
> > > > > -
> > > > > - kmemleak_scan_area((void *)info->sechdrs[i].sh_addr,
> > > > > -info->sechdrs[i].sh_size, 
> > > > > GFP_KERNEL);
> > > > > + /* only scan writable, non-executable sections */
> > > > > + for_each_mod_mem_type(type) {
> > > > > + if (type != MOD_DATA && type != MOD_INIT_DATA)
> > > > > + kmemleak_no_scan(mod->mem[type].base);
> > > > >   }
> > > > >  }
> > > > 
> > > > I lost track of how module memory allocation works. Is struct module
> > > > still scanned after this change?
> > > 
> > > That section being RW, it will be part of the MOD_DATA vmalloc and 
> > > scanned.
> > 
> > Ah, makes sense. I'm fine with this patch, it simplifies the code now
> > that we have mod->mem[type]. I wouldn't say it's a fix, though no
> > backporting needed.
> 
> Agreed, it's "fixing" because it was scanning unecessary regions, but it's not
> worth any backport.

Please send a v2.

  Luis



Re: [RFC 00/31] objtool, livepatch: Livepatch module generation

2024-09-09 Thread Song Liu
On Mon, Sep 9, 2024 at 2:19 PM Josh Poimboeuf  wrote:
>
> On Sat, Sep 07, 2024 at 10:04:25PM -0700, Song Liu wrote:
> > I think gcc doesn't complain, but clang does:
> >
> > $ cat ttt.c
> > static inline void ret(void)
> > {
> >   return;
> > }
> >
> > int main(void)
> > {
> >   return 0;
> > }
>
> Ah...  That's probably why the kernel adds "__maybe_unused" to its
> inline macro (which the tools don't have).
>
> Does this fix?

Yes! It fixes this problem.

Thanks,
Song



Re: [PATCH net] selftests: net: csum: Fix checksums for packets with non-zero padding

2024-09-09 Thread Jakub Kicinski
On Mon, 09 Sep 2024 13:26:42 -0400 Willem de Bruijn wrote:
> > This seems to be a bug in the driver.
> > 
> > A call to skb_put_padto(skb, ETH_ZLEN) should be added.  
> 
> In which case this test detecting it may be nice to have, for lack of
> a more targeted test.

IIUC we're basically saying that we don't need to trim because pad
should be 0? In that case maybe let's keep the patch but add a check 
on top which scans the pad for non-zero bytes, and print an informative
warning?



Re: [PATCH] virtio_pmem: Add freeze/restore callbacks

2024-09-09 Thread Philip Chen
Hi

On Wed, Sep 4, 2024 at 10:54 AM Ira Weiny  wrote:
>
> Philip Chen wrote:
> > Hi maintainers,
> >
> > Can anyone let me know if this patch makes sense?
> > Any comment/feedback is appreciated.
> > Thanks in advance!
>
> I'm not an expert on virtio but the code looks ok on the surface.  I've
> discussed this with Dan a bit and virtio-pmem is not heavily tested.

Thanks for your comments.
I think this specific patch is not heavily involved with virtio spec details.
This patch simply provides the basic freeze/restore PM callbacks for
virtio_pmem, like people already did for the other virtio drivers.

>
> Based on our discussion [1] I wonder if there is a way we can recreate this
> with QEMU to incorporate into our testing?

Yes, these are how I test on crosvm, but I believe the same steps can
be applied to QEMU:
(1) Set pm_test_level to TEST_PLATFORM (build time change)
(2) Write something to pmem
(3) Make the device go through a freeze/restore cycle by writing
`disk` to `/sys/power/state`
(4) Validate the data written to pmem in (2) is still preserved

Note:
(a) The freeze/restore PM routines are sometimes called as the backup
for suspend/resume PM routines in a suspend/resume cycle.
In this case, we can also test freeze/restore PM routines with
suspend/resume: i.e. skip (1) and write `mem` to `sys/power/state` in
(3).
(b) I also tried to set up QEMU for testing. But QEMU crashes when I
try to freeze the device even without applying this patch.
Since the issue seems to be irrelevant to pmem and most likely a QEMU
setup problem on my end, I didn't spend more time enabling QEMU.



>
> Ira
>
> [1] 
> https://lore.kernel.org/lkml/ca+cxxhnb2i5o7_biofklth5zwp5t62d6f6c39vnut53cuku...@mail.gmail.com/
>
> >
> > On Wed, Aug 14, 2024 at 5:46 PM Philip Chen  wrote:
> > >
> > > Add basic freeze/restore PM callbacks to support hibernation (S4):
> > > - On freeze, delete vq and quiesce the device to prepare for
> > >   snapshotting.
> > > - On restore, re-init vq and mark DRIVER_OK.
> > >
> > > Signed-off-by: Philip Chen 
> > > ---
> > >  drivers/nvdimm/virtio_pmem.c | 24 
> > >  1 file changed, 24 insertions(+)
> > >
> > > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > > index c9b97aeabf85..2396d19ce549 100644
> > > --- a/drivers/nvdimm/virtio_pmem.c
> > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > @@ -143,6 +143,28 @@ static void virtio_pmem_remove(struct virtio_device 
> > > *vdev)
> > > virtio_reset_device(vdev);
> > >  }
> > >
> > > +static int virtio_pmem_freeze(struct virtio_device *vdev)
> > > +{
> > > +   vdev->config->del_vqs(vdev);
> > > +   virtio_reset_device(vdev);
> > > +
> > > +   return 0;
> > > +}
> > > +
> > > +static int virtio_pmem_restore(struct virtio_device *vdev)
> > > +{
> > > +   int ret;
> > > +
> > > +   ret = init_vq(vdev->priv);
> > > +   if (ret) {
> > > +   dev_err(&vdev->dev, "failed to initialize virtio pmem's 
> > > vq\n");
> > > +   return ret;
> > > +   }
> > > +   virtio_device_ready(vdev);
> > > +
> > > +   return 0;
> > > +}
> > > +
> > >  static unsigned int features[] = {
> > > VIRTIO_PMEM_F_SHMEM_REGION,
> > >  };
> > > @@ -155,6 +177,8 @@ static struct virtio_driver virtio_pmem_driver = {
> > > .validate   = virtio_pmem_validate,
> > > .probe  = virtio_pmem_probe,
> > > .remove = virtio_pmem_remove,
> > > +   .freeze = virtio_pmem_freeze,
> > > +   .restore= virtio_pmem_restore,
> > >  };
> > >
> > >  module_virtio_driver(virtio_pmem_driver);
> > > --
> > > 2.46.0.76.ge559c4bf1a-goog
> > >
>
>



Re: [PATCH net-next v2 0/5] selftests: mptcp: add time per subtests in TAP output

2024-09-09 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Fri, 06 Sep 2024 20:46:06 +0200 you wrote:
> Patches here add 'time=ms' in the diagnostic data of the TAP output,
> e.g.
> 
>   ok 1 - pm_netlink: defaults addr list # time=9ms
> 
> This addition is useful to quickly identify which subtests are taking a
> longer time than the others, or more than expected.
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/5] selftests: mptcp: lib: add time per subtests in TAP output
https://git.kernel.org/netdev/net-next/c/f58817c852e9
  - [net-next,v2,2/5] selftests: mptcp: connect: remote time in TAP output
https://git.kernel.org/netdev/net-next/c/1a38cee4bbd0
  - [net-next,v2,3/5] selftests: mptcp: reset the last TS before the first test
https://git.kernel.org/netdev/net-next/c/d4e192728efc
  - [net-next,v2,4/5] selftests: mptcp: diag: remove trailing whitespace
https://git.kernel.org/netdev/net-next/c/a5b6be42aac0
  - [net-next,v2,5/5] selftests: mptcp: connect: remove duplicated spaces in 
TAP output
https://git.kernel.org/netdev/net-next/c/a92d1db0c989

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





[PATCH bpf-next] bpf: ringbuf: Support consuming BPF_MAP_TYPE_RINGBUF from prog

2024-09-09 Thread Daniel Xu
Right now there exists prog produce / userspace consume and userspace
produce / prog consume support. But it is also useful to have prog
produce / prog consume.

For example, we want to track the latency overhead of cpumap in
production. Since we need to store enqueue timestamps somewhere and
cpumap is MPSC, we need an MPSC data structure to shadow cpumap. BPF
ringbuf is such a data structure. Rather than reimplement (possibly
poorly) a custom ringbuffer in BPF, extend the existing interface to
allow the final quadrant of ringbuf usecases to be filled. Note we
ignore userspace to userspace use case - there is no need to involve
kernel for that.

Signed-off-by: Daniel Xu 
---
 kernel/bpf/verifier.c |  6 +-
 tools/testing/selftests/bpf/Makefile  |  3 +-
 .../selftests/bpf/prog_tests/ringbuf.c| 50 +++
 .../bpf/progs/test_ringbuf_bpf_to_bpf.c   | 64 +++
 4 files changed, 120 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_bpf_to_bpf.c

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 53d0556fbbf3..56bfe559f228 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9142,7 +9142,8 @@ static int check_map_func_compatibility(struct 
bpf_verifier_env *env,
func_id != BPF_FUNC_ringbuf_query &&
func_id != BPF_FUNC_ringbuf_reserve_dynptr &&
func_id != BPF_FUNC_ringbuf_submit_dynptr &&
-   func_id != BPF_FUNC_ringbuf_discard_dynptr)
+   func_id != BPF_FUNC_ringbuf_discard_dynptr &&
+   func_id != BPF_FUNC_user_ringbuf_drain)
goto error;
break;
case BPF_MAP_TYPE_USER_RINGBUF:
@@ -9276,7 +9277,8 @@ static int check_map_func_compatibility(struct 
bpf_verifier_env *env,
goto error;
break;
case BPF_FUNC_user_ringbuf_drain:
-   if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF)
+   if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF &&
+   map->map_type != BPF_MAP_TYPE_RINGBUF)
goto error;
break;
case BPF_FUNC_get_stackid:
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 9905e3739dd0..233109843d4d 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -503,7 +503,8 @@ LINKED_SKELS := test_static_linked.skel.h 
linked_funcs.skel.h   \
 LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c   \
trace_printk.c trace_vprintk.c map_ptr_kern.c   \
core_kern.c core_kern_overflow.c test_ringbuf.c \
-   test_ringbuf_n.c test_ringbuf_map_key.c test_ringbuf_write.c
+   test_ringbuf_n.c test_ringbuf_map_key.c test_ringbuf_write.c\
+   test_ringbuf_bpf_to_bpf.c
 
 # Generate both light skeleton and libbpf skeleton for these
 LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \
diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c 
b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
index da430df45aa4..3e7955573ac5 100644
--- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
@@ -17,6 +17,7 @@
 #include "test_ringbuf_n.lskel.h"
 #include "test_ringbuf_map_key.lskel.h"
 #include "test_ringbuf_write.lskel.h"
+#include "test_ringbuf_bpf_to_bpf.lskel.h"
 
 #define EDONE 
 
@@ -497,6 +498,53 @@ static void ringbuf_map_key_subtest(void)
test_ringbuf_map_key_lskel__destroy(skel_map_key);
 }
 
+static void ringbuf_bpf_to_bpf_subtest(void)
+{
+   struct test_ringbuf_bpf_to_bpf_lskel *skel;
+   int err, i;
+
+   skel = test_ringbuf_bpf_to_bpf_lskel__open();
+   if (!ASSERT_OK_PTR(skel, "test_ringbuf_bpf_to_bpf_lskel__open"))
+   return;
+
+   skel->maps.ringbuf.max_entries = getpagesize();
+   skel->bss->pid = getpid();
+
+   err = test_ringbuf_bpf_to_bpf_lskel__load(skel);
+   if (!ASSERT_OK(err, "test_ringbuf_bpf_to_bpf_lskel__load"))
+   goto cleanup;
+
+   ringbuf = ring_buffer__new(skel->maps.ringbuf.map_fd, NULL, NULL, NULL);
+   if (!ASSERT_OK_PTR(ringbuf, "ring_buffer__new"))
+   goto cleanup;
+
+   err = test_ringbuf_bpf_to_bpf_lskel__attach(skel);
+   if (!ASSERT_OK(err, "test_ringbuf_bpf_to_bpf_lskel__attach"))
+   goto cleanup_ringbuf;
+
+   /* Produce N_SAMPLES samples in the ring buffer by calling getpid() */
+   skel->bss->value = SAMPLE_VALUE;
+   for (i = 0; i < N_SAMPLES; i++)
+   syscall(__NR_getpgid);
+
+   /* Trigger bpf-side consumption */
+   syscall(__NR_prctl);
+
+   /* Check correct number samples were consumed */
+   if (!ASSERT_EQ(skel->bss->round_tripped, N_SAMPLES, "round_tripped"))
+ 

Re: [PATCH net] selftests: net: csum: Fix checksums for packets with non-zero padding

2024-09-09 Thread Willem de Bruijn
Jakub Kicinski wrote:
> On Mon, 09 Sep 2024 13:26:42 -0400 Willem de Bruijn wrote:
> > > This seems to be a bug in the driver.
> > > 
> > > A call to skb_put_padto(skb, ETH_ZLEN) should be added.  
> > 
> > In which case this test detecting it may be nice to have, for lack of
> > a more targeted test.
> 
> IIUC we're basically saying that we don't need to trim because pad
> should be 0? In that case maybe let's keep the patch but add a check 
> on top which scans the pad for non-zero bytes, and print an informative
> warning?

Data arriving with padding probably deserves a separate test.

We can use this csum test as stand-in, I suppose.

Is it safe to assume that all padding is wrong on ingress, not just
non-zero padding. The ip stack itself treats it as benign and trims
the trailing bytes silently.

I do know of legitimate cases of trailer data lifting along.



Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2024-09-09 Thread Yan Zhao
On Mon, Sep 09, 2024 at 03:24:40PM +0200, Paolo Bonzini wrote:
> While this is a fix for future kernels, it doesn't change the result for VMs
> already in existence.
Though this is the truth, I have concerns that there may be other guest drivers
with improper PAT configurations that were previously masked by KVM's force-WB
setting. Now that we respect the guest's PAT settings, these misconfigurations
could lead to degraded performance, potentially perceived as errors, as was
observed in the previous VMX unit test and the current Bochs scenario.

> I don't think there's an alternative to putting this behind a quirk.




Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize performance and improve readability

2024-09-09 Thread Leizhen (ThunderTown)



On 2024/9/10 2:41, Thomas Gleixner wrote:
> On Wed, Sep 04 2024 at 21:41, Zhen Lei wrote:
> 
>> Currently, there are multiple instances where several nodes are extracted
>> from one list and added to another list. One by one extraction, and then
>> one by one splicing, not only low efficiency, readability is also poor.
>> The work can be done well with hlist_cut_number() and hlist_splice_init(),
>> which move the entire sublist at once.
>>
>> When the number of nodes expected to be moved is less than or equal to 0,
>> or the source list is empty, hlist_cut_number() safely returns 0. The
>> splicing is performed only when the return value of hlist_cut_number() is
>> greater than 0.
>>
>> For two calls to hlist_cut_number() in __free_object(), the result is
>> obviously positive, the check of the return value is omitted.
> 
> Sure but hlist_cut_number() suffers from the same problem as the current
> code. If is a massive cache line chase as you actually have to walk the
> list to figure out where to cut it off.
> 
> All related functions have this problem and all of this code is very
> strict about boundaries. Instead of accurately doing the refill, purge
> etc. we should look into proper batch mode mechanisms. Let me think
> about it.

It may be helpful to add several arrays to record the first node of each batch
in each free list. Take 'percpu_pool' as an example:

struct debug_percpu_free {
struct hlist_head   free_objs;
int obj_free;
+   int batch_idx;
+   struct hlist_node   *batch_first[4]; // ODEBUG_POOL_PERCPU_SIZE / 
ODEBUG_BATCH_SIZE
};

A new free node is added to the header of the list, and the batch is cut from 
the tail
of the list.
  NodeA<-->...<-->NodeB<-->...<-->NodeC<-->NodeD<--> free_objs
|---one batch---|---one batch---|
|   |
batch_first[0]  batch_first[1]

__free_object():
//add obj into percpu_pool
obj_free++;
if (obj_free % ODEBUG_BATCH_SIZE == 0) {
idx = 0x3 & (batch_idx + (obj_free / ODEBUG_BATCH_SIZE) - 1);
//update batch_first[idx]
}

if (obj_free >= ODEBUG_POOL_PERCPU_SIZE) {
//move one batch
//cut batch at 'batch_idx' into obj_to_free (or obj_pool, if 
less than debug_objects_pool_min_level)
batch_idx++;
obj_free -= ODEBUG_BATCH_SIZE
}

> 
> Thanks,
> 
> tglx
> 
> 
> .
> 

-- 
Regards,
  Zhen Lei



[PATCH] list: test: Increasing coverage of list_test_list_replace*()

2024-09-09 Thread I Hsin Cheng
Increase the test coverage of list_test_list_replace*() by adding the
checks to compare the pointer of "a_new.next" and "a_new.prev" to make
sure a perfect circular doubly linked list is formed after the
replacement.

Signed-off-by: I Hsin Cheng 
---
 lib/list-test.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/list-test.c b/lib/list-test.c
index 37cbc33e9fdb..e207c4c98d70 100644
--- a/lib/list-test.c
+++ b/lib/list-test.c
@@ -102,6 +102,8 @@ static void list_test_list_replace(struct kunit *test)
/* now: [list] -> a_new -> b */
KUNIT_EXPECT_PTR_EQ(test, list.next, &a_new);
KUNIT_EXPECT_PTR_EQ(test, b.prev, &a_new);
+   KUNIT_EXPECT_PTR_EQ(test, a_new.next, &b);
+   KUNIT_EXPECT_PTR_EQ(test, a_new.prev, &list);
 }
 
 static void list_test_list_replace_init(struct kunit *test)
@@ -118,6 +120,8 @@ static void list_test_list_replace_init(struct kunit *test)
/* now: [list] -> a_new -> b */
KUNIT_EXPECT_PTR_EQ(test, list.next, &a_new);
KUNIT_EXPECT_PTR_EQ(test, b.prev, &a_new);
+   KUNIT_EXPECT_PTR_EQ(test, a_new.next, &b);
+   KUNIT_EXPECT_PTR_EQ(test, a_new.prev, &list);
 
/* check a_old is empty (initialized) */
KUNIT_EXPECT_TRUE(test, list_empty_careful(&a_old));
-- 
2.43.0




[PATCH] list: test: Mending tests for list_cut_position()

2024-09-09 Thread I Hsin Cheng
Mending test for list_cut_position*() for the missing check of integer
"i" after the second loop. The variable should be checked for second
time to make sure both lists after the cut operation are formed as
expected.

Signed-off-by: I Hsin Cheng 
---
 lib/list-test.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/list-test.c b/lib/list-test.c
index 37cbc33e9fdb..f59188fc2aca 100644
--- a/lib/list-test.c
+++ b/lib/list-test.c
@@ -404,10 +404,13 @@ static void list_test_list_cut_position(struct kunit 
*test)
 
KUNIT_EXPECT_EQ(test, i, 2);
 
+   i = 0;
list_for_each(cur, &list1) {
KUNIT_EXPECT_PTR_EQ(test, cur, &entries[i]);
i++;
}
+
+   KUNIT_EXPECT_EQ(test, i, 1);
 }
 
 static void list_test_list_cut_before(struct kunit *test)
@@ -432,10 +435,13 @@ static void list_test_list_cut_before(struct kunit *test)
 
KUNIT_EXPECT_EQ(test, i, 1);
 
+   i = 0;
list_for_each(cur, &list1) {
KUNIT_EXPECT_PTR_EQ(test, cur, &entries[i]);
i++;
}
+
+   KUNIT_EXPECT_EQ(i, 2);
 }
 
 static void list_test_list_splice(struct kunit *test)
-- 
2.43.0




[RESEND PATCH v2] list: test: Mending tests for list_cut_position()

2024-09-09 Thread I Hsin Cheng
Mending test for list_cut_position*() for the missing check of integer
"i" after the second loop. The variable should be checked for second
time to make sure both lists after the cut operation are formed as
expected.

Signed-off-by: I Hsin Cheng 
---
 lib/list-test.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/list-test.c b/lib/list-test.c
index 37cbc33e9fdb..8d1d47a9fe9e 100644
--- a/lib/list-test.c
+++ b/lib/list-test.c
@@ -404,10 +404,13 @@ static void list_test_list_cut_position(struct kunit 
*test)
 
KUNIT_EXPECT_EQ(test, i, 2);
 
+   i = 0;
list_for_each(cur, &list1) {
KUNIT_EXPECT_PTR_EQ(test, cur, &entries[i]);
i++;
}
+
+   KUNIT_EXPECT_EQ(test, i, 1);
 }
 
 static void list_test_list_cut_before(struct kunit *test)
@@ -432,10 +435,13 @@ static void list_test_list_cut_before(struct kunit *test)
 
KUNIT_EXPECT_EQ(test, i, 1);
 
+   i = 0;
list_for_each(cur, &list1) {
KUNIT_EXPECT_PTR_EQ(test, cur, &entries[i]);
i++;
}
+
+   KUNIT_EXPECT_EQ(test, i, 2);
 }
 
 static void list_test_list_splice(struct kunit *test)
-- 
2.43.0




Re: [RESEND PATCH v1 1/7] vhost: Add a new module_param for enable kthread

2024-09-09 Thread kernel test robot
Hi Cindy,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master v6.11-rc7 next-20240909]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Cindy-Lu/vhost-Add-kthread-support-in-function-vhost_worker_queue/20240909-110046
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:
https://lore.kernel.org/r/20240909020138.1245873-2-lulu%40redhat.com
patch subject: [RESEND PATCH v1 1/7] vhost: Add a new module_param for enable 
kthread
config: x86_64-randconfig-123-20240910 
(https://download.01.org/0day-ci/archive/20240910/202409101256.ewkwh1mw-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240910/202409101256.ewkwh1mw-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202409101256.ewkwh1mw-...@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/vhost/vhost.c:44:6: sparse: sparse: symbol 'enforce_kthread' was not 
>> declared. Should it be static?
   drivers/vhost/vhost.c:1898:54: sparse: sparse: self-comparison always 
evaluates to false
   drivers/vhost/vhost.c:1899:54: sparse: sparse: self-comparison always 
evaluates to false
   drivers/vhost/vhost.c:1900:55: sparse: sparse: self-comparison always 
evaluates to false
   drivers/vhost/vhost.c:2155:46: sparse: sparse: self-comparison always 
evaluates to false
   drivers/vhost/vhost.c:2235:48: sparse: sparse: self-comparison always 
evaluates to false

vim +/enforce_kthread +44 drivers/vhost/vhost.c

35  
36  static ushort max_mem_regions = 64;
37  module_param(max_mem_regions, ushort, 0444);
38  MODULE_PARM_DESC(max_mem_regions,
39  "Maximum number of memory regions in memory map. (default: 
64)");
40  static int max_iotlb_entries = 2048;
41  module_param(max_iotlb_entries, int, 0444);
42  MODULE_PARM_DESC(max_iotlb_entries,
43  "Maximum number of iotlb entries. (default: 2048)");
  > 44  bool enforce_kthread = true;
45  module_param(enforce_kthread, bool, 0444);
46  MODULE_PARM_DESC(enforce_kthread, "enable vhost to use kthread 
(default: Y)");
47  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [PATCH 0/3] KVM: x86: Fix Intel PT Host/Guest mode when host tracing also

2024-09-09 Thread Zhenyu Wang

Add Mingwei.

On 2024.09.06 16:00:23 +0300, Adrian Hunter wrote:
> Hi
> 
> There is a long-standing problem whereby running Intel PT on host and guest
> in Host/Guest mode, causes VM-Entry failure.
> 
> The motivation for this patch set is to provide a fix for stable kernels
> prior to the advent of the "Mediated Passthrough vPMU" patch set:
> 
>   https://lore.kernel.org/kvm/20240801045907.4010984-1-mizh...@google.com/
> 
> which would render a large part of the fix unnecessary but likely not be
> suitable for backport to stable due to its size and complexity.
> 
> Ideally, this patch set would be applied before "Mediated Passthrough vPMU"
> 
> Note that the fix does not conflict with "Mediated Passthrough vPMU", it
> is just that "Mediated Passthrough vPMU" will make the code to stop and
> restart Intel PT unnecessary.
>

With mediated passthrough vPMU, we could enable PT pmu's 
PERF_PMU_CAP_PASSTHROUGH_VPMU
capability like core pmu, then perf guest helper could handle any host PT events
during VM entry/exit as well like core perf events. The benefit is general perf 
interface
to handle like others. But it would be sure to depend on passthrough vPMU be 
enabled,
not as this to fix in case w/o that.

I have local patches to work against passthrough vPMU, but like passthrough vPMU
to be settled down first. With Adrian's change, it could be also possible to 
decouple
the dependency, so may support in both cases with or w/o passthrough vPMU 
enabled. 

> 
> Adrian Hunter (3):
>   KVM: x86: Fix Intel PT IA32_RTIT_CTL MSR validation
>   KVM: x86: Fix Intel PT Host/Guest mode when host tracing also
>   KVM: selftests: Add guest Intel PT test
> 
>  arch/x86/events/intel/pt.c | 131 ++-
>  arch/x86/events/intel/pt.h |  10 +
>  arch/x86/include/asm/intel_pt.h|   4 +
>  arch/x86/kvm/vmx/vmx.c |  26 +-
>  arch/x86/kvm/vmx/vmx.h |   1 -
>  tools/testing/selftests/kvm/Makefile   |   1 +
>  .../selftests/kvm/include/x86_64/processor.h   |   1 +
>  tools/testing/selftests/kvm/x86_64/intel_pt.c  | 381 
> +
>  8 files changed, 532 insertions(+), 23 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/intel_pt.c
> 
> base-commit: d45aab436cf06544abeeffc607110f559a3af3b4
> 
> 
> Regards
> Adrian
> 


signature.asc
Description: PGP signature


Re: [PATCH v3 0/2] Properly initialize speed/duplex and remove vDPA config updates

2024-09-09 Thread Michael S. Tsirkin
On Wed, Sep 04, 2024 at 10:11:13AM -0500, Carlos Bilbao wrote:
> From: Carlos Bilbao 
> 
> Initialize speed and duplex for virtio_net_config to UNKNOWN (mlx5_vdpa
> vDPA devices currently do not support VIRTIO_NET_F_SPEED_DUPLEX). Remove
> ioctl VHOST_VDPA_SET_CONFIG and its related logic as it is not supported;
> see: https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html
> 
> Carlos:
>   vdpa/mlx5: Set speed and duplex of vDPA devices to UNKNOWN
>   vdpa: Remove ioctl VHOST_VDPA_SET_CONFIG per spec compliance

This will need a rebase. Will apply once you post one.
Thanks!

> ---
> 
> Changes since v1:
>  Link: https://lkml.org/lkml/2024/8/29/1368
>  - Fix prefix of the first commit and add Reviewed-By tag.
>  - Redo second commit completely: instead of attempting to add support to
>set configuration fields, remove ioctl and support entirely from vDPA
>implementations -- because it's not allowed by spec.
> 
> Changes since v2:
>  Link: https://lkml.org/lkml/2024/9/3/1407
>  - Fix first commit by changing 4 spaces for a tab.
>  - In second commit, ENI is legacy and should keep set_config(). Change it
>to set_config_legacy() to avoid future confusion and erroneous
>implementations.
> 
> ---
>  drivers/vdpa/alibaba/eni_vdpa.c|  2 +-
>  drivers/vdpa/ifcvf/ifcvf_main.c| 10 --
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 19 ---
>  drivers/vdpa/pds/vdpa_dev.c| 16 
>  drivers/vdpa/solidrun/snet_main.c  | 18 --
>  drivers/vdpa/vdpa.c| 16 
>  drivers/vdpa/vdpa_sim/vdpa_sim.c   | 16 
>  drivers/vdpa/vdpa_sim/vdpa_sim.h   |  1 -
>  drivers/vdpa/vdpa_user/vduse_dev.c |  7 ---
>  drivers/vdpa/virtio_pci/vp_vdpa.c  | 14 --
>  drivers/vhost/vdpa.c   | 26 --
>  drivers/virtio/virtio_vdpa.c   |  9 -
>  include/linux/vdpa.h   |  7 ---
>  include/uapi/linux/vhost.h |  8 
>  14 files changed, 21 insertions(+), 148 deletions(-)