Re: [PATCH RFC 1/3] pinctrl: mediatek: paris: Expose more configurations to GPIO set_config

2024-10-25 Thread Nícolas F. R. A. Prado
On Thu, Oct 24, 2024 at 05:17:05PM +0200, AngeloGioacchino Del Regno wrote:
> On 11/09/24 12:10, AngeloGioacchino Del Regno wrote:
> > On 09/09/24 20:37, Nícolas F. R. A. Prado wrote:
> > > Currently the set_config callback in the gpio_chip registered by the
> > > pinctrl_paris driver only supports PIN_CONFIG_INPUT_DEBOUNCE, despite
> > 
> > [...] only supports operations configuring the input debounce parameter
> > of the EINT controller and denies configuring params on the other AP GPIOs 
> > [...]
> > 
> > (reword as needed)
> > 
> > > many other configurations already being implemented and available
> > > through the pinctrl API for configuration of pins by the Devicetree and
> > > other drivers.
> > > 
> > > Expose all configurations currently implemented through the GPIO API so
> > > they can also be set from userspace, which is particularly useful to
> > > allow testing them from userspace.
> > > 
> > > Signed-off-by: Nícolas F. R. A. Prado 
> > > ---
> > >   drivers/pinctrl/mediatek/pinctrl-paris.c | 20 ++--
> > 
> > You can do the same for pinctrl-moore too, it's trivial.
> > 
> > Other than that, I agree about performing this change, as this may be useful
> > for more than just testing.
> > 
> 
> Nicolas, please don't forget to respin this patch.

I was hoping to get some feedback on the test itself as well, particularly from
Linus as the pinctrl maintainer, but it's also been a while so I'll send a v2
with the feedback here addressed.

Thanks,
Nícolas



[PATCH v2 2/3] timers: Use __raise_softirq_irqoff() to raise the softirq.

2024-10-25 Thread Sebastian Andrzej Siewior
As an optimisation, use __raise_softirq_irqoff() to raise the softirq.
This is always called from an interrupt handler with interrupts already
disabled, so raising the softirq can be reduced to just setting the softirq
pending flag and letting the softirq be invoked on return from interrupt.

Use __raise_softirq_irqoff() to raise the softirq.

Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/time/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 0fc9d066a7be4..1759de934284c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2499,7 +2499,7 @@ static void run_local_timers(void)
 		 */
 		if (time_after_eq(jiffies, READ_ONCE(base->next_expiry)) ||
 		    (i == BASE_DEF && tmigr_requires_handle_remote())) {
-			raise_softirq(TIMER_SOFTIRQ);
+			__raise_softirq_irqoff(TIMER_SOFTIRQ);
 			return;
 		}
 	}
-- 
2.45.2
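
For readers comparing the two helpers, here is a minimal userspace model of
the difference the changelog relies on. It is only a sketch: the model_*
names, the pending bitmask and the toggle counter are made up for
illustration and are not the kernel's definitions. The assumption is that
raise_softirq() saves and restores the interrupt state around setting the
pending bit, while __raise_softirq_irqoff() only sets the bit and expects the
caller to already run with interrupts disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's real symbols. */
enum { MODEL_TIMER_SOFTIRQ = 1 };

static unsigned long model_pending;	/* models the per-CPU pending mask */
static bool model_irqs_disabled;	/* models the CPU interrupt state */
static int model_irq_toggles;		/* counts save/restore round trips */

/* Models __raise_softirq_irqoff(): only OR in the pending bit. */
static void model_raise_raw(unsigned int nr)
{
	assert(model_irqs_disabled);	/* caller must have interrupts off */
	model_pending |= 1UL << nr;
}

/* Models raise_softirq(): wrap the raw helper in irq save/restore. */
static void model_raise(unsigned int nr)
{
	bool saved = model_irqs_disabled;

	model_irqs_disabled = true;
	model_irq_toggles++;
	model_raise_raw(nr);
	model_irqs_disabled = saved;
}
```

Calling model_raise_raw() from a context that already has interrupts off
sets the same bit as model_raise() without the extra save/restore, which is
the saving the patch is after.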




[PATCH V3] selftests: livepatch: add test cases of stack_order sysfs interface

2024-10-25 Thread Wardenjohn
Add selftest test cases for the sysfs attribute 'stack_order'.

Suggested-by: Petr Mladek 
Signed-off-by: Wardenjohn 
---
 .../testing/selftests/livepatch/test-sysfs.sh | 71 +++
 1 file changed, 71 insertions(+)

diff --git a/tools/testing/selftests/livepatch/test-sysfs.sh b/tools/testing/selftests/livepatch/test-sysfs.sh
index 05a14f5a7bfb..e44a051be307 100755
--- a/tools/testing/selftests/livepatch/test-sysfs.sh
+++ b/tools/testing/selftests/livepatch/test-sysfs.sh
@@ -5,6 +5,8 @@
 . $(dirname $0)/functions.sh
 
 MOD_LIVEPATCH=test_klp_livepatch
+MOD_LIVEPATCH2=test_klp_callbacks_demo
+MOD_LIVEPATCH3=test_klp_syscall
 
 setup_config
 
@@ -19,6 +21,8 @@ check_sysfs_rights "$MOD_LIVEPATCH" "enabled" "-rw-r--r--"
 check_sysfs_value  "$MOD_LIVEPATCH" "enabled" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "force" "--w-------"
 check_sysfs_rights "$MOD_LIVEPATCH" "replace" "-r--r--r--"
+check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
+check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "transition" "-r--r--r--"
 check_sysfs_value  "$MOD_LIVEPATCH" "transition" "0"
 check_sysfs_rights "$MOD_LIVEPATCH" "vmlinux/patched" "-r--r--r--"
@@ -131,4 +135,71 @@ livepatch: '$MOD_LIVEPATCH': completing unpatching transition
 livepatch: '$MOD_LIVEPATCH': unpatching complete
 % rmmod $MOD_LIVEPATCH"
 
+start_test "sysfs test stack_order value"
+
+load_lp $MOD_LIVEPATCH
+
+check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
+
+load_lp $MOD_LIVEPATCH2
+
+check_sysfs_value  "$MOD_LIVEPATCH2" "stack_order" "2"
+
+load_lp $MOD_LIVEPATCH3
+
+check_sysfs_value  "$MOD_LIVEPATCH3" "stack_order" "3"
+
+disable_lp $MOD_LIVEPATCH2
+unload_lp $MOD_LIVEPATCH2
+
+check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
+check_sysfs_value  "$MOD_LIVEPATCH3" "stack_order" "2"
+
+disable_lp $MOD_LIVEPATCH3
+unload_lp $MOD_LIVEPATCH3
+
+disable_lp $MOD_LIVEPATCH
+unload_lp $MOD_LIVEPATCH
+
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
+livepatch: enabling patch '$MOD_LIVEPATCH'
+livepatch: '$MOD_LIVEPATCH': initializing patching transition
+livepatch: '$MOD_LIVEPATCH': starting patching transition
+livepatch: '$MOD_LIVEPATCH': completing patching transition
+livepatch: '$MOD_LIVEPATCH': patching complete
+% insmod test_modules/$MOD_LIVEPATCH2.ko
+livepatch: enabling patch '$MOD_LIVEPATCH2'
+livepatch: '$MOD_LIVEPATCH2': initializing patching transition
+$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
+livepatch: '$MOD_LIVEPATCH2': starting patching transition
+livepatch: '$MOD_LIVEPATCH2': completing patching transition
+$MOD_LIVEPATCH2: post_patch_callback: vmlinux
+livepatch: '$MOD_LIVEPATCH2': patching complete
+% insmod test_modules/$MOD_LIVEPATCH3.ko
+livepatch: enabling patch '$MOD_LIVEPATCH3'
+livepatch: '$MOD_LIVEPATCH3': initializing patching transition
+livepatch: '$MOD_LIVEPATCH3': starting patching transition
+livepatch: '$MOD_LIVEPATCH3': completing patching transition
+livepatch: '$MOD_LIVEPATCH3': patching complete
+% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH2/enabled
+livepatch: '$MOD_LIVEPATCH2': initializing unpatching transition
+$MOD_LIVEPATCH2: pre_unpatch_callback: vmlinux
+livepatch: '$MOD_LIVEPATCH2': starting unpatching transition
+livepatch: '$MOD_LIVEPATCH2': completing unpatching transition
+$MOD_LIVEPATCH2: post_unpatch_callback: vmlinux
+livepatch: '$MOD_LIVEPATCH2': unpatching complete
+% rmmod $MOD_LIVEPATCH2
+% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH3/enabled
+livepatch: '$MOD_LIVEPATCH3': initializing unpatching transition
+livepatch: '$MOD_LIVEPATCH3': starting unpatching transition
+livepatch: '$MOD_LIVEPATCH3': completing unpatching transition
+livepatch: '$MOD_LIVEPATCH3': unpatching complete
+% rmmod $MOD_LIVEPATCH3
+% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH/enabled
+livepatch: '$MOD_LIVEPATCH': initializing unpatching transition
+livepatch: '$MOD_LIVEPATCH': starting unpatching transition
+livepatch: '$MOD_LIVEPATCH': completing unpatching transition
+livepatch: '$MOD_LIVEPATCH': unpatching complete
+% rmmod $MOD_LIVEPATCH"
+
 exit 0
-- 
2.43.5
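
The values the new test expects (1, 2, 3, then 1 and 2 after the middle
patch is removed) follow from treating stack_order as the 1-based position of
a patch in the stack of loaded livepatches. A minimal sketch of that
bookkeeping, assuming exactly this semantic; the model_* names and the
fixed-size array are hypothetical and unrelated to the kernel's klp data
structures:

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_PATCHES 8

/* Hypothetical model of the livepatch stack, oldest patch first. */
struct model_patch_stack {
	const char *names[MODEL_MAX_PATCHES];
	int count;
};

/* Push a patch on top of the stack; returns its 1-based position, 0 if full. */
static int model_load(struct model_patch_stack *s, const char *name)
{
	if (s->count >= MODEL_MAX_PATCHES)
		return 0;
	s->names[s->count++] = name;
	return s->count;
}

/* 1-based position of a loaded patch, or 0 if it is not loaded. */
static int model_stack_order(const struct model_patch_stack *s, const char *name)
{
	for (int i = 0; i < s->count; i++)
		if (strcmp(s->names[i], name) == 0)
			return i + 1;
	return 0;
}

/* Remove a patch and close the gap, so later patches move down by one. */
static void model_unload(struct model_patch_stack *s, const char *name)
{
	int pos = model_stack_order(s, name);

	if (!pos)
		return;
	for (int i = pos; i < s->count; i++)
		s->names[i - 1] = s->names[i];
	s->count--;
}
```

Loading the three test modules in order and then unloading the second one
reproduces the sequence checked by the selftest.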




Re: [PATCH V2 4/4] selftests/mm: skip virtual_address_range tests on riscv

2024-10-25 Thread Palmer Dabbelt

On Tue, 08 Oct 2024 02:41:41 PDT (-0700), zhangchun...@iscas.ac.cn wrote:

RISC-V doesn't currently restrict the virtual address space in the way
that the virtual_address_range tests check for, which causes the tests
to fail. So let's disable the whole test suite for riscv64 for now: don't
build it, and run_vmtests.sh will skip it if it is not present.

Reviewed-by: Charlie Jenkins 
Signed-off-by: Chunyan Zhang 
---
V1: https://lore.kernel.org/linux-mm/ZuOuedBpS7i3T%2Fo0@ghost/T/
---
 tools/testing/selftests/mm/Makefile   |  2 ++
 tools/testing/selftests/mm/run_vmtests.sh | 10 ++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 02e1204971b0..76a378c5c141 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -115,7 +115,9 @@ endif

 ifneq (,$(filter $(ARCH),arm64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64 s390))
 TEST_GEN_FILES += va_high_addr_switch
+ifneq ($(ARCH),riscv64)
 TEST_GEN_FILES += virtual_address_range
+endif
 TEST_GEN_FILES += write_to_hugetlbfs
 endif

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index c5797ad1d37b..4493bfd1911c 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -347,10 +347,12 @@ if [ $VADDR64 -ne 0 ]; then
 	# allows high virtual address allocation requests independent
 	# of platform's physical memory.
 
-	prev_policy=$(cat /proc/sys/vm/overcommit_memory)
-	echo 1 > /proc/sys/vm/overcommit_memory
-	CATEGORY="hugevm" run_test ./virtual_address_range
-	echo $prev_policy > /proc/sys/vm/overcommit_memory
+	if [ -x ./virtual_address_range ]; then
+		prev_policy=$(cat /proc/sys/vm/overcommit_memory)
+		echo 1 > /proc/sys/vm/overcommit_memory
+		CATEGORY="hugevm" run_test ./virtual_address_range
+		echo $prev_policy > /proc/sys/vm/overcommit_memory
+	fi
 
 	# va high address boundary switch test
 	ARCH_ARM64="arm64"


Acked-by: Palmer Dabbelt 

(I'm taking the first two as they're RISC-V bits)



[PATCH 2/2] rcuscale: Remove redundant WARN_ON_ONCE() splat

2024-10-25 Thread Uladzislau Rezki (Sony)
There are two error paths where WARN_ON_ONCE() is called twice: once
wrapped in the if() condition and once more, unnecessarily, inside the
braces of that if() block.

Remove the extra WARN_ON_ONCE() splat inside the braces.

Signed-off-by: Uladzislau Rezki (Sony) 
---
 kernel/rcu/rcuscale.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index de7d511e6be4..1d8bb603c289 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -889,13 +889,11 @@ kfree_scale_init(void)
 
 		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
 			pr_alert("ERROR: call_rcu() CBs are not being lazy as expected!\n");
-			WARN_ON_ONCE(1);
 			goto unwind;
 		}
 
 		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start > 3 * HZ)) {
 			pr_alert("ERROR: call_rcu() CBs are being too lazy!\n");
-			WARN_ON_ONCE(1);
 			goto unwind;
 		}
 	}
-- 
2.39.5




Re: [PATCH v4 01/14] iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct

2024-10-25 Thread Alexey Kardashevskiy




On 22/10/24 11:20, Nicolin Chen wrote:

Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device, i.e.
iommufd_device (idev) object, against an iommufd_viommu (vIOMMU) object in
the VM. This vDEVICE object (and its structure) holds all the information
and attributes in a VM, regarding the device related to the vIOMMU.

As an initial patch, add a per-vIOMMU virtual ID. This can be:
  - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
  - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
  - Virtual ID on a nested Intel VT-D IOMMU, an index to a Context Table
Potentially, this vDEVICE structure can hold some vData for Confidential
Compute Architecture (CCA).

Add a pair of vdevice_alloc and vdevice_free in struct iommufd_viommu_ops
to allow driver-level vDEVICE structure allocations.

Similar to iommufd_viommu_alloc, add an iommufd_vdevice_alloc helper, so
IOMMU drivers can allocate core-embedded style structures.

Signed-off-by: Nicolin Chen 
---
  include/linux/iommufd.h | 32 
  1 file changed, 32 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 5c13c35952d8..5d61a1d2947a 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -31,6 +31,7 @@ enum iommufd_object_type {
IOMMUFD_OBJ_ACCESS,
IOMMUFD_OBJ_FAULT,
IOMMUFD_OBJ_VIOMMU,
+   IOMMUFD_OBJ_VDEVICE,
  #ifdef CONFIG_IOMMUFD_TEST
IOMMUFD_OBJ_SELFTEST,
  #endif
@@ -92,6 +93,14 @@ struct iommufd_viommu {
unsigned int type;
  };
  
+struct iommufd_vdevice {

+   struct iommufd_object obj;
+   struct iommufd_ctx *ictx;
+   struct iommufd_device *idev;
+   struct iommufd_viommu *viommu;
+   u64 id; /* per-vIOMMU virtual ID */
+};
+
  /**
   * struct iommufd_viommu_ops - vIOMMU specific operations
   * @free: Free all driver-specific parts of an iommufd_viommu. The memory of the
@@ -101,12 +110,24 @@ struct iommufd_viommu {
   *   must be defined in include/uapi/linux/iommufd.h.
   *   It must fully initialize the new iommu_domain before
   *   returning. Upon failure, ERR_PTR must be returned.
+ * @vdevice_alloc: Allocate a driver-managed iommufd_vdevice to init some driver
+ *                 specific structure or HW procedure. Note that the core-level
+ *                 structure is filled by the iommufd core after calling this op.
+ *                 It is suggested to call iommufd_vdevice_alloc() helper for
+ *                 a bundled allocation of the core and the driver structures,
+ *                 using the ictx pointer in the given @viommu.
+ * @vdevice_free: Free a driver-managed iommufd_vdevice to de-init its structure
+ *                or HW procedure. The memory of the vdevice will be free-ed by
+ *                iommufd core.
   */
  struct iommufd_viommu_ops {
void (*free)(struct iommufd_viommu *viommu);
struct iommu_domain *(*domain_alloc_nested)(
struct iommufd_viommu *viommu,
const struct iommu_user_data *user_data);
+   struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
+struct device *dev, u64 id);
+   void (*vdevice_free)(struct iommufd_vdevice *vdev);
  };
  
  #if IS_ENABLED(CONFIG_IOMMUFD)

@@ -200,4 +221,15 @@ _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
ret->member.ops = viommu_ops;  \
ret;   \
})
+#define iommufd_vdevice_alloc(ictx, drv_struct, member)                       \
+   ({ \
+   static_assert( \
+   __same_type(struct iommufd_vdevice,\
+   ((struct drv_struct *)NULL)->member)); \
+   static_assert(offsetof(struct drv_struct, member.obj) == 0);   \
+   container_of(_iommufd_object_alloc(ictx,   \
+  sizeof(struct drv_struct),  \
+  IOMMUFD_OBJ_VDEVICE),   \
+struct drv_struct, member.obj);   \
+   })
  #endif


A nit: it hurts the eyes to read:

mock_vdev = iommufd_vdevice_alloc(viommu->ictx, mock_vdevice, core);

vs.

mock_vdev = iommufd_vdevice_alloc(viommu->ictx, struct mock_vdevice, core);

as for the former I go searching for a "mock_vdevice" variable and for 
the latter it is clear it is 1) a macro 2) which does some type checking.


also, it makes it impossible to pass things like typeof(..) or a type 
from typedef. Thanks,



--
Alexey
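
To illustrate the nit, here is a small userspace sketch of the two macro
styles. Everything in it is hypothetical: the core_* and mock_* structures,
model_container_of() and both vdev_alloc_* macros are simplified stand-ins
written for this example and are not the iommufd definitions. The first macro
prepends "struct" to a bare tag, as the patch does; the second takes a
complete type name, which also accepts a typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical core and driver structures, for illustration only. */
struct core_obj { int type; };
struct core_vdevice { struct core_obj obj; unsigned long long id; };
struct mock_vdevice { struct core_vdevice core; int driver_data; };
typedef struct mock_vdevice mock_vdevice_t;

/* The embedded core object must sit at offset 0 for the casts below. */
_Static_assert(offsetof(struct mock_vdevice, core.obj) == 0,
	       "core object must be the first member");

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Style used by the patch: caller passes a bare struct tag. */
#define vdev_alloc_tag(drv_struct, member)				\
	model_container_of(calloc(1, sizeof(struct drv_struct)),	\
			   struct drv_struct, member.obj)

/* Suggested style: caller passes a complete type name. */
#define vdev_alloc_type(drv_type, member)				\
	model_container_of(calloc(1, sizeof(drv_type)),			\
			   drv_type, member.obj)
```

With the second form a call site reads vdev_alloc_type(struct mock_vdevice,
core) or vdev_alloc_type(mock_vdevice_t, core); the first form can only be
written as vdev_alloc_tag(mock_vdevice, core).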




[PATCH next] rcu: Unlock correctly in rcu_dump_cpu_stacks()

2024-10-25 Thread Dan Carpenter
The unlock needs to be outside the } close curly braces for this if
statement.  Otherwise it leads to a deadlock.

Fixes: 744e87210b1a ("rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()")
Signed-off-by: Dan Carpenter 
---
 kernel/rcu/tree_stall.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 8994391b95c7..925fcdad5dea 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -357,8 +357,8 @@ static void rcu_dump_cpu_stacks(unsigned long gp_seq)
 					pr_err("Offline CPU %d blocking current GP.\n", cpu);
 				else
 					dump_cpu_task(cpu);
-				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 			}
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
 		printk_deferred_exit();
 	}
-- 
2.45.2
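
The effect of where the unlock sits can be shown with a tiny userspace
model. This is a sketch under stated assumptions, not the RCU code: the lock
is modelled as a depth counter, model_lock() reports failure where a real
spinlock would deadlock, and walk_cpus() with its still_blocking[] input is a
hypothetical stand-in for the per-CPU loop in rcu_dump_cpu_stacks().

```c
#include <assert.h>
#include <stdbool.h>

/* Returns false where a real spinlock would deadlock (already held). */
static bool model_lock(int *depth)
{
	if (*depth > 0)
		return false;
	(*depth)++;
	return true;
}

static void model_unlock(int *depth)
{
	(*depth)--;
}

/*
 * Walk n "CPUs". still_blocking[i] models the re-check done under the lock.
 * Returns true if the walk finishes with the lock released, false if a
 * second acquisition was attempted while the lock was still held.
 */
static bool walk_cpus(const bool *still_blocking, int n, bool unlock_inside_if)
{
	int depth = 0;

	for (int i = 0; i < n; i++) {
		if (!model_lock(&depth))
			return false;
		if (still_blocking[i]) {
			/* the stack dump would happen here */
			if (unlock_inside_if)
				model_unlock(&depth);
		}
		if (!unlock_inside_if)
			model_unlock(&depth);
	}
	return depth == 0;
}
```

With the unlock inside the if block, the first CPU that fails the re-check
leaves the lock held and the next iteration cannot take it; with the unlock
after the closing brace every iteration is balanced.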




[PATCH V4 00/15] selftests/resctrl: Support diverse platforms with MBM and MBA tests

2024-10-25 Thread Reinette Chatre
Changes since V3:
- V3: https://lore.kernel.org/all/cover.1729218182.git.reinette.cha...@intel.com/
- Rebased on HEAD 2a027d6bb660 of kselftest/next.
- Fix empty string parsing issues pointed out by Ilpo.
- Add Reviewed-by tags.
- Please see individual patches for detailed changes.

Changes since V2:
- V2: https://lore.kernel.org/all/cover.1726164080.git.reinette.cha...@intel.com/
- Add fix to protect against buffer overflow when parsing text from sysfs files.
- Add cleanup patch to address use of magic constants as pointed out by
  Ilpo.
- Add Reviewed-by tags where received, except for "selftests/resctrl: Use cache
  size to determine "fill_buf" buffer size" that changed too much since
  receiving the Reviewed-by tag.
- Please see individual patches for detailed changes.

Changes since V1:
- V1: https://lore.kernel.org/cover.1724970211.git.reinette.cha...@intel.com/
- V2 contains the same general solutions to stated problem as V1 but these
  are now preceded by more fixes (patches 1 to 5) and improved robustness
  (patches 6 to 9) to existing tests before the series gets back
  to solving the original problem with more confidence in patches 10 to 13.
- The possibility of making "memflush = false" for CMT test was discussed
  during V1. Modifying this setting does not have a significant impact on the
  observed results that are already well within acceptable range and this
  version thus keeps original default. If performance was a goal it may
  be possible to do further experimentation where "memflush = false" could
  eliminate the need for the sleep(1) within the test wrapper, but
  improving the performance is not a goal of this work.
- (New) Support what seems to be unintended ability for user space to provide
  parameters to "fill_buf" by making the parsing robust and only support
  changing parameters that are supported to be changed. Drop support for
  "write" operation since it has never been measured.
- (New) Improve wraparound handling. (Ilpo)
- (New) A couple of new fixes addressing issues discovered during development.
- (Change from V1) To support fill_buf parameters provided by user space as
  well as test specific fill_buf parameters struct fill_buf_param is no longer
  just a member of struct resctrl_val_param, instead there could be at most
  two instances of struct fill_buf_param, the immutable parameters provided
  by user space and the parameters used by individual tests. (Ilpo)
- Please see individual patches for detailed changes.

V1 cover:

The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory
Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald
Rapids systems. The test failures result from the following two
properties of these systems:
1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl
   MBA and MBM selftests measure memory traffic for which a hardcoded
   250MB buffer has been sufficient so far. On platforms with L3 cache
   larger than the buffer, the buffer fits in the L3 cache and thus
   no/very little memory traffic is generated during the "memory
   bandwidth" tests.
2) Some platform features, for example RAS features or memory
   performance features that generate memory traffic may drive accesses
   that are counted differently by performance counters and MBM
   respectively, for instance generating "overhead" traffic which is not
   counted against any specific RMID. Until now these counting
   differences have always been "in the noise". On Emerald Rapids
   systems the maximum MBA throttling (10% memory bandwidth)
   throttles memory bandwidth to where memory accesses by these other
   platform features push the memory bandwidth difference between
   memory controller performance counters and resctrl (MBM) beyond the
   tests' hardcoded tolerance.

Make the tests more robust against platform variations:
1) Let the buffer used by memory bandwidth tests be guided by the size
   of the L3 cache.
2) Larger buffers require longer initialization time before the buffer can
   be used for measurement. Rework the tests to ensure that buffer
   initialization is complete before measurements start.
3) Do not compare performance counters and MBM measurements at low
   bandwidth. The value of "low" is hardcoded to 750MiB based on
   measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake
   systems. This limit is not applicable to AMD systems since it
   only applies to the MBA and MBM tests that are isolated to Intel.

[1] https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-platinum-8592-processor-320m-cache-1-9-ghz.html

Reinette Chatre (15):
  selftests/resctrl: Make functions only used in same file static
  selftests/resctrl: Print accurate buffer size as part of MBM results
  selftests/resctrl: Fix memory overflow due to unhandled wraparound
  selftests/resctrl: Protect against array overrun during iMC config
parsing
  selftests/resctrl: Protect against array overflow when reading strings
  sel

Re: [PATCH net-next v2 2/4] net: hsr: Add VLAN CTAG filter support

2024-10-25 Thread Anwar, Md Danish
Hi Vadim,

On 10/24/2024 7:06 PM, Vadim Fedorenko wrote:
> On 24/10/2024 11:30, MD Danish Anwar wrote:
>> From: Murali Karicheri 
>>
>> This patch adds support for VLAN ctag based filtering at slave devices.
>> The slave ethernet device may be capable of filtering ethernet packets
>> based on VLAN ID. This requires that when the VLAN interface is created
>> over an HSR/PRP interface, it passes the VID information to the
>> associated slave ethernet devices so that it updates the hardware
>> filters to filter ethernet frames based on VID. This patch adds the
>> required functions to propagate the vid information to the slave
>> devices.
>>
>> Signed-off-by: Murali Karicheri 
>> Signed-off-by: MD Danish Anwar 
>> ---
>>   net/hsr/hsr_device.c | 71 +++-
>>   1 file changed, 70 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
>> index 0ca47ebb01d3..ff586bdc2bde 100644
>> --- a/net/hsr/hsr_device.c
>> +++ b/net/hsr/hsr_device.c
>> @@ -515,6 +515,68 @@ static void hsr_change_rx_flags(struct net_device
>> *dev, int change)
>>   }
>>   }
>>   +static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev,
>> +   __be16 proto, u16 vid)
>> +{
>> +    struct hsr_port *port;
>> +    struct hsr_priv *hsr;
>> +    int ret = 0;
>> +
>> +    hsr = netdev_priv(dev);
>> +
>> +    hsr_for_each_port(hsr, port) {
>> +    if (port->type == HSR_PT_MASTER)
>> +    continue;
>> +
>> +    ret = vlan_vid_add(port->dev, proto, vid);
>> +    switch (port->type) {
>> +    case HSR_PT_SLAVE_A:
>> +    if (ret) {
>> +    netdev_err(dev, "add vid failed for Slave-A\n");
>> +    return ret;
>> +    }
>> +    break;
>> +
>> +    case HSR_PT_SLAVE_B:
>> +    if (ret) {
>> +    /* clean up Slave-A */
>> +    netdev_err(dev, "add vid failed for Slave-B\n");
>> +    vlan_vid_del(port->dev, proto, vid);
>> +    return ret;
>> +    }
>> +    break;
>> +    default:
>> +    break;
>> +    }
>> +    }
>> +
>> +    return 0;
>> +}
> 
> This function doesn't match hsr_ndo_vlan_rx_kill_vid().
> vlan_vid_add() can potentially be executed for port->type
> equal to HSR_PT_INTERLINK, but the result will be ignored. And
> the vlan_vid_del() will never happen in this case. Is that the desired
> behavior? Maybe it's better to synchronize the add/del code and refactor
> the error path to avoid copying the code?
> 

The kill_vid and add_vid paths are not symmetric because, during add_vid, if
vlan_vid_add() succeeds for one port but fails for the other, we need to
delete the VID from the earlier port. We can only continue if vlan_vid_add()
succeeds for both ports. That's why the switch-case handling in add_vid
cannot match that of kill_vid. Since this per-port cleanup is needed, it's
not possible to synchronize the add/kill code.

We only care about HSR_PT_SLAVE_A and HSR_PT_SLAVE_B here, so it's okay
to ignore HSR_PT_INTERLINK. That is the desired behaviour here.
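
The asymmetry described above can be sketched as a generic add-with-rollback
loop. This is not the code from the patch: the model_* types and helpers are
hypothetical userspace stand-ins for the HSR slave ports and for
vlan_vid_add()/vlan_vid_del(), and the fail_add flag exists only to force the
error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one slave port. */
struct model_port {
	bool fail_add;	/* force the add to fail, for illustration */
	bool has_vid;	/* whether the VID is currently installed */
};

/* Stand-in for vlan_vid_add(): 0 on success, negative on failure. */
static int model_vid_add(struct model_port *p)
{
	if (p->fail_add)
		return -1;
	p->has_vid = true;
	return 0;
}

/* Stand-in for vlan_vid_del(). */
static void model_vid_del(struct model_port *p)
{
	p->has_vid = false;
}

/* Add on every port; on failure, undo the ports that already succeeded. */
static int model_add_all(struct model_port *ports, int n)
{
	for (int i = 0; i < n; i++) {
		int ret = model_vid_add(&ports[i]);

		if (ret) {
			while (--i >= 0)
				model_vid_del(&ports[i]);
			return ret;
		}
	}
	return 0;
}

/* Removal needs no unwinding: just delete on every port. */
static void model_kill_all(struct model_port *ports, int n)
{
	for (int i = 0; i < n; i++)
		model_vid_del(&ports[i]);
}
```

The add path has an unwind step that the kill path does not need, which is
why the two functions cannot share one shape.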

>> +
>> +static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev,
>> +    __be16 proto, u16 vid)
>> +{
>> +    struct hsr_port *port;
>> +    struct hsr_priv *hsr;
>> +
>> +    hsr = netdev_priv(dev);
>> +
>> +    hsr_for_each_port(hsr, port) {
>> +    if (port->type == HSR_PT_MASTER)
>> +    continue;
>> +    switch (port->type) {
>> +    case HSR_PT_SLAVE_A:
>> +    case HSR_PT_SLAVE_B:
>> +    vlan_vid_del(port->dev, proto, vid);
>> +    break;
>> +    default:
>> +    break;
>> +    }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static const struct net_device_ops hsr_device_ops = {
>>   .ndo_change_mtu = hsr_dev_change_mtu,
>>   .ndo_open = hsr_dev_open,
>> @@ -523,6 +585,8 @@ static const struct net_device_ops hsr_device_ops = {
>>   .ndo_change_rx_flags = hsr_change_rx_flags,
>>   .ndo_fix_features = hsr_fix_features,
>>   .ndo_set_rx_mode = hsr_set_rx_mode,
>> +    .ndo_vlan_rx_add_vid = hsr_ndo_vlan_rx_add_vid,
>> +    .ndo_vlan_rx_kill_vid = hsr_ndo_vlan_rx_kill_vid,
>>   };
>>     static const struct device_type hsr_type = {
>> @@ -569,7 +633,8 @@ void hsr_dev_setup(struct net_device *dev)
>>     dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
>> NETIF_F_HIGHDMA |
>>  NETIF_F_GSO_MASK | NETIF_F_HW_CSUM |
>> -   NETIF_F_HW_VLAN_CTAG_TX;
>> +   NETIF_F_HW_VLAN_CTAG_TX |
>> +   NETIF_F_HW_VLAN_CTAG_FILTER;
>>     dev->features = dev->hw_features;
>>   }
>> @@ -647,6 +712,10 @@ int hsr_dev_finalize(struct net_device *hsr_dev,
>> struct net_device *slave[2],
>>   (slave[1]->features & NETIF_F_HW_HSR_FWD))
>>   hsr->fwd_offloaded = true;
>>   +    if ((slave[0]->features & NETIF_F_HW_VLAN_CTAG_FILTER) &&
>> +    (slave[1]->features & NETIF_F_HW_VLAN_CTAG_FILTER))
>> +    hsr_dev->features 

[PATCH V4 10/15] selftests/resctrl: Make benchmark parameter passing robust

2024-10-25 Thread Reinette Chatre
The benchmark used during the CMT, MBM, and MBA tests can be provided by
the user via the (-b) parameter; if not provided, the default "fill_buf"
benchmark is used. The user is additionally able to override
any of the "fill_buf" default parameters when running the tests with
"-b fill_buf ".

The "fill_buf" parameters are managed as an array of strings. Using an
array of strings is complex because it requires transformations to/from
strings at every producer and consumer. This is made worse for the
individual tests where the default benchmark parameters values may not
be appropriate and additional data wrangling is required. For example,
the CMT test duplicates the entire array of strings in order to replace
one of the parameters.

More issues appear when combining the usage of an array of strings with
the use case of the user overriding default parameters by specifying
"-b fill_buf ". This use case is fragile, with opportunities
to trigger a SIGSEGV because NULL pointers can exist in the array of
strings. For example, by running the command below (which specifies that
"fill_buf" should be used but leaves all parameters NULL):
$ sudo resctrl_tests -t mbm -b fill_buf

Replace the "array of strings" parameters used for "fill_buf" with
new struct fill_buf_param that contains the "fill_buf" parameters that
can be used directly without transformations to/from strings. Two
instances of struct fill_buf_param may exist at any point in time:
* If the user provides new parameters to "fill_buf", the
  user parameter structure (struct user_params) will point to a
  fully initialized and immutable struct fill_buf_param
  containing the user provided parameters.
* If "fill_buf" is the benchmark that should be used by a test,
  then the test parameter structure (struct resctrl_val_param)
  will point to a fully initialized struct fill_buf_param. The
  latter may contain (a) the user provided parameters verbatim,
  (b) user provided parameters adjusted to be appropriate for
  the test, or (c) the default parameters for "fill_buf" that
  are appropriate for the test if the user did not provide
  "fill_buf" parameters nor an alternate benchmark.

The existing behavior of CMT test is to use test defined value for the
buffer size even if the user provides another value via command line.
This behavior is maintained since the test requires that the buffer size
matches the size of the cache allocated, and the amount of cache
allocated can instead be changed by the user with the "-n" command line
parameter.

Signed-off-by: Reinette Chatre 
---
Changes since V3:
- Handle empty string input. (Ilpo)

Changes since V2:
- Use empty initializers. (Ilpo)
- Let memflush be bool instead of int. (Ilpo)
- Make user input checks more robust. (Ilpo)
- Assign values as part of local variable definition. (Ilpo)

Changes since V1:
- Maintain original behavior where user can override "fill_buf"
  parameters via command line ... but only those that can actually
  be changed. (Ilpo)
- Fix parsing issues associated with original behavior to ensure
  any parameter is valid before any attempt to use it.
- Move patch earlier in series to highlight that this fixes existing
  issues.
- Make struct fill_buf_param dynamic to support user provided
  parameters as well as test specific parameters.
- Rewrite changelog.
---
 tools/testing/selftests/resctrl/cmt_test.c|  32 ++
 tools/testing/selftests/resctrl/fill_buf.c|   4 +-
 tools/testing/selftests/resctrl/mba_test.c|  13 ++-
 tools/testing/selftests/resctrl/mbm_test.c|  22 ++--
 tools/testing/selftests/resctrl/resctrl.h |  59 +++---
 .../testing/selftests/resctrl/resctrl_tests.c | 103 ++
 tools/testing/selftests/resctrl/resctrl_val.c |  41 ---
 7 files changed, 178 insertions(+), 96 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cmt_test.c 
b/tools/testing/selftests/resctrl/cmt_test.c
index 0c045080d808..4c3cf2c25a38 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -116,15 +116,13 @@ static void cmt_test_cleanup(void)
 
 static int cmt_run_test(const struct resctrl_test *test, const struct user_params *uparams)
 {
-   const char * const *cmd = uparams->benchmark_cmd;
-   const char *new_cmd[BENCHMARK_ARGS];
+   struct fill_buf_param fill_buf = {};
unsigned long cache_total_size = 0;
int n = uparams->bits ? : 5;
unsigned long long_mask;
-   char *span_str = NULL;
int count_of_bits;
size_t span;
-   int ret, i;
+   int ret;
 
ret = get_full_cbm("L3", &long_mask);
if (ret)
@@ -155,32 +153,26 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
 
span = cache_portion_size(cache_total_size, param.mask, long_mask);
 
-   if (strcmp(cmd[0], "fill_buf") == 0) {
-   /* Du

Re: [PATCH] vsock/test: fix failures due to wrong SO_RCVLOWAT parameter

2024-10-25 Thread Konstantin Shkolnyy

On 10/24/2024 03:43, Stefano Garzarella wrote:

Other setsockopt() in the tests where we use unsigned long are
SO_VM_SOCKETS_* but they are expected to be unsigned, so we should be
fine.


It's actually not "signed vs unsigned", but a "size + endianness" problem.

Also, looking at SO_VM_SOCKETS_* code in the test, it uses unsigned long 
and size_t which (I believe) will both shrink to 4 bytes on 32-bit 
machines, while the corresponding kernel code in af_vsock.c uses u64. It 
looks to me that this kernel code will be unhappy to receive just 4 
bytes when it expects 8.
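
To make the "size + endianness" point concrete, here is a small sketch. It
does not call setsockopt() and does not reproduce the kernel's code; the
helper names are hypothetical, byte order is passed explicitly so the result
does not depend on the host, and the length check is only an assumed model of
a receiver that wants a full 8-byte value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lay out an 8-byte value in memory in the requested byte order. */
static void store_u64(uint8_t buf[8], uint64_t v, bool big_endian)
{
	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? 8 * (7 - i) : 8 * i;

		buf[i] = (uint8_t)(v >> shift);
	}
}

/* What a receiver sees if it reads only the first 4 bytes as a 32-bit int. */
static uint32_t load_first_u32(const uint8_t buf[8], bool big_endian)
{
	uint32_t v = 0;

	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		v |= (uint32_t)buf[i] << shift;
	}
	return v;
}

/* Assumed model of a receiver that rejects options shorter than 8 bytes. */
static int model_check_u64_len(size_t optlen)
{
	return optlen < sizeof(uint64_t) ? -1 : 0;
}
```

On a little-endian layout the first four bytes of a small 8-byte value still
hold that value, so the mismatch goes unnoticed; on a big-endian layout they
are all zero. In the other direction, a 4-byte argument fails the length
check of a receiver that expects 8 bytes.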




stable-rc linux-6.6.y: Queues: tinyconfig: undefined reference to `irq_work_queue'

2024-10-25 Thread Naresh Kamboju
Most of the tinyconfigs are failing on stable-rc linux-6.6.y.

Build errors:
--
aarch64-linux-gnu-ld: kernel/task_work.o: in function `task_work_add':
task_work.c:(.text+0x190): undefined reference to `irq_work_queue'
task_work.c:(.text+0x190): relocation truncated to fit:
R_AARCH64_CALL26 against undefined symbol `irq_work_queue'

Reported-by: Linux Kernel Functional Testing 

metadata:

git_describe: v6.6.57-251-g1870a9bd3fe7
git_repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Build:   v6.6.57-251-g1870a9bd3fe7
Details: https://qa-reports.linaro.org/lkft/linux-stable-rc-queues-queue_6.6/build/v6.6.57-251-g1870a9bd3fe7
kernel_version: 6.6.58

Regressions (compared to build v6.6.57)


parisc:

  * build/gcc-11-tinyconfig
mips:

  * build/gcc-12-tinyconfig
  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/clang-nightly-tinyconfig
arm:

  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig
  * build/clang-nightly-tinyconfig
powerpc:

  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig
  * build/clang-nightly-tinyconfig
arm64:

  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig
  * build/clang-nightly-tinyconfig
arc:

  * build/gcc-9-tinyconfig
  * build/gcc-8-tinyconfig
s390:

  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig
  * build/clang-nightly-tinyconfig
sparc:

  * build/gcc-8-tinyconfig
  * build/gcc-11-tinyconfig
riscv:

  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig

compare history links:
-
 - https://qa-reports.linaro.org/lkft/linux-stable-rc-queues-queue_6.6/build/v6.6.57-251-g1870a9bd3fe7/testrun/25533195/suite/build/test/gcc-13-tinyconfig/history/
 - https://qa-reports.linaro.org/lkft/linux-stable-rc-queues-queue_6.6/build/v6.6.57-251-g1870a9bd3fe7/testrun/25533195/suite/build/test/gcc-13-tinyconfig/log

--
Linaro LKFT
https://lkft.linaro.org



Re: [PATCH 1/4] dt-bindings: remoteproc: fsl,imx-rproc: add new compatible

2024-10-25 Thread Mathieu Poirier
Good day,

On Wed, Oct 23, 2024 at 12:21:11PM -0400, Laurentiu Mihalcea wrote:
> From: Laurentiu Mihalcea 
> 
> Add new compatible for imx95's CM7 with SOF.
> 
> Signed-off-by: Laurentiu Mihalcea 
> ---
>  .../bindings/remoteproc/fsl,imx-rproc.yaml| 58 +--
>  1 file changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml 
> b/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml
> index 57d75acb0b5e..ab0d8e017965 100644
> --- a/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml
> +++ b/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml
> @@ -28,6 +28,15 @@ properties:
>- fsl,imx8qxp-cm4
>- fsl,imx8ulp-cm33
>- fsl,imx93-cm33
> +  - fsl,imx95-cm7-sof

Why is this added in the remoteproc bindings when the driver is
sound/soc/sof/imx/imx95.c?

> +
> +  reg:
> +maxItems: 2
> +
> +  reg-names:
> +items:
> +  - const: dram
> +  - const: mailbox
>  
>clocks:
>  maxItems: 1
> @@ -38,10 +47,8 @@ properties:
>Phandle to syscon block which provide access to System Reset Controller
>  
>mbox-names:
> -items:
> -  - const: tx
> -  - const: rx
> -  - const: rxdb
> +minItems: 1
> +maxItems: 4
>  
>mboxes:
>  description:
> @@ -49,7 +56,7 @@ properties:
>List of <&phandle type channel> - 1 channel for TX, 1 channel for RX, 
> 1 channel for RXDB.
>(see mailbox/fsl,mu.yaml)
>  minItems: 1
> -maxItems: 3
> +maxItems: 4
>  
>memory-region:
>  description:
> @@ -84,6 +91,10 @@ properties:
>This property is to specify the resource id of the remote processor in 
> SoC
>which supports SCFW
>  
> +  port:
> +$ref: /schemas/sound/audio-graph-port.yaml#
> +unevaluatedProperties: false
> +
>  required:
>- compatible
>  
> @@ -114,6 +125,43 @@ allOf:
>properties:
>  power-domains: false
>  
> +  - if:
> +  properties:
> +compatible:
> +  contains:
> +const: fsl,imx95-cm7-sof
> +then:
> +  properties:
> +mboxes:
> +  minItems: 4
> +mbox-names:
> +  items:
> +- const: txdb0
> +- const: txdb1
> +- const: rxdb0
> +- const: rxdb1
> +memory-region:
> +  maxItems: 1
> +  required:
> +- reg
> +- reg-names
> +- mboxes
> +- mbox-names
> +- memory-region
> +- port
> +else:
> +  properties:
> +reg: false
> +reg-names: false
> +mboxes:
> +  maxItems: 3
> +mbox-names:
> +  items:
> +- const: tx
> +- const: rx
> +- const: rxdb
> +port: false
> +
>  additionalProperties: false
>  
>  examples:
> -- 
> 2.34.1
> 



[PATCH net v6] ipv6: Fix soft lockups in fib6_select_path under high next hop churn

2024-10-25 Thread Omid Ehtemam-Haghighi
Soft lockups have been observed on a cluster of Linux-based edge routers
located in a highly dynamic environment. Using the `bird` service, these
routers continuously update BGP-advertised routes due to frequently
changing nexthop destinations, while also managing significant IPv6
traffic. The lockups occur during the traversal of the multipath
circular linked-list in the `fib6_select_path` function, particularly
while iterating through the siblings in the list. The issue typically
arises when the nodes of the linked list are unexpectedly deleted
concurrently on a different core—indicated by their 'next' and
'previous' elements pointing back to the node itself and their reference
count dropping to zero. This results in an infinite loop, leading to a
soft lockup that triggers a system panic via the watchdog timer.

Apply RCU primitives in the problematic code sections to resolve the
issue. Where necessary, update the references to fib6_siblings to
annotate or use the RCU APIs.

Include a test script that reproduces the issue. The script
periodically updates the routing table while generating a heavy load
of outgoing IPv6 traffic through multiple iperf3 clients. It
consistently induces infinite soft lockups within a couple of minutes.

Kernel log:

 0 [bd13003e8d30] machine_kexec at 8ceaf3eb
 1 [bd13003e8d90] __crash_kexec at 8d0120e3
 2 [bd13003e8e58] panic at 8cef65d4
 3 [bd13003e8ed8] watchdog_timer_fn at 8d05cb03
 4 [bd13003e8f08] __hrtimer_run_queues at 8cfec62f
 5 [bd13003e8f70] hrtimer_interrupt at 8cfed756
 6 [bd13003e8fd0] __sysvec_apic_timer_interrupt at 8cea01af
 7 [bd13003e8ff0] sysvec_apic_timer_interrupt at 8df1b83d
--  --
 8 [bd13003d3708] asm_sysvec_apic_timer_interrupt at 8e000ecb
[exception RIP: fib6_select_path+299]
RIP: 8ddafe7b  RSP: bd13003d37b8  RFLAGS: 0287
RAX: 975850b43600  RBX: 975850b40200  RCX: 
RDX: 3fff  RSI: 51d383e4  RDI: 975850b43618
RBP: bd13003d3800   R8:    R9: 975850b40200
R10:   R11:   R12: bd13003d3830
R13: 975850b436a8  R14: 975850b43600  R15: 0007
ORIG_RAX:   CS: 0010  SS: 0018
 9 [bd13003d3808] ip6_pol_route at 8ddb030c
10 [bd13003d3888] ip6_pol_route_input at 8ddb068c
11 [bd13003d3898] fib6_rule_lookup at 8ddf02b5
12 [bd13003d3928] ip6_route_input at 8ddb0f47
13 [bd13003d3a18] ip6_rcv_finish_core.constprop.0 at 8dd950d0
14 [bd13003d3a30] ip6_list_rcv_finish.constprop.0 at 8dd96274
15 [bd13003d3a98] ip6_sublist_rcv at 8dd96474
16 [bd13003d3af8] ipv6_list_rcv at 8dd96615
17 [bd13003d3b60] __netif_receive_skb_list_core at 8dc16fec
18 [bd13003d3be0] netif_receive_skb_list_internal at 8dc176b3
19 [bd13003d3c50] napi_gro_receive at 8dc565b9
20 [bd13003d3c80] ice_receive_skb at c087e4f5 [ice]
21 [bd13003d3c90] ice_clean_rx_irq at c0881b80 [ice]
22 [bd13003d3d20] ice_napi_poll at c088232f [ice]
23 [bd13003d3d80] __napi_poll at 8dc18000
24 [bd13003d3db8] net_rx_action at 8dc18581
25 [bd13003d3e40] __do_softirq at 8df352e9
26 [bd13003d3eb0] run_ksoftirqd at 8ceffe47
27 [bd13003d3ec0] smpboot_thread_fn at 8cf36a30
28 [bd13003d3ee8] kthread at 8cf2b39f
29 [bd13003d3f28] ret_from_fork at 8ce5fa64
30 [bd13003d3f50] ret_from_fork_asm at 8ce03cbb

Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Reported-by: Adrian Oliver 
Signed-off-by: Omid Ehtemam-Haghighi 
Cc: David S. Miller 
Cc: David Ahern 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: Shuah Khan 
Cc: Ido Schimmel 
Cc: Kuniyuki Iwashima 
Cc: Simon Horman 
Cc: net...@vger.kernel.org
Cc: linux-kselft...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
v5 -> v6:
* Adjust the comment line lengths in the test script to a maximum of
  80 characters
* Change memory allocation in inet6_rt_notify from gfp_any() to
  GFP_ATOMIC for atomic allocation in non-blocking contexts, as
  suggested by Ido Schimmel
* NOTE: I have executed the test script on both bare-metal servers and
  virtualized environments such as QEMU and vng. In the case of
  bare-metal, it consistently triggers a soft lockup in under a minute
  on unpatched kernels. For the virtualized environments, an unpatched
  kernel compiled with the Ubuntu 24.04 configuration also triggers a
  soft lockup, though it takes longer; however, it did not trigger a
  soft lockup on kernels compiled with configurations provided in:

  
https://github.com/linux-netdev/nipa/wiki/How-to-ru

Re: [PATCH] selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run

2024-10-25 Thread zhouyuhang




On 2024/10/24 22:26, Shuah Khan wrote:

On 10/24/24 03:50, zhouyuhang wrote:

From: zhouyuhang 

Test case idmap_mount_tree_invalid failed to run on the newer kernel
with the following output:

  #  RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
  # mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) != 0 (0)

  # idmap_mount_tree_invalid: Test terminated by assertion

This is because tmpfs is mounted at "/mnt/A", and tmpfs already
contains the flag FS_ALLOW_IDMAP after the commit 7a80e5b8c6fa ("shmem:
support idmapped mounts for tmpfs"). So calling sys_mount_setattr here
returns 0 instead of -EINVAL as expected.

Ramfs is mounted at "/mnt/B" and does not support idmap mounts.
So we can use "/mnt/B" instead of "/mnt/A" to make the test run
successfully with the following output:

  # Starting 1 tests from 1 test cases.
  #  RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
  #    OK mount_setattr_idmapped.idmap_mount_tree_invalid
  ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid
  # PASSED: 1 / 1 tests passed.



Sounds like this code is testing this very condition passing
in invalid mount to see what happens. If that is the intent
this patch is incorrect.



I think I understand what you mean: the error reported on this line is
the condition under test, and the main purpose of the test case is to see
what happens when the mount is invalid. But the mount is valid now, isn't it?
So we need to fix it. I don't think that constructing this error with
ramfs will have any impact on the code that follows.
If you feel that using "/mnt/B" is unreliable, I think we can
temporarily mount ramfs at "/mnt/A" here and continue using "/mnt/A".

Do you think this is feasible? Looking forward to your reply, thank you.


Signed-off-by: zhouyuhang 
---
  tools/testing/selftests/mount_setattr/mount_setattr_test.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index c6a8c732b802..54552c19bc24 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -1414,7 +1414,7 @@ TEST_F(mount_setattr_idmapped, idmap_mount_tree_invalid)

  ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/b", 0, 0, 0), 0);
  ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/BB/b", 0, 0, 0), 0);
  -    open_tree_fd = sys_open_tree(-EBADF, "/mnt/A",
+    open_tree_fd = sys_open_tree(-EBADF, "/mnt/B",
   AT_RECURSIVE |
   AT_EMPTY_PATH |
   AT_NO_AUTOMOUNT |


thanks,
-- Shuah





Re: [PATCH v4 4/4] selftests: pidfd: add tests for PIDFD_SELF_*

2024-10-25 Thread kernel test robot
struct f_owner_ex
  |^~
/usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:196:8: error: redefinition of 
‘struct flock’
  196 | struct flock {
  |^
/usr/include/x86_64-linux-gnu/bits/fcntl.h:35:8: note: originally defined here
   35 | struct flock
  |^
/usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:210:8: error: redefinition of 
‘struct flock64’
  210 | struct flock64 {
  |^~~
/usr/include/x86_64-linux-gnu/bits/fcntl.h:50:8: note: originally defined here
   50 | struct flock64
  |^~~
make: *** [../lib.mk:222: 
/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup/test_kill]
 Error 1
make: Leaving directory 
'/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup'



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241025/202410251504.707d78fc-oliver.s...@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




Re: [PATCH v3] remoteproc: Add a new remoteproc state RPROC_DEFUNCT

2024-10-25 Thread Mukesh Ojha
On Mon, Oct 21, 2024 at 09:12:47AM -0600, Mathieu Poirier wrote:
> Hi Mukesh,
> 
> On Wed, Oct 16, 2024 at 10:25:46AM +0530, Mukesh Ojha wrote:
> > Multiple call to glink_subdev_stop() for the same remoteproc can happen
> > if rproc_stop() fails from Process-A that leaves the rproc state to
> > RPROC_CRASHED state later a call to recovery_store from user space in
> > Process B triggers rproc_trigger_recovery() of the same remoteproc to
> > recover it results in NULL pointer dereference issue in
> > qcom_glink_smem_unregister().
> > 
> > There is other side to this issue if we want to fix this via adding a
> > NULL check on glink->edge which does not guarantees that the remoteproc
> > will recover in second call from Process B as it has failed in the first
> > Process A during SMC shutdown call and may again fail at the same call
> > and rproc can not recover for such case.
> > 
> > Add a new rproc state RPROC_DEFUNCT i.e., non recoverable state of
> > remoteproc and the only way to recover from it via system restart.
> > 
> > Process-A   Process-B
> > 
> >   fatal error interrupt happens
> > 
> >   rproc_crash_handler_work()
> > mutex_lock_interruptible(&rproc->lock);
> > ...
> > 
> >rproc->state = RPROC_CRASHED;
> > ...
> > mutex_unlock(&rproc->lock);
> > 
> > rproc_trigger_recovery()
> >  mutex_lock_interruptible(&rproc->lock);
> > 
> >   adsp_stop()
> >   qcom_q6v5_pas 20c0.remoteproc: failed to shutdown: -22
> >   remoteproc remoteproc3: can't stop rproc: -22
> >  mutex_unlock(&rproc->lock);
> 
> Ok, that can happen.
> 
> > 
> > echo enabled > 
> > /sys/class/remoteproc/remoteprocX/recovery
> > recovery_store()
> >  rproc_trigger_recovery()
> >   
> > mutex_lock_interruptible(&rproc->lock);
> >rproc_stop()
> > glink_subdev_stop()
> >   
> > qcom_glink_smem_unregister() ==|
> > 
> >  |
> > 
> >  V
> 
> I am missing some information here but I will _assume_ this is caused by
> glink->edge being set to NULL [1] when glink_subdev_stop() is first called by
> process A.  Instead of adding a new state to the core I think a better idea
> would be to add a check for a NULL value on @smem in
> qcom_glink_smem_unregister().  This is a problem that should be fixed in the
> driver rather than the core.
> 
> [1]. 
> https://elixir.bootlin.com/linux/v6.12-rc4/source/drivers/remoteproc/qcom_common.c#L213


I did the same here [1], but after discussion with Bjorn I realized that
the remoteproc might not recover at all: it may fail on the second attempt
as well, and the only way out is to reboot the machine.

[1]
https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mo...@quicinc.com/

> 
> >   Unable to handle kernel 
> > NULL pointer dereference
> > at virtual 
> > address 0358
> > 
> > Signed-off-by: Mukesh Ojha 
> > ---
> > Changes in v3:
> >  - Fix kernel test reported error.
> > 
> > Changes in v2:
> >  - Removed NULL pointer check instead added a new state to signify
> >non-recoverable state of remoteproc.
> > 
> >  drivers/remoteproc/remoteproc_core.c  | 3 ++-
> >  drivers/remoteproc/remoteproc_sysfs.c | 1 +
> >  include/linux/remoteproc.h| 5 -
> >  3 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/remoteproc/remoteproc_core.c 
> > b/drivers/remoteproc/remoteproc_core.c
> > index f276956f2c5c..c4e14503b971 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1727,6 +1727,7 @@ static int rproc_stop(struct rproc *rproc, bool 
> > crashed)
> > /* power off the remote processor */
> > ret = rproc->ops->stop(rproc);
> > if (ret) {
> > +   rproc->state = RPROC_DEFUNCT;
> > dev_err(dev, "can't stop rproc: %d\n", ret);
> > return ret;
> > }
> > @@ -1839,7 +1840,7 @@ int rproc_trigger_recovery(struct rproc *rproc)
> > return ret;
> >  
> > /* State could have changed before we got the mutex */
> > -   if (rproc->state != RPROC_CRASHED)
> > +   if (rproc->state == RPROC_DEFUNCT || rproc->state != RPROC_CRASHED)
> > goto unlock_mutex;
> 
> The problem is that rproc_trigger_recovery() can only be called once for a
> remoteproc, something that modifies the state machine and may introduce
> backward compatibility issues for other remote processor implementations.
> 


Re: [PATCH] selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run

2024-10-25 Thread Shuah Khan

On 10/24/24 03:50, zhouyuhang wrote:

From: zhouyuhang 

Test case idmap_mount_tree_invalid failed to run on the newer kernel
with the following output:

  #  RUN   mount_setattr_idmapped.idmap_mount_tree_invalid ...
  # mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) != 0 (0)
  # idmap_mount_tree_invalid: Test terminated by assertion

This is because tmpfs is mounted at "/mnt/A", and tmpfs already
contains the flag FS_ALLOW_IDMAP after the commit 7a80e5b8c6fa ("shmem:
support idmapped mounts for tmpfs"). So calling sys_mount_setattr here
returns 0 instead of -EINVAL as expected.

Ramfs is mounted at "/mnt/B" and does not support idmap mounts.
So we can use "/mnt/B" instead of "/mnt/A" to make the test run
successfully with the following output:

  # Starting 1 tests from 1 test cases.
  #  RUN   mount_setattr_idmapped.idmap_mount_tree_invalid ...
  #OK  mount_setattr_idmapped.idmap_mount_tree_invalid
  ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid
  # PASSED: 1 / 1 tests passed.



Sounds like this code is testing this very condition passing
in invalid mount to see what happens. If that is the intent
this patch is incorrect.


Signed-off-by: zhouyuhang 
---
  tools/testing/selftests/mount_setattr/mount_setattr_test.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c 
b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index c6a8c732b802..54552c19bc24 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -1414,7 +1414,7 @@ TEST_F(mount_setattr_idmapped, idmap_mount_tree_invalid)
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/b", 0, 0, 0), 0);
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/BB/b", 0, 0, 0), 0);
  
-	open_tree_fd = sys_open_tree(-EBADF, "/mnt/A",

+   open_tree_fd = sys_open_tree(-EBADF, "/mnt/B",
 AT_RECURSIVE |
 AT_EMPTY_PATH |
 AT_NO_AUTOMOUNT |


thanks,
-- Shuah



[PATCH net-next v2 3/4] net: ti: icssg-prueth: Add VLAN support for HSR mode

2024-10-25 Thread MD Danish Anwar
From: Ravi Gunasekaran 

Add support for VLAN addition/deletion in HSR mode.
In HSR mode, even if the host port is not a member of
the VLAN domain, the slave ports should simply forward the
frames. So allow forwarding of all VLAN frames in HSR mode.
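
For readers who want to see the masks this produces, the following is a
small Python sketch of the mask computation only. The port numbers
(host = 0, MII0 = 1, MII1 = 2) are an assumption made for illustration,
and the helper names are hypothetical; the driver's table update itself is
not modelled.

```python
# Assumed port numbering, for illustration only.
PRUETH_PORT_HOST = 0
PRUETH_PORT_MII0 = 1
PRUETH_PORT_MII1 = 2


def BIT(n: int) -> int:
    """Python equivalent of the kernel BIT() macro."""
    return 1 << n


def hsr_vlan_add_masks(port_id: int) -> tuple[int, int]:
    """(port_mask, untag_mask) used when a VID is added in HSR mode."""
    return BIT(PRUETH_PORT_HOST) | BIT(port_id), 0


def hsr_vlan_del_masks(port_id: int) -> tuple[int, int]:
    """(port_mask, untag_mask) used when a VID is deleted in HSR mode."""
    return BIT(PRUETH_PORT_HOST), 0


for port in (PRUETH_PORT_MII0, PRUETH_PORT_MII1):
    add_mask, _ = hsr_vlan_add_masks(port)
    del_mask, _ = hsr_vlan_del_masks(port)
    print(f"port {port}: add mask {add_mask:#05b}, del mask {del_mask:#05b}")
```

Under these assumptions, adding a VID touches the host bit plus the bit of
the port the call arrived on, while deleting touches only the host bit.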

Signed-off-by: Ravi Gunasekaran 
Signed-off-by: MD Danish Anwar 
---
 drivers/net/ethernet/ti/icssg/icssg_prueth.c | 45 +++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c 
b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
index 0556910938fa..b4d70c6e0cff 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
@@ -808,6 +808,47 @@ static netdev_features_t emac_ndo_fix_features(struct 
net_device *ndev,
return features;
 }
 
+static int emac_ndo_vlan_rx_add_vid(struct net_device *ndev,
+   __be16 proto, u16 vid)
+{
+   struct prueth_emac *emac = netdev_priv(ndev);
+   struct prueth *prueth = emac->prueth;
+   int untag_mask = 0;
+   int port_mask;
+
+   if (prueth->is_hsr_offload_mode) {
+   port_mask = BIT(PRUETH_PORT_HOST) | BIT(emac->port_id);
+   untag_mask = 0;
+
+   netdev_dbg(emac->ndev, "VID add vid:%u port_mask:%X untag_mask 
%X\n",
+  vid, port_mask, untag_mask);
+
+   icssg_vtbl_modify(emac, vid, port_mask, untag_mask, true);
+   icssg_set_pvid(emac->prueth, vid, emac->port_id);
+   }
+   return 0;
+}
+
+static int emac_ndo_vlan_rx_del_vid(struct net_device *ndev,
+   __be16 proto, u16 vid)
+{
+   struct prueth_emac *emac = netdev_priv(ndev);
+   struct prueth *prueth = emac->prueth;
+   int untag_mask = 0;
+   int port_mask;
+
+   if (prueth->is_hsr_offload_mode) {
+   port_mask = BIT(PRUETH_PORT_HOST);
+   untag_mask = 0;
+
+   netdev_dbg(emac->ndev, "VID del vid:%u port_mask:%X untag_mask  
%X\n",
+  vid, port_mask, untag_mask);
+
+   icssg_vtbl_modify(emac, vid, port_mask, untag_mask, false);
+   }
+   return 0;
+}
+
 static const struct net_device_ops emac_netdev_ops = {
.ndo_open = emac_ndo_open,
.ndo_stop = emac_ndo_stop,
@@ -820,6 +861,8 @@ static const struct net_device_ops emac_netdev_ops = {
.ndo_get_stats64 = icssg_ndo_get_stats64,
.ndo_get_phys_port_name = icssg_ndo_get_phys_port_name,
.ndo_fix_features = emac_ndo_fix_features,
+   .ndo_vlan_rx_add_vid = emac_ndo_vlan_rx_add_vid,
+   .ndo_vlan_rx_kill_vid = emac_ndo_vlan_rx_del_vid,
 };
 
 static int prueth_netdev_init(struct prueth *prueth,
@@ -947,7 +990,7 @@ static int prueth_netdev_init(struct prueth *prueth,
ndev->netdev_ops = &emac_netdev_ops;
ndev->ethtool_ops = &icssg_ethtool_ops;
ndev->hw_features = NETIF_F_SG;
-   ndev->features = ndev->hw_features;
+   ndev->features = ndev->hw_features | NETIF_F_HW_VLAN_CTAG_FILTER;
ndev->hw_features |= NETIF_PRUETH_HSR_OFFLOAD_FEATURES;
 
netif_napi_add(ndev, &emac->napi_rx, icssg_napi_rx_poll);
-- 
2.34.1




[PATCH v2 0/3] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.

2024-10-25 Thread Sebastian Andrzej Siewior
Hi,

the following was in the PREEMPT_RT queue since the last softirq rework. The
result is that timer wake-ups (hrtimer, timer_list) happen in hardirq, and
processing them requires waking ksoftirqd. ksoftirqd runs at SCHED_OTHER, so it
will compete for resources with all other tasks in the system, potentially
delaying the processing further.

The idea was to let the timers be processed by a dedicated thread
running at low SCHED_FIFO priority.
While looking at it again, it might make sense to have the
pending_softirq flag per-thread to avoid threads with higher priority
picking up softirqs from low-priority threads. This isn't yet a problem
because adding softirqs for processing happens only from threaded
interrupts. So the low-priority thread will wait until the high-priority
thread is done. And the high-priority thread will PI-boost the
low-priority thread until it is done. It would only make sense to make
the flags per-thread once the BH lock is gone.

The patch is limited to PREEMPT_RT. The ksoftirqd bullets from above
apply also to !PREEMPT_RT +threadirqs. Would it make sense to restrict
it to force_irqthreads() instead?

v1…v2: https://lore.kernel.org/all/20241004103842.131014-1-bige...@linutronix.de/
 Frederick's comments:
 - Use __raise_softirq_irqoff() to raise the softirq for !PREEMPT_RT. Also a
   lockdep test to ensure that this is always invoked from an IRQ.
 - Make raise_ktimers_thread() only OR the flag and nothing else to
   align with __raise_softirq_irqoff(). The wake happens on return from
   interrupt anyway.
 - A comment in timersd_setup() and interrupt.h
 - local_pending_timers() => local_timers_pending().

Sebastian




[PATCH net-next v10 02/23] net: introduce OpenVPN Data Channel Offload (ovpn)

2024-10-25 Thread Antonio Quartulli
OpenVPN is a userspace software existing since around 2005 that allows
users to create secure tunnels.

So far OpenVPN has implemented all operations in userspace, which
implies several back and forth between kernel and user land in order to
process packets (encapsulate/decapsulate, encrypt/decrypt, rerouting..).

With `ovpn` we intend to move the fast path (data channel) entirely
in kernel space and thus improve user measured throughput over the
tunnel.

`ovpn` is implemented as a simple virtual network device driver, that
can be manipulated by means of the standard RTNL APIs. A device of kind
`ovpn` allows only IPv4/6 traffic and can be of type:
* P2P (peer-to-peer): any packet sent over the interface will be
  encapsulated and transmitted to the other side (typical OpenVPN
  client or peer-to-peer behaviour);
* P2MP (point-to-multipoint): packets sent over the interface are
  transmitted to peers based on existing routes (typical OpenVPN
  server behaviour).

After the interface has been created, OpenVPN in userspace can
configure it using a new Netlink API. Specifically it is possible
to manage peers and their keys.

The OpenVPN control channel is multiplexed over the same transport
socket by means of OP codes. Anything that is not DATA_V2 (OpenVPN
OP code for data traffic) is sent to userspace and handled there.
This way the `ovpn` codebase is kept as compact as possible while
focusing on handling data traffic only (fast path).
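
The split described above can be sketched in a few lines of Python. This
is an illustration, not code from the series: the constants follow the
OpenVPN wire format as I understand it (opcode in the upper 5 bits of the
first byte, key ID in the lower 3 bits, DATA_V2 opcode 9 followed by a
24-bit peer ID), and the function names are hypothetical.

```python
P_OPCODE_SHIFT = 3    # opcode lives in the upper 5 bits of byte 0
P_KEY_ID_MASK = 0x07  # key ID lives in the lower 3 bits of byte 0
P_DATA_V2 = 9         # assumed opcode value for DATA_V2


def classify(packet: bytes) -> str:
    """Say who would handle a packet: the fast path or userspace."""
    if not packet:
        raise ValueError("empty packet")
    opcode = packet[0] >> P_OPCODE_SHIFT
    return "kernel" if opcode == P_DATA_V2 else "userspace"


def parse_data_v2_header(packet: bytes) -> tuple[int, int]:
    """Return (key_id, peer_id) from the first 4 bytes of a DATA_V2 packet."""
    if len(packet) < 4:
        raise ValueError("DATA_V2 header needs 4 bytes")
    if packet[0] >> P_OPCODE_SHIFT != P_DATA_V2:
        raise ValueError("not a DATA_V2 packet")
    word = int.from_bytes(packet[:4], "big")
    return (word >> 24) & P_KEY_ID_MASK, word & 0xFFFFFF


data = bytes([(P_DATA_V2 << P_OPCODE_SHIFT) | 1, 0x00, 0x00, 0x05])
control = bytes([4 << P_OPCODE_SHIFT])  # any non-DATA_V2 opcode
print(classify(data), parse_data_v2_header(data))
print(classify(control))
```

Keeping the classification to a single-byte check is what lets everything
other than data traffic stay in the userspace process.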

Any OpenVPN control feature (like cipher negotiation, TLS handshake,
rekeying, etc.) is still fully handled by the userspace process.

When userspace establishes a new connection with a peer, it first
performs the handshake and then passes the socket to the `ovpn` kernel
module, which takes ownership. From this moment on `ovpn` will handle
data traffic for the new peer.
When control packets are received on the link, they are forwarded to
userspace through the same transport socket they were received on, as
userspace is still listening to them.

Some events (like peer deletion) are sent to a Netlink multicast group.

Although it wasn't easy to convince the community, `ovpn` implements
only a limited number of the data-channel features supported by the
userspace program.

Each feature that made it to `ovpn` was attentively vetted to
avoid carrying too much legacy along with us (and to give a clear cut to
old and probably-not-so-useful features).

Notably, only encryption using AEAD ciphers (specifically
ChaCha20Poly1305 and AES-GCM) was implemented. Supporting any other
cipher out there was not deemed useful.

Both UDP and TCP sockets are supported.

As explained above, in case of P2MP mode, OpenVPN will use the main system
routing table to decide which packet goes to which peer. This implies
that no routing table was re-implemented in the `ovpn` kernel module.

This kernel module can be enabled by selecting the CONFIG_OVPN entry
in the networking drivers section.

NOTE: this first patch introduces the very basic framework only.
Features are then added patch by patch. Although each patch will
compile and possibly not break at runtime, the ovpn module is only
expected to be fully working after the full set has been applied.

Cc: steffen.klass...@secunet.com
Cc: antony.ant...@secunet.com
Signed-off-by: Antonio Quartulli 
---
 MAINTAINERS   |   8 
 drivers/net/Kconfig   |  13 ++
 drivers/net/Makefile  |   1 +
 drivers/net/ovpn/Makefile |  11 +
 drivers/net/ovpn/io.c |  22 +
 drivers/net/ovpn/io.h |  15 ++
 drivers/net/ovpn/main.c   | 116 ++
 drivers/net/ovpn/main.h   |  15 ++
 include/uapi/linux/udp.h  |   1 +
 9 files changed, 202 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f39ab140710f16b1245924bfe381cd64d499ff8a..09e193bbc218d74846cbae26f80ada3e04c3692a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17286,6 +17286,14 @@ F: arch/openrisc/
 F: drivers/irqchip/irq-ompic.c
 F: drivers/irqchip/irq-or1k-*
 
+OPENVPN DATA CHANNEL OFFLOAD
+M: Antonio Quartulli 
+L: openvpn-de...@lists.sourceforge.net (moderated for non-subscribers)
+L: net...@vger.kernel.org
+S: Supported
+T: git https://github.com/OpenVPN/linux-kernel-ovpn.git
+F: drivers/net/ovpn/
+
 OPENVSWITCH
 M: Pravin B Shelar 
 L: net...@vger.kernel.org
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 1fd5acdc73c6af0e1a861867039c3624fc618e25..269b73fcfd348a48174fb96b8f8d4f8788636fa8 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -115,6 +115,19 @@ config WIREGUARD_DEBUG
 
  Say N here unless you know what you're doing.
 
+config OVPN
+   tristate "OpenVPN data channel offload"
+   depends on NET && INET
+   select NET_UDP_TUNNEL
+   select DST_CACHE
+   select CRYPTO
+   select CRYPTO_AES
+   select CRYPTO_GCM
+   select CRYPTO_CHACHA20POLY1305
+   help
+ This module enhances the p

[PATCH net-next v10 01/23] netlink: add NLA_POLICY_MAX_LEN macro

2024-10-25 Thread Antonio Quartulli
Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy
with a maximum length value.

The netlink generator for YAML specs has been extended accordingly.
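
As a rough illustration of the generator change, here is a standalone
Python sketch of how a single length check could map to a policy
initializer. `binary_policy` is a hypothetical helper, not the
TypeBinary method from ynl-gen-c.py, and the string returned when no check
is present is an assumption.

```python
SUPPORTED_CHECKS = {"exact-len", "min-len", "max-len"}


def binary_policy(checks: dict) -> str:
    """Map the checks of a binary attribute to a C policy initializer."""
    if len(checks) > 1:
        raise ValueError("More than one check for binary type not implemented, yet")
    for name in checks:
        if name not in SUPPORTED_CHECKS:
            raise ValueError("Unsupported check for binary type: " + name)

    if "exact-len" in checks:
        return f"NLA_POLICY_EXACT_LEN({checks['exact-len']})"
    if "min-len" in checks:
        return f"{{ .len = {checks['min-len']}, }}"
    if "max-len" in checks:
        return f"NLA_POLICY_MAX_LEN({checks['max-len']})"
    # Assumption: the no-check case is not shown in this patch.
    return "{ .type = NLA_BINARY, }"


print(binary_policy({"max-len": 32}))
print(binary_policy({"exact-len": 16}))
```

A spec entry carrying `checks: max-len: 32` would then end up as an
NLA_POLICY_MAX_LEN(32) policy in the generated code.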

Cc: donald.hun...@gmail.com
Signed-off-by: Antonio Quartulli 
---
 include/net/netlink.h  | 1 +
 tools/net/ynl/ynl-gen-c.py | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index db6af207287c839408c58cb28b82408e0548eaca..2dc671c977ff3297975269d236264907009703d3 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -469,6 +469,7 @@ struct nla_policy {
.max = _len \
 }
 #define NLA_POLICY_MIN_LEN(_len)   NLA_POLICY_MIN(NLA_BINARY, _len)
+#define NLA_POLICY_MAX_LEN(_len)   NLA_POLICY_MAX(NLA_BINARY, _len)
 
 /**
  * struct nl_info - netlink source information
diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py
index 1a825b4081b222cf97eb73f01a2a5c1ffe47cd5c..aa22eb0924754f38ea0b9e68a1ff5a55d94d6717 100755
--- a/tools/net/ynl/ynl-gen-c.py
+++ b/tools/net/ynl/ynl-gen-c.py
@@ -481,7 +481,7 @@ class TypeBinary(Type):
 pass
 elif len(self.checks) == 1:
 check_name = list(self.checks)[0]
-if check_name not in {'exact-len', 'min-len'}:
+if check_name not in {'exact-len', 'min-len', 'max-len'}:
 raise Exception('Unsupported check for binary type: ' + check_name)
 else:
 raise Exception('More than one check for binary type not implemented, yet')
@@ -492,6 +492,8 @@ class TypeBinary(Type):
 mem = 'NLA_POLICY_EXACT_LEN(' + self.get_limit_str('exact-len') + ')'
 elif 'min-len' in self.checks:
 mem = '{ .len = ' + self.get_limit_str('min-len') + ', }'
+elif 'max-len' in self.checks:
+mem = 'NLA_POLICY_MAX_LEN(' + self.get_limit_str('max-len') + ')'
 
 return mem
 

-- 
2.45.2




[PATCH net-next v10 03/23] ovpn: add basic netlink support

2024-10-25 Thread Antonio Quartulli
This commit introduces basic netlink support with family
registration/unregistration functionalities and stub pre/post-doit.

More importantly it introduces the YAML uAPI description along
with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h

Cc: donald.hun...@gmail.com
Signed-off-by: Antonio Quartulli 
---
 Documentation/netlink/specs/ovpn.yaml | 362 ++
 MAINTAINERS   |   2 +
 drivers/net/ovpn/Makefile |   2 +
 drivers/net/ovpn/main.c   |  15 +-
 drivers/net/ovpn/netlink-gen.c| 212 
 drivers/net/ovpn/netlink-gen.h|  41 
 drivers/net/ovpn/netlink.c| 157 +++
 drivers/net/ovpn/netlink.h|  15 ++
 drivers/net/ovpn/ovpnstruct.h |  25 +++
 include/uapi/linux/ovpn.h | 109 ++
 10 files changed, 939 insertions(+), 1 deletion(-)

diff --git a/Documentation/netlink/specs/ovpn.yaml 
b/Documentation/netlink/specs/ovpn.yaml
new file mode 100644
index 
..79339c25d607f1b5d15a0a973f6fc23637e158a2
--- /dev/null
+++ b/Documentation/netlink/specs/ovpn.yaml
@@ -0,0 +1,362 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+#
+# Author: Antonio Quartulli 
+#
+# Copyright (c) 2024, OpenVPN Inc.
+#
+
+name: ovpn
+
+protocol: genetlink
+
+doc: Netlink protocol to control OpenVPN network devices
+
+definitions:
+  -
+type: const
+name: nonce-tail-size
+value: 8
+  -
+type: enum
+name: cipher-alg
+entries: [ none, aes-gcm, chacha20-poly1305 ]
+  -
+type: enum
+name: del-peer-reason
+entries: [ teardown, userspace, expired, transport-error, 
transport-disconnect ]
+  -
+type: enum
+name: key-slot
+entries: [ primary, secondary ]
+
+attribute-sets:
+  -
+name: peer
+attributes:
+  -
+name: id
+type: u32
+doc: |
+  The unique ID of the peer. To be used to identify peers during
+  operations
+checks:
+  max: 0xFF
+  -
+name: remote-ipv4
+type: u32
+doc: The remote IPv4 address of the peer
+byte-order: big-endian
+display-hint: ipv4
+  -
+name: remote-ipv6
+type: binary
+doc: The remote IPv6 address of the peer
+display-hint: ipv6
+checks:
+  exact-len: 16
+  -
+name: remote-ipv6-scope-id
+type: u32
+doc: The scope id of the remote IPv6 address of the peer (RFC2553)
+  -
+name: remote-port
+type: u16
+doc: The remote port of the peer
+byte-order: big-endian
+checks:
+  min: 1
+  -
+name: socket
+type: u32
+doc: The socket to be used to communicate with the peer
+  -
+name: vpn-ipv4
+type: u32
+doc: The IPv4 address assigned to the peer by the server
+byte-order: big-endian
+display-hint: ipv4
+  -
+name: vpn-ipv6
+type: binary
+doc: The IPv6 address assigned to the peer by the server
+display-hint: ipv6
+checks:
+  exact-len: 16
+  -
+name: local-ipv4
+type: u32
+doc: The local IPv4 to be used to send packets to the peer (UDP only)
+byte-order: big-endian
+display-hint: ipv4
+  -
+name: local-ipv6
+type: binary
+doc: The local IPv6 to be used to send packets to the peer (UDP only)
+display-hint: ipv6
+checks:
+  exact-len: 16
+  -
+name: local-port
+type: u16
+doc: The local port to be used to send packets to the peer (UDP only)
+byte-order: big-endian
+checks:
+  min: 1
+  -
+name: keepalive-interval
+type: u32
+doc: |
+  The number of seconds after which a keep alive message is sent to the
+  peer
+  -
+name: keepalive-timeout
+type: u32
+doc: |
+  The number of seconds from the last activity after which the peer is
+  assumed dead
+  -
+name: del-reason
+type: u32
+doc: The reason why a peer was deleted
+enum: del-peer-reason
+  -
+name: vpn-rx-bytes
+type: uint
+doc: Number of bytes received over the tunnel
+  -
+name: vpn-tx-bytes
+type: uint
+doc: Number of bytes transmitted over the tunnel
+  -
+name: vpn-rx-packets
+type: uint
+doc: Number of packets received over the tunnel
+  -
+name: vpn-tx-packets
+type: uint
+doc: Number of packets transmitted over the tunnel
+  -
+name: link-rx-bytes
+type: uint
+doc: Number of bytes received at the transport level
+  -
+name: link-tx-bytes

[PATCH net-next v10 05/23] ovpn: keep carrier always on

2024-10-25 Thread Antonio Quartulli
An ovpn interface will keep carrier always on and let the user
decide when an interface should be considered disconnected.

This way, even if an ovpn interface is not connected to any peer,
it can still retain all IPs and routes and thus prevent any data
leak.

Signed-off-by: Antonio Quartulli 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ovpn/main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net)
 
 static int ovpn_net_open(struct net_device *dev)
 {
+   /* ovpn keeps the carrier always on to avoid losing IP or route
+* configuration upon disconnection. This way it can prevent leaks
+* of traffic outside of the VPN tunnel.
+* The user may override this behaviour by tearing down the interface
+* manually.
+*/
+   netif_carrier_on(dev);
netif_tx_start_all_queues(dev);
return 0;
 }

-- 
2.45.2




[PATCH net-next v10 04/23] ovpn: add basic interface creation/destruction/management routines

2024-10-25 Thread Antonio Quartulli
Add basic infrastructure for handling ovpn interfaces.

Signed-off-by: Antonio Quartulli 
---
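The MTU arithmetic done by ovpn_setup() in the diff below can be checked
with a small userspace sketch. This is not driver code: every demo_* and
DEMO_* name is hypothetical, and because NONCE_WIRE_SIZE is not defined in
this patch it is taken as a parameter rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes in bytes, mirroring the terms summed in ovpn_setup() */
#define DEMO_U32_LEN      4    /* the sizeof(u32) term */
#define DEMO_AEAD_TAG_LEN 16   /* the literal 16 */
#define DEMO_UDP_HDR_LEN  8    /* sizeof(struct udphdr) */
#define DEMO_IPV4_HDR_LEN 20   /* sizeof(struct iphdr) */
#define DEMO_IPV6_HDR_LEN 40   /* sizeof(struct ipv6hdr) */
#define DEMO_ETH_DATA_LEN 1500 /* ETH_DATA_LEN */

/* Worst-case encapsulation overhead; @nonce_wire_size stands in for
 * NONCE_WIRE_SIZE, whose value is an assumption supplied by the caller.
 */
size_t demo_ovpn_overhead(size_t nonce_wire_size)
{
	size_t ip_hdr = DEMO_IPV6_HDR_LEN > DEMO_IPV4_HDR_LEN ?
			DEMO_IPV6_HDR_LEN : DEMO_IPV4_HDR_LEN;

	return DEMO_U32_LEN + nonce_wire_size + DEMO_AEAD_TAG_LEN +
	       DEMO_UDP_HDR_LEN + ip_hdr;
}

/* Default MTU: what is left of an Ethernet payload after the overhead */
size_t demo_ovpn_default_mtu(size_t nonce_wire_size)
{
	size_t overhead = demo_ovpn_overhead(nonce_wire_size);

	assert(overhead < DEMO_ETH_DATA_LEN);
	return DEMO_ETH_DATA_LEN - overhead;
}
```

With an assumed nonce wire size of 4 bytes, the sketch gives an overhead of
72 bytes and a default MTU of 1428.
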
 drivers/net/ovpn/main.c   | 115 --
 drivers/net/ovpn/main.h   |   7 +++
 drivers/net/ovpn/ovpnstruct.h |   8 +++
 drivers/net/ovpn/packet.h |  40 +++
 include/uapi/linux/if_link.h  |  15 ++
 5 files changed, 180 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -10,18 +10,52 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
-#include 
+#include 
 
 #include "ovpnstruct.h"
 #include "main.h"
 #include "netlink.h"
 #include "io.h"
+#include "packet.h"
 
 /* Driver info */
 #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)"
 #define DRV_COPYRIGHT  "(C) 2020-2024 OpenVPN, Inc."
 
+static void ovpn_struct_free(struct net_device *net)
+{
+}
+
+static int ovpn_net_open(struct net_device *dev)
+{
+   netif_tx_start_all_queues(dev);
+   return 0;
+}
+
+static int ovpn_net_stop(struct net_device *dev)
+{
+   netif_tx_stop_all_queues(dev);
+   return 0;
+}
+
+static const struct net_device_ops ovpn_netdev_ops = {
+   .ndo_open   = ovpn_net_open,
+   .ndo_stop   = ovpn_net_stop,
+   .ndo_start_xmit = ovpn_net_xmit,
+};
+
+static const struct device_type ovpn_type = {
+   .name = OVPN_FAMILY_NAME,
+};
+
+static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = {
+   [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P,
+   OVPN_MODE_MP),
+};
+
 /**
  * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn'
  * @dev: the interface to check
@@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev)
return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
 }
 
+static void ovpn_setup(struct net_device *dev)
+{
+   /* compute the overhead considering AEAD encryption */
+   const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 +
+sizeof(struct udphdr) +
+max(sizeof(struct ipv6hdr), sizeof(struct iphdr));
+
+   netdev_features_t feat = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
+NETIF_F_GSO | NETIF_F_GSO_SOFTWARE |
+NETIF_F_HIGHDMA;
+
+   dev->needs_free_netdev = true;
+
+   dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
+
+   dev->netdev_ops = &ovpn_netdev_ops;
+
+   dev->priv_destructor = ovpn_struct_free;
+
+   dev->hard_header_len = 0;
+   dev->addr_len = 0;
+   dev->mtu = ETH_DATA_LEN - overhead;
+   dev->min_mtu = IPV4_MIN_MTU;
+   dev->max_mtu = IP_MAX_MTU - overhead;
+
+   dev->type = ARPHRD_NONE;
+   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
+   dev->priv_flags |= IFF_NO_QUEUE;
+
+   dev->lltx = true;
+   dev->features |= feat;
+   dev->hw_features |= feat;
+   dev->hw_enc_features |= feat;
+
+   dev->needed_headroom = OVPN_HEAD_ROOM;
+   dev->needed_tailroom = OVPN_MAX_PADDING;
+
+   SET_NETDEV_DEVTYPE(dev, &ovpn_type);
+}
+
 static int ovpn_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
 {
-   return -EOPNOTSUPP;
+   struct ovpn_struct *ovpn = netdev_priv(dev);
+   enum ovpn_mode mode = OVPN_MODE_P2P;
+
+   if (data && data[IFLA_OVPN_MODE]) {
+   mode = nla_get_u8(data[IFLA_OVPN_MODE]);
+   netdev_dbg(dev, "setting device mode: %u\n", mode);
+   }
+
+   ovpn->dev = dev;
+   ovpn->mode = mode;
+
+   /* turn carrier explicitly off before registration, this way state is
+* clearly defined
+*/
+   netif_carrier_off(dev);
+
+   return register_netdevice(dev);
 }
 
 static struct rtnl_link_ops ovpn_link_ops = {
.kind = OVPN_FAMILY_NAME,
.netns_refund = false,
+   .priv_size = sizeof(struct ovpn_struct),
+   .setup = ovpn_setup,
+   .policy = ovpn_policy,
+   .maxtype = IFLA_OVPN_MAX,
.newlink = ovpn_newlink,
.dellink = unregister_netdevice_queue,
 };
@@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb,
 unsigned long state, void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct ovpn_struct *ovpn;
 
if (!ovpn_dev_is_valid(dev))
return NOTIFY_DONE;
 
+   ovpn = netdev_priv(dev);
+
switch (state) {
case NETDEV_REGISTER:
-   /* add device to internal list for later destruction upon
-* unregistration
-*/
+   ovpn-

[PATCH net-next v10 06/23] ovpn: introduce the ovpn_peer object

2024-10-25 Thread Antonio Quartulli
An ovpn_peer object holds the whole status of a remote peer
(regardless of whether it is a server or a client).

This includes status for crypto, tx/rx buffers, napi, etc.

Only support for one peer is introduced (P2P mode).
Multi peer support is introduced with a later patch.

Along with the ovpn_peer, the ovpn_bind object is also introduced,
as the two are strictly related.
An ovpn_bind object wraps a sockaddr representing the local
coordinates being used to talk to a specific peer.

Signed-off-by: Antonio Quartulli 
---
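To illustrate how a binding wraps a sockaddr, here is a userspace sketch
of the family-dependent copy performed by ovpn_bind_from_sockaddr() in
the diff below. It is only a model: demo_sockaddr, demo_bind and
demo_bind_from_sockaddr are hypothetical names, there is no allocation
or RCU, and the caller provides the storage.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical userspace stand-in for union ovpn_sockaddr */
union demo_sockaddr {
	struct sockaddr_in in4;
	struct sockaddr_in6 in6;
};

/* Hypothetical stand-in for struct ovpn_bind (remote part only) */
struct demo_bind {
	union demo_sockaddr remote;
};

/* Copy only the family-specific part of @ss into @bind.
 * Returns 0 on success or -EAFNOSUPPORT for an unknown family.
 */
int demo_bind_from_sockaddr(struct demo_bind *bind,
			    const struct sockaddr_storage *ss)
{
	size_t sa_len;

	assert(bind && ss);

	if (ss->ss_family == AF_INET)
		sa_len = sizeof(struct sockaddr_in);
	else if (ss->ss_family == AF_INET6)
		sa_len = sizeof(struct sockaddr_in6);
	else
		return -EAFNOSUPPORT;

	/* zero first so the unused tail of the union is well defined */
	memset(bind, 0, sizeof(*bind));
	memcpy(&bind->remote, ss, sa_len);

	return 0;
}
```

Copying sizeof(struct sockaddr_in) or sizeof(struct sockaddr_in6) bytes,
rather than the whole sockaddr_storage, keeps the copy within the bounds
of the smaller union.
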
 drivers/net/ovpn/Makefile |   2 +
 drivers/net/ovpn/bind.c   |  58 +++
 drivers/net/ovpn/bind.h   | 117 ++
 drivers/net/ovpn/main.c   |  11 ++
 drivers/net/ovpn/main.h   |   2 +
 drivers/net/ovpn/ovpnstruct.h |   4 +
 drivers/net/ovpn/peer.c   | 354 ++
 drivers/net/ovpn/peer.h   |  79 ++
 8 files changed, 627 insertions(+)

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 201dc001419f1d99ae95c0ee0f96e68f8a4eac16..ce13499b3e1775a7f2a9ce16c6cb0aa088f93685 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -7,7 +7,9 @@
 # Author:  Antonio Quartulli 
 
 obj-$(CONFIG_OVPN) := ovpn.o
+ovpn-y += bind.o
 ovpn-y += main.o
 ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
+ovpn-y += peer.o
diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
new file mode 100644
index ..b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a
--- /dev/null
+++ b/drivers/net/ovpn/bind.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2012-2024 OpenVPN, Inc.
+ *
+ *  Author:James Yonan 
+ * Antonio Quartulli 
+ */
+
+#include 
+#include 
+
+#include "ovpnstruct.h"
+#include "bind.h"
+#include "peer.h"
+
+/**
+ * ovpn_bind_from_sockaddr - allocate a new binding from a sockaddr
+ * @ss: the sockaddr to copy into the binding
+ *
+ * Return: the newly allocated bind on success, an ERR_PTR() otherwise
+ */
+struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss)
+{
+   struct ovpn_bind *bind;
+   size_t sa_len;
+
+   if (ss->ss_family == AF_INET)
+   sa_len = sizeof(struct sockaddr_in);
+   else if (ss->ss_family == AF_INET6)
+   sa_len = sizeof(struct sockaddr_in6);
+   else
+   return ERR_PTR(-EAFNOSUPPORT);
+
+   bind = kzalloc(sizeof(*bind), GFP_ATOMIC);
+   if (unlikely(!bind))
+   return ERR_PTR(-ENOMEM);
+
+   memcpy(&bind->remote, ss, sa_len);
+
+   return bind;
+}
+
+/**
+ * ovpn_bind_reset - assign new binding to peer
+ * @peer: the peer whose binding has to be replaced
+ * @new: the new bind to assign
+ */
+void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new)
+{
+   struct ovpn_bind *old;
+
+   spin_lock_bh(&peer->lock);
+   old = rcu_replace_pointer(peer->bind, new, true);
+   spin_unlock_bh(&peer->lock);
+
+   kfree_rcu(old, rcu);
+}
diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h
new file mode 100644
index ..859213d5040deb36c416eafcf5c6ab31c4d52c7a
--- /dev/null
+++ b/drivers/net/ovpn/bind.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2012-2024 OpenVPN, Inc.
+ *
+ *  Author:James Yonan 
+ * Antonio Quartulli 
+ */
+
+#ifndef _NET_OVPN_OVPNBIND_H_
+#define _NET_OVPN_OVPNBIND_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct ovpn_peer;
+
+/**
+ * union ovpn_sockaddr - basic transport layer address
+ * @in4: IPv4 address
+ * @in6: IPv6 address
+ */
+union ovpn_sockaddr {
+   struct sockaddr_in in4;
+   struct sockaddr_in6 in6;
+};
+
+/**
+ * struct ovpn_bind - remote peer binding
+ * @remote: the remote peer sockaddress
+ * @local: local endpoint used to talk to the peer
+ * @local.ipv4: local IPv4 used to talk to the peer
+ * @local.ipv6: local IPv6 used to talk to the peer
+ * @rcu: used to schedule RCU cleanup job
+ */
+struct ovpn_bind {
+   union ovpn_sockaddr remote;  /* remote sockaddr */
+
+   union {
+   struct in_addr ipv4;
+   struct in6_addr ipv6;
+   } local;
+
+   struct rcu_head rcu;
+};
+
+/**
+ * skb_protocol_to_family - translate skb->protocol to AF_INET or AF_INET6
+ * @skb: the packet sk_buff to inspect
+ *
+ * Return: AF_INET, AF_INET6 or 0 in case of unknown protocol
+ */
+static inline unsigned short skb_protocol_to_family(const struct sk_buff *skb)
+{
+   switch (skb->protocol) {
+   case htons(ETH_P_IP):
+   return AF_INET;
+   case htons(ETH_P_IPV6):
+   return AF_INET6;
+   default:
+   return 0;
+   }
+}
+
+/**
+ * ovpn_bind_skb_src_match - match packet source with binding
+ * @bind: the binding to match
+ * @skb: the pack

[PATCH net-next v10 07/23] ovpn: introduce the ovpn_socket object

2024-10-25 Thread Antonio Quartulli
This specific structure is used in the ovpn kernel module
to wrap and carry around a standard kernel socket.

ovpn takes ownership of passed sockets and therefore an ovpn
specific object is attached to them for status tracking
purposes.

Initially only UDP support is introduced. TCP will come in a later
patch.

Signed-off-by: Antonio Quartulli 
---
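The ovpn_socket_hold()/kref_get_unless_zero() pattern used below, where a
shared socket may only gain a new user while it is still alive, can be
modelled in userspace with C11 atomics. This is a sketch of the
semantics, not of the kernel implementation; demo_sock and the demo_sock_*
helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcounted struct ovpn_socket */
struct demo_sock {
	atomic_uint refcount;
};

/* A new object starts with one reference, owned by its creator */
void demo_sock_init(struct demo_sock *s)
{
	atomic_init(&s->refcount, 1);
}

/* Take a reference only if the object is still alive (refcount > 0).
 * Models the behaviour of kref_get_unless_zero().
 */
bool demo_sock_hold(struct demo_sock *s)
{
	unsigned int old = atomic_load(&s->refcount);

	while (old != 0) {
		/* on failure 'old' is refreshed with the current value */
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return true;
	}

	return false;
}

/* Drop a reference; returns true if the caller dropped the last one
 * and is therefore responsible for releasing the object.
 */
bool demo_sock_put(struct demo_sock *s)
{
	unsigned int old = atomic_fetch_sub(&s->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

Once the count has reached zero, demo_sock_hold() keeps failing, which is
the case ovpn_socket_get() reports with its pr_warn().
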
 drivers/net/ovpn/Makefile |   2 +
 drivers/net/ovpn/socket.c | 120 ++
 drivers/net/ovpn/socket.h |  48 +++
 drivers/net/ovpn/udp.c|  72 
 drivers/net/ovpn/udp.h|  17 +++
 5 files changed, 259 insertions(+)

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index ce13499b3e1775a7f2a9ce16c6cb0aa088f93685..56bddc9bef83e0befde6af3c3565bb91731d7b22 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -13,3 +13,5 @@ ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
 ovpn-y += peer.o
+ovpn-y += socket.o
+ovpn-y += udp.o
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
new file mode 100644
index ..090a3232ab0ec19702110f1a90f45c7f10889f6f
--- /dev/null
+++ b/drivers/net/ovpn/socket.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:James Yonan 
+ * Antonio Quartulli 
+ */
+
+#include 
+#include 
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "io.h"
+#include "peer.h"
+#include "socket.h"
+#include "udp.h"
+
+static void ovpn_socket_detach(struct socket *sock)
+{
+   if (!sock)
+   return;
+
+   sockfd_put(sock);
+}
+
+/**
+ * ovpn_socket_release_kref - kref_put callback
+ * @kref: the kref object
+ */
+void ovpn_socket_release_kref(struct kref *kref)
+{
+   struct ovpn_socket *sock = container_of(kref, struct ovpn_socket,
+   refcount);
+
+   ovpn_socket_detach(sock->sock);
+   kfree_rcu(sock, rcu);
+}
+
+static bool ovpn_socket_hold(struct ovpn_socket *sock)
+{
+   return kref_get_unless_zero(&sock->refcount);
+}
+
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock)
+{
+   struct ovpn_socket *ovpn_sock;
+
+   rcu_read_lock();
+   ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
+   if (!ovpn_socket_hold(ovpn_sock)) {
+   pr_warn("%s: found ovpn_socket with ref = 0\n", __func__);
+   ovpn_sock = NULL;
+   }
+   rcu_read_unlock();
+
+   return ovpn_sock;
+}
+
+static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
+{
+   int ret = -EOPNOTSUPP;
+
+   if (!sock || !peer)
+   return -EINVAL;
+
+   if (sock->sk->sk_protocol == IPPROTO_UDP)
+   ret = ovpn_udp_socket_attach(sock, peer->ovpn);
+
+   return ret;
+}
+
+/**
+ * ovpn_socket_new - create a new socket and initialize it
+ * @sock: the kernel socket to embed
+ * @peer: the peer reachable via this socket
+ *
+ * Return: an openvpn socket on success or a negative error code otherwise
+ */
+struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
+{
+   struct ovpn_socket *ovpn_sock;
+   int ret;
+
+   ret = ovpn_socket_attach(sock, peer);
+   if (ret < 0 && ret != -EALREADY)
+   return ERR_PTR(ret);
+
+   /* if this socket is already owned by this interface, just increase the
+* refcounter and use it as expected.
+*
+* Since UDP sockets can be used to talk to multiple remote endpoints,
+* openvpn normally instantiates only one socket and shares it among all
+* its peers. For this reason, when we find out that a socket is already
+* used for some other peer in *this* instance, we can happily increase
+* its refcounter and use it normally.
+*/
+   if (ret == -EALREADY) {
+   /* caller is expected to increase the sock refcounter before
+* passing it to this function. For this reason we drop it if
+* not needed, like when this socket is already owned.
+*/
+   ovpn_sock = ovpn_socket_get(sock);
+   sockfd_put(sock);
+   return ovpn_sock;
+   }
+
+   ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL);
+   if (!ovpn_sock)
+   return ERR_PTR(-ENOMEM);
+
+   ovpn_sock->ovpn = peer->ovpn;
+   ovpn_sock->sock = sock;
+   kref_init(&ovpn_sock->refcount);
+
+   rcu_assign_sk_user_data(sock->sk, ovpn_sock);
+
+   return ovpn_sock;
+}
diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h
new file mode 100644
index ..5ad9c5073b085482da95ee8ebf40acf20bf2e4b3
--- /dev/null
+++ b/drivers/net/ovpn/socket.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data chann

[PATCH net-next v10 11/23] ovpn: store tunnel and transport statistics

2024-10-25 Thread Antonio Quartulli
Byte/packet counters for in-tunnel and transport streams
are now initialized and updated as needed.

To be exported via netlink.

Signed-off-by: Antonio Quartulli 
---
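As a reading aid for the diff below, this userspace sketch shows which
length feeds which counter: the tunnel (vpn) counters see the plain
packet, the transport (link) counters see the encapsulated one. All
demo_* names are hypothetical and C11 atomics merely stand in for
whatever primitive the driver's stats.h uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one set of per-peer counters */
struct demo_stats {
	atomic_uint_fast64_t rx_bytes, rx_packets;
	atomic_uint_fast64_t tx_bytes, tx_packets;
};

/* Hypothetical peer carrying the two counter sets named in the patch */
struct demo_peer {
	struct demo_stats vpn_stats;  /* in-tunnel traffic */
	struct demo_stats link_stats; /* transport-level traffic */
};

void demo_stats_init(struct demo_stats *st)
{
	assert(st);
	atomic_init(&st->rx_bytes, 0);
	atomic_init(&st->rx_packets, 0);
	atomic_init(&st->tx_bytes, 0);
	atomic_init(&st->tx_packets, 0);
}

void demo_stats_rx(struct demo_stats *st, uint64_t n)
{
	atomic_fetch_add(&st->rx_bytes, n);
	atomic_fetch_add(&st->rx_packets, 1);
}

void demo_stats_tx(struct demo_stats *st, uint64_t n)
{
	atomic_fetch_add(&st->tx_bytes, n);
	atomic_fetch_add(&st->tx_packets, 1);
}

/* RX: @wire_len is the length before decryption (orig_len),
 * @plain_len the length of the decapsulated packet.
 */
void demo_peer_account_rx(struct demo_peer *p, uint64_t wire_len,
			  uint64_t plain_len)
{
	demo_stats_rx(&p->vpn_stats, plain_len);
	demo_stats_rx(&p->link_stats, wire_len);
}

/* TX: @plain_len is the length before encryption (orig_len),
 * @wire_len the length of the encapsulated packet.
 */
void demo_peer_account_tx(struct demo_peer *p, uint64_t plain_len,
			  uint64_t wire_len)
{
	demo_stats_tx(&p->link_stats, wire_len);
	demo_stats_tx(&p->vpn_stats, plain_len);
}
```

Saving orig_len in the skb control block before the crypto step is what
makes both lengths available when the counters are updated afterwards.
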
 drivers/net/ovpn/Makefile  |  1 +
 drivers/net/ovpn/crypto_aead.c |  2 ++
 drivers/net/ovpn/io.c  | 11 ++
 drivers/net/ovpn/peer.c|  2 ++
 drivers/net/ovpn/peer.h|  5 +
 drivers/net/ovpn/skb.h |  1 +
 drivers/net/ovpn/stats.c   | 21 +++
 drivers/net/ovpn/stats.h   | 47 ++
 8 files changed, 90 insertions(+)

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index ccdaeced1982c851475657860a005ff2b9dfbd13..d43fda72646bdc7644d9a878b56da0a0e5680c98 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -17,4 +17,5 @@ ovpn-y += netlink-gen.o
 ovpn-y += peer.o
 ovpn-y += pktid.o
 ovpn-y += socket.o
+ovpn-y += stats.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c
index f9e3feb297b19868b1084048933796fcc7a47d6e..072bb0881764752520e8e26e18337c1274ce1aa4 100644
--- a/drivers/net/ovpn/crypto_aead.c
+++ b/drivers/net/ovpn/crypto_aead.c
@@ -48,6 +48,7 @@ int ovpn_aead_encrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks,
int nfrags, ret;
u32 pktid, op;
 
+   ovpn_skb_cb(skb)->orig_len = skb->len;
ovpn_skb_cb(skb)->peer = peer;
ovpn_skb_cb(skb)->ks = ks;
 
@@ -159,6 +160,7 @@ int ovpn_aead_decrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks,
payload_offset = OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE + tag_size;
payload_len = skb->len - payload_offset;
 
+   ovpn_skb_cb(skb)->orig_len = skb->len;
ovpn_skb_cb(skb)->payload_offset = payload_offset;
ovpn_skb_cb(skb)->peer = peer;
ovpn_skb_cb(skb)->ks = ks;
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 4c81c4547d35d2a73f680ef1f5d8853ffbd952e0..d56e74660c7be9020b5bdf7971322d41afd436d6 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ovpnstruct.h"
 #include "peer.h"
@@ -68,6 +69,7 @@ void ovpn_decrypt_post(void *data, int ret)
unsigned int payload_offset = 0;
struct sk_buff *skb = data;
struct ovpn_peer *peer;
+   unsigned int orig_len;
__be16 proto;
__be32 *pid;
 
@@ -80,6 +82,7 @@ void ovpn_decrypt_post(void *data, int ret)
payload_offset = ovpn_skb_cb(skb)->payload_offset;
ks = ovpn_skb_cb(skb)->ks;
peer = ovpn_skb_cb(skb)->peer;
+   orig_len = ovpn_skb_cb(skb)->orig_len;
 
/* crypto is done, cleanup skb CB and its members */
 
@@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
 
+   /* increment RX stats */
+   ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
+   ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len);
+
ovpn_netdev_write(peer, skb);
/* skb is passed to upper layer - don't free it */
skb = NULL;
@@ -175,6 +182,7 @@ void ovpn_encrypt_post(void *data, int ret)
struct ovpn_crypto_key_slot *ks;
struct sk_buff *skb = data;
struct ovpn_peer *peer;
+   unsigned int orig_len;
 
/* encryption is happening asynchronously. This function will be
 * called later by the crypto callback with a proper return value
@@ -184,6 +192,7 @@ void ovpn_encrypt_post(void *data, int ret)
 
ks = ovpn_skb_cb(skb)->ks;
peer = ovpn_skb_cb(skb)->peer;
+   orig_len = ovpn_skb_cb(skb)->orig_len;
 
/* crypto is done, cleanup skb CB and its members */
 
@@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret)
goto err;
 
skb_mark_not_on_list(skb);
+   ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len);
+   ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len);
 
switch (peer->sock->sock->sk->sk_protocol) {
case IPPROTO_UDP:
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 98ae7662f1e76811e625dc5f4b4c5c884856fbd6..5025bfb759d6a5f31e3f2ec094fe561fbdb9f451 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -48,6 +48,8 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
ovpn_crypto_state_init(&peer->crypto);
spin_lock_init(&peer->lock);
kref_init(&peer->refcount);
+   ovpn_peer_stats_init(&peer->vpn_stats);
+   ovpn_peer_stats_init(&peer->link_stats);
 
ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL);
if (ret < 0) {
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 754fea470d1b4787f64a931d6c6adc24182fc16f..eb1e31e854fbfff25d07fba8026789e41a76c113 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -13,6 +13,7 @@
 #include 
 
 #include "crypto.h"
+#include "stats.h"
 
 /**
  * s

[PATCH net-next v10 09/23] ovpn: implement basic RX path (UDP)

2024-10-25 Thread Antonio Quartulli
Packets received over the socket are forwarded to the user device.

Implementation is UDP only. TCP will be added by a later patch.

Note: no decryption/decapsulation exists yet, packets are forwarded as
they arrive without much processing.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/io.c |  66 ++-
 drivers/net/ovpn/io.h |   2 +
 drivers/net/ovpn/main.c   |  13 +-
 drivers/net/ovpn/ovpnstruct.h |   3 ++
 drivers/net/ovpn/proto.h  |  75 ++
 drivers/net/ovpn/socket.c |  24 ++
 drivers/net/ovpn/udp.c| 104 +-
 drivers/net/ovpn/udp.h|   3 +-
 8 files changed, 286 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -9,15 +9,79 @@
 
 #include 
 #include 
+#include 
 #include 
 
-#include "io.h"
 #include "ovpnstruct.h"
 #include "peer.h"
+#include "io.h"
+#include "netlink.h"
+#include "proto.h"
 #include "udp.h"
 #include "skb.h"
 #include "socket.h"
 
+/* Called after decrypt to write the IP packet to the device.
+ * This method is expected to manage/free the skb.
+ */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+   unsigned int pkt_len;
+
+   /* we can't guarantee the packet wasn't corrupted before entering the
+* VPN, therefore we give other layers a chance to check that
+*/
+   skb->ip_summed = CHECKSUM_NONE;
+
+   /* skb hash for transport packet no longer valid after decapsulation */
+   skb_clear_hash(skb);
+
+   /* post-decrypt scrub -- prepare to inject encapsulated packet onto the
+* interface, based on __skb_tunnel_rx() in dst.h
+*/
+   skb->dev = peer->ovpn->dev;
+   skb_set_queue_mapping(skb, 0);
+   skb_scrub_packet(skb, true);
+
+   skb_reset_network_header(skb);
+   skb_reset_transport_header(skb);
+   skb_probe_transport_header(skb);
+   skb_reset_inner_headers(skb);
+
+   memset(skb->cb, 0, sizeof(skb->cb));
+
+   /* cause packet to be "received" by the interface */
+   pkt_len = skb->len;
+   if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
+skb) == NET_RX_SUCCESS))
+   /* update RX stats with the size of decrypted packet */
+   dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
+}
+
+static void ovpn_decrypt_post(struct sk_buff *skb, int ret)
+{
+   struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
+
+   if (unlikely(ret < 0))
+   goto drop;
+
+   ovpn_netdev_write(peer, skb);
+   /* skb is passed to upper layer - don't free it */
+   skb = NULL;
+drop:
+   if (unlikely(skb))
+   dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
+   ovpn_peer_put(peer);
+   kfree_skb(skb);
+}
+
+/* pick next packet from RX queue, decrypt and forward it to the device */
+void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+   ovpn_skb_cb(skb)->peer = peer;
+   ovpn_decrypt_post(skb, 0);
+}
+
 static void ovpn_encrypt_post(struct sk_buff *skb, int ret)
 {
struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
index aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f 100644
--- a/drivers/net/ovpn/io.h
+++ b/drivers/net/ovpn/io.h
@@ -12,4 +12,6 @@
 
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
 
+void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb);
+
 #endif /* _NET_OVPN_OVPN_H_ */
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net)
 
 static int ovpn_net_init(struct net_device *dev)
 {
-   return 0;
+   struct ovpn_struct *ovpn = netdev_priv(dev);
+
+   return gro_cells_init(&ovpn->gro_cells, dev);
+}
+
+static void ovpn_net_uninit(struct net_device *dev)
+{
+   struct ovpn_struct *ovpn = netdev_priv(dev);
+
+   gro_cells_destroy(&ovpn->gro_cells);
 }
 
 static int ovpn_net_open(struct net_device *dev)
@@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev)
 
 static const struct net_device_ops ovpn_netdev_ops = {
.ndo_init   = ovpn_net_init,
+   .ndo_uninit = ovpn_net_uninit,
.ndo_open   = ovpn_net_open,
.ndo_stop   = ovpn_net_stop,
.ndo_start_xmit = ovpn_net_xmit,
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/ne

[PATCH net-next v10 10/23] ovpn: implement packet processing

2024-10-25 Thread Antonio Quartulli
This change implements encryption/decryption and
encapsulation/decapsulation of OpenVPN packets.

Support for generic crypto state is added along with
a wrapper for the AEAD crypto kernel API.

Signed-off-by: Antonio Quartulli 
---
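The crypto state added below keeps two key slots and a primary_idx that
says which one is currently primary; the secondary is always the other
one (slots[!idx]). A userspace sketch of that indexing, with keys reduced
to opaque pointers and no locking or RCU, could look as follows. All
demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum demo_key_slot {
	DEMO_KEY_SLOT_PRIMARY,
	DEMO_KEY_SLOT_SECONDARY,
};

/* Hypothetical stand-in for struct ovpn_crypto_state */
struct demo_crypto_state {
	const void *slots[2];
	unsigned char primary_idx; /* 0 or 1 */
};

/* Translate a logical slot into an index into slots[] */
size_t demo_slot_index(const struct demo_crypto_state *cs,
		       enum demo_key_slot slot)
{
	size_t idx = cs->primary_idx;

	assert(idx < 2);
	return slot == DEMO_KEY_SLOT_PRIMARY ? idx : !idx;
}

/* Install @key in @slot and return the previous occupant (or NULL),
 * which the caller is then expected to release.
 */
const void *demo_crypto_state_reset(struct demo_crypto_state *cs,
				    enum demo_key_slot slot,
				    const void *key)
{
	size_t idx = demo_slot_index(cs, slot);
	const void *old = cs->slots[idx];

	cs->slots[idx] = key;
	return old;
}

/* Look up the key currently stored in @slot */
const void *demo_crypto_state_get(const struct demo_crypto_state *cs,
				  enum demo_key_slot slot)
{
	return cs->slots[demo_slot_index(cs, slot)];
}
```

Because the roles are derived from primary_idx, exchanging primary and
secondary only requires flipping that index; no key has to be moved.
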
 drivers/net/ovpn/Makefile  |   3 +
 drivers/net/ovpn/crypto.c  | 153 +
 drivers/net/ovpn/crypto.h  | 139 
 drivers/net/ovpn/crypto_aead.c | 367 +
 drivers/net/ovpn/crypto_aead.h |  31 
 drivers/net/ovpn/io.c  | 146 ++--
 drivers/net/ovpn/io.h  |   3 +
 drivers/net/ovpn/packet.h  |   2 +-
 drivers/net/ovpn/peer.c|  29 
 drivers/net/ovpn/peer.h|   6 +
 drivers/net/ovpn/pktid.c   | 130 +++
 drivers/net/ovpn/pktid.h   |  87 ++
 drivers/net/ovpn/proto.h   |  31 
 drivers/net/ovpn/skb.h |   4 +
 14 files changed, 1120 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 56bddc9bef83e0befde6af3c3565bb91731d7b22..ccdaeced1982c851475657860a005ff2b9dfbd13 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -8,10 +8,13 @@
 
 obj-$(CONFIG_OVPN) := ovpn.o
 ovpn-y += bind.o
+ovpn-y += crypto.o
+ovpn-y += crypto_aead.o
 ovpn-y += main.o
 ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
 ovpn-y += peer.o
+ovpn-y += pktid.o
 ovpn-y += socket.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
new file mode 100644
index ..f1f7510e2f735e367f96eb4982ba82c9af3c8bfc
--- /dev/null
+++ b/drivers/net/ovpn/crypto.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:James Yonan 
+ * Antonio Quartulli 
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "packet.h"
+#include "pktid.h"
+#include "crypto_aead.h"
+#include "crypto.h"
+
+static void ovpn_ks_destroy_rcu(struct rcu_head *head)
+{
+   struct ovpn_crypto_key_slot *ks;
+
+   ks = container_of(head, struct ovpn_crypto_key_slot, rcu);
+   ovpn_aead_crypto_key_slot_destroy(ks);
+}
+
+void ovpn_crypto_key_slot_release(struct kref *kref)
+{
+   struct ovpn_crypto_key_slot *ks;
+
+   ks = container_of(kref, struct ovpn_crypto_key_slot, refcount);
+   call_rcu(&ks->rcu, ovpn_ks_destroy_rcu);
+}
+
+/* can only be invoked when all peer references have been dropped (i.e. RCU
+ * release routine)
+ */
+void ovpn_crypto_state_release(struct ovpn_crypto_state *cs)
+{
+   struct ovpn_crypto_key_slot *ks;
+
+   ks = rcu_access_pointer(cs->slots[0]);
+   if (ks) {
+   RCU_INIT_POINTER(cs->slots[0], NULL);
+   ovpn_crypto_key_slot_put(ks);
+   }
+
+   ks = rcu_access_pointer(cs->slots[1]);
+   if (ks) {
+   RCU_INIT_POINTER(cs->slots[1], NULL);
+   ovpn_crypto_key_slot_put(ks);
+   }
+}
+
+/* Reset the ovpn_crypto_state object in a way that is atomic
+ * to RCU readers.
+ */
+int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs,
+   const struct ovpn_peer_key_reset *pkr)
+{
+   struct ovpn_crypto_key_slot *old = NULL, *new;
+   u8 idx;
+
+   if (pkr->slot != OVPN_KEY_SLOT_PRIMARY &&
+   pkr->slot != OVPN_KEY_SLOT_SECONDARY)
+   return -EINVAL;
+
+   new = ovpn_aead_crypto_key_slot_new(&pkr->key);
+   if (IS_ERR(new))
+   return PTR_ERR(new);
+
+   spin_lock_bh(&cs->lock);
+   idx = cs->primary_idx;
+   switch (pkr->slot) {
+   case OVPN_KEY_SLOT_PRIMARY:
+   old = rcu_replace_pointer(cs->slots[idx], new,
+ lockdep_is_held(&cs->lock));
+   break;
+   case OVPN_KEY_SLOT_SECONDARY:
+   old = rcu_replace_pointer(cs->slots[!idx], new,
+ lockdep_is_held(&cs->lock));
+   break;
+   }
+   spin_unlock_bh(&cs->lock);
+
+   if (old)
+   ovpn_crypto_key_slot_put(old);
+
+   return 0;
+}
+
+void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs,
+enum ovpn_key_slot slot)
+{
+   struct ovpn_crypto_key_slot *ks = NULL;
+   u8 idx;
+
+   if (slot != OVPN_KEY_SLOT_PRIMARY &&
+   slot != OVPN_KEY_SLOT_SECONDARY) {
+   pr_warn("Invalid slot to release: %u\n", slot);
+   return;
+   }
+
+   spin_lock_bh(&cs->lock);
+   idx = cs->primary_idx;
+   switch (slot) {
+   case OVPN_KEY_SLOT_PRIMARY:
+   ks = rcu_replace_pointer(cs->slots[idx], NULL,
+lockdep_is_held(&cs->lock));
+   break;
+   case OVPN_KEY_SLOT_SECONDARY:
+   ks = rcu_replace_pointer(cs

[PATCH net-next v10 12/23] ovpn: implement TCP transport

2024-10-25 Thread Antonio Quartulli
With this change ovpn can also communicate with peers via TCP.
Parsing of incoming messages is implemented through the strparser API.

Signed-off-by: Antonio Quartulli 
---
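For readers unfamiliar with stream parsing, the sketch below shows the
kind of decision a strparser parse callback has to make: given the bytes
received so far, report the full message length, or ask for more data.
It assumes the OpenVPN TCP framing of a 16-bit big-endian length prefix
in front of every packet; demo_tcp_parse and DEMO_TCP_HDR_LEN are
hypothetical names and this is userspace code, not the tcp.c added by
this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the per-packet length prefix on the TCP stream */
#define DEMO_TCP_HDR_LEN 2

/* Inspect the first @avail bytes of the stream at @buf.
 *
 * Returns the total size of the first message (prefix + payload),
 * 0 if more bytes are needed before the size is known,
 * or -1 if the prefix announces an empty payload.
 */
long demo_tcp_parse(const uint8_t *buf, size_t avail)
{
	uint16_t len;

	assert(buf || avail == 0);

	if (avail < DEMO_TCP_HDR_LEN)
		return 0;

	/* length prefix is in network byte order */
	len = (uint16_t)((buf[0] << 8) | buf[1]);
	if (len == 0)
		return -1;

	return (long)len + DEMO_TCP_HDR_LEN;
}
```

The return convention (positive length, 0 for "need more", negative for
error) follows the one used by strparser parse callbacks, so a complete
message can be handed over even when it arrives split across segments.
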
 drivers/net/Kconfig   |   1 +
 drivers/net/ovpn/Makefile |   1 +
 drivers/net/ovpn/io.c |   4 +
 drivers/net/ovpn/main.c   |   3 +
 drivers/net/ovpn/peer.h   |  37 
 drivers/net/ovpn/socket.c |  44 +++-
 drivers/net/ovpn/socket.h |   9 +-
 drivers/net/ovpn/tcp.c| 506 ++
 drivers/net/ovpn/tcp.h|  44 
 9 files changed, 643 insertions(+), 6 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 269b73fcfd348a48174fb96b8f8d4f8788636fa8..f37ce285e61fbee3201f4095ada3230305df511b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -118,6 +118,7 @@ config WIREGUARD_DEBUG
 config OVPN
tristate "OpenVPN data channel offload"
depends on NET && INET
+   select STREAM_PARSER
select NET_UDP_TUNNEL
select DST_CACHE
select CRYPTO
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index d43fda72646bdc7644d9a878b56da0a0e5680c98..f4d4bd87c851c8dd5b81e357315c4b22de4bd092 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -18,4 +18,5 @@ ovpn-y += peer.o
 ovpn-y += pktid.o
 ovpn-y += socket.o
 ovpn-y += stats.o
+ovpn-y += tcp.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index d56e74660c7be9020b5bdf7971322d41afd436d6..deda19ab87391f86964ba43088b7847d22420eee 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -22,6 +22,7 @@
 #include "crypto_aead.h"
 #include "netlink.h"
 #include "proto.h"
+#include "tcp.h"
 #include "udp.h"
 #include "skb.h"
 #include "socket.h"
@@ -213,6 +214,9 @@ void ovpn_encrypt_post(void *data, int ret)
case IPPROTO_UDP:
ovpn_udp_send_skb(peer->ovpn, peer, skb);
break;
+   case IPPROTO_TCP:
+   ovpn_tcp_send_skb(peer, skb);
+   break;
default:
/* no transport configured yet */
goto err;
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 73348765a8cf24321aa6be78e75f607d6dbffb1d..0488e395eb27d3dba1efc8ff39c023e0ac4a38dd 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -22,6 +22,7 @@
 #include "io.h"
 #include "packet.h"
 #include "peer.h"
+#include "tcp.h"
 
 /* Driver info */
 #define DRV_DESCRIPTION"OpenVPN data channel offload (ovpn)"
@@ -237,6 +238,8 @@ static int __init ovpn_init(void)
goto unreg_rtnl;
}
 
+   ovpn_tcp_init();
+
return 0;
 
 unreg_rtnl:
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index eb1e31e854fbfff25d07fba8026789e41a76c113..2b7fa9510e362ef3646157bb0d361bab19ddaa99 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -11,6 +11,7 @@
 #define _NET_OVPN_OVPNPEER_H_
 
 #include 
+#include 
 
 #include "crypto.h"
 #include "stats.h"
@@ -23,6 +24,18 @@
  * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
  * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
  * @sock: the socket being used to talk to this peer
+ * @tcp: keeps track of TCP specific state
+ * @tcp.strp: stream parser context (TCP only)
+ * @tcp.tx_work: work for deferring outgoing packet processing (TCP only)
+ * @tcp.user_queue: received packets that have to go to userspace (TCP only)
+ * @tcp.tx_in_progress: true if TX is already ongoing (TCP only)
+ * @tcp.out_msg.skb: packet scheduled for sending (TCP only)
+ * @tcp.out_msg.offset: offset where next send should start (TCP only)
+ * @tcp.out_msg.len: remaining data to send within packet (TCP only)
+ * @tcp.sk_cb.sk_data_ready: pointer to original cb (TCP only)
+ * @tcp.sk_cb.sk_write_space: pointer to original cb (TCP only)
+ * @tcp.sk_cb.prot: pointer to original prot object (TCP only)
+ * @tcp.sk_cb.ops: pointer to the original prot_ops object (TCP only)
  * @crypto: the crypto configuration (ciphers, keys, etc..)
  * @dst_cache: cache for dst_entry used to send to peer
  * @bind: remote peer binding
@@ -43,6 +56,30 @@ struct ovpn_peer {
struct in6_addr ipv6;
} vpn_addrs;
struct ovpn_socket *sock;
+
+   /* state of the TCP reading. Needed to keep track of how much of a
+* single packet has already been read from the stream and how much is
+* missing
+*/
+   struct {
+   struct strparser strp;
+   struct work_struct tx_work;
+   struct sk_buff_head user_queue;
+   bool tx_in_progress;
+
+   struct {
+   struct sk_buff *skb;
+   int offset;
+   int len;
+   } out_msg;
+
+   struct {
+   void (*sk_data_ready)(struct sock *sk);
+   void (*sk_write_space)(struct sock *sk);
+

[PATCH net-next v10 13/23] ovpn: implement multi-peer support

2024-10-25 Thread Antonio Quartulli
With this change an ovpn instance will be able to stay connected to
multiple remote endpoints.

This functionality is strictly required when running ovpn on an
OpenVPN server.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/main.c   |  55 +-
 drivers/net/ovpn/ovpnstruct.h |  19 +
 drivers/net/ovpn/peer.c   | 166 --
 drivers/net/ovpn/peer.h   |   9 +++
 4 files changed, 243 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 0488e395eb27d3dba1efc8ff39c023e0ac4a38dd..c7453127ab640d7268c1ce919a87cc5419fac9ee 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -30,6 +30,9 @@
 
 static void ovpn_struct_free(struct net_device *net)
 {
+   struct ovpn_struct *ovpn = netdev_priv(net);
+
+   kfree(ovpn->peers);
 }
 
 static int ovpn_net_init(struct net_device *dev)
@@ -133,12 +136,52 @@ static void ovpn_setup(struct net_device *dev)
SET_NETDEV_DEVTYPE(dev, &ovpn_type);
 }
 
+static int ovpn_mp_alloc(struct ovpn_struct *ovpn)
+{
+   struct in_device *dev_v4;
+   int i;
+
+   if (ovpn->mode != OVPN_MODE_MP)
+   return 0;
+
+   dev_v4 = __in_dev_get_rtnl(ovpn->dev);
+   if (dev_v4) {
+   /* disable redirects as Linux gets confused by ovpn
+* handling same-LAN routing.
+* This happens because a multipeer interface is used as
+* relay point between hosts in the same subnet, while
+* in a classic LAN this would not be needed because the
+* two hosts would be able to talk directly.
+*/
+   IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
+   IPV4_DEVCONF_ALL(dev_net(ovpn->dev), SEND_REDIRECTS) = false;
+   }
+
+   /* the peer container is fairly large, therefore we allocate it only in
+* MP mode
+*/
+   ovpn->peers = kzalloc(sizeof(*ovpn->peers), GFP_KERNEL);
+   if (!ovpn->peers)
+   return -ENOMEM;
+
+   spin_lock_init(&ovpn->peers->lock);
+
+   for (i = 0; i < ARRAY_SIZE(ovpn->peers->by_id); i++) {
+   INIT_HLIST_HEAD(&ovpn->peers->by_id[i]);
+   INIT_HLIST_NULLS_HEAD(&ovpn->peers->by_vpn_addr[i], i);
+   INIT_HLIST_NULLS_HEAD(&ovpn->peers->by_transp_addr[i], i);
+   }
+
+   return 0;
+}
+
 static int ovpn_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
 {
struct ovpn_struct *ovpn = netdev_priv(dev);
enum ovpn_mode mode = OVPN_MODE_P2P;
+   int err;
 
if (data && data[IFLA_OVPN_MODE]) {
mode = nla_get_u8(data[IFLA_OVPN_MODE]);
@@ -149,6 +192,10 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev,
ovpn->mode = mode;
spin_lock_init(&ovpn->lock);
 
+   err = ovpn_mp_alloc(ovpn);
+   if (err < 0)
+   return err;
+
/* turn carrier explicitly off after registration, this way state is
 * clearly defined
 */
@@ -197,8 +244,14 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb,
netif_carrier_off(dev);
ovpn->registered = false;
 
-   if (ovpn->mode == OVPN_MODE_P2P)
+   switch (ovpn->mode) {
+   case OVPN_MODE_P2P:
ovpn_peer_release_p2p(ovpn);
+   break;
+   case OVPN_MODE_MP:
+   ovpn_peers_free(ovpn);
+   break;
+   }
break;
case NETDEV_POST_INIT:
case NETDEV_GOING_DOWN:
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
index 4a48fc048890ab1cda78bc104fe3034b4a49d226..12ed5e22c2108c9f143d1984048eb40c887cac63 100644
--- a/drivers/net/ovpn/ovpnstruct.h
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -15,6 +15,23 @@
 #include 
 #include 
 
+/**
+ * struct ovpn_peer_collection - container of peers for MultiPeer mode
+ * @by_id: table of peers index by ID
+ * @by_vpn_addr: table of peers indexed by VPN IP address (items can be
+ *  rehashed on the fly due to peer IP change)
+ * @by_transp_addr: table of peers indexed by transport address (items can be
+ * rehashed on the fly due to peer IP change)
+ * @lock: protects writes to peer tables
+ */
+struct ovpn_peer_collection {
+   DECLARE_HASHTABLE(by_id, 12);
+   struct hlist_nulls_head by_vpn_addr[1 << 12];
+   struct hlist_nulls_head by_transp_addr[1 << 12];
+
+   spinlock_t lock; /* protects writes to peer tables */
+};
+
 /**
  * struct ovpn_struct - per ovpn interface state
  * @dev: the actual netdev representing the tunnel
@@ -22,6 +39,7 @@
  * @registered: whether dev is still registered with netdev or not
  * @mode: device operation mo

Re: [PATCH vhost 1/2] vdpa/mlx5: Fix PA offset with unaligned starting iotlb map

2024-10-25 Thread Jason Wang
On Mon, Oct 21, 2024 at 9:41 PM Dragos Tatulea  wrote:
>
> From: Si-Wei Liu 
>
> When calculating the physical address range based on the iotlb and mr
> [start,end) ranges, the offset of mr->start relative to map->start
> is not taken into account. This leads to some incorrect and duplicate
> mappings.
>
> For the case when mr->start < map->start the code is already correct:
> the range in [mr->start, map->start) was handled by a different
> iteration.
>
> Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Si-Wei Liu 
> Signed-off-by: Dragos Tatulea 
> ---

Acked-by: Jason Wang 

Thanks




[PATCH net-next v10 14/23] ovpn: implement peer lookup logic

2024-10-25 Thread Antonio Quartulli
In a multi-peer scenario there are a number of situations when a
specific peer needs to be looked up.

We may want to look up a peer by:
1. its ID
2. its VPN destination IP
3. its transport IP/port couple

For each of the above, there is a specific routing table referencing all
peers for fast lookup.

Case 2. is a bit special in the sense that an outgoing packet may not be
sent to the peer VPN IP directly, but rather to a network behind it. For
this reason we first perform a nexthop lookup in the system routing
table and then we use the retrieved nexthop as peer search key.
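
The two-step lookup for case 2 can be pictured with the following user-space sketch. It is a simplified model, not the kernel code: the bucket count, the hash function and all names (struct peer, struct route, peer_lookup) are assumptions made for illustration, and there is no RCU or locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16	/* illustrative size only */

struct peer {
	uint32_t vpn_ip;	/* VPN IPv4 address used as hash key */
	struct peer *next;	/* chain within one bucket */
};

struct route {
	int uses_gateway;	/* non-zero if traffic goes via a gateway */
	uint32_t gateway;
};

static struct peer *buckets[NBUCKETS];

static unsigned int bucket_of(uint32_t ip)
{
	return (ip * 2654435761u) % NBUCKETS;	/* simple multiplicative hash */
}

static void peer_add(struct peer *p)
{
	unsigned int b;

	assert(p != NULL);
	b = bucket_of(p->vpn_ip);
	p->next = buckets[b];
	buckets[b] = p;
}

/* Pick the search key: the gateway if the route has one, otherwise the
 * packet destination itself.
 */
static uint32_t nexthop(const struct route *rt, uint32_t daddr)
{
	if (rt && rt->uses_gateway)
		return rt->gateway;
	return daddr;
}

/* Resolve the nexthop first, then search the bucket it hashes to. */
static struct peer *peer_lookup(const struct route *rt, uint32_t daddr)
{
	uint32_t key = nexthop(rt, daddr);
	struct peer *p;

	for (p = buckets[bucket_of(key)]; p; p = p->next)
		if (p->vpn_ip == key)
			return p;
	return NULL;
}
```

A packet addressed to a network behind the peer is found only through the route's gateway, which is why the nexthop is resolved before the table is consulted.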

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/peer.c | 272 ++--
 1 file changed, 264 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 73ef509faab9701192a45ffe78a46dbbbeab01c2..c7dc9032c2b55fd42befc1f3e7a0eca893a96576 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ovpnstruct.h"
 #include "bind.h"
@@ -125,6 +126,94 @@ static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb,
return true;
 }
 
+/**
+ * ovpn_nexthop_from_skb4 - retrieve IPv4 nexthop for outgoing skb
+ * @skb: the outgoing packet
+ *
+ * Return: the IPv4 of the nexthop
+ */
+static __be32 ovpn_nexthop_from_skb4(struct sk_buff *skb)
+{
+   const struct rtable *rt = skb_rtable(skb);
+
+   if (rt && rt->rt_uses_gateway)
+   return rt->rt_gw4;
+
+   return ip_hdr(skb)->daddr;
+}
+
+/**
+ * ovpn_nexthop_from_skb6 - retrieve IPv6 nexthop for outgoing skb
+ * @skb: the outgoing packet
+ *
+ * Return: the IPv6 of the nexthop
+ */
+static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb)
+{
+   const struct rt6_info *rt = skb_rt6_info(skb);
+
+   if (!rt || !(rt->rt6i_flags & RTF_GATEWAY))
+   return ipv6_hdr(skb)->daddr;
+
+   return rt->rt6i_gateway;
+}
+
+#define ovpn_get_hash_head(_tbl, _key, _key_len) ({\
+   typeof(_tbl) *__tbl = &(_tbl);  \
+   (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \
+
+/**
+ * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address
+ * @ovpn: the openvpn instance to search
+ * @addr: VPN IPv4 to use as search key
+ *
+ * Refcounter is not increased for the returned peer.
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+static struct ovpn_peer *ovpn_peer_get_by_vpn_addr4(struct ovpn_struct *ovpn,
+   __be32 addr)
+{
+   struct hlist_nulls_head *nhead;
+   struct hlist_nulls_node *ntmp;
+   struct ovpn_peer *tmp;
+
+   nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, &addr,
+  sizeof(addr));
+
+   hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, hash_entry_addr4)
+   if (addr == tmp->vpn_addrs.ipv4.s_addr)
+   return tmp;
+
+   return NULL;
+}
+
+/**
+ * ovpn_peer_get_by_vpn_addr6 - retrieve peer by its VPN IPv6 address
+ * @ovpn: the openvpn instance to search
+ * @addr: VPN IPv6 to use as search key
+ *
+ * Refcounter is not increased for the returned peer.
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+static struct ovpn_peer *ovpn_peer_get_by_vpn_addr6(struct ovpn_struct *ovpn,
+   struct in6_addr *addr)
+{
+   struct hlist_nulls_head *nhead;
+   struct hlist_nulls_node *ntmp;
+   struct ovpn_peer *tmp;
+
+   nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, addr,
+  sizeof(*addr));
+
+   hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, hash_entry_addr6)
+   if (ipv6_addr_equal(addr, &tmp->vpn_addrs.ipv6))
+   return tmp;
+
+   return NULL;
+}
+
 /**
  * ovpn_peer_transp_match - check if sockaddr and peer binding match
  * @peer: the peer to get the binding from
@@ -202,14 +291,44 @@ ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn,
 struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
   struct sk_buff *skb)
 {
-   struct ovpn_peer *peer = NULL;
+   struct ovpn_peer *tmp, *peer = NULL;
struct sockaddr_storage ss = { 0 };
+   struct hlist_nulls_head *nhead;
+   struct hlist_nulls_node *ntmp;
+   size_t sa_len;
 
if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
return NULL;
 
if (ovpn->mode == OVPN_MODE_P2P)
-   peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+   return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+
+   switch (ss.ss_family) {
+   case AF_INET:
+   sa_len = sizeof(struct sockaddr_in);
+   break;
+   case AF_INET6:
+   sa_len = sizeof(struct sockaddr_in6);
+   break;
+   default:
+  

[PATCH net-next v10 15/23] ovpn: implement keepalive mechanism

2024-10-25 Thread Antonio Quartulli
OpenVPN supports configuring a periodic keepalive message to allow the
remote endpoint to detect link failures.

This change implements the keepalive sending and timer expiring logic.
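
The receive-side check and the timer decisions can be sketched in user space as follows. The 16-byte sequence is the one defined as ovpn_keepalive_message in this patch; the helper names and the two timer predicates are assumptions for illustration and do not reproduce the driver's actual timer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KEEPALIVE_SIZE 16

/* Same byte sequence as ovpn_keepalive_message in the patch. */
static const unsigned char keepalive_msg[KEEPALIVE_SIZE] = {
	0x2a, 0x18, 0x7b, 0xf3, 0x64, 0x1e, 0xb4, 0xcb,
	0x07, 0xed, 0x2d, 0x0a, 0x98, 0x1f, 0xc7, 0x48
};

/* A payload is a keepalive only if it has exactly the magic content. */
static bool is_keepalive(const unsigned char *data, size_t len)
{
	assert(data != NULL);

	if (len != KEEPALIVE_SIZE)
		return false;

	return memcmp(data, keepalive_msg, KEEPALIVE_SIZE) == 0;
}

/* Hypothetical predicate: nothing sent for @interval seconds, send a ping. */
static bool keepalive_due(long now, long last_sent, long interval)
{
	return now - last_sent >= interval;
}

/* Hypothetical predicate: nothing received for @timeout seconds. */
static bool peer_timed_out(long now, long last_recv, long timeout)
{
	return now - last_recv >= timeout;
}
```

Checking the length before comparing keeps the comparison from reading past a short payload, which is the same ordering concern the kernel helper has to address.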

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/io.c |  77 +
 drivers/net/ovpn/io.h |   5 ++
 drivers/net/ovpn/main.c   |   3 +
 drivers/net/ovpn/ovpnstruct.h |   2 +
 drivers/net/ovpn/peer.c   | 188 ++
 drivers/net/ovpn/peer.h   |  15 
 drivers/net/ovpn/proto.h  |   2 -
 7 files changed, 290 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index deda19ab87391f86964ba43088b7847d22420eee..63c140138bf98e5d1df79a2565b666d86513323d 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -27,6 +27,33 @@
 #include "skb.h"
 #include "socket.h"
 
+const unsigned char ovpn_keepalive_message[OVPN_KEEPALIVE_SIZE] = {
+   0x2a, 0x18, 0x7b, 0xf3, 0x64, 0x1e, 0xb4, 0xcb,
+   0x07, 0xed, 0x2d, 0x0a, 0x98, 0x1f, 0xc7, 0x48
+};
+
+/**
+ * ovpn_is_keepalive - check if skb contains a keepalive message
+ * @skb: packet to check
+ *
+ * Assumes that the first byte of skb->data is defined.
+ *
+ * Return: true if skb contains a keepalive or false otherwise
+ */
+static bool ovpn_is_keepalive(struct sk_buff *skb)
+{
+   if (*skb->data != ovpn_keepalive_message[0])
+   return false;
+
+   if (skb->len != OVPN_KEEPALIVE_SIZE)
+   return false;
+
+   if (!pskb_may_pull(skb, OVPN_KEEPALIVE_SIZE))
+   return false;
+
+   return !memcmp(skb->data, ovpn_keepalive_message, OVPN_KEEPALIVE_SIZE);
+}
+
 /* Called after decrypt to write the IP packet to the device.
  * This method is expected to manage/free the skb.
  */
@@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
 
+   /* keep track of last received authenticated packet for keepalive */
+   peer->last_recv = ktime_get_real_seconds();
+
/* point to encapsulated IP packet */
__skb_pull(skb, payload_offset);
 
@@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
 
+   if (ovpn_is_keepalive(skb)) {
+   net_dbg_ratelimited("%s: ping received from peer %u\n",
+   peer->ovpn->dev->name, peer->id);
+   goto drop;
+   }
+
net_info_ratelimited("%s: unsupported protocol received from peer %u\n",
 peer->ovpn->dev->name, peer->id);
goto drop;
@@ -221,6 +257,10 @@ void ovpn_encrypt_post(void *data, int ret)
/* no transport configured yet */
goto err;
}
+
+   /* keep track of last sent packet for keepalive */
+   peer->last_sent = ktime_get_real_seconds();
+
/* skb passed down the stack - don't free it */
skb = NULL;
 err:
@@ -361,3 +401,40 @@ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
kfree_skb_list(skb);
return NET_XMIT_DROP;
 }
+
+/**
+ * ovpn_xmit_special - encrypt and transmit an out-of-band message to peer
+ * @peer: peer to send the message to
+ * @data: message content
+ * @len: message length
+ *
+ * Assumes that caller holds a reference to peer
+ */
+void ovpn_xmit_special(struct ovpn_peer *peer, const void *data,
+  const unsigned int len)
+{
+   struct ovpn_struct *ovpn;
+   struct sk_buff *skb;
+
+   ovpn = peer->ovpn;
+   if (unlikely(!ovpn))
+   return;
+
+   skb = alloc_skb(256 + len, GFP_ATOMIC);
+   if (unlikely(!skb))
+   return;
+
+   skb_reserve(skb, 128);
+   skb->priority = TC_PRIO_BESTEFFORT;
+   __skb_put_data(skb, data, len);
+
+   /* increase reference counter when passing peer to sending queue */
+   if (!ovpn_peer_hold(peer)) {
+   netdev_dbg(ovpn->dev, "%s: cannot hold peer reference for sending special packet\n",
+  __func__);
+   kfree_skb(skb);
+   return;
+   }
+
+   ovpn_send(ovpn, skb, peer);
+}
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
index ad81dd86924689309b3299573575a1705eddaf99..eb224114152c29f42aadf026212e8d278006b490 100644
--- a/drivers/net/ovpn/io.h
+++ b/drivers/net/ovpn/io.h
@@ -10,9 +10,14 @@
 #ifndef _NET_OVPN_OVPN_H_
 #define _NET_OVPN_OVPN_H_
 
+#define OVPN_KEEPALIVE_SIZE 16
+extern const unsigned char ovpn_keepalive_message[OVPN_KEEPALIVE_SIZE];
+
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
 
 void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb);
+void ovpn_xmit_special(struct ovpn_peer *peer, const void *data,
+  const unsigned int len);
 
 void ovpn_encrypt_post(void *data, int ret);
 void ovpn_decrypt_post(

[PATCH net-next v10 17/23] ovpn: add support for peer floating

2024-10-25 Thread Antonio Quartulli
A peer connected via UDP may change its IP address without reconnecting
(float).

Add support for detecting and updating the new peer IP/port in case of
floating.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/bind.c |  10 ++--
 drivers/net/ovpn/io.c   |   9 
 drivers/net/ovpn/peer.c | 129 ++--
 drivers/net/ovpn/peer.h |   2 +
 4 files changed, 139 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
index b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a..d17d078c5730bf4336dc87f45cdba3f6b8cad770 100644
--- a/drivers/net/ovpn/bind.c
+++ b/drivers/net/ovpn/bind.c
@@ -47,12 +47,8 @@ struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss)
  * @new: the new bind to assign
  */
 void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new)
+   __must_hold(&peer->lock)
 {
-   struct ovpn_bind *old;
-
-   spin_lock_bh(&peer->lock);
-   old = rcu_replace_pointer(peer->bind, new, true);
-   spin_unlock_bh(&peer->lock);
-
-   kfree_rcu(old, rcu);
+   kfree_rcu(rcu_replace_pointer(peer->bind, new,
+ lockdep_is_held(&peer->lock)), rcu);
 }
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
/* keep track of last received authenticated packet for keepalive */
peer->last_recv = ktime_get_real_seconds();
 
+   if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
+   /* check if this peer changed its IP address and update
+* state
+*/
+   ovpn_peer_float(peer, skb);
+   /* update source endpoint for this peer */
+   ovpn_peer_update_local_endpoint(peer, skb);
+   }
+
/* point to encapsulated IP packet */
__skb_pull(skb, payload_offset);
 
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 3f67d200e283213fcb732d10f9edeb53e0a0e9ee..da6215bbb643592e4567e61e4b4976d367ed109c 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -94,6 +94,131 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
return peer;
 }
 
+/**
+ * ovpn_peer_reset_sockaddr - recreate binding for peer
+ * @peer: peer to recreate the binding for
+ * @ss: sockaddr to use as remote endpoint for the binding
+ * @local_ip: local IP for the binding
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
+   const struct sockaddr_storage *ss,
+   const u8 *local_ip)
+   __must_hold(&peer->lock)
+{
+   struct ovpn_bind *bind;
+   size_t ip_len;
+
+   /* create new ovpn_bind object */
+   bind = ovpn_bind_from_sockaddr(ss);
+   if (IS_ERR(bind))
+   return PTR_ERR(bind);
+
+   if (local_ip) {
+   if (ss->ss_family == AF_INET) {
+   ip_len = sizeof(struct in_addr);
+   } else if (ss->ss_family == AF_INET6) {
+   ip_len = sizeof(struct in6_addr);
+   } else {
+   netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n",
+  __func__);
+   kfree(bind);
+   return -EINVAL;
+   }
+
+   memcpy(&bind->local, local_ip, ip_len);
+   }
+
+   /* set binding */
+   ovpn_bind_reset(peer, bind);
+
+   return 0;
+}
+
+#define ovpn_get_hash_head(_tbl, _key, _key_len) ({\
+   typeof(_tbl) *__tbl = &(_tbl);  \
+   (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \
+
+/**
+ * ovpn_peer_float - update remote endpoint for peer
+ * @peer: peer to update the remote endpoint for
+ * @skb: incoming packet to retrieve the source address (remote) from
+ */
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+   struct hlist_nulls_head *nhead;
+   struct sockaddr_storage ss;
+   const u8 *local_ip = NULL;
+   struct sockaddr_in6 *sa6;
+   struct sockaddr_in *sa;
+   struct ovpn_bind *bind;
+   sa_family_t family;
+   size_t salen;
+
+   rcu_read_lock();
+   bind = rcu_dereference(peer->bind);
+   if (unlikely(!bind)) {
+   rcu_read_unlock();
+   return;
+   }
+
+   spin_lock_bh(&peer->lock);
+   if (likely(ovpn_bind_skb_src_match(bind, skb)))
+   goto unlock;
+
+   family = skb_protocol_to_family(skb);
+
+   if (bind->remote.in4.sin_family == family)
+   local_ip = (u8 *)&bind->local;
+
+   switch (family) {
+   case AF_INET

[PATCH net-next v10 16/23] ovpn: add support for updating local UDP endpoint

2024-10-25 Thread Antonio Quartulli
In case of UDP links, the local endpoint used to communicate with a
given peer may change without a connection restart.

Add support for learning the new address in case of change.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/peer.c | 45 +
 drivers/net/ovpn/peer.h |  3 +++
 2 files changed, 48 insertions(+)

diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index e8a42212af391916b5321e729f7e8a864d0a541f..3f67d200e283213fcb732d10f9edeb53e0a0e9ee 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -416,6 +416,51 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
return peer;
 }
 
+/**
+ * ovpn_peer_update_local_endpoint - update local endpoint for peer
+ * @peer: peer to update the endpoint for
+ * @skb: incoming packet to retrieve the destination address (local) from
+ */
+void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
+struct sk_buff *skb)
+{
+   struct ovpn_bind *bind;
+
+   rcu_read_lock();
+   bind = rcu_dereference(peer->bind);
+   if (unlikely(!bind))
+   goto unlock;
+
+   spin_lock_bh(&peer->lock);
+   switch (skb_protocol_to_family(skb)) {
+   case AF_INET:
+   if (unlikely(bind->local.ipv4.s_addr != ip_hdr(skb)->daddr)) {
+   netdev_dbg(peer->ovpn->dev,
+  "%s: learning local IPv4 for peer %d (%pI4 -> %pI4)\n",
+  __func__, peer->id, &bind->local.ipv4.s_addr,
+  &ip_hdr(skb)->daddr);
+   bind->local.ipv4.s_addr = ip_hdr(skb)->daddr;
+   }
+   break;
+   case AF_INET6:
+   if (unlikely(!ipv6_addr_equal(&bind->local.ipv6,
+ &ipv6_hdr(skb)->daddr))) {
+   netdev_dbg(peer->ovpn->dev,
+  "%s: learning local IPv6 for peer %d (%pI6c -> %pI6c)\n",
+  __func__, peer->id, &bind->local.ipv6,
+  &ipv6_hdr(skb)->daddr);
+   bind->local.ipv6 = ipv6_hdr(skb)->daddr;
+   }
+   break;
+   default:
+   break;
+   }
+   spin_unlock_bh(&peer->lock);
+
+unlock:
+   rcu_read_unlock();
+}
+
 /**
  * ovpn_peer_get_by_dst - Lookup peer to send skb to
  * @ovpn: the private data representing the current VPN session
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 952927ae78a3ab753aaf2c6cc6f77121bdac34be..1a8638d266b11a4a80ee2f088394d47a7798c3af 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -152,4 +152,7 @@ bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb,
 void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout);
 void ovpn_peer_keepalive_work(struct work_struct *work);
 
+void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
+struct sk_buff *skb);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */

-- 
2.45.2




Re: [PATCH v7 1/3] modules: Support extended MODVERSIONS info

2024-10-25 Thread Sami Tolvanen
On Wed, Oct 23, 2024 at 02:31:28AM +, Matthew Maurer wrote:
> Adds a new format for MODVERSIONS which stores each field in a separate
> ELF section. This initially adds support for variable length names, but
> could later be used to add additional fields to MODVERSIONS in a
> backwards compatible way if needed. Any new fields will be ignored by
> old user tooling, unlike the current format where user tooling cannot
> tolerate adjustments to the format (for example making the name field
> longer).
> 
> Since PPC munges its version records to strip leading dots, we reproduce
> the munging for the new format. Other architectures do not appear to
> have architecture-specific usage of this information.
> 
> Signed-off-by: Matthew Maurer 

Reviewed-by: Sami Tolvanen 

Sami



RE: [PATCH v4 00/11] iommufd: Add vIOMMU infrastructure (Part-1)

2024-10-25 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Tuesday, October 22, 2024 8:19 AM
> 
> This series introduces a new vIOMMU infrastructure and related ioctls.
> 
> IOMMUFD has been using the HWPT infrastructure for all cases, including
> nested IO page table support. Yet, there are limitations for an HWPT-based
> structure to support some advanced HW-accelerated features, such as CMDQV
> on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
> environment, it is not straightforward for nested HWPTs to share the same
> parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a
> parent HWPT typically holds one stage-2 IO pagetable and tags it with only
> one ID in the cache entries. When sharing one large stage-2 IO pagetable
> across physical IOMMU instances, that one ID may not always be available
> across all the IOMMU instances. In other words, it's ideal for SW to have
> a different container for the stage-2 IO pagetable so it can hold another
> ID that's available.

Just holding multiple IDs doesn't require a different container. This is
just a side effect when vIOMMU will be required for other said reasons.

If we have to put more words here, I'd prefer adding a bit more about
CMDQV, which is more compelling. Not a big deal though. 😊

> 
> For this "different container", add vIOMMU, an additional layer to hold
> extra virtualization information:
> 
> 
> [ASCII diagram garbled in the archive. Recoverable content: an outer box
>  "iommufd (with vIOMMU)" with labels [1] to [5], containing a vIOMMU box
>  and the chain
>  IOAS <--- (HWPT_PAGING) <--- HWPT_NESTED <--- DEVICE;
>  below the outer box: struct iommu_device, and
>  PFN storage ---> (paging) iommu_domain <--- (nested) iommu_domain <--- struct device]
> 

nit - [1] ... [5] can be removed.

> The vIOMMU object should be seen as a slice of a physical IOMMU instance
> that is passed to or shared with a VM. That can be some HW/SW resources:
>  - Security namespace for guest owned ID, e.g. guest-controlled cache tags
>  - Access to a sharable nesting parent pagetable across physical IOMMUs
>  - Virtualization of various platforms IDs, e.g. RIDs and others
>  - Delivery of paravirtualized invalidation
>  - Direct assigned invalidation queues
>  - Direct assigned interrupts
>  - Non-affiliated event reporting

Sorry, I have no idea what a 'non-affiliated event' is. Can you elaborate?

> 
> On a multi-IOMMU system, the vIOMMU object must be instanced to the number
> of the physical IOMMUs that are passed to (via devices) a guest VM, while

"to the number of the physical IOMMUs that have a slice passed to ..."

> being able to hold the shareable parent HWPT. Each vIOMMU then just needs
> to allocate its own individual ID to tag its own cache:
> [ASCII diagram garbled in the archive. Recoverable content:
>  hwpt_nested0 ---> viommu0 (paging_hwpt0, IDx)
>  hwpt_nested1 ---> viommu1 (paging_hwpt0, IDy)]
> 
> As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation
> only. And implement it in arm-smmu-v3 driver as a real world use case.
> 
> More vIOMMU-based structs and ioctls will be introduced in the follow-up
> series to support vDEVICE, vIRQ (vEVENT) and vQUEUE objects. Although we
> repurposed the vIOMMU object from an earlier RFC, just for a reference:
> https://lore.kernel.org/all/cover.1712978212.git.nicol...@nvidia.com/
> 
> This series is on Github:
>

RE: [PATCH v4 01/11] iommufd: Move struct iommufd_object to public iommufd header

2024-10-25 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Tuesday, October 22, 2024 8:19 AM
> 
> Prepare for an embedded structure design for driver-level iommufd_viommu
> objects:
> // include/linux/iommufd.h
> struct iommufd_viommu {
> struct iommufd_object obj;
> 
> };
> 
> // Some IOMMU driver
> struct iommu_driver_viommu {
> struct iommufd_viommu core;
> 
> };
> 
> It has to expose struct iommufd_object and enum iommufd_object_type from
> the core-level private header to the public iommufd header.
> 
> Reviewed-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 



[PATCH v5 1/5] pidfd: extend pidfd_get_pid() and de-duplicate pid lookup

2024-10-25 Thread Lorenzo Stoakes
The means by which a pid is determined from a pidfd is duplicated, with
some callers holding a reference to the (pid)fd, and others explicitly
pinning the pid.

Introduce __pidfd_get_pid() which narrows this to one approach of pinning
the pid, with an optional output parameter for file->f_flags to avoid the
need to hold onto a file to retrieve this.

Additionally, allow the ability to open a pidfd by opening a /proc/
directory, utilised by the pidfd_send_signal() system call, providing a
pidfd_get_pid_proc() helper function to do so.

Doing this allows us to eliminate open-coded pidfd pid lookup and to
consistently handle this in one place.

This lays the groundwork for a subsequent patch which adds a new sentinel
pidfd to explicitly reference the current process (i.e. thread group
leader) without the need for a pidfd.
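
The pattern described above (take a temporary reference on the file, pin the object behind it, optionally report the flags, then drop the file) can be modelled in user space as below. This is only a sketch of the control flow: struct obj, struct handle and handle_get_obj() are hypothetical stand-ins for struct pid, struct file and __pidfd_get_pid(), and plain counters replace real reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid. */
struct obj {
	int refcount;
};

/* Hypothetical stand-in for the file behind a pidfd. */
struct handle {
	struct obj *obj;	/* object reachable through the handle */
	unsigned int flags;	/* plays the role of file->f_flags */
	int users;		/* temporary references, like fdget()/fdput() */
};

/* Return the pinned object behind @h, or NULL if there is none.
 * @flags may be NULL when the caller does not care about the flags.
 */
static struct obj *handle_get_obj(struct handle *h, unsigned int *flags)
{
	struct obj *o;

	assert(h != NULL);

	h->users++;			/* take the temporary handle reference */
	o = h->obj;
	if (o) {
		o->refcount++;		/* pin the object before the release */
		if (flags)
			*flags = h->flags;
	}
	h->users--;			/* drop the handle on every path */

	return o;
}
```

Because the object is pinned before the handle is released, the caller never needs to keep the handle around, and a caller that ignores the flags can simply pass NULL.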

Reviewed-by: Shakeel Butt 
Signed-off-by: Lorenzo Stoakes 
---
 include/linux/pid.h | 30 +-
 kernel/pid.c| 42 --
 kernel/signal.c | 29 ++---
 3 files changed, 59 insertions(+), 42 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index a3aad9b4074c..d466890e1b35 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_PID_H
 #define _LINUX_PID_H
 
+#include 
 #include 
 #include 
 #include 
@@ -72,8 +73,35 @@ extern struct pid init_struct_pid;
 
 struct file;
 
+
+/**
+ * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
+ *
+ * @pidfd:  The pidfd whose pid we want, or the fd of a /proc/ file if
+ *  @allow_proc is also set.
+ * @allow_proc: If set, then an fd of a /proc/ file can be passed instead
+ *  of a pidfd, and this will be used to determine the pid.
+ * @flags:  Output variable, if non-NULL, then the file->f_flags of the
+ *  pidfd will be set here.
+ *
+ * Returns: If successful, the pid associated with the pidfd, otherwise an
+ *  error.
+ */
+struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc,
+   unsigned int *flags);
+
+static inline struct pid *pidfd_get_pid(unsigned int pidfd, unsigned int *flags)
+{
+   return __pidfd_get_pid(pidfd, /* allow_proc = */ false, flags);
+}
+
+static inline struct pid *pidfd_get_pid_proc(unsigned int pidfd,
+unsigned int *flags)
+{
+   return __pidfd_get_pid(pidfd, /* allow_proc = */ true, flags);
+}
+
 struct pid *pidfd_pid(const struct file *file);
-struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
 struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
 void do_notify_pidfd(struct task_struct *task);
diff --git a/kernel/pid.c b/kernel/pid.c
index 2715afb77eab..94c97559e5c5 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -534,22 +535,32 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
 }
 EXPORT_SYMBOL_GPL(find_ge_pid);
 
-struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
+struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc,
+   unsigned int *flags)
 {
-   struct fd f;
struct pid *pid;
+   struct fd f = fdget(pidfd);
+   struct file *file = fd_file(f);
 
-   f = fdget(fd);
-   if (!fd_file(f))
+   if (!file)
return ERR_PTR(-EBADF);
 
-   pid = pidfd_pid(fd_file(f));
-   if (!IS_ERR(pid)) {
-   get_pid(pid);
-   *flags = fd_file(f)->f_flags;
+   pid = pidfd_pid(file);
+   /* If we allow opening a pidfd via /proc/, do so. */
+   if (IS_ERR(pid) && allow_proc)
+   pid = tgid_pidfd_to_pid(file);
+
+   if (IS_ERR(pid)) {
+   fdput(f);
+   return pid;
}
 
+   /* Pin pid before we release fd. */
+   get_pid(pid);
+   if (flags)
+   *flags = file->f_flags;
fdput(f);
+
return pid;
 }
 
@@ -747,23 +758,18 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd,
unsigned int, flags)
 {
struct pid *pid;
-   struct fd f;
int ret;
 
/* flags is currently unused - make sure it's unset */
if (flags)
return -EINVAL;
 
-   f = fdget(pidfd);
-   if (!fd_file(f))
-   return -EBADF;
-
-   pid = pidfd_pid(fd_file(f));
+   pid = pidfd_get_pid(pidfd, NULL);
if (IS_ERR(pid))
-   ret = PTR_ERR(pid);
-   else
-   ret = pidfd_getfd(pid, fd);
+   return PTR_ERR(pid);
 
-   fdput(f);
+   ret = pidfd_getfd(pid, fd);
+
+   put_pid(pid);
return ret;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 4344860ffcac..9a35b1cf40ad 100644
--- a/kernel/signal.c
+++ b/kerne
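
The ordering protected by the "Pin pid before we release fd" comment in the
kernel/pid.c hunk can be shown outside the kernel. What follows is a minimal
userspace sketch, not kernel code: struct obj, struct handle and every helper
name are hypothetical stand-ins for struct pid, the fd/file pair, get_pid()
and fdput(), and the reference counts are plain ints with no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid. */
struct obj {
	int refcount;
};

/* Hypothetical stand-in for the fd/file pair that leads to the object. */
struct handle {
	struct obj *target;
	int refcount;
	unsigned int flags;
};

static struct obj *obj_get(struct obj *o)
{
	o->refcount++;
	return o;
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

static struct handle *handle_get(struct handle *h)
{
	if (h)
		h->refcount++;
	return h;
}

static void handle_put(struct handle *h)
{
	h->refcount--;
}

/*
 * Look up the object behind a handle and return it with its own
 * reference. The object is pinned while the handle is still held, so
 * the caller never sees a pointer that only the handle kept alive.
 */
static struct obj *lookup_pinned(struct handle *h, unsigned int *flags)
{
	struct obj *o;

	if (!handle_get(h))
		return NULL;

	o = h->target;
	if (o) {
		obj_get(o);		/* pin before dropping the handle */
		if (flags)		/* flags is optional, as in the patch */
			*flags = h->flags;
	}
	handle_put(h);

	return o;
}
```

A caller that receives a non-NULL pointer owns one reference and is expected
to drop it with obj_put(), which mirrors the put_pid() added to the
pidfd_getfd syscall path in the hunk above.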

[PATCH v5 4/5] selftests: pidfd: add pidfd.h UAPI wrapper

2024-10-25 Thread Lorenzo Stoakes
Conflicts can arise between system fcntl.h and linux/fcntl.h, imported by
the linux/pidfd.h UAPI header.

Work around this by adding a wrapper for linux/pidfd.h to
tools/include/ which sets the linux/fcntl.h header guard ahead of
importing the pidfd.h header file.

Adjust the pidfd selftests Makefile to reference this include directory and
put it at a higher precedence than any make header installed headers to
ensure the wrapper is preferred.

This way we can directly import the UAPI header file without issue and use
the latest system header file without having to duplicate anything.

Reviewed-by: Shuah Khan 
Signed-off-by: Lorenzo Stoakes 
---
 tools/include/linux/pidfd.h| 14 ++
 tools/testing/selftests/pidfd/Makefile |  3 +--
 2 files changed, 15 insertions(+), 2 deletions(-)
 create mode 100644 tools/include/linux/pidfd.h

diff --git a/tools/include/linux/pidfd.h b/tools/include/linux/pidfd.h
new file mode 100644
index ..113c8023072d
--- /dev/null
+++ b/tools/include/linux/pidfd.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _TOOLS_LINUX_PIDFD_H
+#define _TOOLS_LINUX_PIDFD_H
+
+/*
+ * Some systems have issues with the linux/fcntl.h import in linux/pidfd.h, so
+ * work around this by setting the header guard.
+ */
+#define _LINUX_FCNTL_H
+#include "../../../include/uapi/linux/pidfd.h"
+#undef _LINUX_FCNTL_H
+
+#endif /* _TOOLS_LINUX_PIDFD_H */
diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
index d731e3e76d5b..f5038c9dae14 100644
--- a/tools/testing/selftests/pidfd/Makefile
+++ b/tools/testing/selftests/pidfd/Makefile
@@ -1,8 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
-CFLAGS += -g $(KHDR_INCLUDES) -pthread -Wall
+CFLAGS += -g -isystem $(top_srcdir)/tools/include $(KHDR_INCLUDES) -pthread -Wall
 
 TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \
pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test
 
 include ../lib.mk
-
-- 
2.47.0
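
The header guard trick used by the wrapper can be tried in a single file. This
is only a sketch under stated assumptions: DEMO_CONFLICTING_H and the other
DEMO_* names are made up for illustration, and the guarded block in the middle
stands in for the contents of a header such as linux/fcntl.h.

```c
#include <assert.h>

/* Wrapper step 1: claim the guard of the header we want to suppress. */
#define DEMO_CONFLICTING_H

/* Stand-in for the body of the conflicting header. */
#ifndef DEMO_CONFLICTING_H
#define DEMO_CONFLICTING_H
#define DEMO_CONFLICTING_BODY 1
#endif

/* Wrapper step 2: release the guard again, as the wrapper does. */
#undef DEMO_CONFLICTING_H

#ifdef DEMO_CONFLICTING_BODY
#define DEMO_BODY_SEEN 1
#else
#define DEMO_BODY_SEEN 0
#endif

/* Compile-time check: the pre-set guard made the preprocessor skip the body. */
static_assert(DEMO_BODY_SEEN == 0, "pre-set guard should skip the body");

/* Returns 1 if the guarded body was compiled in, 0 if it was skipped. */
int conflicting_body_seen(void)
{
	return DEMO_BODY_SEEN;
}
```

Because the guard is released afterwards, a later direct include of the
guarded header would see its body as usual; only the include made while the
guard is set gets suppressed.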




Re: [PATCH V4 00/15] selftests/resctrl: Support diverse platforms with MBM and MBA tests

2024-10-25 Thread Reinette Chatre



On 10/25/24 6:54 AM, Ilpo Järvinen wrote:
> On Thu, 24 Oct 2024, Reinette Chatre wrote:
> 
>> Hi Shuah,
>>
>> On 10/24/24 3:36 PM, Shuah Khan wrote:
>>>
>>> Is this patch series ready to be applied?
>>>
>>
>> I believe it is close ... I would like to give Ilpo some time to peek
>> at patches 2 and 10 to confirm if I got their fixes right this time. The
>> rest of the series is ready.
> 
> Hi,
> 
> I took a look at those two patches now and they seemed fine to me so this 
> series should be ready to go now.
> 

Thank you very much Ilpo.

Reinette



Re: [PATCH v4 02/11] iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct

2024-10-25 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 08:47:40AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, October 22, 2024 9:16 PM
> > 
> > On Tue, Oct 22, 2024 at 04:59:07PM +0800, Baolu Lu wrote:
> > 
> > > Is it feasible to make vIOMMU object more generic, rather than strictly
> > > tying it to nested translation? For example, a normal paging domain that
> > > translates gPAs to hPAs could also have a vIOMMU object associated with
> > > it.
> > >
> > > While we can only support vIOMMU object allocation uAPI for S2 paging
> > > domains in the context of this series, we could consider leaving the
> > > option open to associate a vIOMMU object with other normal paging
> > > domains that are not a nested parent?
> > 
> > Why? The nested parent flavour of the domain is basically free to
> > create, what reason would be to not do that?
> > 
> > If the HW doesn't support it, then does the HW really need/support a
> > VIOMMU?
> 
> Now it's agreed to build trusted I/O on top of this new vIOMMU object.
> format-wise probably it's free to assume that nested parent is supported
> on any new platform which will support trusted I/O. But I'm not sure
> all the conditions around allowing nested are same as for trusted I/O,
> e.g. for ARM nesting is allowed only for CANWBS/S2FWB. Are they
> always guaranteed in trusted I/O configuration?

ARM is a big question mark as to what exactly will come, but I'm expecting
that to be resolved either with continued HW support or by Linux adding the
cache flushing and relaxing the test.

> Baolu did raise a good open to confirm given it will be used beyond
> nesting. 😊

Even CC is "nesting", it is just nested with a fixed Identity S1 in
the baseline case. The S2 translation still exists and still has to be
consistent with whatever the secure world is doing.

So, my feeling is that the S2 nested domain is mandatory for the
viommu, especially for CC, it must exist. In the end there may be
more options than just a nested parent.

For instance if the CC design relies on the secure world sharing the
CPU and IOMMU page table we might need a new HWPT type to represent
that configuration.

From a uapi perspective we seem OK here as the hwpt input could be
anything. We might have to adjust some checks in the kernel someday.

Jason



[PATCH net-next 2/2] net: netconsole: selftests: Add userdata validation

2024-10-25 Thread Breno Leitao
Extend netcons_basic selftest to verify the userdata functionality by:
 1. Creating a test key in the userdata configfs directory
 2. Writing a known value to the key
 3. Validating the key-value pair appears in the captured network output

This ensures the userdata feature is properly tested during selftests.

Signed-off-by: Breno Leitao 
---
 .../selftests/drivers/net/netcons_basic.sh| 29 +++
 1 file changed, 29 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh
index 4ad1e216c6b0..d182dcc2a10b 100755
--- a/tools/testing/selftests/drivers/net/netcons_basic.sh
+++ b/tools/testing/selftests/drivers/net/netcons_basic.sh
@@ -26,10 +26,13 @@ DSTIP=192.168.2.2
 
 PORT=""
 MSG="netconsole selftest"
+USERDATA_KEY="key"
+USERDATA_VALUE="value"
 TARGET=$(mktemp -u netcons_X)
 DEFAULT_PRINTK_VALUES=$(cat /proc/sys/kernel/printk)
 NETCONS_CONFIGFS="/sys/kernel/config/netconsole"
 NETCONS_PATH="${NETCONS_CONFIGFS}"/"${TARGET}"
+KEY_PATH="${NETCONS_PATH}/userdata/${USERDATA_KEY}"
 # NAMESPACE will be populated by setup_ns with a random value
 NAMESPACE=""
 
@@ -122,6 +125,8 @@ function cleanup() {
 
# delete netconsole dynamic reconfiguration
echo 0 > "${NETCONS_PATH}"/enabled
+   # Remove key
+   rmdir "${KEY_PATH}"
# Remove the configfs entry
rmdir "${NETCONS_PATH}"
 
@@ -136,6 +141,18 @@ function cleanup() {
echo "${DEFAULT_PRINTK_VALUES}" > /proc/sys/kernel/printk
 }
 
+function set_user_data() {
+   if [[ ! -d "${NETCONS_PATH}""/userdata" ]]
+   then
+   echo "Userdata path not available in ${NETCONS_PATH}/userdata"
+   exit "${ksft_skip}"
+   fi
+
+   mkdir -p "${KEY_PATH}"
+   VALUE_PATH="${KEY_PATH}""/value"
+   echo "${USERDATA_VALUE}" > "${VALUE_PATH}"
+}
+
 function listen_port_and_save_to() {
local OUTPUT=${1}
# Just wait for 2 seconds
@@ -146,6 +163,10 @@ function listen_port_and_save_to() {
 function validate_result() {
local TMPFILENAME="$1"
 
+   # TMPFILENAME will contain something like:
+   # 6.11.1-0_fbk0_rc13_509_g30d75cea12f7,13,1822,115075213798,-;netconsole selftest: netcons_gtJHM
+   #  key=value
+
# Check if the file exists
if [ ! -f "$TMPFILENAME" ]; then
echo "FAIL: File was not generated." >&2
@@ -158,6 +179,12 @@ function validate_result() {
exit "${ksft_fail}"
fi
 
+   if ! grep -q "${USERDATA_KEY}=${USERDATA_VALUE}" "${TMPFILENAME}"; then
+   echo "FAIL: ${USERDATA_KEY}=${USERDATA_VALUE} not found in ${TMPFILENAME}" >&2
+   cat "${TMPFILENAME}" >&2
+   exit "${ksft_fail}"
+   fi
+
# Delete the file once it is validated, otherwise keep it
# for debugging purposes
rm "${TMPFILENAME}"
@@ -220,6 +247,8 @@ trap cleanup EXIT
 set_network
 # Create a dynamic target for netconsole
 create_dynamic_target
+# Set userdata "key" with the "value" value
+set_user_data
 # Listed for netconsole port inside the namespace and destination interface
 listen_port_and_save_to "${OUTPUT_FILE}" &
 # Wait for socat to start and listen to the port.
-- 
2.43.5
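
For comparison with the grep check in validate_result(), the same validation
can be written as a small C helper. This is a rough sketch only: has_userdata()
is a hypothetical name, the 128-byte buffer is an arbitrary limit, and the
match is a fixed-string search rather than the regular expression match that
grep performs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if the captured output contains "<key>=<value>", 0 otherwise.
 * Pairs that do not fit the local buffer are reported as not found.
 */
static int has_userdata(const char *captured, const char *key,
			const char *value)
{
	char needle[128];
	int n;

	assert(captured && key && value);

	n = snprintf(needle, sizeof(needle), "%s=%s", key, value);
	if (n < 0 || (size_t)n >= sizeof(needle))
		return 0;

	return strstr(captured, needle) != NULL;
}
```

Fed with the sample output quoted in the patch comment, the helper would
report the "key=value" pair as present and any other value for "key" as
missing.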




Re: [PATCH v4 06/11] iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC

2024-10-25 Thread Nicolin Chen
On Fri, Oct 25, 2024 at 09:04:15AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Tuesday, October 22, 2024 8:19 AM
> >
> > +static struct iommufd_hwpt_nested *
> > +iommufd_hwpt_nested_alloc_for_viommu(struct iommufd_viommu *viommu,
> > +  const struct iommu_user_data *user_data)
> 
> probably "_for" can be skipped to reduce the name length

That would sound like a hwpt_nested allocating vIOMMU...

It'd be probably neutral to have iommufd_viommu_alloc_hwpt_nested,
yet we have iommufd_hwpt_nested_alloc (HWPT-based) to align with..

> looks like a check on flags is missing in this path.

Oh yes, I missed that. Will pass in the cmd->flags.

Thanks
Nicolin



Re: [PATCH v4 04/11] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-25 Thread Nicolin Chen
On Fri, Oct 25, 2024 at 09:05:58AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Tuesday, October 22, 2024 8:19 AM
> > +
> > + viommu->type = cmd->type;
> > + viommu->ictx = ucmd->ictx;
> > + viommu->hwpt = hwpt_paging;
> > + /* Assume physical IOMMUs are unpluggable (the most likely case) */
> > + viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);
> > +
> 
> so what would happen if this assumption breaks?

I had very verbose comments previously that Alexey suggested to
optimize away.. Perhaps I should add back the part that mentions
adding a refcount for pluggable ones..

Nicolin



Re: [PATCH v4 00/11] iommufd: Add vIOMMU infrastructure (Part-1)

2024-10-25 Thread Nicolin Chen
On Fri, Oct 25, 2024 at 08:34:05AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Tuesday, October 22, 2024 8:19 AM
> >
> > This series introduces a new vIOMMU infrastructure and related ioctls.
> >
> > IOMMUFD has been using the HWPT infrastructure for all cases, including a
> > nested IO page table support. Yet, there're limitations for an HWPT-based
> > structure to support some advanced HW-accelerated features, such as CMDQV
> > on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
> > environment, it is not straightforward for nested HWPTs to share the same
> > parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a
> > parent HWPT typically holds one stage-2 IO pagetable and tags it with only
> > one ID in the cache entries. When sharing one large stage-2 IO pagetable
> > across physical IOMMU instances, that one ID may not always be available
> > across all the IOMMU instances. In other words, it's ideal for SW to have
> > a different container for the stage-2 IO pagetable so it can hold another
> > ID that's available.
> 
> Just holding multiple IDs doesn't require a different container. This is
> just a side effect when vIOMMU will be required for other said reasons.
> 
> If we have to put more words here I'd prefer to adding a bit more for
> CMDQV which is more compelling. not a big deal though. 😊

Ack.

> > For this "different container", add vIOMMU, an additional layer to hold
> > extra virtualization information:
> >
> > 
> > [Diagram garbled in this archive. It shows an "iommufd (with vIOMMU)"
> > box containing a vIOMMU object and the chain
> > IOAS <- (HWPT_PAGING) <- HWPT_NESTED <- DEVICE, with numbered labels
> > [1] to [5]. Below the box: struct iommu_device, PFN storage, the
> > (paging) iommu_domain, the (nested) iommu_domain and struct device.]
> >
> 
> nit - [1] ... [5] can be removed.

They are copied from the Documentation where numbers are needed.
I will take all the numbers out in the cover-letters.

> > The vIOMMU object should be seen as a slice of a physical IOMMU instance
> > that is passed to or shared with a VM. That can be some HW/SW resources:
> >  - Security namespace for guest owned ID, e.g. guest-controlled cache tags
> >  - Access to a sharable nesting parent pagetable across physical IOMMUs
> >  - Virtualization of various platforms IDs, e.g. RIDs and others
> >  - Delivery of paravirtualized invalidation
> >  - Direct assigned invalidation queues
> >  - Direct assigned interrupts
> >  - Non-affiliated event reporting
> 
> sorry no idea about 'non-affiliated event'. Can you elaborate?

I'll put an "e.g.".

> > On a multi-IOMMU system, the vIOMMU object must be instanced to the number
> > of the physical IOMMUs that are passed to (via devices) a guest VM, while
> 
> "to the number of the physical IOMMUs that have a slice passed to ..."

Ack.

Thanks
Nicolin



Re: [PATCH v4 04/11] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-25 Thread Nicolin Chen
On Fri, Oct 25, 2024 at 08:59:11AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Tuesday, October 22, 2024 8:19 AM
> >
> > Add a new ioctl for user space to do a vIOMMU allocation. It must be based
> > on a nesting parent HWPT, so take its refcount.
> >
> > If an IOMMU driver supports a driver-managed vIOMMU object, it must define
> 
> why highlight 'driver-managed', implying a core-managed vIOMMU
> object some day?

Oh, core-managed vIOMMU is gone since this version. I should have
updated the commit message here too.

> > +/**
> > + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
> > + * @size: sizeof(struct iommu_viommu_alloc)
> > + * @flags: Must be 0
> > + * @type: Type of the virtual IOMMU. Must be defined in enum iommu_viommu_type
> > + * @dev_id: The device's physical IOMMU will be used to back the virtual IOMMU
> > + * @hwpt_id: ID of a nesting parent HWPT to associate to
> > + * @out_viommu_id: Output virtual IOMMU ID for the allocated object
> > + *
> > + * Allocate a virtual IOMMU object that represents the underlying physical
> > + * IOMMU's virtualization support. The vIOMMU object is a security-isolated
> > + * slice of the physical IOMMU HW that is unique to a specific VM.
> 
> the object itself is a software abstraction, while a 'slice' is a set of
> real hw resources.

Yea, let's do this:
 * Allocate a virtual IOMMU object, representing the underlying physical IOMMU's
 * virtualization support that is a security-isolated slice of the real IOMMU HW
 * that is unique to a specific VM.

Thanks
Nicolin



[PATCH net-next 1/2] net: netconsole: selftests: Change the IP subnet

2024-10-25 Thread Breno Leitao
Use a less populated IP range to run the tests, as suggested by Petr in
Link: https://lore.kernel.org/netdev/87ikvukv3s@nvidia.com/.

Suggested-by: Petr Machata 
Signed-off-by: Breno Leitao 
---
 tools/testing/selftests/drivers/net/netcons_basic.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh
index 06021b2059b7..4ad1e216c6b0 100755
--- a/tools/testing/selftests/drivers/net/netcons_basic.sh
+++ b/tools/testing/selftests/drivers/net/netcons_basic.sh
@@ -20,9 +20,9 @@ SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
 
 # Simple script to test dynamic targets in netconsole
 SRCIF="" # to be populated later
-SRCIP=192.168.1.1
+SRCIP=192.168.2.1
 DSTIF="" # to be populated later
-DSTIP=192.168.1.2
+DSTIP=192.168.2.2
 
 PORT=""
 MSG="netconsole selftest"
-- 
2.43.5




Re: [PATCH v4 01/14] iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct

2024-10-25 Thread Nicolin Chen
On Fri, Oct 25, 2024 at 10:20:54AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 06:53:01PM +1100, Alexey Kardashevskiy wrote:
> > > +#define iommufd_vdevice_alloc(ictx, drv_struct, member)    \
> > > + ({ \
> > > + static_assert( \
> > > + __same_type(struct iommufd_vdevice,\
> > > + ((struct drv_struct *)NULL)->member)); \
> > > + static_assert(offsetof(struct drv_struct, member.obj) == 0);   \
> > > + container_of(_iommufd_object_alloc(ictx,   \
> > > +sizeof(struct drv_struct),  \
> > > +IOMMUFD_OBJ_VDEVICE),   \
> > > +  struct drv_struct, member.obj);   \
> > > + })
> > >   #endif
> > 
> > A nit: it hurts eyes to read:
> > 
> > mock_vdev = iommufd_vdevice_alloc(viommu->ictx, mock_vdevice, core);
> > 
> > vs.
> > 
> > mock_vdev = iommufd_vdevice_alloc(viommu->ictx, struct mock_vdevice, core);
> > 
> > as for the former I go searching for a "mock_vdevice" variable and for the
> > latter it is clear it is 1) a macro 2) which does some type checking.
> > 
> > also, it makes it impossible to pass things like typeof(..) or a type from
> > typedef. Thanks,
> 
> Makes sense to me

Ack. Will change accordingly.

> And the container_of() should not be used in these macros, the point
> was to avoid it to make the PTR_ERR behavior clearer. Just put a force
> type cast

I recall that I changed it for a compiler complaint. But it seems
to be gone now. Will change it back.

Thanks
Nicolin
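
The two points agreed on in this exchange (pass the full type name, and use a
plain cast instead of container_of()) can be sketched in userspace C. This is
an illustration under stated assumptions, not the iommufd macro: core_obj,
core_vdevice, core_object_alloc() and DEFINE_VDEVICE_ALLOC() are hypothetical
names, the helper returns NULL rather than an ERR_PTR, and the __same_type()
check from the quoted macro is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the core object and the core vdevice struct. */
enum { OBJ_VDEVICE = 1 };

struct core_obj {
	int type;
};

struct core_vdevice {
	struct core_obj obj;	/* must stay the first member */
	unsigned long id;
};

/* A driver structure embedding the core struct as its first member. */
struct mock_vdevice {
	struct core_vdevice core;
	int driver_data;
};

/* Generic allocator: returns the embedded core object, or NULL on failure. */
static struct core_obj *core_object_alloc(size_t size, int type)
{
	struct core_obj *obj;

	assert(size >= sizeof(*obj));
	obj = calloc(1, size);
	if (obj)
		obj->type = type;
	return obj;
}

/*
 * Takes a full type name (for example "struct mock_vdevice" or a typedef),
 * checks the layout at compile time and converts with a plain cast. Since
 * the core object sits at offset 0, a NULL or error-encoded pointer passes
 * through the cast unchanged.
 */
#define DEFINE_VDEVICE_ALLOC(fn, drv_type, member)			\
	_Static_assert(offsetof(drv_type, member.obj) == 0,		\
		       "core object must be at offset 0");		\
	static drv_type *fn(void)					\
	{								\
		return (drv_type *)core_object_alloc(sizeof(drv_type),	\
						     OBJ_VDEVICE);	\
	}

DEFINE_VDEVICE_ALLOC(mock_vdevice_alloc, struct mock_vdevice, core)
```

With the type spelled out at the call site, a reader can tell at a glance that
"struct mock_vdevice" is a type and not a variable, which is the readability
point raised by Alexey.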



Re: [PATCH net-next v2 2/4] net: hsr: Add VLAN CTAG filter support

2024-10-25 Thread Vadim Fedorenko

On 24/10/2024 11:30, MD Danish Anwar wrote:

From: Murali Karicheri 

This patch adds support for VLAN ctag based filtering at slave devices.
The slave ethernet device may be capable of filtering ethernet packets
based on VLAN ID. This requires that when the VLAN interface is created
over an HSR/PRP interface, it passes the VID information to the
associated slave ethernet devices so that it updates the hardware
filters to filter ethernet frames based on VID. This patch adds the
required functions to propagate the vid information to the slave
devices.

Signed-off-by: Murali Karicheri 
Signed-off-by: MD Danish Anwar 
---
  net/hsr/hsr_device.c | 71 +++-
  1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index 0ca47ebb01d3..ff586bdc2bde 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -515,6 +515,68 @@ static void hsr_change_rx_flags(struct net_device *dev, int change)
}
  }
  
+static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev,
+  __be16 proto, u16 vid)
+{
+   struct hsr_port *port;
+   struct hsr_priv *hsr;
+   int ret = 0;
+
+   hsr = netdev_priv(dev);
+
+   hsr_for_each_port(hsr, port) {
+   if (port->type == HSR_PT_MASTER)
+   continue;
+
+   ret = vlan_vid_add(port->dev, proto, vid);
+   switch (port->type) {
+   case HSR_PT_SLAVE_A:
+   if (ret) {
+   netdev_err(dev, "add vid failed for Slave-A\n");
+   return ret;
+   }
+   break;
+
+   case HSR_PT_SLAVE_B:
+   if (ret) {
+   /* clean up Slave-A */
+   netdev_err(dev, "add vid failed for Slave-B\n");
+   vlan_vid_del(port->dev, proto, vid);
+   return ret;
+   }
+   break;
+   default:
+   break;
+   }
+   }
+
+   return 0;
+}


This function doesn't match hsr_ndo_vlan_rx_kill_vid().
vlan_vid_add() can potentially be executed for a port whose port->type
equals HSR_PT_INTERLINK, but the result will be ignored, and
vlan_vid_del() will never happen in this case. Is this the desired
behavior? Maybe it's better to synchronize the add/del code and refactor
the error path to avoid copying the code?
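
One possible shape for such a refactor, as a userspace sketch only. Everything
here is hypothetical: struct port, the PT_* values, port_vid_add() and
port_vid_del() merely stand in for hsr_port, the HSR_PT_* types,
vlan_vid_add() and vlan_vid_del(), and the choice to act on the two slave
ports only is an assumption, since whether the interlink port should be
included is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

enum port_type { PT_MASTER, PT_SLAVE_A, PT_SLAVE_B, PT_INTERLINK };

struct port {
	enum port_type type;
	int fail_add;	/* test knob: force the add to fail */
	int has_vid;
};

/* Single predicate shared by the add and delete paths. */
static int is_slave(const struct port *p)
{
	return p->type == PT_SLAVE_A || p->type == PT_SLAVE_B;
}

static int port_vid_add(struct port *p)
{
	if (p->fail_add)
		return -1;
	p->has_vid = 1;
	return 0;
}

static void port_vid_del(struct port *p)
{
	p->has_vid = 0;
}

/* Add the VID on every slave port; undo the earlier adds on failure. */
static int vid_add_all(struct port *ports, size_t n)
{
	size_t i;
	int ret = 0;

	assert(ports || n == 0);

	for (i = 0; i < n; i++) {
		if (!is_slave(&ports[i]))
			continue;
		ret = port_vid_add(&ports[i]);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* i is the port that failed; only ports before it need cleanup. */
	while (i-- > 0)
		if (is_slave(&ports[i]))
			port_vid_del(&ports[i]);
	return ret;
}

/* Delete path uses the same predicate, so it always mirrors the add. */
static void vid_del_all(struct port *ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (is_slave(&ports[i]))
			port_vid_del(&ports[i]);
}
```

Keeping the port filter in one helper means the add, the unwind and the delete
cannot drift apart, and the error path no longer needs a per-port switch.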


+
+static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev,
+   __be16 proto, u16 vid)
+{
+   struct hsr_port *port;
+   struct hsr_priv *hsr;
+
+   hsr = netdev_priv(dev);
+
+   hsr_for_each_port(hsr, port) {
+   if (port->type == HSR_PT_MASTER)
+   continue;
+   switch (port->type) {
+   case HSR_PT_SLAVE_A:
+   case HSR_PT_SLAVE_B:
+   vlan_vid_del(port->dev, proto, vid);
+   break;
+   default:
+   break;
+   }
+   }
+
+   return 0;
+}
+
  static const struct net_device_ops hsr_device_ops = {
.ndo_change_mtu = hsr_dev_change_mtu,
.ndo_open = hsr_dev_open,
@@ -523,6 +585,8 @@ static const struct net_device_ops hsr_device_ops = {
.ndo_change_rx_flags = hsr_change_rx_flags,
.ndo_fix_features = hsr_fix_features,
.ndo_set_rx_mode = hsr_set_rx_mode,
+   .ndo_vlan_rx_add_vid = hsr_ndo_vlan_rx_add_vid,
+   .ndo_vlan_rx_kill_vid = hsr_ndo_vlan_rx_kill_vid,
  };
  
  static const struct device_type hsr_type = {

@@ -569,7 +633,8 @@ void hsr_dev_setup(struct net_device *dev)
  
  	dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA |

   NETIF_F_GSO_MASK | NETIF_F_HW_CSUM |
-  NETIF_F_HW_VLAN_CTAG_TX;
+  NETIF_F_HW_VLAN_CTAG_TX |
+  NETIF_F_HW_VLAN_CTAG_FILTER;
  
  	dev->features = dev->hw_features;

  }
@@ -647,6 +712,10 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2],
(slave[1]->features & NETIF_F_HW_HSR_FWD))
hsr->fwd_offloaded = true;
  
+	if ((slave[0]->features & NETIF_F_HW_VLAN_CTAG_FILTER) &&

+   (slave[1]->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+   hsr_dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+
res = register_netdevice(hsr_dev);
if (res)
goto err_unregister;





Re: [PATCH RFC 1/3] pinctrl: mediatek: paris: Expose more configurations to GPIO set_config

2024-10-25 Thread AngeloGioacchino Del Regno

Il 11/09/24 12:10, AngeloGioacchino Del Regno ha scritto:

Il 09/09/24 20:37, Nícolas F. R. A. Prado ha scritto:

Currently the set_config callback in the gpio_chip registered by the
pinctrl_paris driver only supports PIN_CONFIG_INPUT_DEBOUNCE, despite


[...] only supports operations configuring the input debounce parameter
of the EINT controller and denies configuring params on the other AP GPIOs [...]

(reword as needed)


many other configurations already being implemented and available
through the pinctrl API for configuration of pins by the Devicetree and
other drivers.

Expose all configurations currently implemented through the GPIO API so
they can also be set from userspace, which is particularly useful to
allow testing them from userspace.

Signed-off-by: Nícolas F. R. A. Prado 
---
  drivers/pinctrl/mediatek/pinctrl-paris.c | 20 ++--


You can do the same for pinctrl-moore too, it's trivial.

Other than that, I agree about performing this change, as this may be useful
for more than just testing.



Nicolas, please don't forget to respin this patch.

Thanks,
Angelo



  1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/pinctrl/mediatek/pinctrl-paris.c b/drivers/pinctrl/mediatek/pinctrl-paris.c
index e12316c42698..668f8055a544 100644
--- a/drivers/pinctrl/mediatek/pinctrl-paris.c
+++ b/drivers/pinctrl/mediatek/pinctrl-paris.c
@@ -255,10 +255,9 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev,
  return err;
  }
-static int mtk_pinconf_set(struct pinctrl_dev *pctldev, unsigned int pin,
+static int mtk_pinconf_set(struct mtk_pinctrl *hw, unsigned int pin,
 enum pin_config_param param, u32 arg)
  {
-    struct mtk_pinctrl *hw = pinctrl_dev_get_drvdata(pctldev);
  const struct mtk_pin_desc *desc;
  int err = -ENOTSUPP;
  u32 reg;
@@ -795,7 +794,7 @@ static int mtk_pconf_group_set(struct pinctrl_dev *pctldev, unsigned group,
  int i, ret;
  for (i = 0; i < num_configs; i++) {
-    ret = mtk_pinconf_set(pctldev, grp->pin,
+    ret = mtk_pinconf_set(hw, grp->pin,
    pinconf_to_config_param(configs[i]),
    pinconf_to_config_argument(configs[i]));
  if (ret < 0)
@@ -937,18 +936,19 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, unsigned int offset,
  {
  struct mtk_pinctrl *hw = gpiochip_get_data(chip);
  const struct mtk_pin_desc *desc;
-    u32 debounce;
+    enum pin_config_param param = pinconf_to_config_param(config);
+    u32 arg = pinconf_to_config_argument(config);
  desc = (const struct mtk_pin_desc *)&hw->soc->pins[offset];
-    if (!hw->eint ||
-    pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE ||
-    desc->eint.eint_n == EINT_NA)
-    return -ENOTSUPP;
+    if (param == PIN_CONFIG_INPUT_DEBOUNCE) {
+    if (!hw->eint || desc->eint.eint_n == EINT_NA)
+    return -ENOTSUPP;
-    debounce = pinconf_to_config_argument(config);
+    return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, arg);
+    }
-    return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, debounce);
+    return mtk_pinconf_set(hw, offset, param, arg);
  }
  static int mtk_build_gpiochip(struct mtk_pinctrl *hw)
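
The dispatch that the reworked mtk_gpio_set_config() performs can be modelled
in a few lines of userspace C. This is a sketch under stated assumptions, not
driver code: struct chip, the CFG_* parameters, EINT_NONE and the helper names
are hypothetical, and the POSIX ENOTSUP is used because the kernel-internal
ENOTSUPP is not available to userspace.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of pin configuration parameters. */
enum cfg_param { CFG_BIAS_PULL_UP = 1, CFG_INPUT_DEBOUNCE = 2 };

#define EINT_NONE (-1)

/* Hypothetical chip state; the last_* fields record forwarded requests. */
struct chip {
	int has_eint;
	int eint_n;
	unsigned int debounce;
	enum cfg_param last_param;
	unsigned int last_arg;
};

/* Stand-in for the EINT controller's debounce setter. */
static int eint_set_debounce(struct chip *c, unsigned int arg)
{
	c->debounce = arg;
	return 0;
}

/* Stand-in for the generic pin configuration path. */
static int pinconf_set(struct chip *c, enum cfg_param param, unsigned int arg)
{
	c->last_param = param;
	c->last_arg = arg;
	return 0;
}

/*
 * Debounce is handled by the EINT controller and is refused when the pin
 * has no EINT line; every other parameter goes to the generic path.
 */
static int gpio_set_config(struct chip *c, enum cfg_param param,
			   unsigned int arg)
{
	assert(c);

	if (param == CFG_INPUT_DEBOUNCE) {
		if (!c->has_eint || c->eint_n == EINT_NONE)
			return -ENOTSUP;
		return eint_set_debounce(c, arg);
	}

	return pinconf_set(c, param, arg);
}
```

Before the change, any parameter other than debounce was rejected up front;
in this model that would correspond to returning -ENOTSUP instead of calling
pinconf_set().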










[PATCH V4 09/15] selftests/resctrl: Remove unused measurement code

2024-10-25 Thread Reinette Chatre
The MBM and MBA resctrl selftests run a benchmark during which they
take measurements of read memory bandwidth via perf.
Code exists to support measurements of write memory bandwidth
but there exists no path with which this code can execute.

While code exists for write memory bandwidth measurement
there has not yet been a use case for it. Remove this unused code.
Rename relevant functions to include "read" so that it is clear
that they relate only to memory bandwidth reads. While renaming
the functions, also add consistency by changing the "membw"
instances to the more prevalent "mem_bw".

Signed-off-by: Reinette Chatre 
Reviewed-by: Ilpo Järvinen 
---
Changes since V2:
- Add Ilpo's Reviewed-by tag.

Changes since V1:
- New patch.
---
 tools/testing/selftests/resctrl/mba_test.c|   4 +-
 tools/testing/selftests/resctrl/mbm_test.c|   4 +-
 tools/testing/selftests/resctrl/resctrl.h |   8 +-
 tools/testing/selftests/resctrl/resctrl_val.c | 234 ++
 tools/testing/selftests/resctrl/resctrlfs.c   |  17 --
 5 files changed, 85 insertions(+), 182 deletions(-)

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index da40a8ed4413..be0ead73e55d 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -21,7 +21,7 @@ static int mba_init(const struct resctrl_val_param *param, int domain_id)
 {
int ret;
 
-   ret = initialize_mem_bw_imc();
+   ret = initialize_read_mem_bw_imc();
if (ret)
return ret;
 
@@ -68,7 +68,7 @@ static int mba_setup(const struct resctrl_test *test,
 static int mba_measure(const struct user_params *uparams,
   struct resctrl_val_param *param, pid_t bm_pid)
 {
-   return measure_mem_bw(uparams, param, bm_pid, "reads");
+   return measure_read_mem_bw(uparams, param, bm_pid);
 }
 
 static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index cf08ba5e314e..defa94293915 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -91,7 +91,7 @@ static int mbm_init(const struct resctrl_val_param *param, int domain_id)
 {
int ret;
 
-   ret = initialize_mem_bw_imc();
+   ret = initialize_read_mem_bw_imc();
if (ret)
return ret;
 
@@ -122,7 +122,7 @@ static int mbm_setup(const struct resctrl_test *test,
 static int mbm_measure(const struct user_params *uparams,
   struct resctrl_val_param *param, pid_t bm_pid)
 {
-   return measure_mem_bw(uparams, param, bm_pid, "reads");
+   return measure_read_mem_bw(uparams, param, bm_pid);
 }
 
 static void mbm_test_cleanup(void)
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index ba1ce1b35699..82801245e4c1 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -126,7 +126,6 @@ int filter_dmesg(void);
 int get_domain_id(const char *resource, int cpu_no, int *domain_id);
 int mount_resctrlfs(void);
 int umount_resctrlfs(void);
-const char *get_bw_report_type(const char *bw_report);
 bool resctrl_resource_exists(const char *resource);
 bool resctrl_mon_feature_exists(const char *resource, const char *feature);
 bool resource_info_file_exists(const char *resource, const char *file);
@@ -143,10 +142,9 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush);
 void mem_flush(unsigned char *buf, size_t buf_size);
 void fill_cache_read(unsigned char *buf, size_t buf_size, bool once);
 int run_fill_buf(size_t buf_size, int memflush);
-int initialize_mem_bw_imc(void);
-int measure_mem_bw(const struct user_params *uparams,
-  struct resctrl_val_param *param, pid_t bm_pid,
-  const char *bw_report);
+int initialize_read_mem_bw_imc(void);
+int measure_read_mem_bw(const struct user_params *uparams,
+   struct resctrl_val_param *param, pid_t bm_pid);
 void initialize_mem_bw_resctrl(const struct resctrl_val_param *param,
   int domain_id);
 int resctrl_val(const struct resctrl_test *test,
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 113ca18d67c1..c4ebf70a46ef 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -12,13 +12,10 @@
 
 #define UNCORE_IMC "uncore_imc"
 #define READ_FILE_NAME "events/cas_count_read"
-#define WRITE_FILE_NAME "events/cas_count_write"
 #define DYN_PMU_PATH   "/sys/bus/event_source/devices"
 #define SCALE  0.6103515625
 #define MAX_IMCS   20
 #define MAX_TOKENS 5
-#define READ   0
-#define WRITE  1
 
 #define CO

[PATCH net-next v10 19/23] ovpn: implement key add/get/del/swap via netlink

2024-10-25 Thread Antonio Quartulli
This change introduces the netlink commands needed to add, get, delete
and swap keys for a specific peer.

Userspace is expected to use these commands to create, inspect
(non-sensitive data only), destroy and rotate session keys for a specific
peer.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/crypto.c  |  42 ++
 drivers/net/ovpn/crypto.h  |   4 +
 drivers/net/ovpn/crypto_aead.c |  17 +++
 drivers/net/ovpn/crypto_aead.h |   2 +
 drivers/net/ovpn/netlink.c | 308 -
 5 files changed, 369 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
index f1f7510e2f735e367f96eb4982ba82c9af3c8bfc..cfb014c947b968752ba3dab84ec42dc8ec086379 100644
--- a/drivers/net/ovpn/crypto.c
+++ b/drivers/net/ovpn/crypto.c
@@ -151,3 +151,45 @@ void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs)
 
spin_unlock_bh(&cs->lock);
 }
+
+/**
+ * ovpn_crypto_config_get - populate keyconf object with non-sensitive key data
+ * @cs: the crypto state to extract the key data from
+ * @slot: the specific slot to inspect
+ * @keyconf: the output object to populate
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
+  enum ovpn_key_slot slot,
+  struct ovpn_key_config *keyconf)
+{
+   struct ovpn_crypto_key_slot *ks;
+   int idx;
+
+   switch (slot) {
+   case OVPN_KEY_SLOT_PRIMARY:
+   idx = cs->primary_idx;
+   break;
+   case OVPN_KEY_SLOT_SECONDARY:
+   idx = !cs->primary_idx;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   rcu_read_lock();
+   ks = rcu_dereference(cs->slots[idx]);
+   if (!ks || !ovpn_crypto_key_slot_hold(ks)) {
+   rcu_read_unlock();
+   return -ENOENT;
+   }
+   rcu_read_unlock();
+
+   keyconf->cipher_alg = ovpn_aead_crypto_alg(ks);
+   keyconf->key_id = ks->key_id;
+
+   ovpn_crypto_key_slot_put(ks);
+
+   return 0;
+}
diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h
index 
3b437d26b531c3034cca5343c755ef9c7ef57276..96fd41f4b81b74f8a3ecfe33ee24ba0122d222fe
 100644
--- a/drivers/net/ovpn/crypto.h
+++ b/drivers/net/ovpn/crypto.h
@@ -136,4 +136,8 @@ void ovpn_crypto_state_release(struct ovpn_crypto_state 
*cs);
 
 void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs);
 
+int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
+  enum ovpn_key_slot slot,
+  struct ovpn_key_config *keyconf);
+
 #endif /* _NET_OVPN_OVPNCRYPTO_H_ */
diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c
index 
072bb0881764752520e8e26e18337c1274ce1aa4..25e4e4a453b2bc499aec9a192fe3d86ba1aac511
 100644
--- a/drivers/net/ovpn/crypto_aead.c
+++ b/drivers/net/ovpn/crypto_aead.c
@@ -367,3 +367,20 @@ ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config 
*kc)
ovpn_aead_crypto_key_slot_destroy(ks);
return ERR_PTR(ret);
 }
+
+enum ovpn_cipher_alg ovpn_aead_crypto_alg(struct ovpn_crypto_key_slot *ks)
+{
+   const char *alg_name;
+
+   if (!ks->encrypt)
+   return OVPN_CIPHER_ALG_NONE;
+
+   alg_name = crypto_tfm_alg_name(crypto_aead_tfm(ks->encrypt));
+
+   if (!strcmp(alg_name, ALG_NAME_AES))
+   return OVPN_CIPHER_ALG_AES_GCM;
+   else if (!strcmp(alg_name, ALG_NAME_CHACHAPOLY))
+   return OVPN_CIPHER_ALG_CHACHA20_POLY1305;
+   else
+   return OVPN_CIPHER_ALG_NONE;
+}
diff --git a/drivers/net/ovpn/crypto_aead.h b/drivers/net/ovpn/crypto_aead.h
index 
77ee8141599bc06b0dc664c5b0a4dae660a89238..fb65be82436edd7ff89b171f7a89c9103b617d1f
 100644
--- a/drivers/net/ovpn/crypto_aead.h
+++ b/drivers/net/ovpn/crypto_aead.h
@@ -28,4 +28,6 @@ struct ovpn_crypto_key_slot *
 ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc);
 void ovpn_aead_crypto_key_slot_destroy(struct ovpn_crypto_key_slot *ks);
 
+enum ovpn_cipher_alg ovpn_aead_crypto_alg(struct ovpn_crypto_key_slot *ks);
+
 #endif /* _NET_OVPN_OVPNAEAD_H_ */
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 
d504445325ef82db04f87367c858adaf025f6297..fe9377b9b8145784917460cd5f222bc7fae4d8db
 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -18,6 +18,7 @@
 #include "netlink.h"
 #include "netlink-gen.h"
 #include "bind.h"
+#include "crypto.h"
 #include "packet.h"
 #include "peer.h"
 #include "socket.h"
@@ -679,24 +680,323 @@ int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct 
genl_info *info)
return ret;
 }
 
+static int ovpn_nl_get_key_dir(struct genl_info *info, struct nlattr *key,
+  enum ovpn_cipher_alg cipher,
+  struct ovpn_key_direction *dir)
+{
+   struct nlattr *attrs[OVPN_A_KEYDIR_

[PATCH net-next v10 18/23] ovpn: implement peer add/get/dump/delete via netlink

2024-10-25 Thread Antonio Quartulli
This change introduces the netlink commands needed to add, delete and
retrieve/dump known peers. Userspace is expected to use these commands
to handle known peer lifecycles.
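
The attribute parsing in the diff below treats an IPv4-mapped IPv6 address
as plain IPv4 by reading its last four bytes. A minimal userspace sketch of
that one step, assuming the standard IN6_IS_ADDR_V4MAPPED() macro from
<netinet/in.h>; unmap_v4mapped() is a hypothetical helper name and is not
part of the patch:

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns AF_INET and fills *v4 when addr is an
 * IPv4-mapped IPv6 address (::ffff:a.b.c.d), otherwise returns AF_INET6
 * and leaves *v4 untouched.
 */
static int unmap_v4mapped(const struct in6_addr *addr, struct in_addr *v4)
{
	if (!IN6_IS_ADDR_V4MAPPED(addr))
		return AF_INET6;

	/* the embedded IPv4 address occupies bytes 12..15, network order */
	memcpy(&v4->s_addr, &addr->s6_addr[12], sizeof(v4->s_addr));
	return AF_INET;
}
```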

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/netlink.c | 578 -
 drivers/net/ovpn/peer.c|  48 ++--
 drivers/net/ovpn/peer.h|   5 +
 3 files changed, 609 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 
2cc34eb1d1d870c6705714cb971c3c5dfb04afda..d504445325ef82db04f87367c858adaf025f6297
 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -16,6 +17,10 @@
 #include "io.h"
 #include "netlink.h"
 #include "netlink-gen.h"
+#include "bind.h"
+#include "packet.h"
+#include "peer.h"
+#include "socket.h"
 
 MODULE_ALIAS_GENL_FAMILY(OVPN_FAMILY_NAME);
 
@@ -86,29 +91,592 @@ void ovpn_nl_post_doit(const struct genl_split_ops *ops, 
struct sk_buff *skb,
netdev_put(ovpn->dev, &ovpn->dev_tracker);
 }
 
+static int ovpn_nl_attr_sockaddr_remote(struct nlattr **attrs,
+   struct sockaddr_storage *ss)
+{
+   struct sockaddr_in6 *sin6;
+   struct sockaddr_in *sin;
+   struct in6_addr *in6;
+   __be16 port = 0;
+   __be32 *in;
+   int af;
+
+   ss->ss_family = AF_UNSPEC;
+
+   if (attrs[OVPN_A_PEER_REMOTE_PORT])
+   port = nla_get_be16(attrs[OVPN_A_PEER_REMOTE_PORT]);
+
+   if (attrs[OVPN_A_PEER_REMOTE_IPV4]) {
+   af = AF_INET;
+   ss->ss_family = AF_INET;
+   in = nla_data(attrs[OVPN_A_PEER_REMOTE_IPV4]);
+   } else if (attrs[OVPN_A_PEER_REMOTE_IPV6]) {
+   af = AF_INET6;
+   ss->ss_family = AF_INET6;
+   in6 = nla_data(attrs[OVPN_A_PEER_REMOTE_IPV6]);
+   } else {
+   return AF_UNSPEC;
+   }
+
+   switch (ss->ss_family) {
+   case AF_INET6:
+   /* If this is a regular IPv6 just break and move on,
+* otherwise switch to AF_INET and extract the IPv4 accordingly
+*/
+   if (!ipv6_addr_v4mapped(in6)) {
+   sin6 = (struct sockaddr_in6 *)ss;
+   sin6->sin6_port = port;
+   memcpy(&sin6->sin6_addr, in6, sizeof(*in6));
+   break;
+   }
+
+   /* v4-mapped-v6 address */
+   ss->ss_family = AF_INET;
+   in = &in6->s6_addr32[3];
+   fallthrough;
+   case AF_INET:
+   sin = (struct sockaddr_in *)ss;
+   sin->sin_port = port;
+   sin->sin_addr.s_addr = *in;
+   break;
+   }
+
+   /* don't return ss->ss_family as it may have changed in case of
+* v4-mapped-v6 address
+*/
+   return af;
+}
+
+static u8 *ovpn_nl_attr_local_ip(struct nlattr **attrs)
+{
+   u8 *addr6;
+
+   if (!attrs[OVPN_A_PEER_LOCAL_IPV4] && !attrs[OVPN_A_PEER_LOCAL_IPV6])
+   return NULL;
+
+   if (attrs[OVPN_A_PEER_LOCAL_IPV4])
+   return nla_data(attrs[OVPN_A_PEER_LOCAL_IPV4]);
+
+   addr6 = nla_data(attrs[OVPN_A_PEER_LOCAL_IPV6]);
+   /* this is an IPv4-mapped IPv6 address, therefore extract the actual
+* v4 address from the last 4 bytes
+*/
+   if (ipv6_addr_v4mapped((struct in6_addr *)addr6))
+   return addr6 + 12;
+
+   return addr6;
+}
+
+static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
+struct genl_info *info,
+struct nlattr **attrs)
+{
+   if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
+ OVPN_A_PEER_ID))
+   return -EINVAL;
+
+   if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "cannot specify both remote IPv4 and IPv6 address");
+   return -EINVAL;
+   }
+
+   if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
+   !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "cannot specify remote port without IP 
address");
+   return -EINVAL;
+   }
+
+   if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
+   attrs[OVPN_A_PEER_LOCAL_IPV4]) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "cannot specify local IPv4 address without 
remote");
+   return -EINVAL;
+   }
+
+   if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
+   attrs[OVPN_A_PEER_LOCAL_IPV6]) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "cannot specify local IPV6 address without 
remote");
+   retu

[PATCH net-next v10 21/23] ovpn: notify userspace when a peer is deleted

2024-10-25 Thread Antonio Quartulli
Whenever a peer is deleted, send a notification to userspace so that it
can react accordingly.

This is most important when a peer is deleted due to ping timeout,
because it all happens in kernelspace and thus userspace has no direct
way to learn about it.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/netlink.c | 55 ++
 drivers/net/ovpn/netlink.h |  1 +
 drivers/net/ovpn/peer.c|  1 +
 3 files changed, 57 insertions(+)

diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 
2b2ba1a810a0e87fb9ffb43b988fa52725a9589b..4d7d835cb47fd1f03d7cdafa2eda9f03065b8024
 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -999,6 +999,61 @@ int ovpn_nl_key_del_doit(struct sk_buff *skb, struct 
genl_info *info)
return 0;
 }
 
+/**
+ * ovpn_nl_peer_del_notify - notify userspace about peer being deleted
+ * @peer: the peer being deleted
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_nl_peer_del_notify(struct ovpn_peer *peer)
+{
+   struct sk_buff *msg;
+   struct nlattr *attr;
+   int ret = -EMSGSIZE;
+   void *hdr;
+
+   netdev_info(peer->ovpn->dev, "deleting peer with id %u, reason %d\n",
+   peer->id, peer->delete_reason);
+
+   msg = nlmsg_new(100, GFP_ATOMIC);
+   if (!msg)
+   return -ENOMEM;
+
+   hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0, OVPN_CMD_PEER_DEL_NTF);
+   if (!hdr) {
+   ret = -ENOBUFS;
+   goto err_free_msg;
+   }
+
+   if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex))
+   goto err_cancel_msg;
+
+   attr = nla_nest_start(msg, OVPN_A_PEER);
+   if (!attr)
+   goto err_cancel_msg;
+
+   if (nla_put_u8(msg, OVPN_A_PEER_DEL_REASON, peer->delete_reason))
+   goto err_cancel_msg;
+
+   if (nla_put_u32(msg, OVPN_A_PEER_ID, peer->id))
+   goto err_cancel_msg;
+
+   nla_nest_end(msg, attr);
+
+   genlmsg_end(msg, hdr);
+
+   genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg,
+   0, OVPN_NLGRP_PEERS, GFP_ATOMIC);
+
+   return 0;
+
+err_cancel_msg:
+   genlmsg_cancel(msg, hdr);
+err_free_msg:
+   nlmsg_free(msg);
+   return ret;
+}
+
 /**
  * ovpn_nl_key_swap_notify - notify userspace peer's key must be renewed
  * @peer: the peer whose key needs to be renewed
diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h
index 
33390b13c8904d40b629662005a9eb92ff617c3b..4ab3abcf23dba11f6b92e3d69e700693adbc671b
 100644
--- a/drivers/net/ovpn/netlink.h
+++ b/drivers/net/ovpn/netlink.h
@@ -12,6 +12,7 @@
 int ovpn_nl_register(void);
 void ovpn_nl_unregister(void);
 
+int ovpn_nl_peer_del_notify(struct ovpn_peer *peer);
 int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id);
 
 #endif /* _NET_OVPN_NETLINK_H_ */
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 
8cfe1997ec116ae4fe74cd7105d228569e2a66a9..91c608f1ffa1d9dd1535ba308b6adc933dbbf1f1
 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -242,6 +242,7 @@ void ovpn_peer_release_kref(struct kref *kref)
 {
struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount);
 
+   ovpn_nl_peer_del_notify(peer);
ovpn_peer_release(peer);
 }
 

-- 
2.45.2




[PATCH net-next v10 20/23] ovpn: kill key and notify userspace in case of IV exhaustion

2024-10-25 Thread Antonio Quartulli
IV wrap-around is cryptographically dangerous for a number of ciphers,
therefore kill the key and inform userspace (via netlink) should the
IV space be exhausted.

Userspace has two ways of deciding when the key has to be renewed before
exhausting the IV space:
1) time based approach:
   after X seconds/minutes userspace generates a new key and sends it
   to the kernel. This is based on a guesstimate, and the default
   timer value normally works well.

2) packet count based approach:
   after X packets/bytes userspace generates a new key and sends it to
   the kernel. Userspace keeps track of the amount of traffic by
   periodically polling GET_PEER and fetching the VPN/LINK stats.
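
To make the exhaustion condition concrete, here is a small userspace sketch
of a packet-ID counter that refuses to wrap, plus the kind of threshold
check the packet count based approach implies. The 32-bit width, struct
iv_counter, iv_next() and needs_rekey() are assumptions made for
illustration; only the -ERANGE return value mirrors the check added to
ovpn_encrypt_post() in this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter: 'last' is the most recent packet ID handed out,
 * 0 meaning none has been used yet. The 32-bit width is an assumption.
 */
struct iv_counter {
	uint32_t last;
};

/* Hands out the next packet ID, or fails with -ERANGE instead of wrapping. */
static int iv_next(struct iv_counter *c, uint32_t *id)
{
	if (c->last == UINT32_MAX)
		return -ERANGE;

	*id = ++c->last;
	return 0;
}

/* Packet count based policy: ask for a new key once 'threshold' IDs
 * have been consumed, well before the space runs out.
 */
static bool needs_rekey(const struct iv_counter *c, uint32_t threshold)
{
	return c->last >= threshold;
}
```

A caller that sees -ERANGE has no safe packet ID left for the key, which is
the situation in which the patch kills the key and notifies userspace.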

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/crypto.c  | 19 
 drivers/net/ovpn/crypto.h  |  2 ++
 drivers/net/ovpn/io.c  | 13 +++
 drivers/net/ovpn/netlink.c | 55 ++
 drivers/net/ovpn/netlink.h |  2 ++
 5 files changed, 91 insertions(+)

diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
index 
cfb014c947b968752ba3dab84ec42dc8ec086379..a2346bc630be9b60604282d20a33321c277bc56f
 100644
--- a/drivers/net/ovpn/crypto.c
+++ b/drivers/net/ovpn/crypto.c
@@ -55,6 +55,25 @@ void ovpn_crypto_state_release(struct ovpn_crypto_state *cs)
}
 }
 
+/* removes the key matching the specified id from the crypto context */
+void ovpn_crypto_kill_key(struct ovpn_crypto_state *cs, u8 key_id)
+{
+   struct ovpn_crypto_key_slot *ks = NULL;
+
+   spin_lock_bh(&cs->lock);
+   if (rcu_access_pointer(cs->slots[0])->key_id == key_id) {
+   ks = rcu_replace_pointer(cs->slots[0], NULL,
+lockdep_is_held(&cs->lock));
+   } else if (rcu_access_pointer(cs->slots[1])->key_id == key_id) {
+   ks = rcu_replace_pointer(cs->slots[1], NULL,
+lockdep_is_held(&cs->lock));
+   }
+   spin_unlock_bh(&cs->lock);
+
+   if (ks)
+   ovpn_crypto_key_slot_put(ks);
+}
+
 /* Reset the ovpn_crypto_state object in a way that is atomic
  * to RCU readers.
  */
diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h
index 
96fd41f4b81b74f8a3ecfe33ee24ba0122d222fe..b7a7be752d54f1f8bcd548e0a714511efcaf68a8
 100644
--- a/drivers/net/ovpn/crypto.h
+++ b/drivers/net/ovpn/crypto.h
@@ -140,4 +140,6 @@ int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
   enum ovpn_key_slot slot,
   struct ovpn_key_config *keyconf);
 
+void ovpn_crypto_kill_key(struct ovpn_crypto_state *cs, u8 key_id);
+
 #endif /* _NET_OVPN_OVPNCRYPTO_H_ */
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 
0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261..c04791a508e5c0ae292b7b5d8098096c676b2f99
 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -248,6 +248,19 @@ void ovpn_encrypt_post(void *data, int ret)
if (likely(ovpn_skb_cb(skb)->req))
aead_request_free(ovpn_skb_cb(skb)->req);
 
+   if (unlikely(ret == -ERANGE)) {
+   /* we ran out of IVs and we must kill the key as it can't be
+* used anymore
+*/
+   netdev_warn(peer->ovpn->dev,
+   "killing key %u for peer %u\n", ks->key_id,
+   peer->id);
+   ovpn_crypto_kill_key(&peer->crypto, ks->key_id);
+   /* let userspace know so that a new key must be negotiated */
+   ovpn_nl_key_swap_notify(peer, ks->key_id);
+   goto err;
+   }
+
if (unlikely(ret < 0))
goto err;
 
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 
fe9377b9b8145784917460cd5f222bc7fae4d8db..2b2ba1a810a0e87fb9ffb43b988fa52725a9589b
 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -999,6 +999,61 @@ int ovpn_nl_key_del_doit(struct sk_buff *skb, struct 
genl_info *info)
return 0;
 }
 
+/**
+ * ovpn_nl_key_swap_notify - notify userspace peer's key must be renewed
+ * @peer: the peer whose key needs to be renewed
+ * @key_id: the ID of the key that needs to be renewed
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id)
+{
+   struct nlattr *k_attr;
+   struct sk_buff *msg;
+   int ret = -EMSGSIZE;
+   void *hdr;
+
+   netdev_info(peer->ovpn->dev, "peer with id %u must rekey - primary key 
unusable.\n",
+   peer->id);
+
+   msg = nlmsg_new(100, GFP_ATOMIC);
+   if (!msg)
+   return -ENOMEM;
+
+   hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0, OVPN_CMD_KEY_SWAP_NTF);
+   if (!hdr) {
+   ret = -ENOBUFS;
+   goto err_free_msg;
+   }
+
+   if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex))
+   goto err_cancel_msg;
+
+   k_att

[PATCH net-next v10 22/23] ovpn: add basic ethtool support

2024-10-25 Thread Antonio Quartulli
Implement support for basic ethtool functionality.

Note that ovpn is a virtual device driver, therefore
various ethtool APIs are just not meaningful and thus
not implemented.

Signed-off-by: Antonio Quartulli 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ovpn/main.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 
1bd563e3f16f49dd01c897fbe79cbd90f4b8e9aa..9dcf51ae1497dda17d418b762011b04bfd0521df
 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -7,6 +7,7 @@
  * James Yonan 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -96,6 +97,19 @@ bool ovpn_dev_is_valid(const struct net_device *dev)
return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
 }
 
+static void ovpn_get_drvinfo(struct net_device *dev,
+struct ethtool_drvinfo *info)
+{
+   strscpy(info->driver, OVPN_FAMILY_NAME, sizeof(info->driver));
+   strscpy(info->bus_info, "ovpn", sizeof(info->bus_info));
+}
+
+static const struct ethtool_ops ovpn_ethtool_ops = {
+   .get_drvinfo= ovpn_get_drvinfo,
+   .get_link   = ethtool_op_get_link,
+   .get_ts_info= ethtool_op_get_ts_info,
+};
+
 static void ovpn_setup(struct net_device *dev)
 {
/* compute the overhead considering AEAD encryption */
@@ -111,6 +125,7 @@ static void ovpn_setup(struct net_device *dev)
 
dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
 
+   dev->ethtool_ops = &ovpn_ethtool_ops;
dev->netdev_ops = &ovpn_netdev_ops;
 
dev->priv_destructor = ovpn_struct_free;

-- 
2.45.2




[PATCH net-next v10 23/23] testing/selftests: add test tool and scripts for ovpn module

2024-10-25 Thread Antonio Quartulli
The ovpn-cli tool can be compiled and used as a selftest for the ovpn
kernel module.

It implements the netlink API and can thus be integrated in any
script for more automated testing.

Along with the tool, 4 scripts are added that perform basic
functionality tests by means of network namespaces.

Cc: sh...@kernel.org
Cc: linux-kselft...@vger.kernel.org
Signed-off-by: Antonio Quartulli 
---
 MAINTAINERS|1 +
 tools/testing/selftests/Makefile   |1 +
 tools/testing/selftests/net/ovpn/.gitignore|2 +
 tools/testing/selftests/net/ovpn/Makefile  |   17 +
 tools/testing/selftests/net/ovpn/config|   10 +
 tools/testing/selftests/net/ovpn/data64.key|5 +
 tools/testing/selftests/net/ovpn/ovpn-cli.c| 2370 
 tools/testing/selftests/net/ovpn/tcp_peers.txt |5 +
 .../testing/selftests/net/ovpn/test-chachapoly.sh  |9 +
 tools/testing/selftests/net/ovpn/test-float.sh |9 +
 tools/testing/selftests/net/ovpn/test-tcp.sh   |9 +
 tools/testing/selftests/net/ovpn/test.sh   |  183 ++
 tools/testing/selftests/net/ovpn/udp_peers.txt |5 +
 13 files changed, 2626 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 
cf3d55c3e98aaea8f8817faed99dd7499cd59a71..110485aec73ae5bfeef4f228490ed76e28e01870
 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17295,6 +17295,7 @@ T:  git 
https://github.com/OpenVPN/linux-kernel-ovpn.git
 F: Documentation/netlink/specs/ovpn.yaml
 F: drivers/net/ovpn/
 F: include/uapi/linux/ovpn.h
+F: tools/testing/selftests/net/ovpn/
 
 OPENVSWITCH
 M: Pravin B Shelar 
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 
363d031a16f7e14152c904e6b68dab1f90c98392..be42906ecb11d4b0f9866d2c04b0e8fb27a2b995
 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -68,6 +68,7 @@ TARGETS += net/hsr
 TARGETS += net/mptcp
 TARGETS += net/netfilter
 TARGETS += net/openvswitch
+TARGETS += net/ovpn
 TARGETS += net/packetdrill
 TARGETS += net/rds
 TARGETS += net/tcp_ao
diff --git a/tools/testing/selftests/net/ovpn/.gitignore 
b/tools/testing/selftests/net/ovpn/.gitignore
new file mode 100644
index 
..ee44c081ca7c089933659689303c303a9fa9713b
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0+
+ovpn-cli
diff --git a/tools/testing/selftests/net/ovpn/Makefile 
b/tools/testing/selftests/net/ovpn/Makefile
new file mode 100644
index 
..c76d8fd953c5674941c8c2787813063b1bce180f
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/Makefile
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2020-2024 OpenVPN, Inc.
+#
+CFLAGS = -pedantic -Wextra -Wall -Wl,--no-as-needed -g -O0 -ggdb 
$(KHDR_INCLUDES)
+CFLAGS += $(shell pkg-config --cflags libnl-3.0 libnl-genl-3.0)
+
+LDFLAGS = -lmbedtls -lmbedcrypto
+LDFLAGS += $(shell pkg-config --libs libnl-3.0 libnl-genl-3.0)
+
+TEST_PROGS = test.sh \
+   test-chachapoly.sh \
+   test-tcp.sh \
+   test-float.sh
+
+TEST_GEN_FILES = ovpn-cli
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/net/ovpn/config 
b/tools/testing/selftests/net/ovpn/config
new file mode 100644
index 
..71946ba9fa175c191725e369eb9b973503d9d9c4
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/config
@@ -0,0 +1,10 @@
+CONFIG_NET=y
+CONFIG_INET=y
+CONFIG_STREAM_PARSER=y
+CONFIG_NET_UDP_TUNNEL=y
+CONFIG_DST_CACHE=y
+CONFIG_CRYPTO=y
+CONFIG_CRYPTO_AES=y
+CONFIG_CRYPTO_GCM=y
+CONFIG_CRYPTO_CHACHA20POLY1305=y
+CONFIG_OVPN=m
diff --git a/tools/testing/selftests/net/ovpn/data64.key 
b/tools/testing/selftests/net/ovpn/data64.key
new file mode 100644
index 
..a99e88c4e290f58b12f399b857b873f308d9ba09
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/data64.key
@@ -0,0 +1,5 @@
+jRqMACN7d7/aFQNT8S7jkrBD8uwrgHbG5OQZP2eu4R1Y7tfpS2bf5RHv06Vi163CGoaIiTX99R3B
+ia9ycAH8Wz1+9PWv51dnBLur9jbShlgZ2QHLtUc4a/gfT7zZwULXuuxdLnvR21DDeMBaTbkgbai9
+uvAa7ne1liIgGFzbv+Bas4HDVrygxIxuAnP5Qgc3648IJkZ0QEXPF+O9f0n5+QIvGCxkAUVx+5K6
+KIs+SoeWXnAopELmoGSjUpFtJbagXK82HfdqpuUxT2Tnuef0/14SzVE/vNleBNu2ZbyrSAaah8tE
+BofkPJUBFY+YQcfZNM5Dgrw3i+Bpmpq/gpdg5w==
diff --git a/tools/testing/selftests/net/ovpn/ovpn-cli.c 
b/tools/testing/selftests/net/ovpn/ovpn-cli.c
new file mode 100644
index 
..046dd069aaaf4e5b091947bd57ed79f8519a780f
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/ovpn-cli.c
@@ -0,0 +1,2370 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel accelerator
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:Antonio Quartulli 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#inc

RE: [PATCH v4 11/11] iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support

2024-10-25 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Tuesday, October 22, 2024 8:20 AM
> 
> Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type.
> Implement
> an arm_vsmmu_alloc() with its viommu op
> arm_vsmmu_domain_alloc_nested(),
> to replace arm_smmu_domain_alloc_nesting(). As an initial step, copy the
> VMID from s2_parent. A later cleanup series is required to move the VMID
> allocation out of the stage-2 domain allocation routine to this.
> 
> After that, replace nested_domain->s2_parent with nested_domain->vsmmu.
> 
> Note that the validating conditions for a nested_domain allocation are
> moved from arm_vsmmu_domain_alloc_nested to arm_vsmmu_alloc, since
> there
> is no point in creating a vIOMMU (vsmmu) from the beginning if it would
> not support a nested_domain.
> 
> Signed-off-by: Nicolin Chen 

hmm I wonder whether this series should be merged with Jason's
nesting series together and directly use vIOMMU to create nesting.
Otherwise it looks a bit weird for one series to first enable a uAPI
which is immediately replaced by another uAPI from the following
series. Even if both are merged in one cycle, logically it doesn't
sound clean when looking at the git history.






[PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process

2024-10-25 Thread Lorenzo Stoakes
It is useful to be able to utilise the pidfd mechanism to reference the
current thread or process (from a userland point of view - thread group
leader from the kernel's point of view).

Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.

For convenience and to avoid confusion from userland's perspective we alias
these:

* PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
  the user will want to use, as they would find it surprising if for
  instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
  and that failed.

* PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
  have no concept of thread groups or what a thread group leader is, and
  from userland's perspective and nomenclature this is what userland
  considers to be a process.

Due to the refactoring of the central __pidfd_get_pid() function we can
implement this functionality centrally, providing the use of this sentinel
in most functionality which utilises pidfd's.

We need to explicitly adjust kernel_waitid_prepare() to permit this (though
it wouldn't really make sense to use this there, we provide the ability for
consistency).

We explicitly disallow use of this in setns(), which would otherwise have
required explicit custom handling, as it doesn't make sense to set the
current calling thread to join the namespace of itself.

As the callers of pidfd_get_pid() expect an increased reference count on
the pid we do so in the self case, reducing churn and avoiding any breakage
from existing logic which decrements this reference count.

This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFD,
...), process_madvise(), process_mrelease(), pidfd_send_signal(), and
pidfd_getfd() system calls.

Things such as polling a pidfd and general fd operations are not supported;
this strictly provides the sentinel for APIs which explicitly accept a
pidfd.

Reviewed-by: Shakeel Butt 
Signed-off-by: Lorenzo Stoakes 
---
 include/linux/pid.h|  8 --
 include/uapi/linux/pidfd.h | 15 +++
 kernel/exit.c  |  3 ++-
 kernel/nsproxy.c   |  1 +
 kernel/pid.c   | 51 --
 5 files changed, 57 insertions(+), 21 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index d466890e1b35..3b2ac7567a88 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -78,11 +78,15 @@ struct file;
  * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
  *
  * @pidfd:  The pidfd whose pid we want, or the fd of a /proc/ file if
- *  @alloc_proc is also set.
+ *  @alloc_proc is also set, or PIDFD_SELF_* to refer to the 
current
+ *  thread or thread group leader.
  * @allow_proc: If set, then an fd of a /proc/ file can be passed instead
  *  of a pidfd, and this will be used to determine the pid.
+
  * @flags:  Output variable, if non-NULL, then the file->f_flags of the
- *  pidfd will be set here.
+ *  pidfd will be set here, or if PIDFD_SELF_THREAD is set, this is
+ *  set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP then
+ *  this is set to zero.
  *
  * Returns: If successful, the pid associated with the pidfd, otherwise an
  *  error.
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
index 565fc0629fff..0ca2ebf906fd 100644
--- a/include/uapi/linux/pidfd.h
+++ b/include/uapi/linux/pidfd.h
@@ -29,4 +29,19 @@
 #define PIDFD_GET_USER_NAMESPACE  _IO(PIDFS_IOCTL_MAGIC, 9)
 #define PIDFD_GET_UTS_NAMESPACE   _IO(PIDFS_IOCTL_MAGIC, 10)
 
+/*
+ * Special sentinel values which can be used to refer to the current thread or
+ * thread group leader (which from a userland perspective is the process).
+ */
+#define PIDFD_SELF PIDFD_SELF_THREAD
+#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP
+
+#define PIDFD_SELF_THREAD  -100 /* Current thread. */
+#define PIDFD_SELF_THREAD_GROUP-200 /* Current thread group leader. */
+
+static inline int pidfd_is_self_sentinel(pid_t pid)
+{
+   return pid == PIDFD_SELF_THREAD || pid == PIDFD_SELF_THREAD_GROUP;
+}
+
 #endif /* _UAPI_LINUX_PIDFD_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 619f0014c33b..3eb20f8252ee 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include 
@@ -1739,7 +1740,7 @@ int kernel_waitid_prepare(struct wait_opts *wo, int 
which, pid_t upid,
break;
case P_PIDFD:
type = PIDTYPE_PID;
-   if (upid < 0)
+   if (upid < 0 && !pidfd_is_self_sentinel(upid))
return -EINVAL;
 
pid = pidfd_get_pid(upid, &f_flags);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index dc952c3b05af..d239f7eeaa1f 100644

[PATCH v5 0/5] introduce PIDFD_SELF* sentinels

2024-10-25 Thread Lorenzo Stoakes
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:

int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);

...

close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process or thread.

This series introduces sentinels for this purpose, which can be passed as
the pidfd in this instance rather than having to establish a dummy fd
first.

It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.

There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process from the kernel perspective, and a userland process is a thread
group (more specifically the thread group leader) from the kernel
perspective. We therefore alias things thusly:

* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP aliased by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.

In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
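
The aliasing above can be sketched in userspace as a small resolver that
maps a pidfd argument to the pid type it selects. This is only an
illustration: the SKETCH_* names and sketch_resolve_pidfd() are
hypothetical, the sentinel values are copied from this v5 series and may
not match what released kernel headers define, and the -EBADF result for
other negative values is an assumption about how an invalid fd is reported.

```c
#include <errno.h>

/* Sentinel values as proposed in this v5 series (an assumption for any
 * other kernel version).
 */
#define SKETCH_PIDFD_SELF_THREAD       (-100)
#define SKETCH_PIDFD_SELF_THREAD_GROUP (-200)

/* Userland-facing aliases */
#define SKETCH_PIDFD_SELF              SKETCH_PIDFD_SELF_THREAD
#define SKETCH_PIDFD_SELF_PROCESS      SKETCH_PIDFD_SELF_THREAD_GROUP

enum sketch_pid_type {
	SKETCH_PIDTYPE_PID,     /* current thread */
	SKETCH_PIDTYPE_TGID,    /* current thread group leader */
	SKETCH_PIDTYPE_FROM_FD, /* an ordinary pidfd: look the pid up via the fd */
};

/* Returns 0 and sets *type, or a negative errno for an invalid argument. */
static int sketch_resolve_pidfd(int pidfd, enum sketch_pid_type *type)
{
	switch (pidfd) {
	case SKETCH_PIDFD_SELF_THREAD:
		*type = SKETCH_PIDTYPE_PID;
		return 0;
	case SKETCH_PIDFD_SELF_THREAD_GROUP:
		*type = SKETCH_PIDTYPE_TGID;
		return 0;
	default:
		if (pidfd < 0)
			return -EBADF; /* negative, but not a sentinel */
		*type = SKETCH_PIDTYPE_FROM_FD;
		return 0;
	}
}
```

Checking for the sentinels before the generic negative-fd test is what lets
them share the int argument with real file descriptors.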

This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.

We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.

In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.

We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.

v5:
* Fixup self test dependencies on pidfd/pidfd.h.

v4:
* Avoid returning an fd in the __pidfd_get_pid() function as pointed out by
  Christian, instead simply always pin the pid and maintain fd scope in the
  helper alone.
* Add wrapper header file in tools/include/linux to allow for import of
  UAPI pidfd.h header without encountering the collision between system
  fcntl.h and linux/fcntl.h as discussed with Shuah and John.
* Fixup tests to import the UAPI pidfd.h header working around conflicts
  between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports,
  as reported by Shuah.
* Use an int for pidfd_is_self_sentinel() to avoid any dependency on
  stdbool.h in userland.
https://lore.kernel.org/linux-mm/cover.1729198898.git.lorenzo.stoa...@oracle.com/

v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoa...@oracle.com/

v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoa...@oracle.com/

Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
  a good idea, but perhaps some debate to be had on implementation. It
  seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
  PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoa...@oracle.com/

RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoa...@oracle.com/

Lorenzo Stoakes (5):
  pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
  pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
  tools: testing: separate out wait_for_pid() into helper header
  selftests: pidfd: add pidfd.h UAPI wrapper
  selftests: pidfd: add tests for PIDFD_SELF_*

 include/linux/pid.h   |  34 -
 include/uapi/linux/pidfd.h|  15 ++
 kernel/exit.c |   3 +-
 kernel/nsproxy.c  |   1 +
 kernel/pid.c  |  65 +---
 kernel/signal.c   |  29 +---
 tools/include/linux/pidfd.h   |  14 ++
 tools/testing/selftests/cgroup/test_kill.c|   2 +-
 .../pid_namespace/regression_enomem.c |   2 +-
 tools/testing/selftests/pidfd/Makefile|   3 +-
 tools/testing/selftests/pidfd/pidfd.h |  28 +---
 ...

[PATCH net-next v10 08/23] ovpn: implement basic TX path (UDP)

2024-10-25 Thread Antonio Quartulli
Packets sent over the ovpn interface are processed and transmitted to the
connected peer, if any.

Implementation is UDP only. TCP will be added by a later patch.

Note: no crypto/encapsulation exists yet. Packets are just captured and
sent.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/io.c   | 138 +++-
 drivers/net/ovpn/peer.c |  37 +++-
 drivers/net/ovpn/peer.h |   4 +
 drivers/net/ovpn/skb.h  |  51 +++
 drivers/net/ovpn/udp.c  | 232 
 drivers/net/ovpn/udp.h  |   8 ++
 6 files changed, 468 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index ad3813419c33cbdfe7e8ad6f5c8b444a3540a69f..77ba4d33ae0bd2f52e8bd1c06a182d24285297b4 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -9,14 +9,150 @@
 
 #include 
 #include 
+#include 
 
 #include "io.h"
+#include "ovpnstruct.h"
+#include "peer.h"
+#include "udp.h"
+#include "skb.h"
+#include "socket.h"
+
+static void ovpn_encrypt_post(struct sk_buff *skb, int ret)
+{
+   struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
+
+   if (unlikely(ret < 0))
+   goto err;
+
+   skb_mark_not_on_list(skb);
+
+   switch (peer->sock->sock->sk->sk_protocol) {
+   case IPPROTO_UDP:
+   ovpn_udp_send_skb(peer->ovpn, peer, skb);
+   break;
+   default:
+   /* no transport configured yet */
+   goto err;
+   }
+   /* skb passed down the stack - don't free it */
+   skb = NULL;
+err:
+   if (unlikely(skb))
+   dev_core_stats_tx_dropped_inc(peer->ovpn->dev);
+   ovpn_peer_put(peer);
+   kfree_skb(skb);
+}
+
+static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+   ovpn_skb_cb(skb)->peer = peer;
+
+   /* take a reference to the peer because the crypto code may run async.
+* ovpn_encrypt_post() will release it upon completion
+*/
+   if (unlikely(!ovpn_peer_hold(peer))) {
+   DEBUG_NET_WARN_ON_ONCE(1);
+   return false;
+   }
+
+   ovpn_encrypt_post(skb, 0);
+   return true;
+}
+
+/* send skb to connected peer, if any */
+static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb,
+ struct ovpn_peer *peer)
+{
+   struct sk_buff *curr, *next;
+
+   if (likely(!peer))
+   /* retrieve peer serving the destination IP of this packet */
+   peer = ovpn_peer_get_by_dst(ovpn, skb);
+   if (unlikely(!peer)) {
+   net_dbg_ratelimited("%s: no peer to send data to\n",
+   ovpn->dev->name);
+   dev_core_stats_tx_dropped_inc(ovpn->dev);
+   goto drop;
+   }
+
+   /* this might be a GSO-segmented skb list: process each skb
+* independently
+*/
+   skb_list_walk_safe(skb, curr, next)
+   if (unlikely(!ovpn_encrypt_one(peer, curr))) {
+   dev_core_stats_tx_dropped_inc(ovpn->dev);
+   kfree_skb(curr);
+   }
+
+   /* skb passed over, no need to free */
+   skb = NULL;
+drop:
+   if (likely(peer))
+   ovpn_peer_put(peer);
+   kfree_skb_list(skb);
+}
 
 /* Send user data to the network
  */
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
+   struct ovpn_struct *ovpn = netdev_priv(dev);
+   struct sk_buff *segments, *curr, *next;
+   struct sk_buff_head skb_list;
+   __be16 proto;
+   int ret;
+
+   /* reset netfilter state */
+   nf_reset_ct(skb);
+
+   /* verify IP header size in network packet */
+   proto = ovpn_ip_check_protocol(skb);
+   if (unlikely(!proto || skb->protocol != proto)) {
+   net_err_ratelimited("%s: dropping malformed payload packet\n",
+   dev->name);
+   dev_core_stats_tx_dropped_inc(ovpn->dev);
+   goto drop;
+   }
+
+   if (skb_is_gso(skb)) {
+   segments = skb_gso_segment(skb, 0);
+   if (IS_ERR(segments)) {
+   ret = PTR_ERR(segments);
+   net_err_ratelimited("%s: cannot segment packet: %d\n",
+   dev->name, ret);
+   dev_core_stats_tx_dropped_inc(ovpn->dev);
+   goto drop;
+   }
+
+   consume_skb(skb);
+   skb = segments;
+   }
+
+   /* from this moment on, "skb" might be a list */
+
+   __skb_queue_head_init(&skb_list);
+   skb_list_walk_safe(skb, curr, next) {
+   skb_mark_not_on_list(curr);
+
+   curr = skb_share_check(curr, GFP_ATOMIC);
+   if (unlikely(!curr)) {
+   net_err_ratelimited("%s: skb_share_check failed\n",
+   dev->name);
+

Re: [PATCH v4 00/14] iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE)

2024-10-25 Thread Jason Gunthorpe
On Thu, Oct 24, 2024 at 11:14:21PM -0700, Nicolin Chen wrote:
> On Fri, Oct 25, 2024 at 04:58:33PM +1100, Alexey Kardashevskiy wrote:
> > > > > > Is there any real example of a .vdevice_alloc hook, besides the
> > > > > > selftests? It is not in iommufd_viommu_p2-v4-with-rmr, hence the
> > > > > > question. I am trying to sketch something with this new machinery and
> > > > > > less guessing would be nice. Thanks,
> > > > > 
> > > > > No, I am actually dropping that one, and moving the vdevice struct
> > > > > to the private header, as there seems to be no use case:
> > > > 
> > > > Why keep it then?
> > > 
> > > We need that structure to store per-vIOMMU virtual ID. Hiding it
> > > in the core only means we need to provide another vIOMMU APIs for
> > > drivers to look up the ID, v.s. exposing it for drivers to access
> > > directly.
> > 
> > Sorry I lost you here. If we need it, then there should be an example of
> > .vdevice_alloc() somewhere but you say there is not one. How do you test
> > this, with just selftests? :) Thanks,
> 
> A vDEVICE object will be core-allocated and core-managed, while the
> vdevice_alloc is for driver-allocated purpose for which there is no
> use case (at least with this series). You can check the vdev ioctl
> in this version that has two pathways to allocate a vDEVICE object.
> 
> A vdev_id is used to index viommu's xarray for a driver to convert
> the id to a dev pointer via a vIOMMU API. Dropping .vdevice_alloc
> just means the driver only lost its direct access.

I think the point here is that this has to go in stages. At the present
moment the iommu drivers don't need to hook the vdevice object, so
Nicolin should take it out of this series.

I would expect CC to need to be in this path, so we should bring it
back in the CC series.

For CC I'm broadly expecting that creating the CC type vIOMMU will
call a CC implementation, and then creating a vdevice against the
vIOMMU will also call the CC implementation. The two callbacks would
ask the secure world to create the relevant VM visible objects.

Jason



Re: [PATCH V4 10/15] selftests/resctrl: Make benchmark parameter passing robust

2024-10-25 Thread Ilpo Järvinen
On Thu, 24 Oct 2024, Reinette Chatre wrote:

> The benchmark used during the CMT, MBM, and MBA tests can be provided by
> the user via (-b) parameter, if not provided the default "fill_buf"
> benchmark is used. The user is additionally able to override
> any of the "fill_buf" default parameters when running the tests with
> "-b fill_buf ".
> 
> The "fill_buf" parameters are managed as an array of strings. Using an
> array of strings is complex because it requires transformations to/from
> strings at every producer and consumer. This is made worse for the
> individual tests where the default benchmark parameters values may not
> be appropriate and additional data wrangling is required. For example,
> the CMT test duplicates the entire array of strings in order to replace
> one of the parameters.
> 
> More issues appear when combining the usage of an array of strings with
> the use case of user overriding default parameters by specifying
> "-b fill_buf ". This use case is fragile with opportunities
> to trigger a SIGSEGV because of opportunities for NULL pointers to exist
> in the array of strings. For example, by running below (thus by specifying
> "fill_buf" should be used but all parameters are NULL):
>   $ sudo resctrl_tests -t mbm -b fill_buf
> 
> Replace the "array of strings" parameters used for "fill_buf" with
> new struct fill_buf_param that contains the "fill_buf" parameters that
> can be used directly without transformations to/from strings. Two
> instances of struct fill_buf_param may exist at any point in time:
>   * If the user provides new parameters to "fill_buf", the
> user parameter structure (struct user_params) will point to a
> fully initialized and immutable struct fill_buf_param
> containing the user provided parameters.
>   * If "fill_buf" is the benchmark that should be used by a test,
> then the test parameter structure (struct resctrl_val_param)
> will point to a fully initialized struct fill_buf_param. The
> latter may contain (a) the user provided parameters verbatim,
> (b) user provided parameters adjusted to be appropriate for
> the test, or (c) the default parameters for "fill_buf" that
> is appropriate for the test if the user did not provide
> "fill_buf" parameters nor an alternate benchmark.
> 
> The existing behavior of CMT test is to use test defined value for the
> buffer size even if the user provides another value via command line.
> This behavior is maintained since the test requires that the buffer size
> matches the size of the cache allocated, and the amount of cache
> allocated can instead be changed by the user with the "-n" command line
> parameter.
> 
> Signed-off-by: Reinette Chatre 

Thanks for the update.

Reviewed-by: Ilpo Järvinen 

-- 
 i.

> ---
> Changes since V3:
> - Handle empty string input. (Ilpo)
> 
> Changes since V2:
> - Use empty initializers. (Ilpo)
> - Let memflush be bool instead of int. (Ilpo)
> - Make user input checks more robust. (Ilpo)
> - Assign values as part of local variable definition. (Ilpo)
> 
> Changes since V1:
> - Maintain original behavior where user can override "fill_buf"
>   parameters via command line ... but only those that can actually
>   be changed. (Ilpo)
> - Fix parsing issues associated with original behavior to ensure
>   any parameter is valid before any attempt to use it.
> - Move patch earlier in series to highlight that this fixes existing
>   issues.
> - Make struct fill_buf_param dynamic to support user provided
>   parameters as well as test specific parameters.
> - Rewrite changelog.
> ---
>  tools/testing/selftests/resctrl/cmt_test.c|  32 ++
>  tools/testing/selftests/resctrl/fill_buf.c|   4 +-
>  tools/testing/selftests/resctrl/mba_test.c|  13 ++-
>  tools/testing/selftests/resctrl/mbm_test.c|  22 ++--
>  tools/testing/selftests/resctrl/resctrl.h |  59 +++---
>  .../testing/selftests/resctrl/resctrl_tests.c | 103 ++
>  tools/testing/selftests/resctrl/resctrl_val.c |  41 ---
>  7 files changed, 178 insertions(+), 96 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cmt_test.c 
> b/tools/testing/selftests/resctrl/cmt_test.c
> index 0c045080d808..4c3cf2c25a38 100644
> --- a/tools/testing/selftests/resctrl/cmt_test.c
> +++ b/tools/testing/selftests/resctrl/cmt_test.c
> @@ -116,15 +116,13 @@ static void cmt_test_cleanup(void)
>  
>  static int cmt_run_test(const struct resctrl_test *test, const struct 
> user_params *uparams)
>  {
> - const char * const *cmd = uparams->benchmark_cmd;
> - const char *new_cmd[BENCHMARK_ARGS];
> + struct fill_buf_param fill_buf = {};
>   unsigned long cache_total_size = 0;
>   int n = uparams->bits ? : 5;
>   unsigned long long_mask;
> - char *span_str = NULL;
>   int count_of_bits;
>   size_t span;
> - int ret, i;
> + int ret;
>  
>   ret = get_full_cbm("L3", &long_mask);
>   i

Re: [PATCH V4 02/15] selftests/resctrl: Print accurate buffer size as part of MBM results

2024-10-25 Thread Ilpo Järvinen
On Thu, 24 Oct 2024, Reinette Chatre wrote:

> By default the MBM test uses the "fill_buf" benchmark to keep reading
> from a buffer with size DEFAULT_SPAN while measuring memory bandwidth.
> User space can provide an alternate benchmark or amend the size of
> the buffer "fill_buf" should use.
> 
> Analysis of the MBM measurements does not require that a buffer be used
> and thus does not require knowing the size of the buffer if it was used
> during testing. Even so, the buffer size is printed as informational
> as part of the MBM test results. What is printed as buffer size is
> hardcoded as DEFAULT_SPAN, even if the test relied on another benchmark
> (that may or may not use a buffer) or if user space amended the buffer
> size.
> 
> Ensure that accurate buffer size is printed when using "fill_buf"
> benchmark and omit the buffer size information if another benchmark
> is used.
> 
> Fixes: ecdbb911f22d ("selftests/resctrl: Add MBM test")
> Signed-off-by: Reinette Chatre 

Reviewed-by: Ilpo Järvinen 

--
 i.

> ---
> Backporting is not recommended. Backporting this fix will be
> a challenge with all the refactoring done since then. This issue
> does not impact default tests and there is no sign that
> folks run these tests with anything but the defaults. This issue is
> also minor since it does not impact actual test runs or results,
> just the information printed during a test run.
> 
> Changes since V3:
> - Ensure string parsing handles case when user provides "". (Ilpo)
> - Fix error returned. (Ilpo)
> 
> Changes since V2:
> - Make user input checks more robust. (Ilpo)
> 
> Changes since V1:
> - New patch.
> ---
>  tools/testing/selftests/resctrl/mbm_test.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/mbm_test.c 
> b/tools/testing/selftests/resctrl/mbm_test.c
> index 6b5a3b52d861..cf08ba5e314e 100644
> --- a/tools/testing/selftests/resctrl/mbm_test.c
> +++ b/tools/testing/selftests/resctrl/mbm_test.c
> @@ -40,7 +40,8 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, 
> size_t span)
>   ksft_print_msg("%s Check MBM diff within %d%%\n",
>  ret ? "Fail:" : "Pass:", MAX_DIFF_PERCENT);
>   ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
> - ksft_print_msg("Span (MB): %zu\n", span / MB);
> + if (span)
> + ksft_print_msg("Span (MB): %zu\n", span / MB);
>   ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);
>   ksft_print_msg("avg_bw_resc: %lu\n", avg_bw_resc);
>  
> @@ -138,15 +139,26 @@ static int mbm_run_test(const struct resctrl_test 
> *test, const struct user_param
>   .setup  = mbm_setup,
>   .measure= mbm_measure,
>   };
> + char *endptr = NULL;
> + size_t span = 0;
>   int ret;
>  
>   remove(RESULT_FILE_NAME);
>  
> + if (uparams->benchmark_cmd[0] && strcmp(uparams->benchmark_cmd[0], "fill_buf") == 0) {
> + if (uparams->benchmark_cmd[1] && *uparams->benchmark_cmd[1] != '\0') {
> + errno = 0;
> + span = strtoul(uparams->benchmark_cmd[1], &endptr, 10);
> + if (errno || *endptr != '\0')
> + return -EINVAL;
> + }
> + }
> +
>   ret = resctrl_val(test, uparams, uparams->benchmark_cmd, &param);
>   if (ret)
>   return ret;
>  
> - ret = check_results(DEFAULT_SPAN);
> + ret = check_results(span);
>   if (ret && (get_vendor() == ARCH_INTEL))
>   ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
>  
> 
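
For anyone wanting to try the user-input check from the hunk above outside the
selftest, here is a minimal sketch. It is not code from the patch:
example_parse_span() is a hypothetical helper, and it assumes the argument is
a plain decimal byte count. It follows the same pattern as the quoted code
(treat NULL or "" as "not provided", clear errno before strtoul(), reject
trailing characters).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of resctrl): parse an optional span argument.
 * NULL or "" means "not provided" and leaves *span at 0.
 * Returns 0 on success, -EINVAL on trailing garbage or a range error.
 */
static int example_parse_span(const char *arg, size_t *span)
{
	char *endptr = NULL;
	unsigned long val;

	*span = 0;
	if (!arg || *arg == '\0')
		return 0;

	errno = 0;
	val = strtoul(arg, &endptr, 10);
	if (errno || *endptr != '\0')
		return -EINVAL;

	*span = val;
	return 0;
}
```

With this shape a span of 0 doubles as "no buffer size known", which is the
condition the quoted show_bw_info() change uses to skip the "Span (MB)" line.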

Re: [PATCH v3] remoteproc: Add a new remoteproc state RPROC_DEFUNCT

2024-10-25 Thread Mukesh Ojha
On Fri, Oct 25, 2024 at 09:08:03AM -0600, Mathieu Poirier wrote:
> On Fri, Oct 25, 2024 at 01:40:45PM +0530, Mukesh Ojha wrote:
> > On Mon, Oct 21, 2024 at 09:12:47AM -0600, Mathieu Poirier wrote:
> > > Hi Mukesh,
> > > 
> > > On Wed, Oct 16, 2024 at 10:25:46AM +0530, Mukesh Ojha wrote:
> > > > Multiple call to glink_subdev_stop() for the same remoteproc can happen
> > > > if rproc_stop() fails from Process-A that leaves the rproc state to
> > > > RPROC_CRASHED state later a call to recovery_store from user space in
> > > > Process B triggers rproc_trigger_recovery() of the same remoteproc to
> > > > recover it results in NULL pointer dereference issue in
> > > > qcom_glink_smem_unregister().
> > > > 
> > > > There is other side to this issue if we want to fix this via adding a
> > > > NULL check on glink->edge which does not guarantees that the remoteproc
> > > > will recover in second call from Process B as it has failed in the first
> > > > Process A during SMC shutdown call and may again fail at the same call
> > > > and rproc can not recover for such case.
> > > > 
> > > > Add a new rproc state RPROC_DEFUNCT i.e., non recoverable state of
> > > > remoteproc and the only way to recover from it via system restart.
> > > > 
> > > > Process-A   Process-B
> > > > 
> > > >   fatal error interrupt happens
> > > > 
> > > >   rproc_crash_handler_work()
> > > > mutex_lock_interruptible(&rproc->lock);
> > > > ...
> > > > 
> > > >rproc->state = RPROC_CRASHED;
> > > > ...
> > > > mutex_unlock(&rproc->lock);
> > > > 
> > > > rproc_trigger_recovery()
> > > >  mutex_lock_interruptible(&rproc->lock);
> > > > 
> > > >   adsp_stop()
> > > >   qcom_q6v5_pas 20c0.remoteproc: failed to shutdown: -22
> > > >   remoteproc remoteproc3: can't stop rproc: -22
> > > >  mutex_unlock(&rproc->lock);
> > > 
> > > Ok, that can happen.
> > > 
> > > > 
> > > > echo enabled > 
> > > > /sys/class/remoteproc/remoteprocX/recovery
> > > > recovery_store()
> > > >  
> > > > rproc_trigger_recovery()
> > > >   
> > > > mutex_lock_interruptible(&rproc->lock);
> > > >rproc_stop()
> > > > glink_subdev_stop()
> > > >   
> > > > qcom_glink_smem_unregister() ==|
> > > > 
> > > >  |
> > > > 
> > > >  V
> > > 
> > > I am missing some information here but I will _assume_ this is caused by
> > > glink->edge being set to NULL [1] when glink_subdev_stop() is first called by
> > > process A.  Instead of adding a new state to the core I think a better idea
> > > would be to add a check for a NULL value on @smem in
> > > qcom_glink_smem_unregister().  This is a problem that should be fixed in the
> > > driver rather than the core.
> > > 
> > > [1]. https://elixir.bootlin.com/linux/v6.12-rc4/source/drivers/remoteproc/qcom_common.c#L213
> > 
> > 
> > I did the same here [1] but after discussion with Bjorn, realized that
> > remoteproc might not even recover and may fail in the second attempt as
> > well and only way is reboot of the machine.
> 
> Whether in RPROC_CRASHED or RPROC_DEFUNCT state, the end result is the same -
> manual intervention is needed.  I don't see why another state needs to be added.

Is that really true? When recovery is disabled, any rproc crash leaves the
rproc in the RPROC_CRASHED state, and enabling recovery later can still bring
it back to ONLINE. If that recovery attempt fails, however, the rproc can be
put into the RPROC_DEFUNCT state.
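
To make the distinction concrete, here is a rough sketch of the transitions I
have in mind. It is illustration only, not the patch: the enum and
example_recover() are hypothetical names, and the real core does far more than
this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states for illustration; they mirror the idea, not the kernel enum. */
enum example_rproc_state {
	EX_ONLINE,
	EX_CRASHED,
	EX_DEFUNCT,
};

/*
 * Hypothetical model of one recovery attempt:
 *  - nothing to do unless the rproc is crashed,
 *  - with recovery disabled it simply stays crashed (and can be retried later),
 *  - a failed stop during recovery is treated as non-recoverable,
 *  - otherwise the rproc comes back online.
 */
static enum example_rproc_state
example_recover(enum example_rproc_state cur, bool recovery_enabled, bool stop_ok)
{
	if (cur != EX_CRASHED)
		return cur;
	if (!recovery_enabled)
		return EX_CRASHED;
	if (!stop_ok)
		return EX_DEFUNCT;
	return EX_ONLINE;
}
```

In this model EX_CRASHED still has a path back to EX_ONLINE, while EX_DEFUNCT
has none, which is the difference I am trying to express with the new state.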

-Mukesh

> 
> > 
> > [1]
> > https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mo...@quicinc.com/
> > 
> > > 
> > > >   Unable to handle kernel NULL pointer dereference
> > > >   at virtual address 0358
> > > > 
> > > > Signed-off-by: Mukesh Ojha 
> > > > ---
> > > > Changes in v3:
> > > >  - Fix kernel test reported error.
> > > > 
> > > > Changes in v2:
> > > >  - Removed NULL pointer check instead added a new state to signify
> > > >non-recoverable state of remoteproc.
> > > > 
> > > >  drivers/remoteproc/remoteproc_core.c  | 3 ++-
> > > >  drivers/remoteproc/remoteproc_sysfs.c | 1 +
> > > >  include/linux/remoteproc.h| 5 -
> > > >  3 files changed, 7 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/remoteproc/remoteproc_core.c 
> > > > b/drivers/remoteproc/remoteproc_core.c
> > > > index f2

Re: [PATCH v4 11/11] iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support

2024-10-25 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 09:18:05AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Tuesday, October 22, 2024 8:20 AM
> > 
> > Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type.
> > Implement
> > an arm_vsmmu_alloc() with its viommu op
> > arm_vsmmu_domain_alloc_nested(),
> > to replace arm_smmu_domain_alloc_nesting(). As an initial step, copy the
> > VMID from s2_parent. A later cleanup series is required to move the VMID
> > allocation out of the stage-2 domain allocation routine to this.
> > 
> > After that, replace nested_domain->s2_parent with nested_domain->vsmmu.
> > 
> > Note that the validating conditions for a nested_domain allocation are
> > moved from arm_vsmmu_domain_alloc_nested to arm_vsmmu_alloc, since
> > there
> > is no point in creating a vIOMMU (vsmmu) from the beginning if it would
> > not support a nested_domain.
> > 
> > Signed-off-by: Nicolin Chen 
> 
> hmm I wonder whether this series should be merged with Jason's
> nesting series together and directly use vIOMMU to create nesting.
> Otherwise it looks a bit weird for one series to first enable a uAPI
> which is immediately replaced by another uAPI from the following
> series.

It has changed from my original expectation, that's for sure. I've
wondered the same thing.

For now I've been keeping them separate and was going to review when
this is all settled down.

It is troublesome because of all the branches, but if we don't have a
conflict we could take the whole lot through iommufd.

Jason



[PATCH v3] vsock/test: fix failures due to wrong SO_RCVLOWAT parameter

2024-10-25 Thread Konstantin Shkolnyy
This happens on 64-bit big-endian machines.
SO_RCVLOWAT requires an int parameter. However, instead of int, the test
uses unsigned long in one place and size_t in another. Both are 8 bytes
long on 64-bit machines. The kernel, having received the 8 bytes, doesn't
test for the exact size of the parameter, it only cares that it's >=
sizeof(int), and casts the 4 lower-addressed bytes to an int, which, on
a big-endian machine, contains 0. 0 doesn't trigger an error, SO_RCVLOWAT
returns with success and the socket stays with the default SO_RCVLOWAT = 1,
which results in vsock_test failures, while vsock_perf doesn't even notice
that it's failed to change it.
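
The effect described above can be reproduced on any host with the sketch
below. It is only a model under stated assumptions: the helper names are made
up for illustration, and the byte layout of a 64-bit machine is simulated by
hand rather than taken from a real setsockopt() call. It stores an 8-byte
value in a given byte order and then reads back only the 4 lowest-addressed
bytes as an int.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 64-bit value the way a big-endian machine stores it in memory. */
static void example_store_be64(unsigned char out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Lay out a 64-bit value the way a little-endian machine stores it in memory. */
static void example_store_le64(unsigned char out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

/* What a big-endian reader sees when it takes only the first 4 bytes as an int. */
static int32_t example_lowat_seen_be(uint64_t user_val)
{
	unsigned char b[8];

	example_store_be64(b, user_val);
	return (int32_t)(((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
			 ((uint32_t)b[2] << 8) | (uint32_t)b[3]);
}

/* The same read on a little-endian layout, for contrast. */
static int32_t example_lowat_seen_le(uint64_t user_val)
{
	unsigned char b[8];

	example_store_le64(b, user_val);
	return (int32_t)(((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
			 ((uint32_t)b[1] << 8) | (uint32_t)b[0]);
}
```

For a small value such as 128 the big-endian read yields 0 while the
little-endian read yields 128, which matches the observation that the problem
shows up on s390 but not on x86-64.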

Fixes: b1346338fbae ("vsock_test: POLLIN + SO_RCVLOWAT test")
Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic")
Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility")
Signed-off-by: Konstantin Shkolnyy 
---

Notes:
The problem was found on s390 (big endian), while x86-64 didn't show it. 
After this fix, all tests pass on s390.
Changes for v3:
- fix the same problem in vsock_perf and update commit message
Changes for v2:
- add "Fixes:" lines to the commit message

 tools/testing/vsock/vsock_perf.c | 6 +++---
 tools/testing/vsock/vsock_test.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/testing/vsock/vsock_perf.c b/tools/testing/vsock/vsock_perf.c
index 4e8578f815e0..22633c2848cc 100644
--- a/tools/testing/vsock/vsock_perf.c
+++ b/tools/testing/vsock/vsock_perf.c
@@ -133,7 +133,7 @@ static float get_gbps(unsigned long bits, time_t ns_delta)
   ((float)ns_delta / NSEC_PER_SEC);
 }
 
-static void run_receiver(unsigned long rcvlowat_bytes)
+static void run_receiver(int rcvlowat_bytes)
 {
unsigned int read_cnt;
time_t rx_begin_ns;
@@ -163,7 +163,7 @@ static void run_receiver(unsigned long rcvlowat_bytes)
printf("Listen port %u\n", port);
printf("RX buffer %lu bytes\n", buf_size_bytes);
printf("vsock buffer %lu bytes\n", vsock_buf_bytes);
-   printf("SO_RCVLOWAT %lu bytes\n", rcvlowat_bytes);
+   printf("SO_RCVLOWAT %d bytes\n", rcvlowat_bytes);
 
fd = socket(AF_VSOCK, SOCK_STREAM, 0);
 
@@ -439,7 +439,7 @@ static long strtolx(const char *arg)
 int main(int argc, char **argv)
 {
unsigned long to_send_bytes = DEFAULT_TO_SEND_BYTES;
-   unsigned long rcvlowat_bytes = DEFAULT_RCVLOWAT_BYTES;
+   int rcvlowat_bytes = DEFAULT_RCVLOWAT_BYTES;
int peer_cid = -1;
bool sender = false;
 
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index f851f8961247..30857dd4ca97 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -833,7 +833,7 @@ static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
 
 static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
 {
-   unsigned long lowat_val = RCVLOWAT_BUF_SIZE;
+   int lowat_val = RCVLOWAT_BUF_SIZE;
char buf[RCVLOWAT_BUF_SIZE];
struct pollfd fds;
short poll_flags;
@@ -1282,7 +1282,7 @@ static void test_stream_rcvlowat_def_cred_upd_client(const struct test_opts *opt
 static void test_stream_credit_update_test(const struct test_opts *opts,
   bool low_rx_bytes_test)
 {
-   size_t recv_buf_size;
+   int recv_buf_size;
struct pollfd fds;
size_t buf_size;
void *buf;
-- 
2.34.1




Re: [PATCH v4 00/11] iommufd: Add vIOMMU infrastructure (Part-1)

2024-10-25 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 08:34:05AM +, Tian, Kevin wrote:
> > The vIOMMU object should be seen as a slice of a physical IOMMU instance
> > that is passed to or shared with a VM. That can be some HW/SW resources:
> >  - Security namespace for guest owned ID, e.g. guest-controlled cache tags
> >  - Access to a sharable nesting parent pagetable across physical IOMMUs
> >  - Virtualization of various platforms IDs, e.g. RIDs and others
> >  - Delivery of paravirtualized invalidation
> >  - Direct assigned invalidation queues
> >  - Direct assigned interrupts
> >  - Non-affiliated event reporting
> 
> sorry no idea about 'non-affiliated event'. Can you elaborate?

This would be an event that is not connected to a device.

For instance a CMDQ experienced a problem.

Jason



Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process

2024-10-25 Thread Lorenzo Stoakes
On Fri, Oct 25, 2024 at 01:50:12PM +0100, Pedro Falcato wrote:
> On Fri, Oct 25, 2024 at 10:41 AM Lorenzo Stoakes
>  wrote:
> >
> > It is useful to be able to utilise the pidfd mechanism to reference the
> > current thread or process (from a userland point of view - thread group
> > leader from the kernel's point of view).
> >
> > Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
> > PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
> >
> > For convenience and to avoid confusion from userland's perspective we alias
> > these:
> >
> > * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
> >   the user will want to use, as they would find it surprising if for
> >   instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
> >   and that failed.
> >
> > * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
> >   have no concept of thread groups or what a thread group leader is, and
> >   from userland's perspective and nomenclature this is what userland
> >   considers to be a process.
> >
> > Due to the refactoring of the central __pidfd_get_pid() function we can
> > implement this functionality centrally, providing the use of this sentinel
> > in most functionality which utilises pidfd's.
> >
> > We need to explicitly adjust kernel_waitid_prepare() to permit this (though
> > it wouldn't really make sense to use this there, we provide the ability for
> > consistency).
> >
> > We explicitly disallow use of this in setns(), which would otherwise have
> > required explicit custom handling, as it doesn't make sense to set the
> > current calling thread to join the namespace of itself.
> >
> > As the callers of pidfd_get_pid() expect an increased reference count on
> > the pid we do so in the self case, reducing churn and avoiding any breakage
> > from existing logic which decrements this reference count.
> >
> > This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFD,
> > ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and
> > pidfd_getfd() system calls.
> >
> > Things such as polling a pidfd and general fd operations are not supported,
> > this strictly provides the sentinel for APIs which explicitly accept a
> > pidfd.
> >
> > Reviewed-by: Shakeel Butt 
> > Signed-off-by: Lorenzo Stoakes 
> > ---
> >  include/linux/pid.h|  8 --
> >  include/uapi/linux/pidfd.h | 15 +++
> >  kernel/exit.c  |  3 ++-
> >  kernel/nsproxy.c   |  1 +
> >  kernel/pid.c   | 51 --
> >  5 files changed, 57 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/linux/pid.h b/include/linux/pid.h
> > index d466890e1b35..3b2ac7567a88 100644
> > --- a/include/linux/pid.h
> > +++ b/include/linux/pid.h
> > @@ -78,11 +78,15 @@ struct file;
> >   * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
> >   *
> >   * @pidfd:  The pidfd whose pid we want, or the fd of a /proc/ 
> > file if
> > - *  @alloc_proc is also set.
> > + *  @alloc_proc is also set, or PIDFD_SELF_* to refer to the 
> > current
> > + *  thread or thread group leader.
> >   * @allow_proc: If set, then an fd of a /proc/ file can be passed 
> > instead
> >   *  of a pidfd, and this will be used to determine the pid.
> > +
> >   * @flags:  Output variable, if non-NULL, then the file->f_flags of the
> > - *  pidfd will be set here.
> > + *  pidfd will be set here or If PIDFD_SELF_THREAD is set, 
> > this is
> > + *  set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP 
> > then
> > + *  this is set to zero.
> >   *
> >   * Returns: If successful, the pid associated with the pidfd, otherwise an
> >   *  error.
> > diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> > index 565fc0629fff..0ca2ebf906fd 100644
> > --- a/include/uapi/linux/pidfd.h
> > +++ b/include/uapi/linux/pidfd.h
> > @@ -29,4 +29,19 @@
> >  #define PIDFD_GET_USER_NAMESPACE  _IO(PIDFS_IOCTL_MAGIC, 9)
> >  #define PIDFD_GET_UTS_NAMESPACE   _IO(PIDFS_IOCTL_MAGIC, 10)
> >
> > +/*
> > + * Special sentinel values which can be used to refer to the current 
> > thread or
> > + * thread group leader (which from a userland perspective is the process).
> > + */
> > +#define PIDFD_SELF PIDFD_SELF_THREAD
> > +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP
> > +
> > +#define PIDFD_SELF_THREAD  -100 /* Current thread. */
>
> This conflicts with AT_FDCWD, might be worth changing?
>
> > +#define PIDFD_SELF_THREAD_GROUP-200 /* Current thread group 
> > leader. */
>
> We might want to pick some range outside of the negative errno space
> (-4096 IIRC), since we have plenty of values to pick from (2^31 at
> least).
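
The two concerns raised above can be checked mechanically with a small sketch
like the one below. The helper names and EXAMPLE_MAX_ERRNO are made up for
illustration; the only outside facts relied on are that AT_FDCWD is -100 on
Linux and that the kernel's error-pointer range ends at 4095.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>	/* AT_FDCWD */
#include <stdbool.h>

/* Mirrors the kernel's MAX_ERRNO; defined here only for the example. */
#define EXAMPLE_MAX_ERRNO 4095

/* True if a candidate sentinel would be indistinguishable from AT_FDCWD. */
static bool example_collides_with_at_fdcwd(int sentinel)
{
	return sentinel == AT_FDCWD;
}

/* True if a candidate sentinel falls inside the negative errno range. */
static bool example_in_errno_range(int sentinel)
{
	return sentinel < 0 && sentinel >= -EXAMPLE_MAX_ERRNO;
}
```

With the values in this version of the patch, -100 trips both checks and -200
trips the errno-range check, whereas something like -10000 (an arbitrary
example, not a proposal) trips neither.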

This is entirely up to Christian, I used the values he suggested in
review. But I agree w

Re: [PATCH v3] remoteproc: Add a new remoteproc state RPROC_DEFUNCT

2024-10-25 Thread Mathieu Poirier
On Fri, Oct 25, 2024 at 01:40:45PM +0530, Mukesh Ojha wrote:
> On Mon, Oct 21, 2024 at 09:12:47AM -0600, Mathieu Poirier wrote:
> > Hi Mukesh,
> > 
> > On Wed, Oct 16, 2024 at 10:25:46AM +0530, Mukesh Ojha wrote:
> > > Multiple call to glink_subdev_stop() for the same remoteproc can happen
> > > if rproc_stop() fails from Process-A that leaves the rproc state to
> > > RPROC_CRASHED state later a call to recovery_store from user space in
> > > Process B triggers rproc_trigger_recovery() of the same remoteproc to
> > > recover it results in NULL pointer dereference issue in
> > > qcom_glink_smem_unregister().
> > > 
> > > There is other side to this issue if we want to fix this via adding a
> > > NULL check on glink->edge which does not guarantees that the remoteproc
> > > will recover in second call from Process B as it has failed in the first
> > > Process A during SMC shutdown call and may again fail at the same call
> > > and rproc can not recover for such case.
> > > 
> > > Add a new rproc state RPROC_DEFUNCT i.e., non recoverable state of
> > > remoteproc and the only way to recover from it via system restart.
> > > 
> > >   Process-A   Process-B
> > > 
> > >   fatal error interrupt happens
> > > 
> > >   rproc_crash_handler_work()
> > > mutex_lock_interruptible(&rproc->lock);
> > > ...
> > > 
> > >rproc->state = RPROC_CRASHED;
> > > ...
> > > mutex_unlock(&rproc->lock);
> > > 
> > > rproc_trigger_recovery()
> > >  mutex_lock_interruptible(&rproc->lock);
> > > 
> > >   adsp_stop()
> > >   qcom_q6v5_pas 20c0.remoteproc: failed to shutdown: -22
> > >   remoteproc remoteproc3: can't stop rproc: -22
> > >  mutex_unlock(&rproc->lock);
> > 
> > Ok, that can happen.
> > 
> > > 
> > >   echo enabled > 
> > > /sys/class/remoteproc/remoteprocX/recovery
> > >   recovery_store()
> > >rproc_trigger_recovery()
> > > 
> > > mutex_lock_interruptible(&rproc->lock);
> > >  rproc_stop()
> > >   glink_subdev_stop()
> > > 
> > > qcom_glink_smem_unregister() ==|
> > >   
> > >|
> > >   
> > >V
> > 
> > I am missing some information here but I will _assume_ this is caused by
> > glink->edge being set to NULL [1] when glink_subdev_stop() is first called by
> > process A.  Instead of adding a new state to the core I think a better idea
> > would be to add a check for a NULL value on @smem in
> > qcom_glink_smem_unregister().  This is a problem that should be fixed in the
> > driver rather than the core.
> > 
> > [1]. 
> > https://elixir.bootlin.com/linux/v6.12-rc4/source/drivers/remoteproc/qcom_common.c#L213
> 
> 
> I did the same here [1] but after discussion with Bjorn, realized that
> remoteproc might not even recover and may fail in the second attempt as
> well and only way is reboot of the machine.

Whether in RPROC_CRASHED or RPROC_DEFUNCT state, the end result is the same -
manual intervention is needed.  I don't see why another state needs to be added.
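
For illustration, the driver-side alternative discussed in this thread (a NULL
check so that a repeated stop becomes a no-op) can be sketched in plain
userspace C. This is only a sketch under stated assumptions: every demo_* name
is hypothetical and none of this is the real qcom_glink code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real qcom_glink structures. */
struct demo_edge {
	int registered;
};

struct demo_glink_subdev {
	struct demo_edge *edge;
};

static int demo_unregister_calls;

/* Tolerate a NULL handle instead of dereferencing it. */
static void demo_smem_unregister(struct demo_edge *edge)
{
	if (!edge)
		return;	/* already torn down by an earlier stop */

	edge->registered = 0;
	demo_unregister_calls++;
}

/* Stop path: unregister, then clear the pointer so a repeat call is a no-op. */
static void demo_subdev_stop(struct demo_glink_subdev *glink)
{
	demo_smem_unregister(glink->edge);
	glink->edge = NULL;
}

/* Returns how many real unregisters happen across two stop attempts. */
static int demo_double_stop(void)
{
	struct demo_edge edge = { .registered = 1 };
	struct demo_glink_subdev glink = { .edge = &edge };

	demo_unregister_calls = 0;
	demo_subdev_stop(&glink);	/* first attempt, as in Process-A */
	demo_subdev_stop(&glink);	/* second attempt, as in Process-B */
	assert(glink.edge == NULL);
	return demo_unregister_calls;
}
```

Such a check only avoids the NULL pointer dereference; it says nothing about
whether the remote processor can actually be recovered on the second attempt.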

> 
> [1]
> https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mo...@quicinc.com/
> 
> > 
> > > > Unable to handle kernel NULL pointer dereference at virtual address 0358
> > > 
> > > Signed-off-by: Mukesh Ojha 
> > > ---
> > > Changes in v3:
> > >  - Fix kernel test reported error.
> > > 
> > > Changes in v2:
> > >  - Removed NULL pointer check instead added a new state to signify
> > >non-recoverable state of remoteproc.
> > > 
> > >  drivers/remoteproc/remoteproc_core.c  | 3 ++-
> > >  drivers/remoteproc/remoteproc_sysfs.c | 1 +
> > >  include/linux/remoteproc.h| 5 -
> > >  3 files changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > > index f276956f2c5c..c4e14503b971 100644
> > > --- a/drivers/remoteproc/remoteproc_core.c
> > > +++ b/drivers/remoteproc/remoteproc_core.c
> > > @@ -1727,6 +1727,7 @@ static int rproc_stop(struct rproc *rproc, bool 
> > > crashed)
> > >   /* power off the remote processor */
> > >   ret = rproc->ops->stop(rproc);
> > >   if (ret) {
> > > + rproc->state = RPROC_DEFUNCT;
> > >   dev_err(dev, "can't stop rproc: %d\n", ret);
> > >   return ret;
> > >   }
> > > @@ -1839,7 +1840,7 @@ int rproc_trigger_recovery(struct rproc *rproc)
> > >   return ret;
> > >  
> > >   /* State could have change

Re: [PATCH v7 1/3] modules: Support extended MODVERSIONS info

2024-10-25 Thread Matthew Maurer
> Sorry I realise it's version 7, but although the above looks correct it's
> kind of dense.
>
> I think the below would also work and is (I think) easier to follow, and
> is more obviously similar to the existing code. I'm sure your version is
> faster, but I don't think it's that performance critical.
>
> static void dedotify_ext_version_names(char *str_seq, unsigned long size)
> {
> char *end = str_seq + size;
> char *p = str_seq;
>
> while (p < end) {
> if (*p == '.')
> memmove(p, p + 1, end - p - 1);
>
> p += strlen(p) + 1;
> }
> }
>
> The tail of str_seq will be filled with nulls as long as the last string
> was null terminated.
>
> cheers

As you alluded to, what you're providing is potentially O(n^2) in the
number of symbols a module depends on - the existing code is O(n).
If leading dots on names are rare, this is probably fine. If they're
common, this will potentially make loading modules with a large number
of imported symbols actually take a measurable amount of additional
time.

That said, I take your point about complexity, and trust you to know
your arch's inputs/requirements, so if I don't hear back again I will
incorporate that into the next revision of the patch (to be produced
after the gendwarfksyms update comes out).
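
To make the comparison concrete, here is a small userspace sketch that runs
both variants quoted in this thread on the same sample name table and checks
that they agree. The two dedotify functions follow the code in the thread; the
function names, the sample table, and the check helper are assumptions added
for illustration. Both variants assume the table ends with a NUL byte.

```c
#include <assert.h>
#include <string.h>

/* Single pass over the table, as in the v7 patch: O(n) in the table size. */
static void dedotify_single_pass(char *str_seq, unsigned long size)
{
	unsigned long out = 0;
	unsigned long in;
	char last = '\0';

	for (in = 0; in < size; in++) {
		/* Skip one leading dot at the start of each name */
		if (last == '\0' && str_seq[in] == '.')
			in++;
		last = str_seq[in];
		str_seq[out++] = last;
	}
	/* Zero whatever is left over after the names moved down */
	memset(&str_seq[out], 0, size - out);
}

/*
 * memmove()-based variant from the review. Each dotted name shifts the
 * rest of the table down by one byte, so the worst case is quadratic.
 */
static void dedotify_memmove(char *str_seq, unsigned long size)
{
	char *end = str_seq + size;
	char *p = str_seq;

	while (p < end) {
		if (*p == '.')
			memmove(p, p + 1, end - p - 1);

		p += strlen(p) + 1;
	}
}

/* Returns 1 when both variants produce the expected table. */
static int dedotify_variants_agree(void)
{
	char a[] = ".foo\0bar\0.baz";	/* three names, two with a dot */
	char b[sizeof(a)];
	static const char want[sizeof(a)] = "foo\0bar\0baz";

	memcpy(b, a, sizeof(a));
	dedotify_single_pass(a, sizeof(a));
	dedotify_memmove(b, sizeof(b));

	return memcmp(a, want, sizeof(a)) == 0 &&
	       memcmp(b, want, sizeof(b)) == 0;
}
```

The results are identical here; the difference is only in how much data gets
moved when many names carry a leading dot.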



Re: [PATCH v4 01/14] iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct

2024-10-25 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 06:53:01PM +1100, Alexey Kardashevskiy wrote:
> > +#define iommufd_vdevice_alloc(ictx, drv_struct, member)
> > \
> > +   ({ \
> > +   static_assert( \
> > +   __same_type(struct iommufd_vdevice,\
> > +   ((struct drv_struct *)NULL)->member)); \
> > +   static_assert(offsetof(struct drv_struct, member.obj) == 0);   \
> > +   container_of(_iommufd_object_alloc(ictx,   \
> > +  sizeof(struct drv_struct),  \
> > +  IOMMUFD_OBJ_VDEVICE),   \
> > +struct drv_struct, member.obj);   \
> > +   })
> >   #endif
> 
> A nit: it hurts eyes to read:
> 
> mock_vdev = iommufd_vdevice_alloc(viommu->ictx, mock_vdevice, core);
> 
> vs.
> 
> mock_vdev = iommufd_vdevice_alloc(viommu->ictx, struct mock_vdevice, core);
> 
> as for the former I go searching for a "mock_vdevice" variable and for the
> latter it is clear it is 1) a macro 2) which does some type checking.
> 
> also, it makes it impossible to pass things like typeof(..) or a type from
> typedef. Thanks,

Makes sense to me

And the container_of() should not be used in these macros, the point
was to avoid it to make the PTR_ERR behavior clearer. Just put a force
type cast

Jason
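
As a rough userspace sketch of the two suggestions above (pass the full type
name to the macro, and cast instead of using container_of()), something like
the following would compile with GCC or Clang. All demo_* names are
hypothetical stand-ins rather than the iommufd API, and the allocator here
returns NULL on failure where the kernel one would return an ERR_PTR; the
point of the plain cast is that whatever the allocator returns is passed
through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the core object and vdevice structures. */
struct demo_object {
	int type;
};

struct demo_vdevice {
	struct demo_object obj;
	unsigned long id;
};

/* A driver structure embedding the core structure as its first member. */
struct demo_mock_vdevice {
	struct demo_vdevice core;
	int driver_data;
};

#define DEMO_OBJ_VDEVICE 1

static struct demo_object *demo_object_alloc(size_t size, int type)
{
	struct demo_object *obj = calloc(1, size);

	if (obj)
		obj->type = type;
	return obj;
}

/*
 * Takes a complete type name (for example "struct demo_mock_vdevice"),
 * checks the layout at compile time, and force-casts the result.
 * Uses GNU statement expressions, like the kernel macro it imitates.
 */
#define demo_vdevice_alloc(drv_type, member)				\
	({								\
		_Static_assert(__builtin_types_compatible_p(		\
				struct demo_vdevice,			\
				__typeof__(((drv_type *)NULL)->member)),\
			"member must be a struct demo_vdevice");	\
		_Static_assert(offsetof(drv_type, member.obj) == 0,	\
			"core object must sit at offset 0");		\
		(drv_type *)demo_object_alloc(sizeof(drv_type),		\
					      DEMO_OBJ_VDEVICE);	\
	})

/* Returns 1 if the allocation yields a correctly laid out object. */
static int demo_alloc_check(void)
{
	struct demo_mock_vdevice *vdev =
		demo_vdevice_alloc(struct demo_mock_vdevice, core);
	int ok;

	assert(vdev != NULL);
	ok = (void *)vdev == (void *)&vdev->core.obj &&
	     vdev->core.obj.type == DEMO_OBJ_VDEVICE;
	free(vdev);
	return ok;
}
```

With the type spelled out at the call site, typeof() expressions and typedef
names can be passed as well.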



Re: [PATCH net-next 1/2] net: netconsole: selftests: Change the IP subnet

2024-10-25 Thread Petr Machata


Breno Leitao  writes:

> Use a less populated IP range to run the tests, as suggested by Petr in
> Link: https://lore.kernel.org/netdev/87ikvukv3s@nvidia.com/.
>
> Suggested-by: Petr Machata 
> Signed-off-by: Breno Leitao 
> ---
>  tools/testing/selftests/drivers/net/netcons_basic.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh
> index 06021b2059b7..4ad1e216c6b0 100755
> --- a/tools/testing/selftests/drivers/net/netcons_basic.sh
> +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh
> @@ -20,9 +20,9 @@ SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
>  
>  # Simple script to test dynamic targets in netconsole
>  SRCIF="" # to be populated later
> -SRCIP=192.168.1.1
> +SRCIP=192.168.2.1

I mentioned 192.0.2.0/24, which we commonly use in selftests. The range
is meant for examples and documentation, which is not exactly selftests,
but feels like it's not bending the rules too far. And we shouldn't see
the range in the wild.

>  DSTIF="" # to be populated later
> -DSTIP=192.168.1.2
> +DSTIP=192.168.2.2
>  
>  PORT=""
>  MSG="netconsole selftest"




[PATCH v5 09/13] iommufd/selftest: Add refcount to mock_iommu_device

2024-10-25 Thread Nicolin Chen
For an iommu_dev that can unplug (so far only this selftest does so), the
viommu->iommu_dev pointer has no guarantee of its life cycle after it is
copied from the idev->dev->iommu->iommu_dev.

Track the user count of the iommu_dev. Postpone the exit routine using a
completion, if refcount is unbalanced. The refcount inc/dec will be added
in the following patch.

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/iommufd/selftest.c | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 92d753985640..2d33b35da704 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -533,14 +533,17 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
 
 static struct iopf_queue *mock_iommu_iopf_queue;
 
-static struct iommu_device mock_iommu_device = {
-};
+static struct mock_iommu_device {
+   struct iommu_device iommu_dev;
+   struct completion complete;
+   refcount_t users;
+} mock_iommu;
 
 static struct iommu_device *mock_probe_device(struct device *dev)
 {
if (dev->bus != &iommufd_mock_bus_type.bus)
return ERR_PTR(-ENODEV);
-   return &mock_iommu_device;
+   return &mock_iommu.iommu_dev;
 }
 
 static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt,
@@ -1556,24 +1559,27 @@ int __init iommufd_test_init(void)
if (rc)
goto err_platform;
 
-   rc = iommu_device_sysfs_add(&mock_iommu_device,
+   rc = iommu_device_sysfs_add(&mock_iommu.iommu_dev,
&selftest_iommu_dev->dev, NULL, "%s",
dev_name(&selftest_iommu_dev->dev));
if (rc)
goto err_bus;
 
-   rc = iommu_device_register_bus(&mock_iommu_device, &mock_ops,
+   rc = iommu_device_register_bus(&mock_iommu.iommu_dev, &mock_ops,
  &iommufd_mock_bus_type.bus,
  &iommufd_mock_bus_type.nb);
if (rc)
goto err_sysfs;
 
+   refcount_set(&mock_iommu.users, 1);
+   init_completion(&mock_iommu.complete);
+
mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
 
return 0;
 
 err_sysfs:
-   iommu_device_sysfs_remove(&mock_iommu_device);
+   iommu_device_sysfs_remove(&mock_iommu.iommu_dev);
 err_bus:
bus_unregister(&iommufd_mock_bus_type.bus);
 err_platform:
@@ -1583,6 +1589,15 @@ int __init iommufd_test_init(void)
return rc;
 }
 
+static void iommufd_test_wait_for_users(void)
+{
+   if (refcount_dec_and_test(&mock_iommu.users))
+   return;
+   /* Time out waiting for iommu device user count to become 0 */
+   WARN_ON(!wait_for_completion_timeout(&mock_iommu.complete,
+msecs_to_jiffies(1)));
+}
+
 void iommufd_test_exit(void)
 {
if (mock_iommu_iopf_queue) {
@@ -1590,8 +1605,9 @@ void iommufd_test_exit(void)
mock_iommu_iopf_queue = NULL;
}
 
-   iommu_device_sysfs_remove(&mock_iommu_device);
-   iommu_device_unregister_bus(&mock_iommu_device,
+   iommufd_test_wait_for_users();
+   iommu_device_sysfs_remove(&mock_iommu.iommu_dev);
+   iommu_device_unregister_bus(&mock_iommu.iommu_dev,
&iommufd_mock_bus_type.bus,
&iommufd_mock_bus_type.nb);
bus_unregister(&iommufd_mock_bus_type.bus);
-- 
2.43.0




[PATCH v5 08/13] iommufd/selftest: Add mock_viommu_cache_invalidate

2024-10-25 Thread Nicolin Chen
Similar to the coverage of cache_invalidate_user for iotlb invalidation,
add a device cache and a viommu_cache_invalidate function to test it out.

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/iommufd/iommufd_test.h | 25 +
 drivers/iommu/iommufd/selftest.c | 76 +++-
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index edced4ac7cd3..46558f83e734 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -54,6 +54,11 @@ enum {
MOCK_NESTED_DOMAIN_IOTLB_NUM = 4,
 };
 
+enum {
+   MOCK_DEV_CACHE_ID_MAX = 3,
+   MOCK_DEV_CACHE_NUM = 4,
+};
+
 struct iommu_test_cmd {
__u32 size;
__u32 op;
@@ -152,6 +157,7 @@ struct iommu_test_hw_info {
 /* Should not be equal to any defined value in enum iommu_hwpt_data_type */
 #define IOMMU_HWPT_DATA_SELFTEST 0xdead
 #define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef
+#define IOMMU_TEST_DEV_CACHE_DEFAULT 0xbaddad
 
 /**
  * struct iommu_hwpt_selftest
@@ -182,4 +188,23 @@ struct iommu_hwpt_invalidate_selftest {
 
 #define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef
 
+/* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */
+#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef
+#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef
+
+/**
+ * struct iommu_viommu_invalidate_selftest - Invalidation data for Mock VIOMMU
+ *   (IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST)
+ * @flags: Invalidate flags
+ * @cache_id: Invalidate cache entry index
+ *
+ * If IOMMU_TEST_INVALIDATE_FLAG_ALL is set in @flags, @cache_id will be ignored
+ */
+struct iommu_viommu_invalidate_selftest {
+#define IOMMU_TEST_INVALIDATE_FLAG_ALL (1 << 0)
+   __u32 flags;
+   __u32 vdev_id;
+   __u32 cache_id;
+};
+
 #endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 33a0fcc0eff7..01556854f2f2 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -163,6 +163,7 @@ struct mock_dev {
struct device dev;
unsigned long flags;
int id;
+   u32 cache[MOCK_DEV_CACHE_NUM];
 };
 
 static inline struct mock_dev *to_mock_dev(struct device *dev)
@@ -606,9 +607,80 @@ mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu,
return &mock_nested->domain;
 }
 
+static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu,
+   struct iommu_user_data_array *array)
+{
+   struct iommu_viommu_invalidate_selftest *cmds;
+   struct iommu_viommu_invalidate_selftest *cur;
+   struct iommu_viommu_invalidate_selftest *end;
+   int rc;
+
+   /* A zero-length array is allowed to validate the array type */
+   if (array->entry_num == 0 &&
+   array->type == IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) {
+   array->entry_num = 0;
+   return 0;
+   }
+
+   cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
+   if (!cmds)
+   return -ENOMEM;
+   cur = cmds;
+   end = cmds + array->entry_num;
+
+   static_assert(sizeof(*cmds) == 3 * sizeof(u32));
+   rc = iommu_copy_struct_from_full_user_array(
+   cmds, sizeof(*cmds), array,
+   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST);
+   if (rc)
+   goto out;
+
+   while (cur != end) {
+   struct mock_dev *mdev;
+   struct device *dev;
+   int i;
+
+   if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+   rc = -EOPNOTSUPP;
+   goto out;
+   }
+
+   if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) {
+   rc = -EINVAL;
+   goto out;
+   }
+
+   xa_lock(&viommu->vdevs);
+   dev = iommufd_viommu_find_dev(viommu,
+ (unsigned long)cur->vdev_id);
+   if (!dev) {
+   xa_unlock(&viommu->vdevs);
+   rc = -EINVAL;
+   goto out;
+   }
+   mdev = container_of(dev, struct mock_dev, dev);
+
+   if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+   /* Invalidate all cache entries and ignore cache_id */
+   for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
+   mdev->cache[i] = 0;
+   } else {
+   mdev->cache[cur->cache_id] = 0;
+   }
+   xa_unlock(&viommu->vdevs);
+
+   cur++;
+   }
+out:
+   array->entry_num = cur - cmds;
+   kfree(cmds);
+   return rc;
+}
+
 static struct iommufd_viommu_ops mock_viommu_ops = {
.free = mock_viommu_free,
.alloc_domain_nested = mock_viommu_all

Re: [PATCH v4 4/4] selftests: pidfd: add tests for PIDFD_SELF_*

2024-10-25 Thread Lorenzo Stoakes
flock64 {
>   |^~~
> /usr/include/x86_64-linux-gnu/bits/fcntl.h:50:8: note: originally defined here
>50 | struct flock64
>   |^~~
> make: *** [../lib.mk:221: 
> /usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup/test_kill]
>  Error 1
> make: *** Waiting for unfinished jobs
> make: Leaving directory 
> '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup'
> 2024-10-23 12:53:56 make quicktest=1 run_tests -C cgroup
> make: Entering directory 
> '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup'
>   CC   test_kill
> In file included from /usr/x86_64-linux-gnu/include/asm/fcntl.h:1,
>  from /usr/x86_64-linux-gnu/include/linux/fcntl.h:5,
>  from /usr/x86_64-linux-gnu/include/linux/pidfd.h:7,
>  from ../pidfd/pidfd.h:19,
>  from test_kill.c:13:
> /usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:156:8: error: redefinition 
> of ‘struct f_owner_ex’
>   156 | struct f_owner_ex {
>   |^~
> In file included from /usr/include/x86_64-linux-gnu/bits/fcntl.h:61,
>  from /usr/include/fcntl.h:35,
>  from ../pidfd/pidfd.h:8:
> /usr/include/x86_64-linux-gnu/bits/fcntl-linux.h:274:8: note: originally 
> defined here
>   274 | struct f_owner_ex
>   |^~
> /usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:196:8: error: redefinition 
> of ‘struct flock’
>   196 | struct flock {
>   |^
> /usr/include/x86_64-linux-gnu/bits/fcntl.h:35:8: note: originally defined here
>35 | struct flock
>   |^
> /usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:210:8: error: redefinition 
> of ‘struct flock64’
>   210 | struct flock64 {
>   |^~~
> /usr/include/x86_64-linux-gnu/bits/fcntl.h:50:8: note: originally defined here
>50 | struct flock64
>   |^~~
> make: *** [../lib.mk:222: 
> /usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup/test_kill]
>  Error 1
> make: Leaving directory 
> '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup'
>
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20241025/202410251504.707d78fc-oliver.s...@intel.com
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>



Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process

2024-10-25 Thread Pedro Falcato
On Fri, Oct 25, 2024 at 10:41 AM Lorenzo Stoakes
 wrote:
>
> It is useful to be able to utilise the pidfd mechanism to reference the
> current thread or process (from a userland point of view - thread group
> leader from the kernel's point of view).
>
> Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
> PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
>
> For convenience and to avoid confusion from userland's perspective we alias
> these:
>
> * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
>   the user will want to use, as they would find it surprising if for
>   instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
>   and that failed.
>
> * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
>   have no concept of thread groups or what a thread group leader is, and
>   from userland's perspective and nomenclature this is what userland
>   considers to be a process.
>
> Due to the refactoring of the central __pidfd_get_pid() function we can
> implement this functionality centrally, providing the use of this sentinel
> in most functionality which utilises pidfd's.
>
> We need to explicitly adjust kernel_waitid_prepare() to permit this (though
> it wouldn't really make sense to use this there, we provide the ability for
> consistency).
>
> We explicitly disallow use of this in setns(), which would otherwise have
> required explicit custom handling, as it doesn't make sense to set the
> current calling thread to join the namespace of itself.
>
> As the callers of pidfd_get_pid() expect an increased reference count on
> the pid we do so in the self case, reducing churn and avoiding any breakage
> from existing logic which decrements this reference count.
>
> This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFS,
> ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and
> pidfd_getfd() system calls.
>
> Things such as polling a pidfs and general fd operations are not supported,
> this strictly provides the sentinel for APIs which explicitly accept a
> pidfd.
>
> Reviewed-by: Shakeel Butt 
> Signed-off-by: Lorenzo Stoakes 
> ---
>  include/linux/pid.h|  8 --
>  include/uapi/linux/pidfd.h | 15 +++
>  kernel/exit.c  |  3 ++-
>  kernel/nsproxy.c   |  1 +
>  kernel/pid.c   | 51 --
>  5 files changed, 57 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index d466890e1b35..3b2ac7567a88 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -78,11 +78,15 @@ struct file;
>   * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
>   *
>   * @pidfd:  The pidfd whose pid we want, or the fd of a /proc/ file 
> if
> - *  @alloc_proc is also set.
> + *  @alloc_proc is also set, or PIDFD_SELF_* to refer to the 
> current
> + *  thread or thread group leader.
>   * @allow_proc: If set, then an fd of a /proc/ file can be passed 
> instead
>   *  of a pidfd, and this will be used to determine the pid.
> +
>   * @flags:  Output variable, if non-NULL, then the file->f_flags of the
> - *  pidfd will be set here.
> + *  pidfd will be set here or If PIDFD_SELF_THREAD is set, this 
> is
> + *  set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP 
> then
> + *  this is set to zero.
>   *
>   * Returns: If successful, the pid associated with the pidfd, otherwise an
>   *  error.
> diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> index 565fc0629fff..0ca2ebf906fd 100644
> --- a/include/uapi/linux/pidfd.h
> +++ b/include/uapi/linux/pidfd.h
> @@ -29,4 +29,19 @@
>  #define PIDFD_GET_USER_NAMESPACE  _IO(PIDFS_IOCTL_MAGIC, 9)
>  #define PIDFD_GET_UTS_NAMESPACE   _IO(PIDFS_IOCTL_MAGIC, 10)
>
> +/*
> + * Special sentinel values which can be used to refer to the current thread 
> or
> + * thread group leader (which from a userland perspective is the process).
> + */
> +#define PIDFD_SELF PIDFD_SELF_THREAD
> +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP
> +
> +#define PIDFD_SELF_THREAD  -100 /* Current thread. */

This conflicts with AT_FDCWD, might be worth changing?

> +#define PIDFD_SELF_THREAD_GROUP-200 /* Current thread group leader. 
> */

We might want to pick some range outside of the negative errno space
(-4096 IIRC), since we have plenty of values to pick from (2^31 at
least).
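
The two concerns above can be checked mechanically. The sketch below is only
an illustration: AT_FDCWD comes from <fcntl.h> on Linux, 4095 is the kernel's
MAX_ERRNO, and the demo_* helpers as well as the candidate value -10000 are
hypothetical names and numbers picked for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>

/* The kernel reserves the top 4095 negative values for error codes. */
#define DEMO_MAX_ERRNO 4095

/* Hypothetical replacement value, chosen only for illustration. */
#define DEMO_SENTINEL_CANDIDATE (-10000)

/* True if the value would be mistaken for AT_FDCWD. */
static int demo_collides_with_at_fdcwd(int fd)
{
	return fd == AT_FDCWD;
}

/* True if the value falls inside the negative errno space. */
static int demo_in_errno_range(int fd)
{
	return fd < 0 && fd >= -DEMO_MAX_ERRNO;
}

/* A sentinel is usable if it is negative and hits neither problem. */
static int demo_sentinel_is_safe(int fd)
{
	return fd < 0 && !demo_collides_with_at_fdcwd(fd) &&
	       !demo_in_errno_range(fd);
}
```

Under these checks both -100 and -200 from the patch are rejected, while a
value below -4095 passes.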

> +static inline int pidfd_is_self_sentinel(pid_t pid)
> +{
> +   return pid == PIDFD_SELF_THREAD || pid == PIDFD_SELF_THREAD_GROUP;
> +}

Do we want this in the uapi header? Even if this is useful, it might
come with several drawbacks such as breaking scripts that parse kernel
headers (and a quick git grep suggests we do have static inlines in
headers, but in ra

Re: [PATCH V4 00/15] selftests/resctrl: Support diverse platforms with MBM and MBA tests

2024-10-25 Thread Ilpo Järvinen
On Thu, 24 Oct 2024, Reinette Chatre wrote:

> Hi Shuah,
> 
> On 10/24/24 3:36 PM, Shuah Khan wrote:
> > 
> > Is this patch series ready to be applied?
> > 
> 
> I believe it is close ... I would like to give Ilpo some time to peek
> at patches 2 and 10 to confirm if I got their fixes right this time. The
> rest of the series is ready.

Hi,

I took a look at those two patches now and they seemed fine to me so this 
series should be ready to go now.

-- 
 i.




Re: [PATCH v7 1/3] modules: Support extended MODVERSIONS info

2024-10-25 Thread Michael Ellerman
Matthew Maurer  writes:
> Adds a new format for MODVERSIONS which stores each field in a separate
> ELF section. This initially adds support for variable length names, but
> could later be used to add additional fields to MODVERSIONS in a
> backwards compatible way if needed. Any new fields will be ignored by
> old user tooling, unlike the current format where user tooling cannot
> tolerate adjustments to the format (for example making the name field
> longer).
>
> Since PPC munges its version records to strip leading dots, we reproduce
> the munging for the new format. Other architectures do not appear to
> have architecture-specific usage of this information.
>
> Signed-off-by: Matthew Maurer 
> ---
>  arch/powerpc/kernel/module_64.c | 24 ++-
>  kernel/module/internal.h| 11 +
>  kernel/module/main.c| 92 
> +
>  kernel/module/version.c | 45 
>  4 files changed, 162 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 
> e9bab599d0c2745e4d2b5cae04f2c56395c24654..02ada0b057cef6b2f29fa7519a5d52acac740ee5
>  100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -355,6 +355,24 @@ static void dedotify_versions(struct modversion_info 
> *vers,
>   }
>  }
>  
> +/* Same as normal versions, remove a leading dot if present. */
> +static void dedotify_ext_version_names(char *str_seq, unsigned long size)
> +{
> + unsigned long out = 0;
> + unsigned long in;
> + char last = '\0';
> +
> + for (in = 0; in < size; in++) {
> + /* Skip one leading dot */
> + if (last == '\0' && str_seq[in] == '.')
> + in++;
> + last = str_seq[in];
> + str_seq[out++] = last;
> + }
> + /* Zero the trailing portion of the names table for robustness */
> + memset(&str_seq[out], 0, size - out);
> +}

Sorry I realise it's version 7, but although the above looks correct it's
kind of dense.

I think the below would also work and is (I think) easier to follow, and
is more obviously similar to the existing code. I'm sure your version is
faster, but I don't think it's that performance critical.

static void dedotify_ext_version_names(char *str_seq, unsigned long size)
{
char *end = str_seq + size;
char *p = str_seq;

while (p < end) {
if (*p == '.')
memmove(p, p + 1, end - p - 1);

p += strlen(p) + 1;
}
}

The tail of str_seq will be filled with nulls as long as the last string
was null terminated.

cheers



[PATCH v5 02/13] iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage

2024-10-25 Thread Nicolin Chen
Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage of
IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc TEST_F
to cover the IOMMU_VDEVICE_ALLOC ioctl.

Signed-off-by: Nicolin Chen 
---
 tools/testing/selftests/iommu/iommufd_utils.h | 27 +++
 tools/testing/selftests/iommu/iommufd.c   | 20 ++
 .../selftests/iommu/iommufd_fail_nth.c|  4 +++
 3 files changed, 51 insertions(+)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index ca09308dad6a..5b17d7b2ac5c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -790,3 +790,30 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id,
EXPECT_ERRNO(_errno,   \
 _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id,  \
type, 0, viommu_id))
+
+static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id,
+  __u64 virt_id, __u32 *vdev_id)
+{
+   struct iommu_vdevice_alloc cmd = {
+   .size = sizeof(cmd),
+   .dev_id = idev_id,
+   .viommu_id = viommu_id,
+   .virt_id = virt_id,
+   };
+   int ret;
+
+   ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &cmd);
+   if (ret)
+   return ret;
+   if (vdev_id)
+   *vdev_id = cmd.out_vdevice_id;
+   return 0;
+}
+
+#define test_cmd_vdevice_alloc(viommu_id, idev_id, virt_id, vdev_id)   \
+   ASSERT_EQ(0, _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \
+virt_id, vdev_id))
+#define test_err_vdevice_alloc(_errno, viommu_id, idev_id, virt_id, vdev_id) \
+   EXPECT_ERRNO(_errno, \
+_test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id,   \
+virt_id, vdev_id))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index b48b22d33ad4..93255403dee4 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -129,6 +129,7 @@ TEST_F(iommufd, cmd_length)
TEST_LENGTH(iommu_option, IOMMU_OPTION, val64);
TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved);
TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id);
+   TEST_LENGTH(iommu_vdevice_alloc, IOMMU_VDEVICE_ALLOC, __reserved2);
 #undef TEST_LENGTH
 }
 
@@ -2473,4 +2474,23 @@ TEST_F(iommufd_viommu, viommu_auto_destroy)
 {
 }
 
+TEST_F(iommufd_viommu, vdevice_alloc)
+{
+   uint32_t viommu_id = self->viommu_id;
+   uint32_t dev_id = self->device_id;
+   uint32_t vdev_id = 0;
+
+   if (dev_id) {
+   /* Set vdev_id to 0x99, unset it, and set to 0x88 */
+   test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+   test_err_vdevice_alloc(EEXIST, viommu_id, dev_id, 0x99,
+  &vdev_id);
+   test_ioctl_destroy(vdev_id);
+   test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id);
+   test_ioctl_destroy(vdev_id);
+   } else {
+   test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL);
+   }
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index e9a980b7729b..28f11b26f836 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -583,6 +583,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
uint32_t idev_id;
uint32_t hwpt_id;
uint32_t viommu_id;
+   uint32_t vdev_id;
__u64 iova;
 
self->fd = open("/dev/iommu", O_RDWR);
@@ -635,6 +636,9 @@ TEST_FAIL_NTH(basic_fail_nth, device)
   IOMMU_VIOMMU_TYPE_SELFTEST, 0, &viommu_id))
return -1;
 
+   if (_test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, 0, &vdev_id))
+   return -1;
+
return 0;
 }
 
-- 
2.43.0




Re: [PATCH net-next 1/2] net: netconsole: selftests: Change the IP subnet

2024-10-25 Thread Breno Leitao
On Fri, Oct 25, 2024 at 07:01:59PM +0200, Petr Machata wrote:
> 
> Breno Leitao  writes:
> 
> > Use a less populated IP range to run the tests, as suggested by Petr in
> > Link: https://lore.kernel.org/netdev/87ikvukv3s@nvidia.com/.
> >
> > Suggested-by: Petr Machata 
> > Signed-off-by: Breno Leitao 
> > ---
> >  tools/testing/selftests/drivers/net/netcons_basic.sh | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh
> > index 06021b2059b7..4ad1e216c6b0 100755
> > --- a/tools/testing/selftests/drivers/net/netcons_basic.sh
> > +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh
> > @@ -20,9 +20,9 @@ SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
> >  
> >  # Simple script to test dynamic targets in netconsole
> >  SRCIF="" # to be populated later
> > -SRCIP=192.168.1.1
> > +SRCIP=192.168.2.1
> 
> I mentioned 192.0.2.0/24, which we commonly use in selftests. The range
> is meant for examples and documentation, which is not exactly selftests,
> but feels like it's not bending the rules too far. And we shouldn't see
> the range in the wild.

True, my mistake. I will update it to 192.0.2.1 and 192.0.2.2.
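
For reference, 192.0.2.0/24 is the TEST-NET-1 documentation range from
RFC 5737. A minimal sketch of a membership check for that range is shown
below; the demo_* helpers are hypothetical and not part of the selftest.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Returns 1 if addr is inside 192.0.2.0/24, 0 if it is outside,
 * and -1 if it does not parse as a dotted-quad IPv4 address.
 */
static int demo_in_test_net_1(const char *addr)
{
	struct in_addr in;
	uint32_t host;

	if (inet_pton(AF_INET, addr, &in) != 1)
		return -1;

	host = ntohl(in.s_addr);
	/* 192.0.2.0 is 0xc0000200; a /24 keeps the top 24 bits */
	return (host & 0xffffff00u) == 0xc0000200u;
}

/* Checks the addresses discussed in this thread. */
static void demo_check_selftest_addrs(void)
{
	assert(demo_in_test_net_1("192.0.2.1") == 1);
	assert(demo_in_test_net_1("192.0.2.2") == 1);
	assert(demo_in_test_net_1("192.168.2.1") == 0);
}
```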



[PATCH RFC v2 0/5] Verify bias functionality for pinctrl_paris driver through new gpio test

2024-10-25 Thread Nícolas F . R . A . Prado
This series was motivated by the regression fixed by 166bf8af9122
("pinctrl: mediatek: common-v2: Fix broken bias-disable for
PULL_PU_PD_RSEL_TYPE"). A bug was introduced in the pinctrl_paris driver
which prevented certain pins from having their bias configured.

Running this test on the mt8195-tomato platform with the test plan
included below[1] shows the test passing with the fix applied, but failing
without the fix:

With fix:
  $ ./gpio-setget-config.py
  TAP version 13
  # Using test plan file: ./google,tomato.yaml
  1..3
  ok 1 pinctrl_paris.34.pull-up
  ok 2 pinctrl_paris.34.pull-down
  ok 3 pinctrl_paris.34.disabled
  # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Without fix:
  $ ./gpio-setget-config.py
  TAP version 13
  # Using test plan file: ./google,tomato.yaml
  1..3
  # Bias doesn't match: Expected pull-up, read pull-down.
  not ok 1 pinctrl_paris.34.pull-up
  ok 2 pinctrl_paris.34.pull-down
  # Bias doesn't match: Expected disabled, read pull-down.
  not ok 3 pinctrl_paris.34.disabled
  # Totals: pass:1 fail:2 xfail:0 xpass:0 skip:0 error:0

In order to achieve this, the first three patches expose bias
configuration through the GPIO API in the MediaTek pinctrl drivers,
notably pinctrl_paris. Patch 4 extends the gpio-mockup-cdev utility for
use by patch 5. Patch 5 introduces a new GPIO kselftest that takes a
test plan in YAML, which can be tailored per-platform to specify the
configurations to test, and sets and gets back each pin configuration to
verify that they match and thus that the driver is behaving as expected.

Since the GPIO uAPI only allows setting the pin configuration, getting
it back is done through pinconf-pins in the pinctrl debugfs folder.

The test currently only verifies bias but it would be easy to extend to
verify other pin configurations.

The test plan YAML file can be customized for each use-case and is
platform-dependent. For that reason, only an example is included in
patch 3 and the user is supposed to provide their test plan. That said,
the aim is to collect test plans for ease of use at [2].

[1] This is the test plan used for mt8195-tomato:

- label: "pinctrl_paris"
  tests:
  # Pin 34 has type MTK_PULL_PU_PD_RSEL_TYPE and is unused.
  # Setting bias to MTK_PULL_PU_PD_RSEL_TYPE pins was fixed by
  # 166bf8af9122 ("pinctrl: mediatek: common-v2: Fix broken bias-disable for PULL_PU_PD_RSEL_TYPE")
  - pin: 34
    bias: "pull-up"
  - pin: 34
    bias: "pull-down"
  - pin: 34
    bias: "disabled"

[2] https://github.com/kernelci/platform-test-parameters

Signed-off-by: Nícolas F. R. A. Prado 
---
Changes in v2:
- Added patches 2 and 3 enabling the extra GPIO pin configurations on
  the other mediatek drivers: pinctrl-moore and pinctrl-mtk-common
- Tweaked function name in patch 1:
  mtk_pinconf_set -> mtk_paris_pin_config_set,
  to make it clear it is not a pinconf_ops
- Adjusted commit message to make it clear the current support is
  limited to pins supported by the EINT controller
- Link to v1: https://lore.kernel.org/r/20240909-kselftest-gpio-set-get-config-v1-0-16a065afc...@collabora.com

---
Nícolas F. R. A. Prado (5):
  pinctrl: mediatek: paris: Expose more configurations to GPIO set_config
  pinctrl: mediatek: moore: Expose more configurations to GPIO set_config
  pinctrl: mediatek: common: Expose more configurations to GPIO set_config
  selftest: gpio: Add wait flag to gpio-mockup-cdev
  selftest: gpio: Add a new set-get config test

 drivers/pinctrl/mediatek/pinctrl-moore.c   | 283 +++--
 drivers/pinctrl/mediatek/pinctrl-mtk-common.c  |  48 ++--
 drivers/pinctrl/mediatek/pinctrl-paris.c   |  26 +-
 tools/testing/selftests/gpio/Makefile  |   2 +-
 tools/testing/selftests/gpio/gpio-mockup-cdev.c|  14 +-
 .../gpio-set-get-config-example-test-plan.yaml |  15 ++
 .../testing/selftests/gpio/gpio-set-get-config.py  | 183 +
 7 files changed, 395 insertions(+), 176 deletions(-)
---
base-commit: a39230ecf6b3057f5897bc4744a790070cfbe7a8
change-id: 20240906-kselftest-gpio-set-get-config-6e5bb670c1a5

Best regards,
-- 
Nícolas F. R. A. Prado 




[PATCH RFC v2 1/5] pinctrl: mediatek: paris: Expose more configurations to GPIO set_config

2024-10-25 Thread Nícolas F . R . A . Prado
Currently the set_config callback in the gpio_chip registered by the
pinctrl_paris driver only supports configuring a single parameter on
specific pins (the input debounce of the EINT controller, on pins that
support it), even though many other configurations are already
implemented and available through the pinctrl API for configuration of
pins by the Devicetree and other drivers.

Expose all configurations currently implemented through the GPIO API so
they can also be set from userspace, which is particularly useful for
testing them.

Signed-off-by: Nícolas F. R. A. Prado 
---
 drivers/pinctrl/mediatek/pinctrl-paris.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/pinctrl/mediatek/pinctrl-paris.c b/drivers/pinctrl/mediatek/pinctrl-paris.c
index 87e958d827bf939aa6006794287698be4936f25e..c9455de266a447ab7f5446c1511bef0ef9c9128e 100644
--- a/drivers/pinctrl/mediatek/pinctrl-paris.c
+++ b/drivers/pinctrl/mediatek/pinctrl-paris.c
@@ -255,10 +255,9 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev,
return err;
 }
 
-static int mtk_pinconf_set(struct pinctrl_dev *pctldev, unsigned int pin,
-  enum pin_config_param param, u32 arg)
+static int mtk_paris_pin_config_set(struct mtk_pinctrl *hw, unsigned int pin,
+   enum pin_config_param param, u32 arg)
 {
-   struct mtk_pinctrl *hw = pinctrl_dev_get_drvdata(pctldev);
const struct mtk_pin_desc *desc;
int err = -ENOTSUPP;
u32 reg;
@@ -795,9 +794,9 @@ static int mtk_pconf_group_set(struct pinctrl_dev *pctldev, unsigned group,
int i, ret;
 
for (i = 0; i < num_configs; i++) {
-   ret = mtk_pinconf_set(pctldev, grp->pin,
- pinconf_to_config_param(configs[i]),
- pinconf_to_config_argument(configs[i]));
+   ret = mtk_paris_pin_config_set(hw, grp->pin,
+                                  pinconf_to_config_param(configs[i]),
+                                  pinconf_to_config_argument(configs[i]));
if (ret < 0)
return ret;
 
@@ -937,18 +936,19 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, unsigned int offset,
 {
struct mtk_pinctrl *hw = gpiochip_get_data(chip);
const struct mtk_pin_desc *desc;
-   u32 debounce;
+   enum pin_config_param param = pinconf_to_config_param(config);
+   u32 arg = pinconf_to_config_argument(config);
 
desc = (const struct mtk_pin_desc *)&hw->soc->pins[offset];
 
-   if (!hw->eint ||
-   pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE ||
-   desc->eint.eint_n == EINT_NA)
-   return -ENOTSUPP;
+   if (param == PIN_CONFIG_INPUT_DEBOUNCE) {
+   if (!hw->eint || desc->eint.eint_n == EINT_NA)
+   return -ENOTSUPP;
 
-   debounce = pinconf_to_config_argument(config);
+   return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, arg);
+   }
 
-   return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, debounce);
+   return mtk_paris_pin_config_set(hw, offset, param, arg);
 }
 
 static int mtk_build_gpiochip(struct mtk_pinctrl *hw)

-- 
2.47.0
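
The dispatch introduced in mtk_gpio_set_config() can be modelled outside
the kernel. The sketch below is Python, not driver code: the parameter
numbers are made up, the returned strings are placeholders, and the
packing (parameter in the low 8 bits, argument in the bits above) is how
the generic pinconf helpers are understood to work.

```python
# Hypothetical parameter numbers; the real ones come from
# enum pin_config_param and are not stable across kernel versions.
PARAM_BIAS_PULL_UP = 5
PARAM_INPUT_DEBOUNCE = 10


def pack(param, arg):
    """Pack a parameter and its argument into one config word (assumed layout)."""
    return (arg << 8) | (param & 0xFF)


def unpack(config):
    """Split a config word back into (param, arg)."""
    return config & 0xFF, (config >> 8) & 0xFFFFFF


def set_config(config, has_eint=True):
    """Model of the patched callback: debounce goes to EINT, the rest to pinconf."""
    param, arg = unpack(config)
    if param == PARAM_INPUT_DEBOUNCE:
        if not has_eint:
            return "ENOTSUPP"
        return f"eint debounce={arg}"
    # Before the patch every non-debounce parameter was rejected here.
    return f"pinconf param={param} arg={arg}"


print(set_config(pack(PARAM_INPUT_DEBOUNCE, 5)))
print(set_config(pack(PARAM_BIAS_PULL_UP, 1)))
```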




[PATCH RFC v2 4/5] selftest: gpio: Add wait flag to gpio-mockup-cdev

2024-10-25 Thread Nícolas F . R . A . Prado
Add a -w flag to the gpio-mockup-cdev utility that causes the program to
wait until a signal is received before exiting, even when its behavior
is to retrieve the GPIO value of the line. This allows using this
utility to keep a GPIO line configured even when in input mode, which
will be relied on in other tests.

Signed-off-by: Nícolas F. R. A. Prado 
---
 tools/testing/selftests/gpio/gpio-mockup-cdev.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/gpio/gpio-mockup-cdev.c b/tools/testing/selftests/gpio/gpio-mockup-cdev.c
index d1640f44f8ac2a6fda7a5f75605f83fcaa165dc0..f674dcafa60a02cb1739f3cfae8963dc09efba74 100644
--- a/tools/testing/selftests/gpio/gpio-mockup-cdev.c
+++ b/tools/testing/selftests/gpio/gpio-mockup-cdev.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CONSUMER   "gpio-mockup-cdev"
 
@@ -95,6 +96,7 @@ static void usage(char *prog)
printf("   (default is to leave bias unchanged):\n");
printf("-l: set line active low (default is active high)\n");
printf("-s: set line value (default is to get line value)\n");
+   printf("-w: wait even in get mode\n");
printf("-u: uAPI version to use (default is 2)\n");
exit(-1);
 }
@@ -120,13 +122,14 @@ int main(int argc, char *argv[])
unsigned int offset, val = 0, abiv;
uint32_t flags_v1;
uint64_t flags_v2;
+   bool wait = false;
 
abiv = 2;
ret = 0;
flags_v1 = GPIOHANDLE_REQUEST_INPUT;
flags_v2 = GPIO_V2_LINE_FLAG_INPUT;
 
-   while ((opt = getopt(argc, argv, "lb:s:u:")) != -1) {
+   while ((opt = getopt(argc, argv, "lb:s:u:w")) != -1) {
switch (opt) {
case 'l':
flags_v1 |= GPIOHANDLE_REQUEST_ACTIVE_LOW;
@@ -150,10 +153,14 @@ int main(int argc, char *argv[])
flags_v1 |= GPIOHANDLE_REQUEST_OUTPUT;
flags_v2 &= ~GPIO_V2_LINE_FLAG_INPUT;
flags_v2 |= GPIO_V2_LINE_FLAG_OUTPUT;
+   wait = true;
break;
case 'u':
abiv = atoi(optarg);
break;
+   case 'w':
+   wait = true;
+   break;
default:
usage(argv[0]);
}
@@ -183,9 +190,10 @@ int main(int argc, char *argv[])
return lfd;
}
 
-   if (flags_v2 & GPIO_V2_LINE_FLAG_OUTPUT) {
+   if (wait)
wait_signal();
-   } else {
+
+   if (flags_v2 & GPIO_V2_LINE_FLAG_INPUT) {
if (abiv == 1)
ret = get_value_v1(lfd);
else

-- 
2.47.0
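
The option handling in the patch above can be modelled in a few lines.
This is a Python sketch of the flag logic only, using the same option
string; parse_flags is a hypothetical helper, and no GPIO line is
actually requested:

```python
import getopt


def parse_flags(argv):
    """Model of the patched option loop: returns (is_output, wait)."""
    is_output = False
    wait = False
    opts, _rest = getopt.getopt(argv, "lb:s:u:w")
    for opt, _val in opts:
        if opt == "-s":
            # Setting a value switches the line to output and implies waiting.
            is_output = True
            wait = True
        elif opt == "-w":
            # New flag: wait for a signal even when only reading the line.
            wait = True
    return is_output, wait


print(parse_flags(["-s", "1", "/dev/gpiochip0", "3"]))
print(parse_flags(["-b", "pull-up", "-w", "/dev/gpiochip0", "3"]))
print(parse_flags(["/dev/gpiochip0", "3"]))
```

Keeping the wait decision separate from the line direction is what lets
an input line stay requested, and therefore configured, until the caller
sends a signal.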




[PATCH RFC v2 5/5] selftest: gpio: Add a new set-get config test

2024-10-25 Thread Nícolas F . R . A . Prado
Add a new kselftest that sets a configuration to a GPIO line and then
gets it back to verify that it was correctly carried out by the driver.

Setting a configuration is done through the GPIO uAPI, but retrieving it
is done through the debugfs interface since that is the only place where
it can be retrieved from userspace.

The test reads the test plan from a YAML file, which includes the chips
and pin settings to set and validate.

Signed-off-by: Nícolas F. R. A. Prado 
---
 tools/testing/selftests/gpio/Makefile  |   2 +-
 .../gpio-set-get-config-example-test-plan.yaml |  15 ++
 .../testing/selftests/gpio/gpio-set-get-config.py  | 183 +
 3 files changed, 199 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/gpio/Makefile b/tools/testing/selftests/gpio/Makefile
index e0884390447dcfffe4ca0b4fa0f1669463bb669c..bdfeb0c9aaddc436df77ada1d5ac0c80890960a7 100644
--- a/tools/testing/selftests/gpio/Makefile
+++ b/tools/testing/selftests/gpio/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 
-TEST_PROGS := gpio-mockup.sh gpio-sim.sh
+TEST_PROGS := gpio-mockup.sh gpio-sim.sh gpio-set-get-config.py
 TEST_FILES := gpio-mockup-sysfs.sh
 TEST_GEN_PROGS_EXTENDED := gpio-mockup-cdev gpio-chip-info gpio-line-name
 CFLAGS += -O2 -g -Wall $(KHDR_INCLUDES)
diff --git a/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml b/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml
new file mode 100644
index 
..3b749be3c8dcf6822b7531424a6b1f8fca840a65
--- /dev/null
+++ b/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+# Top-level contains a list of the GPIO chips that will be tested. Each one is
+# chosen based on the GPIO chip's info label.
+- label: "gpiochip_device_label"
+  # For each GPIO chip, multiple pin configurations can be tested, which are
+  # listed under 'tests'
+  tests:
+  # pin indicates the pin number to test
+  - pin: 34
+    # bias can be 'pull-up', 'pull-down', 'disabled'
+    bias: "pull-up"
+  - pin: 34
+    bias: "pull-down"
+  - pin: 34
+    bias: "disabled"
diff --git a/tools/testing/selftests/gpio/gpio-set-get-config.py b/tools/testing/selftests/gpio/gpio-set-get-config.py
new file mode 100755
index 
..6f1444c8d46bcfc226f414520b74f4a59725854f
--- /dev/null
+++ b/tools/testing/selftests/gpio/gpio-set-get-config.py
@@ -0,0 +1,183 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Collabora Ltd
+
+#
+# This test validates GPIO pin configuration. It takes a test plan in YAML (see
+# gpio-set-get-config-example-test-plan.yaml) and sets and gets back each pin
+# configuration described in the plan and checks that they match in order to
+# validate that they are being applied correctly.
+#
+# When the file name for the test plan is not provided through --test-plan, it
+# will be guessed based on the platform ID (DT compatible or DMI).
+#
+
+import time
+import os
+import sys
+import argparse
+import re
+import subprocess
+import glob
+import signal
+
+import yaml
+
+# Allow ksft module to be imported from different directory
+this_dir = os.path.dirname(os.path.realpath(__file__))
+sys.path.append(os.path.join(this_dir, "../kselftest/"))
+
+import ksft
+
+
+def config_pin(chip_dev, pin_config):
+    flags = []
+    if pin_config.get("bias"):
+        flags += f"-b {pin_config['bias']}".split()
+    flags += ["-w", chip_dev, str(pin_config["pin"])]
+    gpio_mockup_cdev_path = os.path.join(this_dir, "gpio-mockup-cdev")
+    return subprocess.Popen([gpio_mockup_cdev_path] + flags)
+
+
+def get_bias_debugfs(chip_debugfs_path, pin):
+    with open(os.path.join(chip_debugfs_path, "pinconf-pins")) as f:
+        for l in f:
+            m = re.match(rf"pin {pin}.*bias (?P<bias>(pull )?\w+)", l)
+            if m:
+                return m.group("bias")
+
+
+def check_config_pin(chip, chip_debugfs_dir, pin_config):
+    test_passed = True
+
+    if pin_config.get("bias"):
+        bias = get_bias_debugfs(chip_debugfs_dir, pin_config["pin"])
+        # Convert "pull up" / "pull down" to "pull-up" / "pull-down"
+        bias = bias.replace(" ", "-")
+        if bias != pin_config["bias"]:
+            ksft.print_msg(
+                f"Bias doesn't match: Expected {pin_config['bias']}, read {bias}."
+            )
+            test_passed = False
+
+    ksft.test_result(
+        test_passed,
+        f"{chip['label']}.{pin_config['pin']}.{pin_config['bias']}",
+    )
+
+
+def get_devfs_chip_file(chip_dict):
+    gpio_chip_info_path = os.path.join(this_dir, 'gpio-chip-info')
+    for f in glob.glob("/dev/gpiochip*"):
+        proc = subprocess.run(
+            f"{gpio_chip_info_path} {f} label".split(), capture_output=True, text=True
+        )
+        if proc.returncode:
+            ksft.print_msg(f"Error opening gpio device {

Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process

2024-10-25 Thread Lorenzo Stoakes
On Fri, Oct 25, 2024 at 11:44:34AM -0700, John Hubbard wrote:
> On 10/25/24 11:38 AM, Pedro Falcato wrote:
> > On Fri, Oct 25, 2024 at 6:41 PM John Hubbard  wrote:
> > >
> > > On 10/25/24 5:50 AM, Pedro Falcato wrote:
> > > > On Fri, Oct 25, 2024 at 10:41 AM Lorenzo Stoakes
> > > >  wrote:
> > > ...
> > > > > +static inline int pidfd_is_self_sentinel(pid_t pid)
> > > > > +{
> > > > > +   return pid == PIDFD_SELF_THREAD || pid == PIDFD_SELF_THREAD_GROUP;
> > > > > +}
> > > >
> > > > Do we want this in the uapi header? Even if this is useful, it might
> > > > come with several drawbacks such as breaking scripts that parse kernel
> > > > headers (and a quick git grep suggests we do have static inlines in
> > > > headers, but in rather obscure ones) and breaking C89:
> > > >
> > >
> > > Let's please not say "C89" anymore, we've moved on! :)
> > >
> > > The notes in [1], which is now nearly 2.5 years old, discuss the move to
> > > C11, and specifically how to handle the inline keyword.
> >
> > That seems to only apply to the kernel internally, uapi headers are
>
> Yes.
>
> > included from userspace too (-std=c89 -pedantic doesn't know what a
> > gnu extension is). And uapi headers _generally_ keep to defining
> > constants and structs, nothing more.
>
> OK

Because a lot of people using -ANSI- C89 are importing a very new linux
feature header.

And let's ignore the hundreds of existing uses... OK.

The rules, unstated anywhere, are that we must support 1972-era C in an
optional header for a feature available only in new kernels because
somebody somewhere is using a VAX-11 and gosh darn it they can't change
their toolchain!

And you had better make sure you don't wear out those tape drums...

>
> > I don't know what the guidelines for uapi headers are nowadays, but we
> > generally want to not break userspace.
> >
> > >
> > > I think it's quite clear at this point, that we should not hold up new
> > > work, based on concerns about handling the inline keyword, nor about
> > > C89.
> >
> > Right, but the correct solution is probably to move
> > pidfd_is_self_sentinel to some other place, since it's not even
> > supposed to be used by userspace (it's semantically useless to
> > userspace, and its only two users are in the kernel, kernel/pid.c and
> > exit.c).
> >
>
> Yes, if userspace absolutely doesn't need nor want this, then putting
> it in a non-uapi header does sound like the right move.

The bike shed should be blue! Wait no no, it should be red... Hang on
yellow yes! Yellow's great!

No wait - did we _test_ yellow in the way I wanted...

I mean for me this isn't a big deal - we declare the defines here, it makes
sense to have a very very simple inline function.

It's not like userspace is overly hurt by this...

Also I did explain there's no obvious header to put this in in the kernel
and I'm not introducing one sorry.

Anyway, if you guys feel strongly enough about this, I'll respin again and
just open-code this trivial check where it's used.

>
>
> thanks,
> --
> John Hubbard
>



[PATCH v12 7/7] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-10-25 Thread Arnaud Pouliquen
The new TEE remoteproc driver is used to manage remote firmware in a
secure, trusted context. The 'st,stm32mp1-m4-tee' compatible is
introduced to delegate the loading of the firmware to the trusted
execution context. In such cases, the firmware should be signed and
adhere to the image format defined by the TEE.

Signed-off-by: Arnaud Pouliquen 
---
updates vs previous version
- rename structures, variables and function from tee_rproc_xxx to
  rproc_tee_xxx,
- rework code to take into account rproc_tee_register and
  rproc_tee_unregister APIs update,
- optimize code around dev_err_probe() when rproc_tee_register() fails.
---
 drivers/remoteproc/stm32_rproc.c | 57 ++--
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/remoteproc/stm32_rproc.c b/drivers/remoteproc/stm32_rproc.c
index 288bd70c7861..7875b26a38a5 100644
--- a/drivers/remoteproc/stm32_rproc.c
+++ b/drivers/remoteproc/stm32_rproc.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -255,6 +256,19 @@ static int stm32_rproc_release(struct rproc *rproc)
return 0;
 }
 
+static int stm32_rproc_tee_stop(struct rproc *rproc)
+{
+   int err;
+
+   stm32_rproc_request_shutdown(rproc);
+
+   err = rproc_tee_stop(rproc);
+   if (err)
+   return err;
+
+   return stm32_rproc_release(rproc);
+}
+
 static int stm32_rproc_prepare(struct rproc *rproc)
 {
struct device *dev = rproc->dev.parent;
@@ -691,8 +705,20 @@ static const struct rproc_ops st_rproc_ops = {
.get_boot_addr  = rproc_elf_get_boot_addr,
 };
 
+static const struct rproc_ops st_rproc_tee_ops = {
+   .prepare= stm32_rproc_prepare,
+   .start  = rproc_tee_start,
+   .stop   = stm32_rproc_tee_stop,
+   .kick   = stm32_rproc_kick,
+   .load   = rproc_tee_load_fw,
+   .parse_fw   = rproc_tee_parse_fw,
+   .find_loaded_rsc_table = rproc_tee_find_loaded_rsc_table,
+   .release_fw = rproc_tee_release_fw,
+};
+
 static const struct of_device_id stm32_rproc_match[] = {
{ .compatible = "st,stm32mp1-m4" },
+   { .compatible = "st,stm32mp1-m4-tee" },
{},
 };
 MODULE_DEVICE_TABLE(of, stm32_rproc_match);
@@ -853,15 +879,36 @@ static int stm32_rproc_probe(struct platform_device *pdev)
struct device_node *np = dev->of_node;
struct rproc *rproc;
unsigned int state;
+   u32 proc_id;
int ret;
 
ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(32));
if (ret)
return ret;
 
-   rproc = devm_rproc_alloc(dev, np->name, &st_rproc_ops, NULL, sizeof(*ddata));
-   if (!rproc)
-   return -ENOMEM;
+   if (of_device_is_compatible(np, "st,stm32mp1-m4-tee")) {
+   /*
+* Delegate the firmware management to the secure context.
+* The firmware loaded has to be signed.
+*/
+   ret = of_property_read_u32(np, "st,proc-id", &proc_id);
+   if (ret) {
+   dev_err(dev, "failed to read st,proc-id property\n");
+   return ret;
+   }
+
+   rproc = devm_rproc_alloc(dev, np->name, &st_rproc_tee_ops, NULL, sizeof(*ddata));
+   if (!rproc)
+   return -ENOMEM;
+
+   ret = rproc_tee_register(dev, rproc, proc_id);
+   if (ret)
+   return dev_err_probe(dev, ret, "signed firmware not supported by TEE\n");
+   } else {
+   rproc = devm_rproc_alloc(dev, np->name, &st_rproc_ops, NULL, sizeof(*ddata));
+   if (!rproc)
+   return -ENOMEM;
+   }
 
ddata = rproc->priv;
 
@@ -913,6 +960,8 @@ static int stm32_rproc_probe(struct platform_device *pdev)
dev_pm_clear_wake_irq(dev);
device_init_wakeup(dev, false);
}
+   rproc_tee_unregister(rproc);
+
return ret;
 }
 
@@ -933,6 +982,8 @@ static void stm32_rproc_remove(struct platform_device *pdev)
dev_pm_clear_wake_irq(dev);
device_init_wakeup(dev, false);
}
+
+   rproc_tee_unregister(rproc);
 }
 
 static int stm32_rproc_suspend(struct device *dev)
-- 
2.25.1




[PATCH v12 2/7] remoteproc: Add TEE support

2024-10-25 Thread Arnaud Pouliquen
Add a remoteproc TEE (Trusted Execution Environment) driver
that will be probed by the TEE bus. If the associated trusted
application is supported on the secure side, this driver offers a client
interface to load firmware through the secure side.
This firmware can be authenticated by the secure trusted application.

Signed-off-by: Arnaud Pouliquen 
---
Updates vs previous version:
- rename structures, functions, and variables from "tee_rproc_xxx" to
  "rproc_tee_xxx",
- update rproc_tee_register to return an error instead of
  "struct rproc_tee *" pointer,
- update rproc_tee_unregister argument from "struct rproc_tee *trproc"
  to "struct rproc *rproc",
- reword MODULE_DESCRIPTION.
---
 drivers/remoteproc/Kconfig  |  10 +
 drivers/remoteproc/Makefile |   1 +
 drivers/remoteproc/remoteproc_tee.c | 510 
 include/linux/remoteproc.h  |   4 +
 include/linux/remoteproc_tee.h  | 106 ++
 5 files changed, 631 insertions(+)
 create mode 100644 drivers/remoteproc/remoteproc_tee.c
 create mode 100644 include/linux/remoteproc_tee.h

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 955e4e38477e..d0284220a194 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -23,6 +23,16 @@ config REMOTEPROC_CDEV
 
  It's safe to say N if you don't want to use this interface.
 
+config REMOTEPROC_TEE
+   tristate "Remoteproc support by a TEE application"
+   depends on OPTEE
+   help
+ Support a remote processor with a TEE application. The Trusted
+ Execution Context is responsible for loading the trusted firmware
+ image and managing the remote processor's lifecycle.
+
+ It's safe to say N if you don't want to use remoteproc TEE.
+
 config IMX_REMOTEPROC
tristate "i.MX remoteproc support"
depends on ARCH_MXC
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 5ff4e2fee4ab..f77e0abe8349 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -11,6 +11,7 @@ remoteproc-y  += remoteproc_sysfs.o
 remoteproc-y   += remoteproc_virtio.o
 remoteproc-y   += remoteproc_elf_loader.o
 obj-$(CONFIG_REMOTEPROC_CDEV)  += remoteproc_cdev.o
+obj-$(CONFIG_REMOTEPROC_TEE)   += remoteproc_tee.o
 obj-$(CONFIG_IMX_REMOTEPROC)   += imx_rproc.o
 obj-$(CONFIG_IMX_DSP_REMOTEPROC)   += imx_dsp_rproc.o
 obj-$(CONFIG_INGENIC_VPU_RPROC)+= ingenic_rproc.o
diff --git a/drivers/remoteproc/remoteproc_tee.c b/drivers/remoteproc/remoteproc_tee.c
new file mode 100644
index ..f258b9304daf
--- /dev/null
+++ b/drivers/remoteproc/remoteproc_tee.c
@@ -0,0 +1,510 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) STMicroelectronics 2024
+ * Author: Arnaud Pouliquen 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MAX_TEE_PARAM_ARRAY_MEMBER 4
+
+/*
+ * Authentication of the firmware and load in the remote processor memory
+ *
+ * [in]  params[0].value.a:unique 32bit identifier of the remote processor
+ * [in] params[1].memref:  buffer containing the image of the buffer
+ */
+#define TA_RPROC_FW_CMD_LOAD_FW1
+
+/*
+ * Start the remote processor
+ *
+ * [in]  params[0].value.a:unique 32bit identifier of the remote processor
+ */
+#define TA_RPROC_FW_CMD_START_FW   2
+
+/*
+ * Stop the remote processor
+ *
+ * [in]  params[0].value.a:unique 32bit identifier of the remote processor
+ */
+#define TA_RPROC_FW_CMD_STOP_FW3
+
+/*
+ * Return the address of the resource table, or 0 if not found
+ * No check is done to verify that the address returned is accessible by
+ * the non secure context. If the resource table is loaded in a protected
+ * memory the access by the non secure context will lead to a data abort.
+ *
+ * [in]  params[0].value.a:unique 32bit identifier of the remote processor
+ * [out]  params[1].value.a:   32bit LSB resource table memory address
+ * [out]  params[1].value.b:   32bit MSB resource table memory address
+ * [out]  params[2].value.a:   32bit LSB resource table memory size
+ * [out]  params[2].value.b:   32bit MSB resource table memory size
+ */
+#define TA_RPROC_FW_CMD_GET_RSC_TABLE  4
+
+/*
+ * Return the address of the core dump
+ *
+ * [in]  params[0].value.a:unique 32bit identifier of the remote processor
+ * [out] params[1].memref: address of the core dump image if exist,
+ * else return Null
+ */
+#define TA_RPROC_FW_CMD_GET_COREDUMP   5
+
+/*
+ * Release remote processor firmware images and associated resources.
+ * This command should be used in case an error occurs between the loading of
+ * the firmware images (TA_RPROC_FW_CMD_LOAD_FW) and the starting of the remote
+ * processor (TA_RPROC_FW_CMD_START_FW) or after stopping the remote processor
+

[PATCH v12 6/7] remoteproc: stm32: Create sub-functions to request shutdown and release

2024-10-25 Thread Arnaud Pouliquen
To prepare for the support of TEE remoteproc, create sub-functions
that can be used in both cases, with and without remoteproc TEE support.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/remoteproc/stm32_rproc.c | 82 +++-
 1 file changed, 49 insertions(+), 33 deletions(-)

diff --git a/drivers/remoteproc/stm32_rproc.c b/drivers/remoteproc/stm32_rproc.c
index 8c7f7950b80e..288bd70c7861 100644
--- a/drivers/remoteproc/stm32_rproc.c
+++ b/drivers/remoteproc/stm32_rproc.c
@@ -209,6 +209,52 @@ static int stm32_rproc_mbox_idx(struct rproc *rproc, const unsigned char *name)
return -EINVAL;
 }
 
+static void stm32_rproc_request_shutdown(struct rproc *rproc)
+{
+   struct stm32_rproc *ddata = rproc->priv;
+   int err, idx;
+
+   /* Request shutdown of the remote processor */
+   if (rproc->state != RPROC_OFFLINE && rproc->state != RPROC_CRASHED) {
+   idx = stm32_rproc_mbox_idx(rproc, STM32_MBX_SHUTDOWN);
+   if (idx >= 0 && ddata->mb[idx].chan) {
+   err = mbox_send_message(ddata->mb[idx].chan, "detach");
+   if (err < 0)
+   dev_warn(&rproc->dev, "warning: remote FW shutdown without ack\n");
+   }
+   }
+}
+
+static int stm32_rproc_release(struct rproc *rproc)
+{
+   struct stm32_rproc *ddata = rproc->priv;
+   int err = 0;
+
+   /* To allow platform Standby power mode, set remote proc Deep Sleep */
+   if (ddata->pdds.map) {
+   err = regmap_update_bits(ddata->pdds.map, ddata->pdds.reg,
+ddata->pdds.mask, 1);
+   if (err) {
+   dev_err(&rproc->dev, "failed to set pdds\n");
+   return err;
+   }
+   }
+
+   /* Update coprocessor state to OFF if available */
+   if (ddata->m4_state.map) {
+   err = regmap_update_bits(ddata->m4_state.map,
+ddata->m4_state.reg,
+ddata->m4_state.mask,
+M4_STATE_OFF);
+   if (err) {
+   dev_err(&rproc->dev, "failed to set copro state\n");
+   return err;
+   }
+   }
+
+   return 0;
+}
+
 static int stm32_rproc_prepare(struct rproc *rproc)
 {
struct device *dev = rproc->dev.parent;
@@ -519,17 +565,9 @@ static int stm32_rproc_detach(struct rproc *rproc)
 static int stm32_rproc_stop(struct rproc *rproc)
 {
struct stm32_rproc *ddata = rproc->priv;
-   int err, idx;
+   int err;
 
-   /* request shutdown of the remote processor */
-   if (rproc->state != RPROC_OFFLINE && rproc->state != RPROC_CRASHED) {
-   idx = stm32_rproc_mbox_idx(rproc, STM32_MBX_SHUTDOWN);
-   if (idx >= 0 && ddata->mb[idx].chan) {
-   err = mbox_send_message(ddata->mb[idx].chan, "detach");
-   if (err < 0)
-   dev_warn(&rproc->dev, "warning: remote FW shutdown without ack\n");
-   }
-   }
+   stm32_rproc_request_shutdown(rproc);
 
err = stm32_rproc_set_hold_boot(rproc, true);
if (err)
@@ -541,29 +579,7 @@ static int stm32_rproc_stop(struct rproc *rproc)
return err;
}
 
-   /* to allow platform Standby power mode, set remote proc Deep Sleep */
-   if (ddata->pdds.map) {
-   err = regmap_update_bits(ddata->pdds.map, ddata->pdds.reg,
-ddata->pdds.mask, 1);
-   if (err) {
-   dev_err(&rproc->dev, "failed to set pdds\n");
-   return err;
-   }
-   }
-
-   /* update coprocessor state to OFF if available */
-   if (ddata->m4_state.map) {
-   err = regmap_update_bits(ddata->m4_state.map,
-ddata->m4_state.reg,
-ddata->m4_state.mask,
-M4_STATE_OFF);
-   if (err) {
-   dev_err(&rproc->dev, "failed to set copro state\n");
-   return err;
-   }
-   }
-
-   return 0;
+   return stm32_rproc_release(rproc);
 }
 
 static void stm32_rproc_kick(struct rproc *rproc, int vqid)
-- 
2.25.1




Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process

2024-10-25 Thread John Hubbard

On 10/25/24 2:09 PM, Lorenzo Stoakes wrote:

On Fri, Oct 25, 2024 at 01:31:49PM -0700, John Hubbard wrote:

On 10/25/24 12:49 PM, Lorenzo Stoakes wrote:

On Fri, Oct 25, 2024 at 11:44:34AM -0700, John Hubbard wrote:

On 10/25/24 11:38 AM, Pedro Falcato wrote:

On Fri, Oct 25, 2024 at 6:41 PM John Hubbard  wrote:

...
I'll admit to being easily cowed by "you're breaking userspace" arguments.
Even when they start to get rather absurd. Because I can't easily tell where
the line is.

Maybe "-std=c89 -pedantic" is on the other side of the line. I'd like it
to be! :)


Well, apparently not...


Why not? Your arguments are clear and reasonable. Why shouldn't they prevail?

Please don't think that I have some sort of firm position here. I'm simply
looking for the right answer. And if that's different than something I
proposed earlier, no problem. The best answer should win.

...

The bike shed should be blue! Wait no no, it should be red... Hang on
yellow yes! Yellow's great!


Putting a header in the right location, so as to avoid breakage here or
there, is not bikeshedding. Sorry.


There are 312 uses of "static inline" already in UAPI headers, not all
quite as obscure as claimed.



OK, good. Let's lead with that. It seems very clear, then, that a new one
won't cause a problem.


Specifically requiring me and only me to support ansi C89 for a theorised
scenario is in my opinion bikeshedding, but I don't want to get into an
argument about something so petty :)


An argument about the definition of bikeshedding sounds delightfully
recursive, but yes, let's not. :)

...

Anyway, if you guys feel strongly enough about this, I'll respin again and
just open-code this trivial check where it's used.


No strong feelings, just hoping to help make a choice that gets you
closer to getting your patches committed.


I mean, you are saying I am breaking things and implying the series is
blocked on this, that sounds like a strong opinion, but again I'm not going
to argue.


Actually, Pedro's request kicked this off, and I was hoping to dismiss
it--again, in order to help move things along. My opinion is that we
should shun ancient toolchains and ancient systems whenever possible.

Somehow that got turned into "I'm trying to block the patchset". Really,
whatever works, follows The Rules (whatever we eventually understand
them to be), and doesn't cause someone *else* to come out of the
woodwork and claim a problem, is fine with me.



As with the requirement that I, only for my part of the change, must fix up
test header import, while I disagree I should be doing the fix, I did it
anyway as I am accommodating and reasonable.


I agree that pre-existing problems in selftests should not be your
problem.

By the way, I'm occasionally involved in helping fix up various
selftest-related problems, especially when they impact mm. Send me a
note if you have anything in mind that ought to be fixed up, I might be
able to help head off future grief in that area.



So fine - I'll respin and just open-code this as it's trivial and there's
no (other) sensible place to put it anyway.

A P.S. though - a very NOT theoretical issue with userspace is the import
of linux/fcntl.h in pidfd.h which seems to me to have been imported solely
for the kernel's sake.

A gentle suggestion (it seems I can't win - gentle suggestions are ignored,
tongue-in-cheek parody is taken to be mean... but anyway) is to do


Actually, these come across as sarcasm, especially in the context of
these emails that show you are becoming quite distraught.

I've met you several times at the conferences. We get along well. And
your work is top notch. So please consider that I'm very much supportive
of you and your work here.

I'm still trying to understand why you are recently sending these very
strong emails (Vlastimil also took some heat), but I see that you also
mentioned some long hours.

If my feedback is making things worse here, I'll try to adjust.
Selftests in general are a frustrating area.


thanks,
--
John Hubbard



something like:

#ifdef __KERNEL__
#include 
#else
#include 
#endif

At the top of the pidfd.h header. This must surely sting a _lot_ of people
in userland otherwise.

But this is out of scope for this change.





Re: [PATCH next] rcu: Unlock correctly in rcu_dump_cpu_stacks()

2024-10-25 Thread Paul E. McKenney
On Fri, Oct 25, 2024 at 10:06:43AM +0300, Dan Carpenter wrote:
> The unlock needs to be outside the } close curly braces for this if
> statement.  Otherwise it leads to a deadlock.
> 
> Fixes: 744e87210b1a ("rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()")
> Signed-off-by: Dan Carpenter 

Good catch!

Reviewed-by: Paul E. McKenney 

This is a regression from this past merge window, if I am keeping track.
So it is a candidate for going in before the next merge window opens.

Thanx, Paul

> ---
>  kernel/rcu/tree_stall.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 8994391b95c7..925fcdad5dea 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -357,8 +357,8 @@ static void rcu_dump_cpu_stacks(unsigned long gp_seq)
>   pr_err("Offline CPU %d blocking current GP.\n", cpu);
>   else
>   dump_cpu_task(cpu);
> - raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>   }
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>   }
>   printk_deferred_exit();
>   }
> -- 
> 2.45.2
> 
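
The lock/unlock pairing problem this patch fixes can be sketched in a small
userspace model. This is only an illustration under stated assumptions:
struct fake_node, fake_lock() and fake_unlock() are hypothetical stand-ins,
not the kernel's rcu_node or raw spinlock API, and the model merely counts
locks left held instead of actually deadlocking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node with a lock; not the kernel's rcu_node. */
struct fake_node {
	bool locked;           /* true while the fake lock is held */
	bool has_blocking_cpu; /* models the condition tested by the if */
};

static void fake_lock(struct fake_node *n)   { n->locked = true; }
static void fake_unlock(struct fake_node *n) { n->locked = false; }

/* Pre-fix shape: the unlock sits inside the if body. */
size_t walk_unlock_inside_if(struct fake_node *nodes, size_t n)
{
	size_t still_locked = 0;

	for (size_t i = 0; i < n; i++) {
		fake_lock(&nodes[i]);
		if (nodes[i].has_blocking_cpu) {
			/* ... report the CPU here ... */
			fake_unlock(&nodes[i]);
		}
		/* When the condition is false, the lock is never released. */
	}

	for (size_t i = 0; i < n; i++)
		still_locked += nodes[i].locked ? 1 : 0;
	return still_locked;
}

/* Post-fix shape: the unlock follows the closing brace of the if. */
size_t walk_unlock_after_if(struct fake_node *nodes, size_t n)
{
	size_t still_locked = 0;

	for (size_t i = 0; i < n; i++) {
		fake_lock(&nodes[i]);
		if (nodes[i].has_blocking_cpu) {
			/* ... report the CPU here ... */
		}
		fake_unlock(&nodes[i]); /* runs on both paths */
	}

	for (size_t i = 0; i < n; i++)
		still_locked += nodes[i].locked ? 1 : 0;
	return still_locked;
}
```

In this model every node whose condition is false stays locked in the first
variant, which is the situation where a real spinlock would deadlock on the
next acquisition; the second variant always returns zero.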



[PATCH RFC v2 3/5] pinctrl: mediatek: common: Expose more configurations to GPIO set_config

2024-10-25 Thread Nícolas F . R . A . Prado
Currently the set_config callback in the gpio_chip registered by the
pinctrl-mtk-common driver only supports configuring a single parameter
on specific pins (the input debounce of the EINT controller, on pins
that support it), even though many other configurations are already
implemented and available through the pinctrl API for configuration of
pins by the Devicetree and other drivers.

Expose all configurations currently implemented through the GPIO API so
they can also be set from userspace, which is particularly useful for
testing them.
---
 drivers/pinctrl/mediatek/pinctrl-mtk-common.c | 48 ---
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
index 91edb539925a49b4302866b9ac36f580cc189fb5..7f9764b474c4e7d0d4c3d6e542bdb7df0264daec 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
+++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
@@ -62,13 +62,12 @@ static unsigned int mtk_get_port(struct mtk_pinctrl *pctl, unsigned long pin)
<< pctl->devdata->port_shf;
 }
 
-static int mtk_pmx_gpio_set_direction(struct pinctrl_dev *pctldev,
-   struct pinctrl_gpio_range *range, unsigned offset,
-   bool input)
+static int mtk_common_pin_set_direction(struct mtk_pinctrl *pctl,
+   unsigned int offset,
+   bool input)
 {
unsigned int reg_addr;
unsigned int bit;
-   struct mtk_pinctrl *pctl = pinctrl_dev_get_drvdata(pctldev);
 
reg_addr = mtk_get_port(pctl, offset) + pctl->devdata->dir_offset;
bit = BIT(offset & pctl->devdata->mode_mask);
@@ -86,6 +85,15 @@ static int mtk_pmx_gpio_set_direction(struct pinctrl_dev *pctldev,
return 0;
 }
 
+static int mtk_pmx_gpio_set_direction(struct pinctrl_dev *pctldev,
+   struct pinctrl_gpio_range *range, unsigned int offset,
+   bool input)
+{
+   struct mtk_pinctrl *pctl = pinctrl_dev_get_drvdata(pctldev);
+
+   return mtk_common_pin_set_direction(pctl, offset, input);
+}
+
 static void mtk_gpio_set(struct gpio_chip *chip, unsigned offset, int value)
 {
unsigned int reg_addr;
@@ -363,12 +371,11 @@ static int mtk_pconf_set_pull_select(struct mtk_pinctrl *pctl,
return 0;
 }
 
-static int mtk_pconf_parse_conf(struct pinctrl_dev *pctldev,
+static int mtk_pconf_parse_conf(struct mtk_pinctrl *pctl,
unsigned int pin, enum pin_config_param param,
-   enum pin_config_param arg)
+   u32 arg)
 {
int ret = 0;
-   struct mtk_pinctrl *pctl = pinctrl_dev_get_drvdata(pctldev);
 
switch (param) {
case PIN_CONFIG_BIAS_DISABLE:
@@ -381,15 +388,15 @@ static int mtk_pconf_parse_conf(struct pinctrl_dev *pctldev,
ret = mtk_pconf_set_pull_select(pctl, pin, true, false, arg);
break;
case PIN_CONFIG_INPUT_ENABLE:
-   mtk_pmx_gpio_set_direction(pctldev, NULL, pin, true);
+   mtk_common_pin_set_direction(pctl, pin, true);
ret = mtk_pconf_set_ies_smt(pctl, pin, arg, param);
break;
case PIN_CONFIG_OUTPUT:
mtk_gpio_set(pctl->chip, pin, arg);
-   ret = mtk_pmx_gpio_set_direction(pctldev, NULL, pin, false);
+   ret = mtk_common_pin_set_direction(pctl, pin, false);
break;
case PIN_CONFIG_INPUT_SCHMITT_ENABLE:
-   mtk_pmx_gpio_set_direction(pctldev, NULL, pin, true);
+   mtk_common_pin_set_direction(pctl, pin, true);
ret = mtk_pconf_set_ies_smt(pctl, pin, arg, param);
break;
case PIN_CONFIG_DRIVE_STRENGTH:
@@ -421,7 +428,7 @@ static int mtk_pconf_group_set(struct pinctrl_dev *pctldev, unsigned group,
int i, ret;
 
for (i = 0; i < num_configs; i++) {
-   ret = mtk_pconf_parse_conf(pctldev, g->pin,
+   ret = mtk_pconf_parse_conf(pctl, g->pin,
pinconf_to_config_param(configs[i]),
pinconf_to_config_argument(configs[i]));
if (ret < 0)
@@ -870,19 +877,20 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, unsigned offset,
struct mtk_pinctrl *pctl = gpiochip_get_data(chip);
const struct mtk_desc_pin *pin;
unsigned long eint_n;
-   u32 debounce;
+   enum pin_config_param param = pinconf_to_config_param(config);
+   u32 arg = pinconf_to_config_argument(config);
 
-   if (pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE)
-   return -ENOTSUPP;
+   if (param == PIN_CONFIG_INPUT_DEBOUNCE) {
+   pin = pctl->devdata->pins + offset;
+   if (pin->eint.eintnum == NO_EINT_SUPPORT)
+   return -EINVAL;
 
-   pin 
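
The new set_config path in the hunk above splits the packed config word into
a parameter and an argument before dispatching on it. As a rough standalone
model of that split, assuming the generic pinconf packing keeps the parameter
in the low 8 bits and the argument in the next 24 bits (treat this layout as
an assumption and check include/linux/pinctrl/pinconf-generic.h for the
authoritative definition), the decoding could look like the following. All
demo_* names and the enum values are hypothetical and do not match the
kernel's enum pin_config_param.

```c
#include <stdint.h>

/* Hypothetical parameter ids for illustration; not the kernel's values. */
enum demo_param {
	DEMO_PARAM_DEBOUNCE = 1,
	DEMO_PARAM_DRIVE_STRENGTH = 2,
};

/* Pack a parameter id (low 8 bits) and an argument (next 24 bits). */
unsigned long demo_pack(enum demo_param param, uint32_t arg)
{
	return ((unsigned long)arg << 8) | ((unsigned long)param & 0xffUL);
}

/* Recover the parameter id from a packed config word. */
enum demo_param demo_to_param(unsigned long config)
{
	return (enum demo_param)(config & 0xffUL);
}

/* Recover the argument; anything above 24 bits is lost by the packing. */
uint32_t demo_to_argument(unsigned long config)
{
	return (uint32_t)((config >> 8) & 0xffffffUL);
}
```

Under this assumed layout the argument is limited to 24 bits, so a larger
value passed in comes back truncated after the round trip.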

Re: [PATCH v2] vsock/test: fix failures due to wrong SO_RCVLOWAT parameter

2024-10-25 Thread Stefano Garzarella

On Thu, Oct 24, 2024 at 11:10:58AM -0500, Konstantin Shkolnyy wrote:

This happens on 64-bit big-endian machines.
SO_RCVLOWAT requires an int parameter. However, instead of int, the test
uses unsigned long in one place and size_t in another. Both are 8 bytes
long on 64-bit machines. The kernel, having received the 8 bytes, doesn't
test for the exact size of the parameter, it only cares that it's >=
sizeof(int), and casts the 4 lower-addressed bytes to an int, which, on
a big-endian machine, contains 0. 0 doesn't trigger an error, SO_RCVLOWAT
returns with success and the socket stays with the default SO_RCVLOWAT = 1,
which results in test failures.

Fixes: b1346338fbae ("vsock_test: POLLIN + SO_RCVLOWAT test")
Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic")
Signed-off-by: Konstantin Shkolnyy 
---

Notes:
   The problem was found on s390 (big endian), while x86-64 didn't show it. 
After this fix, all tests pass on s390.
Changes for v2:
- add "Fixes:" lines to the commit message


LGTM!

Reviewed-by: Stefano Garzarella 



tools/testing/vsock/vsock_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 8d38dbf8f41f..7fd25b814b4b 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -835,7 +835,7 @@ static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)

static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
{
-   unsigned long lowat_val = RCVLOWAT_BUF_SIZE;
+   int lowat_val = RCVLOWAT_BUF_SIZE;
char buf[RCVLOWAT_BUF_SIZE];
struct pollfd fds;
short poll_flags;
@@ -1357,7 +1357,7 @@ static void test_stream_rcvlowat_def_cred_upd_client(const struct test_opts *opt
static void test_stream_credit_update_test(const struct test_opts *opts,
   bool low_rx_bytes_test)
{
-   size_t recv_buf_size;
+   int recv_buf_size;
struct pollfd fds;
size_t buf_size;
void *buf;
--
2.34.1
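
The failure mode described in the commit message above can be reproduced
without a big-endian machine by laying out the bytes by hand. The sketch
below is a userspace model only: store_u64(), load_first_i32() and
lowat_seen_by_reader() are hypothetical helpers, and the "reader" simply
stands in for any code that takes the four lowest-addressed bytes of an
8-byte buffer as an int.

```c
#include <stdint.h>

/* Lay out a 64-bit value as a big-endian or little-endian CPU would. */
void store_u64(unsigned char out[8], uint64_t v, int big_endian)
{
	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? 8 * (7 - i) : 8 * i;

		out[i] = (unsigned char)(v >> shift);
	}
}

/* Read the 4 lowest-addressed bytes as a 32-bit int in that byte order. */
int32_t load_first_i32(const unsigned char *buf, int big_endian)
{
	uint32_t v = 0;

	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		v |= (uint32_t)buf[i] << shift;
	}
	return (int32_t)v;
}

/* Value an int-sized reader sees when the caller passed an 8-byte value. */
int32_t lowat_seen_by_reader(uint64_t user_value, int big_endian)
{
	unsigned char optval[8];

	store_u64(optval, user_value, big_endian);
	return load_first_i32(optval, big_endian);
}
```

With a small value such as 128, the little-endian layout puts the meaningful
byte first and the reader sees 128, while the big-endian layout puts four
zero bytes first and the reader sees 0. That matches the report that the
problem showed up on s390 but not on x86-64.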






RE: [PATCH v4 04/11] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-25 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Tuesday, October 22, 2024 8:19 AM
> 
> Add a new ioctl for user space to do a vIOMMU allocation. It must be based
> on a nesting parent HWPT, so take its refcount.
> 
> If an IOMMU driver supports a driver-managed vIOMMU object, it must define

why highlight 'driver-managed', implying a core-managed vIOMMU 
object some day?

> +/**
> + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
> + * @size: sizeof(struct iommu_viommu_alloc)
> + * @flags: Must be 0
> + * @type: Type of the virtual IOMMU. Must be defined in enum iommu_viommu_type
> + * @dev_id: The device's physical IOMMU will be used to back the virtual IOMMU
> + * @hwpt_id: ID of a nesting parent HWPT to associate to
> + * @out_viommu_id: Output virtual IOMMU ID for the allocated object
> + *
> + * Allocate a virtual IOMMU object that represents the underlying physical
> + * IOMMU's virtualization support. The vIOMMU object is a security-isolated
> + * slice of the physical IOMMU HW that is unique to a specific VM.

the object itself is a software abstraction, while a 'slice' is a set of
real hw resources.




RE: [PATCH v4 05/11] iommufd: Add domain_alloc_nested op to iommufd_viommu_ops

2024-10-25 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Tuesday, October 22, 2024 8:19 AM
> 
> Allow IOMMU driver to use a vIOMMU object that holds a nesting parent
> hwpt/domain to allocate a nested domain.
> 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 



Re: [PATCH 2/2] rcuscale: Remove redundant WARN_ON_ONCE() splat

2024-10-25 Thread Uladzislau Rezki
On Thu, Oct 24, 2024 at 01:28:24PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 24, 2024 at 06:45:58PM +0200, Uladzislau Rezki (Sony) wrote:
> > There are two places where WARN_ON_ONCE() is called two times
> > in the error paths. One which is encapsulated into if() condition
> > and another one, which is unnecessary, is placed in the brackets.
> > 
> > Remove an extra WARN_ON_ONCE() splat which is in brackets.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) 
> 
> For both:
> 
> Reviewed-by: Paul E. McKenney 
> 
Thank you :)

--
Uladzislau Rezki



Re: [PATCH] sched_ext: Fix function pointer type mismatches in BPF selftests

2024-10-25 Thread Tejun Heo
On Thu, Oct 24, 2024 at 10:46:09AM +0530, Vishal Chourasia wrote:
> Fix incompatible function pointer type warnings in sched_ext BPF selftests by
> explicitly casting the function pointers when initializing struct_ops.
> This addresses multiple -Wincompatible-function-pointer-types warnings from the
> clang compiler where function signatures didn't match exactly.
> 
> The void * cast ensures the compiler accepts the function pointer
> assignment despite minor type differences in the parameters.
> 
> Signed-off-by: Vishal Chourasia 

Applied to sched_ext/for-6.12-fixes.

Thanks.

-- 
tejun


