Re: [PATCH RFC 1/3] pinctrl: mediatek: paris: Expose more configurations to GPIO set_config
On Thu, Oct 24, 2024 at 05:17:05PM +0200, AngeloGioacchino Del Regno wrote:
> On 11/09/24 12:10, AngeloGioacchino Del Regno wrote:
> > On 09/09/24 20:37, Nícolas F. R. A. Prado wrote:
> > > Currently the set_config callback in the gpio_chip registered by the
> > > pinctrl_paris driver only supports PIN_CONFIG_INPUT_DEBOUNCE, despite
> >
> > [...] only supports operations configuring the input debounce parameter
> > of the EINT controller and denies configuring params on the other AP GPIOs
> > [...]
> >
> > (reword as needed)
> >
> > > many other configurations already being implemented and available
> > > through the pinctrl API for configuration of pins by the Devicetree and
> > > other drivers.
> > >
> > > Expose all configurations currently implemented through the GPIO API so
> > > they can also be set from userspace, which is particularly useful to
> > > allow testing them from userspace.
> > >
> > > Signed-off-by: Nícolas F. R. A. Prado
> > > ---
> > >  drivers/pinctrl/mediatek/pinctrl-paris.c | 20 ++--
> >
> > You can do the same for pinctrl-moore too; it's trivial.
> >
> > Other than that, I agree with this change, as it may be useful
> > for more than just testing.
> >
>
> Nicolas, please don't forget to respin this patch.

I was hoping to get some feedback on the test itself as well, particularly
from Linus as the pinctrl maintainer, but it's been a while, so I'll send a
v2 addressing the feedback here.

Thanks,
Nícolas
[PATCH v2 2/3] timers: Use __raise_softirq_irqoff() to raise the softirq.
As an optimisation, use __raise_softirq_irqoff() to raise the softirq.

This is always called from an interrupt handler, where interrupts are
already disabled, so raising the softirq can be reduced to just setting
the softirq pending flag and letting the softirq be invoked on return
from interrupt.

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/time/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 0fc9d066a7be4..1759de934284c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2499,7 +2499,7 @@ static void run_local_timers(void)
 		 */
 		if (time_after_eq(jiffies, READ_ONCE(base->next_expiry)) ||
 		    (i == BASE_DEF && tmigr_requires_handle_remote())) {
-			raise_softirq(TIMER_SOFTIRQ);
+			__raise_softirq_irqoff(TIMER_SOFTIRQ);
 			return;
 		}
 	}
-- 
2.45.2
[PATCH V3] selftests: livepatch: add test cases of stack_order sysfs interface
Add selftest test cases for the sysfs attribute 'stack_order'.

Suggested-by: Petr Mladek
Signed-off-by: Wardenjohn
---
 .../testing/selftests/livepatch/test-sysfs.sh | 71 +++
 1 file changed, 71 insertions(+)

diff --git a/tools/testing/selftests/livepatch/test-sysfs.sh b/tools/testing/selftests/livepatch/test-sysfs.sh
index 05a14f5a7bfb..e44a051be307 100755
--- a/tools/testing/selftests/livepatch/test-sysfs.sh
+++ b/tools/testing/selftests/livepatch/test-sysfs.sh
@@ -5,6 +5,8 @@
 . $(dirname $0)/functions.sh
 
 MOD_LIVEPATCH=test_klp_livepatch
+MOD_LIVEPATCH2=test_klp_callbacks_demo
+MOD_LIVEPATCH3=test_klp_syscall
 
 setup_config
 
@@ -19,6 +21,8 @@ check_sysfs_rights "$MOD_LIVEPATCH" "enabled" "-rw-r--r--"
 check_sysfs_value "$MOD_LIVEPATCH" "enabled" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "force" "--w---"
 check_sysfs_rights "$MOD_LIVEPATCH" "replace" "-r--r--r--"
+check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
+check_sysfs_value "$MOD_LIVEPATCH" "stack_order" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "transition" "-r--r--r--"
 check_sysfs_value "$MOD_LIVEPATCH" "transition" "0"
 check_sysfs_rights "$MOD_LIVEPATCH" "vmlinux/patched" "-r--r--r--"
@@ -131,4 +135,71 @@ livepatch: '$MOD_LIVEPATCH': completing unpatching transition
 livepatch: '$MOD_LIVEPATCH': unpatching complete
 % rmmod $MOD_LIVEPATCH"
 
+start_test "sysfs test stack_order value"
+
+load_lp $MOD_LIVEPATCH
+
+check_sysfs_value "$MOD_LIVEPATCH" "stack_order" "1"
+
+load_lp $MOD_LIVEPATCH2
+
+check_sysfs_value "$MOD_LIVEPATCH2" "stack_order" "2"
+
+load_lp $MOD_LIVEPATCH3
+
+check_sysfs_value "$MOD_LIVEPATCH3" "stack_order" "3"
+
+disable_lp $MOD_LIVEPATCH2
+unload_lp $MOD_LIVEPATCH2
+
+check_sysfs_value "$MOD_LIVEPATCH" "stack_order" "1"
+check_sysfs_value "$MOD_LIVEPATCH3" "stack_order" "2"
+
+disable_lp $MOD_LIVEPATCH3
+unload_lp $MOD_LIVEPATCH3
+
+disable_lp $MOD_LIVEPATCH
+unload_lp $MOD_LIVEPATCH
+
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
+livepatch: enabling patch '$MOD_LIVEPATCH'
+livepatch: '$MOD_LIVEPATCH': initializing patching transition
+livepatch: '$MOD_LIVEPATCH': starting patching transition
+livepatch: '$MOD_LIVEPATCH': completing patching transition
+livepatch: '$MOD_LIVEPATCH': patching complete
+% insmod test_modules/$MOD_LIVEPATCH2.ko
+livepatch: enabling patch '$MOD_LIVEPATCH2'
+livepatch: '$MOD_LIVEPATCH2': initializing patching transition
+$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
+livepatch: '$MOD_LIVEPATCH2': starting patching transition
+livepatch: '$MOD_LIVEPATCH2': completing patching transition
+$MOD_LIVEPATCH2: post_patch_callback: vmlinux
+livepatch: '$MOD_LIVEPATCH2': patching complete
+% insmod test_modules/$MOD_LIVEPATCH3.ko
+livepatch: enabling patch '$MOD_LIVEPATCH3'
+livepatch: '$MOD_LIVEPATCH3': initializing patching transition
+livepatch: '$MOD_LIVEPATCH3': starting patching transition
+livepatch: '$MOD_LIVEPATCH3': completing patching transition
+livepatch: '$MOD_LIVEPATCH3': patching complete
+% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH2/enabled
+livepatch: '$MOD_LIVEPATCH2': initializing unpatching transition
+$MOD_LIVEPATCH2: pre_unpatch_callback: vmlinux
+livepatch: '$MOD_LIVEPATCH2': starting unpatching transition
+livepatch: '$MOD_LIVEPATCH2': completing unpatching transition
+$MOD_LIVEPATCH2: post_unpatch_callback: vmlinux
+livepatch: '$MOD_LIVEPATCH2': unpatching complete
+% rmmod $MOD_LIVEPATCH2
+% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH3/enabled
+livepatch: '$MOD_LIVEPATCH3': initializing unpatching transition
+livepatch: '$MOD_LIVEPATCH3': starting unpatching transition
+livepatch: '$MOD_LIVEPATCH3': completing unpatching transition
+livepatch: '$MOD_LIVEPATCH3': unpatching complete
+% rmmod $MOD_LIVEPATCH3
+% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH/enabled
+livepatch: '$MOD_LIVEPATCH': initializing unpatching transition
+livepatch: '$MOD_LIVEPATCH': starting unpatching transition
+livepatch: '$MOD_LIVEPATCH': completing unpatching transition
+livepatch: '$MOD_LIVEPATCH': unpatching complete
+% rmmod $MOD_LIVEPATCH"
+
 exit 0
-- 
2.43.5
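The expected values in the new test imply two things about stack_order. Each enabled livepatch reports its 1-based position in load order. Removing a middle patch closes the gap, so $MOD_LIVEPATCH3 drops from 3 to 2 once $MOD_LIVEPATCH2 is unloaded.

The sketch below is a user-space C model of that expectation only. It assumes position-in-list semantics, and every name in it (struct patch_stack, stack_push, stack_remove, stack_order) is hypothetical rather than kernel code.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATCHES 8

/* Hypothetical model of the enabled livepatches, kept in load order.
 * This is not the kernel implementation. */
struct patch_stack {
	const char *names[MAX_PATCHES];
	int count;
};

/* Loading a patch appends it to the top of the stack. */
static void stack_push(struct patch_stack *s, const char *name)
{
	assert(s->count < MAX_PATCHES);
	s->names[s->count++] = name;
}

/* Unloading a patch shifts later entries down, closing the gap. */
static void stack_remove(struct patch_stack *s, const char *name)
{
	for (int i = 0; i < s->count; i++) {
		if (strcmp(s->names[i], name) == 0) {
			memmove(&s->names[i], &s->names[i + 1],
				(size_t)(s->count - i - 1) * sizeof(s->names[0]));
			s->count--;
			return;
		}
	}
}

/* 1-based position in the stack, or 0 if the patch is not present. */
static int stack_order(const struct patch_stack *s, const char *name)
{
	for (int i = 0; i < s->count; i++)
		if (strcmp(s->names[i], name) == 0)
			return i + 1;
	return 0;
}
```

Replaying the test's sequence against this model gives the same values the selftest checks: 1, 2, 3 after loading, then 1 and 2 after the middle patch goes away.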
Re: [PATCH V2 4/4] selftests/mm: skip virtual_address_range tests on riscv
On Tue, 08 Oct 2024 02:41:41 PDT (-0700), zhangchun...@iscas.ac.cn wrote:
> RISC-V doesn't currently have the behavior of restricting the virtual
> address space that the virtual_address_range tests check, and this causes
> the tests to fail. So let's disable the whole test suite for riscv64 for
> now: don't build it, and have run_vmtests.sh skip it when it is not present.
>
> Reviewed-by: Charlie Jenkins
> Signed-off-by: Chunyan Zhang
> ---
> V1: https://lore.kernel.org/linux-mm/ZuOuedBpS7i3T%2Fo0@ghost/T/
> ---
>  tools/testing/selftests/mm/Makefile       |  2 ++
>  tools/testing/selftests/mm/run_vmtests.sh | 10 ++
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> index 02e1204971b0..76a378c5c141 100644
> --- a/tools/testing/selftests/mm/Makefile
> +++ b/tools/testing/selftests/mm/Makefile
> @@ -115,7 +115,9 @@ endif
>
>  ifneq (,$(filter $(ARCH),arm64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64 s390))
>  TEST_GEN_FILES += va_high_addr_switch
> +ifneq ($(ARCH),riscv64)
>  TEST_GEN_FILES += virtual_address_range
> +endif
>  TEST_GEN_FILES += write_to_hugetlbfs
>  endif
>
> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
> index c5797ad1d37b..4493bfd1911c 100755
> --- a/tools/testing/selftests/mm/run_vmtests.sh
> +++ b/tools/testing/selftests/mm/run_vmtests.sh
> @@ -347,10 +347,12 @@ if [ $VADDR64 -ne 0 ]; then
>  	# allows high virtual address allocation requests independent
>  	# of platform's physical memory.
>
> -	prev_policy=$(cat /proc/sys/vm/overcommit_memory)
> -	echo 1 > /proc/sys/vm/overcommit_memory
> -	CATEGORY="hugevm" run_test ./virtual_address_range
> -	echo $prev_policy > /proc/sys/vm/overcommit_memory
> +	if [ -x ./virtual_address_range ]; then
> +		prev_policy=$(cat /proc/sys/vm/overcommit_memory)
> +		echo 1 > /proc/sys/vm/overcommit_memory
> +		CATEGORY="hugevm" run_test ./virtual_address_range
> +		echo $prev_policy > /proc/sys/vm/overcommit_memory
> +	fi
>
>  	# va high address boundary switch test
>  	ARCH_ARM64="arm64"

Acked-by: Palmer Dabbelt

(I'm taking the first two as they're RISC-V bits)
[PATCH 2/2] rcuscale: Remove redundant WARN_ON_ONCE() splat
In two error paths, WARN_ON_ONCE() is called twice: once as the if()
condition and once, unnecessarily, inside the braces. Remove the extra
WARN_ON_ONCE() splat inside the braces.

Signed-off-by: Uladzislau Rezki (Sony)
---
 kernel/rcu/rcuscale.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index de7d511e6be4..1d8bb603c289 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -889,13 +889,11 @@ kfree_scale_init(void)
 
 		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
 			pr_alert("ERROR: call_rcu() CBs are not being lazy as expected!\n");
-			WARN_ON_ONCE(1);
 			goto unwind;
 		}
 
 		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start > 3 * HZ)) {
 			pr_alert("ERROR: call_rcu() CBs are being too lazy!\n");
-			WARN_ON_ONCE(1);
 			goto unwind;
 		}
 	}
-- 
2.39.5
Re: [PATCH v4 01/14] iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct
On 22/10/24 11:20, Nicolin Chen wrote:
> Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device, i.e.
> iommufd_device (idev) object, against an iommufd_viommu (vIOMMU) object in
> the VM.
>
> This vDEVICE object (and its structure) holds all the information and
> attributes in a VM, regarding the device related to the vIOMMU.
>
> As an initial patch, add a per-vIOMMU virtual ID. This can be:
>  - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
>  - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
>  - Virtual ID on a nested Intel VT-D IOMMU, an index to a Context Table
> Potentially, this vDEVICE structure can hold some vData for Confidential
> Compute Architecture (CCA).
>
> Add a pair of vdevice_alloc and vdevice_free in struct iommufd_viommu_ops
> to allow driver-level vDEVICE structure allocations.
>
> Similar to iommufd_viommu_alloc, add an iommufd_vdevice_alloc helper, so
> IOMMU drivers can allocate core-embedded style structures.
>
> Signed-off-by: Nicolin Chen
> ---
>  include/linux/iommufd.h | 32
>  1 file changed, 32 insertions(+)
>
> diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> index 5c13c35952d8..5d61a1d2947a 100644
> --- a/include/linux/iommufd.h
> +++ b/include/linux/iommufd.h
> @@ -31,6 +31,7 @@ enum iommufd_object_type {
>  	IOMMUFD_OBJ_ACCESS,
>  	IOMMUFD_OBJ_FAULT,
>  	IOMMUFD_OBJ_VIOMMU,
> +	IOMMUFD_OBJ_VDEVICE,
>  #ifdef CONFIG_IOMMUFD_TEST
>  	IOMMUFD_OBJ_SELFTEST,
>  #endif
> @@ -92,6 +93,14 @@ struct iommufd_viommu {
>  	unsigned int type;
>  };
>  
> +struct iommufd_vdevice {
> +	struct iommufd_object obj;
> +	struct iommufd_ctx *ictx;
> +	struct iommufd_device *idev;
> +	struct iommufd_viommu *viommu;
> +	u64 id; /* per-vIOMMU virtual ID */
> +};
> +
>  /**
>   * struct iommufd_viommu_ops - vIOMMU specific operations
>   * @free: Free all driver-specific parts of an iommufd_viommu. The memory of the
> @@ -101,12 +110,24 @@ struct iommufd_viommu {
>   *                       must be defined in include/uapi/linux/iommufd.h.
>   *                       It must fully initialize the new iommu_domain before
>   *                       returning. Upon failure, ERR_PTR must be returned.
> + * @vdevice_alloc: Allocate a driver-managed iommufd_vdevice to init some driver
> + *                 specific structure or HW procedure. Note that the core-level
> + *                 structure is filled by the iommufd core after calling this op.
> + *                 It is suggested to call iommufd_vdevice_alloc() helper for
> + *                 a bundled allocation of the core and the driver structures,
> + *                 using the ictx pointer in the given @viommu.
> + * @vdevice_free: Free a driver-managed iommufd_vdevice to de-init its structure
> + *                or HW procedure. The memory of the vdevice will be freed by
> + *                iommufd core.
>   */
>  struct iommufd_viommu_ops {
>  	void (*free)(struct iommufd_viommu *viommu);
>  	struct iommu_domain *(*domain_alloc_nested)(
>  		struct iommufd_viommu *viommu,
>  		const struct iommu_user_data *user_data);
> +	struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
> +						 struct device *dev, u64 id);
> +	void (*vdevice_free)(struct iommufd_vdevice *vdev);
>  };
>  
>  #if IS_ENABLED(CONFIG_IOMMUFD)
> @@ -200,4 +221,15 @@ _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
>  		ret->member.ops = viommu_ops; \
>  		ret; \
>  	})
> +#define iommufd_vdevice_alloc(ictx, drv_struct, member) \
> +	({ \
> +		static_assert( \
> +			__same_type(struct iommufd_vdevice, \
> +				    ((struct drv_struct *)NULL)->member)); \
> +		static_assert(offsetof(struct drv_struct, member.obj) == 0); \
> +		container_of(_iommufd_object_alloc(ictx, \
> +						   sizeof(struct drv_struct), \
> +						   IOMMUFD_OBJ_VDEVICE), \
> +			     struct drv_struct, member.obj); \
> +	})
>  #endif

A nit: it hurts the eyes to read:

mock_vdev = iommufd_vdevice_alloc(viommu->ictx, mock_vdevice, core);

vs.

mock_vdev = iommufd_vdevice_alloc(viommu->ictx, struct mock_vdevice, core);

With the former I go searching for a "mock_vdevice" variable, while with the
latter it is clear that it is 1) a macro 2) which does some type checking.

Also, it makes it impossible to pass things like typeof(..) or a type from a
typedef.

Thanks,

--
Alexey
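The iommufd_vdevice_alloc() macro in the quoted patch relies on a "core-embedded" layout. It statically checks that the embedded member has the core type and that the core object header sits at offset 0. It then allocates the full driver-sized object and uses container_of() to turn the core pointer back into the driver structure.

Below is a user-space sketch of that pattern, under these assumptions:
- container_of is a local stand-in for the kernel macro.
- struct core_object, struct core_vdevice, struct drv_vdevice, core_object_alloc and drv_vdevice_alloc are hypothetical names, not iommufd code.
- C11 _Generic stands in for the kernel's __same_type().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define CORE_OBJ_VDEVICE 2 /* hypothetical object type tag */

/* Hypothetical core object header and core vdevice structure. */
struct core_object {
	int type;
};

struct core_vdevice {
	struct core_object obj; /* must be first */
	uint64_t id;            /* per-vIOMMU virtual ID */
};

/* Hypothetical driver structure embedding the core structure. */
struct drv_vdevice {
	struct core_vdevice core;
	int hw_slot;            /* driver-private state */
};

/* Mirrors the macro's two compile-time checks. */
static_assert(_Generic(((struct drv_vdevice *)0)->core,
		       struct core_vdevice: 1, default: 0),
	      "member must be a struct core_vdevice");
static_assert(offsetof(struct drv_vdevice, core.obj) == 0,
	      "core object header must sit at offset 0");

/* The "core" allocates the whole driver-sized object, returns the header. */
static struct core_object *core_object_alloc(size_t size, int type)
{
	struct core_object *obj = calloc(1, size);

	if (obj)
		obj->type = type;
	return obj;
}

/* Driver-side helper: one allocation, converted back to the driver type. */
static struct drv_vdevice *drv_vdevice_alloc(void)
{
	struct core_object *obj =
		core_object_alloc(sizeof(struct drv_vdevice), CORE_OBJ_VDEVICE);

	return obj ? container_of(obj, struct drv_vdevice, core.obj) : NULL;
}
```

The offset-0 check is what makes a single free() of the core pointer release the whole driver object in this model.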
[PATCH next] rcu: Unlock correctly in rcu_dump_cpu_stacks()
The unlock needs to be after the } closing brace of this if statement so
that it runs whether or not the condition is true. Otherwise, the rcu_node
lock can be left held, which leads to a deadlock.

Fixes: 744e87210b1a ("rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()")
Signed-off-by: Dan Carpenter
---
 kernel/rcu/tree_stall.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 8994391b95c7..925fcdad5dea 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -357,8 +357,8 @@ static void rcu_dump_cpu_stacks(unsigned long gp_seq)
 					pr_err("Offline CPU %d blocking current GP.\n", cpu);
 				else
 					dump_cpu_task(cpu);
-				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 			}
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
 		printk_deferred_exit();
 	}
-- 
2.45.2
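The bug class fixed above is an unlock placed inside a conditional. The lock is taken unconditionally, so whenever the condition is false the lock is never released.

The sketch below models both shapes in user space, using a pthread mutex in place of the rcu_node raw spinlock. It is an illustration, not the kernel code. dump_one_buggy, dump_one_fixed and lock_is_free are hypothetical names, and the bool argument stands in for the qsmask check. pthread_mutex_trylock() is used to observe a leaked lock without hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the per-node lock. */
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

/* Buggy shape: the unlock sits inside the if, so a false condition
 * returns with the lock still held. */
static void dump_one_buggy(bool cpu_blocking_gp)
{
	pthread_mutex_lock(&node_lock);
	if (cpu_blocking_gp) {
		/* ... print or dump the CPU ... */
		pthread_mutex_unlock(&node_lock);
	}
}

/* Fixed shape: the unlock follows the closing brace and always runs. */
static void dump_one_fixed(bool cpu_blocking_gp)
{
	pthread_mutex_lock(&node_lock);
	if (cpu_blocking_gp) {
		/* ... print or dump the CPU ... */
	}
	pthread_mutex_unlock(&node_lock);
}

/* True if the lock is currently free; trylock never blocks. */
static bool lock_is_free(void)
{
	if (pthread_mutex_trylock(&node_lock) != 0)
		return false;
	pthread_mutex_unlock(&node_lock);
	return true;
}
```

In the kernel, the next raw_spin_lock on the leaked lock would spin forever rather than fail, which is the deadlock the patch describes.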
[PATCH V4 00/15] selftests/resctrl: Support diverse platforms with MBM and MBA tests
Changes since V3:
- V3: https://lore.kernel.org/all/cover.1729218182.git.reinette.cha...@intel.com/
- Rebased on HEAD 2a027d6bb660 of kselftest/next.
- Fix empty string parsing issues pointed out by Ilpo.
- Add Reviewed-by tags.
- Please see individual patches for detailed changes.

Changes since V2:
- V2: https://lore.kernel.org/all/cover.1726164080.git.reinette.cha...@intel.com/
- Add fix to protect against buffer overflow when parsing text from sysfs files.
- Add cleanup patch to address use of magic constants as pointed out by Ilpo.
- Add Reviewed-by tags where received, except for "selftests/resctrl: Use cache size to determine "fill_buf" buffer size", which changed too much since receiving the Reviewed-by tag.
- Please see individual patches for detailed changes.

Changes since V1:
- V1: https://lore.kernel.org/cover.1724970211.git.reinette.cha...@intel.com/
- V2 contains the same general solutions to the stated problem as V1. These are now preceded by more fixes (patches 1 to 5) and improved robustness (patches 6 to 9) of the existing tests. The series then returns to solving the original problem, with more confidence, in patches 10 to 13.
- The possibility of making "memflush = false" for the CMT test was discussed during V1. Modifying this setting does not have a significant impact on the observed results, which are already well within the acceptable range, so this version keeps the original default. If performance were a goal, further experimentation might show that "memflush = false" could eliminate the need for the sleep(1) within the test wrapper, but improving performance is not a goal of this work.
- (New) Support what seems to be an unintended ability for user space to provide parameters to "fill_buf". The parsing is made robust, and only parameters that are meant to be changeable can be changed. Drop support for the "write" operation since it has never been measured.
- (New) Improve wraparound handling. (Ilpo)
- (New) A couple of new fixes addressing issues discovered during development.
- (Change from V1) To support fill_buf parameters provided by user space as well as test-specific fill_buf parameters, struct fill_buf_param is no longer just a member of struct resctrl_val_param. Instead, there can be at most two instances of struct fill_buf_param: the immutable parameters provided by user space and the parameters used by individual tests. (Ilpo)
- Please see individual patches for detailed changes.

V1 cover:

The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald Rapids systems. The test failures result from the following two properties of these systems:

1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl MBA and MBM selftests measure memory traffic, for which a hardcoded 250MB buffer has been sufficient so far. On platforms with an L3 cache larger than the buffer, the buffer fits in the L3 cache, and thus little or no memory traffic is generated during the "memory bandwidth" tests.

2) Some platform features, for example RAS features or memory performance features that generate memory traffic, may drive accesses that are counted differently by performance counters and MBM respectively. For instance, they can generate "overhead" traffic that is not counted against any specific RMID. Until now, these counting differences have always been "in the noise". On Emerald Rapids systems, the maximum MBA throttling (10% memory bandwidth) throttles memory bandwidth so far that memory accesses by these other platform features push the memory bandwidth difference between memory controller performance counters and resctrl (MBM) beyond the tests' hardcoded tolerance.

Make the tests more robust against platform variations:

1) Let the buffer used by memory bandwidth tests be guided by the size of the L3 cache.
2) Larger buffers require a longer initialization time before the buffer can be used for measurement. Rework the tests to ensure that buffer initialization is complete before measurements start.

3) Do not compare performance counters and MBM measurements at low bandwidth. The value of "low" is hardcoded to 750MiB based on measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake systems. This limit is not applicable to AMD systems since it only applies to the MBA and MBM tests, which are isolated to Intel.

[1] https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-platinum-8592-processor-320m-cache-1-9-ghz.html

Reinette Chatre (15):
  selftests/resctrl: Make functions only used in same file static
  selftests/resctrl: Print accurate buffer size as part of MBM results
  selftests/resctrl: Fix memory overflow due to unhandled wraparound
  selftests/resctrl: Protect against array overrun during iMC config parsing
  selftests/resctrl: Protect against array overflow when reading strings
  sel
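Point 3 of the cover letter is a gating rule: below a bandwidth floor, the perf-counter vs. MBM comparison is skipped instead of judged. The sketch below shows one way such a gate could be written, with these assumptions:
- The 750 threshold is taken from the cover letter, and its unit is assumed to match the values compared.
- compare_bw, enum bw_verdict, and the tolerance parameter are hypothetical.
- This is not the selftest's actual code.

```c
#include <assert.h>

/* Threshold from the cover letter; unit assumed to match the inputs. */
#define LOW_BW_MIB 750.0

enum bw_verdict { BW_SKIPPED, BW_PASS, BW_FAIL };

/* Hypothetical check: compare iMC (perf) and MBM readings only when the
 * bandwidth is high enough for uncounted overhead traffic to be noise. */
static enum bw_verdict compare_bw(double imc_mib, double mbm_mib,
				  double max_diff_percent)
{
	double diff;

	/* At low bandwidth, platform overhead traffic that MBM does not
	 * attribute to any RMID can dominate, so give no verdict. */
	if (imc_mib < LOW_BW_MIB || mbm_mib < LOW_BW_MIB)
		return BW_SKIPPED;

	diff = imc_mib > mbm_mib ? imc_mib - mbm_mib : mbm_mib - imc_mib;
	return diff / mbm_mib * 100.0 <= max_diff_percent ? BW_PASS : BW_FAIL;
}
```

For example, with an 8% tolerance, readings of 1000 and 1040 pass (about 3.8% apart) while 500 and 520 are skipped outright.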
Re: [PATCH net-next v2 2/4] net: hsr: Add VLAN CTAG filter support
Hi Vadim,

On 10/24/2024 7:06 PM, Vadim Fedorenko wrote:
> On 24/10/2024 11:30, MD Danish Anwar wrote:
>> From: Murali Karicheri
>>
>> This patch adds support for VLAN ctag based filtering at slave devices.
>> The slave ethernet device may be capable of filtering ethernet packets
>> based on VLAN ID. This requires that when the VLAN interface is created
>> over an HSR/PRP interface, it passes the VID information to the
>> associated slave ethernet devices so that it updates the hardware
>> filters to filter ethernet frames based on VID. This patch adds the
>> required functions to propagate the vid information to the slave
>> devices.
>>
>> Signed-off-by: Murali Karicheri
>> Signed-off-by: MD Danish Anwar
>> ---
>>  net/hsr/hsr_device.c | 71 +++-
>>  1 file changed, 70 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
>> index 0ca47ebb01d3..ff586bdc2bde 100644
>> --- a/net/hsr/hsr_device.c
>> +++ b/net/hsr/hsr_device.c
>> @@ -515,6 +515,68 @@ static void hsr_change_rx_flags(struct net_device *dev, int change)
>>  	}
>>  }
>>  
>> +static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev,
>> +				   __be16 proto, u16 vid)
>> +{
>> +	struct hsr_port *port;
>> +	struct hsr_priv *hsr;
>> +	int ret = 0;
>> +
>> +	hsr = netdev_priv(dev);
>> +
>> +	hsr_for_each_port(hsr, port) {
>> +		if (port->type == HSR_PT_MASTER)
>> +			continue;
>> +
>> +		ret = vlan_vid_add(port->dev, proto, vid);
>> +		switch (port->type) {
>> +		case HSR_PT_SLAVE_A:
>> +			if (ret) {
>> +				netdev_err(dev, "add vid failed for Slave-A\n");
>> +				return ret;
>> +			}
>> +			break;
>> +
>> +		case HSR_PT_SLAVE_B:
>> +			if (ret) {
>> +				/* clean up Slave-A */
>> +				netdev_err(dev, "add vid failed for Slave-B\n");
>> +				vlan_vid_del(port->dev, proto, vid);
>> +				return ret;
>> +			}
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>
> This function doesn't match with hsr_ndo_vlan_rx_kill_vid().
> vlan_vid_add() can potentially be executed for port->type
> equal to HSR_PT_INTERLINK, but the result will be ignored. And
> the vlan_vid_del() will never happen in this case. Is it desired
> behavior? Maybe it's better to synchronize add/del code and refactor
> error path to avoid copying the code?
>

kill_vid and add_vid are not symmetric because, during add_vid, if
vlan_vid_add() succeeds for one port but fails for the other, we need to
delete the VID from the earlier port. We can only continue if
vlan_vid_add() succeeds for both ports. That's why the switch-case
handling in add_vid cannot match that in kill_vid. Since the port cleanup
is needed, it's not possible to synchronize the add/kill code.

We only care about HSR_PT_SLAVE_A and HSR_PT_SLAVE_B here, so it's okay to
ignore HSR_PT_INTERLINK. This is the desired behaviour.

>> +
>> +static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev,
>> +				    __be16 proto, u16 vid)
>> +{
>> +	struct hsr_port *port;
>> +	struct hsr_priv *hsr;
>> +
>> +	hsr = netdev_priv(dev);
>> +
>> +	hsr_for_each_port(hsr, port) {
>> +		if (port->type == HSR_PT_MASTER)
>> +			continue;
>> +		switch (port->type) {
>> +		case HSR_PT_SLAVE_A:
>> +		case HSR_PT_SLAVE_B:
>> +			vlan_vid_del(port->dev, proto, vid);
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static const struct net_device_ops hsr_device_ops = {
>>  	.ndo_change_mtu = hsr_dev_change_mtu,
>>  	.ndo_open = hsr_dev_open,
>> @@ -523,6 +585,8 @@ static const struct net_device_ops hsr_device_ops = {
>>  	.ndo_change_rx_flags = hsr_change_rx_flags,
>>  	.ndo_fix_features = hsr_fix_features,
>>  	.ndo_set_rx_mode = hsr_set_rx_mode,
>> +	.ndo_vlan_rx_add_vid = hsr_ndo_vlan_rx_add_vid,
>> +	.ndo_vlan_rx_kill_vid = hsr_ndo_vlan_rx_kill_vid,
>>  };
>>  
>>  static const struct device_type hsr_type = {
>> @@ -569,7 +633,8 @@ void hsr_dev_setup(struct net_device *dev)
>>  	dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA |
>>  			   NETIF_F_GSO_MASK | NETIF_F_HW_CSUM |
>> -			   NETIF_F_HW_VLAN_CTAG_TX;
>> +			   NETIF_F_HW_VLAN_CTAG_TX |
>> +			   NETIF_F_HW_VLAN_CTAG_FILTER;
>>  
>>  	dev->features = dev->hw_features;
>>  }
>> @@ -647,6 +712,10 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2],
>>  	    (slave[1]->features & NETIF_F_HW_HSR_FWD))
>>  		hsr->fwd_offloaded = true;
>>  
>> +	if ((slave[0]->features & NETIF_F_HW_VLAN_CTAG_FILTER) &&
>> +	    (slave[1]->features & NETIF_F_HW_VLAN_CTAG_FILTER))
>> +		hsr_dev->features
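The reply's argument is that adding a VID must be all-or-nothing across the two slave ports. If the second add fails, the first must be rolled back. Deletion has nothing to roll back, which is why the two paths are not symmetric.

The sketch below is a user-space model of that rollback rule, not the HSR driver code. struct port, port_vid_add, port_vid_del, hsr_like_add_vid and hsr_like_kill_vid are hypothetical names. The fail_add flag exists only to inject a failure.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VIDS 4096

/* Hypothetical model of one slave port's VLAN filter table. */
struct port {
	bool vids[MAX_VIDS];
	bool fail_add; /* inject an add failure for testing */
};

static int port_vid_add(struct port *p, unsigned int vid)
{
	if (p->fail_add || vid >= MAX_VIDS)
		return -1;
	p->vids[vid] = true;
	return 0;
}

static void port_vid_del(struct port *p, unsigned int vid)
{
	if (vid < MAX_VIDS)
		p->vids[vid] = false;
}

/* Add to both slaves or to neither: if slave B fails, undo slave A. */
static int hsr_like_add_vid(struct port *a, struct port *b, unsigned int vid)
{
	int ret = port_vid_add(a, vid);

	if (ret)
		return ret;
	ret = port_vid_add(b, vid);
	if (ret) {
		port_vid_del(a, vid); /* roll back the earlier success */
		return ret;
	}
	return 0;
}

/* Removal has nothing to roll back, so it just deletes from both. */
static void hsr_like_kill_vid(struct port *a, struct port *b, unsigned int vid)
{
	port_vid_del(a, vid);
	port_vid_del(b, vid);
}
```

The cleanup targets port a, the port that succeeded earlier, rather than the port whose add just failed.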
[PATCH V4 10/15] selftests/resctrl: Make benchmark parameter passing robust
The benchmark used during the CMT, MBM, and MBA tests can be provided by the user via the (-b) parameter; if none is provided, the default "fill_buf" benchmark is used. The user can additionally override any of the "fill_buf" default parameters when running the tests with "-b fill_buf ".

The "fill_buf" parameters are managed as an array of strings. Using an array of strings is complex because it requires transformations to/from strings at every producer and consumer. This is made worse for the individual tests, where the default benchmark parameter values may not be appropriate and additional data wrangling is required. For example, the CMT test duplicates the entire array of strings in order to replace one of the parameters.

More issues appear when the array of strings is combined with the use case of the user overriding the default parameters by specifying "-b fill_buf ". This use case is fragile: NULL pointers can exist in the array of strings, which gives opportunities to trigger a SIGSEGV. For example, the command below specifies that "fill_buf" should be used but leaves all of its parameters NULL:

  $ sudo resctrl_tests -t mbm -b fill_buf

Replace the "array of strings" parameters used for "fill_buf" with a new struct fill_buf_param. It contains the "fill_buf" parameters in a form that can be used directly, without transformations to/from strings. Two instances of struct fill_buf_param may exist at any point in time:

* If the user provides new parameters to "fill_buf", the user parameter structure (struct user_params) will point to a fully initialized and immutable struct fill_buf_param containing the user-provided parameters.
* If "fill_buf" is the benchmark that should be used by a test, then the test parameter structure (struct resctrl_val_param) will point to a fully initialized struct fill_buf_param.
The latter may contain one of the following:
(a) the user-provided parameters verbatim,
(b) user-provided parameters adjusted to be appropriate for the test, or
(c) the default "fill_buf" parameters appropriate for the test, if the user provided neither "fill_buf" parameters nor an alternate benchmark.

The existing behavior of the CMT test is to use a test-defined value for the buffer size even if the user provides another value via the command line. This behavior is maintained because the test requires the buffer size to match the size of the cache allocated. The user can instead change the amount of cache allocated with the "-n" command line parameter.

Signed-off-by: Reinette Chatre
---
Changes since V3:
- Handle empty string input. (Ilpo)

Changes since V2:
- Use empty initializers. (Ilpo)
- Let memflush be bool instead of int. (Ilpo)
- Make user input checks more robust. (Ilpo)
- Assign values as part of local variable definition. (Ilpo)

Changes since V1:
- Maintain original behavior where user can override "fill_buf" parameters via command line ... but only those that can actually be changed. (Ilpo)
- Fix parsing issues associated with original behavior to ensure any parameter is valid before any attempt to use it.
- Move patch earlier in series to highlight that this fixes existing issues.
- Make struct fill_buf_param dynamic to support user-provided parameters as well as test-specific parameters.
- Rewrite changelog.
---
 tools/testing/selftests/resctrl/cmt_test.c    |  32 ++
 tools/testing/selftests/resctrl/fill_buf.c    |   4 +-
 tools/testing/selftests/resctrl/mba_test.c    |  13 ++-
 tools/testing/selftests/resctrl/mbm_test.c    |  22 ++--
 tools/testing/selftests/resctrl/resctrl.h     |  59 +++---
 .../testing/selftests/resctrl/resctrl_tests.c | 103 ++
 tools/testing/selftests/resctrl/resctrl_val.c |  41 ---
 7 files changed, 178 insertions(+), 96 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
index 0c045080d808..4c3cf2c25a38 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -116,15 +116,13 @@ static void cmt_test_cleanup(void)
 
 static int cmt_run_test(const struct resctrl_test *test, const struct user_params *uparams)
 {
-	const char * const *cmd = uparams->benchmark_cmd;
-	const char *new_cmd[BENCHMARK_ARGS];
+	struct fill_buf_param fill_buf = {};
 	unsigned long cache_total_size = 0;
 	int n = uparams->bits ? : 5;
 	unsigned long long_mask;
-	char *span_str = NULL;
 	int count_of_bits;
 	size_t span;
-	int ret, i;
+	int ret;
 
 	ret = get_full_cbm("L3", &long_mask);
 	if (ret)
@@ -155,32 +153,26 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
 
 	span = cache_portion_size(cache_total_size, param.mask, long_mask);
 
-	if (strcmp(cmd[0], "fill_buf") == 0) {
-		/* Du
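The changelog asks for three things: handle empty string input, make user input checks more robust, and make memflush a bool. The commit message also requires every parameter to be validated before any of them is used.

The sketch below shows one way to meet those goals. The struct layout (buf_size, memflush), the "0"/"1" syntax for memflush, and the helper names parse_size, parse_bool01 and fill_buf_param_init are hypothetical; the real struct fill_buf_param and its parser may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout; the real struct fill_buf_param may differ. */
struct fill_buf_param {
	size_t buf_size;
	bool memflush;
};

/* Parse a positive decimal size; reject "", signs, spaces, junk, overflow. */
static int parse_size(const char *s, size_t *out)
{
	unsigned long long v;
	char *end;

	/* Require a leading digit: rejects "", "-1", "+1" and " 1". */
	if (!s || *s < '0' || *s > '9')
		return -1;
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE || *end != '\0' || v == 0 || v > SIZE_MAX)
		return -1;
	*out = (size_t)v;
	return 0;
}

/* Hypothetical boolean syntax: exactly "0" or "1". */
static int parse_bool01(const char *s, bool *out)
{
	if (!s || (s[0] != '0' && s[0] != '1') || s[1] != '\0')
		return -1;
	*out = s[0] == '1';
	return 0;
}

/* Start from defaults, override only what the user gave (NULL = keep the
 * default), and commit to *p only after every field has been validated. */
static int fill_buf_param_init(struct fill_buf_param *p,
			       const struct fill_buf_param *defaults,
			       const char *size_str, const char *flush_str)
{
	struct fill_buf_param tmp = *defaults;

	if (size_str && parse_size(size_str, &tmp.buf_size))
		return -1;
	if (flush_str && parse_bool01(flush_str, &tmp.memflush))
		return -1;
	*p = tmp;
	return 0;
}
```

Because *p is written only on success, a rejected input such as "" leaves the previously valid parameters untouched.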
Re: [PATCH] vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
On 10/24/2024 03:43, Stefano Garzarella wrote:
> Other setsockopt() in the tests where we use unsigned long are
> SO_VM_SOCKETS_* but they are expected to be unsigned, so we should be fine.

It's actually not a "signed vs unsigned" problem, but a "size + endianness"
problem.

Also, looking at the SO_VM_SOCKETS_* code in the test, it uses unsigned long
and size_t, which (I believe) will both shrink to 4 bytes on 32-bit machines,
while the corresponding kernel code in af_vsock.c uses u64. It looks to me
like this kernel code will be unhappy to receive just 4 bytes when it
expects 8.
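The "size + endianness" point can be shown without sockets. SO_RCVLOWAT takes an int (see socket(7)). If a test passes the address of an unsigned long, the receiver reads only the first sizeof(int) bytes of an 8-byte value. On a 64-bit little-endian host those bytes happen to hold the value. On a 64-bit big-endian host they hold the high half, which is 0 for small values.

The sketch below models that read with memcpy(). read_int_from_ulong and host_is_little_endian are hypothetical helper names, and no setsockopt() call is made.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a receiver that reads sizeof(int) bytes from a buffer the
 * caller filled with an unsigned long. */
static int read_int_from_ulong(unsigned long value)
{
	int seen;

	memcpy(&seen, &value, sizeof(seen)); /* only the first bytes */
	return seen;
}

/* True on a little-endian host: the low byte of 1 is stored first. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

The same mismatch in the other direction is the SO_VM_SOCKETS_* concern. A 4-byte unsigned long on 32-bit does not match the u64 that af_vsock.c is said to expect.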
stable-rc linux-6.6.y: Queues: tinyconfig: undefined reference to `irq_work_queue'
Most of the tinyconfigs are failing on stable-rc linux-6.6.y.

Build errors:
--
aarch64-linux-gnu-ld: kernel/task_work.o: in function `task_work_add':
task_work.c:(.text+0x190): undefined reference to `irq_work_queue'
task_work.c:(.text+0x190): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `irq_work_queue'

Reported-by: Linux Kernel Functional Testing

metadata:
  git_describe: v6.6.57-251-g1870a9bd3fe7
  git_repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
  Build: v6.6.57-251-g1870a9bd3fe7
  Details: https://qa-reports.linaro.org/lkft/linux-stable-rc-queues-queue_6.6/build/v6.6.57-251-g1870a9bd3fe7
  kernel_version: 6.6.58

Regressions (compared to build v6.6.57)

parisc:
  * build/gcc-11-tinyconfig

mips:
  * build/gcc-12-tinyconfig
  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/clang-nightly-tinyconfig

arm:
  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig
  * build/clang-nightly-tinyconfig

powerpc:
  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig
  * build/clang-nightly-tinyconfig

arm64:
  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig
  * build/clang-nightly-tinyconfig

arc:
  * build/gcc-9-tinyconfig
  * build/gcc-8-tinyconfig

s390:
  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig
  * build/clang-nightly-tinyconfig

sparc:
  * build/gcc-8-tinyconfig
  * build/gcc-11-tinyconfig

riscv:
  * build/clang-19-tinyconfig
  * build/gcc-8-tinyconfig
  * build/gcc-13-tinyconfig

compare history links:
  - https://qa-reports.linaro.org/lkft/linux-stable-rc-queues-queue_6.6/build/v6.6.57-251-g1870a9bd3fe7/testrun/25533195/suite/build/test/gcc-13-tinyconfig/history/
  - https://qa-reports.linaro.org/lkft/linux-stable-rc-queues-queue_6.6/build/v6.6.57-251-g1870a9bd3fe7/testrun/25533195/suite/build/test/gcc-13-tinyconfig/log

--
Linaro LKFT
https://lkft.linaro.org
Re: [PATCH 1/4] dt-bindings: remoteproc: fsl,imx-rproc: add new compatible
Good day,

On Wed, Oct 23, 2024 at 12:21:11PM -0400, Laurentiu Mihalcea wrote:
> From: Laurentiu Mihalcea
>
> Add new compatible for imx95's CM7 with SOF.
>
> Signed-off-by: Laurentiu Mihalcea
> ---
>  .../bindings/remoteproc/fsl,imx-rproc.yaml    | 58 +--
>  1 file changed, 53 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml b/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml
> index 57d75acb0b5e..ab0d8e017965 100644
> --- a/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml
> +++ b/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml
> @@ -28,6 +28,15 @@ properties:
>        - fsl,imx8qxp-cm4
>        - fsl,imx8ulp-cm33
>        - fsl,imx93-cm33
> +      - fsl,imx95-cm7-sof

Why is this added in the remoteproc bindings when the driver is
sound/soc/sof/imx/imx95.c?

> +
> +  reg:
> +    maxItems: 2
> +
> +  reg-names:
> +    items:
> +      - const: dram
> +      - const: mailbox
>
>    clocks:
>      maxItems: 1
> @@ -38,10 +47,8 @@ properties:
>        Phandle to syscon block which provide access to System Reset Controller
>
>    mbox-names:
> -    items:
> -      - const: tx
> -      - const: rx
> -      - const: rxdb
> +    minItems: 1
> +    maxItems: 4
>
>    mboxes:
>      description:
> @@ -49,7 +56,7 @@ properties:
>        List of <&phandle type channel> - 1 channel for TX, 1 channel for RX, 1 channel for RXDB.
>        (see mailbox/fsl,mu.yaml)
>      minItems: 1
> -    maxItems: 3
> +    maxItems: 4
>
>    memory-region:
>      description:
> @@ -84,6 +91,10 @@ properties:
>        This property is to specify the resource id of the remote processor in SoC
>        which supports SCFW
>
> +  port:
> +    $ref: /schemas/sound/audio-graph-port.yaml#
> +    unevaluatedProperties: false
> +
>  required:
>    - compatible
>
> @@ -114,6 +125,43 @@ allOf:
>        properties:
>          power-domains: false
>
> +  - if:
> +      properties:
> +        compatible:
> +          contains:
> +            const: fsl,imx95-cm7-sof
> +    then:
> +      properties:
> +        mboxes:
> +          minItems: 4
> +        mbox-names:
> +          items:
> +            - const: txdb0
> +            - const: txdb1
> +            - const: rxdb0
> +            - const: rxdb1
> +        memory-region:
> +          maxItems: 1
> +      required:
> +        - reg
> +        - reg-names
> +        - mboxes
> +        - mbox-names
> +        - memory-region
> +        - port
> +    else:
> +      properties:
> +        reg: false
> +        reg-names: false
> +        mboxes:
> +          maxItems: 3
> +        mbox-names:
> +          items:
> +            - const: tx
> +            - const: rx
> +            - const: rxdb
> +        port: false
> +
>  additionalProperties: false
>
>  examples:
> --
> 2.34.1
>
[PATCH net v6] ipv6: Fix soft lockups in fib6_select_path under high next hop churn
Soft lockups have been observed on a cluster of Linux-based edge routers located in a highly dynamic environment. Using the `bird` service, these routers continuously update BGP-advertised routes due to frequently changing nexthop destinations, while also managing significant IPv6 traffic. The lockups occur during the traversal of the multipath circular linked list in the `fib6_select_path` function, particularly while iterating through the siblings in the list. The issue typically arises when nodes of the linked list are unexpectedly deleted concurrently on a different core, as indicated by their 'next' and 'previous' pointers pointing back to the node itself and their reference count dropping to zero. This results in an infinite loop, leading to a soft lockup that triggers a system panic via the watchdog timer. Apply RCU primitives in the problematic code sections to resolve the issue. Where necessary, update the references to fib6_siblings to annotate or use the RCU APIs. Include a test script that reproduces the issue. The script periodically updates the routing table while generating a heavy load of outgoing IPv6 traffic through multiple iperf3 clients. It consistently induces infinite soft lockups within a couple of minutes.
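To make the failure mode concrete, here is a minimal userspace sketch (plain C, not kernel code). The list type and the `*_sketch` helpers are hypothetical stand-ins for `<linux/list.h>`; `list_del_init_sketch()` mimics the self-linking the report describes. A walk that is positioned on a node when that node is unlinked and re-initialized keeps stepping to itself and never reaches the list head, which is the shape of the infinite loop behind the soft lockup. The step limit exists only so the sketch terminates; the actual fix relies on RCU, not on such a bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

/* Insert @n just before @head, i.e. at the tail of the circular list. */
static void list_add_tail_sketch(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink @n and make it point to itself, as list_del_init() does. */
static void list_del_init_sketch(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n;
	n->prev = n;
}

/*
 * Continue a walk from @pos the way list_for_each() does, stopping when
 * @head is reached. Gives up after @limit steps so the sketch terminates;
 * returns true only if the walk got back to @head.
 */
static bool walk_reaches_head(const struct list_node *head,
			      const struct list_node *pos, int limit)
{
	for (int i = 0; i < limit; i++) {
		if (pos == head)
			return true;
		pos = pos->next;
	}
	return false;
}
```

With nodes `a`, `b`, `c` linked after a head, a walk from `b` returns to the head within three steps; after `list_del_init_sketch(&b)`, a walk that is still sitting on `b` never does.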
Kernel log:
 0 [bd13003e8d30] machine_kexec at 8ceaf3eb
 1 [bd13003e8d90] __crash_kexec at 8d0120e3
 2 [bd13003e8e58] panic at 8cef65d4
 3 [bd13003e8ed8] watchdog_timer_fn at 8d05cb03
 4 [bd13003e8f08] __hrtimer_run_queues at 8cfec62f
 5 [bd13003e8f70] hrtimer_interrupt at 8cfed756
 6 [bd13003e8fd0] __sysvec_apic_timer_interrupt at 8cea01af
 7 [bd13003e8ff0] sysvec_apic_timer_interrupt at 8df1b83d
 -- --
 8 [bd13003d3708] asm_sysvec_apic_timer_interrupt at 8e000ecb
    [exception RIP: fib6_select_path+299]
    RIP: 8ddafe7b  RSP: bd13003d37b8  RFLAGS: 0287
    RAX: 975850b43600  RBX: 975850b40200  RCX:
    RDX: 3fff  RSI: 51d383e4  RDI: 975850b43618
    RBP: bd13003d3800  R8:   R9: 975850b40200
    R10:   R11:   R12: bd13003d3830
    R13: 975850b436a8  R14: 975850b43600  R15: 0007
    ORIG_RAX:   CS: 0010  SS: 0018
 9 [bd13003d3808] ip6_pol_route at 8ddb030c
10 [bd13003d3888] ip6_pol_route_input at 8ddb068c
11 [bd13003d3898] fib6_rule_lookup at 8ddf02b5
12 [bd13003d3928] ip6_route_input at 8ddb0f47
13 [bd13003d3a18] ip6_rcv_finish_core.constprop.0 at 8dd950d0
14 [bd13003d3a30] ip6_list_rcv_finish.constprop.0 at 8dd96274
15 [bd13003d3a98] ip6_sublist_rcv at 8dd96474
16 [bd13003d3af8] ipv6_list_rcv at 8dd96615
17 [bd13003d3b60] __netif_receive_skb_list_core at 8dc16fec
18 [bd13003d3be0] netif_receive_skb_list_internal at 8dc176b3
19 [bd13003d3c50] napi_gro_receive at 8dc565b9
20 [bd13003d3c80] ice_receive_skb at c087e4f5 [ice]
21 [bd13003d3c90] ice_clean_rx_irq at c0881b80 [ice]
22 [bd13003d3d20] ice_napi_poll at c088232f [ice]
23 [bd13003d3d80] __napi_poll at 8dc18000
24 [bd13003d3db8] net_rx_action at 8dc18581
25 [bd13003d3e40] __do_softirq at 8df352e9
26 [bd13003d3eb0] run_ksoftirqd at 8ceffe47
27 [bd13003d3ec0] smpboot_thread_fn at 8cf36a30
28 [bd13003d3ee8] kthread at 8cf2b39f
29 [bd13003d3f28] ret_from_fork at 8ce5fa64
30 [bd13003d3f50] ret_from_fork_asm at 8ce03cbb

Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Reported-by: Adrian Oliver
Signed-off-by: Omid Ehtemam-Haghighi
Cc: David S.
Miller Cc: David Ahern Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Shuah Khan Cc: Ido Schimmel Cc: Kuniyuki Iwashima Cc: Simon Horman Cc: net...@vger.kernel.org Cc: linux-kselft...@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- v5 -> v6: * Adjust the comment line lengths in the test script to a maximum of 80 characters * Change memory allocation in inet6_rt_notify from gfp_any() to GFP_ATOMIC for atomic allocation in non-blocking contexts, as suggested by Ido Schimmel * NOTE: I have executed the test script on both bare-metal servers and virtualized environments such as QEMU and vng. In the case of bare-metal, it consistently triggers a soft lockup in under a minute on unpatched kernels. For the virtualized environments, an unpatched kernel compiled with the Ubuntu 24.04 configuration also triggers a soft lockup, though it takes longer; however, it did not trigger a soft lockup on kernels compiled with configurations provided in: https://github.com/linux-netdev/nipa/wiki/How-to-ru
Re: [PATCH] selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
On 2024/10/24 22:26, Shuah Khan wrote: On 10/24/24 03:50, zhouyuhang wrote: From: zhouyuhang Test case idmap_mount_tree_invalid failed to run on the newer kernel with the following output: # RUN mount_setattr_idmapped.idmap_mount_tree_invalid ... # mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) != 0 (0) # idmap_mount_tree_invalid: Test terminated by assertion This is because tmpfs is mounted at "/mnt/A", and tmpfs already contains the flag FS_ALLOW_IDMAP after the commit 7a80e5b8c6fa ("shmem: support idmapped mounts for tmpfs"). So calling sys_mount_setattr here returns 0 instead of -EINVAL as expected. Ramfs is mounted at "/mnt/B" and does not support idmap mounts. So we can use "/mnt/B" instead of "/mnt/A" to make the test run successfully with the following output: # Starting 1 tests from 1 test cases. # RUN mount_setattr_idmapped.idmap_mount_tree_invalid ... # OK mount_setattr_idmapped.idmap_mount_tree_invalid ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid # PASSED: 1 / 1 tests passed. Sounds like this code is testing this very condition passing in invalid mount to see what happens. If that is the intent this patch is incorrect. I think I understand what you mean: the failing assertion is the very condition being tested, and the purpose of the test case is to see what happens with an invalid mount. But that mount is valid now, isn't it? So the test needs to be fixed. I don't think constructing this error with ramfs will affect the code that follows. If you feel that using "/mnt/B" is unreliable, we could temporarily mount ramfs on "/mnt/A" here and keep using "/mnt/A". Do you think that is feasible? Looking forward to your reply, thank you.
Signed-off-by: zhouyuhang --- tools/testing/selftests/mount_setattr/mount_setattr_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c index c6a8c732b802..54552c19bc24 100644 --- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c +++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c @@ -1414,7 +1414,7 @@ TEST_F(mount_setattr_idmapped, idmap_mount_tree_invalid) ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/b", 0, 0, 0), 0); ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/BB/b", 0, 0, 0), 0); - open_tree_fd = sys_open_tree(-EBADF, "/mnt/A", + open_tree_fd = sys_open_tree(-EBADF, "/mnt/B", AT_RECURSIVE | AT_EMPTY_PATH | AT_NO_AUTOMOUNT | thanks, -- Shuah
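One way to keep the test's precondition explicit, whichever mount point is used, is to check the filesystem type of the target before expecting `-EINVAL`. The sketch below is a hypothetical helper (`path_is_tmpfs()` is not part of the selftest); it relies only on `statfs(2)` and the `TMPFS_MAGIC` constant from `<linux/magic.h>`, and restates the reasoning above that tmpfs now accepts idmapped mounts while ramfs does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/vfs.h>      /* statfs() */
#include <linux/magic.h>  /* TMPFS_MAGIC, RAMFS_MAGIC */

/*
 * tmpfs accepts idmapped mounts since commit 7a80e5b8c6fa, so a tmpfs
 * mount can no longer serve as the "invalid" target of this test.
 */
static bool fs_type_is_tmpfs(long f_type)
{
	return f_type == TMPFS_MAGIC;
}

/* Hypothetical helper: 1 if @path is on tmpfs, 0 if not, -1 on error. */
static int path_is_tmpfs(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) < 0)
		return -1;
	return fs_type_is_tmpfs((long)sfs.f_type) ? 1 : 0;
}
```

Inside `idmap_mount_tree_invalid`, such a helper could back an `ASSERT_EQ(path_is_tmpfs("/mnt/B"), 0)` before the tree is opened, so a future change to the fixture's mounts fails loudly instead of silently turning the test into a valid-mount case.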
Re: [PATCH v4 4/4] selftests: pidfd: add tests for PIDFD_SELF_*
struct f_owner_ex |^~ /usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:196:8: error: redefinition of ‘struct flock’ 196 | struct flock { |^ /usr/include/x86_64-linux-gnu/bits/fcntl.h:35:8: note: originally defined here 35 | struct flock |^ /usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:210:8: error: redefinition of ‘struct flock64’ 210 | struct flock64 { |^~~ /usr/include/x86_64-linux-gnu/bits/fcntl.h:50:8: note: originally defined here 50 | struct flock64 |^~~ make: *** [../lib.mk:222: /usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup/test_kill] Error 1 make: Leaving directory '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup' The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20241025/202410251504.707d78fc-oliver.s...@intel.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Re: [PATCH v3] remoteproc: Add a new remoteproc state RPROC_DEFUNCT
On Mon, Oct 21, 2024 at 09:12:47AM -0600, Mathieu Poirier wrote: > Hi Mukesh, > > On Wed, Oct 16, 2024 at 10:25:46AM +0530, Mukesh Ojha wrote: > > Multiple call to glink_subdev_stop() for the same remoteproc can happen > > if rproc_stop() fails from Process-A that leaves the rproc state to > > RPROC_CRASHED state later a call to recovery_store from user space in > > Process B triggers rproc_trigger_recovery() of the same remoteproc to > > recover it results in NULL pointer dereference issue in > > qcom_glink_smem_unregister(). > > > > There is other side to this issue if we want to fix this via adding a > > NULL check on glink->edge which does not guarantees that the remoteproc > > will recover in second call from Process B as it has failed in the first > > Process A during SMC shutdown call and may again fail at the same call > > and rproc can not recover for such case. > > > > Add a new rproc state RPROC_DEFUNCT i.e., non recoverable state of > > remoteproc and the only way to recover from it via system restart. > > > > Process-A Process-B > > > > fatal error interrupt happens > > > > rproc_crash_handler_work() > > mutex_lock_interruptible(&rproc->lock); > > ... > > > >rproc->state = RPROC_CRASHED; > > ... > > mutex_unlock(&rproc->lock); > > > > rproc_trigger_recovery() > > mutex_lock_interruptible(&rproc->lock); > > > > adsp_stop() > > qcom_q6v5_pas 20c0.remoteproc: failed to shutdown: -22 > > remoteproc remoteproc3: can't stop rproc: -22 > > mutex_unlock(&rproc->lock); > > Ok, that can happen. > > > > > echo enabled > > > /sys/class/remoteproc/remoteprocX/recovery > > recovery_store() > > rproc_trigger_recovery() > > > > mutex_lock_interruptible(&rproc->lock); > >rproc_stop() > > glink_subdev_stop() > > > > qcom_glink_smem_unregister() ==| > > > > | > > > > V > > I am missing some information here but I will _assume_ this is caused by > glink->edge being set to NULL [1] when glink_subdev_stop() is first called by > process A. 
Instead of adding a new state to the core I think a better idea > would be to add a check for a NULL value on @smem in > qcom_glink_smem_unregister(). This is a problem that should be fixed in the > driver rather than the core. > > [1]. > https://elixir.bootlin.com/linux/v6.12-rc4/source/drivers/remoteproc/qcom_common.c#L213 I did the same here [1], but after discussion with Bjorn I realized that the remoteproc might not recover at all and may fail on the second attempt as well, in which case the only way out is rebooting the machine. [1] https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mo...@quicinc.com/ > > > Unable to handle kernel > > NULL pointer dereference > > at virtual > > address 0358 > > > > Signed-off-by: Mukesh Ojha > > --- > > Changes in v3: > > - Fix kernel test reported error. > > > > Changes in v2: > > - Removed NULL pointer check instead added a new state to signify > >non-recoverable state of remoteproc. > > > > drivers/remoteproc/remoteproc_core.c | 3 ++- > > drivers/remoteproc/remoteproc_sysfs.c | 1 + > > include/linux/remoteproc.h| 5 - > > 3 files changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/remoteproc/remoteproc_core.c > > b/drivers/remoteproc/remoteproc_core.c > > index f276956f2c5c..c4e14503b971 100644 > > --- a/drivers/remoteproc/remoteproc_core.c > > +++ b/drivers/remoteproc/remoteproc_core.c > > @@ -1727,6 +1727,7 @@ static int rproc_stop(struct rproc *rproc, bool > > crashed) > > /* power off the remote processor */ > > ret = rproc->ops->stop(rproc); > > if (ret) { > > + rproc->state = RPROC_DEFUNCT; > > dev_err(dev, "can't stop rproc: %d\n", ret); > > return ret; > > } > > @@ -1839,7 +1840,7 @@ int rproc_trigger_recovery(struct rproc *rproc) > > return ret; > > > > /* State could have changed before we got the mutex */ > > - if (rproc->state != RPROC_CRASHED) > > + if (rproc->state == RPROC_DEFUNCT || rproc->state != RPROC_CRASHED) > > goto unlock_mutex; > > The problem is that rproc_trigger_recovery() can only be called once
for a > remoteproc, something that modifies the state machine and may introduce > backward > compatibility issues for other remote processor implementations. >
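A small userspace model of the state check in the `rproc_trigger_recovery()` hunk above. The enum mirrors the names in `include/linux/remoteproc.h` plus the proposed `RPROC_DEFUNCT`, but the numeric values here are illustrative. Since `RPROC_DEFUNCT` is never equal to `RPROC_CRASHED`, the patched condition `state == RPROC_DEFUNCT || state != RPROC_CRASHED` skips recovery for exactly the same states as `state != RPROC_CRASHED`. As far as the hunks shown go, what blocks the second recovery attempt is that `rproc_stop()` now moves the rproc out of `RPROC_CRASHED` when `->stop()` fails.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Names mirror include/linux/remoteproc.h plus the proposed RPROC_DEFUNCT;
 * the numeric values are illustrative only.
 */
enum rproc_state_sketch {
	RPROC_OFFLINE,
	RPROC_SUSPENDED,
	RPROC_RUNNING,
	RPROC_CRASHED,
	RPROC_DELETED,
	RPROC_ATTACHED,
	RPROC_DETACHED,
	RPROC_DEFUNCT,
};

/* Recovery may only proceed from the crashed state. */
static bool recovery_should_proceed(enum rproc_state_sketch state)
{
	return state == RPROC_CRASHED;
}

/* The patched "goto unlock_mutex" condition, copied as written. */
static bool patch_skips_recovery(enum rproc_state_sketch state)
{
	return state == RPROC_DEFUNCT || state != RPROC_CRASHED;
}
```

Checking every state shows the two predicates are exact complements, so the extra `RPROC_DEFUNCT` clause is redundant but harmless.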
Re: [PATCH] selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
On 10/24/24 03:50, zhouyuhang wrote: From: zhouyuhang Test case idmap_mount_tree_invalid failed to run on the newer kernel with the following output: # RUN mount_setattr_idmapped.idmap_mount_tree_invalid ... # mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) != 0 (0) # idmap_mount_tree_invalid: Test terminated by assertion This is because tmpfs is mounted at "/mnt/A", and tmpfs already contains the flag FS_ALLOW_IDMAP after the commit 7a80e5b8c6fa ("shmem: support idmapped mounts for tmpfs"). So calling sys_mount_setattr here returns 0 instead of -EINVAL as expected. Ramfs is mounted at "/mnt/B" and does not support idmap mounts. So we can use "/mnt/B" instead of "/mnt/A" to make the test run successfully with the following output: # Starting 1 tests from 1 test cases. # RUN mount_setattr_idmapped.idmap_mount_tree_invalid ... #OK mount_setattr_idmapped.idmap_mount_tree_invalid ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid # PASSED: 1 / 1 tests passed. Sounds like this code is testing this very condition passing in invalid mount to see what happens. If that is the intent this patch is incorrect.
Signed-off-by: zhouyuhang --- tools/testing/selftests/mount_setattr/mount_setattr_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c index c6a8c732b802..54552c19bc24 100644 --- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c +++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c @@ -1414,7 +1414,7 @@ TEST_F(mount_setattr_idmapped, idmap_mount_tree_invalid) ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/b", 0, 0, 0), 0); ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/BB/b", 0, 0, 0), 0); - open_tree_fd = sys_open_tree(-EBADF, "/mnt/A", + open_tree_fd = sys_open_tree(-EBADF, "/mnt/B", AT_RECURSIVE | AT_EMPTY_PATH | AT_NO_AUTOMOUNT | thanks, -- Shuah
[PATCH net-next v2 3/4] net: ti: icssg-prueth: Add VLAN support for HSR mode
From: Ravi Gunasekaran Add support for VLAN addition/deletion in HSR mode. In HSR mode, even if the host port is not a member of the VLAN domain, the slave ports should simply forward the frames. So allow forwarding of all VLAN frames in HSR mode. Signed-off-by: Ravi Gunasekaran Signed-off-by: MD Danish Anwar --- drivers/net/ethernet/ti/icssg/icssg_prueth.c | 45 +++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c index 0556910938fa..b4d70c6e0cff 100644 --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c @@ -808,6 +808,47 @@ static netdev_features_t emac_ndo_fix_features(struct net_device *ndev, return features; } +static int emac_ndo_vlan_rx_add_vid(struct net_device *ndev, + __be16 proto, u16 vid) +{ + struct prueth_emac *emac = netdev_priv(ndev); + struct prueth *prueth = emac->prueth; + int untag_mask = 0; + int port_mask; + + if (prueth->is_hsr_offload_mode) { + port_mask = BIT(PRUETH_PORT_HOST) | BIT(emac->port_id); + untag_mask = 0; + + netdev_dbg(emac->ndev, "VID add vid:%u port_mask:%X untag_mask %X\n", + vid, port_mask, untag_mask); + + icssg_vtbl_modify(emac, vid, port_mask, untag_mask, true); + icssg_set_pvid(emac->prueth, vid, emac->port_id); + } + return 0; +} + +static int emac_ndo_vlan_rx_del_vid(struct net_device *ndev, + __be16 proto, u16 vid) +{ + struct prueth_emac *emac = netdev_priv(ndev); + struct prueth *prueth = emac->prueth; + int untag_mask = 0; + int port_mask; + + if (prueth->is_hsr_offload_mode) { + port_mask = BIT(PRUETH_PORT_HOST); + untag_mask = 0; + + netdev_dbg(emac->ndev, "VID del vid:%u port_mask:%X untag_mask %X\n", + vid, port_mask, untag_mask); + + icssg_vtbl_modify(emac, vid, port_mask, untag_mask, false); + } + return 0; +} + static const struct net_device_ops emac_netdev_ops = { .ndo_open = emac_ndo_open, .ndo_stop = emac_ndo_stop, @@ -820,6 +861,8 @@ static const struct 
net_device_ops emac_netdev_ops = { .ndo_get_stats64 = icssg_ndo_get_stats64, .ndo_get_phys_port_name = icssg_ndo_get_phys_port_name, .ndo_fix_features = emac_ndo_fix_features, + .ndo_vlan_rx_add_vid = emac_ndo_vlan_rx_add_vid, + .ndo_vlan_rx_kill_vid = emac_ndo_vlan_rx_del_vid, }; static int prueth_netdev_init(struct prueth *prueth, @@ -947,7 +990,7 @@ static int prueth_netdev_init(struct prueth *prueth, ndev->netdev_ops = &emac_netdev_ops; ndev->ethtool_ops = &icssg_ethtool_ops; ndev->hw_features = NETIF_F_SG; - ndev->features = ndev->hw_features; + ndev->features = ndev->hw_features | NETIF_F_HW_VLAN_CTAG_FILTER; ndev->hw_features |= NETIF_PRUETH_HSR_OFFLOAD_FEATURES; netif_napi_add(ndev, &emac->napi_rx, icssg_napi_rx_poll); -- 2.34.1
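For reference, the membership masks the two new callbacks compute in HSR offload mode can be modeled in userspace as follows. The port numbering (`PRUETH_PORT_HOST` = 0, `PRUETH_PORT_MII0` = 1, `PRUETH_PORT_MII1` = 2) is an assumption modeled on the driver's `enum prueth_port`, and the helper names are hypothetical. The sketch only reproduces the `BIT()` arithmetic, not what `icssg_vtbl_modify()` does with the result.

```c
#include <assert.h>

#define BIT(n) (1U << (n))

/* Assumed port numbering, modeled on the icssg driver's enum prueth_port. */
enum prueth_port_sketch {
	PRUETH_PORT_HOST = 0,
	PRUETH_PORT_MII0,
	PRUETH_PORT_MII1,
};

/* VLAN add in HSR mode: the host port plus the slave port of this netdev. */
static unsigned int hsr_vid_add_port_mask(enum prueth_port_sketch port_id)
{
	return BIT(PRUETH_PORT_HOST) | BIT(port_id);
}

/* VLAN kill in HSR mode: only the host port bit is passed (with add=false). */
static unsigned int hsr_vid_del_port_mask(void)
{
	return BIT(PRUETH_PORT_HOST);
}
```

Under these assumptions, adding a VID on the MII0 netdev programs mask 0x3 and on the MII1 netdev mask 0x5, while deletion always passes 0x1; `untag_mask` stays 0 in both paths.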
[PATCH v2 0/3] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
Hi, the following has been in the PREEMPT_RT queue since the last softirq rework. The problem is that timer wakeups (hrtimer, timer_list) happen in hardirq context, and processing them requires waking ksoftirqd. ksoftirqd runs at SCHED_OTHER, so it competes for resources with all other tasks in the system, potentially delaying the processing further. The idea was to let the timers be processed by a dedicated thread running at low SCHED_FIFO priority. Looking at it again, it might make sense to have the pending_softirq flag per-thread to avoid threads with higher priority picking up softirqs from low-priority threads. This isn't a problem yet because softirqs are only added for processing from threaded interrupts. So the low-priority thread will wait until the high-priority thread is done. And the high-priority thread will PI-boost the low-priority thread until it is done. It would only make sense to make the flags per-thread once the BH lock is gone. The patch is limited to PREEMPT_RT. The ksoftirqd points above also apply to !PREEMPT_RT with threadirqs. Would it make sense to restrict it to force_irqthreads() instead? v1…v2: https://lore.kernel.org/all/20241004103842.131014-1-bige...@linutronix.de/ Frederic's comments: - Use __raise_softirq_irqoff() to raise the softirq for !PREEMPT_RT. Also add a lockdep test to ensure that this is always invoked from an IRQ. - Make raise_ktimers_thread() only OR the flag and nothing else, to align with __raise_softirq_irqoff(). The wakeup happens on return from interrupt anyway. - Add a comment in timersd_setup() and interrupt.h. - Rename local_pending_timers() => local_timers_pending(). Sebastian
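The v2 change that makes `raise_ktimers_thread()` only OR the flag, like `__raise_softirq_irqoff()`, can be pictured with a userspace model of a per-CPU pending word. Everything below is hypothetical: the names, the single global standing in for per-CPU state, and the fetch-and-clear helper standing in for the softirq processing that happens on return from interrupt. The bit numbers follow the kernel's `HI_SOFTIRQ` (0) and `TIMER_SOFTIRQ` (1).

```c
#include <assert.h>

/* Bit numbers follow the kernel's HI_SOFTIRQ and TIMER_SOFTIRQ. */
enum { HI_SOFTIRQ_SKETCH = 0, TIMER_SOFTIRQ_SKETCH = 1 };

/* Stand-in for the per-CPU pending-softirq word. */
static unsigned int pending;

/*
 * Like __raise_softirq_irqoff(): the caller already runs with interrupts
 * disabled, so raising is a plain OR and nothing is woken here.
 */
static void raise_irqoff_sketch(unsigned int nr)
{
	pending |= 1U << nr;
}

/* Stand-in for IRQ exit: snapshot the pending bits and clear them. */
static unsigned int fetch_and_clear_pending(void)
{
	unsigned int p = pending;

	pending = 0;
	return p;
}
```

Raising the same softirq twice before it runs is idempotent, which is why doing only the OR in hardirq context loses nothing: the single wakeup on interrupt return sees every bit that was set.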
[PATCH net-next v10 02/23] net: introduce OpenVPN Data Channel Offload (ovpn)
OpenVPN is userspace software, around since 2005, that allows users to create secure tunnels. So far OpenVPN has implemented all operations in userspace, which implies several round trips between kernel and user space in order to process packets (encapsulate/decapsulate, encrypt/decrypt, rerouting, etc.). With `ovpn` we intend to move the fast path (data channel) entirely into kernel space and thus improve user-measured throughput over the tunnel. `ovpn` is implemented as a simple virtual network device driver that can be manipulated by means of the standard RTNL APIs. A device of kind `ovpn` allows only IPv4/6 traffic and can be of type: * P2P (peer-to-peer): any packet sent over the interface will be encapsulated and transmitted to the other side (typical OpenVPN client or peer-to-peer behaviour); * P2MP (point-to-multipoint): packets sent over the interface are transmitted to peers based on existing routes (typical OpenVPN server behaviour). After the interface has been created, OpenVPN in userspace can configure it using a new Netlink API. Specifically, it is possible to manage peers and their keys. The OpenVPN control channel is multiplexed over the same transport socket by means of OP codes. Anything that is not DATA_V2 (the OpenVPN OP code for data traffic) is sent to userspace and handled there. This way the `ovpn` codebase is kept as compact as possible while focusing on handling data traffic only (fast path). Any OpenVPN control feature (like cipher negotiation, TLS handshake, rekeying, etc.) is still fully handled by the userspace process. When userspace establishes a new connection with a peer, it first performs the handshake and then passes the socket to the `ovpn` kernel module, which takes ownership. From this moment on, `ovpn` will handle data traffic for the new peer. When control packets are received on the link, they are forwarded to userspace through the same transport socket they were received on, as userspace is still listening to them.
Some events (like peer deletion) are sent to a Netlink multicast group. Although it wasn't easy to convince the community, `ovpn` implements only a limited number of the data-channel features supported by the userspace program. Each feature that made it into `ovpn` was carefully vetted to avoid carrying too much legacy along with us (and to make a clean cut with old and probably-not-so-useful features). Notably, only encryption using AEAD ciphers (specifically ChaCha20Poly1305 and AES-GCM) was implemented. Supporting any other cipher out there was not deemed useful. Both UDP and TCP sockets are supported. As explained above, in P2MP mode OpenVPN will use the main system routing table to decide which packet goes to which peer. This implies that no routing table was re-implemented in the `ovpn` kernel module. This kernel module can be enabled by selecting the CONFIG_OVPN entry in the networking drivers section. NOTE: this first patch introduces only the very basic framework. Features are then added patch by patch; although each patch compiles and should not break at runtime, the ovpn module is expected to be fully working only once the full set has been applied.
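The opcode-based demultiplexing described above can be sketched in userspace C. The sketch follows the OpenVPN wire format as commonly documented (opcode in the top 5 bits of the first byte, key ID in the low 3 bits, `P_DATA_V2` = 9, followed by a 24-bit peer ID); the constant and function names are hypothetical, and this is not the `ovpn` module's code. Handing packets too short to carry the DATA_V2 header to userspace is a choice of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OVPN_SKETCH_OPCODE_SHIFT 3    /* opcode: top 5 bits of byte 0 */
#define OVPN_SKETCH_KEY_ID_MASK  0x07 /* key ID: low 3 bits of byte 0 */
#define OVPN_SKETCH_P_DATA_V2    9    /* OpenVPN opcode for data traffic */
#define OVPN_SKETCH_DATA_V2_HDR  4    /* opcode/key byte + 24-bit peer ID */

enum pkt_dest { PKT_TO_KERNEL_DATAPATH, PKT_TO_USERSPACE };

static uint8_t pkt_opcode(const uint8_t *buf)
{
	return buf[0] >> OVPN_SKETCH_OPCODE_SHIFT;
}

static uint8_t pkt_key_id(const uint8_t *buf)
{
	return buf[0] & OVPN_SKETCH_KEY_ID_MASK;
}

/*
 * DATA_V2 stays on the in-kernel fast path; everything else (control,
 * ACK, handshake packets) goes back to userspace on the same socket.
 */
static enum pkt_dest classify_packet(const uint8_t *buf, size_t len)
{
	if (len < OVPN_SKETCH_DATA_V2_HDR)
		return PKT_TO_USERSPACE;
	if (pkt_opcode(buf) == OVPN_SKETCH_P_DATA_V2)
		return PKT_TO_KERNEL_DATAPATH;
	return PKT_TO_USERSPACE;
}

/* 24-bit big-endian peer ID carried in bytes 1..3 of a DATA_V2 header. */
static uint32_t data_v2_peer_id(const uint8_t *buf)
{
	return ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
}
```

A first byte of `(9 << 3) | 1` therefore means "DATA_V2, key ID 1", and the following three bytes select the peer.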
Cc: steffen.klass...@secunet.com Cc: antony.ant...@secunet.com Signed-off-by: Antonio Quartulli --- MAINTAINERS | 8 drivers/net/Kconfig | 13 ++ drivers/net/Makefile | 1 + drivers/net/ovpn/Makefile | 11 + drivers/net/ovpn/io.c | 22 + drivers/net/ovpn/io.h | 15 ++ drivers/net/ovpn/main.c | 116 ++ drivers/net/ovpn/main.h | 15 ++ include/uapi/linux/udp.h | 1 + 9 files changed, 202 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index f39ab140710f16b1245924bfe381cd64d499ff8a..09e193bbc218d74846cbae26f80ada3e04c3692a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17286,6 +17286,14 @@ F: arch/openrisc/ F: drivers/irqchip/irq-ompic.c F: drivers/irqchip/irq-or1k-* +OPENVPN DATA CHANNEL OFFLOAD +M: Antonio Quartulli +L: openvpn-de...@lists.sourceforge.net (moderated for non-subscribers) +L: net...@vger.kernel.org +S: Supported +T: git https://github.com/OpenVPN/linux-kernel-ovpn.git +F: drivers/net/ovpn/ + OPENVSWITCH M: Pravin B Shelar L: net...@vger.kernel.org diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 1fd5acdc73c6af0e1a861867039c3624fc618e25..269b73fcfd348a48174fb96b8f8d4f8788636fa8 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -115,6 +115,19 @@ config WIREGUARD_DEBUG Say N here unless you know what you're doing. +config OVPN + tristate "OpenVPN data channel offload" + depends on NET && INET + select NET_UDP_TUNNEL + select DST_CACHE + select CRYPTO + select CRYPTO_AES + select CRYPTO_GCM + select CRYPTO_CHACHA20POLY1305 + help + This module enhances the p
[PATCH net-next v10 01/23] netlink: add NLA_POLICY_MAX_LEN macro
Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy with a maximum length value. The netlink generator for YAML specs has been extended accordingly. Cc: donald.hun...@gmail.com Signed-off-by: Antonio Quartulli --- include/net/netlink.h | 1 + tools/net/ynl/ynl-gen-c.py | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/net/netlink.h b/include/net/netlink.h index db6af207287c839408c58cb28b82408e0548eaca..2dc671c977ff3297975269d236264907009703d3 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -469,6 +469,7 @@ struct nla_policy { .max = _len \ } #define NLA_POLICY_MIN_LEN(_len) NLA_POLICY_MIN(NLA_BINARY, _len) +#define NLA_POLICY_MAX_LEN(_len) NLA_POLICY_MAX(NLA_BINARY, _len) /** * struct nl_info - netlink source information diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py index 1a825b4081b222cf97eb73f01a2a5c1ffe47cd5c..aa22eb0924754f38ea0b9e68a1ff5a55d94d6717 100755 --- a/tools/net/ynl/ynl-gen-c.py +++ b/tools/net/ynl/ynl-gen-c.py @@ -481,7 +481,7 @@ class TypeBinary(Type): pass elif len(self.checks) == 1: check_name = list(self.checks)[0] -if check_name not in {'exact-len', 'min-len'}: +if check_name not in {'exact-len', 'min-len', 'max-len'}: raise Exception('Unsupported check for binary type: ' + check_name) else: raise Exception('More than one check for binary type not implemented, yet') @@ -492,6 +492,8 @@ class TypeBinary(Type): mem = 'NLA_POLICY_EXACT_LEN(' + self.get_limit_str('exact-len') + ')' elif 'min-len' in self.checks: mem = '{ .len = ' + self.get_limit_str('min-len') + ', }' +elif 'max-len' in self.checks: +mem = 'NLA_POLICY_MAX_LEN(' + self.get_limit_str('max-len') + ')' return mem -- 2.45.2
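As a rough illustration of what the new macro expresses, here is a userspace model of minimum/maximum length bounds on a binary attribute. It is not the kernel's `struct nla_policy` or its validation code; the struct and function names are hypothetical, and returning `-ERANGE` for an out-of-bounds length is this sketch's convention. With the generator change above, a YNL spec entry with `checks: max-len: N` on a binary attribute would be emitted as `NLA_POLICY_MAX_LEN(N)`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical length bounds for a binary attribute; 0 means "unbounded". */
struct bin_len_policy {
	size_t min_len; /* like NLA_POLICY_MIN_LEN(min_len) when non-zero */
	size_t max_len; /* like NLA_POLICY_MAX_LEN(max_len) when non-zero */
};

/* Accept the attribute payload length only if it lies within the bounds. */
static int check_binary_len(const struct bin_len_policy *p, size_t attr_len)
{
	if (p->min_len && attr_len < p->min_len)
		return -ERANGE;
	if (p->max_len && attr_len > p->max_len)
		return -ERANGE;
	return 0;
}
```

A max-only policy of 16 accepts payloads of 0 to 16 bytes and rejects 17, which is the case the new macro covers and `NLA_POLICY_MIN_LEN` cannot.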
[PATCH net-next v10 03/23] ovpn: add basic netlink support
This commit introduces basic netlink support with family registration/unregistration functionalities and stub pre/post-doit. More importantly it introduces the YAML uAPI description along with its auto-generated files: - include/uapi/linux/ovpn.h - drivers/net/ovpn/netlink-gen.c - drivers/net/ovpn/netlink-gen.h Cc: donald.hun...@gmail.com Signed-off-by: Antonio Quartulli --- Documentation/netlink/specs/ovpn.yaml | 362 ++ MAINTAINERS | 2 + drivers/net/ovpn/Makefile | 2 + drivers/net/ovpn/main.c | 15 +- drivers/net/ovpn/netlink-gen.c| 212 drivers/net/ovpn/netlink-gen.h| 41 drivers/net/ovpn/netlink.c| 157 +++ drivers/net/ovpn/netlink.h| 15 ++ drivers/net/ovpn/ovpnstruct.h | 25 +++ include/uapi/linux/ovpn.h | 109 ++ 10 files changed, 939 insertions(+), 1 deletion(-) diff --git a/Documentation/netlink/specs/ovpn.yaml b/Documentation/netlink/specs/ovpn.yaml new file mode 100644 index ..79339c25d607f1b5d15a0a973f6fc23637e158a2 --- /dev/null +++ b/Documentation/netlink/specs/ovpn.yaml @@ -0,0 +1,362 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) +# +# Author: Antonio Quartulli +# +# Copyright (c) 2024, OpenVPN Inc. +# + +name: ovpn + +protocol: genetlink + +doc: Netlink protocol to control OpenVPN network devices + +definitions: + - +type: const +name: nonce-tail-size +value: 8 + - +type: enum +name: cipher-alg +entries: [ none, aes-gcm, chacha20-poly1305 ] + - +type: enum +name: del-peer-reason +entries: [ teardown, userspace, expired, transport-error, transport-disconnect ] + - +type: enum +name: key-slot +entries: [ primary, secondary ] + +attribute-sets: + - +name: peer +attributes: + - +name: id +type: u32 +doc: | + The unique ID of the peer. 
To be used to identify peers during + operations +checks: + max: 0xFF + - +name: remote-ipv4 +type: u32 +doc: The remote IPv4 address of the peer +byte-order: big-endian +display-hint: ipv4 + - +name: remote-ipv6 +type: binary +doc: The remote IPv6 address of the peer +display-hint: ipv6 +checks: + exact-len: 16 + - +name: remote-ipv6-scope-id +type: u32 +doc: The scope id of the remote IPv6 address of the peer (RFC2553) + - +name: remote-port +type: u16 +doc: The remote port of the peer +byte-order: big-endian +checks: + min: 1 + - +name: socket +type: u32 +doc: The socket to be used to communicate with the peer + - +name: vpn-ipv4 +type: u32 +doc: The IPv4 address assigned to the peer by the server +byte-order: big-endian +display-hint: ipv4 + - +name: vpn-ipv6 +type: binary +doc: The IPv6 address assigned to the peer by the server +display-hint: ipv6 +checks: + exact-len: 16 + - +name: local-ipv4 +type: u32 +doc: The local IPv4 to be used to send packets to the peer (UDP only) +byte-order: big-endian +display-hint: ipv4 + - +name: local-ipv6 +type: binary +doc: The local IPv6 to be used to send packets to the peer (UDP only) +display-hint: ipv6 +checks: + exact-len: 16 + - +name: local-port +type: u16 +doc: The local port to be used to send packets to the peer (UDP only) +byte-order: big-endian +checks: + min: 1 + - +name: keepalive-interval +type: u32 +doc: | + The number of seconds after which a keep alive message is sent to the + peer + - +name: keepalive-timeout +type: u32 +doc: | + The number of seconds from the last activity after which the peer is + assumed dead + - +name: del-reason +type: u32 +doc: The reason why a peer was deleted +enum: del-peer-reason + - +name: vpn-rx-bytes +type: uint +doc: Number of bytes received over the tunnel + - +name: vpn-tx-bytes +type: uint +doc: Number of bytes transmitted over the tunnel + - +name: vpn-rx-packets +type: uint +doc: Number of packets received over the tunnel + - +name: vpn-tx-packets +type: uint +doc: 
Number of packets transmitted over the tunnel + - +name: link-rx-bytes +type: uint +doc: Number of bytes received at the transport level + - +name: link-tx-bytes
[PATCH net-next v10 05/23] ovpn: keep carrier always on
An ovpn interface will keep carrier always on and let the user decide when an interface should be considered disconnected. This way, even if an ovpn interface is not connected to any peer, it can still retain all IPs and routes and thus prevent any data leak. Signed-off-by: Antonio Quartulli Reviewed-by: Andrew Lunn --- drivers/net/ovpn/main.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_open(struct net_device *dev) { + /* ovpn keeps the carrier always on to avoid losing IP or route +* configuration upon disconnection. This way it can prevent leaks +* of traffic outside of the VPN tunnel. +* The user may override this behaviour by tearing down the interface +* manually. +*/ + netif_carrier_on(dev); netif_tx_start_all_queues(dev); return 0; } -- 2.45.2
[PATCH net-next v10 04/23] ovpn: add basic interface creation/destruction/management routines
Add basic infrastructure for handling ovpn interfaces. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/main.c | 115 -- drivers/net/ovpn/main.h | 7 +++ drivers/net/ovpn/ovpnstruct.h | 8 +++ drivers/net/ovpn/packet.h | 40 +++ include/uapi/linux/if_link.h | 15 ++ 5 files changed, 180 insertions(+), 5 deletions(-) diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -10,18 +10,52 @@ #include #include #include +#include +#include #include -#include +#include #include "ovpnstruct.h" #include "main.h" #include "netlink.h" #include "io.h" +#include "packet.h" /* Driver info */ #define DRV_DESCRIPTION"OpenVPN data channel offload (ovpn)" #define DRV_COPYRIGHT "(C) 2020-2024 OpenVPN, Inc." +static void ovpn_struct_free(struct net_device *net) +{ +} + +static int ovpn_net_open(struct net_device *dev) +{ + netif_tx_start_all_queues(dev); + return 0; +} + +static int ovpn_net_stop(struct net_device *dev) +{ + netif_tx_stop_all_queues(dev); + return 0; +} + +static const struct net_device_ops ovpn_netdev_ops = { + .ndo_open = ovpn_net_open, + .ndo_stop = ovpn_net_stop, + .ndo_start_xmit = ovpn_net_xmit, +}; + +static const struct device_type ovpn_type = { + .name = OVPN_FAMILY_NAME, +}; + +static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = { + [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P, + OVPN_MODE_MP), +}; + /** * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn' * @dev: the interface to check @@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev) return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; } +static void ovpn_setup(struct net_device *dev) +{ + /* compute the overhead considering AEAD encryption */ + const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 + +sizeof(struct udphdr) + +max(sizeof(struct ipv6hdr), sizeof(struct iphdr)); + + 
netdev_features_t feat = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM | +NETIF_F_GSO | NETIF_F_GSO_SOFTWARE | +NETIF_F_HIGHDMA; + + dev->needs_free_netdev = true; + + dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS; + + dev->netdev_ops = &ovpn_netdev_ops; + + dev->priv_destructor = ovpn_struct_free; + + dev->hard_header_len = 0; + dev->addr_len = 0; + dev->mtu = ETH_DATA_LEN - overhead; + dev->min_mtu = IPV4_MIN_MTU; + dev->max_mtu = IP_MAX_MTU - overhead; + + dev->type = ARPHRD_NONE; + dev->flags = IFF_POINTOPOINT | IFF_NOARP; + dev->priv_flags |= IFF_NO_QUEUE; + + dev->lltx = true; + dev->features |= feat; + dev->hw_features |= feat; + dev->hw_enc_features |= feat; + + dev->needed_headroom = OVPN_HEAD_ROOM; + dev->needed_tailroom = OVPN_MAX_PADDING; + + SET_NETDEV_DEVTYPE(dev, &ovpn_type); +} + static int ovpn_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - return -EOPNOTSUPP; + struct ovpn_struct *ovpn = netdev_priv(dev); + enum ovpn_mode mode = OVPN_MODE_P2P; + + if (data && data[IFLA_OVPN_MODE]) { + mode = nla_get_u8(data[IFLA_OVPN_MODE]); + netdev_dbg(dev, "setting device mode: %u\n", mode); + } + + ovpn->dev = dev; + ovpn->mode = mode; + + /* turn carrier explicitly off after registration, this way state is +* clearly defined +*/ + netif_carrier_off(dev); + + return register_netdevice(dev); } static struct rtnl_link_ops ovpn_link_ops = { .kind = OVPN_FAMILY_NAME, .netns_refund = false, + .priv_size = sizeof(struct ovpn_struct), + .setup = ovpn_setup, + .policy = ovpn_policy, + .maxtype = IFLA_OVPN_MAX, .newlink = ovpn_newlink, .dellink = unregister_netdevice_queue, }; @@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, unsigned long state, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct ovpn_struct *ovpn; if (!ovpn_dev_is_valid(dev)) return NOTIFY_DONE; + ovpn = netdev_priv(dev); + switch (state) { case 
NETDEV_REGISTER: - /* add device to internal list for later destruction upon -* unregistration -*/ + ovpn-
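For reference, the default MTU chosen in ovpn_setup() is ETH_DATA_LEN minus the worst-case encapsulation overhead. The following is a minimal userspace sketch of that arithmetic, not driver code. NONCE_WIRE_SIZE is defined in packet.h, which is not quoted here, so its value of 4 is an assumption, as is the comment on what the leading u32 carries; the header sizes are the standard ones for struct iphdr (no options), struct ipv6hdr and struct udphdr.

```c
#include <assert.h>

/* Sizes mirroring the overhead formula in ovpn_setup() */
#define OP_SIZE          4   /* sizeof(u32); ASSUMPTION: opcode/key id + peer id */
#define NONCE_WIRE_SIZE  4   /* ASSUMPTION: packet.h is not quoted */
#define AEAD_TAG_SIZE    16  /* the literal 16 in ovpn_setup() */
#define UDP_HDR_SIZE     8   /* sizeof(struct udphdr) */
#define IPV4_HDR_SIZE    20  /* sizeof(struct iphdr), no options */
#define IPV6_HDR_SIZE    40  /* sizeof(struct ipv6hdr) */
#define ETH_DATA_LEN     1500
#define IP_MAX_MTU       0xFFFF

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Worst-case per-packet overhead: the larger IP header is used so that
 * the default MTU is safe for both IPv4 and IPv6 transport. */
static int ovpn_overhead(void)
{
	return OP_SIZE + NONCE_WIRE_SIZE + AEAD_TAG_SIZE + UDP_HDR_SIZE +
	       max_int(IPV6_HDR_SIZE, IPV4_HDR_SIZE);
}

/* dev->mtu = ETH_DATA_LEN - overhead */
static int ovpn_default_mtu(void)
{
	int overhead = ovpn_overhead();

	assert(overhead < ETH_DATA_LEN);
	return ETH_DATA_LEN - overhead;
}

/* dev->max_mtu = IP_MAX_MTU - overhead */
static int ovpn_max_mtu(void)
{
	return IP_MAX_MTU - ovpn_overhead();
}
```

Under these assumptions the overhead is 72 bytes, giving a default MTU of 1428 and a max_mtu of 65463.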
[PATCH net-next v10 06/23] ovpn: introduce the ovpn_peer object
An ovpn_peer object holds the whole status of a remote peer (regardless of whether it is a server or a client). This includes status for crypto, tx/rx buffers, napi, etc. Only support for one peer is introduced (P2P mode). Multi-peer support is introduced with a later patch. Along with the ovpn_peer, the ovpn_bind object is also introduced, as the two are strictly related. An ovpn_bind object wraps a sockaddr representing the local coordinates being used to talk to a specific peer. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/Makefile | 2 + drivers/net/ovpn/bind.c | 58 +++ drivers/net/ovpn/bind.h | 117 ++ drivers/net/ovpn/main.c | 11 ++ drivers/net/ovpn/main.h | 2 + drivers/net/ovpn/ovpnstruct.h | 4 + drivers/net/ovpn/peer.c | 354 ++ drivers/net/ovpn/peer.h | 79 ++ 8 files changed, 627 insertions(+) diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index 201dc001419f1d99ae95c0ee0f96e68f8a4eac16..ce13499b3e1775a7f2a9ce16c6cb0aa088f93685 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -7,7 +7,9 @@ # Author: Antonio Quartulli obj-$(CONFIG_OVPN) := ovpn.o +ovpn-y += bind.o ovpn-y += main.o ovpn-y += io.o ovpn-y += netlink.o ovpn-y += netlink-gen.o +ovpn-y += peer.o diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c new file mode 100644 index ..b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a --- /dev/null +++ b/drivers/net/ovpn/bind.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2012-2024 OpenVPN, Inc.
+ * + * Author:James Yonan + * Antonio Quartulli + */ + +#include +#include + +#include "ovpnstruct.h" +#include "bind.h" +#include "peer.h" + +/** + * ovpn_bind_from_sockaddr - retrieve binding matching sockaddr + * @ss: the sockaddr to match + * + * Return: the bind matching the passed sockaddr if found, NULL otherwise + */ +struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss) +{ + struct ovpn_bind *bind; + size_t sa_len; + + if (ss->ss_family == AF_INET) + sa_len = sizeof(struct sockaddr_in); + else if (ss->ss_family == AF_INET6) + sa_len = sizeof(struct sockaddr_in6); + else + return ERR_PTR(-EAFNOSUPPORT); + + bind = kzalloc(sizeof(*bind), GFP_ATOMIC); + if (unlikely(!bind)) + return ERR_PTR(-ENOMEM); + + memcpy(&bind->remote, ss, sa_len); + + return bind; +} + +/** + * ovpn_bind_reset - assign new binding to peer + * @peer: the peer whose binding has to be replaced + * @new: the new bind to assign + */ +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new) +{ + struct ovpn_bind *old; + + spin_lock_bh(&peer->lock); + old = rcu_replace_pointer(peer->bind, new, true); + spin_unlock_bh(&peer->lock); + + kfree_rcu(old, rcu); +} diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h new file mode 100644 index ..859213d5040deb36c416eafcf5c6ab31c4d52c7a --- /dev/null +++ b/drivers/net/ovpn/bind.h @@ -0,0 +1,117 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2012-2024 OpenVPN, Inc. 
+ * + * Author:James Yonan + * Antonio Quartulli + */ + +#ifndef _NET_OVPN_OVPNBIND_H_ +#define _NET_OVPN_OVPNBIND_H_ + +#include +#include +#include +#include +#include +#include + +struct ovpn_peer; + +/** + * union ovpn_sockaddr - basic transport layer address + * @in4: IPv4 address + * @in6: IPv6 address + */ +union ovpn_sockaddr { + struct sockaddr_in in4; + struct sockaddr_in6 in6; +}; + +/** + * struct ovpn_bind - remote peer binding + * @remote: the remote peer sockaddress + * @local: local endpoint used to talk to the peer + * @local.ipv4: local IPv4 used to talk to the peer + * @local.ipv6: local IPv6 used to talk to the peer + * @rcu: used to schedule RCU cleanup job + */ +struct ovpn_bind { + union ovpn_sockaddr remote; /* remote sockaddr */ + + union { + struct in_addr ipv4; + struct in6_addr ipv6; + } local; + + struct rcu_head rcu; +}; + +/** + * skb_protocol_to_family - translate skb->protocol to AF_INET or AF_INET6 + * @skb: the packet sk_buff to inspect + * + * Return: AF_INET, AF_INET6 or 0 in case of unknown protocol + */ +static inline unsigned short skb_protocol_to_family(const struct sk_buff *skb) +{ + switch (skb->protocol) { + case htons(ETH_P_IP): + return AF_INET; + case htons(ETH_P_IPV6): + return AF_INET6; + default: + return 0; + } +} + +/** + * ovpn_bind_skb_src_match - match packet source with binding + * @bind: the binding to match + * @skb: the pack
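The address-family dispatch in ovpn_bind_from_sockaddr() can be tried out in userspace. The sketch below copies only as many bytes as the family's sockaddr needs into a zeroed union, mirroring the kzalloc() plus memcpy() pair in bind.c. bind_copy_remote() and union bind_sockaddr are hypothetical names, and returning a byte count or a negative errno is an assumption made to keep the example easy to test.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Userspace mirror of union ovpn_sockaddr from bind.h */
union bind_sockaddr {
	struct sockaddr_in in4;
	struct sockaddr_in6 in6;
};

/* Hypothetical helper: copy the remote address out of a sockaddr_storage.
 * Returns the number of bytes copied, or -EAFNOSUPPORT for families other
 * than AF_INET/AF_INET6 (the same error bind.c returns). */
static int bind_copy_remote(union bind_sockaddr *out,
			    const struct sockaddr_storage *ss)
{
	size_t sa_len;

	if (ss->ss_family == AF_INET)
		sa_len = sizeof(struct sockaddr_in);
	else if (ss->ss_family == AF_INET6)
		sa_len = sizeof(struct sockaddr_in6);
	else
		return -EAFNOSUPPORT;

	assert(sa_len <= sizeof(*out));

	memset(out, 0, sizeof(*out));   /* mirrors kzalloc() */
	memcpy(out, ss, sa_len);        /* only the family-specific part */
	return (int)sa_len;
}
```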
[PATCH net-next v10 07/23] ovpn: introduce the ovpn_socket object
This specific structure is used in the ovpn kernel module to wrap and carry around a standard kernel socket. ovpn takes ownership of the sockets passed to it, and therefore an ovpn-specific object is attached to each of them for status-tracking purposes. Initially only UDP support is introduced; TCP will come in a later patch. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/Makefile | 2 + drivers/net/ovpn/socket.c | 120 ++ drivers/net/ovpn/socket.h | 48 +++ drivers/net/ovpn/udp.c| 72 drivers/net/ovpn/udp.h| 17 +++ 5 files changed, 259 insertions(+) diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index ce13499b3e1775a7f2a9ce16c6cb0aa088f93685..56bddc9bef83e0befde6af3c3565bb91731d7b22 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -13,3 +13,5 @@ ovpn-y += io.o ovpn-y += netlink.o ovpn-y += netlink-gen.o ovpn-y += peer.o +ovpn-y += socket.o +ovpn-y += udp.o diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c new file mode 100644 index ..090a3232ab0ec19702110f1a90f45c7f10889f6f --- /dev/null +++ b/drivers/net/ovpn/socket.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc.
+ * + * Author:James Yonan + * Antonio Quartulli + */ + +#include +#include + +#include "ovpnstruct.h" +#include "main.h" +#include "io.h" +#include "peer.h" +#include "socket.h" +#include "udp.h" + +static void ovpn_socket_detach(struct socket *sock) +{ + if (!sock) + return; + + sockfd_put(sock); +} + +/** + * ovpn_socket_release_kref - kref_put callback + * @kref: the kref object + */ +void ovpn_socket_release_kref(struct kref *kref) +{ + struct ovpn_socket *sock = container_of(kref, struct ovpn_socket, + refcount); + + ovpn_socket_detach(sock->sock); + kfree_rcu(sock, rcu); +} + +static bool ovpn_socket_hold(struct ovpn_socket *sock) +{ + return kref_get_unless_zero(&sock->refcount); +} + +static struct ovpn_socket *ovpn_socket_get(struct socket *sock) +{ + struct ovpn_socket *ovpn_sock; + + rcu_read_lock(); + ovpn_sock = rcu_dereference_sk_user_data(sock->sk); + if (!ovpn_socket_hold(ovpn_sock)) { + pr_warn("%s: found ovpn_socket with ref = 0\n", __func__); + ovpn_sock = NULL; + } + rcu_read_unlock(); + + return ovpn_sock; +} + +static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) +{ + int ret = -EOPNOTSUPP; + + if (!sock || !peer) + return -EINVAL; + + if (sock->sk->sk_protocol == IPPROTO_UDP) + ret = ovpn_udp_socket_attach(sock, peer->ovpn); + + return ret; +} + +/** + * ovpn_socket_new - create a new socket and initialize it + * @sock: the kernel socket to embed + * @peer: the peer reachable via this socket + * + * Return: an openvpn socket on success or a negative error code otherwise + */ +struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer) +{ + struct ovpn_socket *ovpn_sock; + int ret; + + ret = ovpn_socket_attach(sock, peer); + if (ret < 0 && ret != -EALREADY) + return ERR_PTR(ret); + + /* if this socket is already owned by this interface, just increase the +* refcounter and use it as expected. 
+* +* Since UDP sockets can be used to talk to multiple remote endpoints, +* openvpn normally instantiates only one socket and shares it among all +* its peers. For this reason, when we find out that a socket is already +* used for some other peer in *this* instance, we can happily increase +* its refcounter and use it normally. +*/ + if (ret == -EALREADY) { + /* caller is expected to increase the sock refcounter before +* passing it to this function. For this reason we drop it if +* not needed, like when this socket is already owned. +*/ + ovpn_sock = ovpn_socket_get(sock); + sockfd_put(sock); + return ovpn_sock; + } + + ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL); + if (!ovpn_sock) + return ERR_PTR(-ENOMEM); + + ovpn_sock->ovpn = peer->ovpn; + ovpn_sock->sock = sock; + kref_init(&ovpn_sock->refcount); + + rcu_assign_sk_user_data(sock->sk, ovpn_sock); + + return ovpn_sock; +} diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h new file mode 100644 index ..5ad9c5073b085482da95ee8ebf40acf20bf2e4b3 --- /dev/null +++ b/drivers/net/ovpn/socket.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data chann
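ovpn_socket_get() relies on kref_get_unless_zero() so that a socket whose last reference is already being dropped is never handed out again. A minimal userspace sketch of that rule with C11 atomics follows; struct ref and its helpers are hypothetical, and the kernel's kref (built on refcount_t, with saturation and its own memory-ordering guarantees) is more involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of a kref */
struct ref {
	atomic_uint count;
};

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);   /* like kref_init() */
}

/* Like kref_get_unless_zero(): only take a reference while the count is
 * still non-zero, i.e. the object is not already being released. */
static bool ref_get_unless_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	while (old != 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns true when the caller dropped the last reference and must free
 * the object (the job of the kref_put() release callback). */
static bool ref_put(struct ref *r)
{
	unsigned int old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);
	return old == 1;
}
```

Once the count has reached zero, a concurrent lookup fails instead of resurrecting the object, which is why ovpn_socket_get() returns NULL in that case.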
[PATCH net-next v10 11/23] ovpn: store tunnel and transport statistics
Byte/packet counters for in-tunnel and transport streams are now initialized and updated as needed. To be exported via netlink. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/Makefile | 1 + drivers/net/ovpn/crypto_aead.c | 2 ++ drivers/net/ovpn/io.c | 11 ++ drivers/net/ovpn/peer.c| 2 ++ drivers/net/ovpn/peer.h| 5 + drivers/net/ovpn/skb.h | 1 + drivers/net/ovpn/stats.c | 21 +++ drivers/net/ovpn/stats.h | 47 ++ 8 files changed, 90 insertions(+) diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index ccdaeced1982c851475657860a005ff2b9dfbd13..d43fda72646bdc7644d9a878b56da0a0e5680c98 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -17,4 +17,5 @@ ovpn-y += netlink-gen.o ovpn-y += peer.o ovpn-y += pktid.o ovpn-y += socket.o +ovpn-y += stats.o ovpn-y += udp.o diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c index f9e3feb297b19868b1084048933796fcc7a47d6e..072bb0881764752520e8e26e18337c1274ce1aa4 100644 --- a/drivers/net/ovpn/crypto_aead.c +++ b/drivers/net/ovpn/crypto_aead.c @@ -48,6 +48,7 @@ int ovpn_aead_encrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks, int nfrags, ret; u32 pktid, op; + ovpn_skb_cb(skb)->orig_len = skb->len; ovpn_skb_cb(skb)->peer = peer; ovpn_skb_cb(skb)->ks = ks; @@ -159,6 +160,7 @@ int ovpn_aead_decrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks, payload_offset = OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE + tag_size; payload_len = skb->len - payload_offset; + ovpn_skb_cb(skb)->orig_len = skb->len; ovpn_skb_cb(skb)->payload_offset = payload_offset; ovpn_skb_cb(skb)->peer = peer; ovpn_skb_cb(skb)->ks = ks; diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 4c81c4547d35d2a73f680ef1f5d8853ffbd952e0..d56e74660c7be9020b5bdf7971322d41afd436d6 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -12,6 +12,7 @@ #include #include #include +#include #include "ovpnstruct.h" #include "peer.h" @@ -68,6 +69,7 @@ void ovpn_decrypt_post(void 
*data, int ret) unsigned int payload_offset = 0; struct sk_buff *skb = data; struct ovpn_peer *peer; + unsigned int orig_len; __be16 proto; __be32 *pid; @@ -80,6 +82,7 @@ void ovpn_decrypt_post(void *data, int ret) payload_offset = ovpn_skb_cb(skb)->payload_offset; ks = ovpn_skb_cb(skb)->ks; peer = ovpn_skb_cb(skb)->peer; + orig_len = ovpn_skb_cb(skb)->orig_len; /* crypto is done, cleanup skb CB and its members */ @@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; } + /* increment RX stats */ + ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len); + ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len); + ovpn_netdev_write(peer, skb); /* skb is passed to upper layer - don't free it */ skb = NULL; @@ -175,6 +182,7 @@ void ovpn_encrypt_post(void *data, int ret) struct ovpn_crypto_key_slot *ks; struct sk_buff *skb = data; struct ovpn_peer *peer; + unsigned int orig_len; /* encryption is happening asynchronously. This function will be * called later by the crypto callback with a proper return value @@ -184,6 +192,7 @@ void ovpn_encrypt_post(void *data, int ret) ks = ovpn_skb_cb(skb)->ks; peer = ovpn_skb_cb(skb)->peer; + orig_len = ovpn_skb_cb(skb)->orig_len; /* crypto is done, cleanup skb CB and its members */ @@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret) goto err; skb_mark_not_on_list(skb); + ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len); + ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len); switch (peer->sock->sock->sk->sk_protocol) { case IPPROTO_UDP: diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 98ae7662f1e76811e625dc5f4b4c5c884856fbd6..5025bfb759d6a5f31e3f2ec094fe561fbdb9f451 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -48,6 +48,8 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) ovpn_crypto_state_init(&peer->crypto); spin_lock_init(&peer->lock); kref_init(&peer->refcount); + ovpn_peer_stats_init(&peer->vpn_stats); + 
ovpn_peer_stats_init(&peer->link_stats); ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL); if (ret < 0) { diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 754fea470d1b4787f64a931d6c6adc24182fc16f..eb1e31e854fbfff25d07fba8026789e41a76c113 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -13,6 +13,7 @@ #include #include "crypto.h" +#include "stats.h" /** * s
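The accounting rule in this patch is that vpn_stats counts in-tunnel (plaintext) bytes and link_stats counts transport (encrypted) bytes, which is why orig_len is saved in the skb control block before crypto runs. The sketch below models that in userspace. stats.h is not quoted, so struct peer_stats, its fields and the helper names are hypothetical, and the length assertions are assumptions about typical AEAD framing rather than checks the driver performs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter layout; the real one lives in stats.h */
struct peer_stats {
	atomic_uint_fast64_t rx_bytes, rx_packets;
	atomic_uint_fast64_t tx_bytes, tx_packets;
};

struct peer {
	struct peer_stats vpn_stats;   /* plaintext, in-tunnel traffic */
	struct peer_stats link_stats;  /* encrypted, transport traffic */
};

static void stats_init(struct peer_stats *s)
{
	atomic_init(&s->rx_bytes, 0);
	atomic_init(&s->rx_packets, 0);
	atomic_init(&s->tx_bytes, 0);
	atomic_init(&s->tx_packets, 0);
}

static void stats_rx(struct peer_stats *s, unsigned int len)
{
	atomic_fetch_add(&s->rx_bytes, len);
	atomic_fetch_add(&s->rx_packets, 1);
}

static void stats_tx(struct peer_stats *s, unsigned int len)
{
	atomic_fetch_add(&s->tx_bytes, len);
	atomic_fetch_add(&s->tx_packets, 1);
}

/* RX: orig_len is the encrypted length saved before decryption,
 * len is the decrypted packet handed to the netdev. */
static void account_rx(struct peer *p, unsigned int orig_len,
		       unsigned int len)
{
	assert(len <= orig_len);
	stats_rx(&p->vpn_stats, len);
	stats_rx(&p->link_stats, orig_len);
}

/* TX: orig_len is the plaintext length saved before encryption,
 * len is the encrypted packet sent on the transport socket. */
static void account_tx(struct peer *p, unsigned int orig_len,
		       unsigned int len)
{
	assert(orig_len <= len);
	stats_tx(&p->link_stats, len);
	stats_tx(&p->vpn_stats, orig_len);
}
```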
[PATCH net-next v10 09/23] ovpn: implement basic RX path (UDP)
Packets received over the socket are forwarded to the user device. Implementation is UDP only. TCP will be added by a later patch. Note: no decryption/decapsulation exists yet, packets are forwarded as they arrive without much processing. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/io.c | 66 ++- drivers/net/ovpn/io.h | 2 + drivers/net/ovpn/main.c | 13 +- drivers/net/ovpn/ovpnstruct.h | 3 ++ drivers/net/ovpn/proto.h | 75 ++ drivers/net/ovpn/socket.c | 24 ++ drivers/net/ovpn/udp.c| 104 +- drivers/net/ovpn/udp.h| 3 +- 8 files changed, 286 insertions(+), 4 deletions(-) diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -9,15 +9,79 @@ #include #include +#include #include -#include "io.h" #include "ovpnstruct.h" #include "peer.h" +#include "io.h" +#include "netlink.h" +#include "proto.h" #include "udp.h" #include "skb.h" #include "socket.h" +/* Called after decrypt to write the IP packet to the device. + * This method is expected to manage/free the skb. 
+ */ +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) +{ + unsigned int pkt_len; + + /* we can't guarantee the packet wasn't corrupted before entering the +* VPN, therefore we give other layers a chance to check that +*/ + skb->ip_summed = CHECKSUM_NONE; + + /* skb hash for transport packet no longer valid after decapsulation */ + skb_clear_hash(skb); + + /* post-decrypt scrub -- prepare to inject encapsulated packet onto the +* interface, based on __skb_tunnel_rx() in dst.h +*/ + skb->dev = peer->ovpn->dev; + skb_set_queue_mapping(skb, 0); + skb_scrub_packet(skb, true); + + skb_reset_network_header(skb); + skb_reset_transport_header(skb); + skb_probe_transport_header(skb); + skb_reset_inner_headers(skb); + + memset(skb->cb, 0, sizeof(skb->cb)); + + /* cause packet to be "received" by the interface */ + pkt_len = skb->len; + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, +skb) == NET_RX_SUCCESS)) + /* update RX stats with the size of decrypted packet */ + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); +} + +static void ovpn_decrypt_post(struct sk_buff *skb, int ret) +{ + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; + + if (unlikely(ret < 0)) + goto drop; + + ovpn_netdev_write(peer, skb); + /* skb is passed to upper layer - don't free it */ + skb = NULL; +drop: + if (unlikely(skb)) + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); + ovpn_peer_put(peer); + kfree_skb(skb); +} + +/* pick next packet from RX queue, decrypt and forward it to the device */ +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb) +{ + ovpn_skb_cb(skb)->peer = peer; + ovpn_decrypt_post(skb, 0); +} + static void ovpn_encrypt_post(struct sk_buff *skb, int ret) { struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h index aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f 100644 --- a/drivers/net/ovpn/io.h +++ b/drivers/net/ovpn/io.h @@ -12,4 +12,6 @@ netdev_tx_t 
ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev); +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb); + #endif /* _NET_OVPN_OVPN_H_ */ diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_init(struct net_device *dev) { - return 0; + struct ovpn_struct *ovpn = netdev_priv(dev); + + return gro_cells_init(&ovpn->gro_cells, dev); +} + +static void ovpn_net_uninit(struct net_device *dev) +{ + struct ovpn_struct *ovpn = netdev_priv(dev); + + gro_cells_destroy(&ovpn->gro_cells); } static int ovpn_net_open(struct net_device *dev) @@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev) static const struct net_device_ops ovpn_netdev_ops = { .ndo_init = ovpn_net_init, + .ndo_uninit = ovpn_net_uninit, .ndo_open = ovpn_net_open, .ndo_stop = ovpn_net_stop, .ndo_start_xmit = ovpn_net_xmit, diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/ne
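ovpn_decrypt_post() follows a common kernel ownership idiom: once the skb has been handed to ovpn_netdev_write(), the local pointer is set to NULL, so the shared drop label frees (and counts as dropped) only packets that were not delivered. The userspace sketch below isolates that control flow; struct pkt, deliver() and the counters are hypothetical stand-ins, and the ovpn_peer_put() call is omitted.

```c
#include <assert.h>
#include <stdlib.h>

struct pkt {
	int len;
};

static int delivered, dropped;

/* Stand-in for ovpn_netdev_write(): takes ownership of the packet */
static void deliver(struct pkt *p)
{
	assert(p != NULL);
	delivered++;
	free(p);
}

/* Same control flow as ovpn_decrypt_post() */
static void decrypt_post(struct pkt *p, int ret)
{
	if (ret < 0)
		goto drop;

	deliver(p);
	/* packet now belongs to the receiver: don't free it below */
	p = NULL;
drop:
	if (p)
		dropped++;
	free(p);   /* free(NULL) is a no-op, as is kfree_skb(NULL) */
}
```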
[PATCH net-next v10 10/23] ovpn: implement packet processing
This change implements encryption/decryption and encapsulation/decapsulation of OpenVPN packets. Support for generic crypto state is added along with a wrapper for the AEAD crypto kernel API. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/Makefile | 3 + drivers/net/ovpn/crypto.c | 153 + drivers/net/ovpn/crypto.h | 139 drivers/net/ovpn/crypto_aead.c | 367 + drivers/net/ovpn/crypto_aead.h | 31 drivers/net/ovpn/io.c | 146 ++-- drivers/net/ovpn/io.h | 3 + drivers/net/ovpn/packet.h | 2 +- drivers/net/ovpn/peer.c| 29 drivers/net/ovpn/peer.h| 6 + drivers/net/ovpn/pktid.c | 130 +++ drivers/net/ovpn/pktid.h | 87 ++ drivers/net/ovpn/proto.h | 31 drivers/net/ovpn/skb.h | 4 + 14 files changed, 1120 insertions(+), 11 deletions(-) diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index 56bddc9bef83e0befde6af3c3565bb91731d7b22..ccdaeced1982c851475657860a005ff2b9dfbd13 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -8,10 +8,13 @@ obj-$(CONFIG_OVPN) := ovpn.o ovpn-y += bind.o +ovpn-y += crypto.o +ovpn-y += crypto_aead.o ovpn-y += main.o ovpn-y += io.o ovpn-y += netlink.o ovpn-y += netlink-gen.o ovpn-y += peer.o +ovpn-y += pktid.o ovpn-y += socket.o ovpn-y += udp.o diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c new file mode 100644 index ..f1f7510e2f735e367f96eb4982ba82c9af3c8bfc --- /dev/null +++ b/drivers/net/ovpn/crypto.c @@ -0,0 +1,153 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. 
+ * + * Author:James Yonan + * Antonio Quartulli + */ + +#include +#include +#include +#include + +#include "ovpnstruct.h" +#include "main.h" +#include "packet.h" +#include "pktid.h" +#include "crypto_aead.h" +#include "crypto.h" + +static void ovpn_ks_destroy_rcu(struct rcu_head *head) +{ + struct ovpn_crypto_key_slot *ks; + + ks = container_of(head, struct ovpn_crypto_key_slot, rcu); + ovpn_aead_crypto_key_slot_destroy(ks); +} + +void ovpn_crypto_key_slot_release(struct kref *kref) +{ + struct ovpn_crypto_key_slot *ks; + + ks = container_of(kref, struct ovpn_crypto_key_slot, refcount); + call_rcu(&ks->rcu, ovpn_ks_destroy_rcu); +} + +/* can only be invoked when all peer references have been dropped (i.e. RCU + * release routine) + */ +void ovpn_crypto_state_release(struct ovpn_crypto_state *cs) +{ + struct ovpn_crypto_key_slot *ks; + + ks = rcu_access_pointer(cs->slots[0]); + if (ks) { + RCU_INIT_POINTER(cs->slots[0], NULL); + ovpn_crypto_key_slot_put(ks); + } + + ks = rcu_access_pointer(cs->slots[1]); + if (ks) { + RCU_INIT_POINTER(cs->slots[1], NULL); + ovpn_crypto_key_slot_put(ks); + } +} + +/* Reset the ovpn_crypto_state object in a way that is atomic + * to RCU readers. 
+ */ +int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs, + const struct ovpn_peer_key_reset *pkr) +{ + struct ovpn_crypto_key_slot *old = NULL, *new; + u8 idx; + + if (pkr->slot != OVPN_KEY_SLOT_PRIMARY && + pkr->slot != OVPN_KEY_SLOT_SECONDARY) + return -EINVAL; + + new = ovpn_aead_crypto_key_slot_new(&pkr->key); + if (IS_ERR(new)) + return PTR_ERR(new); + + spin_lock_bh(&cs->lock); + idx = cs->primary_idx; + switch (pkr->slot) { + case OVPN_KEY_SLOT_PRIMARY: + old = rcu_replace_pointer(cs->slots[idx], new, + lockdep_is_held(&cs->lock)); + break; + case OVPN_KEY_SLOT_SECONDARY: + old = rcu_replace_pointer(cs->slots[!idx], new, + lockdep_is_held(&cs->lock)); + break; + } + spin_unlock_bh(&cs->lock); + + if (old) + ovpn_crypto_key_slot_put(old); + + return 0; +} + +void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs, +enum ovpn_key_slot slot) +{ + struct ovpn_crypto_key_slot *ks = NULL; + u8 idx; + + if (slot != OVPN_KEY_SLOT_PRIMARY && + slot != OVPN_KEY_SLOT_SECONDARY) { + pr_warn("Invalid slot to release: %u\n", slot); + return; + } + + spin_lock_bh(&cs->lock); + idx = cs->primary_idx; + switch (slot) { + case OVPN_KEY_SLOT_PRIMARY: + ks = rcu_replace_pointer(cs->slots[idx], NULL, +lockdep_is_held(&cs->lock)); + break; + case OVPN_KEY_SLOT_SECONDARY: + ks = rcu_replace_pointer(cs
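ovpn_crypto_state_reset() never moves keys between slots: the logical primary and secondary roles are resolved through primary_idx (slots[idx] versus slots[!idx]). The sketch below models that indexing without RCU or locking. The state_swap() helper is hypothetical: a routine that flips primary_idx is not part of the quoted hunk and is included only to show why the indirection is useful.

```c
#include <assert.h>
#include <stddef.h>

enum key_slot { KEY_SLOT_PRIMARY, KEY_SLOT_SECONDARY };

struct key {
	int id;
};

/* Userspace model of ovpn_crypto_state: no RCU, no spinlock */
struct crypto_state {
	struct key *slots[2];
	unsigned char primary_idx;
};

/* Install @k into the requested logical slot and return the previous
 * key, which the caller must release (ovpn_crypto_key_slot_put() in
 * the driver). */
static struct key *state_reset(struct crypto_state *cs, enum key_slot slot,
			       struct key *k)
{
	unsigned char idx = cs->primary_idx;
	struct key *old;

	assert(idx <= 1);
	if (slot == KEY_SLOT_PRIMARY) {
		old = cs->slots[idx];
		cs->slots[idx] = k;
	} else {
		old = cs->slots[!idx];
		cs->slots[!idx] = k;
	}
	return old;
}

/* Hypothetical: promote the secondary key by flipping the index only */
static void state_swap(struct crypto_state *cs)
{
	cs->primary_idx = !cs->primary_idx;
}

static struct key *state_primary(const struct crypto_state *cs)
{
	return cs->slots[cs->primary_idx];
}
```

Because a swap only flips one index, readers never observe a key being copied between slots.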
[PATCH net-next v10 12/23] ovpn: implement TCP transport
With this change ovpn is allowed to communicate to peers also via TCP. Parsing of incoming messages is implemented through the strparser API. Signed-off-by: Antonio Quartulli --- drivers/net/Kconfig | 1 + drivers/net/ovpn/Makefile | 1 + drivers/net/ovpn/io.c | 4 + drivers/net/ovpn/main.c | 3 + drivers/net/ovpn/peer.h | 37 drivers/net/ovpn/socket.c | 44 +++- drivers/net/ovpn/socket.h | 9 +- drivers/net/ovpn/tcp.c| 506 ++ drivers/net/ovpn/tcp.h| 44 9 files changed, 643 insertions(+), 6 deletions(-) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 269b73fcfd348a48174fb96b8f8d4f8788636fa8..f37ce285e61fbee3201f4095ada3230305df511b 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -118,6 +118,7 @@ config WIREGUARD_DEBUG config OVPN tristate "OpenVPN data channel offload" depends on NET && INET + select STREAM_PARSER select NET_UDP_TUNNEL select DST_CACHE select CRYPTO diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index d43fda72646bdc7644d9a878b56da0a0e5680c98..f4d4bd87c851c8dd5b81e357315c4b22de4bd092 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -18,4 +18,5 @@ ovpn-y += peer.o ovpn-y += pktid.o ovpn-y += socket.o ovpn-y += stats.o +ovpn-y += tcp.o ovpn-y += udp.o diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index d56e74660c7be9020b5bdf7971322d41afd436d6..deda19ab87391f86964ba43088b7847d22420eee 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -22,6 +22,7 @@ #include "crypto_aead.h" #include "netlink.h" #include "proto.h" +#include "tcp.h" #include "udp.h" #include "skb.h" #include "socket.h" @@ -213,6 +214,9 @@ void ovpn_encrypt_post(void *data, int ret) case IPPROTO_UDP: ovpn_udp_send_skb(peer->ovpn, peer, skb); break; + case IPPROTO_TCP: + ovpn_tcp_send_skb(peer, skb); + break; default: /* no transport configured yet */ goto err; diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 
73348765a8cf24321aa6be78e75f607d6dbffb1d..0488e395eb27d3dba1efc8ff39c023e0ac4a38dd 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -22,6 +22,7 @@ #include "io.h" #include "packet.h" #include "peer.h" +#include "tcp.h" /* Driver info */ #define DRV_DESCRIPTION"OpenVPN data channel offload (ovpn)" @@ -237,6 +238,8 @@ static int __init ovpn_init(void) goto unreg_rtnl; } + ovpn_tcp_init(); + return 0; unreg_rtnl: diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index eb1e31e854fbfff25d07fba8026789e41a76c113..2b7fa9510e362ef3646157bb0d361bab19ddaa99 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -11,6 +11,7 @@ #define _NET_OVPN_OVPNPEER_H_ #include +#include #include "crypto.h" #include "stats.h" @@ -23,6 +24,18 @@ * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel * @sock: the socket being used to talk to this peer + * @tcp: keeps track of TCP specific state + * @tcp.strp: stream parser context (TCP only) + * @tcp.tx_work: work for deferring outgoing packet processing (TCP only) + * @tcp.user_queue: received packets that have to go to userspace (TCP only) + * @tcp.tx_in_progress: true if TX is already ongoing (TCP only) + * @tcp.out_msg.skb: packet scheduled for sending (TCP only) + * @tcp.out_msg.offset: offset where next send should start (TCP only) + * @tcp.out_msg.len: remaining data to send within packet (TCP only) + * @tcp.sk_cb.sk_data_ready: pointer to original cb (TCP only) + * @tcp.sk_cb.sk_write_space: pointer to original cb (TCP only) + * @tcp.sk_cb.prot: pointer to original prot object (TCP only) + * @tcp.sk_cb.ops: pointer to the original prot_ops object (TCP only) * @crypto: the crypto configuration (ciphers, keys, etc..) * @dst_cache: cache for dst_entry used to send to peer * @bind: remote peer binding @@ -43,6 +56,30 @@ struct ovpn_peer { struct in6_addr ipv6; } vpn_addrs; struct ovpn_socket *sock; + + /* state of the TCP reading. 
Needed to keep track of how much of a +* single packet has already been read from the stream and how much is +* missing +*/ + struct { + struct strparser strp; + struct work_struct tx_work; + struct sk_buff_head user_queue; + bool tx_in_progress; + + struct { + struct sk_buff *skb; + int offset; + int len; + } out_msg; + + struct { + void (*sk_data_ready)(struct sock *sk); + void (*sk_write_space)(struct sock *sk); +
[PATCH net-next v10 13/23] ovpn: implement multi-peer support
With this change an ovpn instance will be able to stay connected to multiple remote endpoints. This functionality is strictly required when running ovpn on an OpenVPN server. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/main.c | 55 +- drivers/net/ovpn/ovpnstruct.h | 19 + drivers/net/ovpn/peer.c | 166 -- drivers/net/ovpn/peer.h | 9 +++ 4 files changed, 243 insertions(+), 6 deletions(-) diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 0488e395eb27d3dba1efc8ff39c023e0ac4a38dd..c7453127ab640d7268c1ce919a87cc5419fac9ee 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -30,6 +30,9 @@ static void ovpn_struct_free(struct net_device *net) { + struct ovpn_struct *ovpn = netdev_priv(net); + + kfree(ovpn->peers); } static int ovpn_net_init(struct net_device *dev) @@ -133,12 +136,52 @@ static void ovpn_setup(struct net_device *dev) SET_NETDEV_DEVTYPE(dev, &ovpn_type); } +static int ovpn_mp_alloc(struct ovpn_struct *ovpn) +{ + struct in_device *dev_v4; + int i; + + if (ovpn->mode != OVPN_MODE_MP) + return 0; + + dev_v4 = __in_dev_get_rtnl(ovpn->dev); + if (dev_v4) { + /* disable redirects as Linux gets confused by ovpn +* handling same-LAN routing. +* This happens because a multipeer interface is used as +* relay point between hosts in the same subnet, while +* in a classic LAN this would not be needed because the +* two hosts would be able to talk directly. 
+*/ + IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false); + IPV4_DEVCONF_ALL(dev_net(ovpn->dev), SEND_REDIRECTS) = false; + } + + /* the peer container is fairly large, therefore we allocate it only in +* MP mode +*/ + ovpn->peers = kzalloc(sizeof(*ovpn->peers), GFP_KERNEL); + if (!ovpn->peers) + return -ENOMEM; + + spin_lock_init(&ovpn->peers->lock); + + for (i = 0; i < ARRAY_SIZE(ovpn->peers->by_id); i++) { + INIT_HLIST_HEAD(&ovpn->peers->by_id[i]); + INIT_HLIST_NULLS_HEAD(&ovpn->peers->by_vpn_addr[i], i); + INIT_HLIST_NULLS_HEAD(&ovpn->peers->by_transp_addr[i], i); + } + + return 0; +} + static int ovpn_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { struct ovpn_struct *ovpn = netdev_priv(dev); enum ovpn_mode mode = OVPN_MODE_P2P; + int err; if (data && data[IFLA_OVPN_MODE]) { mode = nla_get_u8(data[IFLA_OVPN_MODE]); @@ -149,6 +192,10 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev, ovpn->mode = mode; spin_lock_init(&ovpn->lock); + err = ovpn_mp_alloc(ovpn); + if (err < 0) + return err; + /* turn carrier explicitly off after registration, this way state is * clearly defined */ @@ -197,8 +244,14 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, netif_carrier_off(dev); ovpn->registered = false; - if (ovpn->mode == OVPN_MODE_P2P) + switch (ovpn->mode) { + case OVPN_MODE_P2P: ovpn_peer_release_p2p(ovpn); + break; + case OVPN_MODE_MP: + ovpn_peers_free(ovpn); + break; + } break; case NETDEV_POST_INIT: case NETDEV_GOING_DOWN: diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index 4a48fc048890ab1cda78bc104fe3034b4a49d226..12ed5e22c2108c9f143d1984048eb40c887cac63 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -15,6 +15,23 @@ #include #include +/** + * struct ovpn_peer_collection - container of peers for MultiPeer mode + * @by_id: table of peers index by ID + * @by_vpn_addr: table of peers 
indexed by VPN IP address (items can be + * rehashed on the fly due to peer IP change) + * @by_transp_addr: table of peers indexed by transport address (items can be + * rehashed on the fly due to peer IP change) + * @lock: protects writes to peer tables + */ +struct ovpn_peer_collection { + DECLARE_HASHTABLE(by_id, 12); + struct hlist_nulls_head by_vpn_addr[1 << 12]; + struct hlist_nulls_head by_transp_addr[1 << 12]; + + spinlock_t lock; /* protects writes to peer tables */ +}; + /** * struct ovpn_struct - per ovpn interface state * @dev: the actual netdev representing the tunnel @@ -22,6 +39,7 @@ * @registered: whether dev is still registered with netdev or not * @mode: device operation mo
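The by_vpn_addr and by_transp_addr tables use hlist_nulls heads initialised with the bucket index i. The list terminator is then a tagged value carrying that index instead of NULL, so a lockless reader that ends its walk on another bucket's marker knows an entry was rehashed underneath it and must restart. The sketch below models that check in userspace; the node layout and function names are hypothetical, and neither the kernel's hlist_nulls API nor its RCU barriers are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
	struct node *next;
	uint32_t key;
};

/* Tagged terminator: low bit set, bucket index in the remaining bits
 * (same encoding as the kernel's NULLS_MARKER()). */
#define NULLS_MARKER(v) ((struct node *)(((uintptr_t)(v) << 1) | 1UL))

static int is_a_nulls(const struct node *p)
{
	return ((uintptr_t)p & 1UL) != 0;
}

static uintptr_t get_nulls_value(const struct node *p)
{
	return (uintptr_t)p >> 1;
}

/* Walk bucket @bucket looking for @key. If the walk ends on a marker
 * belonging to a different bucket, an entry was moved during the walk
 * and the caller must restart; that is reported through *restart. */
static struct node *nulls_lookup(struct node *head, uintptr_t bucket,
				 uint32_t key, int *restart)
{
	struct node *p;

	assert(head != NULL);   /* empty buckets hold a marker, not NULL */
	*restart = 0;
	for (p = head; !is_a_nulls(p); p = p->next)
		if (p->key == key)
			return p;

	if (get_nulls_value(p) != bucket)
		*restart = 1;
	return NULL;
}
```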
Re: [PATCH vhost 1/2] vdpa/mlx5: Fix PA offset with unaligned starting iotlb map
On Mon, Oct 21, 2024 at 9:41 PM Dragos Tatulea wrote: > > From: Si-Wei Liu > > When calculating the physical address range based on the iotlb and mr > [start,end) ranges, the offset of mr->start relative to map->start > is not taken into account. This leads to some incorrect and duplicate > mappings. > > For the case when mr->start < map->start the code is already correct: > the range in [mr->start, map->start) was handled by a different > iteration. > > Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") > Cc: sta...@vger.kernel.org > Signed-off-by: Si-Wei Liu > Signed-off-by: Dragos Tatulea > --- Acked-by: Jason Wang Thanks
[PATCH net-next v10 14/23] ovpn: implement peer lookup logic
In a multi-peer scenario there are a number of situations when a specific peer needs to be looked up. We may want to look up a peer by: 1. its ID 2. its VPN destination IP 3. its transport IP/port pair. For each of the above, there is a dedicated hash table referencing all peers for fast lookup. Case 2 is a bit special in the sense that an outgoing packet may not be sent to the peer VPN IP directly, but rather to a network behind it. For this reason we first perform a nexthop lookup in the system routing table and then use the retrieved nexthop as the peer search key. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/peer.c | 272 ++-- 1 file changed, 264 insertions(+), 8 deletions(-) diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 73ef509faab9701192a45ffe78a46dbbbeab01c2..c7dc9032c2b55fd42befc1f3e7a0eca893a96576 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "ovpnstruct.h" #include "bind.h" @@ -125,6 +126,94 @@ static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb, return true; } +/** + * ovpn_nexthop_from_skb4 - retrieve IPv4 nexthop for outgoing skb + * @skb: the outgoing packet + * + * Return: the IPv4 of the nexthop + */ +static __be32 ovpn_nexthop_from_skb4(struct sk_buff *skb) +{ + const struct rtable *rt = skb_rtable(skb); + + if (rt && rt->rt_uses_gateway) + return rt->rt_gw4; + + return ip_hdr(skb)->daddr; +} + +/** + * ovpn_nexthop_from_skb6 - retrieve IPv6 nexthop for outgoing skb + * @skb: the outgoing packet + * + * Return: the IPv6 of the nexthop + */ +static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb) +{ + const struct rt6_info *rt = skb_rt6_info(skb); + + if (!rt || !(rt->rt6i_flags & RTF_GATEWAY)) + return ipv6_hdr(skb)->daddr; + + return rt->rt6i_gateway; +} + +#define ovpn_get_hash_head(_tbl, _key, _key_len) ({\ + typeof(_tbl) *__tbl = &(_tbl); \ + (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \
+/** + * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address + * @ovpn: the openvpn instance to search + * @addr: VPN IPv4 to use as search key + * + * Refcounter is not increased for the returned peer. + * + * Return: the peer if found or NULL otherwise + */ +static struct ovpn_peer *ovpn_peer_get_by_vpn_addr4(struct ovpn_struct *ovpn, + __be32 addr) +{ + struct hlist_nulls_head *nhead; + struct hlist_nulls_node *ntmp; + struct ovpn_peer *tmp; + + nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, &addr, + sizeof(addr)); + + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, hash_entry_addr4) + if (addr == tmp->vpn_addrs.ipv4.s_addr) + return tmp; + + return NULL; +} + +/** + * ovpn_peer_get_by_vpn_addr6 - retrieve peer by its VPN IPv6 address + * @ovpn: the openvpn instance to search + * @addr: VPN IPv6 to use as search key + * + * Refcounter is not increased for the returned peer. + * + * Return: the peer if found or NULL otherwise + */ +static struct ovpn_peer *ovpn_peer_get_by_vpn_addr6(struct ovpn_struct *ovpn, + struct in6_addr *addr) +{ + struct hlist_nulls_head *nhead; + struct hlist_nulls_node *ntmp; + struct ovpn_peer *tmp; + + nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, addr, + sizeof(*addr)); + + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, hash_entry_addr6) + if (ipv6_addr_equal(addr, &tmp->vpn_addrs.ipv6)) + return tmp; + + return NULL; +} + /** * ovpn_peer_transp_match - check if sockaddr and peer binding match * @peer: the peer to get the binding from @@ -202,14 +291,44 @@ ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, struct sk_buff *skb) { - struct ovpn_peer *peer = NULL; + struct ovpn_peer *tmp, *peer = NULL; struct sockaddr_storage ss = { 0 }; + struct hlist_nulls_head *nhead; + struct hlist_nulls_node *ntmp; + size_t sa_len; if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss))) return NULL; if (ovpn->mode == OVPN_MODE_P2P) - peer = 
ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); + return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); + + switch (ss.ss_family) { + case AF_INET: + sa_len = sizeof(struct sockaddr_in); + break; + case AF_INET6: + sa_len = sizeof(struct sockaddr_in6); + break; + default: +
[PATCH net-next v10 15/23] ovpn: implement keepalive mechanism
OpenVPN supports configuring a periodic keepalive packet message to allow the remote endpoint to detect link failures. This change implements the keepalive sending and timer expiring logic. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/io.c | 77 + drivers/net/ovpn/io.h | 5 ++ drivers/net/ovpn/main.c | 3 + drivers/net/ovpn/ovpnstruct.h | 2 + drivers/net/ovpn/peer.c | 188 ++ drivers/net/ovpn/peer.h | 15 drivers/net/ovpn/proto.h | 2 - 7 files changed, 290 insertions(+), 2 deletions(-) diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index deda19ab87391f86964ba43088b7847d22420eee..63c140138bf98e5d1df79a2565b666d86513323d 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -27,6 +27,33 @@ #include "skb.h" #include "socket.h" +const unsigned char ovpn_keepalive_message[OVPN_KEEPALIVE_SIZE] = { + 0x2a, 0x18, 0x7b, 0xf3, 0x64, 0x1e, 0xb4, 0xcb, + 0x07, 0xed, 0x2d, 0x0a, 0x98, 0x1f, 0xc7, 0x48 +}; + +/** + * ovpn_is_keepalive - check if skb contains a keepalive message + * @skb: packet to check + * + * Assumes that the first byte of skb->data is defined. + * + * Return: true if skb contains a keepalive or false otherwise + */ +static bool ovpn_is_keepalive(struct sk_buff *skb) +{ + if (*skb->data != ovpn_keepalive_message[0]) + return false; + + if (skb->len != OVPN_KEEPALIVE_SIZE) + return false; + + if (!pskb_may_pull(skb, OVPN_KEEPALIVE_SIZE)) + return false; + + return !memcmp(skb->data, ovpn_keepalive_message, OVPN_KEEPALIVE_SIZE); +} + /* Called after decrypt to write the IP packet to the device. * This method is expected to manage/free the skb.
*/ @@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; } + /* keep track of last received authenticated packet for keepalive */ + peer->last_recv = ktime_get_real_seconds(); + /* point to encapsulated IP packet */ __skb_pull(skb, payload_offset); @@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; } + if (ovpn_is_keepalive(skb)) { + net_dbg_ratelimited("%s: ping received from peer %u\n", + peer->ovpn->dev->name, peer->id); + goto drop; + } + net_info_ratelimited("%s: unsupported protocol received from peer %u\n", peer->ovpn->dev->name, peer->id); goto drop; @@ -221,6 +257,10 @@ void ovpn_encrypt_post(void *data, int ret) /* no transport configured yet */ goto err; } + + /* keep track of last sent packet for keepalive */ + peer->last_sent = ktime_get_real_seconds(); + /* skb passed down the stack - don't free it */ skb = NULL; err: @@ -361,3 +401,40 @@ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) kfree_skb_list(skb); return NET_XMIT_DROP; } + +/** + * ovpn_xmit_special - encrypt and transmit an out-of-band message to peer + * @peer: peer to send the message to + * @data: message content + * @len: message length + * + * Assumes that caller holds a reference to peer + */ +void ovpn_xmit_special(struct ovpn_peer *peer, const void *data, + const unsigned int len) +{ + struct ovpn_struct *ovpn; + struct sk_buff *skb; + + ovpn = peer->ovpn; + if (unlikely(!ovpn)) + return; + + skb = alloc_skb(256 + len, GFP_ATOMIC); + if (unlikely(!skb)) + return; + + skb_reserve(skb, 128); + skb->priority = TC_PRIO_BESTEFFORT; + __skb_put_data(skb, data, len); + + /* increase reference counter when passing peer to sending queue */ + if (!ovpn_peer_hold(peer)) { + netdev_dbg(ovpn->dev, "%s: cannot hold peer reference for sending special packet\n", + __func__); + kfree_skb(skb); + return; + } + + ovpn_send(ovpn, skb, peer); +} diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h index 
ad81dd86924689309b3299573575a1705eddaf99..eb224114152c29f42aadf026212e8d278006b490 100644 --- a/drivers/net/ovpn/io.h +++ b/drivers/net/ovpn/io.h @@ -10,9 +10,14 @@ #ifndef _NET_OVPN_OVPN_H_ #define _NET_OVPN_OVPN_H_ +#define OVPN_KEEPALIVE_SIZE 16 +extern const unsigned char ovpn_keepalive_message[OVPN_KEEPALIVE_SIZE]; + netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev); void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb); +void ovpn_xmit_special(struct ovpn_peer *peer, const void *data, + const unsigned int len); void ovpn_encrypt_post(void *data, int ret); void ovpn_decrypt_post(
[PATCH net-next v10 17/23] ovpn: add support for peer floating
A peer connected via UDP may change its IP address without reconnecting (float). Add support for detecting and updating the new peer IP/port in case of floating. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/bind.c | 10 ++-- drivers/net/ovpn/io.c | 9 drivers/net/ovpn/peer.c | 129 ++-- drivers/net/ovpn/peer.h | 2 + 4 files changed, 139 insertions(+), 11 deletions(-) diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c index b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a..d17d078c5730bf4336dc87f45cdba3f6b8cad770 100644 --- a/drivers/net/ovpn/bind.c +++ b/drivers/net/ovpn/bind.c @@ -47,12 +47,8 @@ struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss) * @new: the new bind to assign */ void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new) + __must_hold(&peer->lock) { - struct ovpn_bind *old; - - spin_lock_bh(&peer->lock); - old = rcu_replace_pointer(peer->bind, new, true); - spin_unlock_bh(&peer->lock); - - kfree_rcu(old, rcu); + kfree_rcu(rcu_replace_pointer(peer->bind, new, + lockdep_is_held(&peer->lock)), rcu); } diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret) /* keep track of last received authenticated packet for keepalive */ peer->last_recv = ktime_get_real_seconds(); + if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) { + /* check if this peer changed it's IP address and update +* state +*/ + ovpn_peer_float(peer, skb); + /* update source endpoint for this peer */ + ovpn_peer_update_local_endpoint(peer, skb); + } + /* point to encapsulated IP packet */ __skb_pull(skb, payload_offset); diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 3f67d200e283213fcb732d10f9edeb53e0a0e9ee..da6215bbb643592e4567e61e4b4976d367ed109c 100644 --- a/drivers/net/ovpn/peer.c +++ 
b/drivers/net/ovpn/peer.c @@ -94,6 +94,131 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) return peer; } +/** + * ovpn_peer_reset_sockaddr - recreate binding for peer + * @peer: peer to recreate the binding for + * @ss: sockaddr to use as remote endpoint for the binding + * @local_ip: local IP for the binding + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer, + const struct sockaddr_storage *ss, + const u8 *local_ip) + __must_hold(&peer->lock) +{ + struct ovpn_bind *bind; + size_t ip_len; + + /* create new ovpn_bind object */ + bind = ovpn_bind_from_sockaddr(ss); + if (IS_ERR(bind)) + return PTR_ERR(bind); + + if (local_ip) { + if (ss->ss_family == AF_INET) { + ip_len = sizeof(struct in_addr); + } else if (ss->ss_family == AF_INET6) { + ip_len = sizeof(struct in6_addr); + } else { + netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n", + __func__); + kfree(bind); + return -EINVAL; + } + + memcpy(&bind->local, local_ip, ip_len); + } + + /* set binding */ + ovpn_bind_reset(peer, bind); + + return 0; +} + +#define ovpn_get_hash_head(_tbl, _key, _key_len) ({\ + typeof(_tbl) *__tbl = &(_tbl); \ + (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \ + +/** + * ovpn_peer_float - update remote endpoint for peer + * @peer: peer to update the remote endpoint for + * @skb: incoming packet to retrieve the source address (remote) from + */ +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) +{ + struct hlist_nulls_head *nhead; + struct sockaddr_storage ss; + const u8 *local_ip = NULL; + struct sockaddr_in6 *sa6; + struct sockaddr_in *sa; + struct ovpn_bind *bind; + sa_family_t family; + size_t salen; + + rcu_read_lock(); + bind = rcu_dereference(peer->bind); + if (unlikely(!bind)) { + rcu_read_unlock(); + return; + } + + spin_lock_bh(&peer->lock); + if (likely(ovpn_bind_skb_src_match(bind, skb))) + goto unlock; + + family = 
skb_protocol_to_family(skb); + + if (bind->remote.in4.sin_family == family) + local_ip = (u8 *)&bind->local; + + switch (family) { + case AF_INET
[PATCH net-next v10 16/23] ovpn: add support for updating local UDP endpoint
In case of UDP links, the local endpoint used to communicate with a given peer may change without a connection restart. Add support for learning the new address in case of change. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/peer.c | 45 + drivers/net/ovpn/peer.h | 3 +++ 2 files changed, 48 insertions(+) diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index e8a42212af391916b5321e729f7e8a864d0a541f..3f67d200e283213fcb732d10f9edeb53e0a0e9ee 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -416,6 +416,51 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id) return peer; } +/** + * ovpn_peer_update_local_endpoint - update local endpoint for peer + * @peer: peer to update the endpoint for + * @skb: incoming packet to retrieve the destination address (local) from + */ +void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer, +struct sk_buff *skb) +{ + struct ovpn_bind *bind; + + rcu_read_lock(); + bind = rcu_dereference(peer->bind); + if (unlikely(!bind)) + goto unlock; + + spin_lock_bh(&peer->lock); + switch (skb_protocol_to_family(skb)) { + case AF_INET: + if (unlikely(bind->local.ipv4.s_addr != ip_hdr(skb)->daddr)) { + netdev_dbg(peer->ovpn->dev, + "%s: learning local IPv4 for peer %d (%pI4 -> %pI4)\n", + __func__, peer->id, &bind->local.ipv4.s_addr, + &ip_hdr(skb)->daddr); + bind->local.ipv4.s_addr = ip_hdr(skb)->daddr; + } + break; + case AF_INET6: + if (unlikely(!ipv6_addr_equal(&bind->local.ipv6, + &ipv6_hdr(skb)->daddr))) { + netdev_dbg(peer->ovpn->dev, + "%s: learning local IPv6 for peer %d (%pI6c -> %pI6c\n", + __func__, peer->id, &bind->local.ipv6, + &ipv6_hdr(skb)->daddr); + bind->local.ipv6 = ipv6_hdr(skb)->daddr; + } + break; + default: + break; + } + spin_unlock_bh(&peer->lock); + +unlock: + rcu_read_unlock(); +} + /** * ovpn_peer_get_by_dst - Lookup peer to send skb to * @ovpn: the private data representing the current VPN session diff --git a/drivers/net/ovpn/peer.h 
b/drivers/net/ovpn/peer.h index 952927ae78a3ab753aaf2c6cc6f77121bdac34be..1a8638d266b11a4a80ee2f088394d47a7798c3af 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -152,4 +152,7 @@ bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout); void ovpn_peer_keepalive_work(struct work_struct *work); +void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer, +struct sk_buff *skb); + #endif /* _NET_OVPN_OVPNPEER_H_ */ -- 2.45.2
Re: [PATCH v7 1/3] modules: Support extended MODVERSIONS info
On Wed, Oct 23, 2024 at 02:31:28AM +, Matthew Maurer wrote: > Adds a new format for MODVERSIONS which stores each field in a separate > ELF section. This initially adds support for variable length names, but > could later be used to add additional fields to MODVERSIONS in a > backwards compatible way if needed. Any new fields will be ignored by > old user tooling, unlike the current format where user tooling cannot > tolerate adjustments to the format (for example making the name field > longer). > > Since PPC munges its version records to strip leading dots, we reproduce > the munging for the new format. Other architectures do not appear to > have architecture-specific usage of this information. > > Signed-off-by: Matthew Maurer Reviewed-by: Sami Tolvanen Sami
RE: [PATCH v4 00/11] iommufd: Add vIOMMU infrastructure (Part-1)
> From: Nicolin Chen > Sent: Tuesday, October 22, 2024 8:19 AM > > This series introduces a new vIOMMU infrastructure and related ioctls. > > IOMMUFD has been using the HWPT infrastructure for all cases, including a > nested IO page table support. Yet, there're limitations for an HWPT-based > structure to support some advanced HW-accelerated features, such as CMDQV > on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU > environment, it is not straightforward for nested HWPTs to share the same > parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a > parent HWPT typically hold one stage-2 IO pagetable and tag it with only > one ID in the cache entries. When sharing one large stage-2 IO pagetable > across physical IOMMU instances, that one ID may not always be available > across all the IOMMU instances. In other word, it's ideal for SW to have > a different container for the stage-2 IO pagetable so it can hold another > ID that's available. Just holding multiple IDs doesn't require a different container. This is just a side effect when vIOMMU will be required for other said reasons. If we have to put more words here I'd prefer to adding a bit more for CMDQV which is more compelling. not a big deal though. 😊 > > For this "different container", add vIOMMU, an additional layer to hold > extra virtualization information: > > > [ASCII diagram garbled in the archive: an "iommufd (with vIOMMU)" box showing IOAS <--- HWPT_PAGING <--- HWPT_NESTED <-- DEVICE together with a vIOMMU object, annotated [1] to [5], mapped below onto struct iommu_device, PFN storage, the paging and nested iommu_domain, and struct device] > nit - [1] ... [5] can be removed.
> The vIOMMU object should be seen as a slice of a physical IOMMU instance > that is passed to or shared with a VM. That can be some HW/SW resources: > - Security namespace for guest owned ID, e.g. guest-controlled cache tags > - Access to a sharable nesting parent pagetable across physical IOMMUs > - Virtualization of various platforms IDs, e.g. RIDs and others > - Delivery of paravirtualized invalidation > - Direct assigned invalidation queues > - Direct assigned interrupts > - Non-affiliated event reporting sorry no idea about 'non-affiliated event'. Can you elaborate? > > On a multi-IOMMU system, the vIOMMU object must be instanced to the number > of the physical IOMMUs that are passed to (via devices) a guest VM, while 'to the number of the physical IOMMUs that have a slice passed to ..." > being able to hold the shareable parent HWPT. Each vIOMMU then just needs > to allocate its own individual ID to tag its own cache: > > [diagram: hwpt_nested0 ---> viommu0 (holding paging_hwpt0, tagged IDx); hwpt_nested1 ---> viommu1 (holding the same paging_hwpt0, tagged IDy)] > > As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation > only. And implement it in arm-smmu-v3 driver as a real world use case. > > More vIOMMU-based structs and ioctls will be introduced in the follow-up > series to support vDEVICE, vIRQ (vEVENT) and vQUEUE objects. Although we > repurposed the vIOMMU object from an earlier RFC, just for a reference: > https://lore.kernel.org/all/cover.1712978212.git.nicol...@nvidia.com/ > > This series is on Github: >
RE: [PATCH v4 01/11] iommufd: Move struct iommufd_object to public iommufd header
> From: Nicolin Chen > Sent: Tuesday, October 22, 2024 8:19 AM > > Prepare for an embedded structure design for driver-level iommufd_viommu > objects: > // include/linux/iommufd.h > struct iommufd_viommu { > struct iommufd_object obj; > > }; > > // Some IOMMU driver > struct iommu_driver_viommu { > struct iommufd_viommu core; > > }; > > It has to expose struct iommufd_object and enum iommufd_object_type > from > the core-level private header to the public iommufd header. > > Reviewed-by: Jason Gunthorpe > Signed-off-by: Nicolin Chen Reviewed-by: Kevin Tian
[PATCH v5 1/5] pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
The means by which a pid is determined from a pidfd is duplicated, with some callers holding a reference to the (pid)fd, and others explicitly pinning the pid. Introduce __pidfd_get_pid() which narrows this to one approach of pinning the pid, with an optional output parameter for file->f_flags to avoid the need to hold onto a file to retrieve them. Additionally, allow the fd of an opened /proc/ directory to be used in place of a pidfd, as the pidfd_send_signal() system call permits, providing a pidfd_get_pid_proc() helper function to do so. Doing this allows us to eliminate open-coded pidfd pid lookup and to consistently handle this in one place. This lays the groundwork for a subsequent patch which adds a new sentinel pidfd to explicitly reference the current process (i.e. thread group leader) without the need for a pidfd. Reviewed-by: Shakeel Butt Signed-off-by: Lorenzo Stoakes --- include/linux/pid.h | 30 +- kernel/pid.c| 42 -- kernel/signal.c | 29 ++--- 3 files changed, 59 insertions(+), 42 deletions(-) diff --git a/include/linux/pid.h b/include/linux/pid.h index a3aad9b4074c..d466890e1b35 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -2,6 +2,7 @@ #ifndef _LINUX_PID_H #define _LINUX_PID_H +#include #include #include #include @@ -72,8 +73,35 @@ extern struct pid init_struct_pid; struct file; + +/** + * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd. + * + * @pidfd: The pidfd whose pid we want, or the fd of a /proc/ file if + * @allow_proc is also set. + * @allow_proc: If set, then an fd of a /proc/ file can be passed instead + * of a pidfd, and this will be used to determine the pid. + * @flags: Output variable, if non-NULL, then the file->f_flags of the + * pidfd will be set here. + * + * Returns: If successful, the pid associated with the pidfd, otherwise an + * error.
+ */ +struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc, + unsigned int *flags); + +static inline struct pid *pidfd_get_pid(unsigned int pidfd, unsigned int *flags) +{ + return __pidfd_get_pid(pidfd, /* allow_proc = */ false, flags); +} + +static inline struct pid *pidfd_get_pid_proc(unsigned int pidfd, +unsigned int *flags) +{ + return __pidfd_get_pid(pidfd, /* allow_proc = */ true, flags); +} + struct pid *pidfd_pid(const struct file *file); -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags); struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags); int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret); void do_notify_pidfd(struct task_struct *task); diff --git a/kernel/pid.c b/kernel/pid.c index 2715afb77eab..94c97559e5c5 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -534,22 +535,32 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns) } EXPORT_SYMBOL_GPL(find_ge_pid); -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags) +struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc, + unsigned int *flags) { - struct fd f; struct pid *pid; + struct fd f = fdget(pidfd); + struct file *file = fd_file(f); - f = fdget(fd); - if (!fd_file(f)) + if (!file) return ERR_PTR(-EBADF); - pid = pidfd_pid(fd_file(f)); - if (!IS_ERR(pid)) { - get_pid(pid); - *flags = fd_file(f)->f_flags; + pid = pidfd_pid(file); + /* If we allow opening a pidfd via /proc/, do so. */ + if (IS_ERR(pid) && allow_proc) + pid = tgid_pidfd_to_pid(file); + + if (IS_ERR(pid)) { + fdput(f); + return pid; } + /* Pin pid before we release fd. 
*/ + get_pid(pid); + if (flags) + *flags = file->f_flags; fdput(f); + return pid; } @@ -747,23 +758,18 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd, unsigned int, flags) { struct pid *pid; - struct fd f; int ret; /* flags is currently unused - make sure it's unset */ if (flags) return -EINVAL; - f = fdget(pidfd); - if (!fd_file(f)) - return -EBADF; - - pid = pidfd_pid(fd_file(f)); + pid = pidfd_get_pid(pidfd, NULL); if (IS_ERR(pid)) - ret = PTR_ERR(pid); - else - ret = pidfd_getfd(pid, fd); + return PTR_ERR(pid); - fdput(f); + ret = pidfd_getfd(pid, fd); + + put_pid(pid); return ret; } diff --git a/kernel/signal.c b/kernel/signal.c index 4344860ffcac..9a35b1cf40ad 100644 --- a/kernel/signal.c +++ b/kerne
[PATCH v5 4/5] selftests: pidfd: add pidfd.h UAPI wrapper
Conflicts can arise between the system fcntl.h and linux/fcntl.h, which is imported by the linux/pidfd.h UAPI header. Work around this by adding a wrapper for linux/pidfd.h to tools/include/ which sets the linux/fcntl.h header guard ahead of importing the pidfd.h header file. Adjust the pidfd selftests Makefile to reference this include directory and give it higher precedence than any headers installed by make headers, to ensure the wrapper is preferred. This way we can import the UAPI header file directly without issue and use the latest system header file without having to duplicate anything. Reviewed-by: Shuah Khan Signed-off-by: Lorenzo Stoakes --- tools/include/linux/pidfd.h| 14 ++ tools/testing/selftests/pidfd/Makefile | 3 +-- 2 files changed, 15 insertions(+), 2 deletions(-) create mode 100644 tools/include/linux/pidfd.h diff --git a/tools/include/linux/pidfd.h b/tools/include/linux/pidfd.h new file mode 100644 index ..113c8023072d --- /dev/null +++ b/tools/include/linux/pidfd.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef _TOOLS_LINUX_PIDFD_H +#define _TOOLS_LINUX_PIDFD_H + +/* + * Some systems have issues with the linux/fcntl.h import in linux/pidfd.h, so + * work around this by setting the header guard. + */ +#define _LINUX_FCNTL_H +#include "../../../include/uapi/linux/pidfd.h" +#undef _LINUX_FCNTL_H + +#endif /* _TOOLS_LINUX_PIDFD_H */ diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile index d731e3e76d5b..f5038c9dae14 100644 --- a/tools/testing/selftests/pidfd/Makefile +++ b/tools/testing/selftests/pidfd/Makefile @@ -1,8 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only -CFLAGS += -g $(KHDR_INCLUDES) -pthread -Wall +CFLAGS += -g -isystem $(top_srcdir)/tools/include $(KHDR_INCLUDES) -pthread -Wall TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \ pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test include ../lib.mk - -- 2.47.0
Re: [PATCH V4 00/15] selftests/resctrl: Support diverse platforms with MBM and MBA tests
On 10/25/24 6:54 AM, Ilpo Järvinen wrote: > On Thu, 24 Oct 2024, Reinette Chatre wrote: > >> Hi Shuah, >> >> On 10/24/24 3:36 PM, Shuah Khan wrote: >>> >>> Is this patch series ready to be applied? >>> >> >> I believe it is close ... I would like to give Ilpo some time to peek >> at patches 2 and 10 to confirm if I got their fixes right this time. The >> rest of the series is ready. > > Hi, > > I took a look at those two patches now and they seemed fine to me so this > series should be ready to go now. > Thank you very much Ilpo. Reinette
Re: [PATCH v4 02/11] iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
On Fri, Oct 25, 2024 at 08:47:40AM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Tuesday, October 22, 2024 9:16 PM > > > > On Tue, Oct 22, 2024 at 04:59:07PM +0800, Baolu Lu wrote: > > > > > Is it feasible to make vIOMMU object more generic, rather than strictly > > > tying it to nested translation? For example, a normal paging domain that > > > translates gPAs to hPAs could also have a vIOMMU object associated with > > > it. > > > > > > While we can only support vIOMMU object allocation uAPI for S2 paging > > > domains in the context of this series, we could consider leaving the > > > option open to associate a vIOMMU object with other normal paging > > > domains that are not a nested parent? > > > > Why? The nested parent flavour of the domain is basically free to > > create, what reason would be to not do that? > > > > If the HW doesn't support it, then does the HW really need/support a > > VIOMMU? > > Now it's agreed to build trusted I/O on top of this new vIOMMU object. > format-wise probably it's free to assume that nested parent is supported > on any new platform which will support trusted I/O. But I'm not sure > all the conditions around allowing nested are same as for trusted I/O, > e.g. for ARM nesting is allowed only for CANWBS/S2FWB. Are they > always guaranteed in trusted I/O configuration? ARM is a big ? what exactly will come, but I'm expecting that to be resolved either with continued HW support or Linux will add the cache flushing and relax the test. > Baolu did raise a good open to confirm given it will be used beyond > nesting. 😊 Even CC is "nesting", it is just nested with a fixed Identity S1 in the baseline case. The S2 translation still exists and still has to be consistent with whatever the secure world is doing. So, my feeling is that the S2 nested domain is mandatory for the viommu, especially for CC, it must exists. In the end there may be more options than just a nested parent. 
For instance if the CC design relies on the secure world sharing the CPU and IOMMU page table we might need a new HWPT type to represent that configuration. From a uapi perspective we seem OK here as the hwpt input could be anything. We might have to adjust some checks in the kernel someday. Jason
[PATCH net-next 2/2] net: netconsole: selftests: Add userdata validation
Extend netcons_basic selftest to verify the userdata functionality by: 1. Creating a test key in the userdata configfs directory 2. Writing a known value to the key 3. Validating the key-value pair appears in the captured network output This ensures the userdata feature is properly tested during selftests. Signed-off-by: Breno Leitao --- .../selftests/drivers/net/netcons_basic.sh| 29 +++ 1 file changed, 29 insertions(+) diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh index 4ad1e216c6b0..d182dcc2a10b 100755 --- a/tools/testing/selftests/drivers/net/netcons_basic.sh +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh @@ -26,10 +26,13 @@ DSTIP=192.168.2.2 PORT="" MSG="netconsole selftest" +USERDATA_KEY="key" +USERDATA_VALUE="value" TARGET=$(mktemp -u netcons_X) DEFAULT_PRINTK_VALUES=$(cat /proc/sys/kernel/printk) NETCONS_CONFIGFS="/sys/kernel/config/netconsole" NETCONS_PATH="${NETCONS_CONFIGFS}"/"${TARGET}" +KEY_PATH="${NETCONS_PATH}/userdata/${USERDATA_KEY}" # NAMESPACE will be populated by setup_ns with a random value NAMESPACE="" @@ -122,6 +125,8 @@ function cleanup() { # delete netconsole dynamic reconfiguration echo 0 > "${NETCONS_PATH}"/enabled + # Remove key + rmdir "${KEY_PATH}" # Remove the configfs entry rmdir "${NETCONS_PATH}" @@ -136,6 +141,18 @@ function cleanup() { echo "${DEFAULT_PRINTK_VALUES}" > /proc/sys/kernel/printk } +function set_user_data() { + if [[ ! 
-d "${NETCONS_PATH}""/userdata" ]] + then + echo "Userdata path not available in ${NETCONS_PATH}/userdata" + exit "${ksft_skip}" + fi + + mkdir -p "${KEY_PATH}" + VALUE_PATH="${KEY_PATH}""/value" + echo "${USERDATA_VALUE}" > "${VALUE_PATH}" +} + function listen_port_and_save_to() { local OUTPUT=${1} # Just wait for 2 seconds @@ -146,6 +163,10 @@ function listen_port_and_save_to() { function validate_result() { local TMPFILENAME="$1" + # TMPFILENAME will contain something like: + # 6.11.1-0_fbk0_rc13_509_g30d75cea12f7,13,1822,115075213798,-;netconsole selftest: netcons_gtJHM + # key=value + # Check if the file exists if [ ! -f "$TMPFILENAME" ]; then echo "FAIL: File was not generated." >&2 @@ -158,6 +179,12 @@ function validate_result() { exit "${ksft_fail}" fi + if ! grep -q "${USERDATA_KEY}=${USERDATA_VALUE}" "${TMPFILENAME}"; then + echo "FAIL: ${USERDATA_KEY}=${USERDATA_VALUE} not found in ${TMPFILENAME}" >&2 + cat "${TMPFILENAME}" >&2 + exit "${ksft_fail}" + fi + # Delete the file once it is validated, otherwise keep it # for debugging purposes rm "${TMPFILENAME}" @@ -220,6 +247,8 @@ trap cleanup EXIT set_network # Create a dynamic target for netconsole create_dynamic_target +# Set userdata "key" with the "value" value +set_user_data # Listed for netconsole port inside the namespace and destination interface listen_port_and_save_to "${OUTPUT_FILE}" & # Wait for socat to start and listen to the port. -- 2.43.5
Re: [PATCH v4 06/11] iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
On Fri, Oct 25, 2024 at 09:04:15AM +, Tian, Kevin wrote: > > From: Nicolin Chen > > Sent: Tuesday, October 22, 2024 8:19 AM > > > > +static struct iommufd_hwpt_nested * > > +iommufd_hwpt_nested_alloc_for_viommu(struct iommufd_viommu > > *viommu, > > + const struct iommu_user_data *user_data) > > probably "_for" can be skipped to reduce the name length That would read like a hwpt_nested allocating a vIOMMU. It'd probably be neutral to have iommufd_viommu_alloc_hwpt_nested, yet we have iommufd_hwpt_nested_alloc (HWPT-based) to align with. > looks there missed a check on flags in this path. Oh yes, I missed that. Will pass in the cmd->flags. Thanks Nicolin
Re: [PATCH v4 04/11] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
On Fri, Oct 25, 2024 at 09:05:58AM +, Tian, Kevin wrote: > > From: Nicolin Chen > > Sent: Tuesday, October 22, 2024 8:19 AM > > + > > + viommu->type = cmd->type; > > + viommu->ictx = ucmd->ictx; > > + viommu->hwpt = hwpt_paging; > > + /* Assume physical IOMMUs are unpluggable (the most likely case) > > */ > > + viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev); > > + > > so what would happen if this assumption breaks? I had a very verbose comment there previously, which Alexey suggested optimizing away. Perhaps I should add back the part that mentions adding a refcount for pluggable ones. Nicolin
Re: [PATCH v4 00/11] iommufd: Add vIOMMU infrastructure (Part-1)
On Fri, Oct 25, 2024 at 08:34:05AM +, Tian, Kevin wrote: > > From: Nicolin Chen > > Sent: Tuesday, October 22, 2024 8:19 AM > > > > This series introduces a new vIOMMU infrastructure and related ioctls. > > > > IOMMUFD has been using the HWPT infrastructure for all cases, including a > > nested IO page table support. Yet, there are limitations for an HWPT-based > > structure to support some advanced HW-accelerated features, such as > > CMDQV > > on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU > > environment, it is not straightforward for nested HWPTs to share the same > > parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a > > parent HWPT typically holds one stage-2 IO pagetable and tags it with only > > one ID in the cache entries. When sharing one large stage-2 IO pagetable > > across physical IOMMU instances, that one ID may not always be available > > across all the IOMMU instances. In other words, it's ideal for SW to have > > a different container for the stage-2 IO pagetable so it can hold another > > ID that's available. > > Just holding multiple IDs doesn't require a different container. This is > just a side effect when vIOMMU will be required for other said reasons. > > If we have to put more words here I'd prefer adding a bit more about > CMDQV, which is more compelling. Not a big deal though. 😊 Ack.
> > For this "different container", add vIOMMU, an additional layer to hold > > extra virtualization information: > > [ASCII diagram, garbled in extraction: the iommufd (with vIOMMU) objects IOAS, HWPT_PAGING, HWPT_NESTED, DEVICE and vIOMMU, labeled [1] to [5], with arrows DEVICE -> HWPT_NESTED -> HWPT_PAGING -> IOAS, drawn above the kernel objects they map to: struct iommu_device, PFN storage, the paging and nested iommu_domain, and struct device] > > > > nit - [1] ... [5] can be removed. They are copied from the Documentation where numbers are needed. I will take all the numbers out in the cover letters. > > The vIOMMU object should be seen as a slice of a physical IOMMU instance > > that is passed to or shared with a VM. That can be some HW/SW resources: > > - Security namespace for guest owned ID, e.g. guest-controlled cache tags > > - Access to a sharable nesting parent pagetable across physical IOMMUs > > - Virtualization of various platforms IDs, e.g. RIDs and others > > - Delivery of paravirtualized invalidation > > - Direct assigned invalidation queues > > - Direct assigned interrupts > > - Non-affiliated event reporting > > sorry no idea about 'non-affiliated event'. Can you elaborate? I'll put an "e.g.". > > On a multi-IOMMU system, the vIOMMU object must be instanced to the > > number > > of the physical IOMMUs that are passed to (via devices) a guest VM, while > > 'to the number of the physical IOMMUs that have a slice passed to ..." Ack. Thanks Nicolin
Re: [PATCH v4 04/11] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
On Fri, Oct 25, 2024 at 08:59:11AM +, Tian, Kevin wrote: > > From: Nicolin Chen > > Sent: Tuesday, October 22, 2024 8:19 AM > > > > Add a new ioctl for user space to do a vIOMMU allocation. It must be based > > on a nesting parent HWPT, so take its refcount. > > > > If an IOMMU driver supports a driver-managed vIOMMU object, it must > > define > > why highlight 'driver-managed', implying a core-managed vIOMMU > object some day? Oh, core-managed vIOMMU is gone since this version. I should have updated the commit message here too. > > +/** > > + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC) > > + * @size: sizeof(struct iommu_viommu_alloc) > > + * @flags: Must be 0 > > + * @type: Type of the virtual IOMMU. Must be defined in enum > > iommu_viommu_type > > + * @dev_id: The device's physical IOMMU will be used to back the virtual > > IOMMU > > + * @hwpt_id: ID of a nesting parent HWPT to associate to > > + * @out_viommu_id: Output virtual IOMMU ID for the allocated object > > + * > > + * Allocate a virtual IOMMU object that represents the underlying physical > > + * IOMMU's virtualization support. The vIOMMU object is a security-isolated > > + * slice of the physical IOMMU HW that is unique to a specific VM. > > the object itself is a software abstraction, while a 'slice' is a set of > real hw resources. Yea, let's do this: * Allocate a virtual IOMMU object, representing the underlying physical IOMMU's * virtualization support that is a security-isolated slice of the real IOMMU HW * that is unique to a specific VM. Thanks Nicolin
[PATCH net-next 1/2] net: netconsole: selftests: Change the IP subnet
Use a less populated IP range to run the tests, as suggested by Petr in Link: https://lore.kernel.org/netdev/87ikvukv3s@nvidia.com/. Suggested-by: Petr Machata Signed-off-by: Breno Leitao --- tools/testing/selftests/drivers/net/netcons_basic.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh index 06021b2059b7..4ad1e216c6b0 100755 --- a/tools/testing/selftests/drivers/net/netcons_basic.sh +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh @@ -20,9 +20,9 @@ SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")") # Simple script to test dynamic targets in netconsole SRCIF="" # to be populated later -SRCIP=192.168.1.1 +SRCIP=192.168.2.1 DSTIF="" # to be populated later -DSTIP=192.168.1.2 +DSTIP=192.168.2.2 PORT="" MSG="netconsole selftest" -- 2.43.5
Re: [PATCH v4 01/14] iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct
On Fri, Oct 25, 2024 at 10:20:54AM -0300, Jason Gunthorpe wrote: > On Fri, Oct 25, 2024 at 06:53:01PM +1100, Alexey Kardashevskiy wrote: > > > +#define iommufd_vdevice_alloc(ictx, drv_struct, member) > > > \ > > > + ({ \ > > > + static_assert( \ > > > + __same_type(struct iommufd_vdevice,\ > > > + ((struct drv_struct *)NULL)->member)); \ > > > + static_assert(offsetof(struct drv_struct, member.obj) == 0); \ > > > + container_of(_iommufd_object_alloc(ictx, \ > > > +sizeof(struct drv_struct), \ > > > +IOMMUFD_OBJ_VDEVICE), \ > > > + struct drv_struct, member.obj); \ > > > + }) > > > #endif > > > > A nit: it hurts eyes to read: > > > > mock_vdev = iommufd_vdevice_alloc(viommu->ictx, mock_vdevice, core); > > > > vs. > > > > mock_vdev = iommufd_vdevice_alloc(viommu->ictx, struct mock_vdevice, core); > > > > as for the former I go searching for a "mock_vdevice" variable and for the > > latter it is clear it is 1) a macro 2) which does some type checking. > > > > also, it makes it impossible to pass things like typeof(..) or a type from > > typedef. Thanks, > > Makes sense to me Ack. Will change accordingly. > And the container_of() should not be used in these macros, the point > was to avoid it to make the PTR_ERR behavior clearer. Just put a force > type cast I recall that I changed it for a compiler complaint. But it seems to be gone now. Will change it back. Thanks Nicolin
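Below is a userspace sketch of the macro shape agreed on in this thread. The caller passes a full type (`struct mock_vdevice` or a typedef name), two compile-time checks guard the layout, and the result is a plain cast instead of container_of(). All names except `mock_vdevice`/`core` are hypothetical stand-ins. `object_alloc()` returns NULL where the real `_iommufd_object_alloc()` would return an ERR_PTR(). The statement expression and `__builtin_types_compatible_p` require GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the iommufd core objects. */
struct iommufd_object {
	int type;
};

struct iommufd_vdevice {
	struct iommufd_object obj; /* core object, must stay first */
	unsigned long long id;
};

#define OBJ_VDEVICE 1 /* hypothetical object type value */

/* Stand-in for _iommufd_object_alloc(): zeroed allocation of `size` bytes
 * whose first member is the core object. */
static struct iommufd_object *object_alloc(void *ictx, size_t size, int type)
{
	struct iommufd_object *obj = calloc(1, size);

	(void)ictx;
	if (obj)
		obj->type = type;
	return obj;
}

/*
 * Takes a full type, e.g. "struct mock_vdevice" or a typedef name, so the
 * call site reads as a macro. The final plain cast is valid because the
 * core object sits at offset 0; an error value passes through unchanged.
 */
#define vdevice_alloc(ictx, drv_type, member)                             \
	({                                                                \
		_Static_assert(__builtin_types_compatible_p(              \
				       struct iommufd_vdevice,            \
				       __typeof__(((drv_type *)NULL)->member)), \
			       "member must be a struct iommufd_vdevice"); \
		_Static_assert(offsetof(drv_type, member.obj) == 0,       \
			       "core object must sit at offset 0");       \
		(drv_type *)object_alloc(ictx, sizeof(drv_type),          \
					 OBJ_VDEVICE);                    \
	})

struct mock_vdevice {
	struct iommufd_vdevice core;
	int priv;
};

typedef struct mock_vdevice mock_vdevice_t;
```

For example, `vdevice_alloc(ictx, struct mock_vdevice, core)` and `vdevice_alloc(ictx, mock_vdevice_t, core)` both compile. Passing a member that is not a `struct iommufd_vdevice` fails at build time.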
Re: [PATCH net-next v2 2/4] net: hsr: Add VLAN CTAG filter support
On 24/10/2024 11:30, MD Danish Anwar wrote: From: Murali Karicheri This patch adds support for VLAN ctag based filtering at slave devices. The slave ethernet device may be capable of filtering ethernet packets based on VLAN ID. This requires that when the VLAN interface is created over an HSR/PRP interface, it passes the VID information to the associated slave ethernet devices so that it updates the hardware filters to filter ethernet frames based on VID. This patch adds the required functions to propagate the vid information to the slave devices. Signed-off-by: Murali Karicheri Signed-off-by: MD Danish Anwar --- net/hsr/hsr_device.c | 71 +++- 1 file changed, 70 insertions(+), 1 deletion(-) diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c index 0ca47ebb01d3..ff586bdc2bde 100644 --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -515,6 +515,68 @@ static void hsr_change_rx_flags(struct net_device *dev, int change) } } +static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev, + __be16 proto, u16 vid) +{ + struct hsr_port *port; + struct hsr_priv *hsr; + int ret = 0; + + hsr = netdev_priv(dev); + + hsr_for_each_port(hsr, port) { + if (port->type == HSR_PT_MASTER) + continue; + + ret = vlan_vid_add(port->dev, proto, vid); + switch (port->type) { + case HSR_PT_SLAVE_A: + if (ret) { + netdev_err(dev, "add vid failed for Slave-A\n"); + return ret; + } + break; + + case HSR_PT_SLAVE_B: + if (ret) { + /* clean up Slave-A */ + netdev_err(dev, "add vid failed for Slave-B\n"); + vlan_vid_del(port->dev, proto, vid); + return ret; + } + break; + default: + break; + } + } + + return 0; +}
This function doesn't match hsr_ndo_vlan_rx_kill_vid(). vlan_vid_add() can potentially be executed for port->type equal to HSR_PT_INTERLINK, but the result will be ignored, and the matching vlan_vid_del() will never happen in that case. Is this the desired behavior? Maybe it's better to synchronize the add/del code and refactor the error path to avoid copying the code?
+ +static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev, + __be16 proto, u16 vid) +{ + struct hsr_port *port; + struct hsr_priv *hsr; + + hsr = netdev_priv(dev); + + hsr_for_each_port(hsr, port) { + if (port->type == HSR_PT_MASTER) + continue; + switch (port->type) { + case HSR_PT_SLAVE_A: + case HSR_PT_SLAVE_B: + vlan_vid_del(port->dev, proto, vid); + break; + default: + break; + } + } + + return 0; +} + static const struct net_device_ops hsr_device_ops = { .ndo_change_mtu = hsr_dev_change_mtu, .ndo_open = hsr_dev_open, @@ -523,6 +585,8 @@ static const struct net_device_ops hsr_device_ops = { .ndo_change_rx_flags = hsr_change_rx_flags, .ndo_fix_features = hsr_fix_features, .ndo_set_rx_mode = hsr_set_rx_mode, + .ndo_vlan_rx_add_vid = hsr_ndo_vlan_rx_add_vid, + .ndo_vlan_rx_kill_vid = hsr_ndo_vlan_rx_kill_vid, }; static const struct device_type hsr_type = { @@ -569,7 +633,8 @@ void hsr_dev_setup(struct net_device *dev) dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA | NETIF_F_GSO_MASK | NETIF_F_HW_CSUM | - NETIF_F_HW_VLAN_CTAG_TX; + NETIF_F_HW_VLAN_CTAG_TX | + NETIF_F_HW_VLAN_CTAG_FILTER; dev->features = dev->hw_features; } @@ -647,6 +712,10 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2], (slave[1]->features & NETIF_F_HW_HSR_FWD)) hsr->fwd_offloaded = true; + if ((slave[0]->features & NETIF_F_HW_VLAN_CTAG_FILTER) && + (slave[1]->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + hsr_dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER; + res = register_netdevice(hsr_dev); if (res) goto err_unregister;
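The review above suggests keeping add and kill symmetric and sharing a single error path. Below is a minimal userspace sketch of that idea. It assumes that only the two slave ports should carry VLAN filters, mirroring hsr_ndo_vlan_rx_kill_vid(). The port model and `vid_add()`/`vid_del()` are hypothetical stand-ins for `hsr_for_each_port()`, `vlan_vid_add()` and `vlan_vid_del()`. The unwind targets the ports that were already programmed. In the quoted Slave-B error path, `port->dev` is Slave-B itself rather than Slave-A.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an HSR device's ports. */
enum port_type { PT_MASTER, PT_SLAVE_A, PT_SLAVE_B, PT_INTERLINK };

struct port {
	enum port_type type;
	bool fail_add; /* test hook: make vid_add() fail on this port */
	int vids;      /* number of VIDs currently programmed */
};

/* Stand-ins for vlan_vid_add()/vlan_vid_del(). */
static int vid_add(struct port *p)
{
	if (p->fail_add)
		return -EIO;
	p->vids++;
	return 0;
}

static void vid_del(struct port *p)
{
	p->vids--;
}

/* One predicate shared by add and kill, so both touch the same ports. */
static bool port_filters_vlan(const struct port *p)
{
	return p->type == PT_SLAVE_A || p->type == PT_SLAVE_B;
}

static int hsr_add_vid(struct port *ports, int n)
{
	int i, ret = 0;

	for (i = 0; i < n; i++) {
		if (!port_filters_vlan(&ports[i]))
			continue;
		ret = vid_add(&ports[i]);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* Undo only the ports programmed before the failing one. */
	while (--i >= 0) {
		if (port_filters_vlan(&ports[i]))
			vid_del(&ports[i]);
	}
	return ret;
}

static void hsr_kill_vid(struct port *ports, int n)
{
	for (int i = 0; i < n; i++) {
		if (port_filters_vlan(&ports[i]))
			vid_del(&ports[i]);
	}
}
```

If the interlink port should also filter VLANs, changing `port_filters_vlan()` updates both paths at once.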
Re: [PATCH RFC 1/3] pinctrl: mediatek: paris: Expose more configurations to GPIO set_config
On 11/09/24 12:10, AngeloGioacchino Del Regno wrote: On 09/09/24 20:37, Nícolas F. R. A. Prado wrote: Currently the set_config callback in the gpio_chip registered by the pinctrl_paris driver only supports PIN_CONFIG_INPUT_DEBOUNCE, despite [...] only supports operations configuring the input debounce parameter of the EINT controller and denies configuring params on the other AP GPIOs [...] (reword as needed) many other configurations already being implemented and available through the pinctrl API for configuration of pins by the Devicetree and other drivers. Expose all configurations currently implemented through the GPIO API so they can also be set from userspace, which is particularly useful to allow testing them from userspace. Signed-off-by: Nícolas F. R. A. Prado --- drivers/pinctrl/mediatek/pinctrl-paris.c | 20 ++-- You can do the same for pinctrl-moore too, it's trivial. Other than that, I agree about performing this change, as this may be useful for more than just testing. Nicolas, please don't forget to respin this patch.
Thanks, Angelo 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/pinctrl/mediatek/pinctrl-paris.c b/drivers/pinctrl/mediatek/ pinctrl-paris.c index e12316c42698..668f8055a544 100644 --- a/drivers/pinctrl/mediatek/pinctrl-paris.c +++ b/drivers/pinctrl/mediatek/pinctrl-paris.c @@ -255,10 +255,9 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev, return err; } -static int mtk_pinconf_set(struct pinctrl_dev *pctldev, unsigned int pin, +static int mtk_pinconf_set(struct mtk_pinctrl *hw, unsigned int pin, enum pin_config_param param, u32 arg) { - struct mtk_pinctrl *hw = pinctrl_dev_get_drvdata(pctldev); const struct mtk_pin_desc *desc; int err = -ENOTSUPP; u32 reg; @@ -795,7 +794,7 @@ static int mtk_pconf_group_set(struct pinctrl_dev *pctldev, unsigned group, int i, ret; for (i = 0; i < num_configs; i++) { - ret = mtk_pinconf_set(pctldev, grp->pin, + ret = mtk_pinconf_set(hw, grp->pin, pinconf_to_config_param(configs[i]), pinconf_to_config_argument(configs[i])); if (ret < 0) @@ -937,18 +936,19 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, unsigned int offset, { struct mtk_pinctrl *hw = gpiochip_get_data(chip); const struct mtk_pin_desc *desc; - u32 debounce; + enum pin_config_param param = pinconf_to_config_param(config); + u32 arg = pinconf_to_config_argument(config); desc = (const struct mtk_pin_desc *)&hw->soc->pins[offset]; - if (!hw->eint || - pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE || - desc->eint.eint_n == EINT_NA) - return -ENOTSUPP; + if (param == PIN_CONFIG_INPUT_DEBOUNCE) { + if (!hw->eint || desc->eint.eint_n == EINT_NA) + return -ENOTSUPP; - debounce = pinconf_to_config_argument(config); + return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, arg); + } - return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, debounce); + return mtk_pinconf_set(hw, offset, param, arg); } static int mtk_build_gpiochip(struct mtk_pinctrl *hw)
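The reworked `mtk_gpio_set_config()` dispatches on the parameter. Debounce still goes to the EINT controller. Everything else goes to `mtk_pinconf_set()`, which now takes `hw` so that both the pinctrl and GPIO paths can call it. Below is a simplified userspace model of that dispatch. The `struct hw` fields are hypothetical stand-ins for `hw->eint` and `desc->eint.eint_n`. The packing is assumed to mirror the generic pinconf helpers, with the parameter in the low 8 bits and the argument above it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524 /* kernel-internal errno, not exported to userspace */
#endif

/* Hypothetical subset of enum pin_config_param. */
enum param { P_BIAS_PULL_UP, P_DRIVE_STRENGTH, P_INPUT_DEBOUNCE };

/* Stand-in for pinconf_to_config_packed(). */
static unsigned long pack(enum param p, uint32_t arg)
{
	return ((unsigned long)arg << 8) | ((unsigned long)p & 0xffUL);
}

struct hw {
	bool has_eint;       /* models hw->eint != NULL */
	bool pin_has_eint;   /* models desc->eint.eint_n != EINT_NA */
	uint32_t debounce;   /* last debounce programmed into the EINT block */
	uint32_t pinconf[3]; /* last argument programmed per parameter */
};

/* Stand-in for mtk_pinconf_set(hw, pin, param, arg). */
static int pinconf_set(struct hw *hw, enum param p, uint32_t arg)
{
	hw->pinconf[p] = arg;
	return 0;
}

static int gpio_set_config(struct hw *hw, unsigned long config)
{
	enum param p = (enum param)(config & 0xffUL);
	uint32_t arg = (uint32_t)((config >> 8) & 0xffffffUL);

	if (p == P_INPUT_DEBOUNCE) {
		/* Debounce lives in the EINT controller, not the pin registers. */
		if (!hw->has_eint || !hw->pin_has_eint)
			return -ENOTSUPP;
		hw->debounce = arg;
		return 0;
	}
	/* Everything else takes the same path the pinctrl API uses. */
	return pinconf_set(hw, p, arg);
}
```

Passing `hw` instead of `pctldev` is what lets the GPIO callback reuse the helper, since it only has the `gpio_chip` data at hand.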
[PATCH V4 09/15] selftests/resctrl: Remove unused measurement code
The MBM and MBA resctrl selftests run a benchmark during which it takes measurements of read memory bandwidth via perf. Code exists to support measurements of write memory bandwidth but there exists no path with which this code can execute. While code exists for write memory bandwidth measurement there has not yet been a use case for it. Remove this unused code. Rename relevant functions to include "read" so that it is clear that it relates only to memory bandwidth reads, while renaming the functions also add consistency by changing the "membw" instances to more prevalent "mem_bw". Signed-off-by: Reinette Chatre Reviewed-by: Ilpo Järvinen --- Changes since V2: - Add Ilpo's Reviewed-by tag. Changes since V1: - New patch. --- tools/testing/selftests/resctrl/mba_test.c| 4 +- tools/testing/selftests/resctrl/mbm_test.c| 4 +- tools/testing/selftests/resctrl/resctrl.h | 8 +- tools/testing/selftests/resctrl/resctrl_val.c | 234 ++ tools/testing/selftests/resctrl/resctrlfs.c | 17 -- 5 files changed, 85 insertions(+), 182 deletions(-) diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c index da40a8ed4413..be0ead73e55d 100644 --- a/tools/testing/selftests/resctrl/mba_test.c +++ b/tools/testing/selftests/resctrl/mba_test.c @@ -21,7 +21,7 @@ static int mba_init(const struct resctrl_val_param *param, int domain_id) { int ret; - ret = initialize_mem_bw_imc(); + ret = initialize_read_mem_bw_imc(); if (ret) return ret; @@ -68,7 +68,7 @@ static int mba_setup(const struct resctrl_test *test, static int mba_measure(const struct user_params *uparams, struct resctrl_val_param *param, pid_t bm_pid) { - return measure_mem_bw(uparams, param, bm_pid, "reads"); + return measure_read_mem_bw(uparams, param, bm_pid); } static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc) diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c index cf08ba5e314e..defa94293915 100644 --- 
a/tools/testing/selftests/resctrl/mbm_test.c +++ b/tools/testing/selftests/resctrl/mbm_test.c @@ -91,7 +91,7 @@ static int mbm_init(const struct resctrl_val_param *param, int domain_id) { int ret; - ret = initialize_mem_bw_imc(); + ret = initialize_read_mem_bw_imc(); if (ret) return ret; @@ -122,7 +122,7 @@ static int mbm_setup(const struct resctrl_test *test, static int mbm_measure(const struct user_params *uparams, struct resctrl_val_param *param, pid_t bm_pid) { - return measure_mem_bw(uparams, param, bm_pid, "reads"); + return measure_read_mem_bw(uparams, param, bm_pid); } static void mbm_test_cleanup(void) diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index ba1ce1b35699..82801245e4c1 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -126,7 +126,6 @@ int filter_dmesg(void); int get_domain_id(const char *resource, int cpu_no, int *domain_id); int mount_resctrlfs(void); int umount_resctrlfs(void); -const char *get_bw_report_type(const char *bw_report); bool resctrl_resource_exists(const char *resource); bool resctrl_mon_feature_exists(const char *resource, const char *feature); bool resource_info_file_exists(const char *resource, const char *file); @@ -143,10 +142,9 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush); void mem_flush(unsigned char *buf, size_t buf_size); void fill_cache_read(unsigned char *buf, size_t buf_size, bool once); int run_fill_buf(size_t buf_size, int memflush); -int initialize_mem_bw_imc(void); -int measure_mem_bw(const struct user_params *uparams, - struct resctrl_val_param *param, pid_t bm_pid, - const char *bw_report); +int initialize_read_mem_bw_imc(void); +int measure_read_mem_bw(const struct user_params *uparams, + struct resctrl_val_param *param, pid_t bm_pid); void initialize_mem_bw_resctrl(const struct resctrl_val_param *param, int domain_id); int resctrl_val(const struct resctrl_test *test, diff --git 
a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 113ca18d67c1..c4ebf70a46ef 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -12,13 +12,10 @@ #define UNCORE_IMC "uncore_imc" #define READ_FILE_NAME "events/cas_count_read" -#define WRITE_FILE_NAME"events/cas_count_write" #define DYN_PMU_PATH "/sys/bus/event_source/devices" #define SCALE 0.6103515625 #define MAX_IMCS 20 #define MAX_TOKENS 5 -#define READ 0 -#define WRITE 1 #define CO
[PATCH net-next v10 19/23] ovpn: implement key add/get/del/swap via netlink
This change introduces the netlink commands needed to add, get, delete and swap keys for a specific peer. Userspace is expected to use these commands to create, inspect (non-sensitive data only), destroy and rotate session keys for a specific peer. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/crypto.c | 42 ++ drivers/net/ovpn/crypto.h | 4 + drivers/net/ovpn/crypto_aead.c | 17 +++ drivers/net/ovpn/crypto_aead.h | 2 + drivers/net/ovpn/netlink.c | 308 - 5 files changed, 369 insertions(+), 4 deletions(-) diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c index f1f7510e2f735e367f96eb4982ba82c9af3c8bfc..cfb014c947b968752ba3dab84ec42dc8ec086379 100644 --- a/drivers/net/ovpn/crypto.c +++ b/drivers/net/ovpn/crypto.c @@ -151,3 +151,45 @@ void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs) spin_unlock_bh(&cs->lock); } + +/** + * ovpn_crypto_config_get - populate keyconf object with non-sensible key data + * @cs: the crypto state to extract the key data from + * @slot: the specific slot to inspect + * @keyconf: the output object to populate + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_crypto_config_get(struct ovpn_crypto_state *cs, + enum ovpn_key_slot slot, + struct ovpn_key_config *keyconf) +{ + struct ovpn_crypto_key_slot *ks; + int idx; + + switch (slot) { + case OVPN_KEY_SLOT_PRIMARY: + idx = cs->primary_idx; + break; + case OVPN_KEY_SLOT_SECONDARY: + idx = !cs->primary_idx; + break; + default: + return -EINVAL; + } + + rcu_read_lock(); + ks = rcu_dereference(cs->slots[idx]); + if (!ks || (ks && !ovpn_crypto_key_slot_hold(ks))) { + rcu_read_unlock(); + return -ENOENT; + } + rcu_read_unlock(); + + keyconf->cipher_alg = ovpn_aead_crypto_alg(ks); + keyconf->key_id = ks->key_id; + + ovpn_crypto_key_slot_put(ks); + + return 0; +} diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h index 3b437d26b531c3034cca5343c755ef9c7ef57276..96fd41f4b81b74f8a3ecfe33ee24ba0122d222fe 100644 ---
a/drivers/net/ovpn/crypto.h +++ b/drivers/net/ovpn/crypto.h @@ -136,4 +136,8 @@ void ovpn_crypto_state_release(struct ovpn_crypto_state *cs); void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs); +int ovpn_crypto_config_get(struct ovpn_crypto_state *cs, + enum ovpn_key_slot slot, + struct ovpn_key_config *keyconf); + #endif /* _NET_OVPN_OVPNCRYPTO_H_ */ diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c index 072bb0881764752520e8e26e18337c1274ce1aa4..25e4e4a453b2bc499aec9a192fe3d86ba1aac511 100644 --- a/drivers/net/ovpn/crypto_aead.c +++ b/drivers/net/ovpn/crypto_aead.c @@ -367,3 +367,20 @@ ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc) ovpn_aead_crypto_key_slot_destroy(ks); return ERR_PTR(ret); } + +enum ovpn_cipher_alg ovpn_aead_crypto_alg(struct ovpn_crypto_key_slot *ks) +{ + const char *alg_name; + + if (!ks->encrypt) + return OVPN_CIPHER_ALG_NONE; + + alg_name = crypto_tfm_alg_name(crypto_aead_tfm(ks->encrypt)); + + if (!strcmp(alg_name, ALG_NAME_AES)) + return OVPN_CIPHER_ALG_AES_GCM; + else if (!strcmp(alg_name, ALG_NAME_CHACHAPOLY)) + return OVPN_CIPHER_ALG_CHACHA20_POLY1305; + else + return OVPN_CIPHER_ALG_NONE; +} diff --git a/drivers/net/ovpn/crypto_aead.h b/drivers/net/ovpn/crypto_aead.h index 77ee8141599bc06b0dc664c5b0a4dae660a89238..fb65be82436edd7ff89b171f7a89c9103b617d1f 100644 --- a/drivers/net/ovpn/crypto_aead.h +++ b/drivers/net/ovpn/crypto_aead.h @@ -28,4 +28,6 @@ struct ovpn_crypto_key_slot * ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc); void ovpn_aead_crypto_key_slot_destroy(struct ovpn_crypto_key_slot *ks); +enum ovpn_cipher_alg ovpn_aead_crypto_alg(struct ovpn_crypto_key_slot *ks); + #endif /* _NET_OVPN_OVPNAEAD_H_ */ diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c index d504445325ef82db04f87367c858adaf025f6297..fe9377b9b8145784917460cd5f222bc7fae4d8db 100644 --- a/drivers/net/ovpn/netlink.c +++ b/drivers/net/ovpn/netlink.c @@ -18,6 +18,7 @@ #include 
"netlink.h" #include "netlink-gen.h" #include "bind.h" +#include "crypto.h" #include "packet.h" #include "peer.h" #include "socket.h" @@ -679,24 +680,323 @@ int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) return ret; } +static int ovpn_nl_get_key_dir(struct genl_info *info, struct nlattr *key, + enum ovpn_cipher_alg cipher, + struct ovpn_key_direction *dir) +{ + struct nlattr *attrs[OVPN_A_KEYDIR_
[PATCH net-next v10 18/23] ovpn: implement peer add/get/dump/delete via netlink
This change introduces the netlink command needed to add, delete and retrieve/dump known peers. Userspace is expected to use these commands to handle known peer lifecycles. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/netlink.c | 578 - drivers/net/ovpn/peer.c| 48 ++-- drivers/net/ovpn/peer.h| 5 + 3 files changed, 609 insertions(+), 22 deletions(-) diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c index 2cc34eb1d1d870c6705714cb971c3c5dfb04afda..d504445325ef82db04f87367c858adaf025f6297 100644 --- a/drivers/net/ovpn/netlink.c +++ b/drivers/net/ovpn/netlink.c @@ -7,6 +7,7 @@ */ #include +#include #include #include @@ -16,6 +17,10 @@ #include "io.h" #include "netlink.h" #include "netlink-gen.h" +#include "bind.h" +#include "packet.h" +#include "peer.h" +#include "socket.h" MODULE_ALIAS_GENL_FAMILY(OVPN_FAMILY_NAME); @@ -86,29 +91,592 @@ void ovpn_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, netdev_put(ovpn->dev, &ovpn->dev_tracker); } +static int ovpn_nl_attr_sockaddr_remote(struct nlattr **attrs, + struct sockaddr_storage *ss) +{ + struct sockaddr_in6 *sin6; + struct sockaddr_in *sin; + struct in6_addr *in6; + __be16 port = 0; + __be32 *in; + int af; + + ss->ss_family = AF_UNSPEC; + + if (attrs[OVPN_A_PEER_REMOTE_PORT]) + port = nla_get_be16(attrs[OVPN_A_PEER_REMOTE_PORT]); + + if (attrs[OVPN_A_PEER_REMOTE_IPV4]) { + af = AF_INET; + ss->ss_family = AF_INET; + in = nla_data(attrs[OVPN_A_PEER_REMOTE_IPV4]); + } else if (attrs[OVPN_A_PEER_REMOTE_IPV6]) { + af = AF_INET6; + ss->ss_family = AF_INET6; + in6 = nla_data(attrs[OVPN_A_PEER_REMOTE_IPV6]); + } else { + return AF_UNSPEC; + } + + switch (ss->ss_family) { + case AF_INET6: + /* If this is a regular IPv6 just break and move on, +* otherwise switch to AF_INET and extract the IPv4 accordingly +*/ + if (!ipv6_addr_v4mapped(in6)) { + sin6 = (struct sockaddr_in6 *)ss; + sin6->sin6_port = port; + memcpy(&sin6->sin6_addr, in6, sizeof(*in6)); + break; + } + + /* v4-mapped-v6 
address */ + ss->ss_family = AF_INET; + in = &in6->s6_addr32[3]; + fallthrough; + case AF_INET: + sin = (struct sockaddr_in *)ss; + sin->sin_port = port; + sin->sin_addr.s_addr = *in; + break; + } + + /* don't return ss->ss_family as it may have changed in case of +* v4-mapped-v6 address +*/ + return af; +} + +static u8 *ovpn_nl_attr_local_ip(struct nlattr **attrs) +{ + u8 *addr6; + + if (!attrs[OVPN_A_PEER_LOCAL_IPV4] && !attrs[OVPN_A_PEER_LOCAL_IPV6]) + return NULL; + + if (attrs[OVPN_A_PEER_LOCAL_IPV4]) + return nla_data(attrs[OVPN_A_PEER_LOCAL_IPV4]); + + addr6 = nla_data(attrs[OVPN_A_PEER_LOCAL_IPV6]); + /* this is an IPv4-mapped IPv6 address, therefore extract the actual +* v4 address from the last 4 bytes +*/ + if (ipv6_addr_v4mapped((struct in6_addr *)addr6)) + return addr6 + 12; + + return addr6; +} + +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn, +struct genl_info *info, +struct nlattr **attrs) +{ + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, + OVPN_A_PEER_ID)) + return -EINVAL; + + if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify both remote IPv4 or IPv6 address"); + return -EINVAL; + } + + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && + !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify remote port without IP address"); + return -EINVAL; + } + + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && + attrs[OVPN_A_PEER_LOCAL_IPV4]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify local IPv4 address without remote"); + return -EINVAL; + } + + if (!attrs[OVPN_A_PEER_REMOTE_IPV6] && + attrs[OVPN_A_PEER_LOCAL_IPV6]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify local IPV6 address without remote"); + retu
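`ovpn_nl_attr_sockaddr_remote()` and `ovpn_nl_attr_local_ip()` both treat a v4-mapped IPv6 address (`::ffff:a.b.c.d`) as IPv4 by taking its last 4 bytes. Below is the same conversion in userspace C, using the POSIX `IN6_IS_ADDR_V4MAPPED()` macro in place of the kernel's `ipv6_addr_v4mapped()`. `remote_from_ipv6()` is a hypothetical helper. Unlike the kernel function, it does not report the original address family.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill `ss` from a remote IPv6 address. If the address is v4-mapped
 * (::ffff:a.b.c.d), store it as a plain AF_INET address built from the
 * last 4 bytes instead. `port_be` is already in network byte order. */
static void remote_from_ipv6(struct sockaddr_storage *ss,
			     const struct in6_addr *in6, in_port_t port_be)
{
	memset(ss, 0, sizeof(*ss));

	if (IN6_IS_ADDR_V4MAPPED(in6)) {
		struct sockaddr_in *sin = (struct sockaddr_in *)ss;

		sin->sin_family = AF_INET;
		sin->sin_port = port_be;
		memcpy(&sin->sin_addr.s_addr, &in6->s6_addr[12], 4);
		return;
	}

	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = port_be;
	sin6->sin6_addr = *in6;
}
```

Normalizing mapped addresses up front means later lookups and socket binds only have to handle a genuine AF_INET or AF_INET6 address.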
[PATCH net-next v10 21/23] ovpn: notify userspace when a peer is deleted
Whenever a peer is deleted, send a notification to userspace so that it can react accordingly. This is most important when a peer is deleted due to ping timeout, because it all happens in kernelspace and thus userspace has no direct way to learn about it. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/netlink.c | 55 ++ drivers/net/ovpn/netlink.h | 1 + drivers/net/ovpn/peer.c| 1 + 3 files changed, 57 insertions(+) diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c index 2b2ba1a810a0e87fb9ffb43b988fa52725a9589b..4d7d835cb47fd1f03d7cdafa2eda9f03065b8024 100644 --- a/drivers/net/ovpn/netlink.c +++ b/drivers/net/ovpn/netlink.c @@ -999,6 +999,61 @@ int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info) return 0; } +/** + * ovpn_nl_peer_del_notify - notify userspace about peer being deleted + * @peer: the peer being deleted + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_nl_peer_del_notify(struct ovpn_peer *peer) +{ + struct sk_buff *msg; + struct nlattr *attr; + int ret = -EMSGSIZE; + void *hdr; + + netdev_info(peer->ovpn->dev, "deleting peer with id %u, reason %d\n", + peer->id, peer->delete_reason); + + msg = nlmsg_new(100, GFP_ATOMIC); + if (!msg) + return -ENOMEM; + + hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0, OVPN_CMD_PEER_DEL_NTF); + if (!hdr) { + ret = -ENOBUFS; + goto err_free_msg; + } + + if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex)) + goto err_cancel_msg; + + attr = nla_nest_start(msg, OVPN_A_PEER); + if (!attr) + goto err_cancel_msg; + + if (nla_put_u8(msg, OVPN_A_PEER_DEL_REASON, peer->delete_reason)) + goto err_cancel_msg; + + if (nla_put_u32(msg, OVPN_A_PEER_ID, peer->id)) + goto err_cancel_msg; + + nla_nest_end(msg, attr); + + genlmsg_end(msg, hdr); + + genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg, + 0, OVPN_NLGRP_PEERS, GFP_ATOMIC); + + return 0; + +err_cancel_msg: + genlmsg_cancel(msg, hdr); +err_free_msg: + nlmsg_free(msg); + return 
ret; +} + /** * ovpn_nl_key_swap_notify - notify userspace peer's key must be renewed * @peer: the peer whose key needs to be renewed diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h index 33390b13c8904d40b629662005a9eb92ff617c3b..4ab3abcf23dba11f6b92e3d69e700693adbc671b 100644 --- a/drivers/net/ovpn/netlink.h +++ b/drivers/net/ovpn/netlink.h @@ -12,6 +12,7 @@ int ovpn_nl_register(void); void ovpn_nl_unregister(void); +int ovpn_nl_peer_del_notify(struct ovpn_peer *peer); int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id); #endif /* _NET_OVPN_NETLINK_H_ */ diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 8cfe1997ec116ae4fe74cd7105d228569e2a66a9..91c608f1ffa1d9dd1535ba308b6adc933dbbf1f1 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -242,6 +242,7 @@ void ovpn_peer_release_kref(struct kref *kref) { struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount); + ovpn_nl_peer_del_notify(peer); ovpn_peer_release(peer); } -- 2.45.2
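`ovpn_nl_peer_del_notify()` builds a small message: an `OVPN_A_IFINDEX` u32, followed by an `OVPN_A_PEER` nest that holds the delete reason (u8) and the peer ID (u32). Below is a userspace sketch of the attribute layout that those `nla_put_*()`/`nla_nest_*()` calls produce. It uses only the uapi `struct nlattr`, `NLA_HDRLEN`, `NLA_ALIGN()` and `NLA_F_NESTED` from `<linux/netlink.h>`. The attribute numbers and helper names are hypothetical.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute numbers; the real OVPN_A_* values come from the
 * ovpn netlink spec and are not shown in the quoted patch. */
enum { A_IFINDEX = 1, A_PEER = 2 };
enum { A_PEER_ID = 1, A_PEER_DEL_REASON = 2 };

struct msgbuf {
	unsigned char data[128];
	size_t len;
};

/* Append one attribute (header + payload + zero padding to 4 bytes),
 * the layout nla_put() produces. Returns the attribute's offset. */
static size_t put_attr(struct msgbuf *b, uint16_t type, const void *payload,
		       uint16_t plen)
{
	size_t off = b->len;
	struct nlattr nla = { .nla_len = NLA_HDRLEN + plen, .nla_type = type };
	size_t total = NLA_ALIGN(nla.nla_len);

	assert(off + total <= sizeof(b->data));
	memset(b->data + off, 0, total);
	memcpy(b->data + off, &nla, sizeof(nla));
	if (plen)
		memcpy(b->data + off + NLA_HDRLEN, payload, plen);
	b->len += total;
	return off;
}

/* Like nla_nest_start(): open a container with a placeholder length. */
static size_t nest_start(struct msgbuf *b, uint16_t type)
{
	return put_attr(b, type | NLA_F_NESTED, NULL, 0);
}

/* Like nla_nest_end(): patch the container length to cover its children. */
static void nest_end(struct msgbuf *b, size_t off)
{
	struct nlattr nla;

	memcpy(&nla, b->data + off, sizeof(nla));
	nla.nla_len = (uint16_t)(b->len - off);
	memcpy(b->data + off, &nla, sizeof(nla));
}

/* Same attribute sequence as the quoted notification. */
static size_t build_peer_del(struct msgbuf *b, uint32_t ifindex,
			     uint32_t peer_id, uint8_t reason)
{
	size_t nest;

	b->len = 0;
	put_attr(b, A_IFINDEX, &ifindex, sizeof(ifindex));
	nest = nest_start(b, A_PEER);
	put_attr(b, A_PEER_DEL_REASON, &reason, sizeof(reason));
	put_attr(b, A_PEER_ID, &peer_id, sizeof(peer_id));
	nest_end(b, nest);
	return b->len;
}
```

With this layout the attributes take 28 bytes. Assuming the family defines no extra user header, the payload (a 4-byte `struct genlmsghdr` plus the attributes) is about 32 bytes, which fits comfortably within the 100 bytes passed to `nlmsg_new()`.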
[PATCH net-next v10 20/23] ovpn: kill key and notify userspace in case of IV exhaustion
IV wrap-around is cryptographically dangerous for a number of ciphers; therefore, kill the key and inform userspace (via netlink) should the IV space be exhausted.
Userspace has two ways of deciding when the key has to be renewed before exhausting the IV space:
1) time-based approach: after X seconds/minutes userspace generates a new key and sends it to the kernel. This is based on a guesstimate, and the default timer value normally works well.
2) packet-count-based approach: after X packets/bytes userspace generates a new key and sends it to the kernel. Userspace keeps track of the amount of traffic by periodically polling GET_PEER and fetching the VPN/LINK stats.
Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/crypto.c | 19 drivers/net/ovpn/crypto.h | 2 ++ drivers/net/ovpn/io.c | 13 +++ drivers/net/ovpn/netlink.c | 55 ++ drivers/net/ovpn/netlink.h | 2 ++ 5 files changed, 91 insertions(+) diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c index cfb014c947b968752ba3dab84ec42dc8ec086379..a2346bc630be9b60604282d20a33321c277bc56f 100644 --- a/drivers/net/ovpn/crypto.c +++ b/drivers/net/ovpn/crypto.c @@ -55,6 +55,25 @@ void ovpn_crypto_state_release(struct ovpn_crypto_state *cs) } } +/* removes the key matching the specified id from the crypto context */ +void ovpn_crypto_kill_key(struct ovpn_crypto_state *cs, u8 key_id) +{ + struct ovpn_crypto_key_slot *ks = NULL; + + spin_lock_bh(&cs->lock); + if (rcu_access_pointer(cs->slots[0])->key_id == key_id) { + ks = rcu_replace_pointer(cs->slots[0], NULL, +lockdep_is_held(&cs->lock)); + } else if (rcu_access_pointer(cs->slots[1])->key_id == key_id) { + ks = rcu_replace_pointer(cs->slots[1], NULL, +lockdep_is_held(&cs->lock)); + } + spin_unlock_bh(&cs->lock); + + if (ks) + ovpn_crypto_key_slot_put(ks); +} + /* Reset the ovpn_crypto_state object in a way that is atomic * to RCU readers.
*/ diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h index 96fd41f4b81b74f8a3ecfe33ee24ba0122d222fe..b7a7be752d54f1f8bcd548e0a714511efcaf68a8 100644 --- a/drivers/net/ovpn/crypto.h +++ b/drivers/net/ovpn/crypto.h @@ -140,4 +140,6 @@ int ovpn_crypto_config_get(struct ovpn_crypto_state *cs, enum ovpn_key_slot slot, struct ovpn_key_config *keyconf); +void ovpn_crypto_kill_key(struct ovpn_crypto_state *cs, u8 key_id); + #endif /* _NET_OVPN_OVPNCRYPTO_H_ */ diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261..c04791a508e5c0ae292b7b5d8098096c676b2f99 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -248,6 +248,19 @@ void ovpn_encrypt_post(void *data, int ret) if (likely(ovpn_skb_cb(skb)->req)) aead_request_free(ovpn_skb_cb(skb)->req); + if (unlikely(ret == -ERANGE)) { + /* we ran out of IVs and we must kill the key as it can't be +* use anymore +*/ + netdev_warn(peer->ovpn->dev, + "killing key %u for peer %u\n", ks->key_id, + peer->id); + ovpn_crypto_kill_key(&peer->crypto, ks->key_id); + /* let userspace know so that a new key must be negotiated */ + ovpn_nl_key_swap_notify(peer, ks->key_id); + goto err; + } + if (unlikely(ret < 0)) goto err; diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c index fe9377b9b8145784917460cd5f222bc7fae4d8db..2b2ba1a810a0e87fb9ffb43b988fa52725a9589b 100644 --- a/drivers/net/ovpn/netlink.c +++ b/drivers/net/ovpn/netlink.c @@ -999,6 +999,61 @@ int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info) return 0; } +/** + * ovpn_nl_key_swap_notify - notify userspace peer's key must be renewed + * @peer: the peer whose key needs to be renewed + * @key_id: the ID of the key that needs to be renewed + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id) +{ + struct nlattr *k_attr; + struct sk_buff *msg; + int ret = -EMSGSIZE; + void *hdr; + + 
netdev_info(peer->ovpn->dev, "peer with id %u must rekey - primary key unusable.\n", + peer->id); + + msg = nlmsg_new(100, GFP_ATOMIC); + if (!msg) + return -ENOMEM; + + hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0, OVPN_CMD_KEY_SWAP_NTF); + if (!hdr) { + ret = -ENOBUFS; + goto err_free_msg; + } + + if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex)) + goto err_cancel_msg; + + k_att
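The commit message says the key is killed once the IV space is exhausted, and io.c reacts to `-ERANGE` coming back from the encryption path. The patch does not show how the IV counter itself is tracked, so the following is only a minimal sketch under the assumption of a 32-bit per-key packet ID that wraps back to 0. `demo_key_slot`, `demo_next_pid()` and `demo_should_rekey()` are hypothetical names, not ovpn code, and the 7/8 rekey margin is an arbitrary example value.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-key state: a 32-bit packet ID used to build the IV.
 * The real ovpn layout is not shown in this patch; this is an assumption. */
struct demo_key_slot {
	uint8_t key_id;
	uint32_t next_pid;   /* next packet ID to hand out (starts at 1) */
	int exhausted;       /* set once the ID space has wrapped */
};

/* Hand out the next packet ID, or -ERANGE once the space is used up.
 * A wrap of the counter back to 0 is treated as exhaustion. */
static int demo_next_pid(struct demo_key_slot *ks, uint32_t *pid)
{
	if (ks->exhausted)
		return -ERANGE;
	*pid = ks->next_pid++;
	if (ks->next_pid == 0)  /* the counter just wrapped */
		ks->exhausted = 1;
	return 0;
}

/* Userspace side (the packet-count-based approach): rekey well before the
 * kernel would ever return -ERANGE. The 7/8 margin is an example value. */
static int demo_should_rekey(uint64_t pkts_sent)
{
	const uint64_t limit = (uint64_t)UINT32_MAX / 8 * 7;

	return pkts_sent >= limit;
}
```

The margin matters because the kernel-side kill is a last resort: once the key is gone, traffic on it is dropped until userspace installs a new one.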
[PATCH net-next v10 22/23] ovpn: add basic ethtool support
Implement support for basic ethtool functionality. Note that ovpn is a virtual device driver; therefore, various ethtool APIs are not meaningful and are thus not implemented. Signed-off-by: Antonio Quartulli Reviewed-by: Andrew Lunn --- drivers/net/ovpn/main.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 1bd563e3f16f49dd01c897fbe79cbd90f4b8e9aa..9dcf51ae1497dda17d418b762011b04bfd0521df 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -7,6 +7,7 @@ * James Yonan */ +#include #include #include #include @@ -96,6 +97,19 @@ bool ovpn_dev_is_valid(const struct net_device *dev) return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; } +static void ovpn_get_drvinfo(struct net_device *dev, +struct ethtool_drvinfo *info) +{ + strscpy(info->driver, OVPN_FAMILY_NAME, sizeof(info->driver)); + strscpy(info->bus_info, "ovpn", sizeof(info->bus_info)); +} + +static const struct ethtool_ops ovpn_ethtool_ops = { + .get_drvinfo= ovpn_get_drvinfo, + .get_link = ethtool_op_get_link, + .get_ts_info= ethtool_op_get_ts_info, +}; + static void ovpn_setup(struct net_device *dev) { /* compute the overhead considering AEAD encryption */ @@ -111,6 +125,7 @@ static void ovpn_setup(struct net_device *dev) dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS; + dev->ethtool_ops = &ovpn_ethtool_ops; dev->netdev_ops = &ovpn_netdev_ops; dev->priv_destructor = ovpn_struct_free; -- 2.45.2
[PATCH net-next v10 23/23] testing/selftests: add test tool and scripts for ovpn module
The ovpn-cli tool can be compiled and used as a selftest for the ovpn kernel module. It implements the netlink API and can thus be integrated into any script for more automated testing. Along with the tool, 4 scripts are added that perform basic functionality tests by means of network namespaces. Cc: sh...@kernel.org Cc: linux-kselft...@vger.kernel.org Signed-off-by: Antonio Quartulli --- MAINTAINERS|1 + tools/testing/selftests/Makefile |1 + tools/testing/selftests/net/ovpn/.gitignore|2 + tools/testing/selftests/net/ovpn/Makefile | 17 + tools/testing/selftests/net/ovpn/config| 10 + tools/testing/selftests/net/ovpn/data64.key|5 + tools/testing/selftests/net/ovpn/ovpn-cli.c| 2370 tools/testing/selftests/net/ovpn/tcp_peers.txt |5 + .../testing/selftests/net/ovpn/test-chachapoly.sh |9 + tools/testing/selftests/net/ovpn/test-float.sh |9 + tools/testing/selftests/net/ovpn/test-tcp.sh |9 + tools/testing/selftests/net/ovpn/test.sh | 183 ++ tools/testing/selftests/net/ovpn/udp_peers.txt |5 + 13 files changed, 2626 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index cf3d55c3e98aaea8f8817faed99dd7499cd59a71..110485aec73ae5bfeef4f228490ed76e28e01870 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17295,6 +17295,7 @@ T: git https://github.com/OpenVPN/linux-kernel-ovpn.git F: Documentation/netlink/specs/ovpn.yaml F: drivers/net/ovpn/ F: include/uapi/linux/ovpn.h +F: tools/testing/selftests/net/ovpn/ OPENVSWITCH M: Pravin B Shelar diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 363d031a16f7e14152c904e6b68dab1f90c98392..be42906ecb11d4b0f9866d2c04b0e8fb27a2b995 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -68,6 +68,7 @@ TARGETS += net/hsr TARGETS += net/mptcp TARGETS += net/netfilter TARGETS += net/openvswitch +TARGETS += net/ovpn TARGETS += net/packetdrill TARGETS += net/rds TARGETS += net/tcp_ao diff --git a/tools/testing/selftests/net/ovpn/.gitignore b/tools/testing/selftests/net/ovpn/.gitignore
new file mode 100644 index ..ee44c081ca7c089933659689303c303a9fa9713b --- /dev/null +++ b/tools/testing/selftests/net/ovpn/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0+ +ovpn-cli diff --git a/tools/testing/selftests/net/ovpn/Makefile b/tools/testing/selftests/net/ovpn/Makefile new file mode 100644 index ..c76d8fd953c5674941c8c2787813063b1bce180f --- /dev/null +++ b/tools/testing/selftests/net/ovpn/Makefile @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2020-2024 OpenVPN, Inc. +# +CFLAGS = -pedantic -Wextra -Wall -Wl,--no-as-needed -g -O0 -ggdb $(KHDR_INCLUDES) +CFLAGS += $(shell pkg-config --cflags libnl-3.0 libnl-genl-3.0) + +LDFLAGS = -lmbedtls -lmbedcrypto +LDFLAGS += $(shell pkg-config --libs libnl-3.0 libnl-genl-3.0) + +TEST_PROGS = test.sh \ + test-chachapoly.sh \ + test-tcp.sh \ + test-float.sh + +TEST_GEN_FILES = ovpn-cli + +include ../../lib.mk diff --git a/tools/testing/selftests/net/ovpn/config b/tools/testing/selftests/net/ovpn/config new file mode 100644 index ..71946ba9fa175c191725e369eb9b973503d9d9c4 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/config @@ -0,0 +1,10 @@ +CONFIG_NET=y +CONFIG_INET=y +CONFIG_STREAM_PARSER=y +CONFIG_NET_UDP_TUNNEL=y +CONFIG_DST_CACHE=y +CONFIG_CRYPTO=y +CONFIG_CRYPTO_AES=y +CONFIG_CRYPTO_GCM=y +CONFIG_CRYPTO_CHACHA20POLY1305=y +CONFIG_OVPN=m diff --git a/tools/testing/selftests/net/ovpn/data64.key b/tools/testing/selftests/net/ovpn/data64.key new file mode 100644 index ..a99e88c4e290f58b12f399b857b873f308d9ba09 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/data64.key @@ -0,0 +1,5 @@ +jRqMACN7d7/aFQNT8S7jkrBD8uwrgHbG5OQZP2eu4R1Y7tfpS2bf5RHv06Vi163CGoaIiTX99R3B +ia9ycAH8Wz1+9PWv51dnBLur9jbShlgZ2QHLtUc4a/gfT7zZwULXuuxdLnvR21DDeMBaTbkgbai9 +uvAa7ne1liIgGFzbv+Bas4HDVrygxIxuAnP5Qgc3648IJkZ0QEXPF+O9f0n5+QIvGCxkAUVx+5K6 +KIs+SoeWXnAopELmoGSjUpFtJbagXK82HfdqpuUxT2Tnuef0/14SzVE/vNleBNu2ZbyrSAaah8tE +BofkPJUBFY+YQcfZNM5Dgrw3i+Bpmpq/gpdg5w== diff --git 
a/tools/testing/selftests/net/ovpn/ovpn-cli.c b/tools/testing/selftests/net/ovpn/ovpn-cli.c new file mode 100644 index ..046dd069aaaf4e5b091947bd57ed79f8519a780f --- /dev/null +++ b/tools/testing/selftests/net/ovpn/ovpn-cli.c @@ -0,0 +1,2370 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel accelerator + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author:Antonio Quartulli + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#inc
RE: [PATCH v4 11/11] iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
> From: Nicolin Chen > Sent: Tuesday, October 22, 2024 8:20 AM > > Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type. > Implement > an arm_vsmmu_alloc() with its viommu op > arm_vsmmu_domain_alloc_nested(), > to replace arm_smmu_domain_alloc_nesting(). As an initial step, copy the > VMID from s2_parent. A later cleanup series is required to move the VMID > allocation out of the stage-2 domain allocation routine to this. > > After that, replace nested_domain->s2_parent with nested_domain->vsmmu. > > Note that the validatting conditions for a nested_domain allocation are > moved from arm_vsmmu_domain_alloc_nested to arm_vsmmu_alloc, since > there > is no point in creating a vIOMMU (vsmmu) from the beginning if it would > not support a nested_domain. > > Signed-off-by: Nicolin Chen Hmm, I wonder whether this series should be merged together with Jason's nesting series and directly use vIOMMU to create nesting. Otherwise it looks a bit weird for one series to first enable a uAPI that is immediately replaced by another uAPI from the following series. Even if both are merged in one cycle, it doesn't look logically clean in the git history.
[PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
It is useful to be able to utilise the pidfd mechanism to reference the current thread or process (from a userland point of view; this is the thread group leader from the kernel's point of view). Therefore, introduce PIDFD_SELF_THREAD to refer to the current thread, and PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader. For convenience, and to avoid confusion from userland's perspective, we alias these: * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - this is nearly always what the user will want to use, as they would find it surprising if, for instance, fds were unshare()'d and a subsequent pidfd_getfd() call failed. * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - most users have no concept of thread groups or what a thread group leader is, and from userland's perspective and nomenclature this is what userland considers to be a process. Due to the refactoring of the central __pidfd_get_pid() function, we can implement this functionality centrally, making the sentinel available to most functionality which utilises pidfds. We need to explicitly adjust kernel_waitid_prepare() to permit this (though it wouldn't really make sense to use it there, we provide the ability for consistency). We explicitly disallow use of this in setns(), which would otherwise have required custom handling, as it doesn't make sense for the calling thread to join its own namespaces. As the callers of pidfd_get_pid() expect an increased reference count on the pid, we take one in the self case too, reducing churn and avoiding any breakage in existing logic that drops this reference. This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFD, ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and pidfd_getfd() system calls. Operations such as polling a pidfd and general fd operations are not supported; this strictly provides the sentinel for APIs which explicitly accept a pidfd.
Reviewed-by: Shakeel Butt Signed-off-by: Lorenzo Stoakes --- include/linux/pid.h| 8 -- include/uapi/linux/pidfd.h | 15 +++ kernel/exit.c | 3 ++- kernel/nsproxy.c | 1 + kernel/pid.c | 51 -- 5 files changed, 57 insertions(+), 21 deletions(-) diff --git a/include/linux/pid.h b/include/linux/pid.h index d466890e1b35..3b2ac7567a88 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -78,11 +78,15 @@ struct file; * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd. * * @pidfd: The pidfd whose pid we want, or the fd of a /proc/ file if - * @alloc_proc is also set. + * @alloc_proc is also set, or PIDFD_SELF_* to refer to the current + * thread or thread group leader. * @allow_proc: If set, then an fd of a /proc/ file can be passed instead * of a pidfd, and this will be used to determine the pid. + * @flags: Output variable, if non-NULL, then the file->f_flags of the - * pidfd will be set here. + * pidfd will be set here or If PIDFD_SELF_THREAD is set, this is + * set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP then + * this is set to zero. * * Returns: If successful, the pid associated with the pidfd, otherwise an * error. diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h index 565fc0629fff..0ca2ebf906fd 100644 --- a/include/uapi/linux/pidfd.h +++ b/include/uapi/linux/pidfd.h @@ -29,4 +29,19 @@ #define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9) #define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10) +/* + * Special sentinel values which can be used to refer to the current thread or + * thread group leader (which from a userland perspective is the process). + */ +#define PIDFD_SELF PIDFD_SELF_THREAD +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP + +#define PIDFD_SELF_THREAD -100 /* Current thread. */ +#define PIDFD_SELF_THREAD_GROUP-200 /* Current thread group leader. 
*/ + +static inline int pidfd_is_self_sentinel(pid_t pid) +{ + return pid == PIDFD_SELF_THREAD || pid == PIDFD_SELF_THREAD_GROUP; +} + #endif /* _UAPI_LINUX_PIDFD_H */ diff --git a/kernel/exit.c b/kernel/exit.c index 619f0014c33b..3eb20f8252ee 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -71,6 +71,7 @@ #include #include +#include #include #include @@ -1739,7 +1740,7 @@ int kernel_waitid_prepare(struct wait_opts *wo, int which, pid_t upid, break; case P_PIDFD: type = PIDTYPE_PID; - if (upid < 0) + if (upid < 0 && !pidfd_is_self_sentinel(upid)) return -EINVAL; pid = pidfd_get_pid(upid, &f_flags); diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c index dc952c3b05af..d239f7eeaa1f 100644
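A minimal userspace sketch of the sentinel in use. It assumes the values proposed in this patch (-100 and -200) as fallbacks when the installed UAPI header does not define them yet. `is_self_sentinel()` mirrors the patch's `pidfd_is_self_sentinel()`. `probe_self_signal()` is a hypothetical helper, and the fallback syscall number 424 is an assumption that holds on most, but not all, architectures.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Fallbacks mirroring the values proposed in include/uapi/linux/pidfd.h. */
#ifndef PIDFD_SELF_THREAD
#define PIDFD_SELF_THREAD (-100)
#endif
#ifndef PIDFD_SELF_THREAD_GROUP
#define PIDFD_SELF_THREAD_GROUP (-200)
#endif
#ifndef PIDFD_SELF
#define PIDFD_SELF PIDFD_SELF_THREAD
#endif
#ifndef PIDFD_SELF_PROCESS
#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP
#endif

/* Assumption: 424 is __NR_pidfd_send_signal on most architectures. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif

/* Same check as the patch's pidfd_is_self_sentinel(). */
static inline int is_self_sentinel(int pidfd)
{
	return pidfd == PIDFD_SELF_THREAD || pidfd == PIDFD_SELF_THREAD_GROUP;
}

/* Signal 0 performs only the existence/permission checks, so this probes
 * whether the running kernel accepts the sentinel without side effects. */
static int probe_self_signal(int sentinel)
{
	if (!is_self_sentinel(sentinel))
		return -EINVAL;
	if (syscall(__NR_pidfd_send_signal, sentinel, 0, NULL, 0) < 0)
		return -errno;
	return 0;
}
```

On a kernel without this series, `probe_self_signal(PIDFD_SELF)` is expected to fail with `-EBADF`, since -100 is not a valid fd there.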
[PATCH v5 0/5] introduce PIDFD_SELF* sentinels
If you wish to utilise a pidfd interface to refer to the current process or thread, it is rather cumbersome, requiring something like: int pidfd = pidfd_open(getpid(), 0 /* or PIDFD_THREAD */); ... close(pidfd); Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process or thread. This series introduces sentinels for this purpose, which can be passed as the pidfd in this instance rather than having to establish a dummy fd. It is useful to refer both to the current thread from userland's perspective, for which we use PIDFD_SELF, and to the current process from userland's perspective, for which we use PIDFD_SELF_PROCESS. There is unfortunately some confusion between the kernel and userland as to what constitutes a process - a thread from the userland perspective is a process from the kernel perspective, and a userland process is a thread group (more specifically, the thread group leader from the kernel perspective). We therefore alias things thusly: * PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID. * PIDFD_SELF_THREAD_GROUP aliased by PIDFD_SELF_PROCESS - use PIDTYPE_TGID. In all of the kernel code we refer to PIDFD_SELF_THREAD and PIDFD_SELF_THREAD_GROUP; however, we expect users to use PIDFD_SELF and PIDFD_SELF_PROCESS. This matters for cases where, for instance, a user unshare()s FDs or does thread-specific signal handling, and where the user would be hugely confused if the FDs referenced or the signal processed referred to the thread group leader rather than the individual thread. We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and assert as much in selftests. All other interfaces except setns() will work implicitly with this new interface; however, it doesn't make sense to test waitid(P_PIDFD, ...), as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it doesn't make sense to obtain the namespaces of our own process, and it would require work to implement this functionality there that would be of no use. We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd operations such as open() or poll(), as this would require extensive work and be of no real use. v5: * Fixup self test dependencies on pidfd/pidfd.h. v4: * Avoid returning an fd in the __pidfd_get_pid() function as pointed out by Christian, instead simply always pin the pid and maintain fd scope in the helper alone. * Add wrapper header file in tools/include/linux to allow for import of UAPI pidfd.h header without encountering the collision between system fcntl.h and linux/fcntl.h as discussed with Shuah and John. * Fixup tests to import the UAPI pidfd.h header working around conflicts between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports, as reported by Shuah. * Use an int for pidfd_is_self_sentinel() to avoid any dependency on stdbool.h in userland. https://lore.kernel.org/linux-mm/cover.1729198898.git.lorenzo.stoa...@oracle.com/ v3: * Do not fput() an invalid fd as reported by kernel test bot. * Fix unintended churn from moving variable declaration. https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoa...@oracle.com/ v2: * Fix tests as reported by Shuah. * Correct RFC version lore link. https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoa...@oracle.com/ Non-RFC v1: * Removed RFC tag - there seems to be general consensus that this change is a good idea, but perhaps some debate to be had on implementation. It seems sensible then to move forward with the RFC flag removed. * Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases PIDFD_SELF and PIDFD_SELF_PROCESS respectively. * Updated testing accordingly. 
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoa...@oracle.com/ RFC version: https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoa...@oracle.com/ Lorenzo Stoakes (5): pidfd: extend pidfd_get_pid() and de-duplicate pid lookup pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process tools: testing: separate out wait_for_pid() into helper header selftests: pidfd: add pidfd.h UAPI wrapper selftests: pidfd: add tests for PIDFD_SELF_* include/linux/pid.h | 34 - include/uapi/linux/pidfd.h| 15 ++ kernel/exit.c | 3 +- kernel/nsproxy.c | 1 + kernel/pid.c | 65 +--- kernel/signal.c | 29 +--- tools/include/linux/pidfd.h | 14 ++ tools/testing/selftests/cgroup/test_kill.c| 2 +- .../pid_namespace/regression_enomem.c | 2 +- tools/testing/selftests/pidfd/Makefile| 3 +- tools/testing/selftests/pidfd/pidfd.h | 28 +--- ...
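To make the thread/process aliasing concrete, the following is a userspace model only (not kernel code) of which ID each sentinel names. `sentinel_target()` is a hypothetical helper; the fallback sentinel values are the ones proposed in this series.

```c
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

#ifndef PIDFD_SELF_THREAD
#define PIDFD_SELF_THREAD (-100)
#endif
#ifndef PIDFD_SELF_THREAD_GROUP
#define PIDFD_SELF_THREAD_GROUP (-200)
#endif

/* Model of which ID each sentinel refers to:
 *   PIDFD_SELF_THREAD       -> calling thread  (kernel: PIDTYPE_PID)
 *   PIDFD_SELF_THREAD_GROUP -> whole process   (kernel: PIDTYPE_TGID)
 * Returns -1 for anything that is not a sentinel. */
static pid_t sentinel_target(int sentinel)
{
	switch (sentinel) {
	case PIDFD_SELF_THREAD:
		return (pid_t)syscall(SYS_gettid);
	case PIDFD_SELF_THREAD_GROUP:
		return getpid();
	default:
		return -1;
	}
}
```

In a single-threaded program both sentinels resolve to the same ID. Called from a secondary thread, `PIDFD_SELF` (thread) and `PIDFD_SELF_PROCESS` (thread group) differ, which is exactly the case where unshared FDs or thread-directed signals would otherwise be confusing.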
[PATCH net-next v10 08/23] ovpn: implement basic TX path (UDP)
Packets sent over the ovpn interface are processed and transmitted to the connected peer, if any. The implementation is UDP-only; TCP will be added by a later patch. Note: no crypto/encapsulation exists yet; packets are just captured and sent. Signed-off-by: Antonio Quartulli --- drivers/net/ovpn/io.c | 138 +++- drivers/net/ovpn/peer.c | 37 +++- drivers/net/ovpn/peer.h | 4 + drivers/net/ovpn/skb.h | 51 +++ drivers/net/ovpn/udp.c | 232 drivers/net/ovpn/udp.h | 8 ++ 6 files changed, 468 insertions(+), 2 deletions(-) diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index ad3813419c33cbdfe7e8ad6f5c8b444a3540a69f..77ba4d33ae0bd2f52e8bd1c06a182d24285297b4 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -9,14 +9,150 @@ #include #include +#include #include "io.h" +#include "ovpnstruct.h" +#include "peer.h" +#include "udp.h" +#include "skb.h" +#include "socket.h" + +static void ovpn_encrypt_post(struct sk_buff *skb, int ret) +{ + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; + + if (unlikely(ret < 0)) + goto err; + + skb_mark_not_on_list(skb); + + switch (peer->sock->sock->sk->sk_protocol) { + case IPPROTO_UDP: + ovpn_udp_send_skb(peer->ovpn, peer, skb); + break; + default: + /* no transport configured yet */ + goto err; + } + /* skb passed down the stack - don't free it */ + skb = NULL; +err: + if (unlikely(skb)) + dev_core_stats_tx_dropped_inc(peer->ovpn->dev); + ovpn_peer_put(peer); + kfree_skb(skb); +} + +static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb) +{ + ovpn_skb_cb(skb)->peer = peer; + + /* take a reference to the peer because the crypto code may run async.
+* ovpn_encrypt_post() will release it upon completion +*/ + if (unlikely(!ovpn_peer_hold(peer))) { + DEBUG_NET_WARN_ON_ONCE(1); + return false; + } + + ovpn_encrypt_post(skb, 0); + return true; +} + +/* send skb to connected peer, if any */ +static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb, + struct ovpn_peer *peer) +{ + struct sk_buff *curr, *next; + + if (likely(!peer)) + /* retrieve peer serving the destination IP of this packet */ + peer = ovpn_peer_get_by_dst(ovpn, skb); + if (unlikely(!peer)) { + net_dbg_ratelimited("%s: no peer to send data to\n", + ovpn->dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + } + + /* this might be a GSO-segmented skb list: process each skb +* independently +*/ + skb_list_walk_safe(skb, curr, next) + if (unlikely(!ovpn_encrypt_one(peer, curr))) { + dev_core_stats_tx_dropped_inc(ovpn->dev); + kfree_skb(curr); + } + + /* skb passed over, no need to free */ + skb = NULL; +drop: + if (likely(peer)) + ovpn_peer_put(peer); + kfree_skb_list(skb); +} /* Send user data to the network */ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) { + struct ovpn_struct *ovpn = netdev_priv(dev); + struct sk_buff *segments, *curr, *next; + struct sk_buff_head skb_list; + __be16 proto; + int ret; + + /* reset netfilter state */ + nf_reset_ct(skb); + + /* verify IP header size in network packet */ + proto = ovpn_ip_check_protocol(skb); + if (unlikely(!proto || skb->protocol != proto)) { + net_err_ratelimited("%s: dropping malformed payload packet\n", + dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + } + + if (skb_is_gso(skb)) { + segments = skb_gso_segment(skb, 0); + if (IS_ERR(segments)) { + ret = PTR_ERR(segments); + net_err_ratelimited("%s: cannot segment packet: %d\n", + dev->name, ret); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + } + + consume_skb(skb); + skb = segments; + } + + /* from this moment on, "skb" might be a list */ + + 
__skb_queue_head_init(&skb_list); + skb_list_walk_safe(skb, curr, next) { + skb_mark_not_on_list(curr); + + curr = skb_share_check(curr, GFP_ATOMIC); + if (unlikely(!curr)) { + net_err_ratelimited("%s: skb_share_check failed\n", + dev->name); +
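`ovpn_send()` walks a possibly GSO-segmented skb list with `skb_list_walk_safe()`, which saves the next pointer before the loop body runs so that each segment can be handed off or freed independently. A userspace sketch of the same pattern, with a hypothetical `struct pkt` standing in for `struct sk_buff` and a hypothetical `send_one()` standing in for the per-segment encrypt step:

```c
#include <stdlib.h>

/* Hypothetical stand-in for an skb: a payload length and a next pointer. */
struct pkt {
	struct pkt *next;
	int len;
};

/* Same shape as skb_list_walk_safe(): "next_" is read before the body runs,
 * so the body may unlink or free "curr" without breaking the walk. */
#define pkt_list_walk_safe(first, curr, next_)                              \
	for ((curr) = (first), (next_) = (curr) ? (curr)->next : NULL; (curr); \
	     (curr) = (next_), (next_) = (curr) ? (curr)->next : NULL)

/* Hypothetical per-segment step: succeeds for non-empty packets. */
static int send_one(struct pkt *p)
{
	return p->len > 0;
}

/* Process each segment independently and return how many were dropped. */
static int send_list(struct pkt *head)
{
	struct pkt *curr, *next;
	int dropped = 0;

	pkt_list_walk_safe(head, curr, next) {
		curr->next = NULL;          /* like skb_mark_not_on_list() */
		if (!send_one(curr))
			dropped++;
		free(curr);                 /* safe: next was saved already */
	}
	return dropped;
}
```

Unlike the driver, which passes successfully processed segments down the stack instead of freeing them, the sketch frees every node because nothing downstream takes ownership.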
Re: [PATCH v4 00/14] iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE)
On Thu, Oct 24, 2024 at 11:14:21PM -0700, Nicolin Chen wrote: > On Fri, Oct 25, 2024 at 04:58:33PM +1100, Alexey Kardashevskiy wrote: > > > > > > Is there any real example of a .vdevice_alloc hook, besides the > > > > > > selftests? It is not in iommufd_viommu_p2-v4-with-rmr, hence the > > > > > > question. I am trying to sketch something with this new machinery > > > > > > and > > > > > > less guessing would be nice. Thanks, > > > > > > > > > > No, I am actually dropping that one, and moving the vdevice struct > > > > > to the private header, as there seems to be no use case: > > > > > > > > Why keep it then? > > > > > > We need that structure to store per-vIOMMU virtual ID. Hiding it > > > in the core only means we need to provide another vIOMMU APIs for > > > drivers to look up the ID, v.s. exposing it for drivers to access > > > directly. > > > > Sorry I lost you here. If we need it, then there should be an example of > > .vdevice_alloc() somewhere but you say they is not one. How do you test > > this, with just selftests? :) Thanks, > > A vDEVICE object will be core-allocated and core-managed, while the > vdevice_alloc is for driver-allocated purpose for which there is no > use case (at least with this series). You can check the vdev ioctl > in this version that has two pathways to allocate a vDEVICE object. > > A vdev_id is used to index viommu's xarray for a driver to convert > the id to a dev pointer via a vIOMMU API. Dropping .vdevice_alloc > just means the driver only lost its direct access. I think the point here is that this has to go in stages. At the present moment the iommu drivers don't need to hook the vdevice object, so Nicolin should take it out of this series. I would expect CC to need to be in this path, so we should bring it back in the CC series. For CC, I'm broadly expecting that creating the CC-type vIOMMU will call a CC implementation, and then creating a vdevice against the vIOMMU will also call the CC implementation.
The two callbacks would ask the secure world to create the relevant VM-visible objects. Jason
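One plausible shape for the staging described above, written as a hypothetical userspace model (none of these names are the iommufd API): the core allocates the vDEVICE by default, and an optional driver hook, absent in this series and perhaps brought back for CC, takes over when present.

```c
#include <stdlib.h>

/* Hypothetical model: a core-owned vDEVICE carrying a guest-visible ID. */
struct demo_vdevice {
	unsigned long virt_id;   /* e.g. a virtual StreamID */
	int driver_owned;        /* 1 if a driver hook created it */
};

struct demo_viommu_ops {
	/* NULL today: no driver needs to hook vDEVICE creation yet */
	struct demo_vdevice *(*vdevice_alloc)(unsigned long virt_id);
};

static struct demo_vdevice *demo_vdevice_create(const struct demo_viommu_ops *ops,
						unsigned long virt_id)
{
	struct demo_vdevice *vdev;

	if (ops && ops->vdevice_alloc)
		return ops->vdevice_alloc(virt_id);   /* driver-allocated path */

	vdev = calloc(1, sizeof(*vdev));          /* core-allocated path */
	if (vdev)
		vdev->virt_id = virt_id;
	return vdev;
}

/* Hypothetical CC-style hook that would talk to the secure world. */
static struct demo_vdevice *demo_cc_alloc(unsigned long virt_id)
{
	struct demo_vdevice *v = calloc(1, sizeof(*v));

	if (v) {
		v->virt_id = virt_id;
		v->driver_owned = 1;
	}
	return v;
}
```

Keeping the hook optional means dropping it from this series costs nothing at the call site; a later series can add an implementation without changing the core path.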
Re: [PATCH V4 10/15] selftests/resctrl: Make benchmark parameter passing robust
On Thu, 24 Oct 2024, Reinette Chatre wrote: > The benchmark used during the CMT, MBM, and MBA tests can be provided by > the user via (-b) parameter, if not provided the default "fill_buf" > benchmark is used. The user is additionally able to override > any of the "fill_buf" default parameters when running the tests with > "-b fill_buf ". > > The "fill_buf" parameters are managed as an array of strings. Using an > array of strings is complex because it requires transformations to/from > strings at every producer and consumer. This is made worse for the > individual tests where the default benchmark parameters values may not > be appropriate and additional data wrangling is required. For example, > the CMT test duplicates the entire array of strings in order to replace > one of the parameters. > > More issues appear when combining the usage of an array of strings with > the use case of user overriding default parameters by specifying > "-b fill_buf ". This use case is fragile with opportunities > to trigger a SIGSEGV because of opportunities for NULL pointers to exist > in the array of strings. For example, by running below (thus by specifying > "fill_buf" should be used but all parameters are NULL): > $ sudo resctrl_tests -t mbm -b fill_buf > > Replace the "array of strings" parameters used for "fill_buf" with > new struct fill_buf_param that contains the "fill_buf" parameters that > can be used directly without transformations to/from strings. Two > instances of struct fill_buf_param may exist at any point in time: > * If the user provides new parameters to "fill_buf", the > user parameter structure (struct user_params) will point to a > fully initialized and immutable struct fill_buf_param > containing the user provided parameters. > * If "fill_buf" is the benchmark that should be used by a test, > then the test parameter structure (struct resctrl_val_param) > will point to a fully initialized struct fill_buf_param. 
The > latter may contain (a) the user provided parameters verbatim, > (b) user provided parameters adjusted to be appropriate for > the test, or (c) the default parameters for "fill_buf" that > is appropriate for the test if the user did not provide > "fill_buf" parameters nor an alternate benchmark. > > The existing behavior of CMT test is to use test defined value for the > buffer size even if the user provides another value via command line. > This behavior is maintained since the test requires that the buffer size > matches the size of the cache allocated, and the amount of cache > allocated can instead be changed by the user with the "-n" command line > parameter. > > Signed-off-by: Reinette Chatre Thanks for the update. Reviewed-by: Ilpo Järvinen -- i. > --- > Changes since V3: > - Handle empty string input. (Ilpo) > > Changes since V2: > - Use empty initializers. (Ilpo) > - Let memflush be bool instead of int. (Ilpo) > - Make user input checks more robust. (Ilpo) > - Assign values as part of local variable definition. (Ilpo) > > Changes since V1: > - Maintain original behavior where user can override "fill_buf" > parameters via command line ... but only those that can actually > be changed. (Ilpo) > - Fix parsing issues associated with original behavior to ensure > any parameter is valid before any attempt to use it. > - Move patch earlier in series to highlight that this fixes existing > issues. > - Make struct fill_buf_param dynamic to support user provided > parameters as well as test specific parameters. > - Rewrite changelog. 
> --- > tools/testing/selftests/resctrl/cmt_test.c| 32 ++ > tools/testing/selftests/resctrl/fill_buf.c| 4 +- > tools/testing/selftests/resctrl/mba_test.c| 13 ++- > tools/testing/selftests/resctrl/mbm_test.c| 22 ++-- > tools/testing/selftests/resctrl/resctrl.h | 59 +++--- > .../testing/selftests/resctrl/resctrl_tests.c | 103 ++ > tools/testing/selftests/resctrl/resctrl_val.c | 41 --- > 7 files changed, 178 insertions(+), 96 deletions(-) > > diff --git a/tools/testing/selftests/resctrl/cmt_test.c > b/tools/testing/selftests/resctrl/cmt_test.c > index 0c045080d808..4c3cf2c25a38 100644 > --- a/tools/testing/selftests/resctrl/cmt_test.c > +++ b/tools/testing/selftests/resctrl/cmt_test.c > @@ -116,15 +116,13 @@ static void cmt_test_cleanup(void) > > static int cmt_run_test(const struct resctrl_test *test, const struct > user_params *uparams) > { > - const char * const *cmd = uparams->benchmark_cmd; > - const char *new_cmd[BENCHMARK_ARGS]; > + struct fill_buf_param fill_buf = {}; > unsigned long cache_total_size = 0; > int n = uparams->bits ? : 5; > unsigned long long_mask; > - char *span_str = NULL; > int count_of_bits; > size_t span; > - int ret, i; > + int ret; > > ret = get_full_cbm("L3", &long_mask); > i
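The three cases (a), (b) and (c) described in the changelog can be sketched as follows. The field names `buf_size` and `memflush` follow the changelog's description but are assumptions here, as are `DEMO_DEFAULT_SPAN`, its default values and `demo_test_fill_buf()`; this is not the selftest code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed layout: a buffer size and a bool memflush, per the changelog. */
struct fill_buf_param {
	size_t buf_size;
	bool memflush;
};

#define DEMO_DEFAULT_SPAN (250UL * 1024 * 1024)  /* hypothetical default */

/* Build the per-test parameters:
 *  (a) copy the user's parameters verbatim,
 *  (b) optionally force the buffer size (as CMT does, so the buffer
 *      matches the allocated cache portion), or
 *  (c) fall back to defaults when the user provided none.
 * forced_size == 0 means "no test-specific override".
 * The result is heap-allocated and owned by the caller. */
static struct fill_buf_param *
demo_test_fill_buf(const struct fill_buf_param *user, size_t forced_size)
{
	struct fill_buf_param *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	if (user) {
		*p = *user;
	} else {
		p->buf_size = DEMO_DEFAULT_SPAN;
		p->memflush = true;
	}
	if (forced_size)
		p->buf_size = forced_size;
	return p;
}
```

Because the user structure is only read, it can stay immutable while each test gets its own adjusted copy, which avoids the string duplication the changelog describes for CMT.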
Re: [PATCH V4 02/15] selftests/resctrl: Print accurate buffer size as part of MBM results
On Thu, 24 Oct 2024, Reinette Chatre wrote: > By default the MBM test uses the "fill_buf" benchmark to keep reading > from a buffer with size DEFAULT_SPAN while measuring memory bandwidth. > User space can provide an alternate benchmark or amend the size of > the buffer "fill_buf" should use. > > Analysis of the MBM measurements do not require that a buffer be used > and thus do not require knowing the size of the buffer if it was used > during testing. Even so, the buffer size is printed as informational > as part of the MBM test results. What is printed as buffer size is > hardcoded as DEFAULT_SPAN, even if the test relied on another benchmark > (that may or may not use a buffer) or if user space amended the buffer > size. > > Ensure that accurate buffer size is printed when using "fill_buf" > benchmark and omit the buffer size information if another benchmark > is used. > > Fixes: ecdbb911f22d ("selftests/resctrl: Add MBM test") > Signed-off-by: Reinette Chatre Reviewed-by: Ilpo Järvinen -- i. > --- > Backporting is not recommended. Backporting this fix will be > a challenge with all the refactoring done since then. This issue > does not impact default tests and there is no sign that > folks run these tests with anything but the defaults. This issue is > also minor since it does not impact actual test runs or results, > just the information printed during a test run. > > Changes since V3: > - Ensure string parsing handles case when user provides "". (Ilpo) > - Fix error returned. (Ilpo) > > Changes since V2: > - Make user input checks more robust. (Ilpo) > > Changes since V1: > - New patch. 
> --- > tools/testing/selftests/resctrl/mbm_test.c | 16 ++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/tools/testing/selftests/resctrl/mbm_test.c > b/tools/testing/selftests/resctrl/mbm_test.c > index 6b5a3b52d861..cf08ba5e314e 100644 > --- a/tools/testing/selftests/resctrl/mbm_test.c > +++ b/tools/testing/selftests/resctrl/mbm_test.c > @@ -40,7 +40,8 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, > size_t span) > ksft_print_msg("%s Check MBM diff within %d%%\n", > ret ? "Fail:" : "Pass:", MAX_DIFF_PERCENT); > ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per); > - ksft_print_msg("Span (MB): %zu\n", span / MB); > + if (span) > + ksft_print_msg("Span (MB): %zu\n", span / MB); > ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc); > ksft_print_msg("avg_bw_resc: %lu\n", avg_bw_resc); > > @@ -138,15 +139,26 @@ static int mbm_run_test(const struct resctrl_test > *test, const struct user_param > .setup = mbm_setup, > .measure= mbm_measure, > }; > + char *endptr = NULL; > + size_t span = 0; > int ret; > > remove(RESULT_FILE_NAME); > > + if (uparams->benchmark_cmd[0] && strcmp(uparams->benchmark_cmd[0], > "fill_buf") == 0) { > + if (uparams->benchmark_cmd[1] && *uparams->benchmark_cmd[1] != > '\0') { > + errno = 0; > + span = strtoul(uparams->benchmark_cmd[1], &endptr, 10); > + if (errno || *endptr != '\0') > + return -EINVAL; > + } > + } > + > ret = resctrl_val(test, uparams, uparams->benchmark_cmd, ¶m); > if (ret) > return ret; > > - ret = check_results(DEFAULT_SPAN); > + ret = check_results(span); > if (ret && (get_vendor() == ARCH_INTEL)) > ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA > Clustering is enabled. Check BIOS configuration.\n"); > >
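The span-parsing condition added to mbm_run_test() can be pulled into a standalone helper that is easy to test. This sketch applies the same checks (parse only for "fill_buf", leave the span at 0 for a missing or empty argument, reject trailing characters and range errors); `demo_parse_span()` is a hypothetical name.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 (span possibly left at 0, meaning "unknown, don't print it")
 * or -EINVAL for a malformed size argument. */
static int demo_parse_span(const char *const *cmd, size_t *span)
{
	char *endptr = NULL;
	unsigned long val;

	*span = 0;
	if (!cmd[0] || strcmp(cmd[0], "fill_buf") != 0)
		return 0;               /* other benchmark: span unknown */
	if (!cmd[1] || *cmd[1] == '\0')
		return 0;               /* no size argument: span left at 0 */

	errno = 0;
	val = strtoul(cmd[1], &endptr, 10);
	if (errno || *endptr != '\0')
		return -EINVAL;
	*span = val;
	return 0;
}
```

Like the patch, this relies on `strtoul()`, which accepts leading whitespace and a leading minus sign (a string such as "-1" converts to `ULONG_MAX` without setting `errno`), so a stricter parser might reject those explicitly.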
Re: [PATCH v3] remoteproc: Add a new remoteproc state RPROC_DEFUNCT
On Fri, Oct 25, 2024 at 09:08:03AM -0600, Mathieu Poirier wrote: > On Fri, Oct 25, 2024 at 01:40:45PM +0530, Mukesh Ojha wrote: > > On Mon, Oct 21, 2024 at 09:12:47AM -0600, Mathieu Poirier wrote: > > > Hi Mukesh, > > > > > > On Wed, Oct 16, 2024 at 10:25:46AM +0530, Mukesh Ojha wrote: > > > > Multiple call to glink_subdev_stop() for the same remoteproc can happen > > > > if rproc_stop() fails from Process-A that leaves the rproc state to > > > > RPROC_CRASHED state later a call to recovery_store from user space in > > > > Process B triggers rproc_trigger_recovery() of the same remoteproc to > > > > recover it results in NULL pointer dereference issue in > > > > qcom_glink_smem_unregister(). > > > > > > > > There is other side to this issue if we want to fix this via adding a > > > > NULL check on glink->edge which does not guarantees that the remoteproc > > > > will recover in second call from Process B as it has failed in the first > > > > Process A during SMC shutdown call and may again fail at the same call > > > > and rproc can not recover for such case. > > > > > > > > Add a new rproc state RPROC_DEFUNCT i.e., non recoverable state of > > > > remoteproc and the only way to recover from it via system restart. > > > > > > > > Process-A Process-B > > > > > > > > fatal error interrupt happens > > > > > > > > rproc_crash_handler_work() > > > > mutex_lock_interruptible(&rproc->lock); > > > > ... > > > > > > > >rproc->state = RPROC_CRASHED; > > > > ... > > > > mutex_unlock(&rproc->lock); > > > > > > > > rproc_trigger_recovery() > > > > mutex_lock_interruptible(&rproc->lock); > > > > > > > > adsp_stop() > > > > qcom_q6v5_pas 20c0.remoteproc: failed to shutdown: -22 > > > > remoteproc remoteproc3: can't stop rproc: -22 > > > > mutex_unlock(&rproc->lock); > > > > > > Ok, that can happen. 
> > > > > > > > > > > echo enabled > > > > > /sys/class/remoteproc/remoteprocX/recovery > > > > recovery_store() > > > > > > > > rproc_trigger_recovery() > > > > > > > > mutex_lock_interruptible(&rproc->lock); > > > >rproc_stop() > > > > glink_subdev_stop() > > > > > > > > qcom_glink_smem_unregister() ==| > > > > > > > > | > > > > > > > > V > > > > > > I am missing some information here but I will _assume_ this is caused by > > > glink->edge being set to NULL [1] when glink_subdev_stop() is first > > > called by > > > process A. Instead of adding a new state to the core I think a better > > > idea > > > would be to add a check for a NULL value on @smem in > > > qcom_glink_smem_unregister(). This is a problem that should be fixed in > > > the > > > driver rather than the core. > > > > > > [1]. > > > https://elixir.bootlin.com/linux/v6.12-rc4/source/drivers/remoteproc/qcom_common.c#L213 > > > > > > I did the same here [1] but after discussion with Bjorn, realized that > > remoteproc might not even recover and may fail in the second attempt as > > well and only way is reboot of the machine. > > Whether in RPROC_CRASHED or RPROC_DEFUNCT state, the end result is the same - > manual intervention is needed. I don't see why another state needs to be > added. Is that really true? When recovery is disabled, any rproc crash leaves the rproc in RPROC_CRASHED state. When recovery is enabled, it can bring the rproc back online, and if recovery is not successful the rproc can be put into RPROC_DEFUNCT state. -Mukesh > > > > > [1] > > https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mo...@quicinc.com/ > > > > > > > > > Unable to handle > > > > kernel NULL pointer dereference > > > > at > > > > virtual address 0358 > > > > > > > > Signed-off-by: Mukesh Ojha > > > > --- > > > > Changes in v3: > > > > - Fix kernel test reported error.
> > > > > > > > Changes in v2: > > > > - Removed NULL pointer check instead added a new state to signify > > > >non-recoverable state of remoteproc. > > > > > > > > drivers/remoteproc/remoteproc_core.c | 3 ++- > > > > drivers/remoteproc/remoteproc_sysfs.c | 1 + > > > > include/linux/remoteproc.h| 5 - > > > > 3 files changed, 7 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/remoteproc/remoteproc_core.c > > > > b/drivers/remoteproc/remoteproc_core.c > > > > index f2
Re: [PATCH v4 11/11] iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
On Fri, Oct 25, 2024 at 09:18:05AM +, Tian, Kevin wrote: > > From: Nicolin Chen > > Sent: Tuesday, October 22, 2024 8:20 AM > > > > Add a new driver-type for ARM SMMUv3 to enum iommu_viommu_type. > > Implement > > an arm_vsmmu_alloc() with its viommu op > > arm_vsmmu_domain_alloc_nested(), > > to replace arm_smmu_domain_alloc_nesting(). As an initial step, copy the > > VMID from s2_parent. A later cleanup series is required to move the VMID > > allocation out of the stage-2 domain allocation routine to this. > > > > After that, replace nested_domain->s2_parent with nested_domain->vsmmu. > > > > Note that the validatting conditions for a nested_domain allocation are > > moved from arm_vsmmu_domain_alloc_nested to arm_vsmmu_alloc, since > > there > > is no point in creating a vIOMMU (vsmmu) from the beginning if it would > > not support a nested_domain. > > > > Signed-off-by: Nicolin Chen > > hmm I wonder whether this series should be merged with Jason's > nesting series together and directly use vIOMMU to create nesting. > Otherwise it looks a bit weird for one series to first enable a uAPI > which is immediately replaced by another uAPI from the following > series. It has changed from my original expectation, that's for sure. I've wondered the same thing. For now I've been keeping them separate and was going to review when this is all settled down. It is troublesome because of all the branches, but if we don't have a conflict we could take the whole lot through iommufd. Jason
[PATCH v3] vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
This happens on 64-bit big-endian machines. SO_RCVLOWAT requires an int parameter. However, instead of int, the test uses unsigned long in one place and size_t in another. Both are 8 bytes long on 64-bit machines. The kernel, having received the 8 bytes, doesn't test for the exact size of the parameter, it only cares that it's >= sizeof(int), and casts the 4 lower-addressed bytes to an int, which, on a big-endian machine, contains 0. 0 doesn't trigger an error, SO_RCVLOWAT returns with success and the socket stays with the default SO_RCVLOWAT = 1, which results in vsock_test failures, while vsock_perf doesn't even notice that it's failed to change it. Fixes: b1346338fbae ("vsock_test: POLLIN + SO_RCVLOWAT test") Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic") Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility") Signed-off-by: Konstantin Shkolnyy --- Notes: The problem was found on s390 (big endian), while x86-64 didn't show it. After this fix, all tests pass on s390. 
Changes for v3: - fix the same problem in vsock_perf and update commit message Changes for v2: - add "Fixes:" lines to the commit message tools/testing/vsock/vsock_perf.c | 6 +++--- tools/testing/vsock/vsock_test.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/testing/vsock/vsock_perf.c b/tools/testing/vsock/vsock_perf.c index 4e8578f815e0..22633c2848cc 100644 --- a/tools/testing/vsock/vsock_perf.c +++ b/tools/testing/vsock/vsock_perf.c @@ -133,7 +133,7 @@ static float get_gbps(unsigned long bits, time_t ns_delta) ((float)ns_delta / NSEC_PER_SEC); } -static void run_receiver(unsigned long rcvlowat_bytes) +static void run_receiver(int rcvlowat_bytes) { unsigned int read_cnt; time_t rx_begin_ns; @@ -163,7 +163,7 @@ static void run_receiver(unsigned long rcvlowat_bytes) printf("Listen port %u\n", port); printf("RX buffer %lu bytes\n", buf_size_bytes); printf("vsock buffer %lu bytes\n", vsock_buf_bytes); - printf("SO_RCVLOWAT %lu bytes\n", rcvlowat_bytes); + printf("SO_RCVLOWAT %d bytes\n", rcvlowat_bytes); fd = socket(AF_VSOCK, SOCK_STREAM, 0); @@ -439,7 +439,7 @@ static long strtolx(const char *arg) int main(int argc, char **argv) { unsigned long to_send_bytes = DEFAULT_TO_SEND_BYTES; - unsigned long rcvlowat_bytes = DEFAULT_RCVLOWAT_BYTES; + int rcvlowat_bytes = DEFAULT_RCVLOWAT_BYTES; int peer_cid = -1; bool sender = false; diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index f851f8961247..30857dd4ca97 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -833,7 +833,7 @@ static void test_stream_poll_rcvlowat_server(const struct test_opts *opts) static void test_stream_poll_rcvlowat_client(const struct test_opts *opts) { - unsigned long lowat_val = RCVLOWAT_BUF_SIZE; + int lowat_val = RCVLOWAT_BUF_SIZE; char buf[RCVLOWAT_BUF_SIZE]; struct pollfd fds; short poll_flags; @@ -1282,7 +1282,7 @@ static void test_stream_rcvlowat_def_cred_upd_client(const struct test_opts 
*opt static void test_stream_credit_update_test(const struct test_opts *opts, bool low_rx_bytes_test) { - size_t recv_buf_size; + int recv_buf_size; struct pollfd fds; size_t buf_size; void *buf; -- 2.34.1
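The big-endian failure described in the SO_RCVLOWAT patch above can be reproduced in userspace without any socket. The sketch models a handler that, as the commit message says, only checks `optlen >= sizeof(int)` and then reads the lower-addressed `sizeof(int)` bytes. `read_int_optval()` and `host_is_big_endian()` are hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Models an option handler that accepts any optlen >= sizeof(int) and then
 * reads only the first sizeof(int) bytes of the caller's buffer.
 */
static int read_int_optval(const void *optval, size_t optlen, int *out)
{
	if (optlen < sizeof(int))
		return -1;
	memcpy(out, optval, sizeof(int));	/* lower-addressed bytes only */
	return 0;
}

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 0x0102;

	return *(const uint8_t *)&probe == 0x01;
}
```

On a 64-bit big-endian host, an `unsigned long` holding 128 is read back as 0, which the option silently accepts. Passing an `int`, as the fixed tests do, makes the result independent of byte order because the whole object is read.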
Re: [PATCH v4 00/11] iommufd: Add vIOMMU infrastructure (Part-1)
On Fri, Oct 25, 2024 at 08:34:05AM +, Tian, Kevin wrote: > > The vIOMMU object should be seen as a slice of a physical IOMMU instance > > that is passed to or shared with a VM. That can be some HW/SW resources: > > - Security namespace for guest owned ID, e.g. guest-controlled cache tags > > - Access to a sharable nesting parent pagetable across physical IOMMUs > > - Virtualization of various platforms IDs, e.g. RIDs and others > > - Delivery of paravirtualized invalidation > > - Direct assigned invalidation queues > > - Direct assigned interrupts > > - Non-affiliated event reporting > > sorry no idea about 'non-affiliated event'. Can you elaborate? This would be an event that is not connected to a device. For instance, a CMDQ experienced a problem. Jason
Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
On Fri, Oct 25, 2024 at 01:50:12PM +0100, Pedro Falcato wrote: > On Fri, Oct 25, 2024 at 10:41 AM Lorenzo Stoakes > wrote: > > > > It is useful to be able to utilise the pidfd mechanism to reference the > > current thread or process (from a userland point of view - thread group > > leader from the kernel's point of view). > > > > Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and > > PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader. > > > > For convenience and to avoid confusion from userland's perspective we alias > > these: > > > > * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what > > the user will want to use, as they would find it surprising if for > > instance fd's were unshared()'d and they wanted to invoke pidfd_getfd() > > and that failed. > > > > * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users > > have no concept of thread groups or what a thread group leader is, and > > from userland's perspective and nomenclature this is what userland > > considers to be a process. > > > > Due to the refactoring of the central __pidfd_get_pid() function we can > > implement this functionality centrally, providing the use of this sentinel > > in most functionality which utilises pidfd's. > > > > We need to explicitly adjust kernel_waitid_prepare() to permit this (though > > it wouldn't really make sense to use this there, we provide the ability for > > consistency). > > > > We explicitly disallow use of this in setns(), which would otherwise have > > required explicit custom handling, as it doesn't make sense to set the > > current calling thread to join the namespace of itself. > > > > As the callers of pidfd_get_pid() expect an increased reference count on > > the pid we do so in the self case, reducing churn and avoiding any breakage > > from existing logic which decrements this reference count. 
> > > > This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFS, > > ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and > > pidfd_getfd() system calls. > > > > Things such as polling a pidfs and general fd operations are not supported, > > this strictly provides the sentinel for APIs which explicitly accept a > > pidfd. > > > > Reviewed-by: Shakeel Butt > > Signed-off-by: Lorenzo Stoakes > > --- > > include/linux/pid.h| 8 -- > > include/uapi/linux/pidfd.h | 15 +++ > > kernel/exit.c | 3 ++- > > kernel/nsproxy.c | 1 + > > kernel/pid.c | 51 -- > > 5 files changed, 57 insertions(+), 21 deletions(-) > > > > diff --git a/include/linux/pid.h b/include/linux/pid.h > > index d466890e1b35..3b2ac7567a88 100644 > > --- a/include/linux/pid.h > > +++ b/include/linux/pid.h > > @@ -78,11 +78,15 @@ struct file; > > * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd. > > * > > * @pidfd: The pidfd whose pid we want, or the fd of a /proc/ > > file if > > - * @alloc_proc is also set. > > + * @alloc_proc is also set, or PIDFD_SELF_* to refer to the > > current > > + * thread or thread group leader. > > * @allow_proc: If set, then an fd of a /proc/ file can be passed > > instead > > * of a pidfd, and this will be used to determine the pid. > > + > > * @flags: Output variable, if non-NULL, then the file->f_flags of the > > - * pidfd will be set here. > > + * pidfd will be set here or If PIDFD_SELF_THREAD is set, > > this is > > + * set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP > > then > > + * this is set to zero. > > * > > * Returns: If successful, the pid associated with the pidfd, otherwise an > > * error. 
> > diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h > > index 565fc0629fff..0ca2ebf906fd 100644 > > --- a/include/uapi/linux/pidfd.h > > +++ b/include/uapi/linux/pidfd.h > > @@ -29,4 +29,19 @@ > > #define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9) > > #define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10) > > > > +/* > > + * Special sentinel values which can be used to refer to the current > > thread or > > + * thread group leader (which from a userland perspective is the process). > > + */ > > +#define PIDFD_SELF PIDFD_SELF_THREAD > > +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP > > + > > +#define PIDFD_SELF_THREAD -100 /* Current thread. */ > > This conflicts with AT_FDCWD, might be worth changing? > > > +#define PIDFD_SELF_THREAD_GROUP -200 /* Current thread group > > leader. */ > > We might want to pick some range outside of the negative errno space > (-4096 IIRC), since we have plenty of values to pick from (2^31 at > least). This is entirely up to Christian, I used the values he suggested in review. But I agree w
Re: [PATCH v3] remoteproc: Add a new remoteproc state RPROC_DEFUNCT
On Fri, Oct 25, 2024 at 01:40:45PM +0530, Mukesh Ojha wrote: > On Mon, Oct 21, 2024 at 09:12:47AM -0600, Mathieu Poirier wrote: > > Hi Mukesh, > > > > On Wed, Oct 16, 2024 at 10:25:46AM +0530, Mukesh Ojha wrote: > > > Multiple call to glink_subdev_stop() for the same remoteproc can happen > > > if rproc_stop() fails from Process-A that leaves the rproc state to > > > RPROC_CRASHED state later a call to recovery_store from user space in > > > Process B triggers rproc_trigger_recovery() of the same remoteproc to > > > recover it results in NULL pointer dereference issue in > > > qcom_glink_smem_unregister(). > > > > > > There is other side to this issue if we want to fix this via adding a > > > NULL check on glink->edge which does not guarantees that the remoteproc > > > will recover in second call from Process B as it has failed in the first > > > Process A during SMC shutdown call and may again fail at the same call > > > and rproc can not recover for such case. > > > > > > Add a new rproc state RPROC_DEFUNCT i.e., non recoverable state of > > > remoteproc and the only way to recover from it via system restart. > > > > > > Process-A Process-B > > > > > > fatal error interrupt happens > > > > > > rproc_crash_handler_work() > > > mutex_lock_interruptible(&rproc->lock); > > > ... > > > > > >rproc->state = RPROC_CRASHED; > > > ... > > > mutex_unlock(&rproc->lock); > > > > > > rproc_trigger_recovery() > > > mutex_lock_interruptible(&rproc->lock); > > > > > > adsp_stop() > > > qcom_q6v5_pas 20c0.remoteproc: failed to shutdown: -22 > > > remoteproc remoteproc3: can't stop rproc: -22 > > > mutex_unlock(&rproc->lock); > > > > Ok, that can happen. 
> > > > > > > > echo enabled > > > > /sys/class/remoteproc/remoteprocX/recovery > > > recovery_store() > > >rproc_trigger_recovery() > > > > > > mutex_lock_interruptible(&rproc->lock); > > > rproc_stop() > > > glink_subdev_stop() > > > > > > qcom_glink_smem_unregister() ==| > > > > > >| > > > > > >V > > > > I am missing some information here but I will _assume_ this is caused by > > glink->edge being set to NULL [1] when glink_subdev_stop() is first called > > by > > process A. Instead of adding a new state to the core I think a better idea > > would be to add a check for a NULL value on @smem in > > qcom_glink_smem_unregister(). This is a problem that should be fixed in the > > driver rather than the core. > > > > [1]. > > https://elixir.bootlin.com/linux/v6.12-rc4/source/drivers/remoteproc/qcom_common.c#L213 > > > I did the same here [1] but after discussion with Bjorn, realized that > remoteproc might not even recover and may fail in the second attempt as > well and only way is reboot of the machine. Whether in RPROC_CRASHED or RPROC_DEFUNCT state, the end result is the same - manual intervention is needed. I don't see why another state needs to be added. > > [1] > https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mo...@quicinc.com/ > > > > > > Unable to handle kernel > > > NULL pointer dereference > > > at > > > virtual address 0358 > > > > > > Signed-off-by: Mukesh Ojha > > > --- > > > Changes in v3: > > > - Fix kernel test reported error. > > > > > > Changes in v2: > > > - Removed NULL pointer check instead added a new state to signify > > >non-recoverable state of remoteproc. 
> > > > > > drivers/remoteproc/remoteproc_core.c | 3 ++- > > > drivers/remoteproc/remoteproc_sysfs.c | 1 + > > > include/linux/remoteproc.h| 5 - > > > 3 files changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/remoteproc/remoteproc_core.c > > > b/drivers/remoteproc/remoteproc_core.c > > > index f276956f2c5c..c4e14503b971 100644 > > > --- a/drivers/remoteproc/remoteproc_core.c > > > +++ b/drivers/remoteproc/remoteproc_core.c > > > @@ -1727,6 +1727,7 @@ static int rproc_stop(struct rproc *rproc, bool > > > crashed) > > > /* power off the remote processor */ > > > ret = rproc->ops->stop(rproc); > > > if (ret) { > > > + rproc->state = RPROC_DEFUNCT; > > > dev_err(dev, "can't stop rproc: %d\n", ret); > > > return ret; > > > } > > > @@ -1839,7 +1840,7 @@ int rproc_trigger_recovery(struct rproc *rproc) > > > return ret; > > > > > > /* State could have change
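The failure sequence in this thread can be modelled as a tiny state machine. In this model, subdevices are torn down before the processor stop is attempted, and a failed stop marks the rproc DEFUNCT, as in the v3 hunk. The early return for DEFUNCT in `sketch_trigger_recovery()` is an assumption, because the quoted `rproc_trigger_recovery()` hunk is truncated. All names here are hypothetical and much simpler than `struct rproc`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily trimmed model; not struct rproc. */
enum sketch_rproc_state {
	SKETCH_RUNNING,
	SKETCH_CRASHED,
	SKETCH_DEFUNCT,		/* proposed: stop failed, only a reboot helps */
};

struct sketch_rproc {
	enum sketch_rproc_state state;
	bool stop_fails;	/* simulates ops->stop() returning an error */
	int subdev_stops;	/* times the subdevice teardown ran */
};

static int sketch_trigger_recovery(struct sketch_rproc *r)
{
	if (r->state == SKETCH_DEFUNCT)
		return -ENODEV;		/* assumed early exit; error code hypothetical */

	r->subdev_stops++;		/* e.g. glink_subdev_stop() */
	if (r->stop_fails) {
		r->state = SKETCH_DEFUNCT;	/* as in the v3 hunk */
		return -EINVAL;
	}
	r->state = SKETCH_RUNNING;	/* stop and restart succeeded */
	return 0;
}
```

The key property is that the second recovery attempt no longer repeats the subdevice teardown. The alternative proposed in the thread, a NULL check on @smem in the driver, would instead make the repeated teardown harmless. Under either approach, the second stop can still fail.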
Re: [PATCH v7 1/3] modules: Support extended MODVERSIONS info
> Sorry I realise it's version 7, but although the above looks correct it's > kind of dense. > > I think the below would also work and is (I think) easier to follow, and > is more obviously similar to the existing code. I'm sure your version is > faster, but I don't think it's that performance critical. > > static void dedotify_ext_version_names(char *str_seq, unsigned long size) > { > char *end = str_seq + size; > char *p = str_seq; > > while (p < end) { > if (*p == '.') > memmove(p, p + 1, end - p - 1); > > p += strlen(p) + 1; > } > } > > The tail of str_seq will be filled with nulls as long as the last string > was null terminated. > > cheers As you alluded to, what you're providing is potentially O(n^2) in the number of symbols a module depends on - the existing code is O(n). If leading dots on names are rare, this is probably fine. If they're common, this will potentially make loading modules with a large number of imported symbols actually take a measurable amount of additional time. That said, I take your point about complexity, and trust you to know your arch's inputs/requirements, so if I don't hear back again I will incorporate that into the next revision of the patch (to be produced after the gendwarfksyms update comes out).
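The complexity trade-off discussed above is easiest to see with both approaches side by side.

- `dedotify_memmove()` is the reviewer's suggestion from the quoted mail, renamed.
- `dedotify_linear()` is a hypothetical read/write-cursor compaction written for comparison. It is not the v7 patch's code, which is not quoted here.

Both functions assume a buffer of NUL-separated names in which the last name is terminated.

```c
#include <assert.h>
#include <string.h>

/*
 * Reviewer's suggestion: shift the rest of the buffer left by one byte for
 * every name starting with '.'. Each memmove() touches the remainder of the
 * buffer, so many dotted names cost roughly O(names * size).
 */
static void dedotify_memmove(char *str_seq, unsigned long size)
{
	char *end = str_seq + size;
	char *p = str_seq;

	while (p < end) {
		if (*p == '.')
			memmove(p, p + 1, end - p - 1);
		p += strlen(p) + 1;
	}
}

/*
 * Hypothetical linear-time variant: copy each name (minus one leading '.')
 * from a read cursor to a write cursor, then zero the freed tail.
 * Every byte is moved at most once.
 */
static void dedotify_linear(char *str_seq, unsigned long size)
{
	const char *rd = str_seq;
	char *wr = str_seq;
	char *end = str_seq + size;

	while (rd < end) {
		const char *nul = memchr(rd, '\0', end - rd);
		size_t len = nul ? (size_t)(nul - rd) : (size_t)(end - rd);

		if (len > 0 && *rd == '.') {
			rd++;
			len--;
		}
		memmove(wr, rd, len);	/* source and destination may overlap */
		wr += len;
		rd += len;
		if (rd < end)
			rd++;		/* skip the source terminator */
		if (wr < end)
			*wr++ = '\0';
	}
	memset(wr, 0, end - wr);
}
```

Both produce the same packed, zero-padded buffer. Which one to prefer depends on how common leading dots are in practice, as noted in the reply.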
Re: [PATCH v4 01/14] iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct
On Fri, Oct 25, 2024 at 06:53:01PM +1100, Alexey Kardashevskiy wrote: > > +#define iommufd_vdevice_alloc(ictx, drv_struct, member) > > \ > > + ({ \ > > + static_assert( \ > > + __same_type(struct iommufd_vdevice,\ > > + ((struct drv_struct *)NULL)->member)); \ > > + static_assert(offsetof(struct drv_struct, member.obj) == 0); \ > > + container_of(_iommufd_object_alloc(ictx, \ > > + sizeof(struct drv_struct), \ > > + IOMMUFD_OBJ_VDEVICE), \ > > +struct drv_struct, member.obj); \ > > + }) > > #endif > > A nit: it hurts eyes to read: > > mock_vdev = iommufd_vdevice_alloc(viommu->ictx, mock_vdevice, core); > > vs. > > mock_vdev = iommufd_vdevice_alloc(viommu->ictx, struct mock_vdevice, core); > > as for the former I go searching for a "mock_vdevice" variable and for the > latter it is clear it is 1) a macro 2) which does some type checking. > > also, it makes it impossible to pass things like typeof(..) or a type from > typedef. Thanks, Makes sense to me. And the container_of() should not be used in these macros; the point was to avoid it to make the PTR_ERR behavior clearer. Just put a force type cast. Jason
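The two suggestions above can be combined:

- Alexey's: take a full type name rather than a bare struct tag.
- Jason's: use a cast instead of container_of().

This is a userspace sketch with hypothetical stand-in types (`struct sketch_obj`, `struct sketch_vdevice`) and a hypothetical allocator. The ictx argument is dropped. The sketch relies on the GCC/Clang extensions `({ ... })`, `__typeof__` and `__builtin_types_compatible_p`; the kernel's `__same_type()` is built on the last two. It is not the iommufd implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct iommufd_object / struct iommufd_vdevice. */
struct sketch_obj { int type; };
struct sketch_vdevice { struct sketch_obj obj; };

/* Hypothetical allocator returning the embedded core object (or NULL). */
static struct sketch_obj *sketch_object_alloc(size_t size, int type)
{
	struct sketch_obj *obj = calloc(1, size);

	if (obj)
		obj->type = type;
	return obj;
}

/*
 * drv_type is a full type name, so "struct foo", a typedef or a typeof()
 * all work. The core object must sit at offset 0, which is what makes the
 * plain cast (instead of container_of()) valid and leaves NULL/error
 * pointers unchanged.
 */
#define sketch_vdevice_alloc(drv_type, member)				\
	({								\
		static_assert(__builtin_types_compatible_p(		\
			struct sketch_vdevice,				\
			__typeof__(((drv_type *)NULL)->member)),	\
			"member must be a struct sketch_vdevice");	\
		static_assert(offsetof(drv_type, member.obj) == 0,	\
			"core object must be at offset 0");		\
		(drv_type *)sketch_object_alloc(sizeof(drv_type), 1);	\
	})
```

A call site then reads `sketch_vdevice_alloc(struct my_vdev, core)`, which makes it obvious that the first argument is a type.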
Re: [PATCH net-next 1/2] net: netconsole: selftests: Change the IP subnet
Breno Leitao writes: > Use a less populated IP range to run the tests, as suggested by Petr in > Link: https://lore.kernel.org/netdev/87ikvukv3s@nvidia.com/. > > Suggested-by: Petr Machata > Signed-off-by: Breno Leitao > --- > tools/testing/selftests/drivers/net/netcons_basic.sh | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh > b/tools/testing/selftests/drivers/net/netcons_basic.sh > index 06021b2059b7..4ad1e216c6b0 100755 > --- a/tools/testing/selftests/drivers/net/netcons_basic.sh > +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh > @@ -20,9 +20,9 @@ SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")") > > # Simple script to test dynamic targets in netconsole > SRCIF="" # to be populated later > -SRCIP=192.168.1.1 > +SRCIP=192.168.2.1 I mentioned 192.0.2.0/24, which we commonly use in selftests. The range is meant for examples and documentation, which is not exactly selftests, but feels like it's not bending the rules too far. And we shouldn't see the range in the wild. > DSTIF="" # to be populated later > -DSTIP=192.168.1.2 > +DSTIP=192.168.2.2 > > PORT="" > MSG="netconsole selftest"
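The range suggested above, 192.0.2.0/24, is TEST-NET-1, which RFC 5737 reserves for documentation. A small prefix check shows which candidate addresses fall inside it. `ipv4_in_subnet()` is a hypothetical helper. The addresses 192.0.2.1 and 192.0.2.2 are only examples of what SRCIP/DSTIP could become, not values from the patch.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* 1 if @ip lies inside @net/@prefix, 0 if not, -1 on a parse error. */
static int ipv4_in_subnet(const char *ip, const char *net, unsigned int prefix)
{
	struct in_addr a, n;
	uint32_t mask;

	if (prefix > 32 || inet_pton(AF_INET, ip, &a) != 1 ||
	    inet_pton(AF_INET, net, &n) != 1)
		return -1;
	/* Build the netmask in host order, then convert to network order. */
	mask = prefix ? htonl(~0u << (32 - prefix)) : 0;
	return (a.s_addr & mask) == (n.s_addr & mask);
}
```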
[PATCH v5 09/13] iommufd/selftest: Add refcount to mock_iommu_device
For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no guarantee of its life cycle after it is copied from the idev->dev->iommu->iommu_dev. Track the user count of the iommu_dev. Postpone the exit routine using a completion, if refcount is unbalanced. The refcount inc/dec will be added in the following patch. Signed-off-by: Nicolin Chen --- drivers/iommu/iommufd/selftest.c | 32 1 file changed, 24 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 92d753985640..2d33b35da704 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -533,14 +533,17 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap) static struct iopf_queue *mock_iommu_iopf_queue; -static struct iommu_device mock_iommu_device = { -}; +static struct mock_iommu_device { + struct iommu_device iommu_dev; + struct completion complete; + refcount_t users; +} mock_iommu; static struct iommu_device *mock_probe_device(struct device *dev) { if (dev->bus != &iommufd_mock_bus_type.bus) return ERR_PTR(-ENODEV); - return &mock_iommu_device; + return &mock_iommu.iommu_dev; } static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt, @@ -1556,24 +1559,27 @@ int __init iommufd_test_init(void) if (rc) goto err_platform; - rc = iommu_device_sysfs_add(&mock_iommu_device, + rc = iommu_device_sysfs_add(&mock_iommu.iommu_dev, &selftest_iommu_dev->dev, NULL, "%s", dev_name(&selftest_iommu_dev->dev)); if (rc) goto err_bus; - rc = iommu_device_register_bus(&mock_iommu_device, &mock_ops, + rc = iommu_device_register_bus(&mock_iommu.iommu_dev, &mock_ops, &iommufd_mock_bus_type.bus, &iommufd_mock_bus_type.nb); if (rc) goto err_sysfs; + refcount_set(&mock_iommu.users, 1); + init_completion(&mock_iommu.complete); + mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq"); return 0; err_sysfs: - iommu_device_sysfs_remove(&mock_iommu_device); + 
iommu_device_sysfs_remove(&mock_iommu.iommu_dev); err_bus: bus_unregister(&iommufd_mock_bus_type.bus); err_platform: @@ -1583,6 +1589,15 @@ int __init iommufd_test_init(void) return rc; } +static void iommufd_test_wait_for_users(void) +{ + if (refcount_dec_and_test(&mock_iommu.users)) + return; + /* Time out waiting for iommu device user count to become 0 */ + WARN_ON(!wait_for_completion_timeout(&mock_iommu.complete, +msecs_to_jiffies(1))); +} + void iommufd_test_exit(void) { if (mock_iommu_iopf_queue) { @@ -1590,8 +1605,9 @@ void iommufd_test_exit(void) mock_iommu_iopf_queue = NULL; } - iommu_device_sysfs_remove(&mock_iommu_device); - iommu_device_unregister_bus(&mock_iommu_device, + iommufd_test_wait_for_users(); + iommu_device_sysfs_remove(&mock_iommu.iommu_dev); + iommu_device_unregister_bus(&mock_iommu.iommu_dev, &iommufd_mock_bus_type.bus, &iommufd_mock_bus_type.nb); bus_unregister(&iommufd_mock_bus_type.bus); -- 2.43.0
[PATCH v5 08/13] iommufd/selftest: Add mock_viommu_cache_invalidate
Similar to the coverage of cache_invalidate_user for iotlb invalidation, add a device cache and a viommu_cache_invalidate function to test it out. Signed-off-by: Nicolin Chen --- drivers/iommu/iommufd/iommufd_test.h | 25 + drivers/iommu/iommufd/selftest.c | 76 +++- 2 files changed, 100 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index edced4ac7cd3..46558f83e734 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -54,6 +54,11 @@ enum { MOCK_NESTED_DOMAIN_IOTLB_NUM = 4, }; +enum { + MOCK_DEV_CACHE_ID_MAX = 3, + MOCK_DEV_CACHE_NUM = 4, +}; + struct iommu_test_cmd { __u32 size; __u32 op; @@ -152,6 +157,7 @@ struct iommu_test_hw_info { /* Should not be equal to any defined value in enum iommu_hwpt_data_type */ #define IOMMU_HWPT_DATA_SELFTEST 0xdead #define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef +#define IOMMU_TEST_DEV_CACHE_DEFAULT 0xbaddad /** * struct iommu_hwpt_selftest @@ -182,4 +188,23 @@ struct iommu_hwpt_invalidate_selftest { #define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef +/* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */ +#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef +#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef + +/** + * struct iommu_viommu_invalidate_selftest - Invalidation data for Mock VIOMMU + * (IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) + * @flags: Invalidate flags + * @cache_id: Invalidate cache entry index + * + * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @cache_id will be ignored + */ +struct iommu_viommu_invalidate_selftest { +#define IOMMU_TEST_INVALIDATE_FLAG_ALL (1 << 0) + __u32 flags; + __u32 vdev_id; + __u32 cache_id; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 33a0fcc0eff7..01556854f2f2 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -163,6 +163,7 @@ struct mock_dev { 
struct device dev; unsigned long flags; int id; + u32 cache[MOCK_DEV_CACHE_NUM]; }; static inline struct mock_dev *to_mock_dev(struct device *dev) @@ -606,9 +607,80 @@ mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, return &mock_nested->domain; } +static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array) +{ + struct iommu_viommu_invalidate_selftest *cmds; + struct iommu_viommu_invalidate_selftest *cur; + struct iommu_viommu_invalidate_selftest *end; + int rc; + + /* A zero-length array is allowed to validate the array type */ + if (array->entry_num == 0 && + array->type == IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) { + array->entry_num = 0; + return 0; + } + + cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL); + if (!cmds) + return -ENOMEM; + cur = cmds; + end = cmds + array->entry_num; + + static_assert(sizeof(*cmds) == 3 * sizeof(u32)); + rc = iommu_copy_struct_from_full_user_array( + cmds, sizeof(*cmds), array, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST); + if (rc) + goto out; + + while (cur != end) { + struct mock_dev *mdev; + struct device *dev; + int i; + + if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) { + rc = -EOPNOTSUPP; + goto out; + } + + if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) { + rc = -EINVAL; + goto out; + } + + xa_lock(&viommu->vdevs); + dev = iommufd_viommu_find_dev(viommu, + (unsigned long)cur->vdev_id); + if (!dev) { + xa_unlock(&viommu->vdevs); + rc = -EINVAL; + goto out; + } + mdev = container_of(dev, struct mock_dev, dev); + + if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) { + /* Invalidate all cache entries and ignore cache_id */ + for (i = 0; i < MOCK_DEV_CACHE_NUM; i++) + mdev->cache[i] = 0; + } else { + mdev->cache[cur->cache_id] = 0; + } + xa_unlock(&viommu->vdevs); + + cur++; + } +out: + array->entry_num = cur - cmds; + kfree(cmds); + return rc; +} + static struct iommufd_viommu_ops mock_viommu_ops = { .free = mock_viommu_free, .alloc_domain_nested = 
mock_viommu_all
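The per-command checks in mock_viommu_cache_invalidate() can be exercised outside the kernel with a simplified model. This sketch makes several assumptions:

- A plain array indexed by vdev_id replaces the xarray lookup through iommufd_viommu_find_dev().
- There is no locking or user copy.
- `sketch_invalidate()` and its types are hypothetical.

As in the posted patch, cache_id is range-checked before the ALL flag is consulted. The number of commands handled is reported back, which the patch does through array->entry_num.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_DEV_CACHE_ID_MAX		3
#define MOCK_DEV_CACHE_NUM		4
#define IOMMU_TEST_INVALIDATE_FLAG_ALL	(1 << 0)

/* Same field layout as struct iommu_viommu_invalidate_selftest. */
struct sketch_inv_cmd {
	uint32_t flags;
	uint32_t vdev_id;
	uint32_t cache_id;
};

struct sketch_mock_dev {
	uint32_t cache[MOCK_DEV_CACHE_NUM];
};

/* Handles cmds in order, stopping at the first invalid one. */
static int sketch_invalidate(struct sketch_mock_dev *devs, size_t ndevs,
			     const struct sketch_inv_cmd *cmds, size_t n,
			     size_t *done)
{
	size_t i;
	int rc = 0;

	for (i = 0; i < n; i++) {
		const struct sketch_inv_cmd *cur = &cmds[i];
		struct sketch_mock_dev *mdev;

		if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
			rc = -EOPNOTSUPP;
			break;
		}
		if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) {
			rc = -EINVAL;
			break;
		}
		if (cur->vdev_id >= ndevs) {	/* stand-in for a failed lookup */
			rc = -EINVAL;
			break;
		}
		mdev = &devs[cur->vdev_id];

		if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
			for (int j = 0; j < MOCK_DEV_CACHE_NUM; j++)
				mdev->cache[j] = 0;
		} else {
			mdev->cache[cur->cache_id] = 0;
		}
	}
	*done = i;	/* like array->entry_num = cur - cmds */
	return rc;
}
```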
Re: [PATCH v4 4/4] selftests: pidfd: add tests for PIDFD_SELF_*
flock64 { > |^~~ > /usr/include/x86_64-linux-gnu/bits/fcntl.h:50:8: note: originally defined here >50 | struct flock64 > |^~~ > make: *** [../lib.mk:221: > /usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup/test_kill] > Error 1 > make: *** Waiting for unfinished jobs > make: Leaving directory > '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup' > 2024-10-23 12:53:56 make quicktest=1 run_tests -C cgroup > make: Entering directory > '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup' > CC test_kill > In file included from /usr/x86_64-linux-gnu/include/asm/fcntl.h:1, > from /usr/x86_64-linux-gnu/include/linux/fcntl.h:5, > from /usr/x86_64-linux-gnu/include/linux/pidfd.h:7, > from ../pidfd/pidfd.h:19, > from test_kill.c:13: > /usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:156:8: error: redefinition > of ‘struct f_owner_ex’ > 156 | struct f_owner_ex { > |^~ > In file included from /usr/include/x86_64-linux-gnu/bits/fcntl.h:61, > from /usr/include/fcntl.h:35, > from ../pidfd/pidfd.h:8: > /usr/include/x86_64-linux-gnu/bits/fcntl-linux.h:274:8: note: originally > defined here > 274 | struct f_owner_ex > |^~ > /usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:196:8: error: redefinition > of ‘struct flock’ > 196 | struct flock { > |^ > /usr/include/x86_64-linux-gnu/bits/fcntl.h:35:8: note: originally defined here >35 | struct flock > |^ > /usr/x86_64-linux-gnu/include/asm-generic/fcntl.h:210:8: error: redefinition > of ‘struct flock64’ > 210 | struct flock64 { > |^~~ > /usr/include/x86_64-linux-gnu/bits/fcntl.h:50:8: note: originally defined here >50 | struct flock64 > |^~~ > make: *** [../lib.mk:222: > /usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup/test_kill] > Error 1 > make: 
Leaving directory > '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-930cb1423ee2522760ffde43455b14df5c0d5487/tools/testing/selftests/cgroup' > > > > The kernel config and materials to reproduce are available at: > https://download.01.org/0day-ci/archive/20241025/202410251504.707d78fc-oliver.s...@intel.com > > > > -- > 0-DAY CI Kernel Test Service > https://github.com/intel/lkp-tests/wiki >
Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
On Fri, Oct 25, 2024 at 10:41 AM Lorenzo Stoakes wrote: > > It is useful to be able to utilise the pidfd mechanism to reference the > current thread or process (from a userland point of view - thread group > leader from the kernel's point of view). > > Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and > PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader. > > For convenience and to avoid confusion from userland's perspective we alias > these: > > * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what > the user will want to use, as they would find it surprising if for > instance fd's were unshared()'d and they wanted to invoke pidfd_getfd() > and that failed. > > * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users > have no concept of thread groups or what a thread group leader is, and > from userland's perspective and nomenclature this is what userland > considers to be a process. > > Due to the refactoring of the central __pidfd_get_pid() function we can > implement this functionality centrally, providing the use of this sentinel > in most functionality which utilises pidfd's. > > We need to explicitly adjust kernel_waitid_prepare() to permit this (though > it wouldn't really make sense to use this there, we provide the ability for > consistency). > > We explicitly disallow use of this in setns(), which would otherwise have > required explicit custom handling, as it doesn't make sense to set the > current calling thread to join the namespace of itself. > > As the callers of pidfd_get_pid() expect an increased reference count on > the pid we do so in the self case, reducing churn and avoiding any breakage > from existing logic which decrements this reference count. > > This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFS, > ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and > pidfd_getfd() system calls. 
> > Things such as polling a pidfs and general fd operations are not supported, > this strictly provides the sentinel for APIs which explicitly accept a > pidfd. > > Reviewed-by: Shakeel Butt > Signed-off-by: Lorenzo Stoakes > --- > include/linux/pid.h| 8 -- > include/uapi/linux/pidfd.h | 15 +++ > kernel/exit.c | 3 ++- > kernel/nsproxy.c | 1 + > kernel/pid.c | 51 -- > 5 files changed, 57 insertions(+), 21 deletions(-) > > diff --git a/include/linux/pid.h b/include/linux/pid.h > index d466890e1b35..3b2ac7567a88 100644 > --- a/include/linux/pid.h > +++ b/include/linux/pid.h > @@ -78,11 +78,15 @@ struct file; > * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd. > * > * @pidfd: The pidfd whose pid we want, or the fd of a /proc/ file > if > - * @alloc_proc is also set. > + * @alloc_proc is also set, or PIDFD_SELF_* to refer to the > current > + * thread or thread group leader. > * @allow_proc: If set, then an fd of a /proc/ file can be passed > instead > * of a pidfd, and this will be used to determine the pid. > + > * @flags: Output variable, if non-NULL, then the file->f_flags of the > - * pidfd will be set here. > + * pidfd will be set here or If PIDFD_SELF_THREAD is set, this > is > + * set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP > then > + * this is set to zero. > * > * Returns: If successful, the pid associated with the pidfd, otherwise an > * error. > diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h > index 565fc0629fff..0ca2ebf906fd 100644 > --- a/include/uapi/linux/pidfd.h > +++ b/include/uapi/linux/pidfd.h > @@ -29,4 +29,19 @@ > #define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9) > #define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10) > > +/* > + * Special sentinel values which can be used to refer to the current thread > or > + * thread group leader (which from a userland perspective is the process). 
> + */ > +#define PIDFD_SELF PIDFD_SELF_THREAD > +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP > + > +#define PIDFD_SELF_THREAD -100 /* Current thread. */ This conflicts with AT_FDCWD, might be worth changing? > +#define PIDFD_SELF_THREAD_GROUP-200 /* Current thread group leader. > */ We might want to pick some range outside of the negative errno space (-4096 IIRC), since we have plenty of values to pick from (2^31 at least). > +static inline int pidfd_is_self_sentinel(pid_t pid) > +{ > + return pid == PIDFD_SELF_THREAD || pid == PIDFD_SELF_THREAD_GROUP; > +} Do we want this in the uapi header? Even if this is useful, it might come with several drawbacks such as breaking scripts that parse kernel headers (and a quick git grep suggests we do have static inlines in headers, but in ra
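Both of Pedro's numeric concerns can be checked mechanically. The sketch below takes the sentinel values from this v5 patch and mirrors the kernel's MAX_ERRNO (4095, from include/linux/err.h). The helper names in_errno_range() and aliases_at_fdcwd() are hypothetical and exist only for illustration. It checks the arithmetic from userspace; it is not kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>	/* AT_FDCWD */
#include <stdbool.h>

/* Sentinel values as proposed in this v5 patch (subject to review). */
#define PIDFD_SELF_THREAD	(-100)
#define PIDFD_SELF_THREAD_GROUP	(-200)

/*
 * Mirrors MAX_ERRNO from include/linux/err.h: values in [-MAX_ERRNO, -1]
 * are conventionally treated as negative error codes.
 */
#define SKETCH_MAX_ERRNO	4095

/* True if v could be mistaken for a negative errno value. */
static bool in_errno_range(long v)
{
	return v < 0 && v >= -SKETCH_MAX_ERRNO;
}

/* True if v is also the special "current directory" dirfd of the *at() calls. */
static bool aliases_at_fdcwd(int v)
{
	return v == AT_FDCWD;
}
```

With these values both sentinels fall inside [-4095, -1], and PIDFD_SELF_THREAD is numerically identical to AT_FDCWD. A value such as -10000 would avoid both overlaps.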
Re: [PATCH V4 00/15] selftests/resctrl: Support diverse platforms with MBM and MBA tests
On Thu, 24 Oct 2024, Reinette Chatre wrote: > Hi Shuah, > > On 10/24/24 3:36 PM, Shuah Khan wrote: > > > > Is this patch series ready to be applied? > > > > I believe it is close ... I would like to give Ilpo some time to peek > at patches 2 and 10 to confirm if I got their fixes right this time. The > rest of the series is ready. Hi, I took a look at those two patches now and they seemed fine to me so this series should be ready to go now. -- i.
Re: [PATCH v7 1/3] modules: Support extended MODVERSIONS info
Matthew Maurer writes: > Adds a new format for MODVERSIONS which stores each field in a separate > ELF section. This initially adds support for variable length names, but > could later be used to add additional fields to MODVERSIONS in a > backwards compatible way if needed. Any new fields will be ignored by > old user tooling, unlike the current format where user tooling cannot > tolerate adjustments to the format (for example making the name field > longer). > > Since PPC munges its version records to strip leading dots, we reproduce > the munging for the new format. Other architectures do not appear to > have architecture-specific usage of this information. > > Signed-off-by: Matthew Maurer > --- > arch/powerpc/kernel/module_64.c | 24 ++- > kernel/module/internal.h| 11 + > kernel/module/main.c| 92 > + > kernel/module/version.c | 45 > 4 files changed, 162 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c > index > e9bab599d0c2745e4d2b5cae04f2c56395c24654..02ada0b057cef6b2f29fa7519a5d52acac740ee5 > 100644 > --- a/arch/powerpc/kernel/module_64.c > +++ b/arch/powerpc/kernel/module_64.c > @@ -355,6 +355,24 @@ static void dedotify_versions(struct modversion_info > *vers, > } > } > > +/* Same as normal versions, remove a leading dot if present. */ > +static void dedotify_ext_version_names(char *str_seq, unsigned long size) > +{ > + unsigned long out = 0; > + unsigned long in; > + char last = '\0'; > + > + for (in = 0; in < size; in++) { > + /* Skip one leading dot */ > + if (last == '\0' && str_seq[in] == '.') > + in++; > + last = str_seq[in]; > + str_seq[out++] = last; > + } > + /* Zero the trailing portion of the names table for robustness */ > + memset(&str_seq[out], 0, size - out); > +} Sorry I realise it's version 7, but although the above looks correct it's kind of dense. I think the below would also work and is (I think) easier to follow, and is more obviously similar to the existing code. 
I'm sure your version is faster, but I don't think it's that performance critical.

static void dedotify_ext_version_names(char *str_seq, unsigned long size)
{
	char *end = str_seq + size;
	char *p = str_seq;

	while (p < end) {
		if (*p == '.')
			memmove(p, p + 1, end - p - 1);

		p += strlen(p) + 1;
	}
}

The tail of str_seq will be filled with nulls as long as the last string was null terminated.

cheers
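A quick way to check that the two versions agree is to run them side by side. The sketch below copies the v7 loop and the memmove() alternative from the review, renamed dedotify_v7() and dedotify_memmove() so that both can live in one file. The names are for illustration only. Both versions assume the table is a run of NUL-terminated names, as the review points out.

```c
#include <string.h>

/* The v7 loop from the patch, renamed so both versions can coexist. */
static void dedotify_v7(char *str_seq, unsigned long size)
{
	unsigned long out = 0;
	unsigned long in;
	char last = '\0';

	for (in = 0; in < size; in++) {
		/* Skip one leading dot */
		if (last == '\0' && str_seq[in] == '.')
			in++;
		last = str_seq[in];
		str_seq[out++] = last;
	}
	/* Zero the trailing portion of the names table for robustness */
	memset(&str_seq[out], 0, size - out);
}

/* The memmove()-based alternative suggested in the review. */
static void dedotify_memmove(char *str_seq, unsigned long size)
{
	char *end = str_seq + size;
	char *p = str_seq;

	while (p < end) {
		/* p always points at the start of a name here */
		if (*p == '.')
			memmove(p, p + 1, end - p - 1);	/* last byte stays NUL */
		p += strlen(p) + 1;
	}
}
```

For the input ".foo\0bar\0.baz", both produce "foo\0bar\0baz" followed by zero padding. In the memmove() variant, each removed dot shifts the remainder of the table, so the worst case is quadratic. That is consistent with the remark that it is slower but not performance critical.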
[PATCH v5 02/13] iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage of IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc TEST_F to cover the IOMMU_VDEVICE_ALLOC ioctl. Signed-off-by: Nicolin Chen --- tools/testing/selftests/iommu/iommufd_utils.h | 27 +++ tools/testing/selftests/iommu/iommufd.c | 20 ++ .../selftests/iommu/iommufd_fail_nth.c| 4 +++ 3 files changed, 51 insertions(+) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index ca09308dad6a..5b17d7b2ac5c 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -790,3 +790,30 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id, EXPECT_ERRNO(_errno, \ _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ type, 0, viommu_id)) + +static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id, + __u64 virt_id, __u32 *vdev_id) +{ + struct iommu_vdevice_alloc cmd = { + .size = sizeof(cmd), + .dev_id = idev_id, + .viommu_id = viommu_id, + .virt_id = virt_id, + }; + int ret; + + ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &cmd); + if (ret) + return ret; + if (vdev_id) + *vdev_id = cmd.out_vdevice_id; + return 0; +} + +#define test_cmd_vdevice_alloc(viommu_id, idev_id, virt_id, vdev_id) \ + ASSERT_EQ(0, _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ +virt_id, vdev_id)) +#define test_err_vdevice_alloc(_errno, viommu_id, idev_id, virt_id, vdev_id) \ + EXPECT_ERRNO(_errno, \ +_test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ +virt_id, vdev_id)) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index b48b22d33ad4..93255403dee4 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -129,6 +129,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_option, IOMMU_OPTION, val64); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved); 
TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id); + TEST_LENGTH(iommu_vdevice_alloc, IOMMU_VDEVICE_ALLOC, __reserved2); #undef TEST_LENGTH } @@ -2473,4 +2474,23 @@ TEST_F(iommufd_viommu, viommu_auto_destroy) { } +TEST_F(iommufd_viommu, vdevice_alloc) +{ + uint32_t viommu_id = self->viommu_id; + uint32_t dev_id = self->device_id; + uint32_t vdev_id = 0; + + if (dev_id) { + /* Set vdev_id to 0x99, unset it, and set to 0x88 */ + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + test_err_vdevice_alloc(EEXIST, viommu_id, dev_id, 0x99, + &vdev_id); + test_ioctl_destroy(vdev_id); + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id); + test_ioctl_destroy(vdev_id); + } else { + test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL); + } +} + TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index e9a980b7729b..28f11b26f836 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -583,6 +583,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) uint32_t idev_id; uint32_t hwpt_id; uint32_t viommu_id; + uint32_t vdev_id; __u64 iova; self->fd = open("/dev/iommu", O_RDWR); @@ -635,6 +636,9 @@ TEST_FAIL_NTH(basic_fail_nth, device) IOMMU_VIOMMU_TYPE_SELFTEST, 0, &viommu_id)) return -1; + if (_test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, 0, &vdev_id)) + return -1; + return 0; } -- 2.43.0
Re: [PATCH net-next 1/2] net: netconsole: selftests: Change the IP subnet
On Fri, Oct 25, 2024 at 07:01:59PM +0200, Petr Machata wrote: > > Breno Leitao writes: > > > Use a less populated IP range to run the tests, as suggested by Petr in > > Link: https://lore.kernel.org/netdev/87ikvukv3s@nvidia.com/. > > > > Suggested-by: Petr Machata > > Signed-off-by: Breno Leitao > > --- > > tools/testing/selftests/drivers/net/netcons_basic.sh | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh > > b/tools/testing/selftests/drivers/net/netcons_basic.sh > > index 06021b2059b7..4ad1e216c6b0 100755 > > --- a/tools/testing/selftests/drivers/net/netcons_basic.sh > > +++ b/tools/testing/selftests/drivers/net/netcons_basic.sh > > @@ -20,9 +20,9 @@ SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")") > > > > # Simple script to test dynamic targets in netconsole > > SRCIF="" # to be populated later > > -SRCIP=192.168.1.1 > > +SRCIP=192.168.2.1 > > I mentioned 192.0.2.0/24, which we commonly use in selftests. The range > is meant for examples and documentation, which is not exactly selftests, > but feels like it's not bending the rules too far. And we shouldn't see > the range in the wild. True, my mistake. I will update it to 192.0.2.1 and 192.0.2.2.
[PATCH RFC v2 0/5] Verify bias functionality for pinctrl_paris driver through new gpio test
This series was motivated by the regression fixed by 166bf8af9122 ("pinctrl: mediatek: common-v2: Fix broken bias-disable for PULL_PU_PD_RSEL_TYPE"). A bug was introduced in the pinctrl_paris driver which prevented certain pins from having their bias configured.

Running this test on the mt8195-tomato platform with the test plan included below[1] shows the test passing with the fix applied, but failing without the fix:

With fix:
$ ./gpio-setget-config.py
TAP version 13
# Using test plan file: ./google,tomato.yaml
1..3
ok 1 pinctrl_paris.34.pull-up
ok 2 pinctrl_paris.34.pull-down
ok 3 pinctrl_paris.34.disabled
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Without fix:
$ ./gpio-setget-config.py
TAP version 13
# Using test plan file: ./google,tomato.yaml
1..3
# Bias doesn't match: Expected pull-up, read pull-down.
not ok 1 pinctrl_paris.34.pull-up
ok 2 pinctrl_paris.34.pull-down
# Bias doesn't match: Expected disabled, read pull-down.
not ok 3 pinctrl_paris.34.disabled
# Totals: pass:1 fail:2 xfail:0 xpass:0 skip:0 error:0

To achieve this, the first three patches expose bias configuration through the GPIO API in the MediaTek pinctrl drivers (notably pinctrl_paris), patch 4 extends the gpio-mockup-cdev utility for use by patch 5, and patch 5 introduces a new GPIO kselftest. The test takes a test plan in YAML, which can be tailored per platform to specify the configurations to test. It sets each pin configuration and reads it back to verify that they match, and therefore that the driver behaves as expected.

Since the GPIO uAPI only allows setting the pin configuration, reading it back is done through pinconf-pins in the pinctrl debugfs directory. The test currently only verifies bias, but it would be easy to extend it to verify other pin configurations.

The test plan YAML file can be customized for each use case and is platform-dependent. For that reason, only an example is included in patch 5, and users are expected to provide their own test plan.
That said, the aim is to collect test plans for ease of use at [2].

[1] This is the test plan used for mt8195-tomato:

- label: "pinctrl_paris"
  tests:
  # Pin 34 has type MTK_PULL_PU_PD_RSEL_TYPE and is unused.
  # Setting bias to MTK_PULL_PU_PD_RSEL_TYPE pins was fixed by
  # 166bf8af9122 ("pinctrl: mediatek: common-v2: Fix broken bias-disable for PULL_PU_PD_RSEL_TYPE")
  - pin: 34
    bias: "pull-up"
  - pin: 34
    bias: "pull-down"
  - pin: 34
    bias: "disabled"

[2] https://github.com/kernelci/platform-test-parameters

Signed-off-by: Nícolas F. R. A. Prado
---
Changes in v2:
- Added patches 2 and 3 enabling the extra GPIO pin configurations on the other mediatek drivers: pinctrl-moore and pinctrl-mtk-common
- Tweaked function name in patch 1: mtk_pinconf_set -> mtk_paris_pin_config_set, to make it clear it is not a pinconf_ops
- Adjusted commit message to make it clear the current support is limited to pins supported by the EINT controller
- Link to v1: https://lore.kernel.org/r/20240909-kselftest-gpio-set-get-config-v1-0-16a065afc...@collabora.com

---
Nícolas F. R. A. Prado (5):
      pinctrl: mediatek: paris: Expose more configurations to GPIO set_config
      pinctrl: mediatek: moore: Expose more configurations to GPIO set_config
      pinctrl: mediatek: common: Expose more configurations to GPIO set_config
      selftest: gpio: Add wait flag to gpio-mockup-cdev
      selftest: gpio: Add a new set-get config test

 drivers/pinctrl/mediatek/pinctrl-moore.c | 283 +++--
 drivers/pinctrl/mediatek/pinctrl-mtk-common.c | 48 ++--
 drivers/pinctrl/mediatek/pinctrl-paris.c | 26 +-
 tools/testing/selftests/gpio/Makefile | 2 +-
 tools/testing/selftests/gpio/gpio-mockup-cdev.c| 14 +-
 .../gpio-set-get-config-example-test-plan.yaml | 15 ++
 .../testing/selftests/gpio/gpio-set-get-config.py | 183 +
 7 files changed, 395 insertions(+), 176 deletions(-)
---
base-commit: a39230ecf6b3057f5897bc4744a790070cfbe7a8
change-id: 20240906-kselftest-gpio-set-get-config-6e5bb670c1a5

Best regards,
-- 
Nícolas F. R. A. Prado
[PATCH RFC v2 1/5] pinctrl: mediatek: paris: Expose more configurations to GPIO set_config
Currently the set_config callback in the gpio_chip registered by the pinctrl_paris driver only supports configuring a single parameter on specific pins (the input debounce of the EINT controller, on pins that support it), even though many other configurations are already implemented and available through the pinctrl API for configuration of pins by the Devicetree and other drivers. Expose all configurations currently implemented through the GPIO API so they can also be set from userspace, which is particularly useful to allow testing them from userspace. Signed-off-by: Nícolas F. R. A. Prado --- drivers/pinctrl/mediatek/pinctrl-paris.c | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/pinctrl/mediatek/pinctrl-paris.c b/drivers/pinctrl/mediatek/pinctrl-paris.c index 87e958d827bf939aa6006794287698be4936f25e..c9455de266a447ab7f5446c1511bef0ef9c9128e 100644 --- a/drivers/pinctrl/mediatek/pinctrl-paris.c +++ b/drivers/pinctrl/mediatek/pinctrl-paris.c @@ -255,10 +255,9 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev, return err; } -static int mtk_pinconf_set(struct pinctrl_dev *pctldev, unsigned int pin, - enum pin_config_param param, u32 arg) +static int mtk_paris_pin_config_set(struct mtk_pinctrl *hw, unsigned int pin, + enum pin_config_param param, u32 arg) { - struct mtk_pinctrl *hw = pinctrl_dev_get_drvdata(pctldev); const struct mtk_pin_desc *desc; int err = -ENOTSUPP; u32 reg; @@ -795,9 +794,9 @@ static int mtk_pconf_group_set(struct pinctrl_dev *pctldev, unsigned group, int i, ret; for (i = 0; i < num_configs; i++) { - ret = mtk_pinconf_set(pctldev, grp->pin, - pinconf_to_config_param(configs[i]), - pinconf_to_config_argument(configs[i])); + ret = mtk_paris_pin_config_set(hw, grp->pin, + pinconf_to_config_param(configs[i]), + pinconf_to_config_argument(configs[i])); if (ret < 0) return ret; @@ -937,18 +936,19 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, unsigned int offset, { struct mtk_pinctrl *hw = 
gpiochip_get_data(chip); const struct mtk_pin_desc *desc; - u32 debounce; + enum pin_config_param param = pinconf_to_config_param(config); + u32 arg = pinconf_to_config_argument(config); desc = (const struct mtk_pin_desc *)&hw->soc->pins[offset]; - if (!hw->eint || - pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE || - desc->eint.eint_n == EINT_NA) - return -ENOTSUPP; + if (param == PIN_CONFIG_INPUT_DEBOUNCE) { + if (!hw->eint || desc->eint.eint_n == EINT_NA) + return -ENOTSUPP; - debounce = pinconf_to_config_argument(config); + return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, arg); + } - return mtk_eint_set_debounce(hw->eint, desc->eint.eint_n, debounce); + return mtk_paris_pin_config_set(hw, offset, param, arg); } static int mtk_build_gpiochip(struct mtk_pinctrl *hw) -- 2.47.0
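The functional change in mtk_gpio_set_config() is a dispatch decision. Debounce still goes to the EINT controller, and every other parameter is now forwarded to mtk_paris_pin_config_set() instead of being rejected. The userspace model below captures only that decision. struct sketch_pin, enum sketch_route and the other sketch_* names are hypothetical stand-ins for the driver's types. ENOTSUPP is redefined locally as the kernel-internal value 524, because userspace headers do not provide it.

```c
#include <stdbool.h>

/* Kernel-internal ENOTSUPP (include/linux/errno.h); not in userspace headers. */
#define SKETCH_ENOTSUPP	524

/* Hypothetical stand-ins for a few enum pin_config_param values. */
enum sketch_param {
	SKETCH_INPUT_DEBOUNCE,
	SKETCH_BIAS_PULL_UP,
	SKETCH_BIAS_DISABLE,
};

/* Hypothetical stand-in for the per-pin EINT information. */
struct sketch_pin {
	bool has_eint;		/* models desc->eint.eint_n != EINT_NA */
};

/* Records which path a request would take instead of touching hardware. */
enum sketch_route {
	ROUTE_NONE,
	ROUTE_EINT_DEBOUNCE,	/* models mtk_eint_set_debounce() */
	ROUTE_PINCONF,		/* models mtk_paris_pin_config_set() */
};

/* hw_has_eint models a non-NULL hw->eint. */
static int sketch_set_config(const struct sketch_pin *pin, bool hw_has_eint,
			     enum sketch_param param, enum sketch_route *route)
{
	*route = ROUTE_NONE;

	if (param == SKETCH_INPUT_DEBOUNCE) {
		/* Debounce is still only possible through the EINT controller. */
		if (!hw_has_eint || !pin->has_eint)
			return -SKETCH_ENOTSUPP;
		*route = ROUTE_EINT_DEBOUNCE;
		return 0;
	}

	/*
	 * Everything else is forwarded to the helper shared with pinconf. The
	 * real driver returns that helper's result, which may itself be an error.
	 */
	*route = ROUTE_PINCONF;
	return 0;
}
```

Before the patch, the last branch returned -ENOTSUPP for every non-debounce parameter. That is why bias could not be set through the GPIO uAPI.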
[PATCH RFC v2 4/5] selftest: gpio: Add wait flag to gpio-mockup-cdev
Add a -w flag to the gpio-mockup-cdev utility that causes the program to wait until a signal is received before exiting, even when its behavior is to retrieve the GPIO value of the line. This allows using this utility to keep a GPIO line configured even when in input mode, which will be relied on in other tests. Signed-off-by: Nícolas F. R. A. Prado --- tools/testing/selftests/gpio/gpio-mockup-cdev.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/gpio/gpio-mockup-cdev.c b/tools/testing/selftests/gpio/gpio-mockup-cdev.c index d1640f44f8ac2a6fda7a5f75605f83fcaa165dc0..f674dcafa60a02cb1739f3cfae8963dc09efba74 100644 --- a/tools/testing/selftests/gpio/gpio-mockup-cdev.c +++ b/tools/testing/selftests/gpio/gpio-mockup-cdev.c @@ -15,6 +15,7 @@ #include #include #include +#include #define CONSUMER "gpio-mockup-cdev" @@ -95,6 +96,7 @@ static void usage(char *prog) printf(" (default is to leave bias unchanged):\n"); printf("-l: set line active low (default is active high)\n"); printf("-s: set line value (default is to get line value)\n"); + printf("-w: wait even in get mode\n"); printf("-u: uAPI version to use (default is 2)\n"); exit(-1); } @@ -120,13 +122,14 @@ int main(int argc, char *argv[]) unsigned int offset, val = 0, abiv; uint32_t flags_v1; uint64_t flags_v2; + bool wait = false; abiv = 2; ret = 0; flags_v1 = GPIOHANDLE_REQUEST_INPUT; flags_v2 = GPIO_V2_LINE_FLAG_INPUT; - while ((opt = getopt(argc, argv, "lb:s:u:")) != -1) { + while ((opt = getopt(argc, argv, "lb:s:u:w")) != -1) { switch (opt) { case 'l': flags_v1 |= GPIOHANDLE_REQUEST_ACTIVE_LOW; @@ -150,10 +153,14 @@ int main(int argc, char *argv[]) flags_v1 |= GPIOHANDLE_REQUEST_OUTPUT; flags_v2 &= ~GPIO_V2_LINE_FLAG_INPUT; flags_v2 |= GPIO_V2_LINE_FLAG_OUTPUT; + wait = true; break; case 'u': abiv = atoi(optarg); break; + case 'w': + wait = true; + break; default: usage(argv[0]); } @@ -183,9 +190,10 @@ int main(int argc, char *argv[]) return lfd; } - if (flags_v2 
& GPIO_V2_LINE_FLAG_OUTPUT) { + if (wait) wait_signal(); - } else { + + if (flags_v2 & GPIO_V2_LINE_FLAG_INPUT) { if (abiv == 1) ret = get_value_v1(lfd); else -- 2.47.0
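After this patch, waiting is decided separately from the line direction. -s still implies waiting, -w now requests it in get mode as well, and the value is only read when the line is an input. A minimal sketch of that option handling follows. struct mockup_opts, parse_mockup_opts() and reads_value() are hypothetical. Value parsing is assumed to use atoi(), and -l, -b and -u are accepted but not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical summary of what the option loop decides. */
struct mockup_opts {
	bool output;	/* -s given: drive the line rather than read it */
	bool wait;	/* block until a signal arrives before exiting */
	int value;	/* value to drive in output mode */
};

/*
 * Returns 0 on success or -1 on an unknown option (the real tool calls
 * usage() and exits instead).
 */
static int parse_mockup_opts(int argc, char *argv[], struct mockup_opts *o)
{
	int opt;

	o->output = false;
	o->wait = false;
	o->value = 0;
	optind = 1;	/* reset the scan position so repeated calls work */
	opterr = 0;	/* suppress getopt's own diagnostics */

	while ((opt = getopt(argc, argv, "lb:s:u:w")) != -1) {
		switch (opt) {
		case 's':
			/* Output mode: the line must stay requested, so wait. */
			o->value = atoi(optarg);
			o->output = true;
			o->wait = true;
			break;
		case 'w':
			/* New flag: also wait in get (input) mode. */
			o->wait = true;
			break;
		case 'l':
		case 'b':
		case 'u':
			/* Active-low, bias and uAPI version: not modelled here. */
			break;
		default:
			return -1;
		}
	}
	return 0;
}

/* After waiting (if requested), only input mode reads the line value. */
static bool reads_value(const struct mockup_opts *o)
{
	return !o->output;
}
```

With -w, the line stays requested with its bias applied until the test sends a signal. This is what lets the set-get config test inspect debugfs while an input line is still held.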
[PATCH RFC v2 5/5] selftest: gpio: Add a new set-get config test
Add a new kselftest that sets a configuration to a GPIO line and then gets it back to verify that it was correctly carried out by the driver. Setting a configuration is done through the GPIO uAPI, but retrieving it is done through the debugfs interface since that is the only place where it can be retrieved from userspace. The test reads the test plan from a YAML file, which includes the chips and pin settings to set and validate. Signed-off-by: Nícolas F. R. A. Prado --- tools/testing/selftests/gpio/Makefile | 2 +- .../gpio-set-get-config-example-test-plan.yaml | 15 ++ .../testing/selftests/gpio/gpio-set-get-config.py | 183 + 3 files changed, 199 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/gpio/Makefile b/tools/testing/selftests/gpio/Makefile index e0884390447dcfffe4ca0b4fa0f1669463bb669c..bdfeb0c9aaddc436df77ada1d5ac0c80890960a7 100644 --- a/tools/testing/selftests/gpio/Makefile +++ b/tools/testing/selftests/gpio/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 -TEST_PROGS := gpio-mockup.sh gpio-sim.sh +TEST_PROGS := gpio-mockup.sh gpio-sim.sh gpio-set-get-config.py TEST_FILES := gpio-mockup-sysfs.sh TEST_GEN_PROGS_EXTENDED := gpio-mockup-cdev gpio-chip-info gpio-line-name CFLAGS += -O2 -g -Wall $(KHDR_INCLUDES) diff --git a/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml b/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml new file mode 100644 index ..3b749be3c8dcf6822b7531424a6b1f8fca840a65 --- /dev/null +++ b/tools/testing/selftests/gpio/gpio-set-get-config-example-test-plan.yaml @@ -0,0 +1,15 @@ +# SPDX-License-Identifier: GPL-2.0 +# Top-level contains a list of the GPIO chips that will be tested. Each one is +# chosen based on the GPIO chip's info label. 
+- label: "gpiochip_device_label" + # For each GPIO chip, multiple pin configurations can be tested, which are + # listed under 'tests' + tests: + # pin indicates the pin number to test + - pin: 34 +# bias can be 'pull-up', 'pull-down', 'disabled' +bias: "pull-up" + - pin: 34 +bias: "pull-down" + - pin: 34 +bias: "disabled" diff --git a/tools/testing/selftests/gpio/gpio-set-get-config.py b/tools/testing/selftests/gpio/gpio-set-get-config.py new file mode 100755 index ..6f1444c8d46bcfc226f414520b74f4a59725854f --- /dev/null +++ b/tools/testing/selftests/gpio/gpio-set-get-config.py @@ -0,0 +1,183 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2024 Collabora Ltd + +# +# This test validates GPIO pin configuration. It takes a test plan in YAML (see +# gpio-set-get-config-example-test-plan.yaml) and sets and gets back each pin +# configuration described in the plan and checks that they match in order to +# validate that they are being applied correctly. +# +# When the file name for the test plan is not provided through --test-plan, it +# will be guessed based on the platform ID (DT compatible or DMI). 
+#
+
+import time
+import os
+import sys
+import argparse
+import re
+import subprocess
+import glob
+import signal
+
+import yaml
+
+# Allow ksft module to be imported from different directory
+this_dir = os.path.dirname(os.path.realpath(__file__))
+sys.path.append(os.path.join(this_dir, "../kselftest/"))
+
+import ksft
+
+
+def config_pin(chip_dev, pin_config):
+    flags = []
+    if pin_config.get("bias"):
+        flags += f"-b {pin_config['bias']}".split()
+    flags += ["-w", chip_dev, str(pin_config["pin"])]
+    gpio_mockup_cdev_path = os.path.join(this_dir, "gpio-mockup-cdev")
+    return subprocess.Popen([gpio_mockup_cdev_path] + flags)
+
+
+def get_bias_debugfs(chip_debugfs_path, pin):
+    with open(os.path.join(chip_debugfs_path, "pinconf-pins")) as f:
+        for l in f:
+            m = re.match(rf"pin {pin}.*bias (?P<bias>(pull )?\w+)", l)
+            if m:
+                return m.group("bias")
+
+
+def check_config_pin(chip, chip_debugfs_dir, pin_config):
+    test_passed = True
+
+    if pin_config.get("bias"):
+        bias = get_bias_debugfs(chip_debugfs_dir, pin_config["pin"])
+        # Convert "pull up" / "pull down" to "pull-up" / "pull-down"
+        bias = bias.replace(" ", "-")
+        if bias != pin_config["bias"]:
+            ksft.print_msg(
+                f"Bias doesn't match: Expected {pin_config['bias']}, read {bias}."
+            )
+            test_passed = False
+
+    ksft.test_result(
+        test_passed,
+        f"{chip['label']}.{pin_config['pin']}.{pin_config['bias']}",
+    )
+
+
+def get_devfs_chip_file(chip_dict):
+    gpio_chip_info_path = os.path.join(this_dir, 'gpio-chip-info')
+    for f in glob.glob("/dev/gpiochip*"):
+        proc = subprocess.run(
+            f"{gpio_chip_info_path} {f} label".split(), capture_output=True, text=True
+        )
+        if proc.returncode:
+            ksft.print_msg(f"Error opening gpio device {
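get_bias_debugfs() relies on a regular expression over pinconf-pins lines. The sketch below is a hypothetical C equivalent for a single line, with one deliberate difference. It requires an exact pin number, whereas the regex does a prefix match and would also accept a "pin 345" line when asked for pin 34. That prefix match is probably harmless in practice if pins are listed in ascending order and the first match wins. The function name parse_bias_line() is illustrative. The sample lines in the tests are invented to fit the regex and are not copied from a real MediaTek debugfs dump.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical C counterpart of get_bias_debugfs() for one pinconf-pins
 * line. Writes the bias in the test plan's spelling ("pull-up",
 * "pull-down", "disabled") to out and returns true on success.
 */
static bool parse_bias_line(const char *line, unsigned int pin,
			    char *out, size_t outlen)
{
	const char *b = NULL, *hit;
	const char *prefix = "";
	unsigned int n;
	int consumed = 0;
	size_t len = 0;
	int ret;

	/* Exact pin match, so "pin 34" does not also accept "pin 345". */
	if (sscanf(line, "pin %u%n", &n, &consumed) != 1 || n != pin)
		return false;

	/* Take the last "bias ", like the greedy ".*bias " in the regex. */
	for (hit = strstr(line + consumed, "bias "); hit;
	     hit = strstr(hit + 1, "bias "))
		b = hit;
	if (!b)
		return false;
	b += strlen("bias ");

	/* Optional "pull " then one word, normalised to "pull-<word>". */
	if (strncmp(b, "pull ", 5) == 0) {
		prefix = "pull-";
		b += 5;
	}
	while (isalnum((unsigned char)b[len]) || b[len] == '_')
		len++;
	if (len == 0)
		return false;

	ret = snprintf(out, outlen, "%s%.*s", prefix, (int)len, b);
	return ret >= 0 && (size_t)ret < outlen;
}
```

Folding the "pull up" to "pull-up" conversion into the parser means callers can compare directly against the test plan's bias strings.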
Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
On Fri, Oct 25, 2024 at 11:44:34AM -0700, John Hubbard wrote: > On 10/25/24 11:38 AM, Pedro Falcato wrote: > > On Fri, Oct 25, 2024 at 6:41 PM John Hubbard wrote: > > > > > > On 10/25/24 5:50 AM, Pedro Falcato wrote: > > > > On Fri, Oct 25, 2024 at 10:41 AM Lorenzo Stoakes > > > > wrote: > > > ... > > > > > +static inline int pidfd_is_self_sentinel(pid_t pid) > > > > > +{ > > > > > + return pid == PIDFD_SELF_THREAD || pid == > > > > > PIDFD_SELF_THREAD_GROUP; > > > > > +} > > > > > > > > Do we want this in the uapi header? Even if this is useful, it might > > > > come with several drawbacks such as breaking scripts that parse kernel > > > > headers (and a quick git grep suggests we do have static inlines in > > > > headers, but in rather obscure ones) and breaking C89: > > > > > > > > > > Let's please not say "C89" anymore, we've moved on! :) > > > > > > The notes in [1], which is now nearly 2.5 years old, discuss the move to > > > C11, and specifically how to handle the inline keyword. > > > > That seems to only apply to the kernel internally, uapi headers are > > Yes. > > > included from userspace too (-std=c89 -pedantic doesn't know what a > > gnu extension is). And uapi headers _generally_ keep to defining > > constants and structs, nothing more. > > OK Because a lot of people using -ANSI- C89 are importing a very new linux feature header. And let's ignore the hundreds of existing uses... OK. The rules, unstated anywhere, are that we must support 1972-era C in an optional header for a feature available only in new kernels because somebody somewhere is using a VAX-11 and gosh darn it they can't change their toolchain! And you had better make sure you don't wear out those tape drums... > > > I don't know what the guidelines for uapi headers are nowadays, but we > > generally want to not break userspace. 
> > > > > > > > I think it's quite clear at this point, that we should not hold up new > > > work, based on concerns about handling the inline keyword, nor about > > > C89. > > > > Right, but the correct solution is probably to move > > pidfd_is_self_sentinel to some other place, since it's not even > > supposed to be used by userspace (it's semantically useless to > > userspace, and it's only two users are in the kernel, kernel/pid.c and > > exit.c). > > > > Yes, if userspace absolutely doesn't need nor want this, then putting > it in a non-uapi header does sound like the right move. The bike shed should be blue! Wait no no, it should be red... Hang on yellow yes! Yellow's great! No wait - did we _test_ yellow in the way I wanted... I mean for me this isn't a big deal - we declare the defines here, it makes sense to have a very very simple inline function. It's not like userspace is overly hurt by this... Also I did explain there's no obvious header to put this in in the kernel and I'm not introducing one, sorry. Anyway, if you guys feel strongly enough about this, I'll respin again and just open-code this trivial check where it's used. > > > thanks, > -- > John Hubbard >
[PATCH v12 7/7] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware
The new TEE remoteproc driver is used to manage remote firmware in a secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is introduced to delegate the loading of the firmware to the trusted execution context. In such cases, the firmware should be signed and adhere to the image format defined by the TEE. Signed-off-by: Arnaud Pouliquen --- updates vs previous version - rename structures, variables and function from tee_rproc_xxx to rproc_tee_xxx, - rework code to take into account rproc_tee_register and rproc_tee_unregister APIs update, - optimize code around dev_err_probe() when rproc_tee_register() fails. --- drivers/remoteproc/stm32_rproc.c | 57 ++-- 1 file changed, 54 insertions(+), 3 deletions(-) diff --git a/drivers/remoteproc/stm32_rproc.c b/drivers/remoteproc/stm32_rproc.c index 288bd70c7861..7875b26a38a5 100644 --- a/drivers/remoteproc/stm32_rproc.c +++ b/drivers/remoteproc/stm32_rproc.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -255,6 +256,19 @@ static int stm32_rproc_release(struct rproc *rproc) return 0; } +static int stm32_rproc_tee_stop(struct rproc *rproc) +{ + int err; + + stm32_rproc_request_shutdown(rproc); + + err = rproc_tee_stop(rproc); + if (err) + return err; + + return stm32_rproc_release(rproc); +} + static int stm32_rproc_prepare(struct rproc *rproc) { struct device *dev = rproc->dev.parent; @@ -691,8 +705,20 @@ static const struct rproc_ops st_rproc_ops = { .get_boot_addr = rproc_elf_get_boot_addr, }; +static const struct rproc_ops st_rproc_tee_ops = { + .prepare= stm32_rproc_prepare, + .start = rproc_tee_start, + .stop = stm32_rproc_tee_stop, + .kick = stm32_rproc_kick, + .load = rproc_tee_load_fw, + .parse_fw = rproc_tee_parse_fw, + .find_loaded_rsc_table = rproc_tee_find_loaded_rsc_table, + .release_fw = rproc_tee_release_fw, +}; + static const struct of_device_id stm32_rproc_match[] = { { .compatible = "st,stm32mp1-m4" }, + { .compatible = "st,stm32mp1-m4-tee" }, {}, }; 
MODULE_DEVICE_TABLE(of, stm32_rproc_match); @@ -853,15 +879,36 @@ static int stm32_rproc_probe(struct platform_device *pdev) struct device_node *np = dev->of_node; struct rproc *rproc; unsigned int state; + u32 proc_id; int ret; ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(32)); if (ret) return ret; - rproc = devm_rproc_alloc(dev, np->name, &st_rproc_ops, NULL, sizeof(*ddata)); - if (!rproc) - return -ENOMEM; + if (of_device_is_compatible(np, "st,stm32mp1-m4-tee")) { + /* +* Delegate the firmware management to the secure context. +* The firmware loaded has to be signed. +*/ + ret = of_property_read_u32(np, "st,proc-id", &proc_id); + if (ret) { + dev_err(dev, "failed to read st,rproc-id property\n"); + return ret; + } + + rproc = devm_rproc_alloc(dev, np->name, &st_rproc_tee_ops, NULL, sizeof(*ddata)); + if (!rproc) + return -ENOMEM; + + ret = rproc_tee_register(dev, rproc, proc_id); + if (ret) + return dev_err_probe(dev, ret, "signed firmware not supported by TEE\n"); + } else { + rproc = devm_rproc_alloc(dev, np->name, &st_rproc_ops, NULL, sizeof(*ddata)); + if (!rproc) + return -ENOMEM; + } ddata = rproc->priv; @@ -913,6 +960,8 @@ static int stm32_rproc_probe(struct platform_device *pdev) dev_pm_clear_wake_irq(dev); device_init_wakeup(dev, false); } + rproc_tee_unregister(rproc); + return ret; } @@ -933,6 +982,8 @@ static void stm32_rproc_remove(struct platform_device *pdev) dev_pm_clear_wake_irq(dev); device_init_wakeup(dev, false); } + + rproc_tee_unregister(rproc); } static int stm32_rproc_suspend(struct device *dev) -- 2.25.1
[PATCH v12 2/7] remoteproc: Add TEE support
Add a remoteproc TEE (Trusted Execution Environment) driver that is probed by the TEE bus. If the associated trusted application is supported on the secure side, this driver offers a client interface for loading firmware through the secure side. The firmware can then be authenticated by the secure trusted application. Signed-off-by: Arnaud Pouliquen --- Updates vs previous version: - rename structures, functions, and variables from "tee_rproc_xxx" to "rproc_tee_xxx", - update rproc_tee_register to return an error instead of "struct rproc_tee *" pointer, - update rproc_tee_unregister argument from "struct rproc_tee *trproc" to "struct rproc *rproc", - reword MODULE_DESCRIPTION. --- drivers/remoteproc/Kconfig | 10 + drivers/remoteproc/Makefile | 1 + drivers/remoteproc/remoteproc_tee.c | 510 include/linux/remoteproc.h | 4 + include/linux/remoteproc_tee.h | 106 ++ 5 files changed, 631 insertions(+) create mode 100644 drivers/remoteproc/remoteproc_tee.c create mode 100644 include/linux/remoteproc_tee.h diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig index 955e4e38477e..d0284220a194 100644 --- a/drivers/remoteproc/Kconfig +++ b/drivers/remoteproc/Kconfig @@ -23,6 +23,16 @@ config REMOTEPROC_CDEV It's safe to say N if you don't want to use this interface. +config REMOTEPROC_TEE + tristate "Remoteproc support by a TEE application" + depends on OPTEE + help + Support a remote processor with a TEE application. The Trusted + Execution Context is responsible for loading the trusted firmware + image and managing the remote processor's lifecycle. + + It's safe to say N if you don't want to use remoteproc TEE.
+ config IMX_REMOTEPROC tristate "i.MX remoteproc support" depends on ARCH_MXC diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile index 5ff4e2fee4ab..f77e0abe8349 100644 --- a/drivers/remoteproc/Makefile +++ b/drivers/remoteproc/Makefile @@ -11,6 +11,7 @@ remoteproc-y += remoteproc_sysfs.o remoteproc-y += remoteproc_virtio.o remoteproc-y += remoteproc_elf_loader.o obj-$(CONFIG_REMOTEPROC_CDEV) += remoteproc_cdev.o +obj-$(CONFIG_REMOTEPROC_TEE) += remoteproc_tee.o obj-$(CONFIG_IMX_REMOTEPROC) += imx_rproc.o obj-$(CONFIG_IMX_DSP_REMOTEPROC) += imx_dsp_rproc.o obj-$(CONFIG_INGENIC_VPU_RPROC) += ingenic_rproc.o diff --git a/drivers/remoteproc/remoteproc_tee.c b/drivers/remoteproc/remoteproc_tee.c new file mode 100644 index ..f258b9304daf --- /dev/null +++ b/drivers/remoteproc/remoteproc_tee.c @@ -0,0 +1,510 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) STMicroelectronics 2024 + * Author: Arnaud Pouliquen + */ + +#include +#include +#include +#include +#include +#include +#include + +#define MAX_TEE_PARAM_ARRAY_MEMBER 4 + +/* + * Authenticate the firmware and load it into the remote processor memory + * + * [in] params[0].value.a: unique 32bit identifier of the remote processor + * [in] params[1].memref: buffer containing the firmware image + */ +#define TA_RPROC_FW_CMD_LOAD_FW 1 + +/* + * Start the remote processor + * + * [in] params[0].value.a: unique 32bit identifier of the remote processor + */ +#define TA_RPROC_FW_CMD_START_FW 2 + +/* + * Stop the remote processor + * + * [in] params[0].value.a: unique 32bit identifier of the remote processor + */ +#define TA_RPROC_FW_CMD_STOP_FW 3 + +/* + * Return the address of the resource table, or 0 if not found. + * No check is done to verify that the address returned is accessible by + * the non-secure context. If the resource table is loaded in a protected + * memory, access by the non-secure context will lead to a data abort.
+ * + * [in] params[0].value.a: unique 32bit identifier of the remote processor + * [out] params[1].value.a: 32bit LSB resource table memory address + * [out] params[1].value.b: 32bit MSB resource table memory address + * [out] params[2].value.a: 32bit LSB resource table memory size + * [out] params[2].value.b: 32bit MSB resource table memory size + */ +#define TA_RPROC_FW_CMD_GET_RSC_TABLE 4 + +/* + * Return the address of the core dump + * + * [in] params[0].value.a: unique 32bit identifier of the remote processor + * [out] params[1].memref: address of the core dump image if it exists, + * else NULL + */ +#define TA_RPROC_FW_CMD_GET_COREDUMP 5 + +/* + * Release remote processor firmware images and associated resources. + * This command should be used in case an error occurs between the loading of + * the firmware images (TA_RPROC_FW_CMD_LOAD_FW) and the starting of the remote + * processor (TA_RPROC_FW_CMD_START_FW) or after stopping the remote processor +
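The `TA_RPROC_FW_CMD_GET_RSC_TABLE` description splits a 64-bit address and a 64-bit size across the `a` (LSB) and `b` (MSB) halves of two value parameters. A minimal sketch of how a caller could rebuild them, assuming the layout exactly as documented in the comment; `struct demo_tee_value` and the `demo_` helpers are hypothetical stand-ins, not the OP-TEE client types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a TEE "value" parameter. */
struct demo_tee_value {
	uint32_t a; /* 32-bit LSB half */
	uint32_t b; /* 32-bit MSB half */
};

/* Rebuild a 64-bit quantity from its LSB and MSB halves. */
static uint64_t demo_join_u32_pair(uint32_t lsb, uint32_t msb)
{
	return ((uint64_t)msb << 32) | lsb;
}

/*
 * Decode params[1] (address) and params[2] (size). Returns 0 on success,
 * or -1 when the TA reports address 0, i.e. "resource table not found".
 */
static int demo_decode_rsc_table(const struct demo_tee_value *addr_param,
				 const struct demo_tee_value *size_param,
				 uint64_t *addr, uint64_t *size)
{
	*addr = demo_join_u32_pair(addr_param->a, addr_param->b);
	*size = demo_join_u32_pair(size_param->a, size_param->b);
	return *addr ? 0 : -1;
}
```

As the command description warns, a non-zero address says nothing about whether the non-secure side may actually access that memory.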
[PATCH v12 6/7] remoteproc: stm32: Create sub-functions to request shutdown and release
To prepare for the support of TEE remoteproc, create sub-functions that can be used in both cases, with and without remoteproc TEE support. Signed-off-by: Arnaud Pouliquen --- drivers/remoteproc/stm32_rproc.c | 82 +++- 1 file changed, 49 insertions(+), 33 deletions(-) diff --git a/drivers/remoteproc/stm32_rproc.c b/drivers/remoteproc/stm32_rproc.c index 8c7f7950b80e..288bd70c7861 100644 --- a/drivers/remoteproc/stm32_rproc.c +++ b/drivers/remoteproc/stm32_rproc.c @@ -209,6 +209,52 @@ static int stm32_rproc_mbox_idx(struct rproc *rproc, const unsigned char *name) return -EINVAL; } +static void stm32_rproc_request_shutdown(struct rproc *rproc) +{ + struct stm32_rproc *ddata = rproc->priv; + int err, idx; + + /* Request shutdown of the remote processor */ + if (rproc->state != RPROC_OFFLINE && rproc->state != RPROC_CRASHED) { + idx = stm32_rproc_mbox_idx(rproc, STM32_MBX_SHUTDOWN); + if (idx >= 0 && ddata->mb[idx].chan) { + err = mbox_send_message(ddata->mb[idx].chan, "detach"); + if (err < 0) + dev_warn(&rproc->dev, "warning: remote FW shutdown without ack\n"); + } + } +} + +static int stm32_rproc_release(struct rproc *rproc) +{ + struct stm32_rproc *ddata = rproc->priv; + unsigned int err = 0; + + /* To allow platform Standby power mode, set remote proc Deep Sleep */ + if (ddata->pdds.map) { + err = regmap_update_bits(ddata->pdds.map, ddata->pdds.reg, +ddata->pdds.mask, 1); + if (err) { + dev_err(&rproc->dev, "failed to set pdds\n"); + return err; + } + } + + /* Update coprocessor state to OFF if available */ + if (ddata->m4_state.map) { + err = regmap_update_bits(ddata->m4_state.map, +ddata->m4_state.reg, +ddata->m4_state.mask, +M4_STATE_OFF); + if (err) { + dev_err(&rproc->dev, "failed to set copro state\n"); + return err; + } + } + + return 0; +} + static int stm32_rproc_prepare(struct rproc *rproc) { struct device *dev = rproc->dev.parent; @@ -519,17 +565,9 @@ static int stm32_rproc_detach(struct rproc *rproc) static int stm32_rproc_stop(struct rproc *rproc) { 
struct stm32_rproc *ddata = rproc->priv; - int err, idx; + int err; - /* request shutdown of the remote processor */ - if (rproc->state != RPROC_OFFLINE && rproc->state != RPROC_CRASHED) { - idx = stm32_rproc_mbox_idx(rproc, STM32_MBX_SHUTDOWN); - if (idx >= 0 && ddata->mb[idx].chan) { - err = mbox_send_message(ddata->mb[idx].chan, "detach"); - if (err < 0) - dev_warn(&rproc->dev, "warning: remote FW shutdown without ack\n"); - } - } + stm32_rproc_request_shutdown(rproc); err = stm32_rproc_set_hold_boot(rproc, true); if (err) @@ -541,29 +579,7 @@ static int stm32_rproc_stop(struct rproc *rproc) return err; } - /* to allow platform Standby power mode, set remote proc Deep Sleep */ - if (ddata->pdds.map) { - err = regmap_update_bits(ddata->pdds.map, ddata->pdds.reg, -ddata->pdds.mask, 1); - if (err) { - dev_err(&rproc->dev, "failed to set pdds\n"); - return err; - } - } - - /* update coprocessor state to OFF if available */ - if (ddata->m4_state.map) { - err = regmap_update_bits(ddata->m4_state.map, -ddata->m4_state.reg, -ddata->m4_state.mask, -M4_STATE_OFF); - if (err) { - dev_err(&rproc->dev, "failed to set copro state\n"); - return err; - } - } - - return 0; + return stm32_rproc_release(rproc); } static void stm32_rproc_kick(struct rproc *rproc, int vqid) -- 2.25.1
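Both steps factored into `stm32_rproc_release()` rely on the read-modify-write behaviour of `regmap_update_bits(map, reg, mask, val)`: bits outside `mask` are preserved, and `val` is masked before it is merged, so only bits inside `mask` can change. The sketch models just that arithmetic on a plain integer; the register values are made up, and a real regmap additionally handles locking, caching and bus access.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the update done by regmap_update_bits(): keep the bits
 * outside mask, take the bits inside mask from val.
 */
static uint32_t demo_update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}
```

For example, writing 1 under a hypothetical single-bit mask of 0x1 sets only bit 0 and leaves the neighbouring bits untouched, while bits of `val` that fall outside `mask` are silently dropped.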
Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
On 10/25/24 2:09 PM, Lorenzo Stoakes wrote: On Fri, Oct 25, 2024 at 01:31:49PM -0700, John Hubbard wrote: On 10/25/24 12:49 PM, Lorenzo Stoakes wrote: On Fri, Oct 25, 2024 at 11:44:34AM -0700, John Hubbard wrote: On 10/25/24 11:38 AM, Pedro Falcato wrote: On Fri, Oct 25, 2024 at 6:41 PM John Hubbard wrote: ... I'll admit to being easily cowed by "you're breaking userspace" arguments. Even when they start to get rather absurd. Because I can't easily tell where the line is. Maybe "-std=c89 -pedantic" is on the other side of the line. I'd like it to be! :) Well, apparently not... Why not? Your arguments are clear and reasonable. Why shouldn't they prevail? Please don't think that I have some sort of firm position here. I'm simply looking for the right answer. And if that's different than something I proposed earlier, no problem. The best answer should win. ... The bike shed should be blue! Wait no no, it should be red... Hang on yellow yes! Yellow's great! Putting a header in the right location, so as to avoid breakage here or there, is not bikeshedding. Sorry. There are 312 uses of "static inline" already in UAPI headers, not all quite as obscure as claimed. OK, good. Let's lead with that. It seems very clear, then, that a new one won't cause a problem. Specifically requiring me and only me to support ansi C89 for a theorised scenario is in my opinion bikeshedding, but I don't want to get into an argument about something so petty :) An argument about the definition of bikeshedding sounds delightfully recursive, but yes, let's not. :) ... Anyway, if you guys feel strong enough about this, I'll respin again and just open-code this trivial check where it's used. No strong feelings, just hoping to help make a choice that gets you closer to getting your patches committed. I mean, you are saying I am breaking things and implying the series is blocked on this, that sounds like a strong opinion, but again I'm not going to argue.
Actually, Pedro's request kicked this off, and I was hoping to dismiss it--again, in order to help move things along. My opinion is that we should shun ancient toolchains and ancient systems whenever possible. Somehow that got turned into "I'm trying to block the patchset". Really, whatever works, follows The Rules (whatever we eventually understand them to be), and doesn't cause someone *else* to come out of the woodwork and claim a problem, is fine with me. As with the requirement that I, only for my part of the change, must fix up test header import, while I disagree I should be doing the fix, I did it anyway as I am accommodating and reasonable. I agree that pre-existing problems in selftests should not be your problem. By the way, I'm occasionally involved in helping fix up various selftest-related problems, especially when they impact mm. Send me a note if you have anything in mind that ought to be fixed up, I might be able to help head off future grief in that area. So fine - I'll respin and just open-code this as it's trivial and there's no (other) sensible place to put it anyway. A P.S. though - a very NOT theoretical issue with userspace is the import of linux/fcntl.h in pidfd.h which seems to me to have been imported solely for the kernel's sake. A gentle suggestion (it seems I can't win - gentle suggestions are ignored, tongue-in-cheek parody is taken to be mean... but anyway) is to do Actually, these come across as sarcasm, especially in the context of these emails that show you are becoming quite distraught. I've met you several times at the conferences. We get along well. And your work is top notch. So please consider that I'm very much supportive of you and your work here. I'm still trying to understand why you are recently sending these very strong emails (Vlastimil also took some heat), but I see that you also mentioned some long hours. If my feedback is making things worse here, I'll try to adjust. Selftests in general are a frustrating area. 
thanks, -- John Hubbard something like: #ifdef __KERNEL__ #include #else #include #endif At the top of the pidfd.h header. This must surely sting a _lot_ of people in userland otherwise. But this is out of scope for this change.
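On the C89 point in this thread: `inline` is not a keyword in C89, so `static inline` fails to parse under `-std=c89`, whereas GCC (and Clang) accept the alternate spelling `__inline__` in every language mode, as GCC's "Alternate Keywords" documentation recommends. A hypothetical helper in that style is sketched below; the sentinel names and values are made up for illustration and are not the ones defined by this series.

```c
#include <assert.h>

/* Hypothetical sentinel values, chosen only for illustration. */
#define DEMO_PIDFD_SELF_THREAD       (-10000)
#define DEMO_PIDFD_SELF_THREAD_GROUP (-20000)

/*
 * __inline__ is accepted even with -std=c89 -pedantic, whereas plain
 * "inline" is not a keyword in C89. Comments and declarations are kept
 * C89-clean as well.
 */
static __inline__ int demo_pidfd_is_self(int pidfd)
{
	return pidfd == DEMO_PIDFD_SELF_THREAD ||
	       pidfd == DEMO_PIDFD_SELF_THREAD_GROUP;
}
```

Open-coding the two comparisons at each call site, as proposed in the thread, sidesteps the question entirely.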
Re: [PATCH next] rcu: Unlock correctly in rcu_dump_cpu_stacks()
On Fri, Oct 25, 2024 at 10:06:43AM +0300, Dan Carpenter wrote: > The unlock needs to be outside the } close curly braces for this if > statement. Otherwise it leads to a deadlock. > > Fixes: 744e87210b1a ("rcu: Finer-grained grace-period-end checks in > rcu_dump_cpu_stacks()") > Signed-off-by: Dan Carpenter Good catch! Reviewed-by: Paul E. McKenney This is a regression from this past merge window, if I am keeping track. So it is a candidate for going in before the next merge window opens. Thanx, Paul > --- > kernel/rcu/tree_stall.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > index 8994391b95c7..925fcdad5dea 100644 > --- a/kernel/rcu/tree_stall.h > +++ b/kernel/rcu/tree_stall.h > @@ -357,8 +357,8 @@ static void rcu_dump_cpu_stacks(unsigned long gp_seq) > pr_err("Offline CPU %d blocking current > GP.\n", cpu); > else > dump_cpu_task(cpu); > - raw_spin_unlock_irqrestore_rcu_node(rnp, flags); > } > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); > } > printk_deferred_exit(); > } > -- > 2.45.2 >
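The bug pattern is easiest to see in a reduced model: the node lock is taken before the `if`, but the buggy version releases it only inside the braces, so whenever the condition tested under the lock is false the lock stays held and the next acquisition spins forever. The sketch replaces the raw spinlock with a hypothetical depth counter so the two placements can be compared; it is a simplification, not the actual `rcu_dump_cpu_stacks()` code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rcu_node lock: counts outstanding holds. */
static int demo_lock_depth;

static void demo_lock(void)   { demo_lock_depth++; }
static void demo_unlock(void) { demo_lock_depth--; }

/* Buggy shape: the unlock is skipped when the condition is false. */
static void demo_visit_buggy(bool still_blocking)
{
	demo_lock();
	if (still_blocking) {
		/* ... report the blocking CPU ... */
		demo_unlock();
	}
}

/* Fixed shape: the unlock follows the closing brace, on every path. */
static void demo_visit_fixed(bool still_blocking)
{
	demo_lock();
	if (still_blocking) {
		/* ... report the blocking CPU ... */
	}
	demo_unlock();
}
```

With the fixed placement the depth returns to zero on both paths; the buggy one leaks a hold exactly when the re-check finds nothing to report.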
[PATCH RFC v2 3/5] pinctrl: mediatek: common: Expose more configurations to GPIO set_config
Currently the set_config callback in the gpio_chip registered by the pinctrl-mtk-common driver only supports configuring a single parameter on specific pins (the input debounce of the EINT controller, on pins that support it), even though many other configurations are already implemented and available through the pinctrl API for configuration of pins by the Devicetree and other drivers. Expose all configurations currently implemented through the GPIO API so they can also be set from userspace, which is particularly useful to allow testing them from userspace. --- drivers/pinctrl/mediatek/pinctrl-mtk-common.c | 48 --- 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c index 91edb539925a49b4302866b9ac36f580cc189fb5..7f9764b474c4e7d0d4c3d6e542bdb7df0264daec 100644 --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c @@ -62,13 +62,12 @@ static unsigned int mtk_get_port(struct mtk_pinctrl *pctl, unsigned long pin) << pctl->devdata->port_shf; } -static int mtk_pmx_gpio_set_direction(struct pinctrl_dev *pctldev, - struct pinctrl_gpio_range *range, unsigned offset, - bool input) +static int mtk_common_pin_set_direction(struct mtk_pinctrl *pctl, + unsigned int offset, + bool input) { unsigned int reg_addr; unsigned int bit; - struct mtk_pinctrl *pctl = pinctrl_dev_get_drvdata(pctldev); reg_addr = mtk_get_port(pctl, offset) + pctl->devdata->dir_offset; bit = BIT(offset & pctl->devdata->mode_mask); @@ -86,6 +85,15 @@ static int mtk_pmx_gpio_set_direction(struct pinctrl_dev *pctldev, return 0; } +static int mtk_pmx_gpio_set_direction(struct pinctrl_dev *pctldev, + struct pinctrl_gpio_range *range, unsigned int offset, + bool input) +{ + struct mtk_pinctrl *pctl = pinctrl_dev_get_drvdata(pctldev); + + return mtk_common_pin_set_direction(pctl, offset, input); +} + static void mtk_gpio_set(struct gpio_chip *chip, unsigned 
offset, int value) { unsigned int reg_addr; @@ -363,12 +371,11 @@ static int mtk_pconf_set_pull_select(struct mtk_pinctrl *pctl, return 0; } -static int mtk_pconf_parse_conf(struct pinctrl_dev *pctldev, +static int mtk_pconf_parse_conf(struct mtk_pinctrl *pctl, unsigned int pin, enum pin_config_param param, - enum pin_config_param arg) + u32 arg) { int ret = 0; - struct mtk_pinctrl *pctl = pinctrl_dev_get_drvdata(pctldev); switch (param) { case PIN_CONFIG_BIAS_DISABLE: @@ -381,15 +388,15 @@ static int mtk_pconf_parse_conf(struct pinctrl_dev *pctldev, ret = mtk_pconf_set_pull_select(pctl, pin, true, false, arg); break; case PIN_CONFIG_INPUT_ENABLE: - mtk_pmx_gpio_set_direction(pctldev, NULL, pin, true); + mtk_common_pin_set_direction(pctl, pin, true); ret = mtk_pconf_set_ies_smt(pctl, pin, arg, param); break; case PIN_CONFIG_OUTPUT: mtk_gpio_set(pctl->chip, pin, arg); - ret = mtk_pmx_gpio_set_direction(pctldev, NULL, pin, false); + ret = mtk_common_pin_set_direction(pctl, pin, false); break; case PIN_CONFIG_INPUT_SCHMITT_ENABLE: - mtk_pmx_gpio_set_direction(pctldev, NULL, pin, true); + mtk_common_pin_set_direction(pctl, pin, true); ret = mtk_pconf_set_ies_smt(pctl, pin, arg, param); break; case PIN_CONFIG_DRIVE_STRENGTH: @@ -421,7 +428,7 @@ static int mtk_pconf_group_set(struct pinctrl_dev *pctldev, unsigned group, int i, ret; for (i = 0; i < num_configs; i++) { - ret = mtk_pconf_parse_conf(pctldev, g->pin, + ret = mtk_pconf_parse_conf(pctl, g->pin, pinconf_to_config_param(configs[i]), pinconf_to_config_argument(configs[i])); if (ret < 0) @@ -870,19 +877,20 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, unsigned offset, struct mtk_pinctrl *pctl = gpiochip_get_data(chip); const struct mtk_desc_pin *pin; unsigned long eint_n; - u32 debounce; + enum pin_config_param param = pinconf_to_config_param(config); + u32 arg = pinconf_to_config_argument(config); - if (pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE) - return -ENOTSUPP; + if (param == 
PIN_CONFIG_INPUT_DEBOUNCE) { + pin = pctl->devdata->pins + offset; + if (pin->eint.eintnum == NO_EINT_SUPPORT) + return -EINVAL; - pin
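The reworked `mtk_gpio_set_config()` decodes the packed `config` into a parameter and an argument, keeps the EINT-specific debounce path (returning an error for pins without EINT support), and presumably hands every other parameter to `mtk_pconf_parse_conf()`, which the patch converts to take `pctl` so both callers can share it. The sketch models that flow in userspace C. The packing (parameter in the low 8 bits, argument above it) is modelled on the generic pinconf helpers but should be treated as an assumption here, and all `demo_` names, enum values and return codes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of pin configuration parameters. */
enum demo_pin_param {
	DEMO_BIAS_DISABLE = 1,
	DEMO_INPUT_DEBOUNCE = 2,
	DEMO_DRIVE_STRENGTH = 3,
	DEMO_UNSUPPORTED = 99,
};

/* Pack/unpack: parameter in the low 8 bits, argument in the bits above. */
static unsigned long demo_pack(enum demo_pin_param param, uint32_t arg)
{
	return ((unsigned long)arg << 8) | ((unsigned long)param & 0xffUL);
}

static enum demo_pin_param demo_param(unsigned long config)
{
	return (enum demo_pin_param)(config & 0xffUL);
}

static uint32_t demo_arg(unsigned long config)
{
	return (uint32_t)((config >> 8) & 0xffffffUL);
}

/* Stand-in for the parser shared with the pinctrl side. */
static int demo_parse_conf(enum demo_pin_param param, uint32_t arg)
{
	(void)arg;
	switch (param) {
	case DEMO_BIAS_DISABLE:
	case DEMO_DRIVE_STRENGTH:
		return 0;
	default:
		return -1; /* stands in for -ENOTSUPP */
	}
}

/* Debounce keeps its EINT-only path; everything else goes to the parser. */
static int demo_gpio_set_config(int pin_has_eint, unsigned long config)
{
	enum demo_pin_param param = demo_param(config);
	uint32_t arg = demo_arg(config);

	if (param == DEMO_INPUT_DEBOUNCE)
		return pin_has_eint ? 0 : -2; /* -2 stands in for -EINVAL */

	return demo_parse_conf(param, arg);
}
```

Routing both entry points through one parser is what lets userspace reach, via the GPIO API, the same set of parameters that Devicetree and other drivers already reach through pinctrl.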
Re: [PATCH v2] vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
On Thu, Oct 24, 2024 at 11:10:58AM -0500, Konstantin Shkolnyy wrote: This happens on 64-bit big-endian machines. SO_RCVLOWAT requires an int parameter. However, instead of int, the test uses unsigned long in one place and size_t in another. Both are 8 bytes long on 64-bit machines. The kernel, having received the 8 bytes, doesn't test for the exact size of the parameter, it only cares that it's >= sizeof(int), and casts the 4 lower-addressed bytes to an int, which, on a big-endian machine, contains 0. 0 doesn't trigger an error, SO_RCVLOWAT returns with success and the socket stays with the default SO_RCVLOWAT = 1, which results in test failures. Fixes: b1346338fbae ("vsock_test: POLLIN + SO_RCVLOWAT test") Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic") Signed-off-by: Konstantin Shkolnyy --- Notes: The problem was found on s390 (big endian), while x86-64 didn't show it. After this fix, all tests pass on s390. Changes for v2: - add "Fixes:" lines to the commit message LGTM! Reviewed-by: Stefano Garzarella tools/testing/vsock/vsock_test.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index 8d38dbf8f41f..7fd25b814b4b 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -835,7 +835,7 @@ static void test_stream_poll_rcvlowat_server(const struct test_opts *opts) static void test_stream_poll_rcvlowat_client(const struct test_opts *opts) { - unsigned long lowat_val = RCVLOWAT_BUF_SIZE; + int lowat_val = RCVLOWAT_BUF_SIZE; char buf[RCVLOWAT_BUF_SIZE]; struct pollfd fds; short poll_flags; @@ -1357,7 +1357,7 @@ static void test_stream_rcvlowat_def_cred_upd_client(const struct test_opts *opt static void test_stream_credit_update_test(const struct test_opts *opts, bool low_rx_bytes_test) { - size_t recv_buf_size; + int recv_buf_size; struct pollfd fds; size_t buf_size; void *buf; -- 2.34.1
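The failure described in the commit message can be reproduced without a big-endian machine by laying out the bytes explicitly. The sketch stores the 8-byte value the old test passed, then reads the first 4 bytes in the same byte order, which is the reduced model of the kernel side given in the commit message ("casts the 4 lower-addressed bytes to an int"). The helper names and the value 128 are hypothetical; `RCVLOWAT_BUF_SIZE` itself is not assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out v as 8 bytes in memory, big- or little-endian. */
static void demo_store_u64(uint8_t out[8], uint64_t v, int big_endian)
{
	int i;

	for (i = 0; i < 8; i++) {
		int idx = big_endian ? 7 - i : i; /* where the i-th lowest byte goes */

		out[idx] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Interpret the 4 lowest-addressed bytes as a 32-bit int. */
static int32_t demo_load_i32(const uint8_t in[4], int big_endian)
{
	uint32_t r = 0;
	int i;

	for (i = 0; i < 4; i++) {
		int idx = big_endian ? i : 3 - i; /* most significant byte first */

		r = (r << 8) | in[idx];
	}
	return (int32_t)r;
}

/* Value SO_RCVLOWAT would see when userspace passes an 8-byte integer. */
static int32_t demo_lowat_seen(uint64_t passed, int big_endian)
{
	uint8_t buf[8];

	demo_store_u64(buf, passed, big_endian);
	return demo_load_i32(buf, big_endian);
}
```

On little-endian hosts the low half happens to come first, which is why x86-64 never showed the bug; on s390 the first 4 bytes are the (zero) high half. Passing a plain `int`, as the fix does, removes the dependence on byte order.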
RE: [PATCH v4 04/11] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
> From: Nicolin Chen > Sent: Tuesday, October 22, 2024 8:19 AM > > Add a new ioctl for user space to do a vIOMMU allocation. It must be based > on a nesting parent HWPT, so take its refcount. > > If an IOMMU driver supports a driver-managed vIOMMU object, it must > define why highlight 'driver-managed', implying a core-managed vIOMMU object some day? > +/** > + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC) > + * @size: sizeof(struct iommu_viommu_alloc) > + * @flags: Must be 0 > + * @type: Type of the virtual IOMMU. Must be defined in enum > iommu_viommu_type > + * @dev_id: The device's physical IOMMU will be used to back the virtual > IOMMU > + * @hwpt_id: ID of a nesting parent HWPT to associate to > + * @out_viommu_id: Output virtual IOMMU ID for the allocated object > + * > + * Allocate a virtual IOMMU object that represents the underlying physical > + * IOMMU's virtualization support. The vIOMMU object is a security-isolated > + * slice of the physical IOMMU HW that is unique to a specific VM. the object itself is a software abstraction, while a 'slice' is a set of real hw resources.
RE: [PATCH v4 05/11] iommufd: Add domain_alloc_nested op to iommufd_viommu_ops
> From: Nicolin Chen > Sent: Tuesday, October 22, 2024 8:19 AM > > Allow IOMMU driver to use a vIOMMU object that holds a nesting parent > hwpt/domain to allocate a nested domain. > > Suggested-by: Jason Gunthorpe > Signed-off-by: Nicolin Chen Reviewed-by: Kevin Tian
Re: [PATCH 2/2] rcuscale: Remove redundant WARN_ON_ONCE() splat
On Thu, Oct 24, 2024 at 01:28:24PM -0700, Paul E. McKenney wrote: > On Thu, Oct 24, 2024 at 06:45:58PM +0200, Uladzislau Rezki (Sony) wrote: > > There are two places where WARN_ON_ONCE() is called two times > > in the error paths. One which is encapsulated into if() condition > > and another one, which is unnecessary, is placed in the brackets. > > > > Remove an extra WARN_ON_ONCE() splat which is in brackets. > > > > Signed-off-by: Uladzislau Rezki (Sony) > > For both: > > Reviewed-by: Paul E. McKenney > Thank you :) -- Uladzislau Rezki
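`WARN_ON_ONCE(cond)` evaluates to the truth value of `cond`, so using it as the `if ()` condition already emits the splat; a second `WARN_ON_ONCE(1)` inside the braces is a separate call site with its own "once" state and produces a duplicate report. The sketch models that with a hypothetical counting macro (a GNU C statement expression, supported by GCC and Clang) rather than the kernel's real macro.

```c
#include <assert.h>

static int demo_splats; /* number of warnings "printed" */

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): warns at most once per
 * expansion site and evaluates to !!(cond).
 */
#define DEMO_WARN_ON_ONCE(cond) ({			\
	static int demo_warned_;			\
	int demo_ret_ = !!(cond);			\
	if (demo_ret_ && !demo_warned_) {		\
		demo_warned_ = 1;			\
		demo_splats++;				\
	}						\
	demo_ret_;					\
})

/* Before the cleanup: the error path reports twice. */
static int demo_error_path_before(int failed)
{
	if (DEMO_WARN_ON_ONCE(failed)) {
		(void)DEMO_WARN_ON_ONCE(1);
		return -1;
	}
	return 0;
}

/* After the cleanup: the if () condition is the only report. */
static int demo_error_path_after(int failed)
{
	if (DEMO_WARN_ON_ONCE(failed))
		return -1;
	return 0;
}
```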
Re: [PATCH] sched_ext: Fix function pointer type mismatches in BPF selftests
On Thu, Oct 24, 2024 at 10:46:09AM +0530, Vishal Chourasia wrote: > Fix incompatible function pointer type warnings in sched_ext BPF selftests by > explicitly casting the function pointers when initializing struct_ops. > This addresses multiple -Wincompatible-function-pointer-types warnings from > the clang compiler where function signatures didn't match exactly. > > The void * cast ensures the compiler accepts the function pointer > assignment despite minor type differences in the parameters. > > Signed-off-by: Vishal Chourasia Applied to sched_ext/for-6.12-fixes. Thanks. -- tejun