[PATCH net-next v11 12/23] ovpn: implement TCP transport

2024-10-29 Thread Antonio Quartulli
With this change ovpn can also communicate with peers over TCP.
Parsing of incoming messages is implemented through the strparser API.
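For context, OpenVPN frames packets on a TCP stream with a 16-bit big-endian
length prefix, and a strparser ->parse_msg() callback must return the full
frame length once enough header bytes have arrived, or 0 to ask for more
data. A minimal user-space sketch of that decision (not the actual kernel
callback, which operates on skbs and additionally bound-checks the
announced length):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the framing decision a strparser ->parse_msg() callback makes
 * for OpenVPN's TCP transport: each packet on the stream is preceded by a
 * 16-bit big-endian length field. Return the total frame length (header +
 * payload) once the prefix is complete, or 0 to request more data. */
static size_t ovpn_tcp_parse_sketch(const uint8_t *buf, size_t avail)
{
	if (avail < 2) /* length prefix not fully received yet */
		return 0;
	return 2 + (((size_t)buf[0] << 8) | (size_t)buf[1]);
}
```

A real implementation would also reject frames larger than the maximum
supported packet size and reset the connection on malformed input.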

Signed-off-by: Antonio Quartulli 
---
 drivers/net/Kconfig   |   1 +
 drivers/net/ovpn/Makefile |   1 +
 drivers/net/ovpn/io.c |   4 +
 drivers/net/ovpn/main.c   |   3 +
 drivers/net/ovpn/peer.h   |  37 
 drivers/net/ovpn/socket.c |  44 +++-
 drivers/net/ovpn/socket.h |   9 +-
 drivers/net/ovpn/tcp.c| 506 ++
 drivers/net/ovpn/tcp.h|  44 
 9 files changed, 643 insertions(+), 6 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 
269b73fcfd348a48174fb96b8f8d4f8788636fa8..f37ce285e61fbee3201f4095ada3230305df511b
 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -118,6 +118,7 @@ config WIREGUARD_DEBUG
 config OVPN
tristate "OpenVPN data channel offload"
depends on NET && INET
+   select STREAM_PARSER
select NET_UDP_TUNNEL
select DST_CACHE
select CRYPTO
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 
d43fda72646bdc7644d9a878b56da0a0e5680c98..f4d4bd87c851c8dd5b81e357315c4b22de4bd092
 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -18,4 +18,5 @@ ovpn-y += peer.o
 ovpn-y += pktid.o
 ovpn-y += socket.o
 ovpn-y += stats.o
+ovpn-y += tcp.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 
d56e74660c7be9020b5bdf7971322d41afd436d6..deda19ab87391f86964ba43088b7847d22420eee
 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -22,6 +22,7 @@
 #include "crypto_aead.h"
 #include "netlink.h"
 #include "proto.h"
+#include "tcp.h"
 #include "udp.h"
 #include "skb.h"
 #include "socket.h"
@@ -213,6 +214,9 @@ void ovpn_encrypt_post(void *data, int ret)
case IPPROTO_UDP:
ovpn_udp_send_skb(peer->ovpn, peer, skb);
break;
+   case IPPROTO_TCP:
+   ovpn_tcp_send_skb(peer, skb);
+   break;
default:
/* no transport configured yet */
goto err;
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 
73348765a8cf24321aa6be78e75f607d6dbffb1d..0488e395eb27d3dba1efc8ff39c023e0ac4a38dd
 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -22,6 +22,7 @@
 #include "io.h"
 #include "packet.h"
 #include "peer.h"
+#include "tcp.h"
 
 /* Driver info */
 #define DRV_DESCRIPTION"OpenVPN data channel offload (ovpn)"
@@ -237,6 +238,8 @@ static int __init ovpn_init(void)
goto unreg_rtnl;
}
 
+   ovpn_tcp_init();
+
return 0;
 
 unreg_rtnl:
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 
eb1e31e854fbfff25d07fba8026789e41a76c113..2b7fa9510e362ef3646157bb0d361bab19ddaa99
 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -11,6 +11,7 @@
 #define _NET_OVPN_OVPNPEER_H_
 
 #include 
+#include 
 
 #include "crypto.h"
 #include "stats.h"
@@ -23,6 +24,18 @@
  * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
  * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
  * @sock: the socket being used to talk to this peer
+ * @tcp: keeps track of TCP specific state
+ * @tcp.strp: stream parser context (TCP only)
+ * @tcp.tx_work: work for deferring outgoing packet processing (TCP only)
+ * @tcp.user_queue: received packets that have to go to userspace (TCP only)
+ * @tcp.tx_in_progress: true if TX is already ongoing (TCP only)
+ * @tcp.out_msg.skb: packet scheduled for sending (TCP only)
+ * @tcp.out_msg.offset: offset where next send should start (TCP only)
+ * @tcp.out_msg.len: remaining data to send within packet (TCP only)
+ * @tcp.sk_cb.sk_data_ready: pointer to original cb (TCP only)
+ * @tcp.sk_cb.sk_write_space: pointer to original cb (TCP only)
+ * @tcp.sk_cb.prot: pointer to original prot object (TCP only)
+ * @tcp.sk_cb.ops: pointer to the original prot_ops object (TCP only)
  * @crypto: the crypto configuration (ciphers, keys, etc..)
  * @dst_cache: cache for dst_entry used to send to peer
  * @bind: remote peer binding
@@ -43,6 +56,30 @@ struct ovpn_peer {
struct in6_addr ipv6;
} vpn_addrs;
struct ovpn_socket *sock;
+
+   /* state of the TCP reading. Needed to keep track of how much of a
+* single packet has already been read from the stream and how much is
+* missing
+*/
+   struct {
+   struct strparser strp;
+   struct work_struct tx_work;
+   struct sk_buff_head user_queue;
+   bool tx_in_progress;
+
+   struct {
+   struct sk_buff *skb;
+   int offset;
+   int len;
+   } out_msg;
+
+   struct {
+   void (*sk_data_ready)(struct sock *sk);
+   void (*sk_write_space)(struct sock *sk);
+

Re: [PATCH v2 2/3] rust: macros: add macro to easily run KUnit tests

2024-10-29 Thread Boqun Feng
Hi David,

On Tue, Oct 29, 2024 at 05:24:18PM +0800, David Gow wrote:
> From: José Expósito 
> 
> Add a new procedural macro (`#[kunit_tests(kunit_test_suit_name)]`) to
> run KUnit tests using a user-space like syntax.
> 
> The macro, which should be used on modules, transforms every `#[test]`
> into a `kunit_case!` and adds a `kunit_unsafe_test_suite!` registering
> all of them.
> 
> The only difference with user-space tests is that instead of using
> `#[cfg(test)]`, `#[kunit_tests(kunit_test_suit_name)]` is used.
> 
> Note that `#[cfg(CONFIG_KUNIT)]` is added so the test module is not
> compiled when `CONFIG_KUNIT` is set to `n`.
> 
> Reviewed-by: David Gow 
> Signed-off-by: José Expósito 
> [Updated to use new const fn.]
> Signed-off-by: David Gow 
> ---
> 
> Changes since v1:
> https://lore.kernel.org/lkml/20230720-rustbind-v1-2-c80db349e...@google.com/
> - Rebased on top of rust-next
> - Make use of the new const functions, rather than the kunit_case!()
>   macro.
> 
> ---
>  MAINTAINERS  |  1 +
>  rust/kernel/kunit.rs | 11 +++
>  rust/macros/lib.rs   | 29 +
>  3 files changed, 41 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b77f4495dcf4..b65035ede675 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12433,6 +12433,7 @@ F:Documentation/dev-tools/kunit/
>  F:   include/kunit/
>  F:   lib/kunit/
>  F:   rust/kernel/kunit.rs
> +F:   rust/macros/kunit.rs

I cannot find this file in this series, and it's not in the current
rust-next either.

(Did you do the same thing as I sometimes did: forget to `git add` a new
file?)

Regards,
Boqun

>  F:   scripts/rustdoc_test_*
>  F:   tools/testing/kunit/
>  
> diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
> index fc2d259db458..abcf0229ffee 100644
> --- a/rust/kernel/kunit.rs
> +++ b/rust/kernel/kunit.rs
> @@ -40,6 +40,8 @@ pub fn info(args: fmt::Arguments<'_>) {
>  }
>  }
>  
> +use macros::kunit_tests;
> +
>  /// Asserts that a boolean expression is `true` at runtime.
>  ///
>  /// Public but hidden since it should only be used from generated tests.
> @@ -269,3 +271,12 @@ macro_rules! kunit_unsafe_test_suite {
>  };
>  };
>  }
> +
> +#[kunit_tests(rust_kernel_kunit)]
> +mod tests {
> +#[test]
> +fn rust_test_kunit_kunit_tests() {
> +let running = true;
> +assert_eq!(running, true);
> +}
> +}
> diff --git a/rust/macros/lib.rs b/rust/macros/lib.rs
> index 939ae00b723a..098925b99982 100644
> --- a/rust/macros/lib.rs
> +++ b/rust/macros/lib.rs
> @@ -10,6 +10,7 @@
>  mod quote;
>  mod concat_idents;
>  mod helpers;
> +mod kunit;
>  mod module;
>  mod paste;
>  mod pin_data;
> @@ -430,3 +431,31 @@ pub fn paste(input: TokenStream) -> TokenStream {
>  pub fn derive_zeroable(input: TokenStream) -> TokenStream {
>  zeroable::derive(input)
>  }
> +
> +/// Registers a KUnit test suite and its test cases using a user-space like 
> syntax.
> +///
> +/// This macro should be used on modules. If `CONFIG_KUNIT` (in `.config`) 
> is `n`, the target module
> +/// is ignored.
> +///
> +/// # Examples
> +///
> +/// ```ignore
> +/// # use macros::kunit_tests;
> +///
> +/// #[kunit_tests(kunit_test_suit_name)]
> +/// mod tests {
> +/// #[test]
> +/// fn foo() {
> +/// assert_eq!(1, 1);
> +/// }
> +///
> +/// #[test]
> +/// fn bar() {
> +/// assert_eq!(2, 2);
> +/// }
> +/// }
> +/// ```
> +#[proc_macro_attribute]
> +pub fn kunit_tests(attr: TokenStream, ts: TokenStream) -> TokenStream {
> +kunit::kunit_tests(attr, ts)
> +}
> -- 
> 2.47.0.163.g1226f6d8fa-goog
> 
> 



[PATCH] kselftest/arm64: Use ksft_perror() to log MTE failures

2024-10-29 Thread Mark Brown
The logging in the allocation helpers variously uses ksft_print_msg() with
very intermittent logging of errno and perror() (which won't produce KTAP
conformant output) when logging the result of API calls that set errno.
Standardise on using the ksft_perror() helper in these cases so that more
information is available should the tests fail.
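For reference, the difference matters because KTAP expects diagnostic lines
to be prefixed with "# ", while plain perror() writes to stderr without that
prefix. A user-space sketch of the output shape (hypothetical helper; the
real ksft_perror() lives in tools/testing/selftests/kselftest.h):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format an errno-based diagnostic the way a KTAP consumer expects it:
 * "# <msg>: <strerror>". Plain perror() would emit "<msg>: <strerror>"
 * without the leading "# ", which breaks KTAP parsing. */
static int ktap_perror_sketch(char *out, size_t n, const char *msg, int err)
{
	return snprintf(out, n, "# %s: %s\n", msg, strerror(err));
}
```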

Signed-off-by: Mark Brown 
---
 tools/testing/selftests/arm64/mte/mte_common_util.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/arm64/mte/mte_common_util.c 
b/tools/testing/selftests/arm64/mte/mte_common_util.c
index 
00ffd34c66d301ee7d5c99e6b8d9d5d944520b7f..46958b58801e90ceb79be76f57c7f72b50d43b3c
 100644
--- a/tools/testing/selftests/arm64/mte/mte_common_util.c
+++ b/tools/testing/selftests/arm64/mte/mte_common_util.c
@@ -150,13 +150,13 @@ static void *__mte_allocate_memory_range(size_t size, int 
mem_type, int mapping,
map_flag |= MAP_PRIVATE;
ptr = mmap(NULL, entire_size, prot_flag, map_flag, fd, 0);
if (ptr == MAP_FAILED) {
-   ksft_print_msg("FAIL: mmap allocation\n");
+   ksft_perror("mmap()");
return NULL;
}
if (mem_type == USE_MPROTECT) {
if (mprotect(ptr, entire_size, prot_flag | PROT_MTE)) {
+   ksft_perror("mprotect(PROT_MTE)");
munmap(ptr, size);
-   ksft_print_msg("FAIL: mprotect PROT_MTE property\n");
return NULL;
}
}
@@ -190,13 +190,13 @@ void *mte_allocate_file_memory(size_t size, int mem_type, 
int mapping, bool tags
lseek(fd, 0, SEEK_SET);
for (index = INIT_BUFFER_SIZE; index < size; index += INIT_BUFFER_SIZE) 
{
if (write(fd, buffer, INIT_BUFFER_SIZE) != INIT_BUFFER_SIZE) {
-   perror("initialising buffer");
+   ksft_perror("initialising buffer");
return NULL;
}
}
index -= INIT_BUFFER_SIZE;
if (write(fd, buffer, size - index) != size - index) {
-   perror("initialising buffer");
+   ksft_perror("initialising buffer");
return NULL;
}
return __mte_allocate_memory_range(size, mem_type, mapping, 0, 0, tags, 
fd);
@@ -217,12 +217,12 @@ void *mte_allocate_file_memory_tag_range(size_t size, int 
mem_type, int mapping,
lseek(fd, 0, SEEK_SET);
for (index = INIT_BUFFER_SIZE; index < map_size; index += 
INIT_BUFFER_SIZE)
if (write(fd, buffer, INIT_BUFFER_SIZE) != INIT_BUFFER_SIZE) {
-   perror("initialising buffer");
+   ksft_perror("initialising buffer");
return NULL;
}
index -= INIT_BUFFER_SIZE;
if (write(fd, buffer, map_size - index) != map_size - index) {
-   perror("initialising buffer");
+   ksft_perror("initialising buffer");
return NULL;
}
return __mte_allocate_memory_range(size, mem_type, mapping, 
range_before,
@@ -359,7 +359,7 @@ int create_temp_file(void)
/* Create a file in the tmpfs filesystem */
fd = mkstemp(&filename[0]);
if (fd == -1) {
-   perror(filename);
+   ksft_perror(filename);
ksft_print_msg("FAIL: Unable to open temporary file\n");
return 0;
}

---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241028-arm64-mte-test-logging-6c6e737b1c62

Best regards,
-- 
Mark Brown 




Re: [PATCH] kselftest/arm64: Use ksft_perror() to log MTE failures

2024-10-29 Thread Lorenzo Stoakes
On Tue, Oct 29, 2024 at 12:34:21PM +, Mark Brown wrote:
> The logging in the allocation helpers variously uses ksft_print_msg() with
> very intermittent logging of errno and perror() (which won't produce KTAP
> conformant output) when logging the result of API calls that set errno.
> Standardise on using the ksft_perror() helper in these cases so that more
> information is available should the tests fail.
>
> Signed-off-by: Mark Brown 

Thanks very much for this! LGTM!

Acked-by: Lorenzo Stoakes 

> ---
>  tools/testing/selftests/arm64/mte/mte_common_util.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/arm64/mte/mte_common_util.c 
> b/tools/testing/selftests/arm64/mte/mte_common_util.c
> index 
> 00ffd34c66d301ee7d5c99e6b8d9d5d944520b7f..46958b58801e90ceb79be76f57c7f72b50d43b3c
>  100644
> --- a/tools/testing/selftests/arm64/mte/mte_common_util.c
> +++ b/tools/testing/selftests/arm64/mte/mte_common_util.c
> @@ -150,13 +150,13 @@ static void *__mte_allocate_memory_range(size_t size, 
> int mem_type, int mapping,
>   map_flag |= MAP_PRIVATE;
>   ptr = mmap(NULL, entire_size, prot_flag, map_flag, fd, 0);
>   if (ptr == MAP_FAILED) {
> - ksft_print_msg("FAIL: mmap allocation\n");
> + ksft_perror("mmap()");
>   return NULL;
>   }
>   if (mem_type == USE_MPROTECT) {
>   if (mprotect(ptr, entire_size, prot_flag | PROT_MTE)) {
> + ksft_perror("mprotect(PROT_MTE)");
>   munmap(ptr, size);
> - ksft_print_msg("FAIL: mprotect PROT_MTE property\n");
>   return NULL;
>   }
>   }
> @@ -190,13 +190,13 @@ void *mte_allocate_file_memory(size_t size, int 
> mem_type, int mapping, bool tags
>   lseek(fd, 0, SEEK_SET);
>   for (index = INIT_BUFFER_SIZE; index < size; index += INIT_BUFFER_SIZE) 
> {
>   if (write(fd, buffer, INIT_BUFFER_SIZE) != INIT_BUFFER_SIZE) {
> - perror("initialising buffer");
> + ksft_perror("initialising buffer");
>   return NULL;
>   }
>   }
>   index -= INIT_BUFFER_SIZE;
>   if (write(fd, buffer, size - index) != size - index) {
> - perror("initialising buffer");
> + ksft_perror("initialising buffer");
>   return NULL;
>   }
>   return __mte_allocate_memory_range(size, mem_type, mapping, 0, 0, tags, 
> fd);
> @@ -217,12 +217,12 @@ void *mte_allocate_file_memory_tag_range(size_t size, 
> int mem_type, int mapping,
>   lseek(fd, 0, SEEK_SET);
>   for (index = INIT_BUFFER_SIZE; index < map_size; index += 
> INIT_BUFFER_SIZE)
>   if (write(fd, buffer, INIT_BUFFER_SIZE) != INIT_BUFFER_SIZE) {
> - perror("initialising buffer");
> + ksft_perror("initialising buffer");
>   return NULL;
>   }
>   index -= INIT_BUFFER_SIZE;
>   if (write(fd, buffer, map_size - index) != map_size - index) {
> - perror("initialising buffer");
> + ksft_perror("initialising buffer");
>   return NULL;
>   }
>   return __mte_allocate_memory_range(size, mem_type, mapping, 
> range_before,
> @@ -359,7 +359,7 @@ int create_temp_file(void)
>   /* Create a file in the tmpfs filesystem */
>   fd = mkstemp(&filename[0]);
>   if (fd == -1) {
> - perror(filename);
> + ksft_perror(filename);
>   ksft_print_msg("FAIL: Unable to open temporary file\n");
>   return 0;
>   }
>
> ---
> base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
> change-id: 20241028-arm64-mte-test-logging-6c6e737b1c62
>
> Best regards,
> --
> Mark Brown 
>



RE: [PATCH v5 02/13] iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage

2024-10-29 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Saturday, October 26, 2024 7:51 AM
> 
> Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage
> of
> IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc
> TEST_F
> to cover the IOMMU_VDEVICE_ALLOC ioctl.
> 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 



[PATCH net-next v11 14/23] ovpn: implement peer lookup logic

2024-10-29 Thread Antonio Quartulli
In a multi-peer scenario there are a number of situations when a
specific peer needs to be looked up.

We may want to look up a peer by:
1. its ID
2. its VPN destination IP
3. its transport IP/port couple

For each of the above, there is a dedicated lookup table referencing all
peers for fast lookup.

Case 2 is a bit special: an outgoing packet may not be sent to the
peer's VPN IP directly, but rather to a network behind it. For this
reason we first perform a nexthop lookup in the system routing table
and then use the retrieved nexthop as the peer search key.
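The nexthop selection described above can be sketched in user space as
follows (hypothetical plain-struct stand-in; the kernel code reads the same
information from skb_rtable() and ip_hdr()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the route data the kernel reads from skb_rtable(). */
struct rt_sketch {
	bool uses_gateway; /* does the route go through a gateway? */
	uint32_t gw4;      /* gateway address (network byte order) */
};

/* If the cached route goes through a gateway, the gateway is the nexthop
 * used as peer search key; otherwise the packet destination itself is. */
static uint32_t ovpn_nexthop_from_skb4_sketch(const struct rt_sketch *rt,
					      uint32_t daddr)
{
	if (rt && rt->uses_gateway)
		return rt->gw4;
	return daddr;
}
```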

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/peer.c | 272 ++--
 1 file changed, 264 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 
73ef509faab9701192a45ffe78a46dbbbeab01c2..c7dc9032c2b55fd42befc1f3e7a0eca893a96576
 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ovpnstruct.h"
 #include "bind.h"
@@ -125,6 +126,94 @@ static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb,
return true;
 }
 
+/**
+ * ovpn_nexthop_from_skb4 - retrieve IPv4 nexthop for outgoing skb
+ * @skb: the outgoing packet
+ *
+ * Return: the IPv4 of the nexthop
+ */
+static __be32 ovpn_nexthop_from_skb4(struct sk_buff *skb)
+{
+   const struct rtable *rt = skb_rtable(skb);
+
+   if (rt && rt->rt_uses_gateway)
+   return rt->rt_gw4;
+
+   return ip_hdr(skb)->daddr;
+}
+
+/**
+ * ovpn_nexthop_from_skb6 - retrieve IPv6 nexthop for outgoing skb
+ * @skb: the outgoing packet
+ *
+ * Return: the IPv6 of the nexthop
+ */
+static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb)
+{
+   const struct rt6_info *rt = skb_rt6_info(skb);
+
+   if (!rt || !(rt->rt6i_flags & RTF_GATEWAY))
+   return ipv6_hdr(skb)->daddr;
+
+   return rt->rt6i_gateway;
+}
+
+#define ovpn_get_hash_head(_tbl, _key, _key_len) ({\
+   typeof(_tbl) *__tbl = &(_tbl);  \
+   (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \
+
+/**
+ * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address
+ * @ovpn: the openvpn instance to search
+ * @addr: VPN IPv4 to use as search key
+ *
+ * Refcounter is not increased for the returned peer.
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+static struct ovpn_peer *ovpn_peer_get_by_vpn_addr4(struct ovpn_struct *ovpn,
+   __be32 addr)
+{
+   struct hlist_nulls_head *nhead;
+   struct hlist_nulls_node *ntmp;
+   struct ovpn_peer *tmp;
+
+   nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, &addr,
+  sizeof(addr));
+
+   hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, hash_entry_addr4)
+   if (addr == tmp->vpn_addrs.ipv4.s_addr)
+   return tmp;
+
+   return NULL;
+}
+
+/**
+ * ovpn_peer_get_by_vpn_addr6 - retrieve peer by its VPN IPv6 address
+ * @ovpn: the openvpn instance to search
+ * @addr: VPN IPv6 to use as search key
+ *
+ * Refcounter is not increased for the returned peer.
+ *
+ * Return: the peer if found or NULL otherwise
+ */
+static struct ovpn_peer *ovpn_peer_get_by_vpn_addr6(struct ovpn_struct *ovpn,
+   struct in6_addr *addr)
+{
+   struct hlist_nulls_head *nhead;
+   struct hlist_nulls_node *ntmp;
+   struct ovpn_peer *tmp;
+
+   nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, addr,
+  sizeof(*addr));
+
+   hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, hash_entry_addr6)
+   if (ipv6_addr_equal(addr, &tmp->vpn_addrs.ipv6))
+   return tmp;
+
+   return NULL;
+}
+
 /**
  * ovpn_peer_transp_match - check if sockaddr and peer binding match
  * @peer: the peer to get the binding from
@@ -202,14 +291,44 @@ ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn,
 struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
   struct sk_buff *skb)
 {
-   struct ovpn_peer *peer = NULL;
+   struct ovpn_peer *tmp, *peer = NULL;
struct sockaddr_storage ss = { 0 };
+   struct hlist_nulls_head *nhead;
+   struct hlist_nulls_node *ntmp;
+   size_t sa_len;
 
if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
return NULL;
 
if (ovpn->mode == OVPN_MODE_P2P)
-   peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+   return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+
+   switch (ss.ss_family) {
+   case AF_INET:
+   sa_len = sizeof(struct sockaddr_in);
+   break;
+   case AF_INET6:
+   sa_len = sizeof(struct sockaddr_in6);
+   break;
+   default:
+  

[PATCH net-next v11 01/23] netlink: add NLA_POLICY_MAX_LEN macro

2024-10-29 Thread Antonio Quartulli
Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy
with a maximum length value.

The netlink generator for YAML specs has been extended accordingly.
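As an illustration, here is a simplified user-space mock of what the new
initializer sets (the real struct nla_policy in include/net/netlink.h has
more fields and full validation logic; this mock only shows the .type/.max
pairing, with all names hypothetical):

```c
#include <assert.h>

enum sketch_attr_type { NLA_BINARY_SKETCH = 11 };

/* Trimmed-down stand-in for struct nla_policy. */
struct nla_policy_sketch {
	int type;
	int max; /* maximum attribute length for binary attributes */
};

/* Mirrors the shape of NLA_POLICY_MAX()/NLA_POLICY_MAX_LEN(): the new
 * macro is just NLA_POLICY_MAX applied to the NLA_BINARY type. */
#define SKETCH_POLICY_MAX(_type, _len) \
	{ .type = (_type), .max = (_len) }
#define SKETCH_POLICY_MAX_LEN(_len) \
	SKETCH_POLICY_MAX(NLA_BINARY_SKETCH, _len)
```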

Signed-off-by: Antonio Quartulli 
Reviewed-by: Donald Hunter 
---
 include/net/netlink.h  | 1 +
 tools/net/ynl/ynl-gen-c.py | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 
db6af207287c839408c58cb28b82408e0548eaca..2dc671c977ff3297975269d236264907009703d3
 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -469,6 +469,7 @@ struct nla_policy {
.max = _len \
 }
 #define NLA_POLICY_MIN_LEN(_len)   NLA_POLICY_MIN(NLA_BINARY, _len)
+#define NLA_POLICY_MAX_LEN(_len)   NLA_POLICY_MAX(NLA_BINARY, _len)
 
 /**
  * struct nl_info - netlink source information
diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py
index 
1a825b4081b222cf97eb73f01a2a5c1ffe47cd5c..aa22eb0924754f38ea0b9e68a1ff5a55d94d6717
 100755
--- a/tools/net/ynl/ynl-gen-c.py
+++ b/tools/net/ynl/ynl-gen-c.py
@@ -481,7 +481,7 @@ class TypeBinary(Type):
 pass
 elif len(self.checks) == 1:
 check_name = list(self.checks)[0]
-if check_name not in {'exact-len', 'min-len'}:
+if check_name not in {'exact-len', 'min-len', 'max-len'}:
 raise Exception('Unsupported check for binary type: ' + 
check_name)
 else:
 raise Exception('More than one check for binary type not 
implemented, yet')
@@ -492,6 +492,8 @@ class TypeBinary(Type):
 mem = 'NLA_POLICY_EXACT_LEN(' + self.get_limit_str('exact-len') + 
')'
 elif 'min-len' in self.checks:
 mem = '{ .len = ' + self.get_limit_str('min-len') + ', }'
+elif 'max-len' in self.checks:
+mem = 'NLA_POLICY_MAX_LEN(' + self.get_limit_str('max-len') + ')'
 
 return mem
 

-- 
2.45.2




[PATCH net-next v11 23/23] testing/selftests: add test tool and scripts for ovpn module

2024-10-29 Thread Antonio Quartulli
The ovpn-cli tool can be compiled and used as selftest for the ovpn
kernel module.

[NOTE: it depends on libmbedtls for decoding base64-encoded keys]

ovpn-cli implements the netlink and RTNL APIs and can thus be integrated
into any script for more automated testing.

Along with the tool, 4 scripts are provided that perform basic
functionality tests by means of network namespaces.
These scripts are part of the kselftest automation.

The output of the tool, which will appear in the kselftest
reports, is a list of the steps performed by the scripts plus some
output from the execution of `ping` and `iperf`.
It is generally useful only in case of failure, to understand
which step failed and why.

Cc: linux-kselft...@vger.kernel.org
Signed-off-by: Antonio Quartulli 
Reviewed-by: Shuah Khan 
---
 MAINTAINERS|1 +
 tools/testing/selftests/Makefile   |1 +
 tools/testing/selftests/net/ovpn/.gitignore|2 +
 tools/testing/selftests/net/ovpn/Makefile  |   17 +
 tools/testing/selftests/net/ovpn/config|   10 +
 tools/testing/selftests/net/ovpn/data64.key|5 +
 tools/testing/selftests/net/ovpn/ovpn-cli.c| 2370 
 tools/testing/selftests/net/ovpn/tcp_peers.txt |5 +
 .../testing/selftests/net/ovpn/test-chachapoly.sh  |9 +
 tools/testing/selftests/net/ovpn/test-float.sh |9 +
 tools/testing/selftests/net/ovpn/test-tcp.sh   |9 +
 tools/testing/selftests/net/ovpn/test.sh   |  183 ++
 tools/testing/selftests/net/ovpn/udp_peers.txt |5 +
 13 files changed, 2626 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 
cf3d55c3e98aaea8f8817faed99dd7499cd59a71..110485aec73ae5bfeef4f228490ed76e28e01870
 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17295,6 +17295,7 @@ T:  git 
https://github.com/OpenVPN/linux-kernel-ovpn.git
 F: Documentation/netlink/specs/ovpn.yaml
 F: drivers/net/ovpn/
 F: include/uapi/linux/ovpn.h
+F: tools/testing/selftests/net/ovpn/
 
 OPENVSWITCH
 M: Pravin B Shelar 
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 
363d031a16f7e14152c904e6b68dab1f90c98392..be42906ecb11d4b0f9866d2c04b0e8fb27a2b995
 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -68,6 +68,7 @@ TARGETS += net/hsr
 TARGETS += net/mptcp
 TARGETS += net/netfilter
 TARGETS += net/openvswitch
+TARGETS += net/ovpn
 TARGETS += net/packetdrill
 TARGETS += net/rds
 TARGETS += net/tcp_ao
diff --git a/tools/testing/selftests/net/ovpn/.gitignore 
b/tools/testing/selftests/net/ovpn/.gitignore
new file mode 100644
index 
..ee44c081ca7c089933659689303c303a9fa9713b
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0+
+ovpn-cli
diff --git a/tools/testing/selftests/net/ovpn/Makefile 
b/tools/testing/selftests/net/ovpn/Makefile
new file mode 100644
index 
..c76d8fd953c5674941c8c2787813063b1bce180f
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/Makefile
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2020-2024 OpenVPN, Inc.
+#
+CFLAGS = -pedantic -Wextra -Wall -Wl,--no-as-needed -g -O0 -ggdb 
$(KHDR_INCLUDES)
+CFLAGS += $(shell pkg-config --cflags libnl-3.0 libnl-genl-3.0)
+
+LDFLAGS = -lmbedtls -lmbedcrypto
+LDFLAGS += $(shell pkg-config --libs libnl-3.0 libnl-genl-3.0)
+
+TEST_PROGS = test.sh \
+   test-chachapoly.sh \
+   test-tcp.sh \
+   test-float.sh
+
+TEST_GEN_FILES = ovpn-cli
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/net/ovpn/config 
b/tools/testing/selftests/net/ovpn/config
new file mode 100644
index 
..71946ba9fa175c191725e369eb9b973503d9d9c4
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/config
@@ -0,0 +1,10 @@
+CONFIG_NET=y
+CONFIG_INET=y
+CONFIG_STREAM_PARSER=y
+CONFIG_NET_UDP_TUNNEL=y
+CONFIG_DST_CACHE=y
+CONFIG_CRYPTO=y
+CONFIG_CRYPTO_AES=y
+CONFIG_CRYPTO_GCM=y
+CONFIG_CRYPTO_CHACHA20POLY1305=y
+CONFIG_OVPN=m
diff --git a/tools/testing/selftests/net/ovpn/data64.key 
b/tools/testing/selftests/net/ovpn/data64.key
new file mode 100644
index 
..a99e88c4e290f58b12f399b857b873f308d9ba09
--- /dev/null
+++ b/tools/testing/selftests/net/ovpn/data64.key
@@ -0,0 +1,5 @@
+jRqMACN7d7/aFQNT8S7jkrBD8uwrgHbG5OQZP2eu4R1Y7tfpS2bf5RHv06Vi163CGoaIiTX99R3B
+ia9ycAH8Wz1+9PWv51dnBLur9jbShlgZ2QHLtUc4a/gfT7zZwULXuuxdLnvR21DDeMBaTbkgbai9
+uvAa7ne1liIgGFzbv+Bas4HDVrygxIxuAnP5Qgc3648IJkZ0QEXPF+O9f0n5+QIvGCxkAUVx+5K6
+KIs+SoeWXnAopELmoGSjUpFtJbagXK82HfdqpuUxT2Tnuef0/14SzVE/vNleBNu2ZbyrSAaah8tE
+BofkPJUBFY+YQcfZNM5Dgrw3i+Bpmpq/gpdg5w==
diff --git a/tools/testing/selftests/net/ovpn/ovpn-cli.c 
b/tools/testing/selftests/net/ovpn/ovpn-cli.c
new file mode 100644
index 
00

RE: [PATCH v5 11/13] Documentation: userspace-api: iommufd: Update vDEVICE

2024-10-29 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Saturday, October 26, 2024 7:51 AM
> 
> With the introduction of the new object and its infrastructure, update the
> doc and the vIOMMU graph to reflect that.
> 
> Reviewed-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 



Re: [PATCH v2 3/3] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.

2024-10-29 Thread Sebastian Andrzej Siewior
On 2024-10-28 15:01:55 [+0100], Frederic Weisbecker wrote:
> > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > index 457151f9f263d..9637af78087f3 100644
> > --- a/include/linux/interrupt.h
> > +++ b/include/linux/interrupt.h
> > @@ -616,6 +616,50 @@ extern void __raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq(unsigned int nr);
> >  
> > +/*
> > + * Handle timers in a dedicated thread at a low SCHED_FIFO priority 
> > instead in
> > + * ksoftirqd as to be prefred over SCHED_NORMAL tasks.
> > + */
> 
> This doesn't parse. How about, inspired by your changelog:
…

What about this essay instead:

| With forced-threaded interrupts enabled a raised softirq is deferred to
| ksoftirqd unless it can be handled within the threaded interrupt. This
| affects timer_list timers and hrtimers which are explicitly marked with
| HRTIMER_MODE_SOFT.
| With PREEMPT_RT enabled more hrtimers are moved to softirq for processing
| which includes all timers which are not explicitly marked HRTIMER_MODE_HARD.
| Userspace controlled timers (like the clock_nanosleep() interface) are divided
| into two categories: tasks with an elevated scheduling policy, i.e.
| SCHED_{FIFO|RR|DL}, and tasks with the remaining scheduling policies. Tasks
| with an elevated scheduling policy are woken up directly from the HARDIRQ while
| all other wake ups are delayed to the softirq and thus to ksoftirqd.
|
| The ksoftirqd runs at SCHED_OTHER policy, at which it should remain, since it
| handles the softirq in overload situations (when it could not handle everything
| within its last run).
| If the timers are handled at SCHED_OTHER priority then they compete with all
| other SCHED_OTHER tasks for CPU resources and are possibly delayed.
| Moving the timer softirqs to a low-priority SCHED_FIFO thread instead ensures
| that timers are handled before any SCHED_OTHER thread is scheduled.

And with this piece of text I convinced myself to also enable this in
the forced-threaded case.
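The policy split described in that comment boils down to a simple
classification; a user-space sketch of it (the kernel checks the woken
task's policy with its internal task-policy helpers, not this function):

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6 /* not exposed by every libc's <sched.h> */
#endif

/* Tasks with an elevated policy are woken directly from HARDIRQ context;
 * wake-ups for everything else go through the timer softirq thread. */
static bool timer_wakeup_from_hardirq(int policy)
{
	return policy == SCHED_FIFO || policy == SCHED_RR ||
	       policy == SCHED_DEADLINE;
}
```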

> Thanks.

Sebastian



RE: [PATCH v5 12/13] iommu/arm-smmu-v3: Add arm_vsmmu_cache_invalidate

2024-10-29 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Saturday, October 26, 2024 7:51 AM
> 
> Implement the vIOMMU's cache_invalidate op for user space to invalidate
> the
> IOTLB entries, Device ATS and CD entries that are still cached by hardware.
> 
> Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation

there are three 'iommu' in the name. What about:

arm_vsmmuv3_invalidate?



RE: [PATCH v5 09/13] iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command

2024-10-29 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Saturday, October 26, 2024 7:51 AM
> 
> Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's
> iotlb,
> IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's
> cache.
> 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 



RE: [PATCH v5 00/13] iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE)

2024-10-29 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Monday, October 28, 2024 10:17 PM
> 
> > > to
> > > a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID
> > > of the device against the physical IOMMU instance. This is essential for a
> > > vIOMMU-based invalidation, where the request contains a device's vID
> for a
> > > device cache flush, e.g. ATC invalidation.
> >
> > probably connect this to vCMDQ passthrough? otherwise for sw-based
> > invalidation the userspace can always replace vID with pID before
> > submitting the request.
> 
> You can't just do that, the ID in the invalidation command has to be
> validated by the kernel.

sure the ID must be validated to match the iommufd_device but not
exactly going through a vID indirectly.

> 
> At that point you may as well just use the vID instead of inventing a
> new means to validate raw pIDs.

w/o VCMDQ stuff validating raw pID sounds the natural way while
vID is more like a new means and not mandatory.

I'm fine with this design but just didn't feel the above description
is accurate. 




RE: [PATCH v5 03/13] iommu/viommu: Add cache_invalidate to iommufd_viommu_ops

2024-10-29 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Saturday, October 26, 2024 7:51 AM
> 
> This per-vIOMMU cache_invalidate op is like the cache_invalidate_user op
> in struct iommu_domain_ops, but wider, supporting device cache (e.g. PCI
> ATC invalidations).
> 
> Reviewed-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 



[PATCH net-next v11 13/23] ovpn: implement multi-peer support

2024-10-29 Thread Antonio Quartulli
With this change an ovpn instance will be able to stay connected to
multiple remote endpoints.

This functionality is strictly required when running ovpn on an
OpenVPN server.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/main.c   |  55 +-
 drivers/net/ovpn/ovpnstruct.h |  19 +
 drivers/net/ovpn/peer.c   | 166 --
 drivers/net/ovpn/peer.h   |   9 +++
 4 files changed, 243 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 
0488e395eb27d3dba1efc8ff39c023e0ac4a38dd..c7453127ab640d7268c1ce919a87cc5419fac9ee
 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -30,6 +30,9 @@
 
 static void ovpn_struct_free(struct net_device *net)
 {
+   struct ovpn_struct *ovpn = netdev_priv(net);
+
+   kfree(ovpn->peers);
 }
 
 static int ovpn_net_init(struct net_device *dev)
@@ -133,12 +136,52 @@ static void ovpn_setup(struct net_device *dev)
SET_NETDEV_DEVTYPE(dev, &ovpn_type);
 }
 
+static int ovpn_mp_alloc(struct ovpn_struct *ovpn)
+{
+   struct in_device *dev_v4;
+   int i;
+
+   if (ovpn->mode != OVPN_MODE_MP)
+   return 0;
+
+   dev_v4 = __in_dev_get_rtnl(ovpn->dev);
+   if (dev_v4) {
+   /* disable redirects as Linux gets confused by ovpn
+* handling same-LAN routing.
+* This happens because a multipeer interface is used as
+* relay point between hosts in the same subnet, while
+* in a classic LAN this would not be needed because the
+* two hosts would be able to talk directly.
+*/
+   IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
+   IPV4_DEVCONF_ALL(dev_net(ovpn->dev), SEND_REDIRECTS) = false;
+   }
+
+   /* the peer container is fairly large, therefore we allocate it only in
+* MP mode
+*/
+   ovpn->peers = kzalloc(sizeof(*ovpn->peers), GFP_KERNEL);
+   if (!ovpn->peers)
+   return -ENOMEM;
+
+   spin_lock_init(&ovpn->peers->lock);
+
+   for (i = 0; i < ARRAY_SIZE(ovpn->peers->by_id); i++) {
+   INIT_HLIST_HEAD(&ovpn->peers->by_id[i]);
+   INIT_HLIST_NULLS_HEAD(&ovpn->peers->by_vpn_addr[i], i);
+   INIT_HLIST_NULLS_HEAD(&ovpn->peers->by_transp_addr[i], i);
+   }
+
+   return 0;
+}
+
 static int ovpn_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
 {
struct ovpn_struct *ovpn = netdev_priv(dev);
enum ovpn_mode mode = OVPN_MODE_P2P;
+   int err;
 
if (data && data[IFLA_OVPN_MODE]) {
mode = nla_get_u8(data[IFLA_OVPN_MODE]);
@@ -149,6 +192,10 @@ static int ovpn_newlink(struct net *src_net, struct 
net_device *dev,
ovpn->mode = mode;
spin_lock_init(&ovpn->lock);
 
+   err = ovpn_mp_alloc(ovpn);
+   if (err < 0)
+   return err;
+
/* turn carrier explicitly off after registration, this way state is
 * clearly defined
 */
@@ -197,8 +244,14 @@ static int ovpn_netdev_notifier_call(struct notifier_block 
*nb,
netif_carrier_off(dev);
ovpn->registered = false;
 
-   if (ovpn->mode == OVPN_MODE_P2P)
+   switch (ovpn->mode) {
+   case OVPN_MODE_P2P:
ovpn_peer_release_p2p(ovpn);
+   break;
+   case OVPN_MODE_MP:
+   ovpn_peers_free(ovpn);
+   break;
+   }
break;
case NETDEV_POST_INIT:
case NETDEV_GOING_DOWN:
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
index 
4a48fc048890ab1cda78bc104fe3034b4a49d226..12ed5e22c2108c9f143d1984048eb40c887cac63
 100644
--- a/drivers/net/ovpn/ovpnstruct.h
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -15,6 +15,23 @@
 #include 
 #include 
 
+/**
+ * struct ovpn_peer_collection - container of peers for MultiPeer mode
+ * @by_id: table of peers indexed by ID
+ * @by_vpn_addr: table of peers indexed by VPN IP address (items can be
+ *  rehashed on the fly due to peer IP change)
+ * @by_transp_addr: table of peers indexed by transport address (items can be
+ * rehashed on the fly due to peer IP change)
+ * @lock: protects writes to peer tables
+ */
+struct ovpn_peer_collection {
+   DECLARE_HASHTABLE(by_id, 12);
+   struct hlist_nulls_head by_vpn_addr[1 << 12];
+   struct hlist_nulls_head by_transp_addr[1 << 12];
+
+   spinlock_t lock; /* protects writes to peer tables */
+};
+
 /**
  * struct ovpn_struct - per ovpn interface state
  * @dev: the actual netdev representing the tunnel
@@ -22,6 +39,7 @@
  * @registered: whether dev is still registered with netdev or not
  * @mode: device operation mo

[PATCH net-next v11 17/23] ovpn: add support for peer floating

2024-10-29 Thread Antonio Quartulli
A peer connected via UDP may change its IP address without reconnecting
(float).

Add support for detecting and updating the new peer IP/port in case of
floating.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/bind.c |  10 ++--
 drivers/net/ovpn/io.c   |   9 
 drivers/net/ovpn/peer.c | 129 ++--
 drivers/net/ovpn/peer.h |   2 +
 4 files changed, 139 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
index 
b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a..d17d078c5730bf4336dc87f45cdba3f6b8cad770
 100644
--- a/drivers/net/ovpn/bind.c
+++ b/drivers/net/ovpn/bind.c
@@ -47,12 +47,8 @@ struct ovpn_bind *ovpn_bind_from_sockaddr(const struct 
sockaddr_storage *ss)
  * @new: the new bind to assign
  */
 void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new)
+   __must_hold(&peer->lock)
 {
-   struct ovpn_bind *old;
-
-   spin_lock_bh(&peer->lock);
-   old = rcu_replace_pointer(peer->bind, new, true);
-   spin_unlock_bh(&peer->lock);
-
-   kfree_rcu(old, rcu);
+   kfree_rcu(rcu_replace_pointer(peer->bind, new,
+ lockdep_is_held(&peer->lock)), rcu);
 }
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 
63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261
 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
/* keep track of last received authenticated packet for keepalive */
peer->last_recv = ktime_get_real_seconds();
 
+   if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
+   /* check if this peer changed its IP address and update
+* state
+*/
+   ovpn_peer_float(peer, skb);
+   /* update source endpoint for this peer */
+   ovpn_peer_update_local_endpoint(peer, skb);
+   }
+
/* point to encapsulated IP packet */
__skb_pull(skb, payload_offset);
 
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 
3f67d200e283213fcb732d10f9edeb53e0a0e9ee..da6215bbb643592e4567e61e4b4976d367ed109c
 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -94,6 +94,131 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, 
u32 id)
return peer;
 }
 
+/**
+ * ovpn_peer_reset_sockaddr - recreate binding for peer
+ * @peer: peer to recreate the binding for
+ * @ss: sockaddr to use as remote endpoint for the binding
+ * @local_ip: local IP for the binding
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
+   const struct sockaddr_storage *ss,
+   const u8 *local_ip)
+   __must_hold(&peer->lock)
+{
+   struct ovpn_bind *bind;
+   size_t ip_len;
+
+   /* create new ovpn_bind object */
+   bind = ovpn_bind_from_sockaddr(ss);
+   if (IS_ERR(bind))
+   return PTR_ERR(bind);
+
+   if (local_ip) {
+   if (ss->ss_family == AF_INET) {
+   ip_len = sizeof(struct in_addr);
+   } else if (ss->ss_family == AF_INET6) {
+   ip_len = sizeof(struct in6_addr);
+   } else {
+   netdev_dbg(peer->ovpn->dev, "%s: invalid family for 
remote endpoint\n",
+  __func__);
+   kfree(bind);
+   return -EINVAL;
+   }
+
+   memcpy(&bind->local, local_ip, ip_len);
+   }
+
+   /* set binding */
+   ovpn_bind_reset(peer, bind);
+
+   return 0;
+}
+
+#define ovpn_get_hash_head(_tbl, _key, _key_len) ({\
+   typeof(_tbl) *__tbl = &(_tbl);  \
+   (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \
+
+/**
+ * ovpn_peer_float - update remote endpoint for peer
+ * @peer: peer to update the remote endpoint for
+ * @skb: incoming packet to retrieve the source address (remote) from
+ */
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+   struct hlist_nulls_head *nhead;
+   struct sockaddr_storage ss;
+   const u8 *local_ip = NULL;
+   struct sockaddr_in6 *sa6;
+   struct sockaddr_in *sa;
+   struct ovpn_bind *bind;
+   sa_family_t family;
+   size_t salen;
+
+   rcu_read_lock();
+   bind = rcu_dereference(peer->bind);
+   if (unlikely(!bind)) {
+   rcu_read_unlock();
+   return;
+   }
+
+   spin_lock_bh(&peer->lock);
+   if (likely(ovpn_bind_skb_src_match(bind, skb)))
+   goto unlock;
+
+   family = skb_protocol_to_family(skb);
+
+   if (bind->remote.in4.sin_family == family)
+   local_ip = (u8 *)&bind->local;
+
+   switch (family) {
+   case AF_INET

[PATCH net-next v11 04/23] ovpn: add basic interface creation/destruction/management routines

2024-10-29 Thread Antonio Quartulli
Add basic infrastructure for handling ovpn interfaces.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/main.c   | 115 --
 drivers/net/ovpn/main.h   |   7 +++
 drivers/net/ovpn/ovpnstruct.h |   8 +++
 drivers/net/ovpn/packet.h |  40 +++
 include/uapi/linux/if_link.h  |  15 ++
 5 files changed, 180 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 
d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4
 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -10,18 +10,52 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
-#include 
+#include 
 
 #include "ovpnstruct.h"
 #include "main.h"
 #include "netlink.h"
 #include "io.h"
+#include "packet.h"
 
 /* Driver info */
 #define DRV_DESCRIPTION"OpenVPN data channel offload (ovpn)"
 #define DRV_COPYRIGHT  "(C) 2020-2024 OpenVPN, Inc."
 
+static void ovpn_struct_free(struct net_device *net)
+{
+}
+
+static int ovpn_net_open(struct net_device *dev)
+{
+   netif_tx_start_all_queues(dev);
+   return 0;
+}
+
+static int ovpn_net_stop(struct net_device *dev)
+{
+   netif_tx_stop_all_queues(dev);
+   return 0;
+}
+
+static const struct net_device_ops ovpn_netdev_ops = {
+   .ndo_open   = ovpn_net_open,
+   .ndo_stop   = ovpn_net_stop,
+   .ndo_start_xmit = ovpn_net_xmit,
+};
+
+static const struct device_type ovpn_type = {
+   .name = OVPN_FAMILY_NAME,
+};
+
+static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = {
+   [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P,
+   OVPN_MODE_MP),
+};
+
 /**
  * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn'
  * @dev: the interface to check
@@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev)
return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
 }
 
+static void ovpn_setup(struct net_device *dev)
+{
+   /* compute the overhead considering AEAD encryption */
+   const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 +
+sizeof(struct udphdr) +
+max(sizeof(struct ipv6hdr), sizeof(struct iphdr));
+
+   netdev_features_t feat = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
+NETIF_F_GSO | NETIF_F_GSO_SOFTWARE |
+NETIF_F_HIGHDMA;
+
+   dev->needs_free_netdev = true;
+
+   dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
+
+   dev->netdev_ops = &ovpn_netdev_ops;
+
+   dev->priv_destructor = ovpn_struct_free;
+
+   dev->hard_header_len = 0;
+   dev->addr_len = 0;
+   dev->mtu = ETH_DATA_LEN - overhead;
+   dev->min_mtu = IPV4_MIN_MTU;
+   dev->max_mtu = IP_MAX_MTU - overhead;
+
+   dev->type = ARPHRD_NONE;
+   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
+   dev->priv_flags |= IFF_NO_QUEUE;
+
+   dev->lltx = true;
+   dev->features |= feat;
+   dev->hw_features |= feat;
+   dev->hw_enc_features |= feat;
+
+   dev->needed_headroom = OVPN_HEAD_ROOM;
+   dev->needed_tailroom = OVPN_MAX_PADDING;
+
+   SET_NETDEV_DEVTYPE(dev, &ovpn_type);
+}
+
 static int ovpn_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
 {
-   return -EOPNOTSUPP;
+   struct ovpn_struct *ovpn = netdev_priv(dev);
+   enum ovpn_mode mode = OVPN_MODE_P2P;
+
+   if (data && data[IFLA_OVPN_MODE]) {
+   mode = nla_get_u8(data[IFLA_OVPN_MODE]);
+   netdev_dbg(dev, "setting device mode: %u\n", mode);
+   }
+
+   ovpn->dev = dev;
+   ovpn->mode = mode;
+
+   /* turn carrier explicitly off after registration, this way state is
+* clearly defined
+*/
+   netif_carrier_off(dev);
+
+   return register_netdevice(dev);
 }
 
 static struct rtnl_link_ops ovpn_link_ops = {
.kind = OVPN_FAMILY_NAME,
.netns_refund = false,
+   .priv_size = sizeof(struct ovpn_struct),
+   .setup = ovpn_setup,
+   .policy = ovpn_policy,
+   .maxtype = IFLA_OVPN_MAX,
.newlink = ovpn_newlink,
.dellink = unregister_netdevice_queue,
 };
@@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct notifier_block 
*nb,
 unsigned long state, void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct ovpn_struct *ovpn;
 
if (!ovpn_dev_is_valid(dev))
return NOTIFY_DONE;
 
+   ovpn = netdev_priv(dev);
+
switch (state) {
case NETDEV_REGISTER:
-   /* add device to internal list for later destruction upon
-* unregistration
-*/
+   ovpn-

[PATCH net-next v11 06/23] ovpn: introduce the ovpn_peer object

2024-10-29 Thread Antonio Quartulli
An ovpn_peer object holds the whole status of a remote peer
(regardless of whether it is a server or a client).

This includes status for crypto, tx/rx buffers, napi, etc.

Only support for a single peer is introduced (P2P mode).
Multi-peer support is introduced in a later patch.

Along with the ovpn_peer, the ovpn_bind object is also introduced,
as the two are strictly related.
An ovpn_bind object wraps a sockaddr representing the local
coordinates being used to talk to a specific peer.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/Makefile |   2 +
 drivers/net/ovpn/bind.c   |  58 +++
 drivers/net/ovpn/bind.h   | 117 ++
 drivers/net/ovpn/main.c   |  11 ++
 drivers/net/ovpn/main.h   |   2 +
 drivers/net/ovpn/ovpnstruct.h |   4 +
 drivers/net/ovpn/peer.c   | 354 ++
 drivers/net/ovpn/peer.h   |  79 ++
 8 files changed, 627 insertions(+)

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 
201dc001419f1d99ae95c0ee0f96e68f8a4eac16..ce13499b3e1775a7f2a9ce16c6cb0aa088f93685
 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -7,7 +7,9 @@
 # Author:  Antonio Quartulli 
 
 obj-$(CONFIG_OVPN) := ovpn.o
+ovpn-y += bind.o
 ovpn-y += main.o
 ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
+ovpn-y += peer.o
diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
new file mode 100644
index 
..b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a
--- /dev/null
+++ b/drivers/net/ovpn/bind.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2012-2024 OpenVPN, Inc.
+ *
+ *  Author:James Yonan 
+ * Antonio Quartulli 
+ */
+
+#include 
+#include 
+
+#include "ovpnstruct.h"
+#include "bind.h"
+#include "peer.h"
+
+/**
+ * ovpn_bind_from_sockaddr - retrieve binding matching sockaddr
+ * @ss: the sockaddr to match
+ *
+ * Return: the bind matching the passed sockaddr if found, NULL otherwise
+ */
+struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss)
+{
+   struct ovpn_bind *bind;
+   size_t sa_len;
+
+   if (ss->ss_family == AF_INET)
+   sa_len = sizeof(struct sockaddr_in);
+   else if (ss->ss_family == AF_INET6)
+   sa_len = sizeof(struct sockaddr_in6);
+   else
+   return ERR_PTR(-EAFNOSUPPORT);
+
+   bind = kzalloc(sizeof(*bind), GFP_ATOMIC);
+   if (unlikely(!bind))
+   return ERR_PTR(-ENOMEM);
+
+   memcpy(&bind->remote, ss, sa_len);
+
+   return bind;
+}
+
+/**
+ * ovpn_bind_reset - assign new binding to peer
+ * @peer: the peer whose binding has to be replaced
+ * @new: the new bind to assign
+ */
+void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new)
+{
+   struct ovpn_bind *old;
+
+   spin_lock_bh(&peer->lock);
+   old = rcu_replace_pointer(peer->bind, new, true);
+   spin_unlock_bh(&peer->lock);
+
+   kfree_rcu(old, rcu);
+}
diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h
new file mode 100644
index 
..859213d5040deb36c416eafcf5c6ab31c4d52c7a
--- /dev/null
+++ b/drivers/net/ovpn/bind.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2012-2024 OpenVPN, Inc.
+ *
+ *  Author:James Yonan 
+ * Antonio Quartulli 
+ */
+
+#ifndef _NET_OVPN_OVPNBIND_H_
+#define _NET_OVPN_OVPNBIND_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct ovpn_peer;
+
+/**
+ * union ovpn_sockaddr - basic transport layer address
+ * @in4: IPv4 address
+ * @in6: IPv6 address
+ */
+union ovpn_sockaddr {
+   struct sockaddr_in in4;
+   struct sockaddr_in6 in6;
+};
+
+/**
+ * struct ovpn_bind - remote peer binding
+ * @remote: the remote peer sockaddress
+ * @local: local endpoint used to talk to the peer
+ * @local.ipv4: local IPv4 used to talk to the peer
+ * @local.ipv6: local IPv6 used to talk to the peer
+ * @rcu: used to schedule RCU cleanup job
+ */
+struct ovpn_bind {
+   union ovpn_sockaddr remote;  /* remote sockaddr */
+
+   union {
+   struct in_addr ipv4;
+   struct in6_addr ipv6;
+   } local;
+
+   struct rcu_head rcu;
+};
+
+/**
+ * skb_protocol_to_family - translate skb->protocol to AF_INET or AF_INET6
+ * @skb: the packet sk_buff to inspect
+ *
+ * Return: AF_INET, AF_INET6 or 0 in case of unknown protocol
+ */
+static inline unsigned short skb_protocol_to_family(const struct sk_buff *skb)
+{
+   switch (skb->protocol) {
+   case htons(ETH_P_IP):
+   return AF_INET;
+   case htons(ETH_P_IPV6):
+   return AF_INET6;
+   default:
+   return 0;
+   }
+}
+
+/**
+ * ovpn_bind_skb_src_match - match packet source with binding
+ * @bind: the binding to match
+ * @skb: the pack

[PATCH net-next v11 08/23] ovpn: implement basic TX path (UDP)

2024-10-29 Thread Antonio Quartulli
Packets sent over the ovpn interface are processed and transmitted to the
connected peer, if any.

Implementation is UDP only. TCP will be added by a later patch.

Note: no crypto/encapsulation exists yet. Packets are just captured and
sent.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/io.c   | 138 +++-
 drivers/net/ovpn/peer.c |  37 +++-
 drivers/net/ovpn/peer.h |   4 +
 drivers/net/ovpn/skb.h  |  51 +++
 drivers/net/ovpn/udp.c  | 232 
 drivers/net/ovpn/udp.h  |   8 ++
 6 files changed, 468 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 
ad3813419c33cbdfe7e8ad6f5c8b444a3540a69f..77ba4d33ae0bd2f52e8bd1c06a182d24285297b4
 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -9,14 +9,150 @@
 
 #include 
 #include 
+#include 
 
 #include "io.h"
+#include "ovpnstruct.h"
+#include "peer.h"
+#include "udp.h"
+#include "skb.h"
+#include "socket.h"
+
+static void ovpn_encrypt_post(struct sk_buff *skb, int ret)
+{
+   struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
+
+   if (unlikely(ret < 0))
+   goto err;
+
+   skb_mark_not_on_list(skb);
+
+   switch (peer->sock->sock->sk->sk_protocol) {
+   case IPPROTO_UDP:
+   ovpn_udp_send_skb(peer->ovpn, peer, skb);
+   break;
+   default:
+   /* no transport configured yet */
+   goto err;
+   }
+   /* skb passed down the stack - don't free it */
+   skb = NULL;
+err:
+   if (unlikely(skb))
+   dev_core_stats_tx_dropped_inc(peer->ovpn->dev);
+   ovpn_peer_put(peer);
+   kfree_skb(skb);
+}
+
+static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+   ovpn_skb_cb(skb)->peer = peer;
+
+   /* take a reference to the peer because the crypto code may run async.
+* ovpn_encrypt_post() will release it upon completion
+*/
+   if (unlikely(!ovpn_peer_hold(peer))) {
+   DEBUG_NET_WARN_ON_ONCE(1);
+   return false;
+   }
+
+   ovpn_encrypt_post(skb, 0);
+   return true;
+}
+
+/* send skb to connected peer, if any */
+static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb,
+ struct ovpn_peer *peer)
+{
+   struct sk_buff *curr, *next;
+
+   if (likely(!peer))
+   /* retrieve peer serving the destination IP of this packet */
+   peer = ovpn_peer_get_by_dst(ovpn, skb);
+   if (unlikely(!peer)) {
+   net_dbg_ratelimited("%s: no peer to send data to\n",
+   ovpn->dev->name);
+   dev_core_stats_tx_dropped_inc(ovpn->dev);
+   goto drop;
+   }
+
+   /* this might be a GSO-segmented skb list: process each skb
+* independently
+*/
+   skb_list_walk_safe(skb, curr, next)
+   if (unlikely(!ovpn_encrypt_one(peer, curr))) {
+   dev_core_stats_tx_dropped_inc(ovpn->dev);
+   kfree_skb(curr);
+   }
+
+   /* skb passed over, no need to free */
+   skb = NULL;
+drop:
+   if (likely(peer))
+   ovpn_peer_put(peer);
+   kfree_skb_list(skb);
+}
 
 /* Send user data to the network
  */
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
+   struct ovpn_struct *ovpn = netdev_priv(dev);
+   struct sk_buff *segments, *curr, *next;
+   struct sk_buff_head skb_list;
+   __be16 proto;
+   int ret;
+
+   /* reset netfilter state */
+   nf_reset_ct(skb);
+
+   /* verify IP header size in network packet */
+   proto = ovpn_ip_check_protocol(skb);
+   if (unlikely(!proto || skb->protocol != proto)) {
+   net_err_ratelimited("%s: dropping malformed payload packet\n",
+   dev->name);
+   dev_core_stats_tx_dropped_inc(ovpn->dev);
+   goto drop;
+   }
+
+   if (skb_is_gso(skb)) {
+   segments = skb_gso_segment(skb, 0);
+   if (IS_ERR(segments)) {
+   ret = PTR_ERR(segments);
+   net_err_ratelimited("%s: cannot segment packet: %d\n",
+   dev->name, ret);
+   dev_core_stats_tx_dropped_inc(ovpn->dev);
+   goto drop;
+   }
+
+   consume_skb(skb);
+   skb = segments;
+   }
+
+   /* from this moment on, "skb" might be a list */
+
+   __skb_queue_head_init(&skb_list);
+   skb_list_walk_safe(skb, curr, next) {
+   skb_mark_not_on_list(curr);
+
+   curr = skb_share_check(curr, GFP_ATOMIC);
+   if (unlikely(!curr)) {
+   net_err_ratelimited("%s: skb_share_check failed\n",
+   dev->name);
+

[PATCH net-next v11 05/23] ovpn: keep carrier always on

2024-10-29 Thread Antonio Quartulli
An ovpn interface will keep carrier always on and let the user
decide when an interface should be considered disconnected.

This way, even if an ovpn interface is not connected to any peer,
it can still retain all IPs and routes and thus prevent any data
leak.

Signed-off-by: Antonio Quartulli 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ovpn/main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 
eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6
 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net)
 
 static int ovpn_net_open(struct net_device *dev)
 {
+   /* ovpn keeps the carrier always on to avoid losing IP or route
+* configuration upon disconnection. This way it can prevent leaks
+* of traffic outside of the VPN tunnel.
+* The user may override this behaviour by tearing down the interface
+* manually.
+*/
+   netif_carrier_on(dev);
netif_tx_start_all_queues(dev);
return 0;
 }

-- 
2.45.2




[PATCH net-next v11 03/23] ovpn: add basic netlink support

2024-10-29 Thread Antonio Quartulli
This commit introduces basic netlink support with family
registration/unregistration functionality and stub pre/post-doit hooks.

More importantly it introduces the YAML uAPI description along
with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h

Cc: donald.hun...@gmail.com
Signed-off-by: Antonio Quartulli 
---
 Documentation/netlink/specs/ovpn.yaml | 362 ++
 MAINTAINERS   |   2 +
 drivers/net/ovpn/Makefile |   2 +
 drivers/net/ovpn/main.c   |  15 +-
 drivers/net/ovpn/netlink-gen.c| 212 
 drivers/net/ovpn/netlink-gen.h|  41 
 drivers/net/ovpn/netlink.c| 157 +++
 drivers/net/ovpn/netlink.h|  15 ++
 drivers/net/ovpn/ovpnstruct.h |  25 +++
 include/uapi/linux/ovpn.h | 109 ++
 10 files changed, 939 insertions(+), 1 deletion(-)

diff --git a/Documentation/netlink/specs/ovpn.yaml 
b/Documentation/netlink/specs/ovpn.yaml
new file mode 100644
index 
..79339c25d607f1b5d15a0a973f6fc23637e158a2
--- /dev/null
+++ b/Documentation/netlink/specs/ovpn.yaml
@@ -0,0 +1,362 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+#
+# Author: Antonio Quartulli 
+#
+# Copyright (c) 2024, OpenVPN Inc.
+#
+
+name: ovpn
+
+protocol: genetlink
+
+doc: Netlink protocol to control OpenVPN network devices
+
+definitions:
+  -
+type: const
+name: nonce-tail-size
+value: 8
+  -
+type: enum
+name: cipher-alg
+entries: [ none, aes-gcm, chacha20-poly1305 ]
+  -
+type: enum
+name: del-peer-reason
+entries: [ teardown, userspace, expired, transport-error, 
transport-disconnect ]
+  -
+type: enum
+name: key-slot
+entries: [ primary, secondary ]
+
+attribute-sets:
+  -
+name: peer
+attributes:
+  -
+name: id
+type: u32
+doc: |
+  The unique ID of the peer. To be used to identify peers during
+  operations
+checks:
+  max: 0xFF
+  -
+name: remote-ipv4
+type: u32
+doc: The remote IPv4 address of the peer
+byte-order: big-endian
+display-hint: ipv4
+  -
+name: remote-ipv6
+type: binary
+doc: The remote IPv6 address of the peer
+display-hint: ipv6
+checks:
+  exact-len: 16
+  -
+name: remote-ipv6-scope-id
+type: u32
+doc: The scope id of the remote IPv6 address of the peer (RFC2553)
+  -
+name: remote-port
+type: u16
+doc: The remote port of the peer
+byte-order: big-endian
+checks:
+  min: 1
+  -
+name: socket
+type: u32
+doc: The socket to be used to communicate with the peer
+  -
+name: vpn-ipv4
+type: u32
+doc: The IPv4 address assigned to the peer by the server
+byte-order: big-endian
+display-hint: ipv4
+  -
+name: vpn-ipv6
+type: binary
+doc: The IPv6 address assigned to the peer by the server
+display-hint: ipv6
+checks:
+  exact-len: 16
+  -
+name: local-ipv4
+type: u32
+doc: The local IPv4 to be used to send packets to the peer (UDP only)
+byte-order: big-endian
+display-hint: ipv4
+  -
+name: local-ipv6
+type: binary
+doc: The local IPv6 to be used to send packets to the peer (UDP only)
+display-hint: ipv6
+checks:
+  exact-len: 16
+  -
+name: local-port
+type: u16
+doc: The local port to be used to send packets to the peer (UDP only)
+byte-order: big-endian
+checks:
+  min: 1
+  -
+name: keepalive-interval
+type: u32
+doc: |
+  The number of seconds after which a keep alive message is sent to the
+  peer
+  -
+name: keepalive-timeout
+type: u32
+doc: |
+  The number of seconds from the last activity after which the peer is
+  assumed dead
+  -
+name: del-reason
+type: u32
+doc: The reason why a peer was deleted
+enum: del-peer-reason
+  -
+name: vpn-rx-bytes
+type: uint
+doc: Number of bytes received over the tunnel
+  -
+name: vpn-tx-bytes
+type: uint
+doc: Number of bytes transmitted over the tunnel
+  -
+name: vpn-rx-packets
+type: uint
+doc: Number of packets received over the tunnel
+  -
+name: vpn-tx-packets
+type: uint
+doc: Number of packets transmitted over the tunnel
+  -
+name: link-rx-bytes
+type: uint
+doc: Number of bytes received at the transport level
+  -
+name: link-tx-bytes

[PATCH v5 2/2] selftests/resctrl: Adjust SNC support messages

2024-10-29 Thread Maciej Wieczor-Retman
The resctrl selftest prints a message on test failure that Sub-NUMA
Clustering (SNC) could be enabled and points the user to check their BIOS
settings. No actual check is performed before printing that message so
it is not very accurate in pinpointing a problem.

Figuring out if SNC is enabled is only one part of the problem, the
others being whether the detected SNC mode is reliable and whether the
kernel supports SNC in resctrl.

When there is SNC support in the kernel's resctrl subsystem and SNC is
enabled then sub node files are created for each node in the resctrlfs.
The sub node files exist in each regular node's L3 monitoring directory.
The reliable path to check for existence of sub node files is
/sys/fs/resctrl/mon_data/mon_L3_00/mon_sub_L3_00.

To check if SNC detection is reliable one can check the
/sys/devices/system/cpu/offline file. If it's empty, it means all cores
are operational and the ratio should be calculated correctly. If it has
any contents, it means the detected SNC mode can't be trusted and should
be disabled.

Add helpers for all operations mentioned above.

Detect SNC mode once and let other tests inherit that information.

Add messages to alert the user when SNC detection could return incorrect
results. Correct old messages to account for kernel support of SNC in
resctrl.

Signed-off-by: Maciej Wieczor-Retman 
---
Changelog v5:
- Move all resctrlfs.c code from this patch to 1/2. (Reinette)
- Remove the kernel support check and error message from CAT since it
  can't happen.
- Remove SNC checks in CAT since SNC doesn't affect it there.
- Skip MBM, MBA and CMT tests if SNC detection is unreliable.

Changelog v4:
- Change messages at the end of tests and at the start of
  run_single_test. (Reinette)
- Add messages at the end of CAT since it can also fail due to enabled
  SNC + lack of kernel support.
- Remove snc_mode global variable. (Reinette)
- Fix wrong description of snc_kernel_support(). (Reinette)
- Move call to cpus_offline_empty() into snc_nodes_per_l3_cache() so the
  whole detection flow is in one place as discussed. (Reinette)

Changelog v3:
- Change snc_ways() to snc_nodes_per_l3_cache(). (Reinette)
- Add printing the discovered SNC mode. (Reinette)
- Change method of kernel support discovery from cache sizes to
  existence of sub node files.
- Check if SNC detection is unreliable.
- Move SNC detection to only the first run_single_test() instead on
  error at the end of test runs.
- Add global value to remind user at the end of relevant tests if SNC
  detection was found to be unreliable.
- Redo the patch message after the changes.

Changelog v2:
- Move snc_ways() checks from individual tests into
  snc_kernel_support().
- Write better comment for snc_kernel_support().

 tools/testing/selftests/resctrl/cmt_test.c |  8 ++--
 tools/testing/selftests/resctrl/mba_test.c |  8 +++-
 tools/testing/selftests/resctrl/mbm_test.c | 10 +++---
 tools/testing/selftests/resctrl/resctrl.h  |  3 +++
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cmt_test.c 
b/tools/testing/selftests/resctrl/cmt_test.c
index 0c045080d808..1470bd64d158 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -133,6 +133,10 @@ static int cmt_run_test(const struct resctrl_test *test, 
const struct user_param
ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
if (ret)
return ret;
+
+   if ((get_vendor() == ARCH_INTEL) && snc_unreliable)
+   ksft_exit_skip("Sub-NUMA Clustering could not be detected 
properly. Skipping...\n");
+
ksft_print_msg("Cache size :%lu\n", cache_total_size);
 
count_of_bits = count_bits(long_mask);
@@ -175,8 +179,8 @@ static int cmt_run_test(const struct resctrl_test *test, 
const struct user_param
goto out;
 
ret = check_results(¶m, span, n);
-   if (ret && (get_vendor() == ARCH_INTEL))
-   ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA 
Clustering is enabled. Check BIOS configuration.\n");
+   if (ret && (get_vendor() == ARCH_INTEL) && !snc_kernel_support())
+   ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but 
it is enabled on the system.\n");
 
 out:
free(span_str);
diff --git a/tools/testing/selftests/resctrl/mba_test.c 
b/tools/testing/selftests/resctrl/mba_test.c
index ab8496a4925b..8f4e198da047 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -170,15 +170,21 @@ static int mba_run_test(const struct resctrl_test *test, 
const struct user_param
.setup  = mba_setup,
.measure= mba_measure,
};
-   int ret;
+   int ret, snc_support;
 
remove(RESULT_FILE_NAME);
 
+   snc_support = snc_kernel_support();
+   if ((get_vendor() == ARCH_INTEL) && snc_unreliable)
+   ksft_exit_skip("Sub-NUM

[PATCH v5 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

2024-10-29 Thread Maciej Wieczor-Retman
Sub-NUMA Cluster divides CPUs sharing an L3 cache into separate NUMA
nodes. Systems may support splitting into either two, three or four
nodes.

When SNC mode is enabled the effective amount of L3 cache available
for allocation is divided by the number of nodes per L3.

Detect which SNC mode is active by comparing the number of CPUs
that share a cache with CPU0, with the number of CPUs on node0.

Co-developed-by: Tony Luck 
Signed-off-by: Tony Luck 
Signed-off-by: Maciej Wieczor-Retman 
---
Changelog v5:
- Set snc_mode to 1 at the start of if(!snc_mode) block.
- Move all resctrlfs.c code from patch 2/2 to this one. (Reinette)
- Fix Co-developed-by tag.
- Update SNC discovery comments for the case where it gets disabled by
  being unreliable.
- Remove exclamation mark from ksft_perror() and add full file path
  instead of "offline CPUs file".
- bit map -> bitmap.
- Remove unnecessary empty line.

Changelog v4:
- Make returned value a static local variable so the function only runs
  the logic once. (Reinette)

Changelog v3:
- Add comparison between present and online cpus to test if the
  calculated SNC mode is credible. (Reinette)
- Added comment to cache size modification to better explain why it is
  needed there. (Reinette)
- Fix facts in patch message. (Reinette)
- Change snc_ways() to snc_nodes_per_l3_cache(). (Reinette)

 tools/testing/selftests/resctrl/resctrl.h |   4 +
 .../testing/selftests/resctrl/resctrl_tests.c |   8 +-
 tools/testing/selftests/resctrl/resctrlfs.c   | 132 ++
 3 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h 
b/tools/testing/selftests/resctrl/resctrl.h
index 2dda56084588..851b37c9c38a 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,6 +44,8 @@
 
 #define DEFAULT_SPAN   (250 * MB)
 
+#define MAX_SNC	4
+
 /*
 * user_params:	User supplied parameters
  * @cpu:   CPU number to which the benchmark will be bound to
@@ -120,6 +123,7 @@ extern volatile int *value_sink;
 
 extern char llc_occup_path[1024];
 
+int snc_nodes_per_l3_cache(void);
 int get_vendor(void);
 bool check_resctrlfs_support(void);
 int filter_dmesg(void);
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c 
b/tools/testing/selftests/resctrl/resctrl_tests.c
index ecbb7605a981..4b84d6199a36 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -118,11 +118,17 @@ static bool test_vendor_specific_check(const struct resctrl_test *test)
 
static void run_single_test(const struct resctrl_test *test, const struct user_params *uparams)
 {
-   int ret;
+   int ret, snc_mode;
 
if (test->disabled)
return;
 
+   snc_mode = snc_nodes_per_l3_cache();
+   if (snc_mode > 1)
+   ksft_print_msg("SNC-%d mode discovered.\n", snc_mode);
+   else if (snc_unreliable)
+   ksft_print_msg("SNC detection unreliable due to offline CPUs. Test results may not be accurate if SNC enabled.\n");
+
if (!test_vendor_specific_check(test)) {
ksft_test_result_skip("Hardware does not support %s\n", test->name);
return;
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c 
b/tools/testing/selftests/resctrl/resctrlfs.c
index 250c320349a7..d6384f404d95 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -13,6 +13,8 @@
 
 #include "resctrl.h"
 
+int snc_unreliable;
+
 static int find_resctrl_mount(char *buffer)
 {
FILE *mounts;
@@ -156,6 +158,93 @@ int get_domain_id(const char *resource, int cpu_no, int *domain_id)
return 0;
 }
 
+/*
+ * Count number of CPUs in a /sys bitmap
+ */
+static unsigned int count_sys_bitmap_bits(char *name)
+{
+   FILE *fp = fopen(name, "r");
+   int count = 0, c;
+
+   if (!fp)
+   return 0;
+
+   while ((c = fgetc(fp)) != EOF) {
+   if (!isxdigit(c))
+   continue;
+   switch (c) {
+   case 'f':
+   count++;
+   /* fall through: each level below adds one more set bit */
+   case '7': case 'b': case 'd': case 'e':
+   count++;
+   /* fall through */
+   case '3': case '5': case '6': case '9': case 'a': case 'c':
+   count++;
+   /* fall through */
+   case '1': case '2': case '4': case '8':
+   count++;
+   }
+   }
+   fclose(fp);
+
+   return count;
+}
+
+static bool cpus_offline_empty(void)
+{
+   char offline_cpus_str[64];
+   FILE *fp;
+
+   fp = fopen("/sys/devices/system/cpu/offline", "r");
+   if (!fp) {
+   ksft_perror("/sys/devices/system/cpu/offline");
+   return 0;
+   }
+
+   if (fscanf(fp, "%s", offline_cpus_str) < 0) {
+   if (!errno) {
+   fclose(fp);
+   return 1;
+   }
+ 
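The nibble-popcount switch in count_sys_bitmap_bits() above relies on deliberate fallthrough: a hex digit with N set bits falls through N `count++` statements. The same idea can be sketched in user space with a lookup table instead of the switch (function name illustrative):

```c
#include <assert.h>
#include <ctype.h>

/* Count set bits in a /sys-style hex cpumask string such as "ff,ffffffff".
 * Each hex digit contributes its population count; separators are skipped,
 * mirroring the isxdigit() filter in the selftest helper.
 */
static unsigned int count_hex_bits(const char *s)
{
	static const unsigned char nibble_bits[16] = {
		0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
	};
	unsigned int count = 0;

	for (; *s; s++) {
		if (!isxdigit((unsigned char)*s))
			continue;
		int v = isdigit((unsigned char)*s) ?
			*s - '0' : tolower((unsigned char)*s) - 'a' + 10;
		count += nibble_bits[v];
	}
	return count;
}
```

The table and the fallthrough switch compute the same per-nibble popcount; the switch avoids a data table at the cost of readability.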

[PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

2024-10-29 Thread Antonio Quartulli
Packets received over the socket are forwarded to the user device.

Implementation is UDP only. TCP will be added by a later patch.

Note: no decryption/decapsulation exists yet; packets are forwarded as
they arrive, without much processing.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/io.c |  66 ++-
 drivers/net/ovpn/io.h |   2 +
 drivers/net/ovpn/main.c   |  13 +-
 drivers/net/ovpn/ovpnstruct.h |   3 ++
 drivers/net/ovpn/proto.h  |  75 ++
 drivers/net/ovpn/socket.c |  24 ++
 drivers/net/ovpn/udp.c| 104 +-
 drivers/net/ovpn/udp.h|   3 +-
 8 files changed, 286 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 
77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360
 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -9,15 +9,79 @@
 
 #include 
 #include 
+#include 
 #include 
 
-#include "io.h"
 #include "ovpnstruct.h"
 #include "peer.h"
+#include "io.h"
+#include "netlink.h"
+#include "proto.h"
 #include "udp.h"
 #include "skb.h"
 #include "socket.h"
 
+/* Called after decrypt to write the IP packet to the device.
+ * This method is expected to manage/free the skb.
+ */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+   unsigned int pkt_len;
+
+   /* we can't guarantee the packet wasn't corrupted before entering the
+* VPN, therefore we give other layers a chance to check that
+*/
+   skb->ip_summed = CHECKSUM_NONE;
+
+   /* skb hash for transport packet no longer valid after decapsulation */
+   skb_clear_hash(skb);
+
+   /* post-decrypt scrub -- prepare to inject encapsulated packet onto the
+* interface, based on __skb_tunnel_rx() in dst.h
+*/
+   skb->dev = peer->ovpn->dev;
+   skb_set_queue_mapping(skb, 0);
+   skb_scrub_packet(skb, true);
+
+   skb_reset_network_header(skb);
+   skb_reset_transport_header(skb);
+   skb_probe_transport_header(skb);
+   skb_reset_inner_headers(skb);
+
+   memset(skb->cb, 0, sizeof(skb->cb));
+
+   /* cause packet to be "received" by the interface */
+   pkt_len = skb->len;
+   if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
+skb) == NET_RX_SUCCESS))
+   /* update RX stats with the size of decrypted packet */
+   dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
+}
+
+static void ovpn_decrypt_post(struct sk_buff *skb, int ret)
+{
+   struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
+
+   if (unlikely(ret < 0))
+   goto drop;
+
+   ovpn_netdev_write(peer, skb);
+   /* skb is passed to upper layer - don't free it */
+   skb = NULL;
+drop:
+   if (unlikely(skb))
+   dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
+   ovpn_peer_put(peer);
+   kfree_skb(skb);
+}
+
+/* pick next packet from RX queue, decrypt and forward it to the device */
+void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+   ovpn_skb_cb(skb)->peer = peer;
+   ovpn_decrypt_post(skb, 0);
+}
+
 static void ovpn_encrypt_post(struct sk_buff *skb, int ret)
 {
struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
index 
aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f
 100644
--- a/drivers/net/ovpn/io.h
+++ b/drivers/net/ovpn/io.h
@@ -12,4 +12,6 @@
 
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
 
+void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb);
+
 #endif /* _NET_OVPN_OVPN_H_ */
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 
5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d
 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net)
 
 static int ovpn_net_init(struct net_device *dev)
 {
-   return 0;
+   struct ovpn_struct *ovpn = netdev_priv(dev);
+
+   return gro_cells_init(&ovpn->gro_cells, dev);
+}
+
+static void ovpn_net_uninit(struct net_device *dev)
+{
+   struct ovpn_struct *ovpn = netdev_priv(dev);
+
+   gro_cells_destroy(&ovpn->gro_cells);
 }
 
 static int ovpn_net_open(struct net_device *dev)
@@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev)
 
 static const struct net_device_ops ovpn_netdev_ops = {
.ndo_init   = ovpn_net_init,
+   .ndo_uninit = ovpn_net_uninit,
.ndo_open   = ovpn_net_open,
.ndo_stop   = ovpn_net_stop,
.ndo_start_xmit = ovpn_net_xmit,
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/ne

[PATCH net-next v11 11/23] ovpn: store tunnel and transport statistics

2024-10-29 Thread Antonio Quartulli
Byte/packet counters for in-tunnel and transport streams
are now initialized and updated as needed.

To be exported via netlink.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/Makefile  |  1 +
 drivers/net/ovpn/crypto_aead.c |  2 ++
 drivers/net/ovpn/io.c  | 11 ++
 drivers/net/ovpn/peer.c|  2 ++
 drivers/net/ovpn/peer.h|  5 +
 drivers/net/ovpn/skb.h |  1 +
 drivers/net/ovpn/stats.c   | 21 +++
 drivers/net/ovpn/stats.h   | 47 ++
 8 files changed, 90 insertions(+)

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 
ccdaeced1982c851475657860a005ff2b9dfbd13..d43fda72646bdc7644d9a878b56da0a0e5680c98
 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -17,4 +17,5 @@ ovpn-y += netlink-gen.o
 ovpn-y += peer.o
 ovpn-y += pktid.o
 ovpn-y += socket.o
+ovpn-y += stats.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c
index 
f9e3feb297b19868b1084048933796fcc7a47d6e..072bb0881764752520e8e26e18337c1274ce1aa4
 100644
--- a/drivers/net/ovpn/crypto_aead.c
+++ b/drivers/net/ovpn/crypto_aead.c
@@ -48,6 +48,7 @@ int ovpn_aead_encrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks,
int nfrags, ret;
u32 pktid, op;
 
+   ovpn_skb_cb(skb)->orig_len = skb->len;
ovpn_skb_cb(skb)->peer = peer;
ovpn_skb_cb(skb)->ks = ks;
 
@@ -159,6 +160,7 @@ int ovpn_aead_decrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks,
payload_offset = OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE + tag_size;
payload_len = skb->len - payload_offset;
 
+   ovpn_skb_cb(skb)->orig_len = skb->len;
ovpn_skb_cb(skb)->payload_offset = payload_offset;
ovpn_skb_cb(skb)->peer = peer;
ovpn_skb_cb(skb)->ks = ks;
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 
4c81c4547d35d2a73f680ef1f5d8853ffbd952e0..d56e74660c7be9020b5bdf7971322d41afd436d6
 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ovpnstruct.h"
 #include "peer.h"
@@ -68,6 +69,7 @@ void ovpn_decrypt_post(void *data, int ret)
unsigned int payload_offset = 0;
struct sk_buff *skb = data;
struct ovpn_peer *peer;
+   unsigned int orig_len;
__be16 proto;
__be32 *pid;
 
@@ -80,6 +82,7 @@ void ovpn_decrypt_post(void *data, int ret)
payload_offset = ovpn_skb_cb(skb)->payload_offset;
ks = ovpn_skb_cb(skb)->ks;
peer = ovpn_skb_cb(skb)->peer;
+   orig_len = ovpn_skb_cb(skb)->orig_len;
 
/* crypto is done, cleanup skb CB and its members */
 
@@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
 
+   /* increment RX stats */
+   ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
+   ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len);
+
ovpn_netdev_write(peer, skb);
/* skb is passed to upper layer - don't free it */
skb = NULL;
@@ -175,6 +182,7 @@ void ovpn_encrypt_post(void *data, int ret)
struct ovpn_crypto_key_slot *ks;
struct sk_buff *skb = data;
struct ovpn_peer *peer;
+   unsigned int orig_len;
 
/* encryption is happening asynchronously. This function will be
 * called later by the crypto callback with a proper return value
@@ -184,6 +192,7 @@ void ovpn_encrypt_post(void *data, int ret)
 
ks = ovpn_skb_cb(skb)->ks;
peer = ovpn_skb_cb(skb)->peer;
+   orig_len = ovpn_skb_cb(skb)->orig_len;
 
/* crypto is done, cleanup skb CB and its members */
 
@@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret)
goto err;
 
skb_mark_not_on_list(skb);
+   ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len);
+   ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len);
 
switch (peer->sock->sock->sk->sk_protocol) {
case IPPROTO_UDP:
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 
98ae7662f1e76811e625dc5f4b4c5c884856fbd6..5025bfb759d6a5f31e3f2ec094fe561fbdb9f451
 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -48,6 +48,8 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
ovpn_crypto_state_init(&peer->crypto);
spin_lock_init(&peer->lock);
kref_init(&peer->refcount);
+   ovpn_peer_stats_init(&peer->vpn_stats);
+   ovpn_peer_stats_init(&peer->link_stats);
 
ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL);
if (ret < 0) {
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 
754fea470d1b4787f64a931d6c6adc24182fc16f..eb1e31e854fbfff25d07fba8026789e41a76c113
 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -13,6 +13,7 @@
 #include 
 
 #include "crypto.h"
+#include "stats.h"
 
 /**
  * s

[PATCH net-next v11 22/23] ovpn: add basic ethtool support

2024-10-29 Thread Antonio Quartulli
Implement support for basic ethtool functionality.

Note that ovpn is a virtual device driver, so many ethtool
APIs are not meaningful for it and are therefore not
implemented.

Signed-off-by: Antonio Quartulli 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ovpn/main.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 
1bd563e3f16f49dd01c897fbe79cbd90f4b8e9aa..9dcf51ae1497dda17d418b762011b04bfd0521df
 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -7,6 +7,7 @@
  * James Yonan 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -96,6 +97,19 @@ bool ovpn_dev_is_valid(const struct net_device *dev)
return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
 }
 
+static void ovpn_get_drvinfo(struct net_device *dev,
+struct ethtool_drvinfo *info)
+{
+   strscpy(info->driver, OVPN_FAMILY_NAME, sizeof(info->driver));
+   strscpy(info->bus_info, "ovpn", sizeof(info->bus_info));
+}
+
+static const struct ethtool_ops ovpn_ethtool_ops = {
+   .get_drvinfo= ovpn_get_drvinfo,
+   .get_link   = ethtool_op_get_link,
+   .get_ts_info= ethtool_op_get_ts_info,
+};
+
 static void ovpn_setup(struct net_device *dev)
 {
/* compute the overhead considering AEAD encryption */
@@ -111,6 +125,7 @@ static void ovpn_setup(struct net_device *dev)
 
dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
 
+   dev->ethtool_ops = &ovpn_ethtool_ops;
dev->netdev_ops = &ovpn_netdev_ops;
 
dev->priv_destructor = ovpn_struct_free;

-- 
2.45.2




[PATCH net-next v11 18/23] ovpn: implement peer add/get/dump/delete via netlink

2024-10-29 Thread Antonio Quartulli
This change introduces the netlink command needed to add, delete and
retrieve/dump known peers. Userspace is expected to use these commands
to handle known peer lifecycles.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/netlink.c | 578 -
 drivers/net/ovpn/peer.c|  48 ++--
 drivers/net/ovpn/peer.h|   5 +
 3 files changed, 609 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 
2cc34eb1d1d870c6705714cb971c3c5dfb04afda..d504445325ef82db04f87367c858adaf025f6297
 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -16,6 +17,10 @@
 #include "io.h"
 #include "netlink.h"
 #include "netlink-gen.h"
+#include "bind.h"
+#include "packet.h"
+#include "peer.h"
+#include "socket.h"
 
 MODULE_ALIAS_GENL_FAMILY(OVPN_FAMILY_NAME);
 
@@ -86,29 +91,592 @@ void ovpn_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
netdev_put(ovpn->dev, &ovpn->dev_tracker);
 }
 
+static int ovpn_nl_attr_sockaddr_remote(struct nlattr **attrs,
+   struct sockaddr_storage *ss)
+{
+   struct sockaddr_in6 *sin6;
+   struct sockaddr_in *sin;
+   struct in6_addr *in6;
+   __be16 port = 0;
+   __be32 *in;
+   int af;
+
+   ss->ss_family = AF_UNSPEC;
+
+   if (attrs[OVPN_A_PEER_REMOTE_PORT])
+   port = nla_get_be16(attrs[OVPN_A_PEER_REMOTE_PORT]);
+
+   if (attrs[OVPN_A_PEER_REMOTE_IPV4]) {
+   af = AF_INET;
+   ss->ss_family = AF_INET;
+   in = nla_data(attrs[OVPN_A_PEER_REMOTE_IPV4]);
+   } else if (attrs[OVPN_A_PEER_REMOTE_IPV6]) {
+   af = AF_INET6;
+   ss->ss_family = AF_INET6;
+   in6 = nla_data(attrs[OVPN_A_PEER_REMOTE_IPV6]);
+   } else {
+   return AF_UNSPEC;
+   }
+
+   switch (ss->ss_family) {
+   case AF_INET6:
+   /* If this is a regular IPv6 just break and move on,
+* otherwise switch to AF_INET and extract the IPv4 accordingly
+*/
+   if (!ipv6_addr_v4mapped(in6)) {
+   sin6 = (struct sockaddr_in6 *)ss;
+   sin6->sin6_port = port;
+   memcpy(&sin6->sin6_addr, in6, sizeof(*in6));
+   break;
+   }
+
+   /* v4-mapped-v6 address */
+   ss->ss_family = AF_INET;
+   in = &in6->s6_addr32[3];
+   fallthrough;
+   case AF_INET:
+   sin = (struct sockaddr_in *)ss;
+   sin->sin_port = port;
+   sin->sin_addr.s_addr = *in;
+   break;
+   }
+
+   /* don't return ss->ss_family as it may have changed in case of
+* v4-mapped-v6 address
+*/
+   return af;
+}
+
+static u8 *ovpn_nl_attr_local_ip(struct nlattr **attrs)
+{
+   u8 *addr6;
+
+   if (!attrs[OVPN_A_PEER_LOCAL_IPV4] && !attrs[OVPN_A_PEER_LOCAL_IPV6])
+   return NULL;
+
+   if (attrs[OVPN_A_PEER_LOCAL_IPV4])
+   return nla_data(attrs[OVPN_A_PEER_LOCAL_IPV4]);
+
+   addr6 = nla_data(attrs[OVPN_A_PEER_LOCAL_IPV6]);
+   /* this is an IPv4-mapped IPv6 address, therefore extract the actual
+* v4 address from the last 4 bytes
+*/
+   if (ipv6_addr_v4mapped((struct in6_addr *)addr6))
+   return addr6 + 12;
+
+   return addr6;
+}
+
+static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
+struct genl_info *info,
+struct nlattr **attrs)
+{
+   if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
+ OVPN_A_PEER_ID))
+   return -EINVAL;
+
+   if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "cannot specify both remote IPv4 and IPv6 address");
+   return -EINVAL;
+   }
+
+   if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
+   !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "cannot specify remote port without IP address");
+   return -EINVAL;
+   }
+
+   if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
+   attrs[OVPN_A_PEER_LOCAL_IPV4]) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "cannot specify local IPv4 address without remote");
+   return -EINVAL;
+   }
+
+   if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
+   attrs[OVPN_A_PEER_LOCAL_IPV6]) {
+   NL_SET_ERR_MSG_MOD(info->extack,
+  "cannot specify local IPv6 address without remote");
+   retu
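The v4-mapped-v6 handling in ovpn_nl_attr_sockaddr_remote() above can be reproduced as a small user-space sketch. Assumptions: the function name is illustrative, and unlike the kernel helper (which returns the original attribute family), this sketch returns the family actually stored in the sockaddr.

```c
#include <assert.h>
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Store an IPv6 address into a sockaddr_storage; a ::ffff:a.b.c.d
 * (IPv4-mapped) address is converted to AF_INET, taking the IPv4
 * address from the last 4 bytes -- the same trick the netlink code uses.
 */
static int sockaddr_from_in6(struct sockaddr_storage *ss,
			     const struct in6_addr *in6, uint16_t port)
{
	if (IN6_IS_ADDR_V4MAPPED(in6)) {
		struct sockaddr_in *sin = (struct sockaddr_in *)ss;

		sin->sin_family = AF_INET;
		sin->sin_port = htons(port);
		memcpy(&sin->sin_addr, &in6->s6_addr[12], 4);
		return AF_INET;
	}

	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons(port);
	sin6->sin6_addr = *in6;
	return AF_INET6;
}
```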

[PATCH net-next v11 19/23] ovpn: implement key add/get/del/swap via netlink

2024-10-29 Thread Antonio Quartulli
This change introduces the netlink commands needed to add, get, delete
and swap keys for a specific peer.

Userspace is expected to use these commands to create, inspect
(non-sensitive data only), destroy and rotate session keys for a
specific peer.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/crypto.c  |  42 ++
 drivers/net/ovpn/crypto.h  |   4 +
 drivers/net/ovpn/crypto_aead.c |  17 +++
 drivers/net/ovpn/crypto_aead.h |   2 +
 drivers/net/ovpn/netlink.c | 308 -
 5 files changed, 369 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
index 
f1f7510e2f735e367f96eb4982ba82c9af3c8bfc..cfb014c947b968752ba3dab84ec42dc8ec086379
 100644
--- a/drivers/net/ovpn/crypto.c
+++ b/drivers/net/ovpn/crypto.c
@@ -151,3 +151,45 @@ void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs)
 
spin_unlock_bh(&cs->lock);
 }
+
+/**
+ * ovpn_crypto_config_get - populate keyconf object with non-sensitive key data
+ * @cs: the crypto state to extract the key data from
+ * @slot: the specific slot to inspect
+ * @keyconf: the output object to populate
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
+  enum ovpn_key_slot slot,
+  struct ovpn_key_config *keyconf)
+{
+   struct ovpn_crypto_key_slot *ks;
+   int idx;
+
+   switch (slot) {
+   case OVPN_KEY_SLOT_PRIMARY:
+   idx = cs->primary_idx;
+   break;
+   case OVPN_KEY_SLOT_SECONDARY:
+   idx = !cs->primary_idx;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   rcu_read_lock();
+   ks = rcu_dereference(cs->slots[idx]);
+   if (!ks || !ovpn_crypto_key_slot_hold(ks)) {
+   rcu_read_unlock();
+   return -ENOENT;
+   }
+   rcu_read_unlock();
+
+   keyconf->cipher_alg = ovpn_aead_crypto_alg(ks);
+   keyconf->key_id = ks->key_id;
+
+   ovpn_crypto_key_slot_put(ks);
+
+   return 0;
+}
diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h
index 
3b437d26b531c3034cca5343c755ef9c7ef57276..96fd41f4b81b74f8a3ecfe33ee24ba0122d222fe
 100644
--- a/drivers/net/ovpn/crypto.h
+++ b/drivers/net/ovpn/crypto.h
@@ -136,4 +136,8 @@ void ovpn_crypto_state_release(struct ovpn_crypto_state *cs);
 
 void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs);
 
+int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
+  enum ovpn_key_slot slot,
+  struct ovpn_key_config *keyconf);
+
 #endif /* _NET_OVPN_OVPNCRYPTO_H_ */
diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c
index 
072bb0881764752520e8e26e18337c1274ce1aa4..25e4e4a453b2bc499aec9a192fe3d86ba1aac511
 100644
--- a/drivers/net/ovpn/crypto_aead.c
+++ b/drivers/net/ovpn/crypto_aead.c
@@ -367,3 +367,20 @@ ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc)
ovpn_aead_crypto_key_slot_destroy(ks);
return ERR_PTR(ret);
 }
+
+enum ovpn_cipher_alg ovpn_aead_crypto_alg(struct ovpn_crypto_key_slot *ks)
+{
+   const char *alg_name;
+
+   if (!ks->encrypt)
+   return OVPN_CIPHER_ALG_NONE;
+
+   alg_name = crypto_tfm_alg_name(crypto_aead_tfm(ks->encrypt));
+
+   if (!strcmp(alg_name, ALG_NAME_AES))
+   return OVPN_CIPHER_ALG_AES_GCM;
+   else if (!strcmp(alg_name, ALG_NAME_CHACHAPOLY))
+   return OVPN_CIPHER_ALG_CHACHA20_POLY1305;
+   else
+   return OVPN_CIPHER_ALG_NONE;
+}
diff --git a/drivers/net/ovpn/crypto_aead.h b/drivers/net/ovpn/crypto_aead.h
index 
77ee8141599bc06b0dc664c5b0a4dae660a89238..fb65be82436edd7ff89b171f7a89c9103b617d1f
 100644
--- a/drivers/net/ovpn/crypto_aead.h
+++ b/drivers/net/ovpn/crypto_aead.h
@@ -28,4 +28,6 @@ struct ovpn_crypto_key_slot *
 ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc);
 void ovpn_aead_crypto_key_slot_destroy(struct ovpn_crypto_key_slot *ks);
 
+enum ovpn_cipher_alg ovpn_aead_crypto_alg(struct ovpn_crypto_key_slot *ks);
+
 #endif /* _NET_OVPN_OVPNAEAD_H_ */
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 
d504445325ef82db04f87367c858adaf025f6297..fe9377b9b8145784917460cd5f222bc7fae4d8db
 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -18,6 +18,7 @@
 #include "netlink.h"
 #include "netlink-gen.h"
 #include "bind.h"
+#include "crypto.h"
 #include "packet.h"
 #include "peer.h"
 #include "socket.h"
@@ -679,24 +680,323 @@ int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info)
return ret;
 }
 
+static int ovpn_nl_get_key_dir(struct genl_info *info, struct nlattr *key,
+  enum ovpn_cipher_alg cipher,
+  struct ovpn_key_direction *dir)
+{
+   struct nlattr *attrs[OVPN_A_KEYDIR_

[PATCH net-next v11 21/23] ovpn: notify userspace when a peer is deleted

2024-10-29 Thread Antonio Quartulli
Whenever a peer is deleted, send a notification to userspace so that it
can react accordingly.

This is most important when a peer is deleted due to ping timeout,
because it all happens in kernelspace and thus userspace has no direct
way to learn about it.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/netlink.c | 55 ++
 drivers/net/ovpn/netlink.h |  1 +
 drivers/net/ovpn/peer.c|  1 +
 3 files changed, 57 insertions(+)

diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 
2b2ba1a810a0e87fb9ffb43b988fa52725a9589b..4d7d835cb47fd1f03d7cdafa2eda9f03065b8024
 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -999,6 +999,61 @@ int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info)
return 0;
 }
 
+/**
+ * ovpn_nl_peer_del_notify - notify userspace about peer being deleted
+ * @peer: the peer being deleted
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_nl_peer_del_notify(struct ovpn_peer *peer)
+{
+   struct sk_buff *msg;
+   struct nlattr *attr;
+   int ret = -EMSGSIZE;
+   void *hdr;
+
+   netdev_info(peer->ovpn->dev, "deleting peer with id %u, reason %d\n",
+   peer->id, peer->delete_reason);
+
+   msg = nlmsg_new(100, GFP_ATOMIC);
+   if (!msg)
+   return -ENOMEM;
+
+   hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0, OVPN_CMD_PEER_DEL_NTF);
+   if (!hdr) {
+   ret = -ENOBUFS;
+   goto err_free_msg;
+   }
+
+   if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex))
+   goto err_cancel_msg;
+
+   attr = nla_nest_start(msg, OVPN_A_PEER);
+   if (!attr)
+   goto err_cancel_msg;
+
+   if (nla_put_u8(msg, OVPN_A_PEER_DEL_REASON, peer->delete_reason))
+   goto err_cancel_msg;
+
+   if (nla_put_u32(msg, OVPN_A_PEER_ID, peer->id))
+   goto err_cancel_msg;
+
+   nla_nest_end(msg, attr);
+
+   genlmsg_end(msg, hdr);
+
+   genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg,
+   0, OVPN_NLGRP_PEERS, GFP_ATOMIC);
+
+   return 0;
+
+err_cancel_msg:
+   genlmsg_cancel(msg, hdr);
+err_free_msg:
+   nlmsg_free(msg);
+   return ret;
+}
+
 /**
  * ovpn_nl_key_swap_notify - notify userspace peer's key must be renewed
  * @peer: the peer whose key needs to be renewed
diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h
index 
33390b13c8904d40b629662005a9eb92ff617c3b..4ab3abcf23dba11f6b92e3d69e700693adbc671b
 100644
--- a/drivers/net/ovpn/netlink.h
+++ b/drivers/net/ovpn/netlink.h
@@ -12,6 +12,7 @@
 int ovpn_nl_register(void);
 void ovpn_nl_unregister(void);
 
+int ovpn_nl_peer_del_notify(struct ovpn_peer *peer);
 int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id);
 
 #endif /* _NET_OVPN_NETLINK_H_ */
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 
8cfe1997ec116ae4fe74cd7105d228569e2a66a9..91c608f1ffa1d9dd1535ba308b6adc933dbbf1f1
 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -242,6 +242,7 @@ void ovpn_peer_release_kref(struct kref *kref)
 {
struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount);
 
+   ovpn_nl_peer_del_notify(peer);
ovpn_peer_release(peer);
 }
 

-- 
2.45.2




[PATCH net-next v11 16/23] ovpn: add support for updating local UDP endpoint

2024-10-29 Thread Antonio Quartulli
In case of UDP links, the local endpoint used to communicate with a
given peer may change without a connection restart.

Add support for learning the new address in case of change.
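The learning step can be reduced to a simple compare-and-update against the packet's destination (local) address. A minimal user-space sketch for the IPv4 case, with illustrative names only:

```c
#include <assert.h>
#include <arpa/inet.h>

/* If the cached local endpoint differs from the destination address of
 * an incoming packet, adopt the new address. Returns 1 when the endpoint
 * was learned (changed), 0 when it was already up to date.
 */
static int learn_local_ipv4(struct in_addr *cached, struct in_addr pkt_daddr)
{
	if (cached->s_addr == pkt_daddr.s_addr)
		return 0;		/* unchanged */

	*cached = pkt_daddr;		/* local endpoint moved: learn it */
	return 1;
}
```

In the driver this comparison happens under peer->lock against the peer's bind, so concurrent receivers cannot race on the update.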

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/peer.c | 45 +
 drivers/net/ovpn/peer.h |  3 +++
 2 files changed, 48 insertions(+)

diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index 
e8a42212af391916b5321e729f7e8a864d0a541f..3f67d200e283213fcb732d10f9edeb53e0a0e9ee
 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -416,6 +416,51 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
return peer;
 }
 
+/**
+ * ovpn_peer_update_local_endpoint - update local endpoint for peer
+ * @peer: peer to update the endpoint for
+ * @skb: incoming packet to retrieve the destination address (local) from
+ */
+void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
+struct sk_buff *skb)
+{
+   struct ovpn_bind *bind;
+
+   rcu_read_lock();
+   bind = rcu_dereference(peer->bind);
+   if (unlikely(!bind))
+   goto unlock;
+
+   spin_lock_bh(&peer->lock);
+   switch (skb_protocol_to_family(skb)) {
+   case AF_INET:
+   if (unlikely(bind->local.ipv4.s_addr != ip_hdr(skb)->daddr)) {
+   netdev_dbg(peer->ovpn->dev,
+  "%s: learning local IPv4 for peer %d (%pI4 -> %pI4)\n",
+  __func__, peer->id, &bind->local.ipv4.s_addr,
+  &ip_hdr(skb)->daddr);
+   bind->local.ipv4.s_addr = ip_hdr(skb)->daddr;
+   }
+   break;
+   case AF_INET6:
+   if (unlikely(!ipv6_addr_equal(&bind->local.ipv6,
+ &ipv6_hdr(skb)->daddr))) {
+   netdev_dbg(peer->ovpn->dev,
+  "%s: learning local IPv6 for peer %d (%pI6c -> %pI6c)\n",
+  __func__, peer->id, &bind->local.ipv6,
+  &ipv6_hdr(skb)->daddr);
+   bind->local.ipv6 = ipv6_hdr(skb)->daddr;
+   }
+   break;
+   default:
+   break;
+   }
+   spin_unlock_bh(&peer->lock);
+
+unlock:
+   rcu_read_unlock();
+}
+
 /**
  * ovpn_peer_get_by_dst - Lookup peer to send skb to
  * @ovpn: the private data representing the current VPN session
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 
952927ae78a3ab753aaf2c6cc6f77121bdac34be..1a8638d266b11a4a80ee2f088394d47a7798c3af
 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -152,4 +152,7 @@ bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb,
 void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout);
 void ovpn_peer_keepalive_work(struct work_struct *work);
 
+void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
+struct sk_buff *skb);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */

-- 
2.45.2




[PATCH net-next v11 15/23] ovpn: implement keepalive mechanism

2024-10-29 Thread Antonio Quartulli
OpenVPN supports configuring a periodic keepalive message to allow
the remote endpoint to detect link failures.

This change implements the keepalive sending and timer expiring logic.
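The two timing decisions involved can be sketched as pure arithmetic (helper names are illustrative; the driver stores last_sent/last_recv as seconds from ktime_get_real_seconds()): send a ping once `interval` seconds have passed since the last transmit, and declare the peer dead once `timeout` seconds have passed since the last authenticated receive.

```c
#include <assert.h>
#include <stdint.h>

/* Should a keepalive ping be transmitted now? A zero interval means
 * keepalive sending is disabled.
 */
static int keepalive_should_send(int64_t now, int64_t last_sent,
				 int64_t interval)
{
	return interval && now - last_sent >= interval;
}

/* Has the peer gone silent for too long? A zero timeout disables
 * expiry-based peer deletion.
 */
static int keepalive_expired(int64_t now, int64_t last_recv, int64_t timeout)
{
	return timeout && now - last_recv >= timeout;
}
```

The driver's keepalive worker evaluates both conditions per peer and reschedules itself for the earliest future deadline.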

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/io.c |  77 +
 drivers/net/ovpn/io.h |   5 ++
 drivers/net/ovpn/main.c   |   3 +
 drivers/net/ovpn/ovpnstruct.h |   2 +
 drivers/net/ovpn/peer.c   | 188 ++
 drivers/net/ovpn/peer.h   |  15 
 drivers/net/ovpn/proto.h  |   2 -
 7 files changed, 290 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 
deda19ab87391f86964ba43088b7847d22420eee..63c140138bf98e5d1df79a2565b666d86513323d
 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -27,6 +27,33 @@
 #include "skb.h"
 #include "socket.h"
 
+const unsigned char ovpn_keepalive_message[OVPN_KEEPALIVE_SIZE] = {
+   0x2a, 0x18, 0x7b, 0xf3, 0x64, 0x1e, 0xb4, 0xcb,
+   0x07, 0xed, 0x2d, 0x0a, 0x98, 0x1f, 0xc7, 0x48
+};
+
+/**
+ * ovpn_is_keepalive - check if skb contains a keepalive message
+ * @skb: packet to check
+ *
+ * Assumes that the first byte of skb->data is defined.
+ *
+ * Return: true if skb contains a keepalive or false otherwise
+ */
+static bool ovpn_is_keepalive(struct sk_buff *skb)
+{
+   if (*skb->data != ovpn_keepalive_message[0])
+   return false;
+
+   if (skb->len != OVPN_KEEPALIVE_SIZE)
+   return false;
+
+   if (!pskb_may_pull(skb, OVPN_KEEPALIVE_SIZE))
+   return false;
+
+   return !memcmp(skb->data, ovpn_keepalive_message, OVPN_KEEPALIVE_SIZE);
+}
+
 /* Called after decrypt to write the IP packet to the device.
  * This method is expected to manage/free the skb.
  */
@@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
 
+   /* keep track of last received authenticated packet for keepalive */
+   peer->last_recv = ktime_get_real_seconds();
+
/* point to encapsulated IP packet */
__skb_pull(skb, payload_offset);
 
@@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret)
goto drop;
}
 
+   if (ovpn_is_keepalive(skb)) {
+   net_dbg_ratelimited("%s: ping received from peer %u\n",
+   peer->ovpn->dev->name, peer->id);
+   goto drop;
+   }
+
net_info_ratelimited("%s: unsupported protocol received from peer %u\n",
 peer->ovpn->dev->name, peer->id);
goto drop;
@@ -221,6 +257,10 @@ void ovpn_encrypt_post(void *data, int ret)
/* no transport configured yet */
goto err;
}
+
+   /* keep track of last sent packet for keepalive */
+   peer->last_sent = ktime_get_real_seconds();
+
/* skb passed down the stack - don't free it */
skb = NULL;
 err:
@@ -361,3 +401,40 @@ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
kfree_skb_list(skb);
return NET_XMIT_DROP;
 }
+
+/**
+ * ovpn_xmit_special - encrypt and transmit an out-of-band message to peer
+ * @peer: peer to send the message to
+ * @data: message content
+ * @len: message length
+ *
+ * Assumes that caller holds a reference to peer
+ */
+void ovpn_xmit_special(struct ovpn_peer *peer, const void *data,
+  const unsigned int len)
+{
+   struct ovpn_struct *ovpn;
+   struct sk_buff *skb;
+
+   ovpn = peer->ovpn;
+   if (unlikely(!ovpn))
+   return;
+
+   skb = alloc_skb(256 + len, GFP_ATOMIC);
+   if (unlikely(!skb))
+   return;
+
+   skb_reserve(skb, 128);
+   skb->priority = TC_PRIO_BESTEFFORT;
+   __skb_put_data(skb, data, len);
+
+   /* increase reference counter when passing peer to sending queue */
+   if (!ovpn_peer_hold(peer)) {
+   netdev_dbg(ovpn->dev, "%s: cannot hold peer reference for sending special packet\n",
+  __func__);
+   kfree_skb(skb);
+   return;
+   }
+
+   ovpn_send(ovpn, skb, peer);
+}
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
index 
ad81dd86924689309b3299573575a1705eddaf99..eb224114152c29f42aadf026212e8d278006b490
 100644
--- a/drivers/net/ovpn/io.h
+++ b/drivers/net/ovpn/io.h
@@ -10,9 +10,14 @@
 #ifndef _NET_OVPN_OVPN_H_
 #define _NET_OVPN_OVPN_H_
 
+#define OVPN_KEEPALIVE_SIZE 16
+extern const unsigned char ovpn_keepalive_message[OVPN_KEEPALIVE_SIZE];
+
 netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
 
 void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb);
+void ovpn_xmit_special(struct ovpn_peer *peer, const void *data,
+  const unsigned int len);
 
 void ovpn_encrypt_post(void *data, int ret);
 void ovpn_decrypt_post(
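
The early-reject idea in ovpn_is_keepalive() above — compare the cheap first byte before paying for the full 16-byte memcmp — can be sketched in plain userspace C. The magic bytes are the ones from the patch; the flat buffer stands in for the skb API and is purely illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OVPN_KEEPALIVE_SIZE 16

static const unsigned char keepalive_msg[OVPN_KEEPALIVE_SIZE] = {
	0x2a, 0x18, 0x7b, 0xf3, 0x64, 0x1e, 0xb4, 0xcb,
	0x07, 0xed, 0x2d, 0x0a, 0x98, 0x1f, 0xc7, 0x48
};

/* Reject on the first byte before doing the full compare, mirroring
 * the order of checks in ovpn_is_keepalive().
 */
static int is_keepalive(const unsigned char *data, size_t len)
{
	if (data[0] != keepalive_msg[0])
		return 0;
	if (len != OVPN_KEEPALIVE_SIZE)
		return 0;
	return !memcmp(data, keepalive_msg, OVPN_KEEPALIVE_SIZE);
}
```

Most non-keepalive packets are rejected after a single byte comparison, which matters on the decrypt fast path.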

[PATCH net-next v11 00/23] Introducing OpenVPN Data Channel Offload

2024-10-29 Thread Antonio Quartulli
Notable changes from v10:
* extended commit message of 23/23 with brief description of the output
* Link to v10: 
https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777...@openvpn.net

Please note that some patches were already reviewed by Andrew Lunn,
Donald Hunter and Shuah Khan. They have retained the Reviewed-by tags
since no major code modification has happened since the review.

The latest code can also be found at:

https://github.com/OpenVPN/linux-kernel-ovpn

Thanks a lot!
Best Regards,

Antonio Quartulli
OpenVPN Inc.

---
Antonio Quartulli (23):
  netlink: add NLA_POLICY_MAX_LEN macro
  net: introduce OpenVPN Data Channel Offload (ovpn)
  ovpn: add basic netlink support
  ovpn: add basic interface creation/destruction/management routines
  ovpn: keep carrier always on
  ovpn: introduce the ovpn_peer object
  ovpn: introduce the ovpn_socket object
  ovpn: implement basic TX path (UDP)
  ovpn: implement basic RX path (UDP)
  ovpn: implement packet processing
  ovpn: store tunnel and transport statistics
  ovpn: implement TCP transport
  ovpn: implement multi-peer support
  ovpn: implement peer lookup logic
  ovpn: implement keepalive mechanism
  ovpn: add support for updating local UDP endpoint
  ovpn: add support for peer floating
  ovpn: implement peer add/get/dump/delete via netlink
  ovpn: implement key add/get/del/swap via netlink
  ovpn: kill key and notify userspace in case of IV exhaustion
  ovpn: notify userspace when a peer is deleted
  ovpn: add basic ethtool support
  testing/selftests: add test tool and scripts for ovpn module

 Documentation/netlink/specs/ovpn.yaml  |  362 +++
 MAINTAINERS|   11 +
 drivers/net/Kconfig|   14 +
 drivers/net/Makefile   |1 +
 drivers/net/ovpn/Makefile  |   22 +
 drivers/net/ovpn/bind.c|   54 +
 drivers/net/ovpn/bind.h|  117 +
 drivers/net/ovpn/crypto.c  |  214 ++
 drivers/net/ovpn/crypto.h  |  145 ++
 drivers/net/ovpn/crypto_aead.c |  386 
 drivers/net/ovpn/crypto_aead.h |   33 +
 drivers/net/ovpn/io.c  |  462 
 drivers/net/ovpn/io.h  |   25 +
 drivers/net/ovpn/main.c|  337 +++
 drivers/net/ovpn/main.h|   24 +
 drivers/net/ovpn/netlink-gen.c |  212 ++
 drivers/net/ovpn/netlink-gen.h |   41 +
 drivers/net/ovpn/netlink.c | 1135 ++
 drivers/net/ovpn/netlink.h |   18 +
 drivers/net/ovpn/ovpnstruct.h  |   61 +
 drivers/net/ovpn/packet.h  |   40 +
 drivers/net/ovpn/peer.c| 1201 ++
 drivers/net/ovpn/peer.h|  165 ++
 drivers/net/ovpn/pktid.c   |  130 ++
 drivers/net/ovpn/pktid.h   |   87 +
 drivers/net/ovpn/proto.h   |  104 +
 drivers/net/ovpn/skb.h |   56 +
 drivers/net/ovpn/socket.c  |  178 ++
 drivers/net/ovpn/socket.h  |   55 +
 drivers/net/ovpn/stats.c   |   21 +
 drivers/net/ovpn/stats.h   |   47 +
 drivers/net/ovpn/tcp.c |  506 +
 drivers/net/ovpn/tcp.h |   44 +
 drivers/net/ovpn/udp.c |  406 
 drivers/net/ovpn/udp.h |   26 +
 include/net/netlink.h  |1 +
 include/uapi/linux/if_link.h   |   15 +
 include/uapi/linux/ovpn.h  |  109 +
 include/uapi/linux/udp.h   |1 +
 tools/net/ynl/ynl-gen-c.py |4 +-
 tools/testing/selftests/Makefile   |1 +
 tools/testing/selftests/net/ovpn/.gitignore|2 +
 tools/testing/selftests/net/ovpn/Makefile  |   17 +
 tools/testing/selftests/net/ovpn/config|   10 +
 tools/testing/selftests/net/ovpn/data64.key|5 +
 tools/testing/selftests/net/ovpn/ovpn-cli.c| 2370 
 tools/testing/selftests/net/ovpn/tcp_peers.txt |5 +
 .../testing/selftests/net/ovpn/test-chachapoly.sh  |9 +
 tools/testing/selftests/net/ovpn/test-float.sh |9 +
 tools/testing/selftests/net/ovpn/test-tcp.sh   |9 +
 tools/testing/selftests/net/ovpn/test.sh   |  183 ++
 tools/testing/selftests/net/ovpn/udp_peers.txt |5 +
 52 files changed, 9494 insertions(+), 1 deletion(-)
---
base-commit: ab101c553bc1f76a839163d1dc

[PATCH net-next v11 02/23] net: introduce OpenVPN Data Channel Offload (ovpn)

2024-10-29 Thread Antonio Quartulli
OpenVPN is a userspace software existing since around 2005 that allows
users to create secure tunnels.

So far OpenVPN has implemented all operations in userspace, which
implies several transitions back and forth between kernel and userspace
in order to process packets (encapsulate/decapsulate, encrypt/decrypt,
rerouting, ...).

With `ovpn` we intend to move the fast path (data channel) entirely
into kernel space and thus improve the user-measured throughput over
the tunnel.

`ovpn` is implemented as a simple virtual network device driver that
can be manipulated by means of the standard RTNL APIs. A device of kind
`ovpn` allows only IPv4/6 traffic and can be of type:
* P2P (peer-to-peer): any packet sent over the interface will be
  encapsulated and transmitted to the other side (typical OpenVPN
  client or peer-to-peer behaviour);
* P2MP (point-to-multipoint): packets sent over the interface are
  transmitted to peers based on existing routes (typical OpenVPN
  server behaviour).

After the interface has been created, OpenVPN in userspace can
configure it using a new Netlink API. Specifically it is possible
to manage peers and their keys.

The OpenVPN control channel is multiplexed over the same transport
socket by means of OP codes. Anything that is not DATA_V2 (OpenVPN
OP code for data traffic) is sent to userspace and handled there.
This way the `ovpn` codebase is kept as compact as possible while
focusing on handling data traffic only (fast path).
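
The demultiplexing described above can be sketched in a few lines, assuming the OpenVPN wire format where the first payload byte carries a 5-bit OP code in its upper bits and a 3-bit key ID in the lower bits, with P_DATA_V2 being OP code 9 (a hedged illustration, not the driver code):

```c
#include <assert.h>
#include <stdint.h>

/* P_DATA_V2 opcode in the OpenVPN protocol; everything else is
 * control traffic and is handed to userspace.
 */
#define OVPN_OP_DATA_V2 9

static uint8_t ovpn_opcode(uint8_t first_byte)
{
	return first_byte >> 3;	/* top 5 bits */
}

static int ovpn_is_data_v2(uint8_t first_byte)
{
	return ovpn_opcode(first_byte) == OVPN_OP_DATA_V2;
}
```

Anything for which `ovpn_is_data_v2()` returns false would be forwarded to the userspace process over the same transport socket.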

Any OpenVPN control feature (like cipher negotiation, TLS handshake,
rekeying, etc.) is still fully handled by the userspace process.

When userspace establishes a new connection with a peer, it first
performs the handshake and then passes the socket to the `ovpn` kernel
module, which takes ownership. From this moment on `ovpn` will handle
data traffic for the new peer.
When control packets are received on the link, they are forwarded to
userspace through the same transport socket they were received on, as
userspace is still listening to them.

Some events (like peer deletion) are sent to a Netlink multicast group.

Although it wasn't easy to convince the community, `ovpn` implements
only a limited number of the data-channel features supported by the
userspace program.

Each feature that made it into `ovpn` was carefully vetted to avoid
carrying too much legacy along with us (and to make a clean cut with
old and probably-not-so-useful features).

Notably, only encryption using AEAD ciphers (specifically
ChaCha20Poly1305 and AES-GCM) was implemented. Supporting any other
cipher out there was not deemed useful.

Both UDP and TCP sockets are supported.

As explained above, in case of P2MP mode, OpenVPN will use the main system
routing table to decide which packet goes to which peer. This implies
that no routing table was re-implemented in the `ovpn` kernel module.

This kernel module can be enabled by selecting the CONFIG_OVPN entry
in the networking drivers section.

NOTE: this first patch introduces the very basic framework only.
Features are then added patch by patch; however, although each patch
compiles and should not break at runtime, the ovpn module is expected
to be fully functional only after the entire set has been applied.

Cc: steffen.klass...@secunet.com
Cc: antony.ant...@secunet.com
Signed-off-by: Antonio Quartulli 
---
 MAINTAINERS   |   8 
 drivers/net/Kconfig   |  13 ++
 drivers/net/Makefile  |   1 +
 drivers/net/ovpn/Makefile |  11 +
 drivers/net/ovpn/io.c |  22 +
 drivers/net/ovpn/io.h |  15 ++
 drivers/net/ovpn/main.c   | 116 ++
 drivers/net/ovpn/main.h   |  15 ++
 include/uapi/linux/udp.h  |   1 +
 9 files changed, 202 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 
f39ab140710f16b1245924bfe381cd64d499ff8a..09e193bbc218d74846cbae26f80ada3e04c3692a
 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17286,6 +17286,14 @@ F: arch/openrisc/
 F: drivers/irqchip/irq-ompic.c
 F: drivers/irqchip/irq-or1k-*
 
+OPENVPN DATA CHANNEL OFFLOAD
+M: Antonio Quartulli 
+L: openvpn-de...@lists.sourceforge.net (moderated for non-subscribers)
+L: net...@vger.kernel.org
+S: Supported
+T: git https://github.com/OpenVPN/linux-kernel-ovpn.git
+F: drivers/net/ovpn/
+
 OPENVSWITCH
 M: Pravin B Shelar 
 L: net...@vger.kernel.org
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 
1fd5acdc73c6af0e1a861867039c3624fc618e25..269b73fcfd348a48174fb96b8f8d4f8788636fa8
 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -115,6 +115,19 @@ config WIREGUARD_DEBUG
 
  Say N here unless you know what you're doing.
 
+config OVPN
+   tristate "OpenVPN data channel offload"
+   depends on NET && INET
+   select NET_UDP_TUNNEL
+   select DST_CACHE
+   select CRYPTO
+   select CRYPTO_AES
+   select CRYPTO_GCM
+   select CRYPTO_CHACHA20POLY1305
+   help
+ This module enhances the p

[PATCH net-next v11 10/23] ovpn: implement packet processing

2024-10-29 Thread Antonio Quartulli
This change implements encryption/decryption and
encapsulation/decapsulation of OpenVPN packets.

Support for generic crypto state is added along with
a wrapper for the AEAD crypto kernel API.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/Makefile  |   3 +
 drivers/net/ovpn/crypto.c  | 153 +
 drivers/net/ovpn/crypto.h  | 139 
 drivers/net/ovpn/crypto_aead.c | 367 +
 drivers/net/ovpn/crypto_aead.h |  31 
 drivers/net/ovpn/io.c  | 146 ++--
 drivers/net/ovpn/io.h  |   3 +
 drivers/net/ovpn/packet.h  |   2 +-
 drivers/net/ovpn/peer.c|  29 
 drivers/net/ovpn/peer.h|   6 +
 drivers/net/ovpn/pktid.c   | 130 +++
 drivers/net/ovpn/pktid.h   |  87 ++
 drivers/net/ovpn/proto.h   |  31 
 drivers/net/ovpn/skb.h |   4 +
 14 files changed, 1120 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 
56bddc9bef83e0befde6af3c3565bb91731d7b22..ccdaeced1982c851475657860a005ff2b9dfbd13
 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -8,10 +8,13 @@
 
 obj-$(CONFIG_OVPN) := ovpn.o
 ovpn-y += bind.o
+ovpn-y += crypto.o
+ovpn-y += crypto_aead.o
 ovpn-y += main.o
 ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
 ovpn-y += peer.o
+ovpn-y += pktid.o
 ovpn-y += socket.o
 ovpn-y += udp.o
diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
new file mode 100644
index 
..f1f7510e2f735e367f96eb4982ba82c9af3c8bfc
--- /dev/null
+++ b/drivers/net/ovpn/crypto.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:James Yonan 
+ * Antonio Quartulli 
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "packet.h"
+#include "pktid.h"
+#include "crypto_aead.h"
+#include "crypto.h"
+
+static void ovpn_ks_destroy_rcu(struct rcu_head *head)
+{
+   struct ovpn_crypto_key_slot *ks;
+
+   ks = container_of(head, struct ovpn_crypto_key_slot, rcu);
+   ovpn_aead_crypto_key_slot_destroy(ks);
+}
+
+void ovpn_crypto_key_slot_release(struct kref *kref)
+{
+   struct ovpn_crypto_key_slot *ks;
+
+   ks = container_of(kref, struct ovpn_crypto_key_slot, refcount);
+   call_rcu(&ks->rcu, ovpn_ks_destroy_rcu);
+}
+
+/* can only be invoked when all peer references have been dropped (i.e. RCU
+ * release routine)
+ */
+void ovpn_crypto_state_release(struct ovpn_crypto_state *cs)
+{
+   struct ovpn_crypto_key_slot *ks;
+
+   ks = rcu_access_pointer(cs->slots[0]);
+   if (ks) {
+   RCU_INIT_POINTER(cs->slots[0], NULL);
+   ovpn_crypto_key_slot_put(ks);
+   }
+
+   ks = rcu_access_pointer(cs->slots[1]);
+   if (ks) {
+   RCU_INIT_POINTER(cs->slots[1], NULL);
+   ovpn_crypto_key_slot_put(ks);
+   }
+}
+
+/* Reset the ovpn_crypto_state object in a way that is atomic
+ * to RCU readers.
+ */
+int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs,
+   const struct ovpn_peer_key_reset *pkr)
+{
+   struct ovpn_crypto_key_slot *old = NULL, *new;
+   u8 idx;
+
+   if (pkr->slot != OVPN_KEY_SLOT_PRIMARY &&
+   pkr->slot != OVPN_KEY_SLOT_SECONDARY)
+   return -EINVAL;
+
+   new = ovpn_aead_crypto_key_slot_new(&pkr->key);
+   if (IS_ERR(new))
+   return PTR_ERR(new);
+
+   spin_lock_bh(&cs->lock);
+   idx = cs->primary_idx;
+   switch (pkr->slot) {
+   case OVPN_KEY_SLOT_PRIMARY:
+   old = rcu_replace_pointer(cs->slots[idx], new,
+ lockdep_is_held(&cs->lock));
+   break;
+   case OVPN_KEY_SLOT_SECONDARY:
+   old = rcu_replace_pointer(cs->slots[!idx], new,
+ lockdep_is_held(&cs->lock));
+   break;
+   }
+   spin_unlock_bh(&cs->lock);
+
+   if (old)
+   ovpn_crypto_key_slot_put(old);
+
+   return 0;
+}
+
+void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs,
+enum ovpn_key_slot slot)
+{
+   struct ovpn_crypto_key_slot *ks = NULL;
+   u8 idx;
+
+   if (slot != OVPN_KEY_SLOT_PRIMARY &&
+   slot != OVPN_KEY_SLOT_SECONDARY) {
+   pr_warn("Invalid slot to release: %u\n", slot);
+   return;
+   }
+
+   spin_lock_bh(&cs->lock);
+   idx = cs->primary_idx;
+   switch (slot) {
+   case OVPN_KEY_SLOT_PRIMARY:
+   ks = rcu_replace_pointer(cs->slots[idx], NULL,
+lockdep_is_held(&cs->lock));
+   break;
+   case OVPN_KEY_SLOT_SECONDARY:
+   ks = rcu_replace_pointer(cs
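
The two-slot crypto state used above (primary/secondary, indexed via primary_idx) can be sketched without the RCU and refcounting machinery. This is an illustrative stand-in, not the driver code — in particular the slot contents and the swap helper are simplified:

```c
#include <assert.h>

/* The primary key lives in slots[primary_idx], the secondary in the
 * other slot, so promoting the secondary key is just a flip of
 * primary_idx.
 */
struct key_state {
	int slots[2];		/* stand-ins for the real key slots */
	unsigned char primary_idx;
};

static int primary(const struct key_state *cs)
{
	return cs->slots[cs->primary_idx];
}

static int secondary(const struct key_state *cs)
{
	return cs->slots[!cs->primary_idx];
}

static void swap_keys(struct key_state *cs)
{
	cs->primary_idx = !cs->primary_idx;
}
```

This layout lets readers keep dereferencing either slot during a swap, which is why the real code only needs RCU-safe pointer replacement rather than copying key material.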

[PATCH v5 0/2] selftests/resctrl: SNC kernel support discovery

2024-10-29 Thread Maciej Wieczor-Retman
Changes v5:
- Tests are skipped if snc_unreliable was set.
- Moved resctrlfs.c changes from patch 2/2 to 1/2.
- Removed CAT changes since it's not impacted by SNC in the selftest.
- Updated various comments.
- Fixed a bunch of minor issues pointed out in the review.

Changes v4:
- Printing SNC warnings at the start of every test.
- Printing SNC warnings at the end of every relevant test.
- Remove global snc_mode variable, consolidate snc detection functions
  into one.
- Correct minor mistakes.

Changes v3:
- Reworked patch 2.
- Changed minor things in patch 1 like function name and made
  corrections to the patch message.

Changes v2:
- Removed patches 2 and 3 since now this part will be supported by the
  kernel.

Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.

SNC support in the kernel was merged into x86/cache [1]. With SNC enabled
and kernel support in place all the tests will function normally (aside
from effective cache size). There might be a problem when SNC is enabled
but the system is still using an older kernel version without SNC
support. Currently the only message displayed in that situation is a
guess that SNC might be enabled and is causing issues. That message is
also displayed whenever the test fails on an Intel platform.

Add a mechanism to discover kernel support for SNC which will add more
meaning and certainty to the error message.

Add runtime SNC mode detection and verify how reliable that information
is.

Series was tested on Ice Lake server platforms with SNC disabled, SNC-2
and SNC-4. The tests were also run with and without kernel support for
SNC.

Series applies cleanly on kselftest/next.

[1] https://lore.kernel.org/all/20240628215619.76401-1-tony.l...@intel.com/

Previous versions of this series:
[v1] 
https://lore.kernel.org/all/cover.1709721159.git.maciej.wieczor-ret...@intel.com/
[v2] 
https://lore.kernel.org/all/cover.1715769576.git.maciej.wieczor-ret...@intel.com/
[v3] 
https://lore.kernel.org/all/cover.1719842207.git.maciej.wieczor-ret...@intel.com/
[v4] 
https://lore.kernel.org/all/cover.1720774981.git.maciej.wieczor-ret...@intel.com/

Maciej Wieczor-Retman (2):
  selftests/resctrl: Adjust effective L3 cache size with SNC enabled
  selftests/resctrl: Adjust SNC support messages

 tools/testing/selftests/resctrl/cmt_test.c|   8 +-
 tools/testing/selftests/resctrl/mba_test.c|   8 +-
 tools/testing/selftests/resctrl/mbm_test.c|  10 +-
 tools/testing/selftests/resctrl/resctrl.h |   7 +
 .../testing/selftests/resctrl/resctrl_tests.c |   8 +-
 tools/testing/selftests/resctrl/resctrlfs.c   | 132 ++
 6 files changed, 166 insertions(+), 7 deletions(-)

-- 
2.46.2




Re: [PATCH V2 4/4] selftests/mm: skip virtual_address_range tests on riscv

2024-10-29 Thread Chunyan Zhang
Hi Andrew,

On Fri, 25 Oct 2024 at 02:00, Palmer Dabbelt  wrote:
>
> On Tue, 08 Oct 2024 02:41:41 PDT (-0700), zhangchun...@iscas.ac.cn wrote:
> > RISC-V doesn't currently have the behavior of restricting the virtual
> > address space which virtual_address_range tests check, this will
> > cause the tests fail. So lets disable the whole test suite for riscv64
> > for now, not build it and run_vmtests.sh will skip it if it is not present.
> >
> > Reviewed-by: Charlie Jenkins 
> > Signed-off-by: Chunyan Zhang 
> > ---
> > V1: https://lore.kernel.org/linux-mm/ZuOuedBpS7i3T%2Fo0@ghost/T/
> > ---
> >  tools/testing/selftests/mm/Makefile   |  2 ++
> >  tools/testing/selftests/mm/run_vmtests.sh | 10 ++
> >  2 files changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/testing/selftests/mm/Makefile 
> > b/tools/testing/selftests/mm/Makefile
> > index 02e1204971b0..76a378c5c141 100644
> > --- a/tools/testing/selftests/mm/Makefile
> > +++ b/tools/testing/selftests/mm/Makefile
> > @@ -115,7 +115,9 @@ endif
> >
> >  ifneq (,$(filter $(ARCH),arm64 mips64 parisc64 powerpc riscv64 s390x 
> > sparc64 x86_64 s390))
> >  TEST_GEN_FILES += va_high_addr_switch
> > +ifneq ($(ARCH),riscv64)
> >  TEST_GEN_FILES += virtual_address_range
> > +endif
> >  TEST_GEN_FILES += write_to_hugetlbfs
> >  endif
> >
> > diff --git a/tools/testing/selftests/mm/run_vmtests.sh 
> > b/tools/testing/selftests/mm/run_vmtests.sh
> > index c5797ad1d37b..4493bfd1911c 100755
> > --- a/tools/testing/selftests/mm/run_vmtests.sh
> > +++ b/tools/testing/selftests/mm/run_vmtests.sh
> > @@ -347,10 +347,12 @@ if [ $VADDR64 -ne 0 ]; then
> >   # allows high virtual address allocation requests independent
> >   # of platform's physical memory.
> >
> > - prev_policy=$(cat /proc/sys/vm/overcommit_memory)
> > - echo 1 > /proc/sys/vm/overcommit_memory
> > - CATEGORY="hugevm" run_test ./virtual_address_range
> > - echo $prev_policy > /proc/sys/vm/overcommit_memory
> > + if [ -x ./virtual_address_range ]; then
> > + prev_policy=$(cat /proc/sys/vm/overcommit_memory)
> > + echo 1 > /proc/sys/vm/overcommit_memory
> > + CATEGORY="hugevm" run_test ./virtual_address_range
> > + echo $prev_policy > /proc/sys/vm/overcommit_memory
> > + fi
> >
> >   # va high address boundary switch test
> >   ARCH_ARM64="arm64"
>
> Acked-by: Palmer Dabbelt 
>
> (I'm taking the first two as they're RISC-V bits)

Could you please pick up the last two through your tree?

Thanks,
Chunyan



[PATCH net-next v11 07/23] ovpn: introduce the ovpn_socket object

2024-10-29 Thread Antonio Quartulli
This specific structure is used in the ovpn kernel module
to wrap and carry around a standard kernel socket.

ovpn takes ownership of passed sockets and therefore an
ovpn-specific object is attached to them for status-tracking
purposes.

Initially only UDP support is introduced. TCP will come in a later
patch.

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/Makefile |   2 +
 drivers/net/ovpn/socket.c | 120 ++
 drivers/net/ovpn/socket.h |  48 +++
 drivers/net/ovpn/udp.c|  72 
 drivers/net/ovpn/udp.h|  17 +++
 5 files changed, 259 insertions(+)

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index 
ce13499b3e1775a7f2a9ce16c6cb0aa088f93685..56bddc9bef83e0befde6af3c3565bb91731d7b22
 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -13,3 +13,5 @@ ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
 ovpn-y += peer.o
+ovpn-y += socket.o
+ovpn-y += udp.o
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
new file mode 100644
index 
..090a3232ab0ec19702110f1a90f45c7f10889f6f
--- /dev/null
+++ b/drivers/net/ovpn/socket.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:James Yonan 
+ * Antonio Quartulli 
+ */
+
+#include 
+#include 
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "io.h"
+#include "peer.h"
+#include "socket.h"
+#include "udp.h"
+
+static void ovpn_socket_detach(struct socket *sock)
+{
+   if (!sock)
+   return;
+
+   sockfd_put(sock);
+}
+
+/**
+ * ovpn_socket_release_kref - kref_put callback
+ * @kref: the kref object
+ */
+void ovpn_socket_release_kref(struct kref *kref)
+{
+   struct ovpn_socket *sock = container_of(kref, struct ovpn_socket,
+   refcount);
+
+   ovpn_socket_detach(sock->sock);
+   kfree_rcu(sock, rcu);
+}
+
+static bool ovpn_socket_hold(struct ovpn_socket *sock)
+{
+   return kref_get_unless_zero(&sock->refcount);
+}
+
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock)
+{
+   struct ovpn_socket *ovpn_sock;
+
+   rcu_read_lock();
+   ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
+   if (!ovpn_socket_hold(ovpn_sock)) {
+   pr_warn("%s: found ovpn_socket with ref = 0\n", __func__);
+   ovpn_sock = NULL;
+   }
+   rcu_read_unlock();
+
+   return ovpn_sock;
+}
+
+static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
+{
+   int ret = -EOPNOTSUPP;
+
+   if (!sock || !peer)
+   return -EINVAL;
+
+   if (sock->sk->sk_protocol == IPPROTO_UDP)
+   ret = ovpn_udp_socket_attach(sock, peer->ovpn);
+
+   return ret;
+}
+
+/**
+ * ovpn_socket_new - create a new socket and initialize it
+ * @sock: the kernel socket to embed
+ * @peer: the peer reachable via this socket
+ *
+ * Return: an openvpn socket on success or a negative error code otherwise
+ */
struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
+{
+   struct ovpn_socket *ovpn_sock;
+   int ret;
+
+   ret = ovpn_socket_attach(sock, peer);
+   if (ret < 0 && ret != -EALREADY)
+   return ERR_PTR(ret);
+
+   /* if this socket is already owned by this interface, just increase the
+* refcounter and use it as expected.
+*
+* Since UDP sockets can be used to talk to multiple remote endpoints,
+* openvpn normally instantiates only one socket and shares it among all
+* its peers. For this reason, when we find out that a socket is already
+* used for some other peer in *this* instance, we can happily increase
+* its refcounter and use it normally.
+*/
+   if (ret == -EALREADY) {
+   /* caller is expected to increase the sock refcounter before
+* passing it to this function. For this reason we drop it if
+* not needed, like when this socket is already owned.
+*/
+   ovpn_sock = ovpn_socket_get(sock);
+   sockfd_put(sock);
+   return ovpn_sock;
+   }
+
+   ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL);
+   if (!ovpn_sock)
+   return ERR_PTR(-ENOMEM);
+
+   ovpn_sock->ovpn = peer->ovpn;
+   ovpn_sock->sock = sock;
+   kref_init(&ovpn_sock->refcount);
+
+   rcu_assign_sk_user_data(sock->sk, ovpn_sock);
+
+   return ovpn_sock;
+}
diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h
new file mode 100644
index 
..5ad9c5073b085482da95ee8ebf40acf20bf2e4b3
--- /dev/null
+++ b/drivers/net/ovpn/socket.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data chann
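
The shared-socket logic in ovpn_socket_new() above — reuse and re-reference an already-attached UDP socket instead of wrapping it twice — can be sketched with a plain refcount. These are illustrative stand-ins, not the kernel kref/RCU machinery:

```c
#include <assert.h>
#include <stddef.h>

#define EALREADY 114

struct wrapper;
struct fake_socket { struct wrapper *user_data; };
struct wrapper { struct fake_socket *sock; int refcount; };

/* Attach reports -EALREADY when this instance already owns the socket,
 * mirroring ovpn_udp_socket_attach().
 */
static int attach(struct fake_socket *s)
{
	return s->user_data ? -EALREADY : 0;
}

static struct wrapper *socket_new(struct fake_socket *s, struct wrapper *storage)
{
	if (attach(s) == -EALREADY) {
		s->user_data->refcount++;	/* share the existing wrapper */
		return s->user_data;
	}
	storage->sock = s;
	storage->refcount = 1;
	s->user_data = storage;
	return storage;
}
```

A second peer attaching to the same socket gets the existing wrapper back with its refcount bumped, which is exactly why UDP sockets can be shared among all peers of one instance.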

Re: [PATCH v2 1/3] rust: kunit: add KUnit case and suite macros

2024-10-29 Thread Alice Ryhl
On Tue, Oct 29, 2024 at 10:24 AM David Gow  wrote:
>
> From: José Expósito 
>
> Add a couple of Rust const functions and macros to allow to develop
> KUnit tests without relying on generated C code:
>
>  - The `kunit_unsafe_test_suite!` Rust macro is similar to the
>`kunit_test_suite` C macro. It requires a NULL-terminated array of
>test cases (see below).
>  - The `kunit_case` Rust function is similar to the `KUNIT_CASE` C macro.
>It generates a case from the name and function.
>  - The `kunit_case_null` Rust function generates a NULL test case, which
>is to be used as delimiter in `kunit_test_suite!`.
>
> While these functions and macros can be used on their own, a future
> patch will introduce another macro to create KUnit tests using a
> user-space like syntax.
>
> Signed-off-by: José Expósito 
> Co-developed-by: Matt Gilbride 
> Signed-off-by: Matt Gilbride 
> Co-developed-by: David Gow 
> Signed-off-by: David Gow 
> ---
>
> Changes since v1:
> https://lore.kernel.org/lkml/20230720-rustbind-v1-1-c80db349e...@google.com/
> - Rebase on top of rust-next
> - As a result, KUnit attributes are new set. These are hardcoded to the
>   defaults of "normal" speed and no module name.
> - Split the kunit_case!() macro into two const functions, kunit_case()
>   and kunit_case_null() (for the NULL terminator).
>
> ---
>  rust/kernel/kunit.rs | 108 +++
>  rust/kernel/lib.rs   |   1 +
>  2 files changed, 109 insertions(+)
>
> diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
> index 824da0e9738a..fc2d259db458 100644
> --- a/rust/kernel/kunit.rs
> +++ b/rust/kernel/kunit.rs
> @@ -161,3 +161,111 @@ macro_rules! kunit_assert_eq {
>  $crate::kunit_assert!($name, $file, $diff, $left == $right);
>  }};
>  }
> +
> +/// Represents an individual test case.
> +///
> +/// The test case should have the signature
> +/// `unsafe extern "C" fn test_case(test: *mut crate::bindings::kunit)`.
> +///
> +/// The `kunit_unsafe_test_suite!` macro expects a NULL-terminated list of 
> test cases.
> +/// Use `kunit_case_null` to generate such a delimiter.
> +const fn kunit_case(name: &kernel::str::CStr, run_case: unsafe extern "C" 
> fn(*mut kernel::bindings::kunit)) -> kernel::bindings::kunit_case {

This should probably say `name: &'static CStr` to require that the
name lives forever.

> +/// Registers a KUnit test suite.
> +///
> +/// # Safety
> +///
> +/// `test_cases` must be a NULL terminated array of test cases.
> +///
> +/// # Examples
> +///
> +/// ```ignore
> +/// unsafe extern "C" fn test_fn(_test: *mut crate::bindings::kunit) {
> +/// let actual = 1 + 1;
> +/// let expected = 2;
> +/// assert_eq!(actual, expected);
> +/// }
> +///
> +/// static mut KUNIT_TEST_CASE: crate::bindings::kunit_case = 
> crate::kunit_case(name, test_fn);
> +/// static mut KUNIT_NULL_CASE: crate::bindings::kunit_case = 
> crate::kunit_case_null();
> +/// static mut KUNIT_TEST_CASES: &mut[crate::bindings::kunit_case] = unsafe {
> +/// &mut[KUNIT_TEST_CASE, KUNIT_NULL_CASE]
> +/// };
> +/// crate::kunit_unsafe_test_suite!(suite_name, KUNIT_TEST_CASES);
> +/// ```
> +#[macro_export]
> +macro_rules! kunit_unsafe_test_suite {
> +($name:ident, $test_cases:ident) => {
> +const _: () = {
> +static KUNIT_TEST_SUITE_NAME: [i8; 256] = {
> +let name_u8 = core::stringify!($name).as_bytes();
> +let mut ret = [0; 256];
> +
> +let mut i = 0;
> +while i < name_u8.len() {
> +ret[i] = name_u8[i] as i8;
> +i += 1;
> +}

I assume the name must be zero-terminated? If so, you probably need to
enforce that somehow, e.g. by failing if `name_u8` is longer than 255
bytes.
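
A rough sketch of that check (illustrative, not part of the patch): make the copy panic if the name cannot fit with its trailing NUL, which in const evaluation turns into a build failure:

```rust
// Hypothetical helper, not the macro body from the patch: copy a suite
// name into a fixed C-style buffer, failing if it would not fit.
const fn to_suite_name(name: &str) -> [i8; 256] {
    let bytes = name.as_bytes();
    // Must leave room for the NUL terminator; when evaluated at
    // compile time this assert becomes a compile error.
    assert!(bytes.len() < 256);
    let mut ret = [0i8; 256];
    let mut i = 0;
    while i < bytes.len() {
        ret[i] = bytes[i] as i8;
        i += 1;
    }
    ret
}
```

Used in a `static`/`const` initializer, an over-long name would then be rejected at build time rather than silently truncated.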

> +
> +static mut KUNIT_TEST_SUITE: 
> core::cell::UnsafeCell<$crate::bindings::kunit_suite> =

You don't actually need UnsafeCell here since it's already `static mut`.

> +core::cell::UnsafeCell::new($crate::bindings::kunit_suite {
> +name: KUNIT_TEST_SUITE_NAME,
> +// SAFETY: User is expected to pass a correct 
> `test_cases`, hence this macro
> +// named 'unsafe'.
> +test_cases: unsafe { $test_cases.as_mut_ptr() },
> +suite_init: None,
> +suite_exit: None,
> +init: None,
> +exit: None,
> +attr: $crate::bindings::kunit_attributes {
> +speed: 
> $crate::bindings::kunit_speed_KUNIT_SPEED_NORMAL,
> +},
> +status_comment: [0; 256usize],
> +debugfs: core::ptr::null_mut(),
> +log: core::ptr::null_mut(),
> +suite_init_err: 0,
> +is_init: false,
> +});
> +
> +#[used]
> +#[link_section = ".kunit_test_su

[PATCH net-next v11 20/23] ovpn: kill key and notify userspace in case of IV exhaustion

2024-10-29 Thread Antonio Quartulli
IV wrap-around is cryptographically dangerous for a number of ciphers;
therefore, kill the key and inform userspace (via netlink) should the
IV space be exhausted.

Userspace has two ways of deciding when the key has to be renewed before
exhausting the IV space:
1) time based approach:
   after X seconds/minutes userspace generates a new key and sends it
   to the kernel. This is based on a guesstimate, and normally the
   default timer value works well.

2) packet count based approach:
   after X packets/bytes userspace generates a new key and sends it to
   the kernel. Userspace keeps track of the amount of traffic by
   periodically polling GET_PEER and fetching the VPN/LINK stats.
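
A hypothetical userspace sketch combining the two policies above; the thresholds are illustrative, not values mandated by the patch, with the packet limit kept well below the 2^32 IV (packet-ID) space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative thresholds: rekey long before the 32-bit packet-ID
 * space can wrap, or after the key has been in use for an hour.
 */
#define REKEY_PACKET_LIMIT (1ULL << 30)
#define REKEY_TIME_LIMIT_S 3600ULL

static bool need_rekey(uint64_t tx_packets, uint64_t key_age_s)
{
	return tx_packets >= REKEY_PACKET_LIMIT ||
	       key_age_s >= REKEY_TIME_LIMIT_S;
}
```

Userspace would evaluate this after each GET_PEER poll and, when it returns true, negotiate a fresh key and install it via the netlink key API before the kernel ever hits the -ERANGE path.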

Signed-off-by: Antonio Quartulli 
---
 drivers/net/ovpn/crypto.c  | 19 
 drivers/net/ovpn/crypto.h  |  2 ++
 drivers/net/ovpn/io.c  | 13 +++
 drivers/net/ovpn/netlink.c | 55 ++
 drivers/net/ovpn/netlink.h |  2 ++
 5 files changed, 91 insertions(+)

diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c
index 
cfb014c947b968752ba3dab84ec42dc8ec086379..a2346bc630be9b60604282d20a33321c277bc56f
 100644
--- a/drivers/net/ovpn/crypto.c
+++ b/drivers/net/ovpn/crypto.c
@@ -55,6 +55,25 @@ void ovpn_crypto_state_release(struct ovpn_crypto_state *cs)
}
 }
 
+/* removes the key matching the specified id from the crypto context */
+void ovpn_crypto_kill_key(struct ovpn_crypto_state *cs, u8 key_id)
+{
+   struct ovpn_crypto_key_slot *ks = NULL;
+
+   spin_lock_bh(&cs->lock);
+   if (rcu_access_pointer(cs->slots[0])->key_id == key_id) {
+   ks = rcu_replace_pointer(cs->slots[0], NULL,
+lockdep_is_held(&cs->lock));
+   } else if (rcu_access_pointer(cs->slots[1])->key_id == key_id) {
+   ks = rcu_replace_pointer(cs->slots[1], NULL,
+lockdep_is_held(&cs->lock));
+   }
+   spin_unlock_bh(&cs->lock);
+
+   if (ks)
+   ovpn_crypto_key_slot_put(ks);
+}
+
 /* Reset the ovpn_crypto_state object in a way that is atomic
  * to RCU readers.
  */
diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h
index 
96fd41f4b81b74f8a3ecfe33ee24ba0122d222fe..b7a7be752d54f1f8bcd548e0a714511efcaf68a8
 100644
--- a/drivers/net/ovpn/crypto.h
+++ b/drivers/net/ovpn/crypto.h
@@ -140,4 +140,6 @@ int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
   enum ovpn_key_slot slot,
   struct ovpn_key_config *keyconf);
 
+void ovpn_crypto_kill_key(struct ovpn_crypto_state *cs, u8 key_id);
+
 #endif /* _NET_OVPN_OVPNCRYPTO_H_ */
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261..c04791a508e5c0ae292b7b5d8098096c676b2f99 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -248,6 +248,19 @@ void ovpn_encrypt_post(void *data, int ret)
if (likely(ovpn_skb_cb(skb)->req))
aead_request_free(ovpn_skb_cb(skb)->req);
 
+   if (unlikely(ret == -ERANGE)) {
+   /* we ran out of IVs and we must kill the key as it can't be
+* used anymore
+*/
+   netdev_warn(peer->ovpn->dev,
+   "killing key %u for peer %u\n", ks->key_id,
+   peer->id);
+   ovpn_crypto_kill_key(&peer->crypto, ks->key_id);
+   /* let userspace know so that a new key can be negotiated */
+   ovpn_nl_key_swap_notify(peer, ks->key_id);
+   goto err;
+   }
+
if (unlikely(ret < 0))
goto err;
 
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index fe9377b9b8145784917460cd5f222bc7fae4d8db..2b2ba1a810a0e87fb9ffb43b988fa52725a9589b 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -999,6 +999,61 @@ int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info)
return 0;
 }
 
+/**
+ * ovpn_nl_key_swap_notify - notify userspace that a peer's key must be renewed
+ * @peer: the peer whose key needs to be renewed
+ * @key_id: the ID of the key that needs to be renewed
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id)
+{
+   struct nlattr *k_attr;
+   struct sk_buff *msg;
+   int ret = -EMSGSIZE;
+   void *hdr;
+
+   netdev_info(peer->ovpn->dev, "peer with id %u must rekey - primary key unusable.\n",
+   peer->id);
+
+   msg = nlmsg_new(100, GFP_ATOMIC);
+   if (!msg)
+   return -ENOMEM;
+
+   hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0, OVPN_CMD_KEY_SWAP_NTF);
+   if (!hdr) {
+   ret = -ENOBUFS;
+   goto err_free_msg;
+   }
+
+   if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex))
+   goto err_cancel_msg;
+
+   k_att

Re: [PATCH 1/2] kselftest/arm64: Increase frequency of signal delivery in fp-stress

2024-10-29 Thread Mark Brown
On Tue, Oct 29, 2024 at 03:43:37PM +, Mark Rutland wrote:

> On those emulated platforms (FVP?), does this change trigger the failure
> more often?

Yes.

> I gave this a quick test, and with this change, running fp-stress on a
> defconfig kernel running on 1 CPU triggers the "Bad SVCR: 0" splat in
> 35/100 runs. Hacking that down to 5ms brings that to 89/100 runs. So
> even if we have to keep this high for an emulated platform, it'd
> probably be worth being able to override that as a parameter?

I was getting better numbers than that with the default multi-CPU setups
on my particular machine, most runs were showing something IIRC.  I do
agree that it'd be a useful command line argument to add incrementally.

> Otherwise, maybe it's worth increasing the timeout for the fp-stress
> test specifically? The docs at:

>   https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests

> ... imply that is possible, but don't say exactly how, and it seems
> legitimate for a stress test.

IIRC it's per suite and there's a bunch of pushback on using it in
default configurations since the selftests are expected to run quickly
in other cases where I'd have said it was a reasonable change to make.
Stress tests are not entirely idiomatic for kselftest, it's supposed to
run quickly.

> > +#define SIGNAL_INTERVAL_MS 25
> > +#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS)

> Running this, I see that by default test logs:

>   # Will run for 400s

> ... for a timeout that's actually ~10s, due to the following, which isn't in
> the diff:

>   if (timeout > 0)
>   ksft_print_msg("Will run for %ds\n", timeout);

> ... so that probably wants an update to either convert to seconds or be in
> terms of signals, and likewise for the "timeout remaining" message below.

> Otherwise, this looks good to me.

Oh, yeah - we should probably just remove the unit from that one.  I
never see it due to all the spam from the subprocesses starting.


signature.asc
Description: PGP signature


Re: [PATCH v5 09/13] iommufd/selftest: Add refcount to mock_iommu_device

2024-10-29 Thread Jason Gunthorpe
On Tue, Oct 29, 2024 at 09:02:58AM -0700, Nicolin Chen wrote:
> On Tue, Oct 29, 2024 at 12:34:38PM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 25, 2024 at 04:49:49PM -0700, Nicolin Chen wrote:
> > > For an iommu_dev that can unplug (so far only this selftest does so), the
> > > viommu->iommu_dev pointer has no guarantee of its life cycle after it is
> > > copied from the idev->dev->iommu->iommu_dev.
> > > 
> > > Track the user count of the iommu_dev. Postpone the exit routine using a
> > > completion, if refcount is unbalanced. The refcount inc/dec will be added
> > > in the following patch.
> > > 
> > > Signed-off-by: Nicolin Chen 
> > > ---
> > >  drivers/iommu/iommufd/selftest.c | 32 
> > >  1 file changed, 24 insertions(+), 8 deletions(-)
> > 
> > Reviewed-by: Jason Gunthorpe 
> > 
> > Since this is built into the iommufd module it can't be unloaded
> > without also unloading iommufd, which is impossible as long as any
> > iommufd FDs are open. So I expect that the WARN_ON can never happen.
> 
> Hmm, I assume we still need this patch then?

I was thinking, I think it still is a reasonable example of what it
might look like

You might include the above remark as a comment above the WARN_ON though.

> Could a faulty "--force" possibly trigger it?

I'm not sure, I suspect not?

Jason



Re: [PATCH v5 06/13] iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC

2024-10-29 Thread Jason Gunthorpe
On Tue, Oct 29, 2024 at 09:07:38AM -0700, Nicolin Chen wrote:
> On Tue, Oct 29, 2024 at 12:27:46PM -0300, Jason Gunthorpe wrote:
> > On Mon, Oct 28, 2024 at 07:52:10AM -0700, Nicolin Chen wrote:
> > > On Mon, Oct 28, 2024 at 10:03:09AM -0300, Jason Gunthorpe wrote:
> > > In iommufd_hwpt_paging_alloc(), we reject IOMMU_HWPT_FAULT_ID_VALID:
> > >   const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
> > >   IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> > >   ...
> > >   if (flags & ~valid_flags)
> > >   return ERR_PTR(-EOPNOTSUPP);
> > > 
> > > In iommufd_hwpt_nested_alloc(), we mask the flag away:
> > >   if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
> > >   !user_data->len || !ops->domain_alloc_user)
> > >   return ERR_PTR(-EOPNOTSUPP);
> > >   ...
> > >   hwpt->domain = ops->domain_alloc_user(idev->dev,
> > > flags & 
> > > ~IOMMU_HWPT_FAULT_ID_VALID,
> > > parent->common.domain, user_data);
> > > 
> > > Then, in the common function it has a section of
> > >   if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
> > >   ...
> > > 
> > > It seems that this IOMMU_HWPT_FAULT_ID_VALID is for nested domains?
> > 
> > OK, but ARM should be blocking it since it doesn't work there.
> > 
> > I think we made some error here, it should have been passed in flags
> > to the drivers and only intel should have accepted it.
> 
> Trying to limit changes here since two parts are already quite
> large, I think a separate series fixing that would be nicer? 

Yes, let's just make a note

> > This suggests we should send flags down the viommu alloc domain path too.
> 
> Ack. Will pass it in.

But this would be nice to get to

Jason



Re: [PATCH v5 05/13] iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE

2024-10-29 Thread Nicolin Chen
On Tue, Oct 29, 2024 at 04:09:41PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 04:50:34PM -0700, Nicolin Chen wrote:
> > @@ -497,17 +497,35 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
> > goto out;
> > }
> >  
> > -   hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id);
> > -   if (IS_ERR(hwpt)) {
> > -   rc = PTR_ERR(hwpt);
> > +   pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY);
> > +   if (IS_ERR(pt_obj)) {
> > +   rc = PTR_ERR(pt_obj);
> > goto out;
> > }
> > +   if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) {
> > +   struct iommufd_hw_pagetable *hwpt =
> > +   container_of(pt_obj, struct iommufd_hw_pagetable, obj);
> > +
> > +   rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
> > + &data_array);
> > +   } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
> > +   struct iommufd_viommu *viommu =
> > +   container_of(pt_obj, struct iommufd_viommu, obj);
> > +
> > +   if (!viommu->ops || !viommu->ops->cache_invalidate) {
> > +   rc = -EOPNOTSUPP;
> > +   goto out_put_pt;
> > +   }
> > +   rc = viommu->ops->cache_invalidate(viommu, &data_array);
> > +   } else {
> > +   rc = -EINVAL;
> > +   goto out_put_pt;
> > +   }
> 
> Given the test in iommufd_viommu_alloc_hwpt_nested() is:
> 
>   if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
>(!viommu->ops->cache_invalidate &&
> !hwpt->domain->ops->cache_invalidate_user)))
> {
> 
> We will crash if the user passes a viommu allocated domain as
> IOMMUFD_OBJ_HWPT_NESTED since the above doesn't check it.

Ah, that was missed.

> I suggest we put the required if (ops..) -EOPNOTSUPP above and remove
> the ops->cache_invalidate checks from both WARN_ONs.

Ack. I will add hwpt->domain->ops check:
-
if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) {
struct iommufd_hw_pagetable *hwpt =
container_of(pt_obj, struct iommufd_hw_pagetable, obj);

if (!hwpt->domain->ops ||
!hwpt->domain->ops->cache_invalidate_user) {
rc = -EOPNOTSUPP;
goto out_put_pt;
}
rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
  &data_array);
} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
struct iommufd_viommu *viommu =
container_of(pt_obj, struct iommufd_viommu, obj);

if (!viommu->ops || !viommu->ops->cache_invalidate) {
rc = -EOPNOTSUPP;
goto out_put_pt;
}
rc = viommu->ops->cache_invalidate(viommu, &data_array);
} else {
-

Thanks
Nicolin



[PATCH net-next v1 1/7] net: page_pool: rename page_pool_alloc_netmem to *_netmems

2024-10-29 Thread Mina Almasry
page_pool_alloc_netmem (without an s) was the mirror of
page_pool_alloc_pages (with an s), which was confusing.

Rename to page_pool_alloc_netmems so it's the mirror of
page_pool_alloc_pages.

Signed-off-by: Mina Almasry 
---
 include/net/page_pool/types.h | 2 +-
 net/core/page_pool.c  | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index c022c410abe3..8f564fe9eb9a 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -242,7 +242,7 @@ struct page_pool {
 };
 
 struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
-netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp);
+netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp);
 struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset,
  unsigned int size, gfp_t gfp);
 netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a813d30d2135..77de79c1933b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -574,7 +574,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
 /* For using page_pool replace: alloc_pages() API calls, but provide
  * synchronization guarantee for allocation side.
  */
-netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp)
+netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
 {
netmem_ref netmem;
 
@@ -590,11 +590,11 @@ netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp)
netmem = __page_pool_alloc_pages_slow(pool, gfp);
return netmem;
 }
-EXPORT_SYMBOL(page_pool_alloc_netmem);
+EXPORT_SYMBOL(page_pool_alloc_netmems);
 
 struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
 {
-   return netmem_to_page(page_pool_alloc_netmem(pool, gfp));
+   return netmem_to_page(page_pool_alloc_netmems(pool, gfp));
 }
 EXPORT_SYMBOL(page_pool_alloc_pages);
 ALLOW_ERROR_INJECTION(page_pool_alloc_pages, NULL);
@@ -956,7 +956,7 @@ netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
}
 
if (!netmem) {
-   netmem = page_pool_alloc_netmem(pool, gfp);
+   netmem = page_pool_alloc_netmems(pool, gfp);
if (unlikely(!netmem)) {
pool->frag_page = 0;
return 0;
-- 
2.47.0.163.g1226f6d8fa-goog




Re: [PATCH for-next 4/7] selftests/net: Add missing gitignore file

2024-10-29 Thread Jakub Kicinski
On Fri, 25 Oct 2024 09:40:07 +0800 Li Zhijian wrote:
> diff --git a/tools/testing/selftests/net/netfilter/.gitignore b/tools/testing/selftests/net/netfilter/.gitignore
> index 0a64d6d0e29a..eef8d5784e94 100644
> --- a/tools/testing/selftests/net/netfilter/.gitignore
> +++ b/tools/testing/selftests/net/netfilter/.gitignore
> @@ -4,3 +4,4 @@ connect_close
>  conntrack_dump_flush
>  sctp_collision
>  nf_queue
> +conntrack_reverse_clash

This gitignore file appears to be sorted, please move the new entry up
into the right place and repost as a standalone patch (not part of a
series) so that it's easier for the netfilter maintainers to pick it up.



Re: [PATCH for-next 7/7] selftests/net: Fix ./ns-XXXXXX not cleanup

2024-10-29 Thread Jakub Kicinski
On Tue, 29 Oct 2024 16:43:06 -0700 Jakub Kicinski wrote:
> Similarly to patch 6 

I meant patch 4, not patch 6



Re: [PATCH for-next 7/7] selftests/net: Fix ./ns-XXXXXX not cleanup

2024-10-29 Thread Jakub Kicinski
On Fri, 25 Oct 2024 09:40:10 +0800 Li Zhijian wrote:
> ```
> readonly STATS="$(mktemp -p /tmp ns-XX)"
> readonly BASE=`basename $STATS`
> ```
> It could be a mistake to write to $BASE rather than $STATS, where $STATS
> is used to save the NSTAT_HISTORY and it will be cleaned up before exit.

Agreed, although since we've been creating the wrong file this whole
time and everything worked -- should we just delete those two
lines completely?

Similarly to patch 6 - please repost as a standalone patch so that our
CI will test it. If you only CC a mailing list on subset of patches
they are likely to be ignored by automation..
-- 
pw-bot: cr



Re: [PATCH] scripts: Remove export_report.pl

2024-10-29 Thread Sami Tolvanen
Hi Matt,

On Tue, Oct 29, 2024 at 2:12 PM Matthew Maurer  wrote:
>
> This script has been broken for 5 years with no user complaints.
>
> It first had its .mod.c parser broken in commit a3d0cb04f7df ("modpost:
> use __section in the output to *.mod.c"). Later, it had its object file
> enumeration broken in commit f65a486821cf ("kbuild: change module.order
> to list *.o instead of *.ko"). Both of these changes sat for years with
> no reports.
>
> Rather than reviving this script as we make further changes to `.mod.c`,
> this patch gets rid of it because it is clearly unused.
>
> Signed-off-by: Matthew Maurer 

Thanks for the patch! Applying this separately without waiting for the
rest of the extended modversions series to land makes sense to me.

Reviewed-by: Sami Tolvanen 

Sami



[PATCH v2 1/2] kselftest/arm64: Increase frequency of signal delivery in fp-stress

2024-10-29 Thread Mark Brown
Currently we only deliver signals to the processes being tested about once
a second, meaning that the signal code paths are subject to relatively
little stress. Increase this frequency substantially to 25ms intervals,
along with some minor refactoring to make this more readily tuneable and
maintain the 1s logging interval. This interval was chosen based on some
experimentation with emulated platforms to avoid causing so much extra load
that the test starts to run into the 45s limit for selftests, or generally
disconnecting the timeout numbers from the actual running time.

We could increase this if we moved the signal generation out of the main
supervisor thread, though the percentage of time that we spend
interacting with the floating point state is also a consideration.
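For reference, the interval arithmetic can be sketched as a standalone snippet. The macro values match this patch; the helper functions are illustrative only and not part of the patch itself:

```c
#include <assert.h>

#define SIGNAL_INTERVAL_MS 25			/* signal delivery period */
#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS)	/* poll intervals per second */

/* Convert a wall-clock timeout in seconds into the number of
 * SIGNAL_INTERVAL_MS-long poll intervals the supervisor loop counts down. */
static int seconds_to_intervals(int seconds)
{
	return seconds * (1000 / SIGNAL_INTERVAL_MS);
}

/* Maintain the 1s logging cadence: log only when the remaining interval
 * count is a multiple of LOG_INTERVALS. */
static int should_log(int timeout_remaining)
{
	return (timeout_remaining % LOG_INTERVALS) == 0;
}
```

With these values, the default 10s timeout becomes 400 intervals, which is the "Will run for 400" figure discussed in the review of v1.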

Suggested-by: Mark Rutland 
Signed-off-by: Mark Brown 
---
 tools/testing/selftests/arm64/fp/fp-stress.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..71d02c701bf56be56b7ad00a5f6614e33dc8e01b 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -28,6 +28,9 @@
 
 #define MAX_VLS 16
 
+#define SIGNAL_INTERVAL_MS 25
+#define LOG_INTERVALS (1000 / SIGNAL_INTERVAL_MS)
+
 struct child_data {
char *name, *output;
pid_t pid;
@@ -449,7 +452,7 @@ static const struct option options[] = {
 int main(int argc, char **argv)
 {
int ret;
-   int timeout = 10;
+   int timeout = 10 * (1000 / SIGNAL_INTERVAL_MS);
int cpus, i, j, c;
int sve_vl_count, sme_vl_count;
bool all_children_started = false;
@@ -505,7 +508,7 @@ int main(int argc, char **argv)
   have_sme2 ? "present" : "absent");
 
if (timeout > 0)
-   ksft_print_msg("Will run for %ds\n", timeout);
+   ksft_print_msg("Will run for %d\n", timeout);
else
ksft_print_msg("Will run until terminated\n");
 
@@ -578,14 +581,14 @@ int main(int argc, char **argv)
break;
 
/*
-* Timeout is counted in seconds with no output, the
-* tests print during startup then are silent when
-* running so this should ensure they all ran enough
-* to install the signal handler, this is especially
-* useful in emulation where we will both be slow and
-* likely to have a large set of VLs.
+* Timeout is counted in poll intervals with no
+* output, the tests print during startup then are
+* silent when running so this should ensure they all
+* ran enough to install the signal handler, this is
+* especially useful in emulation where we will both
+* be slow and likely to have a large set of VLs.
 */
-   ret = epoll_wait(epoll_fd, evs, tests, 1000);
+   ret = epoll_wait(epoll_fd, evs, tests, SIGNAL_INTERVAL_MS);
if (ret < 0) {
if (errno == EINTR)
continue;
@@ -625,8 +628,9 @@ int main(int argc, char **argv)
all_children_started = true;
}
 
-   ksft_print_msg("Sending signals, timeout remaining: %d\n",
-  timeout);
+   if ((timeout % LOG_INTERVALS) == 0)
+   ksft_print_msg("Sending signals, timeout remaining: %d\n",
+  timeout);
 
for (i = 0; i < num_children; i++)
child_tickle(&children[i]);

-- 
2.39.2




[PATCH] scripts: Remove export_report.pl

2024-10-29 Thread Matthew Maurer
lue+1, $symbol, $gpl];
-   push(@{$MODULE{$thismod}} , $sym);
-   }
-   }
-   if ($state != 2) {
-   warn "WARNING:$thismod is not built with CONFIG_MODVERSIONS enabled\n";
-   $modversion_warnings++;
-   }
-   close($module);
-}
-
-print "\tThis file reports the exported symbols usage patterns by in-tree\n",
-   "\t\t\t\tmodules\n";
-printf("%s\n\n\n","x"x80);
-printf("\t\t\t\tINDEX\n\n\n");
-printf("SECTION 1: Usage counts of all exported symbols\n");
-printf("SECTION 2: List of modules and the exported symbols they use\n");
-printf("%s\n\n\n","x"x80);
-printf("SECTION 1:\tThe exported symbols and their usage count\n\n");
-printf("%-25s\t%-25s\t%-5s\t%-25s\n", "Symbol", "Module", "Usage count",
-   "export type");
-
-#
-# print the list of unused exported symbols
-#
-foreach my $list (sort alphabetically values(%SYMBOL)) {
-   my ($module, $value, $symbol, $gpl) = @{$list};
-   printf("%-25s\t%-25s\t%-10s\t", $symbol, $module, $value);
-   if (defined $gpl) {
-   printf("%-25s\n",$gpl);
-   } else {
-   printf("\n");
-   }
-}
-printf("%s\n\n\n","x"x80);
-
-printf("SECTION 2:\n\tThis section reports export-symbol-usage of in-kernel
-modules. Each module lists the modules, and the symbols from that module that
-it uses.  Each listed symbol reports the number of modules using it\n");
-
-print "\nNOTE: Got $modversion_warnings CONFIG_MODVERSIONS warnings\n\n"
-if $modversion_warnings;
-
-print "~"x80 , "\n";
-for my $thismod (sort keys %MODULE) {
-   my $list = $MODULE{$thismod};
-   my %depends;
-   $thismod =~ s/\.mod\.c/.ko/;
-   print "\t\t\t$thismod\n";
-   foreach my $symbol (@{$list}) {
-   my ($module, $value, undef, $gpl) = @{$SYMBOL{$symbol}};
-   push (@{$depends{"$module"}}, "$symbol $value");
-   }
-   print_depends_on(\%depends);
-}

---
base-commit: 6fb2fa9805c501d9ade047fc511961f3273cdcb5
change-id: 20241029-remove-export-report-pl-7365c6ca3bee

Best regards,
-- 
Matthew Maurer 




RE: [PATCH v5 04/13] iommufd/hw_pagetable: Enforce invalidation op on vIOMMU-based hwpt_nested

2024-10-29 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, October 30, 2024 12:05 AM
> 
> On Tue, Oct 29, 2024 at 08:22:43AM +, Tian, Kevin wrote:
> > > From: Nicolin Chen 
> > > Sent: Saturday, October 26, 2024 7:51 AM
> > >
> > > @@ -302,7 +302,9 @@ iommufd_viommu_alloc_hwpt_nested(struct
> > > iommufd_viommu *viommu, u32 flags,
> > >   }
> > >   hwpt->domain->owner = viommu->iommu_dev->ops;
> > >
> > > - if (WARN_ON_ONCE(hwpt->domain->type !=
> > > IOMMU_DOMAIN_NESTED)) {
> > > + if (WARN_ON_ONCE(hwpt->domain->type !=
> > > IOMMU_DOMAIN_NESTED ||
> > > +  (!viommu->ops->cache_invalidate &&
> > > +   !hwpt->domain->ops->cache_invalidate_user))) {
> > >   rc = -EINVAL;
> > >   goto out_abort;
> > >   }
> >
> > According to patch5, cache invalidate in viommu only uses
> > viommu->ops->cache_invalidate. Is here intended to allow
> > nested hwpt created via viommu to still support the old
> > method?
> 
> I think that is reasonable?
> 

Yes, just want to confirm.

Reviewed-by: Kevin Tian 



[PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long

2024-10-29 Thread Mina Almasry
Check we're going to free a reasonable number of frags in token_count
before starting the loop, to prevent looping too long.

Also minor code cleanups:
- Flip checks to reduce indentation.
- Use sizeof(*tokens) everywhere for consistency.
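The shape of the pre-check can be sketched standalone. The struct below mirrors the uAPI layout for illustration only, and -1 stands in for -E2BIG; the cap value is the one introduced by this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DONTNEED_FRAGS 1024	/* cap introduced by this patch */

struct dmabuf_token {		/* mirrors the uAPI struct, for illustration */
	uint32_t token_start;
	uint32_t token_count;
};

/* Sum every token_count into a 64-bit counter before doing any work, so
 * a single huge token_count can't make the erase loop run arbitrarily
 * long. Returns 0 if within bounds, -1 (standing in for -E2BIG) otherwise. */
static int validate_frag_count(const struct dmabuf_token *tokens, size_t n)
{
	uint64_t num_frags = 0;

	for (size_t i = 0; i < n; i++) {
		num_frags += tokens[i].token_count;
		if (num_frags > MAX_DONTNEED_FRAGS)
			return -1;
	}
	return 0;
}
```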

Cc: Yi Lai 

Signed-off-by: Mina Almasry 

---
 net/core/sock.c | 46 --
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7f398bd07fb7..8603b8d87f2e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes)

 #ifdef CONFIG_PAGE_POOL

-/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in
+/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in
  * 1 syscall. The limit exists to limit the amount of memory the kernel
- * allocates to copy these tokens.
+ * allocates to copy these tokens, and to prevent looping over the frags for
+ * too long.
  */
-#define MAX_DONTNEED_TOKENS 128
+#define MAX_DONTNEED_FRAGS 1024

 static noinline_for_stack int
 sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
@@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
unsigned int num_tokens, i, j, k, netmem_num = 0;
struct dmabuf_token *tokens;
netmem_ref netmems[16];
+   u64 num_frags = 0;
int ret = 0;

if (!sk_is_tcp(sk))
return -EBADF;

-   if (optlen % sizeof(struct dmabuf_token) ||
-   optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
+   if (optlen % sizeof(*tokens) ||
+   optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS)
return -EINVAL;

-   tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL);
+   num_tokens = optlen / sizeof(*tokens);
+   tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
if (!tokens)
return -ENOMEM;

-   num_tokens = optlen / sizeof(struct dmabuf_token);
if (copy_from_sockptr(tokens, optval, optlen)) {
kvfree(tokens);
return -EFAULT;
}

+   for (i = 0; i < num_tokens; i++) {
+   num_frags += tokens[i].token_count;
+   if (num_frags > MAX_DONTNEED_FRAGS) {
+   kvfree(tokens);
+   return -E2BIG;
+   }
+   }
+
xa_lock_bh(&sk->sk_user_frags);
for (i = 0; i < num_tokens; i++) {
for (j = 0; j < tokens[i].token_count; j++) {
netmem_ref netmem = (__force netmem_ref)__xa_erase(
&sk->sk_user_frags, tokens[i].token_start + j);

-   if (netmem &&
-   !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
-   netmems[netmem_num++] = netmem;
-   if (netmem_num == ARRAY_SIZE(netmems)) {
-   xa_unlock_bh(&sk->sk_user_frags);
-   for (k = 0; k < netmem_num; k++)
-   WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
-   netmem_num = 0;
-   xa_lock_bh(&sk->sk_user_frags);
-   }
-   ret++;
+   if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
+   continue;
+
+   netmems[netmem_num++] = netmem;
+   if (netmem_num == ARRAY_SIZE(netmems)) {
+   xa_unlock_bh(&sk->sk_user_frags);
+   for (k = 0; k < netmem_num; k++)
+   WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
+   netmem_num = 0;
+   xa_lock_bh(&sk->sk_user_frags);
}
+   ret++;
}
}

--
2.47.0.163.g1226f6d8fa-goog



[PATCH net-next v1 7/7] ncdevmem: add test for too many token_count

2024-10-29 Thread Mina Almasry
Add test for fixed issue: user passing a token with a very large
token_count. Expect an error in this case.

Signed-off-by: Mina Almasry 
---
 tools/testing/selftests/net/ncdevmem.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/net/ncdevmem.c b/tools/testing/selftests/net/ncdevmem.c
index 64d6805381c5..3fd2aee461f3 100644
--- a/tools/testing/selftests/net/ncdevmem.c
+++ b/tools/testing/selftests/net/ncdevmem.c
@@ -391,6 +391,17 @@ int do_server(void)
continue;
}
 
+   token.token_start = dmabuf_cmsg->frag_token;
+   token.token_count = 8192;
+
+   ret = setsockopt(client_fd, SOL_SOCKET,
+SO_DEVMEM_DONTNEED, &token,
+sizeof(token));
+   if (ret >= 0)
+   error(1, 0,
+ "DONTNEED of too many frags should have failed. ret=%ld\n",
+ ret);
+
token.token_start = dmabuf_cmsg->frag_token;
token.token_count = 1;
 
-- 
2.47.0.163.g1226f6d8fa-goog




[PATCH net-next v1 2/7] net: page_pool: create page_pool_alloc_netmem

2024-10-29 Thread Mina Almasry
Create page_pool_alloc_netmem to be the mirror of page_pool_alloc.

This enables drivers that currently use page_pool_alloc to
transition to netmem by converting the call sites to
page_pool_alloc_netmem.

Signed-off-by: Mina Almasry 
---
 include/net/page_pool/helpers.h | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 793e6fd78bc5..8e548ff3044c 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -116,22 +116,22 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
return page_pool_alloc_frag(pool, offset, size, gfp);
 }
 
-static inline struct page *page_pool_alloc(struct page_pool *pool,
-  unsigned int *offset,
-  unsigned int *size, gfp_t gfp)
+static inline netmem_ref page_pool_alloc_netmem(struct page_pool *pool,
+   unsigned int *offset,
+   unsigned int *size, gfp_t gfp)
 {
unsigned int max_size = PAGE_SIZE << pool->p.order;
-   struct page *page;
+   netmem_ref netmem;
 
if ((*size << 1) > max_size) {
*size = max_size;
*offset = 0;
-   return page_pool_alloc_pages(pool, gfp);
+   return page_pool_alloc_netmems(pool, gfp);
}
 
-   page = page_pool_alloc_frag(pool, offset, *size, gfp);
-   if (unlikely(!page))
-   return NULL;
+   netmem = page_pool_alloc_frag_netmem(pool, offset, *size, gfp);
+   if (unlikely(!netmem))
+   return 0;
 
/* There is very likely not enough space for another fragment, so append
 * the remaining size to the current fragment to avoid truesize
@@ -142,7 +142,14 @@ static inline struct page *page_pool_alloc(struct page_pool *pool,
pool->frag_offset = max_size;
}
 
-   return page;
+   return netmem;
+}
+
+static inline struct page *page_pool_alloc(struct page_pool *pool,
+  unsigned int *offset,
+  unsigned int *size, gfp_t gfp)
+{
+   return netmem_to_page(page_pool_alloc_netmem(pool, offset, size, gfp));
 }
 
 /**
-- 
2.47.0.163.g1226f6d8fa-goog




[PATCH net-next v1 4/7] page_pool: disable sync for cpu for dmabuf memory provider

2024-10-29 Thread Mina Almasry
dmabuf dma-addresses should not be dma_sync'd for CPU/device. Typically
it's the driver's responsibility to dma_sync for CPU, but the driver should
not dma_sync for CPU if the netmem is actually coming from a dmabuf
memory provider.

The page_pool already exposes a helper for dma_sync_for_cpu:
page_pool_dma_sync_for_cpu. Upgrade this existing helper to handle
netmem, and have it skip dma_sync if the memory is from a dmabuf memory
provider. Drivers should migrate to using this helper when adding
support for netmem.

Cc: Jason Gunthorpe 
Signed-off-by: Mina Almasry 

---
 include/net/page_pool/helpers.h | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 8e548ff3044c..ad4fed4a791c 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -429,9 +429,10 @@ static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
 }
 
 /**
- * page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
+ * page_pool_dma_sync_netmem_for_cpu - sync Rx page for CPU after it's written
+ *by HW
  * @pool: &page_pool the @page belongs to
- * @page: page to sync
+ * @netmem: netmem to sync
  * @offset: offset from page start to "hard" start if using PP frags
  * @dma_sync_size: size of the data written to the page
  *
@@ -440,16 +441,28 @@ static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
  * Note that this version performs DMA sync unconditionally, even if the
  * associated PP doesn't perform sync-for-device.
  */
-static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
- const struct page *page,
- u32 offset, u32 dma_sync_size)
+static inline void
+page_pool_dma_sync_netmem_for_cpu(const struct page_pool *pool,
+ const netmem_ref netmem, u32 offset,
+ u32 dma_sync_size)
 {
+   if (pool->mp_priv)
+   return;
+
dma_sync_single_range_for_cpu(pool->p.dev,
- page_pool_get_dma_addr(page),
+ page_pool_get_dma_addr_netmem(netmem),
  offset + pool->p.offset, dma_sync_size,
  page_pool_get_dma_dir(pool));
 }
 
+static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
+ struct page *page, u32 offset,
+ u32 dma_sync_size)
+{
+   page_pool_dma_sync_netmem_for_cpu(pool, page_to_netmem(page), offset,
+ dma_sync_size);
+}
+
 static inline bool page_pool_put(struct page_pool *pool)
 {
return refcount_dec_and_test(&pool->user_cnt);
-- 
2.47.0.163.g1226f6d8fa-goog




[PATCH net-next v1 5/7] netmem: add netmem_prefetch

2024-10-29 Thread Mina Almasry
prefetch(page) is commonly called from drivers. Add netmem_prefetch,
which can be called on generic netmem and skips the prefetch for
net_iovs.

Signed-off-by: Mina Almasry 
---
 include/net/netmem.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 8a6e20be4b9d..923c47147aa8 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -171,4 +171,11 @@ static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
return __netmem_clear_lsb(netmem)->dma_addr;
 }
 
+static inline void netmem_prefetch(netmem_ref netmem)
+{
+   if (netmem_is_net_iov(netmem))
+   return;
+
+   prefetch(netmem_to_page(netmem));
+}
 #endif /* _NET_NETMEM_H */
-- 
2.47.0.163.g1226f6d8fa-goog




[PATCH net-next v1 0/7] devmem TCP fixes

2024-10-29 Thread Mina Almasry
A few unrelated devmem TCP fixes bundled in a series for some
convenience (if that's ok).

Patch 1-2: fix naming and provide page_pool_alloc_netmem for fragged
netmem.

Patch 3-4: fix issues with dma-buf dma addresses being potentially
passed to dma_sync_for_* helpers.

Patch 5-6: fix syzbot SO_DEVMEM_DONTNEED issue and add test for this
case.


Mina Almasry (6):
  net: page_pool: rename page_pool_alloc_netmem to *_netmems
  net: page_pool: create page_pool_alloc_netmem
  page_pool: disable sync for cpu for dmabuf memory provider
  netmem: add netmem_prefetch
  net: fix SO_DEVMEM_DONTNEED looping too long
  ncdevmem: add test for too many token_count

Samiullah Khawaja (1):
  page_pool: Set `dma_sync` to false for devmem memory provider

 include/net/netmem.h   |  7 
 include/net/page_pool/helpers.h| 50 ++
 include/net/page_pool/types.h  |  2 +-
 net/core/devmem.c  |  9 +++--
 net/core/page_pool.c   | 11 +++---
 net/core/sock.c| 46 ++--
 tools/testing/selftests/net/ncdevmem.c | 11 ++
 7 files changed, 93 insertions(+), 43 deletions(-)

-- 
2.47.0.163.g1226f6d8fa-goog




[PATCH v3] selftests/net: Add missing gitignore file

2024-10-29 Thread Li Zhijian
Compiled binary files should be added to .gitignore.
'git status' complains:
   Untracked files:
   (use "git add ..." to include in what will be committed)
 net/netfilter/conntrack_reverse_clash

Cc: Pablo Neira Ayuso 
Cc: Jozsef Kadlecsik 
Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: Shuah Khan 
Signed-off-by: Li Zhijian 
---
Cc: netfilter-de...@vger.kernel.org
Cc: coret...@netfilter.org
Cc: net...@vger.kernel.org
---
Hello,
Cover letter is here.

This patch set aims to make 'git status' clear after 'make' and 'make
run_tests' for kselftests.
---
V3:
  sort the files

V2:
  split as a separate patch from a small one [0]
  [0] 
https://lore.kernel.org/linux-kselftest/20241015010817.453539-1-lizhij...@fujitsu.com/

Signed-off-by: Li Zhijian 
---
 tools/testing/selftests/net/netfilter/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/net/netfilter/.gitignore b/tools/testing/selftests/net/netfilter/.gitignore
index 0a64d6d0e29a..64c4f8d9aa6c 100644
--- a/tools/testing/selftests/net/netfilter/.gitignore
+++ b/tools/testing/selftests/net/netfilter/.gitignore
@@ -2,5 +2,6 @@
 audit_logread
 connect_close
 conntrack_dump_flush
+conntrack_reverse_clash
 sctp_collision
 nf_queue
-- 
2.44.0




Re: [PATCH for-next 7/7] selftests/net: Fix ./ns-XXXXXX not cleanup

2024-10-29 Thread Zhijian Li (Fujitsu)


On 30/10/2024 07:43, Jakub Kicinski wrote:
> On Fri, 25 Oct 2024 09:40:10 +0800 Li Zhijian wrote:
>> ```
>> readonly STATS="$(mktemp -p /tmp ns-XX)"
>> readonly BASE=`basename $STATS`
>> ```
>> It could be a mistake to write to $BASE rather than $STATS, where $STATS
>> is used to save the NSTAT_HISTORY and it will be cleaned up before exit.
> 
> Agreed, although since we've been creating the wrong file this whole
> time and everything worked
> -- should we just delete those two lines completely?

Yes, it also works.


> 
> Similarly to patch 6 - please repost as a standalone patch so that our
> CI will test it. If you only CC a mailing list on subset of patches
> they are likely to be ignored by automation..

Got it

Thanks
Zhijian

[PATCH v3] selftests/net: Fix ./ns-XXXXXX not cleanup

2024-10-29 Thread Li Zhijian
```
readonly STATS="$(mktemp -p /tmp ns-XX)"
readonly BASE=`basename $STATS`
```
It could be a mistake to write to $BASE rather than $STATS, where $STATS
is used to save the NSTAT_HISTORY and it will be cleaned up before exit.

Since we've been creating the wrong file this whole time and
everything worked, it's fine to remove these two lines completely.
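The failure mode is easy to reproduce outside the selftest. A minimal sketch (the `demo-XXXXXX` template name is made up for illustration): `mktemp -p /tmp` returns an absolute path, `basename` strips the directory, so writing to the basename silently creates a file in the current directory that a cleanup trap on the tempfile path never removes.

```shell
STATS="$(mktemp -p /tmp demo-XXXXXX)"   # absolute path under /tmp
BASE=$(basename "$STATS")               # bare filename, directory stripped
# Writing to $BASE lands in $PWD, not /tmp, so cleanup of $STATS misses it:
echo "#kernel" > "$BASE"
echo "stats dir: $(dirname "$STATS"), stray file: $PWD/$BASE"
rm -f "$STATS" "$BASE"
```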

Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Signed-off-by: Li Zhijian 
---
Cc: net...@vger.kernel.org
---
V3:
  Remove these 2 lines rather than fixing the filename

---
Hello,
Cover letter is here.

This patch set aims to make 'git status' clear after 'make' and 'make
run_tests' for kselftests.
---
V2: nothing changed

Signed-off-by: Li Zhijian 
---
 tools/testing/selftests/net/veth.sh | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/selftests/net/veth.sh b/tools/testing/selftests/net/veth.sh
index 4f1edbafb946..6bb7dfaa30b6 100755
--- a/tools/testing/selftests/net/veth.sh
+++ b/tools/testing/selftests/net/veth.sh
@@ -46,8 +46,6 @@ create_ns() {
ip -n $BASE$ns addr add dev veth$ns $BM_NET_V4$ns/24
ip -n $BASE$ns addr add dev veth$ns $BM_NET_V6$ns/64 nodad
done
-   echo "#kernel" > $BASE
-   chmod go-rw $BASE
 }
 
 __chk_flag() {
-- 
2.44.0




Re: [PATCH] selftest/hid: increase timeout to 10 min

2024-10-29 Thread Shuah Khan

On 10/29/24 03:07, Yun Lu wrote:

If running hid testcases with command "./run_kselftest.sh -c hid",


NIT - "When" instead of "If"

the following tests will take longer than the kselftest framework
timeout (now 200 seconds) to run and thus got terminated with TIMEOUT
error:

   hid-multitouch.sh - took about 6min41s
   hid-tablet.sh - took about 6min30s

Increase the timeout setting to 10 minutes to allow them a chance
to finish.

Cc: sta...@vger.kernel.org
Signed-off-by: Yun Lu 
---
  tools/testing/selftests/hid/settings | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/hid/settings b/tools/testing/selftests/hid/settings
index b3cbfc521b10..dff0d947f9c2 100644
--- a/tools/testing/selftests/hid/settings
+++ b/tools/testing/selftests/hid/settings
@@ -1,3 +1,3 @@
  # HID tests can be long, so give a little bit more time
  # to them
-timeout=200
+timeout=600


Okay - maybe this test shouldn't be part of the default run if it needs
600 seconds to run?

thanks,
-- Shuah



Re: [PATCH v5 01/13] iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl

2024-10-29 Thread Jason Gunthorpe
On Tue, Oct 29, 2024 at 12:30:00PM -0700, Nicolin Chen wrote:

> > iommufd_device_unbind() can't fail, and if the object can't be
> > destroyed because it has an elevated long term refcount it WARN's:
> > 
> > 
> > ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);
> > 
> > /*
> >  * If there is a bug and we couldn't destroy the object then we did put
> >  * back the caller's users refcount and will eventually try to free it
> >  * again during close.
> >  */
> > WARN_ON(ret);
> > 
> > So you cannot take long term references on kernel owned objects. Only
> > userspace owned objects.
> 
> OK. I think I had got this part. Gao ran into this WARN_ON at v3,
> so I added iommufd_object_remove(vdev_id) in unbind() prior to
> this iommufd_object_destroy_user(idev->ictx, &idev->obj).

Oh I see, so the fix to that is to not take a longterm reference, not
to try to destroy a vdev.

The alternative would be to try to unlink the idev from the vdev and
leave a zombie vdev, but that didn't look so nice to implement. If we
need it we can do it later.

Jason



Re: [PATCH net-next v5 03/12] selftests: ncdevmem: Unify error handling

2024-10-29 Thread Jakub Kicinski
On Wed, 23 Oct 2024 08:43:53 -0700 Stanislav Fomichev wrote:
>   ret = inet_pton(server_sin.sin_family, server_ip, &server_sin.sin_addr);
> - if (socket < 0)
> - error(79, 0, "%s: [FAIL, create socket]\n", TEST_PREFIX);
> + if (ret < 0)
> + error(1, pton, "%s: [FAIL, create socket]\n", TEST_PREFIX);

Looks like sched_ext broke our build_tools check, I think I pushed a
fix, but I also see here:

ncdevmem.c: In function ‘do_server’:
ncdevmem.c:343:26: error: ‘pton’ undeclared (first use in this function)
  343 | error(1, pton, "%s: [FAIL, create socket]\n", 
TEST_PREFIX);
  |  ^~~~
-- 
pw-bot: cr



[PATCH v2 0/2] kselftest/arm64: fp-stress signal delivery interval improvements

2024-10-29 Thread Mark Brown
One of the things that fp-stress does to stress the floating point
context switching is send signals to the test threads it spawns.
Currently we do this once per second but as suggested by Mark Rutland if
we increase this we can improve the chances of triggering any issues
with context switching the signal handling code.  Do a quick change to 
increase the rate of signal delivery, trying to avoid excessive impact
on emulated platforms, and a further change to mitigate the impact that
this creates during startup.

Signed-off-by: Mark Brown 
---
Changes in v2:
- Minor clarifications in commit message and log output.
- Link to v1: 
https://lore.kernel.org/r/20241029-arm64-fp-stress-interval-v1-0-db540abf6...@kernel.org

---
Mark Brown (2):
  kselftest/arm64: Increase frequency of signal delivery in fp-stress
  kselftest/arm64: Poll less often while waiting for fp-stress children

 tools/testing/selftests/arm64/fp/fp-stress.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241028-arm64-fp-stress-interval-8f5e62c06e12

Best regards,
-- 
Mark Brown 




Re: [PATCH] remoteproc: qcom: pas: Make remoteproc name human friendly

2024-10-29 Thread Chris Lew




On 10/21/2024 9:21 PM, Bjorn Andersson wrote:

The remoteproc "name" property is supposed to present the "human
readable" name of the remoteproc, while using the device name is
readable, it's not "friendly".

Instead, use the "sysmon_name" as the identifier for the remoteproc
instance. It matches the typical names used when we speak about each
instance, while still being unique.

Signed-off-by: Bjorn Andersson 
---
  drivers/remoteproc/qcom_q6v5_pas.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Chris Lew 



[PATCH net-next v1 3/7] page_pool: Set `dma_sync` to false for devmem memory provider

2024-10-29 Thread Mina Almasry
From: Samiullah Khawaja 

Move the `dma_map` and `dma_sync` checks to `page_pool_init` to make
them generic. Set dma_sync to false for devmem memory provider because
the dma_sync APIs should not be used for dma_buf backed devmem memory
provider.

Cc: Jason Gunthorpe 
Signed-off-by: Samiullah Khawaja 
Signed-off-by: Mina Almasry 

---
 net/core/devmem.c| 9 -
 net/core/page_pool.c | 3 +++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 11b91c12ee11..826d0b00159f 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -331,11 +331,10 @@ int mp_dmabuf_devmem_init(struct page_pool *pool)
if (!binding)
return -EINVAL;
 
-   if (!pool->dma_map)
-   return -EOPNOTSUPP;
-
-   if (pool->dma_sync)
-   return -EOPNOTSUPP;
+   /* dma-buf dma addresses should not be used with
+* dma_sync_for_cpu/device. Force disable dma_sync.
+*/
+   pool->dma_sync = false;
 
if (pool->p.order != 0)
return -E2BIG;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 77de79c1933b..528dd4d18eab 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -287,6 +287,9 @@ static int page_pool_init(struct page_pool *pool,
}
 
if (pool->mp_priv) {
+   if (!pool->dma_map || !pool->dma_sync)
+   return -EOPNOTSUPP;
+
err = mp_dmabuf_devmem_init(pool);
if (err) {
pr_warn("%s() mem-provider init failed %d\n", __func__,
-- 
2.47.0.163.g1226f6d8fa-goog








Re: [PATCH v2 3/3] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.

2024-10-29 Thread Frederic Weisbecker
Le Tue, Oct 29, 2024 at 02:52:31PM +0100, Sebastian Andrzej Siewior a écrit :
> On 2024-10-28 15:01:55 [+0100], Frederic Weisbecker wrote:
> > > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > > index 457151f9f263d..9637af78087f3 100644
> > > --- a/include/linux/interrupt.h
> > > +++ b/include/linux/interrupt.h
> > > @@ -616,6 +616,50 @@ extern void __raise_softirq_irqoff(unsigned int nr);
> > >  extern void raise_softirq_irqoff(unsigned int nr);
> > >  extern void raise_softirq(unsigned int nr);
> > >  
> > > +/*
> > > + * Handle timers in a dedicated thread at a low SCHED_FIFO priority 
> > > instead in
> > > + * ksoftirqd as to be prefred over SCHED_NORMAL tasks.
> > > + */
> > 
> > This doesn't parse. How about, inspired by your changelog:
> …
> 
> What about this essay instead:
> 
> | With forced-threaded interrupts enabled a raised softirq is deferred to
> | ksoftirqd unless it can be handled within the threaded interrupt. This
> | affects timer_list timers and hrtimers which are explicitly marked with
> | HRTIMER_MODE_SOFT.
> | With PREEMPT_RT enabled more hrtimers are moved to softirq for processing
> | which includes all timers which are not explicitly marked HRTIMER_MODE_HARD.
> | Userspace controlled timers (like the clock_nanosleep() interface) is 
> divided
> | into two categories: Tasks with elevated scheduling policy including
> | SCHED_{FIFO|RR|DL} and the remaining scheduling policy. The tasks with the
> | elevated scheduling policy are woken up directly from the HARDIRQ while all
> | other wake ups are delayed to so softirq and so to ksoftirqd.

First "so" is intended?

> |
> | The ksoftirqd runs at SCHED_OTHER policy at which it should remain since it
> | handles the softirq in an overloaded situation (not handled everything
> | within its last run).
> | If the timers are handled at SCHED_OTHER priority then they competes with 
> all
> | other SCHED_OTHER tasks for CPU resources are possibly delayed.
> | Moving timers softirqs to a low priority SCHED_FIFO thread instead ensures
> | that timer are performed before scheduling any SCHED_OTHER thread.

Works for me!

> And with this piece of text I convinced myself to also enable this in
> the forced-threaded case.

Yes, that makes sense.

Thanks.

> > Thanks.
> 
> Sebastian



Re: [PATCH v5 07/13] iommufd/selftest: Add container_of helpers

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:49:47PM -0700, Nicolin Chen wrote:
> Use these inline helpers to shorten those container_of lines.
> 
> Note that one of them goes back and forth between iommu_domain and
> mock_iommu_domain, which isn't necessary. So drop its container_of.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/iommufd/selftest.c | 75 ++--
>  1 file changed, 42 insertions(+), 33 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [PATCH v5 08/13] iommufd/selftest: Prepare for mock_viommu_alloc_domain_nested()

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:49:48PM -0700, Nicolin Chen wrote:
> A nested domain now can be allocated for a parent domain for a vIOMMU
> object. Rework the existing allocators to prepare for the latter case.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/iommufd/selftest.c | 89 ++--
>  1 file changed, 50 insertions(+), 39 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason



[PATCH] selftests/lam: Test get_user() LAM pointer handling

2024-10-29 Thread Maciej Wieczor-Retman
Recent change in how get_user() handles pointers [1] has a specific case
for LAM. It assigns a different bitmask that's later used to check
whether a pointer comes from userland in get_user().

While currently commented out (until LASS [2] is merged into the kernel)
it's worth making changes to the LAM selftest ahead of time.

Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses
get_user() in its implementation. Execute the syscall with differently
tagged pointers to verify that valid user pointers are passing through
and invalid kernel/non-canonical pointers are not.

Code was tested on a Sierra Forest Xeon machine that's LAM capable. The
test was ran without issues with both the LAM lines from [1] untouched
and commented out. The test was also ran without issues with LAM_SUP
both enabled and disabled.

[1] 
https://lore.kernel.org/all/20241024013214.129639-1-torva...@linux-foundation.org/
[2] 
https://lore.kernel.org/all/20240710160655.3402786-1-alexander.shish...@linux.intel.com/

Signed-off-by: Maciej Wieczor-Retman 
---
 tools/testing/selftests/x86/lam.c | 85 +++
 1 file changed, 85 insertions(+)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 0ea4f6813930..3c53d4b7aa61 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,10 +44,19 @@
 #define FUNC_INHERITE   0x20
 #define FUNC_PASID  0x40
 
+/* get_user() pointer test cases */
+#define GET_USER_USER   0
+#define GET_USER_KERNEL_TOP 1
+#define GET_USER_KERNEL_BOT 2
+#define GET_USER_KERNEL 3
+
 #define TEST_MASK   0x7f
 #define L5_SIGN_EXT_MASK    (0xFFUL << 56)
 #define L4_SIGN_EXT_MASK    (0x1UL << 47)
 
 #define LOW_ADDR(0x1UL << 30)
 #define HIGH_ADDR   (0x3UL << 48)
+#define L5_ADDR (0x1UL << 48)
 
 #define MALLOC_LEN  32
 
@@ -370,6 +380,54 @@ static int handle_syscall(struct testcases *test)
return ret;
 }
 
+static int get_user_syscall(struct testcases *test)
+{
+   int ret = 0;
+   int ptr_value = 0;
+   void *ptr = &ptr_value;
+   int fd;
+
+   uint64_t bitmask = ((uint64_t)ptr & L5_ADDR) ? L5_SIGN_EXT_MASK :
+  L4_SIGN_EXT_MASK;
+
+   if (test->lam != 0)
+   if (set_lam(test->lam) != 0)
+   return 2;
+
+   fd = memfd_create("lam_ioctl", 0);
+   if (fd == -1)
+   exit(EXIT_FAILURE);
+
+   switch (test->later) {
+   case GET_USER_USER:
+   /* Control group - properly tagged user pointer */
+   ptr = (void *)set_metadata((uint64_t)ptr, test->lam);
+   break;
+   case GET_USER_KERNEL_TOP:
+   /* Kernel address with top bit cleared */
+   bitmask &= (bitmask >> 1);
+   ptr = (void *)((uint64_t)ptr | bitmask);
+   break;
+   case GET_USER_KERNEL_BOT:
+   /* Kernel address with bottom sign-extension bit cleared */
+   bitmask &= (bitmask << 1);
+   ptr = (void *)((uint64_t)ptr | bitmask);
+   break;
+   case GET_USER_KERNEL:
+   /* Try to pass a kernel address */
+   ptr = (void *)((uint64_t)ptr | bitmask);
+   break;
+   default:
+   printf("Invalid test case value passed!\n");
+   break;
+   }
+
+   if (ioctl(fd, FIOASYNC, ptr) != 0)
+   ret = 1;
+
+   return ret;
+}
+
 int sys_uring_setup(unsigned int entries, struct io_uring_params *p)
 {
return (int)syscall(__NR_io_uring_setup, entries, p);
@@ -883,6 +941,33 @@ static struct testcases syscall_cases[] = {
.test_func = handle_syscall,
.msg = "SYSCALL:[Negative] Disable LAM. Dereferencing pointer with metadata.\n",
},
+   {
+   .later = GET_USER_USER,
+   .lam = LAM_U57_BITS,
+   .test_func = get_user_syscall,
.msg = "GET_USER: get_user() and pass a properly tagged user pointer.\n",
+   },
+   {
+   .later = GET_USER_KERNEL_TOP,
+   .expected = 1,
+   .lam = LAM_U57_BITS,
+   .test_func = get_user_syscall,
.msg = "GET_USER:[Negative] get_user() with a kernel pointer and the top bit cleared.\n",
+   },
+   {
+   .later = GET_USER_KERNEL_BOT,
+   .expected = 1,
+   .lam = LAM_U57_BITS,
+   .test_func = get_user_syscall,
.msg = "GET_USER:[Negative] get_user() with a kernel pointer and the bottom sign-extension bit cleared.\n",
+   },
+   {
+   .later = GET_USER_KERNEL,
+   .expected = 1,
+   .lam 

Re: [PATCH v5 06/13] iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC

2024-10-29 Thread Jason Gunthorpe
On Mon, Oct 28, 2024 at 07:52:10AM -0700, Nicolin Chen wrote:
> On Mon, Oct 28, 2024 at 10:03:09AM -0300, Jason Gunthorpe wrote:
> > On Mon, Oct 28, 2024 at 11:24:10AM +0800, Zhangfei Gao wrote:
> > 
> > > > +/**
> > > > + * iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU
> > > > + * @viommu: vIOMMU ojbect to associate the hwpt_nested/domain with
> > > > + * @user_data: user_data pointer. Must be valid
> > > > + *
> > > > + * Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a 
> > > > NESTED
> > > > + * hw_pagetable.
> > > > + */
> > > > +static struct iommufd_hwpt_nested *
> > > > +iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 
> > > > flags,
> > > > +const struct iommu_user_data 
> > > > *user_data)
> > > > +{
> > > > +   struct iommufd_hwpt_nested *hwpt_nested;
> > > > +   struct iommufd_hw_pagetable *hwpt;
> > > > +   int rc;
> > > > +
> > > > +   if (flags)
> > > > +   return ERR_PTR(-EOPNOTSUPP);
> > > 
> > > This check should be removed.
> > > 
> > > When a user page fault is required, IOMMU_HWPT_FAULT_ID_VALID is set.
> > > if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
> > 
> > It can't just be removed..
> > 
> > I suspect that IOMMU_HWPT_FAULT_ID_VALID should not be set on the
> > nested domain but on the parent?
> 
> By giving another look,
> 
> In iommufd_hwpt_paging_alloc(), we reject IOMMU_HWPT_FAULT_ID_VALID:
>   const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
>   IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>   ...
>   if (flags & ~valid_flags)
>   return ERR_PTR(-EOPNOTSUPP);
> 
> In iommufd_hwpt_nested_alloc(), we mask the flag away:
>   if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
>   !user_data->len || !ops->domain_alloc_user)
>   return ERR_PTR(-EOPNOTSUPP);
>   ...
>   hwpt->domain = ops->domain_alloc_user(idev->dev,
> flags & 
> ~IOMMU_HWPT_FAULT_ID_VALID,
> parent->common.domain, user_data);
> 
> Then, in the common function it has a section of
>   if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
>   ...
> 
> It seems that this IOMMU_HWPT_FAULT_ID_VALID is for nested domains?

OK, but ARM should be blocking it since it doesn't work there.

I think we made some error here, it should have been passed in flags
to the drivers and only intel should have accepted it.

This suggests we should send flags down the viommu alloc domain path too.

Jason



Re: [PATCH v2] selftests: tc-testing: Fix typo error

2024-10-29 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Tue, 22 Oct 2024 18:30:52 + you wrote:
> Correct the typo errors in json files
> 
> - "diffferent" is corrected to "different".
> - "muliple" and "miltiple" is corrected to "multiple".
> 
> Reviewed-by: Simon Horman 
> Reviewed-by: Shuah Khan 
> Signed-off-by: Karan Sanghavi 
> 
> [...]

Here is the summary with links:
  - [v2] selftests: tc-testing: Fix typo error
https://git.kernel.org/netdev/net-next/c/9a1036389fa2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





Re: [PATCH v5 05/13] iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:50:34PM -0700, Nicolin Chen wrote:
> @@ -497,17 +497,35 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
>   goto out;
>   }
>  
> - hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id);
> - if (IS_ERR(hwpt)) {
> - rc = PTR_ERR(hwpt);
> + pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY);
> + if (IS_ERR(pt_obj)) {
> + rc = PTR_ERR(pt_obj);
>   goto out;
>   }
> + if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) {
> + struct iommufd_hw_pagetable *hwpt =
> + container_of(pt_obj, struct iommufd_hw_pagetable, obj);
> +
> + rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
> +   &data_array);
> + } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
> + struct iommufd_viommu *viommu =
> + container_of(pt_obj, struct iommufd_viommu, obj);
> +
> + if (!viommu->ops || !viommu->ops->cache_invalidate) {
> + rc = -EOPNOTSUPP;
> + goto out_put_pt;
> + }
> + rc = viommu->ops->cache_invalidate(viommu, &data_array);
> + } else {
> + rc = -EINVAL;
> + goto out_put_pt;
> + }

Given the test in iommufd_viommu_alloc_hwpt_nested() is:

if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
 (!viommu->ops->cache_invalidate &&
  !hwpt->domain->ops->cache_invalidate_user)))
  {

We will crash if the user passes a viommu allocated domain as
IOMMUFD_OBJ_HWPT_NESTED since the above doesn't check it.

I suggest we put the required if (ops..) -EOPNOTSUPP above and remove
the ops->cache_invalidate checks from both WARN_ONs.

Jason



Re: [PATCH v5 04/13] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-29 Thread Jason Gunthorpe
On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
> > +void iommufd_viommu_destroy(struct iommufd_object *obj)
> > +{
> > +   struct iommufd_viommu *viommu =
> > +   container_of(obj, struct iommufd_viommu, obj);
> > +
> > +   if (viommu->ops && viommu->ops->free)
> > +   viommu->ops->free(viommu);
> 
> Ops can't be null and free can't be null, that would mean there is a
> memory leak.

Actually, it is just named wrong, it should be called destroy like
this op, it doesn't free any memory..

Jason



Re: [PATCH v5 01/13] iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl

2024-10-29 Thread Nicolin Chen
On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 5fd3dd420290..e50113305a9c 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> >   */
> >  void iommufd_device_unbind(struct iommufd_device *idev)
> >  {
> > +   u32 vdev_id = 0;
> > +
> > +   /* idev->vdev object should be destroyed prior, yet just in case.. */
> > +   mutex_lock(&idev->igroup->lock);
> > +   if (idev->vdev)
> 
> Then should it have a WARN_ON here?

It'd be a user space mistake that forgot to call the destroy ioctl
to the object, in which case I recall kernel shouldn't WARN_ON?

> > +   vdev_id = idev->vdev->obj.id;
> > +   mutex_unlock(&idev->igroup->lock);
> > +   /* Relying on xa_lock against a race with iommufd_destroy() */
> > +   if (vdev_id)
> > +   iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
> 
> That doesn't seem right, iommufd_object_remove() should never be used
> to destroy an object that userspace created with an IOCTL, in fact
> that just isn't allowed.

It was for our auto destroy feature. If user space forgot to destroy
the object while trying to unplug the device from VM. This saves the
day.

> Ugh, there is worse here, we can't hold a long term reference on a
> kernel owned object:
> 
>   idev->vdev = vdev;
>   refcount_inc(&idev->obj.users);
> 
> As it prevents the kernel from disconnecting it.

Hmm, mind elaborating? I think the iommufd_fops_release() would
xa_for_each the object list that destroys the vdev object first
then this idev (and viommu too)?

> I came up with this that seems like it will work. Maybe we will need
> to improve it later. Instead of using the idev, just keep the raw
> struct device. We can hold a refcount on the struct device without
> races. There is no need for the idev igroup lock since the xa_lock
> does everything we need.

OK. If user space forgot to destroy its vdev while unplugging the
device, it would not be allowed to hotplug another device (or the
same device) back to the same slot having the same RID, since the
RID on the vIOMMU would be occupied by the undestroyed vdev.

If we decide to do so, I think we should highlight this somewhere
in the doc.

Thanks
Nicolin



Re: [PATCH v5 01/13] iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> +/**
> + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
> + * @size: sizeof(struct iommu_vdevice_alloc)
> + * @viommu_id: vIOMMU ID to associate with the virtual device
> + * @dev_id: The physical device to allocate a virtual instance on the vIOMMU
> + * @__reserved: Must be 0
> + * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID
> + *   of AMD IOMMU, and vID of a nested Intel VT-d to a Context Table.
> + * @out_vdevice_id: Output virtual instance ID for the allocated object

How about:

@out_vdevice_id: Object handle for the vDevice. Pass to IOMMU_DESTROY


> + * Allocate a virtual device instance (for a physical device) against a 
> vIOMMU.
> + * This instance holds the device's information (related to its vIOMMU) in a 
> VM.
> + */
> +struct iommu_vdevice_alloc {
> + __u32 size;
> + __u32 viommu_id;
> + __u32 dev_id;
> + __u32 __reserved;
> + __aligned_u64 virt_id;
> + __u32 out_vdevice_id;
> + __u32 __reserved2;

Let's not have two u32 reserved, put the out_vdevice_id above virt_id

> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 5fd3dd420290..e50113305a9c 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
>   */
>  void iommufd_device_unbind(struct iommufd_device *idev)
>  {
> + u32 vdev_id = 0;
> +
> + /* idev->vdev object should be destroyed prior, yet just in case.. */
> + mutex_lock(&idev->igroup->lock);
> + if (idev->vdev)

Then should it have a WARN_ON here?

> + vdev_id = idev->vdev->obj.id;
> + mutex_unlock(&idev->igroup->lock);
> + /* Relying on xa_lock against a race with iommufd_destroy() */
> + if (vdev_id)
> + iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);

That doesn't seem right, iommufd_object_remove() should never be used
to destroy an object that userspace created with an IOCTL, in fact
that just isn't allowed.

Ugh, there is worse here, we can't hold a long term reference on a
kernel owned object:

idev->vdev = vdev;
refcount_inc(&idev->obj.users);

As it prevents the kernel from disconnecting it.

I came up with this that seems like it will work. Maybe we will need
to improve it later. Instead of using the idev, just keep the raw
struct device. We can hold a refcount on the struct device without
races. There is no need for the idev igroup lock since the xa_lock
does everything we need.

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e50113305a9c47..5fd3dd42029015 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -277,17 +277,6 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
  */
 void iommufd_device_unbind(struct iommufd_device *idev)
 {
-   u32 vdev_id = 0;
-
-   /* idev->vdev object should be destroyed prior, yet just in case.. */
-   mutex_lock(&idev->igroup->lock);
-   if (idev->vdev)
-   vdev_id = idev->vdev->obj.id;
-   mutex_unlock(&idev->igroup->lock);
-   /* Relying on xa_lock against a race with iommufd_destroy() */
-   if (vdev_id)
-   iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
-
iommufd_object_destroy_user(idev->ictx, &idev->obj);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 9849474f429f98..6e870bce2a0cd0 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -46,6 +46,6 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
lockdep_assert_held(&viommu->vdevs.xa_lock);
 
vdev = xa_load(&viommu->vdevs, vdev_id);
-   return vdev ? vdev->idev->dev : NULL;
+   return vdev ? vdev->dev : NULL;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, IOMMUFD);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 365cf5a56cdf20..275f954235940c 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -152,9 +152,6 @@ static inline void iommufd_put_object(struct iommufd_ctx 
*ictx,
wake_up_interruptible_all(&ictx->destroy_wait);
 }
 
-int iommufd_verify_unfinalized_object(struct iommufd_ctx *ictx,
- struct iommufd_object *to_verify);
-
 void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj);
 void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
  struct iommufd_object *obj);
@@ -391,7 +388,6 @@ struct iommufd_device {
struct iommufd_object obj;
struct iommufd_ctx *ictx;
struct iommufd_group *igroup;
-   struct iommufd_vdevice *vdev;
struct list_head group_item;
/*

Re: [PATCH v5 02/13] iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:50:31PM -0700, Nicolin Chen wrote:
> Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage of
> IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc TEST_F
> to cover the IOMMU_VDEVICE_ALLOC ioctl.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  tools/testing/selftests/iommu/iommufd_utils.h | 27 +++
>  tools/testing/selftests/iommu/iommufd.c   | 20 ++
>  .../selftests/iommu/iommufd_fail_nth.c|  4 +++
>  3 files changed, 51 insertions(+)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [PATCH v5 04/13] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-29 Thread Jason Gunthorpe
On Tue, Oct 29, 2024 at 08:46:40AM -0700, Nicolin Chen wrote:
> On Tue, Oct 29, 2024 at 12:36:24PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
> > > > +void iommufd_viommu_destroy(struct iommufd_object *obj)
> > > > +{
> > > > +   struct iommufd_viommu *viommu =
> > > > +   container_of(obj, struct iommufd_viommu, obj);
> > > > +
> > > > +   if (viommu->ops && viommu->ops->free)
> > > > +   viommu->ops->free(viommu);
> > > 
> > > Ops can't be null and free can't be null, that would mean there is a
> > > memory leak.
> > 
> > Actually, it is just named wrong, it should be called destroy like
> > this op, it doesn't free any memory..
> 
> Well, it frees if driver allocated something in its top structure.
> Yet, "destroy" does sound less confusing. Let's rename it, assuming
> it can still remain to be optional as we have here.

Yes, optional is right, I was confused by the name. The core code uses
destroy so I'd stick with that.

Jason



Re: [PATCH v5 05/13] iommufd: Add alloc_domain_nested op to iommufd_viommu_ops

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:49:45PM -0700, Nicolin Chen wrote:
> Allow IOMMU driver to use a vIOMMU object that holds a nesting parent
> hwpt/domain to allocate a nested domain.
> 
> Suggested-by: Jason Gunthorpe 
> Reviewed-by: Kevin Tian 
> Signed-off-by: Nicolin Chen 
> ---
>  include/linux/iommufd.h | 9 +
>  1 file changed, 9 insertions(+)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [PATCH v5 04/13] iommufd/hw_pagetable: Enforce invalidation op on vIOMMU-based hwpt_nested

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:50:33PM -0700, Nicolin Chen wrote:
> A vIOMMU-based hwpt_nested requires a cache invalidation op too, either
> using the one in iommu_domain_ops or the one in viommu_ops. Enforce that
> upon the allocated hwpt_nested.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/iommufd/hw_pagetable.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [PATCH v5 09/13] iommufd/selftest: Add refcount to mock_iommu_device

2024-10-29 Thread Nicolin Chen
On Tue, Oct 29, 2024 at 12:34:38PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 04:49:49PM -0700, Nicolin Chen wrote:
> > For an iommu_dev that can unplug (so far only this selftest does so), the
> > viommu->iommu_dev pointer has no guarantee of its life cycle after it is
> > copied from the idev->dev->iommu->iommu_dev.
> > 
> > Track the user count of the iommu_dev. Postpone the exit routine using a
> > completion, if refcount is unbalanced. The refcount inc/dec will be added
> > in the following patch.
> > 
> > Signed-off-by: Nicolin Chen 
> > ---
> >  drivers/iommu/iommufd/selftest.c | 32 
> >  1 file changed, 24 insertions(+), 8 deletions(-)
> 
> Reviewed-by: Jason Gunthorpe 
> 
> Since this is built into the iommufd module it can't be unloaded
> without also unloading iommufd, which is impossible as long as any
> iommufd FDs are open. So I expect that the WARN_ON can never happen.

Hmm, I assume we still need this patch then?

Could a faulty "--force" possibly trigger it?

Nicolin



Re: [PATCH v5 03/13] iommufd: Add iommufd_verify_unfinalized_object

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:49:43PM -0700, Nicolin Chen wrote:
> To support driver-allocated vIOMMU objects, it's suggested to call the
> allocator helper in IOMMU drivers. However, there is no guarantee that
> drivers will all use it and allocate objects properly.
> 
> Add a helper for iommufd core to verify if an unfinalized object is at
> least reserved in the ictx.

I don't think we need this..

iommufd_object_finalize() already does:

old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL);
/* obj->id was returned from xa_alloc() so the xa_store() cannot fail */
WARN_ON(old);

So, we could enhance it to be more robust, like this patch is doing:

void iommufd_object_finalize(struct iommufd_ctx *ictx,
 struct iommufd_object *obj)
{
void *old;

old = xa_cmpxchg(&ictx->objects, obj->id, XA_ZERO_ENTRY, obj,
 GFP_KERNEL);
/* obj->id was returned from xa_alloc() so the xa_store() cannot fail */
WARN_ON(old != XA_ZERO_ENTRY);

It is always a driver bug to not initialize that object, so WARN_ON is
OK.

Jason



Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()

2024-10-29 Thread Paul E. McKenney
On Tue, Oct 29, 2024 at 02:20:51AM +, Cheng-Jui Wang (王正睿) wrote:
> On Mon, 2024-10-28 at 17:22 -0700, Paul E. McKenney wrote:
> > The result is that the current leaf rcu_node structure's ->lock is
> > acquired only if a stack backtrace might be needed from the current CPU,
> > and is held across only that CPU's backtrace. As a result, if there are
> 
> After upgrading our device to kernel-6.11, we encountered a lockup
> scenario under a stall warning.
> I had prepared a patch to submit, but I noticed that this series has
> already addressed some issues, though it hasn't been merged into the
> mainline yet. So, I decided to reply to this series for discussion on
> how to fix it before pushing. Here is the lockup scenario we
> encountered:
> 
> Devices: arm64 with only 8 cores
> One CPU holds rnp->lock in rcu_dump_cpu_stack() while trying to dump
> other CPUs, but it waits for the corresponding CPU to dump backtrace,
> with a 10-second timeout.
> 
>__delay()
>__const_udelay()
>nmi_trigger_cpumask_backtrace()
>arch_trigger_cpumask_backtrace()
>trigger_single_cpu_backtrace()
>dump_cpu_task()
>rcu_dump_cpu_stacks()  <- holding rnp->lock
>print_other_cpu_stall()
>check_cpu_stall()
>rcu_pending()
>rcu_sched_clock_irq()
>update_process_times()
> 
> However, the other 7 CPUs are waiting for rnp->lock on the path to
> report qs.
> 
>queued_spin_lock_slowpath()
>queued_spin_lock()
>do_raw_spin_lock()
>__raw_spin_lock_irqsave()
>_raw_spin_lock_irqsave()
>rcu_report_qs_rdp()
>rcu_check_quiescent_state()
>rcu_core()
>rcu_core_si()
>handle_softirqs()
>__do_softirq()
>do_softirq()
>call_on_irq_stack()
> 
> Since the arm64 architecture uses IPI instead of true NMI to implement
> arch_trigger_cpumask_backtrace(), spin_lock_irqsave disables
> interrupts, which is enough to block this IPI request.
> Therefore, if other CPUs start waiting for the lock before receiving
> the IPI, a semi-deadlock scenario like the following occurs:
> 
> CPU0CPU1CPU2
> -   -   -
> lock_irqsave(rnp->lock)
> lock_irqsave(rnp->lock)
> 
> 
> 
> lock_irqsave(rnp->lock)
> 
> 
> 
> ...
> 
> 
> In our scenario, with 7 CPUs to dump, the lockup takes nearly 70
> seconds, causing subsequent useful logs to be unable to print, leading
> to a watchdog timeout and system reboot.
> 
> This series of changes re-acquires the lock after each dump,
> significantly reducing lock-holding time. However, since it still holds
> the lock while dumping CPU backtrace, there's still a chance for two
> CPUs to wait for each other for 10 seconds, which is still too long.
> So, I would like to ask if it's necessary to dump backtrace within the
> spinlock section?
> If not, especially now that lockless checks are possible, maybe it can
> be changed as follows?
> 
> - if (!(data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, 
> cpu)))
> - continue;
> - raw_spin_lock_irqsave_rcu_node(rnp, flags);
> - if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
> + if (data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, 
> cpu)) {
>   if (cpu_is_offline(cpu))
>   pr_err("Offline CPU %d blocking current 
> GP.\n", cpu);
>   else
>   dump_cpu_task(cpu);
>   }
>   }
> - raw_spin_unlock_irqrestore_rcu_node(rnp,
> flags);
> 
> Or should this be considered an arm64 issue, and they should switch to
> true NMI, otherwise, they shouldn't use
> nmi_trigger_cpumask_backtrace()?

Thank you for looking into this!

We do assume that nmi_trigger_cpumask_backtrace() uses true NMIs, so,
yes, nmi_trigger_cpumask_backtrace() should use true NMIs, just like
the name says.  ;-)

Alternatively, arm64 could continue using nmi_trigger_cpumask_backtrace()
with normal interrupts (for example, on SoCs not implementing true NMIs),
but have a short timeout (maybe a few jiffies?) after which it returns
false (and presumably also cancels the backtrace request so that when
the non-NMI interrupt eventually does happen, its handler simply returns
without backtracing).  This should be implemented using atomics to avoid
deadlock issues.  This alternative approach would provide accurate arm64
backtraces in the common case where interrupts are enabled, but allow
a graceful fallback to remote tracing otherwise.

Would you be interested in working on this issue, whatever solution the
arm64 maintainers end up preferring?

Thanx, Paul



Re: [PATCH v5 01/13] iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl

2024-10-29 Thread Jason Gunthorpe
On Tue, Oct 29, 2024 at 10:29:56AM -0700, Nicolin Chen wrote:
> On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > index 5fd3dd420290..e50113305a9c 100644
> > > --- a/drivers/iommu/iommufd/device.c
> > > +++ b/drivers/iommu/iommufd/device.c
> > > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > >   */
> > >  void iommufd_device_unbind(struct iommufd_device *idev)
> > >  {
> > > + u32 vdev_id = 0;
> > > +
> > > + /* idev->vdev object should be destroyed prior, yet just in case.. */
> > > + mutex_lock(&idev->igroup->lock);
> > > + if (idev->vdev)
> > 
> > Then should it have a WARN_ON here?
> 
> It'd be a user space mistake that forgot to call the destroy ioctl
> to the object, in which case I recall kernel shouldn't WARN_ON?

But you can't get here because:

refcount_inc(&idev->obj.users);

And kernel doesn't destroy objects with elevated ref counts?


> > > + vdev_id = idev->vdev->obj.id;
> > > + mutex_unlock(&idev->igroup->lock);
> > > + /* Relying on xa_lock against a race with iommufd_destroy() */
> > > + if (vdev_id)
> > > + iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
> > 
> > That doesn't seem right, iommufd_object_remove() should never be used
> > to destroy an object that userspace created with an IOCTL, in fact
> > that just isn't allowed.
> 
> It was for our auto destroy feature. 

auto domains are "hidden" hwpts that are kernel managed. They are not
"userspace created".

"Userspace created" objects are ones that userspace is expected to call
destroy on.

If you destroy them behind the scenes in the kernel then the object ID
can be reallocated for something else, and when userspace does DESTROY
on the ID it thought was still allocated it will malfunction.

So, only userspace can destroy objects that userspace created.

> If user space forgot to destroy the object while trying to unplug
> the device from VM. This saves the day.

No, it should/does fail destroy of the VIOMMU object because the users
refcount is elevated.


> > Ugh, there is worse here, we can't hold a long term reference on a
> > kernel owned object:
> > 
> > idev->vdev = vdev;
> > refcount_inc(&idev->obj.users);
> > 
> > As it prevents the kernel from disconnecting it.
> 
> Hmm, mind elaborating? I think the iommufd_fops_release() would
> xa_for_each the object list that destroys the vdev object first
> then this idev (and viommu too)?

iommufd_device_unbind() can't fail, and if the object can't be
destroyed because it has an elevated long term refcount it WARN's:


ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);

/*
 * If there is a bug and we couldn't destroy the object then we did put
 * back the caller's users refcount and will eventually try to free it
 * again during close.
 */
WARN_ON(ret);

So you cannot take long term references on kernel owned objects. Only
userspace owned objects.


> OK. If user space forgot to destroy its vdev while unplugging the
> device, it would not be allowed to hotplug another device (or the
> same device) back to the same slot having the same RID, since the
> RID on the vIOMMU would be occupied by the undestroyed vdev.

Yes, that seems correct and obvious to me. Until the vdev is
explicitly destroyed the ID is in-use.

Good userspace should destroy the iommufd vDEVICE object before
closing the VFIO file descriptor.

If it doesn't, then the VDEVICE object remains even though the VFIO it
was linked to is gone.

Jason



Re: [PATCH v5 03/13] iommufd: Add iommufd_verify_unfinalized_object

2024-10-29 Thread Nicolin Chen
On Tue, Oct 29, 2024 at 03:55:58PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 29, 2024 at 09:18:05AM -0700, Nicolin Chen wrote:
> > On Tue, Oct 29, 2024 at 11:49:07AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 25, 2024 at 04:49:43PM -0700, Nicolin Chen wrote:
> > > > To support driver-allocated vIOMMU objects, it's suggested to call the
> > > > allocator helper in IOMMU drivers. However, there is no guarantee that
> > > > drivers will all use it and allocate objects properly.
> > > > 
> > > > Add a helper for iommufd core to verify if an unfinalized object is at
> > > > least reserved in the ictx.
> > > 
> > > I don't think we need this..
> > > 
> > > iommufd_object_finalize() already does:
> > > 
> > >   old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL);
> > >   /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */
> > >   WARN_ON(old);
> > 
> > It feels unsafe to carry on the iommufd_viommu_alloc_ioctl() until
> > iommufd_object_finalize() as the function would touch the returned
> > faulty viommu pointer? E.g. what if the viommu has an even smaller
> > size than struct iommufd_viommu?
> 
> This is Linux; just because the output came from a driver doesn't mean
> we have to validate it somehow. It is reasonable to be helpful and
> detect driver bugs, but if the driver is buggy it is still OK to
> crash.
> 
> So you don't *have* to check any of this, if the driver didn't use the
> right function to allocate the memory then it will go bad pretty fast.
> 
> Improving the xa_store() is something that will detect more kinds of
> bugs everywhere, so it seems more worthwhile.

I see. Thanks!

Nicolin



[PATCH v4 0/2] vsock/test: fix wrong setsockopt() parameters

2024-10-29 Thread Konstantin Shkolnyy
Parameters were created using wrong C types, which caused them to be of
the wrong size on some architectures, causing problems.

The problem with SO_RCVLOWAT was found on s390 (big endian), while x86-64
didn't show it. After the fix, all tests pass on s390.
Then Stefano Garzarella pointed out that SO_VM_SOCKETS_* calls might have
a similar problem, which turned out to be true, hence, the second patch.

Changes for v4:
- add "Reviewed-by:" to the first patch, and add a second patch fixing
SO_VM_SOCKETS_* calls, which depends on the first one (hence, it's now
a patch series.)
Changes for v3:
- fix the same problem in vsock_perf and update commit message
Changes for v2:
- add "Fixes:" lines to the commit message

Konstantin Shkolnyy (2):
  vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
  vsock/test: fix parameter types in SO_VM_SOCKETS_* calls

 tools/testing/vsock/vsock_perf.c |  8 
 tools/testing/vsock/vsock_test.c | 23 ---
 2 files changed, 20 insertions(+), 11 deletions(-)

-- 
2.34.1




Re: [PATCH v5 04/13] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
> +void iommufd_viommu_destroy(struct iommufd_object *obj)
> +{
> + struct iommufd_viommu *viommu =
> + container_of(obj, struct iommufd_viommu, obj);
> +
> + if (viommu->ops && viommu->ops->free)
> + viommu->ops->free(viommu);

Ops can't be null and free can't be null, that would mean there is a
memory leak.

> + refcount_dec(&viommu->hwpt->common.obj.users);

Don't touch viommu after freeing it

Did you run selftests with kasn?

> +int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
> +{
> + struct iommu_viommu_alloc *cmd = ucmd->cmd;
> + struct iommufd_hwpt_paging *hwpt_paging;
> + struct iommufd_viommu *viommu;
> + struct iommufd_device *idev;
> + const struct iommu_ops *ops;
> + int rc;
> +
> + if (cmd->flags || cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT)
> + return -EOPNOTSUPP;
> +
> + idev = iommufd_get_device(ucmd, cmd->dev_id);
> + if (IS_ERR(idev))
> + return PTR_ERR(idev);
> +
> + ops = dev_iommu_ops(idev->dev);
> + if (!ops->viommu_alloc) {
> + rc = -EOPNOTSUPP;
> + goto out_put_idev;
> + }
> +
> + hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
> + if (IS_ERR(hwpt_paging)) {
> + rc = PTR_ERR(hwpt_paging);
> + goto out_put_idev;
> + }
> +
> + if (!hwpt_paging->nest_parent) {
> + rc = -EINVAL;
> + goto out_put_hwpt;
> + }
> +
> + viommu = ops->viommu_alloc(idev->dev, hwpt_paging->common.domain,
> +ucmd->ictx, cmd->type);
> + if (IS_ERR(viommu)) {
> + rc = PTR_ERR(viommu);
> + goto out_put_hwpt;
> + }

Check that ops and ops->free are valid here with a WARN_ON

> + rc = iommufd_verify_unfinalized_object(ucmd->ictx, &viommu->obj);
> + if (rc) {
> + kfree(viommu);
> + goto out_put_hwpt;
> + }
> +
> + viommu->type = cmd->type;
> + viommu->ictx = ucmd->ictx;
> + viommu->hwpt = hwpt_paging;
> + /*
> +  * It is the most likely case that a physical IOMMU is unpluggable. A
> +  * pluggable IOMMU instance (if exists) is responsible for refcounting
> +  * on its own.
> +  */
> + viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);
> +
> + refcount_inc(&viommu->hwpt->common.obj.users);

Put this line right after the one storing to viommu_>hwpt

Jason



Re: [PATCH] rpmsg_ns: Work around TI non-standard message

2024-10-29 Thread Romain Naour
Hello Richard, All,

Le 15/10/2024 à 20:00, Richard Weinberger a écrit :
> Am Dienstag, 15. Oktober 2024, 19:56:08 CEST schrieb Mathieu Poirier:
 In my opinion the real fix here is to get TI to use the standard message
 announcement structure.  The ->desc field doesn't seem to be that useful 
 since
it gets discarded.
>>>
>>> This is for the future, the goal of my patch is helping people to
>>> get existing DSP programs work with mainline.
>>> Not everyone can or want to rebuild theirs DSP programs when moving to a 
>>> mainline
>>> kernel.
>>
>> That's an even better argument to adopt the standard structure as soon as
>> possible.  Modifying the mainline kernel to adapt to vendors' quirks doesn't
>> scale.  
> 
> Well, I can't speak for TI.
> But I have little hope.

I'm also using an AM57xx SoC with DSP firmware and I had the same issue while
updating the kernel from 5.10 to a newer version.

The remoteproc rpmsg description field change [1] is required by the DSP
firmware based on TI-RTOS that is loaded by remoteproc using the new (as of
2013) but still experimental RPMSG_NS_2_0 [2].

RPMSG_NS_2_0 broke compatibility with existing rpmsg structs [3] (defined in
the upstream kernel) for all devices except OMAP5 and newer SoCs (newer 64-bit
SoCs DRA8 and Jacinto don't need this change thanks to the new IPC-lld
implementation).

This rpmsg description field was added to ipcdev a long time ago (14/03/2013)
and has required this kernel vendor change since then (all DSP firmware
generated by the AM57xx TI SDK needs it).

Note:
RPMSG_NS_2_0 is not the only vendor change you need to rebase onto newer
kernels...

DSP firmware generated by the TI SDK uses the vendor MessageQ API [4], which
requires AF_RPMSG socket support [5] in the kernel.
This driver was never upstreamed since IPC 3.x is deprecated nowadays and
replaced by IPC-lld on newer SoCs (IPC-lld uses the upstream generic
rpmsg-char driver).

All these patches were moved out of ti-linux-kernel (since 6.1) to a
meta-tisdk Yocto layer that is used only for the am57xx vendor kernel. See the
latest SDK release for AM57xx [7].

So, DSP firmware on AM57xx will require these two vendor patches, since that
is how the TI IPC stack was designed more than 10 years ago.

It's nice to see newer TI SoCs able to use the upstream kernel with the
standard structure.

[1]
https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/commit/?h=ti-rt-linux-5.10.y&id=7e3ea0d62a4bf0ca04be9bc320d13f564aab0a92
[2]
https://git.ti.com/cgit/ipc/ipcdev/commit/?id=e8a33844cd2faa404e457d13f9d54478ec8129e7
[3]
https://git.ti.com/cgit/ipc/ipcdev/commit/?id=1264ed112ef8c0eed6ff30503b14f81b8ff11dd7
[4]
http://downloads.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/ipc/latest/docs/doxygen/html/_message_q_8h.html
[5]
https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/commit/?h=ti-rt-linux-5.10.y&id=f4b978a978c38149f712ddd137f12ed5fb914161
[6]
https://git.ti.com/cgit/ti-sdk-linux/meta-tisdk/commit/?h=am57x-9.x&id=25e56a0615fb8e973e516b5a225ee81f655f98db
[7] https://www.ti.com/tool/download/PROCESSOR-SDK-LINUX-AM57X/09.02.00.133

Best regards,
Romain


> 
> Thanks,
> //richard
> 




Re: [PATCH 1/3] remoteproc: k3-r5: Use IO memset to clear TCMs

2024-10-29 Thread Mathieu Poirier
I have applied all 3 patches in this set.

Thanks,
Mathieu

On Mon, Oct 21, 2024 at 03:45:55PM -0500, Andrew Davis wrote:
> While it should be safe to use normal memset() on these memories as they
> are mapped as Normal Non-Cached, using the memset_io() provides stronger
> guarantees on access alignment and fixes a sparse check warning. Switch
> to memset_io() here.
> 
> Signed-off-by: Andrew Davis 
> ---
>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> index 2f996a962f557..e1fe85e5eba6a 100644
> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> @@ -487,10 +487,10 @@ static int k3_r5_rproc_prepare(struct rproc *rproc)
>* can be effective on all TCM addresses.
>*/
>   dev_dbg(dev, "zeroing out ATCM memory\n");
> - memset(core->mem[0].cpu_addr, 0x00, core->mem[0].size);
> + memset_io(core->mem[0].cpu_addr, 0x00, core->mem[0].size);
>  
>   dev_dbg(dev, "zeroing out BTCM memory\n");
> - memset(core->mem[1].cpu_addr, 0x00, core->mem[1].size);
> + memset_io(core->mem[1].cpu_addr, 0x00, core->mem[1].size);
>  
>   return 0;
>  }
> -- 
> 2.39.2
> 



Re: [PATCH v5 04/13] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-29 Thread Nicolin Chen
On Tue, Oct 29, 2024 at 12:36:24PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
> > > +void iommufd_viommu_destroy(struct iommufd_object *obj)
> > > +{
> > > + struct iommufd_viommu *viommu =
> > > + container_of(obj, struct iommufd_viommu, obj);
> > > +
> > > + if (viommu->ops && viommu->ops->free)
> > > + viommu->ops->free(viommu);
> > 
> > Ops can't be null and free can't be null, that would mean there is a
> > memory leak.
> 
> Actually, it is just named wrong, it should be called destroy like
> this op, it doesn't free any memory..

Well, it frees if driver allocated something in its top structure.
Yet, "destroy" does sound less confusing. Let's rename it, assuming
it can still remain to be optional as we have here.

Thanks
Nicolin



Re: [PATCH 2/2] kselftest/arm64: Lower poll interval while waiting for fp-stress children

2024-10-29 Thread Mark Rutland
Nit: the title says we lower the poll interval, while we actually raise
it. Maybe that'd be clearer as:

kselftest/arm64: Raise poll timeout while waiting for fp-stress children

... or:

kselftest/arm64: Poll less frequently while waiting for fp-stress 
children

That aside, this looks fine.

Mark.

On Tue, Oct 29, 2024 at 12:10:40AM +, Mark Brown wrote:
> While fp-stress is waiting for children to start it doesn't send any
> signals to them so there is no need for it to have as short an epoll()
> timeout as it does when the children are all running. We do still want to
> have some timeout so that we can log diagnostics about missing children but
> this can be relatively large. On emulated platforms the overhead of running
> the supervisor process is quite high, especially during the process of
> execing the test binaries.
> 
> Implement a longer epoll() timeout during the setup phase, using a 5s
> timeout while waiting for children and switching  to the signal raise
> interval when all the children are started and we start sending signals.
> 
> Signed-off-by: Mark Brown 
> ---
>  tools/testing/selftests/arm64/fp/fp-stress.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
> index c986c68fbcacdd295f4db57277075209193cb943..963e2d891ced72fb8d6eff4fdb5c7df0724b14f1 100644
> --- a/tools/testing/selftests/arm64/fp/fp-stress.c
> +++ b/tools/testing/selftests/arm64/fp/fp-stress.c
> @@ -453,6 +453,7 @@ int main(int argc, char **argv)
>  {
>   int ret;
>   int timeout = 10 * (1000 / SIGNAL_INTERVAL_MS);
> + int poll_interval = 5000;
>   int cpus, i, j, c;
>   int sve_vl_count, sme_vl_count;
>   bool all_children_started = false;
> @@ -588,7 +589,7 @@ int main(int argc, char **argv)
>* especially useful in emulation where we will both
>* be slow and likely to have a large set of VLs.
>*/
> - ret = epoll_wait(epoll_fd, evs, tests, SIGNAL_INTERVAL_MS);
> + ret = epoll_wait(epoll_fd, evs, tests, poll_interval);
>   if (ret < 0) {
>   if (errno == EINTR)
>   continue;
> @@ -626,6 +627,7 @@ int main(int argc, char **argv)
>   }
>  
>   all_children_started = true;
> + poll_interval = SIGNAL_INTERVAL_MS;
>   }
>  
>   if ((timeout % LOG_INTERVALS) == 0)
> 
> -- 
> 2.39.2
> 



Re: [PATCH v5 10/13] iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:49:50PM -0700, Nicolin Chen wrote:
> Implement the viommu alloc/free functions to increase/reduce refcount of
> its dependent mock iommu device. User space can verify this loop via the
> IOMMU_VIOMMU_TYPE_SELFTEST.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/iommufd/iommufd_test.h |  2 +
>  drivers/iommu/iommufd/selftest.c | 64 
>  2 files changed, 66 insertions(+)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [PATCH v2] kunit: Introduce autorun option

2024-10-29 Thread Rae Moar
On Mon, Oct 28, 2024 at 5:54 PM Stanislav Kinsburskii
 wrote:
>
> The new option controls whether tests run on boot or module load. With the
> new debugfs "run" dentry allowing tests to be run on demand, the ability to
> disable automatic test runs becomes useful in the case of intrusive tests.
>
> The option is set to true by default to preserve the existing behavior. It
> can be overridden by either the corresponding module option or by the
> corresponding config build option.
>
> Signed-off-by: Stanislav Kinsburskii 

Hello,

This looks good to me!

Reviewed-by: Rae Moar 

Thanks!
-Rae

> ---
>  include/kunit/test.h |4 +++-
>  lib/kunit/Kconfig|   12 
>  lib/kunit/debugfs.c  |2 +-
>  lib/kunit/executor.c |   21 +++--
>  lib/kunit/test.c |6 --
>  5 files changed, 39 insertions(+), 6 deletions(-)
>
> diff --git a/include/kunit/test.h b/include/kunit/test.h
> index 34b71e42fb10..58dbab60f853 100644
> --- a/include/kunit/test.h
> +++ b/include/kunit/test.h
> @@ -312,6 +312,7 @@ static inline void kunit_set_failure(struct kunit *test)
>  }
>
>  bool kunit_enabled(void);
> +bool kunit_autorun(void);
>  const char *kunit_action(void);
>  const char *kunit_filter_glob(void);
>  char *kunit_filter(void);
> @@ -334,7 +335,8 @@ kunit_filter_suites(const struct kunit_suite_set 
> *suite_set,
> int *err);
>  void kunit_free_suite_set(struct kunit_suite_set suite_set);
>
> -int __kunit_test_suites_init(struct kunit_suite * const * const suites, int 
> num_suites);
> +int __kunit_test_suites_init(struct kunit_suite * const * const suites, int 
> num_suites,
> +bool run_tests);
>
>  void __kunit_test_suites_exit(struct kunit_suite **suites, int num_suites);
>
> diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
> index 34d7242d526d..a97897edd964 100644
> --- a/lib/kunit/Kconfig
> +++ b/lib/kunit/Kconfig
> @@ -81,4 +81,16 @@ config KUNIT_DEFAULT_ENABLED
>   In most cases this should be left as Y. Only if additional opt-in
>   behavior is needed should this be set to N.
>
> +config KUNIT_AUTORUN_ENABLED
> +   bool "Default value of kunit.autorun"
> +   default y
> +   help
> + Sets the default value of kunit.autorun. If set to N then KUnit
> + tests will not run after initialization unless kunit.autorun=1 is
> + passed to the kernel command line. The test can still be run 
> manually
> + via debugfs interface.
> +
> + In most cases this should be left as Y. Only if additional opt-in
> + behavior is needed should this be set to N.
> +
>  endif # KUNIT
> diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c
> index d548750a325a..9df064f40d98 100644
> --- a/lib/kunit/debugfs.c
> +++ b/lib/kunit/debugfs.c
> @@ -145,7 +145,7 @@ static ssize_t debugfs_run(struct file *file,
> struct inode *f_inode = file->f_inode;
> struct kunit_suite *suite = (struct kunit_suite *) f_inode->i_private;
>
> -   __kunit_test_suites_init(&suite, 1);
> +   __kunit_test_suites_init(&suite, 1, true);
>
> return count;
>  }
> diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
> index 34b7b6833df3..3f39955cb0f1 100644
> --- a/lib/kunit/executor.c
> +++ b/lib/kunit/executor.c
> @@ -29,6 +29,22 @@ const char *kunit_action(void)
> return action_param;
>  }
>
> +/*
> + * Run KUnit tests after initialization
> + */
> +#ifdef CONFIG_KUNIT_AUTORUN_ENABLED
> +static bool autorun_param = true;
> +#else
> +static bool autorun_param;
> +#endif
> +module_param_named(autorun, autorun_param, bool, 0);
> +MODULE_PARM_DESC(autorun, "Run KUnit tests after initialization");
> +
> +bool kunit_autorun(void)
> +{
> +   return autorun_param;
> +}
> +
>  static char *filter_glob_param;
>  static char *filter_param;
>  static char *filter_action_param;
> @@ -260,13 +276,14 @@ kunit_filter_suites(const struct kunit_suite_set 
> *suite_set,
>  void kunit_exec_run_tests(struct kunit_suite_set *suite_set, bool builtin)
>  {
> size_t num_suites = suite_set->end - suite_set->start;
> +   bool autorun = kunit_autorun();
>
> -   if (builtin || num_suites) {
> +   if (autorun && (builtin || num_suites)) {
> pr_info("KTAP version 1\n");
> pr_info("1..%zu\n", num_suites);
> }
>
> -   __kunit_test_suites_init(suite_set->start, num_suites);
> +   __kunit_test_suites_init(suite_set->start, num_suites, autorun);
>  }
>
>  void kunit_exec_list_tests(struct kunit_suite_set *suite_set, bool 
> include_attr)
> diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> index 089c832e3cdb..146d1b48a096 100644
> --- a/lib/kunit/test.c
> +++ b/lib/kunit/test.c
> @@ -708,7 +708,8 @@ bool kunit_enabled(void)
> return enable_param;
>  }
>
> -int __kunit_test_suites_init(struct kunit_suite * const * const suites, int 
> num_suites)
> +int __kunit_test_suites_init(struct kunit_suite * const * const suites, int 
> num_su

Re: [PATCH v5 03/13] iommufd: Add iommufd_verify_unfinalized_object

2024-10-29 Thread Nicolin Chen
On Tue, Oct 29, 2024 at 11:49:07AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 04:49:43PM -0700, Nicolin Chen wrote:
> > To support driver-allocated vIOMMU objects, it's suggested to call the
> > allocator helper in IOMMU drivers. However, there is no guarantee that
> > drivers will all use it and allocate objects properly.
> > 
> > Add a helper for iommufd core to verify if an unfinalized object is at
> > least reserved in the ictx.
> 
> I don't think we need this..
> 
> iommufd_object_finalize() already does:
> 
>   old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL);
>   /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */
>   WARN_ON(old);

It feels unsafe to let iommufd_viommu_alloc_ioctl() carry on until
iommufd_object_finalize(), as the function would touch the returned
faulty viommu pointer. E.g. what if the viommu has an even smaller
size than struct iommufd_viommu?

> So, we could enhance it to be more robust, like this patch is doing:
> 
> void iommufd_object_finalize(struct iommufd_ctx *ictx,
>struct iommufd_object *obj)
> {
>   void *old;
> 
>   old = xa_cmpxchg(&ictx->objects, obj->id, XA_ZERO_ENTRY, obj,
>GFP_KERNEL);
>   /* obj->id was returned from xa_alloc() so the xa_cmpxchg() cannot fail */
>   WARN_ON(old != XA_ZERO_ENTRY);
> 
> It is always a driver bug to not initialize that object, so WARN_ON is
> OK.

I think we'd need the same change in iommufd_object_abort() too.

Thanks
Nicolin



Re: [PATCH v3] selftests: add new kallsyms selftests

2024-10-29 Thread Sami Tolvanen
Hi Luis,

On Mon, Oct 21, 2024 at 12:33 PM Luis Chamberlain  wrote:
>
> diff --git a/lib/tests/module/gen_test_kallsyms.sh b/lib/tests/module/gen_test_kallsyms.sh
> new file mode 100755
> index 000000000000..e85f10dc11bd
> --- /dev/null
> +++ b/lib/tests/module/gen_test_kallsyms.sh
> @@ -0,0 +1,128 @@
[..]
> +gen_template_module_data_b()
> +{
> +   printf "\nextern int auto_test_a_%010d;\n\n" 28
> +   echo "static int auto_runtime_test(void)"
> +   echo "{"
> +   printf "\nreturn auto_test_a_%010d;\n" 28
> +   echo "}"
> +}

FYI, I get a warning when loading test_kallsyms_b because the init
function return value is >0:

# modprobe test_kallsyms_b
[   11.154496] do_init_module: 'test_kallsyms_b'->init suspiciously
returned 255, it should follow 0/-E convention
[   11.154496] do_init_module: loading module anyway...
[   11.156156] CPU: 3 UID: 0 PID: 116 Comm: modprobe Not tainted
6.12.0-rc5-00020-g897cb2ff413d #1
[   11.156832] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[   11.157762] Call Trace:
[   11.158914]  <TASK>
[   11.159253]  dump_stack_lvl+0x3f/0xb0
[   11.160279]  do_init_module+0x1f4/0x200
[   11.160586]  __se_sys_finit_module+0x30c/0x400
[   11.160948]  do_syscall_64+0xd0/0x1a0
[   11.161255]  ? arch_exit_to_user_mode_prepare+0x11/0x60
[   11.161659]  ? irqentry_exit_to_user_mode+0x8e/0xb0
[   11.162052]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   11.162598] RIP: 0033:0x7f5843968cf6
[   11.163076] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1e 9a 02
00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3a fd ff ff c3 48 c7 c6 01 00 00 00
e9 a1
[   11.164465] RSP: 002b:7ffefcc92d68 EFLAGS: 0246 ORIG_RAX:
0139
[   11.165046] RAX: ffda RBX: 0003 RCX: 7f5843968cf6
[   11.165656] RDX:  RSI: 320429e0 RDI: 0003
[   11.166220] RBP: 320429e0 R08: 0074 R09: 
[   11.166804] R10:  R11: 0246 R12: 32042b50
[   11.167378] R13: 0001 R14: 32042c10 R15: 
[   11.168007]  </TASK>
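
One possible fix — sketched here as an assumption, since it presumes that module a initializes auto_test_a_0000000028 to the value 28, which may not match the real generator — keeps the cross-module symbol reference while mapping it onto a 0/-E return:

```shell
# Hypothetical rework of gen_template_module_data_b(): still reference
# the symbol from test_kallsyms_a so the kallsyms lookup is exercised,
# but fold the result into the 0/-E convention do_init_module() expects.
gen_template_module_data_b()
{
	printf "\nextern int auto_test_a_%010d;\n\n" 28
	echo "static int auto_runtime_test(void)"
	echo "{"
	printf "\treturn auto_test_a_%010d == 28 ? 0 : -EINVAL;\n" 28
	echo "}"
}
```

Only the generated return expression changes; the rest of the template stays as posted.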

> diff --git a/tools/testing/selftests/module/find_symbol.sh b/tools/testing/selftests/module/find_symbol.sh
> new file mode 100755
> index 000000000000..140364d3c49f
> --- /dev/null
> +++ b/tools/testing/selftests/module/find_symbol.sh
> @@ -0,0 +1,81 @@
[..]
> +test_reqs()
> +{
> +   if ! which modprobe 2> /dev/null > /dev/null; then
> +   echo "$0: You need modprobe installed" >&2
> +   exit $ksft_skip
> +   fi
> +
> +   if ! which kmod 2> /dev/null > /dev/null; then
> +   echo "$0: You need kmod installed" >&2
> +   exit $ksft_skip
> +   fi

Is there a reason to test for kmod? I don't see it called directly in
this script.
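
If the kmod check is indeed unused, the helper could shrink to a sketch like the following (a suggestion, not the patch's code; it also swaps 'which' for the POSIX 'command -v'):

```shell
# Hypothetical trimmed test_reqs(): only modprobe is actually invoked
# by the script, and 'command -v' is the portable way to probe for it.
ksft_skip=4	# kselftest skip exit code, normally set by the harness

test_reqs()
{
	if ! command -v modprobe > /dev/null 2>&1; then
		echo "$0: You need modprobe installed" >&2
		exit "$ksft_skip"
	fi
}
```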

Also, shouldn't you add the module directory to TARGETS in
tools/testing/selftests/Makefile? Otherwise the script won't be
installed with the rest of kselftests.

Sami



Re: [PATCH v5 09/13] iommufd/selftest: Add refcount to mock_iommu_device

2024-10-29 Thread Jason Gunthorpe
On Fri, Oct 25, 2024 at 04:49:49PM -0700, Nicolin Chen wrote:
> For an iommu_dev that can unplug (so far only this selftest does so), the
> viommu->iommu_dev pointer has no guarantee of its life cycle after it is
> copied from the idev->dev->iommu->iommu_dev.
> 
> Track the user count of the iommu_dev. Postpone the exit routine using a
> completion, if refcount is unbalanced. The refcount inc/dec will be added
> in the following patch.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/iommufd/selftest.c | 32 
>  1 file changed, 24 insertions(+), 8 deletions(-)

Reviewed-by: Jason Gunthorpe 

Since this is built into the iommufd module it can't be unloaded
without also unloading iommufd, which is impossible as long as any
iommufd FDs are open. So I expect that the WARN_ON can never happen.

Jason
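
The postpone-exit-until-the-refcount-is-balanced idea described in the commit message can be sketched in userspace terms (all names here are hypothetical, not the selftest's API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace sketch (hypothetical names) of the pattern: the mock
 * iommu_dev carries a user count, and the exit routine completes only
 * once that count drops back to zero. */

static atomic_int users = 1;		/* initial self-reference */
static atomic_bool all_users_gone;	/* stands in for struct completion */

static void mock_dev_get(void)
{
	atomic_fetch_add(&users, 1);
}

static void mock_dev_put(void)
{
	if (atomic_fetch_sub(&users, 1) == 1)	/* count hit zero */
		atomic_store(&all_users_gone, true);	/* complete() analogue */
}

static void mock_exit(void)
{
	mock_dev_put();			/* drop the initial reference */
	/* wait_for_completion() analogue: if the refcount is unbalanced,
	 * this is where the exit routine would be postponed */
	while (!atomic_load(&all_users_gone))
		;
	puts("exit: no remaining users");
}
```

In the kernel version the busy loop is of course a blocking wait on the completion, but the ordering is the same: the last put signals, then the exit path proceeds.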



Re: [PATCH v5 04/13] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-29 Thread Nicolin Chen
On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
> > +void iommufd_viommu_destroy(struct iommufd_object *obj)
> > +{
> > +   struct iommufd_viommu *viommu =
> > +   container_of(obj, struct iommufd_viommu, obj);
> > +
> > +   if (viommu->ops && viommu->ops->free)
> > +   viommu->ops->free(viommu);
> 
> Ops can't be null and free can't be null, that would mean there is a
> memory leak.

What if a driver doesn't have anything to free? You're suggesting
to force the driver to have an empty free function, right? We can
do that, if it is preferable:

void arm_vsmmu_free(struct iommufd_viommu *viommu)
{
}

> > +   refcount_dec(&viommu->hwpt->common.obj.users);
> 
> Don't touch viommu after freeing it

Drivers should only free their own stuff without touching the core:
"* @free: Free all driver-specific parts of an iommufd_viommu. The memory of the
 * vIOMMU will be freed by iommufd core after calling this free op."

Then, viommu object is freed by the core after ->destroy(), right?

> Did you run selftests with kasn?

Yea, all passed with no WARN_ON.

Thanks
Nicolin



Re: [PATCH v5 04/13] iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl

2024-10-29 Thread Nicolin Chen
On Tue, Oct 29, 2024 at 12:59:35PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 29, 2024 at 08:46:40AM -0700, Nicolin Chen wrote:
> > On Tue, Oct 29, 2024 at 12:36:24PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 29, 2024 at 11:54:36AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Oct 25, 2024 at 04:49:44PM -0700, Nicolin Chen wrote:
> > > > > +void iommufd_viommu_destroy(struct iommufd_object *obj)
> > > > > +{
> > > > > + struct iommufd_viommu *viommu =
> > > > > + container_of(obj, struct iommufd_viommu, obj);
> > > > > +
> > > > > + if (viommu->ops && viommu->ops->free)
> > > > > + viommu->ops->free(viommu);
> > > > 
> > > > Ops can't be null and free can't be null, that would mean there is a
> > > > memory leak.
> > > 
> > > Actually, it is just named wrong, it should be called destroy like
> > > this op, it doesn't free any memory..
> > 
> > Well, it frees if driver allocated something in its top structure.
> > Yet, "destroy" does sound less confusing. Let's rename it, assuming
> > it can still remain to be optional as we have here.
> 
> Yes, optional is right, I was confused by the name. The core code uses
> destroy so I'd stick with that.

Ack.

Nicolin
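
The shape agreed on here — an optional driver "destroy" op, with the core freeing the embedding memory afterwards — can be illustrated by a standalone userspace sketch (hypothetical names throughout):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace sketch (hypothetical names): the driver op is "destroy",
 * it is optional, and the core frees the embedding allocation itself
 * after invoking it. */

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct object { unsigned int id; };

struct viommu;
struct viommu_ops {
	void (*destroy)(struct viommu *viommu);	/* optional */
};

struct viommu {
	struct object obj;
	const struct viommu_ops *ops;
};

static int driver_destroy_calls;

static void driver_destroy(struct viommu *viommu)
{
	(void)viommu;		/* release driver-private parts here */
	driver_destroy_calls++;
}

static const struct viommu_ops driver_ops = { .destroy = driver_destroy };

static void core_object_destroy(struct object *obj)
{
	struct viommu *viommu = container_of(obj, struct viommu, obj);

	/* optional op: a driver with nothing to release leaves it NULL */
	if (viommu->ops && viommu->ops->destroy)
		viommu->ops->destroy(viommu);
	free(viommu);	/* the core, not the driver op, frees the memory */
}
```

Keeping the free in the core avoids the leak Jason raised while still letting drivers with no private state skip the op entirely.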



Re: [PATCH v5 04/13] iommufd/hw_pagetable: Enforce invalidation op on vIOMMU-based hwpt_nested

2024-10-29 Thread Jason Gunthorpe
On Tue, Oct 29, 2024 at 08:22:43AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Saturday, October 26, 2024 7:51 AM
> > 
> > @@ -302,7 +302,9 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
> > 	}
> > 	hwpt->domain->owner = viommu->iommu_dev->ops;
> > 
> > -	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
> > +	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
> > +			 (!viommu->ops->cache_invalidate &&
> > +			  !hwpt->domain->ops->cache_invalidate_user))) {
> > 		rc = -EINVAL;
> > 		goto out_abort;
> > 	}
> 
> According to patch5, cache invalidate in viommu only uses
> viommu->ops->cache_invalidate. Is here intended to allow
> nested hwpt created via viommu to still support the old
> method?

I think that is reasonable?

Jason


