Re: [PATCH net 1/4] selftests: openvswitch: Add version check for pyroute2
On Fri, 2023-10-06 at 11:12 -0400, Aaron Conole wrote: > Paolo Abeni reports that on some systems the pyroute2 version isn't > new enough to run the test suite. Ensure that we support a minimum > version of 0.6 for all cases (which does include the existing ones). > The 0.6.1 version was released in May of 2021, so should be > propagated to most installations at this point. > > The alternative that Paolo proposed was to only skip when the > add-flow is being run. This would be okay for most cases, except > if a future test case is added that needs to do flow dump without > an associated add (just guessing). In that case, it could also be > broken and we would need additional skip logic anyway. Just draw > a line in the sand now. > > Fixes: 25f16c873fb1 ("selftests: add openvswitch selftest suite") > Reported-by: Paolo Abeni > Closes: > https://lore.kernel.org/lkml/8470c431e0930d2ea204a9363a60937289b7fdbe.ca...@redhat.com/ > Signed-off-by: Aaron Conole > --- > tools/testing/selftests/net/openvswitch/openvswitch.sh | 2 +- > tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 8 > 2 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh > b/tools/testing/selftests/net/openvswitch/openvswitch.sh > index 9c2012d70b08e..220c3356901ef 100755 > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh > @@ -525,7 +525,7 @@ run_test() { > fi > > if python3 ovs-dpctl.py -h 2>&1 | \ > - grep "Need to install the python" >/dev/null 2>&1; then > + grep -E "Need to (install|upgrade) the python" >/dev/null 2>&1; > then > stdbuf -o0 printf "TEST: %-60s [PYLIB]\n" "${tdesc}" > return $ksft_skip > fi > diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > index 912dc8c490858..9686ca30d516d 100644 > --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > @@ -28,6 +28,8 @@ try: > from pyroute2.netlink import nlmsg_atoms > from pyroute2.netlink.exceptions import NetlinkError > from pyroute2.netlink.generic import GenericNetlinkSocket > +import pyroute2 > + > except ModuleNotFoundError: > print("Need to install the python pyroute2 package.") > sys.exit(0) > @@ -1998,6 +2000,12 @@ def main(argv): > nlmsg_atoms.ovskey = ovskey > nlmsg_atoms.ovsactions = ovsactions > > +# version check for pyroute2 > +prverscheck = pyroute2.__version__.split(".") > +if int(prverscheck[0]) == 0 and int(prverscheck[1]) < 6: > +print("Need to upgrade the python pyroute2 package.") I think it would be better to propagate/print also the minimum version required, so that the user should not have to resort looking at the self-test sources to learn the required minimum version. Cheers, Paolo
Re: [PATCH net 3/4] selftests: openvswitch: Skip drop testing on older kernels
On Fri, 2023-10-06 at 11:12 -0400, Aaron Conole wrote: > Kernels that don't have support for openvswitch drop reasons also > won't have the drop counter reasons, so we should skip the test > completely. It previously wasn't possible to build a test case > for this without polluting the datapath, so we introduce a mechanism > to clear all the flows from a datapath allowing us to test for > explicit drop actions, and then clear the flows to build the > original test case. > > Fixes: 4242029164d6 ("selftests: openvswitch: add explicit drop testcase") > Signed-off-by: Aaron Conole > --- > .../selftests/net/openvswitch/openvswitch.sh | 17 ++ > .../selftests/net/openvswitch/ovs-dpctl.py| 34 +++ > 2 files changed, 51 insertions(+) > > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh > b/tools/testing/selftests/net/openvswitch/openvswitch.sh > index 2a0112be7ead5..ca7090e71bff2 100755 > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh > @@ -144,6 +144,12 @@ ovs_add_flow () { > return 0 > } > > +ovs_del_flows () { > + info "Deleting all flows from DP: sbx:$1 br:$2" > + ovs_sbx "$1" python3 $ovs_base/ovs-dpctl.py del-flows "$2" > +return 0 The chunk above mixes whitespaces and tabs for indenting, please be consistent. Thanks! Paolo
Re: [PATCH net 4/4] selftests: openvswitch: Fix the ct_tuple for v4
On Fri, 2023-10-06 at 11:12 -0400, Aaron Conole wrote: > Caught during code review. Since there are a few other small things, please additionally expand this changelog briefly describing the addressed problem and it's consequences. Thanks, Paolo
Re: [PATCH v2] selftests/net: synchronize udpgso_bench rx and tx
On Mon, 2023-10-30 at 13:40 -0400, Lucas Karpinski wrote: > The sockets used by udpgso_bench_tx aren't always ready when > udpgso_bench_tx transmits packets. This issue is more prevalent in -rt > kernels, but can occur in both. Replace the hacky sleep calls with a > function that checks whether the ports in the namespace are ready for > use. > > Suggested-by: Paolo Abeni > Signed-off-by: Lucas Karpinski > --- > https://lore.kernel.org/all/t7v6mmuobrbucyfpwqbcujtvpa3wxnsrc36cz5rr6kzzrzkwtj@toz6mr4ggnyp/ > > Changelog v2: > - applied synchronization method suggested by Pablo > - changed commit message to code > > tools/testing/selftests/net/udpgro.sh | 27 ++- > tools/testing/selftests/net/udpgro_bench.sh | 19 +++-- > tools/testing/selftests/net/udpgro_frglist.sh | 19 +++-- > 3 files changed, 54 insertions(+), 11 deletions(-) > > diff --git a/tools/testing/selftests/net/udpgro.sh > b/tools/testing/selftests/net/udpgro.sh > index 0c743752669a..04792a315729 100755 > --- a/tools/testing/selftests/net/udpgro.sh > +++ b/tools/testing/selftests/net/udpgro.sh > @@ -24,6 +24,22 @@ cleanup() { > } > trap cleanup EXIT > > +wait_local_port_listen() > +{ > + local port="${1}" > + > + local port_hex > + port_hex="$(printf "%04X" "${port}")" > + > + local i Minor nit: I think the code would be more readable, if you will group the variable together: local port="${1}" local port_hex local i port_hex="$(printf "%04X" "${port}")" # ... > + for i in $(seq 10); do > + ip netns exec "${PEER_NS}" cat /proc/net/udp* | \ > + awk "BEGIN {rc=1} {if (\$2 ~ /:${port_hex}\$/) {rc=0; > exit}} END {exit rc}" && > + break > + sleep 0.1 > + done > +} Since you wrote the same function verbatim in 3 different files, I think it would be better place it in separate, new, net_helper.sh file and include such file from the various callers. Possibly additionally rename the function as wait_local_udp_port_listen. Thanks! Paolo
Re: [PATCH v2] selftests/net: synchronize udpgso_bench rx and tx
On Mon, 2023-10-30 at 13:40 -0400, Lucas Karpinski wrote: > The sockets used by udpgso_bench_tx aren't always ready when > udpgso_bench_tx transmits packets. This issue is more prevalent in -rt > kernels, but can occur in both. Replace the hacky sleep calls with a > function that checks whether the ports in the namespace are ready for > use. > > Suggested-by: Paolo Abeni > Signed-off-by: Lucas Karpinski > --- > https://lore.kernel.org/all/t7v6mmuobrbucyfpwqbcujtvpa3wxnsrc36cz5rr6kzzrzkwtj@toz6mr4ggnyp/ > I almost forgot ... > Changelog v2: > - applied synchronization method suggested by Pablo ^ most common typo since match 2022 ;) Less irrelevant, please include the target tree in the next submission, in this case 'net-next'. Thanks, Paolo
Re: [PATCH v2] selftests/net: synchronize udpgso_bench rx and tx
On Tue, 2023-10-31 at 09:26 -0400, Lucas Karpinski wrote: > > Since you wrote the same function verbatim in 3 different files, I > > think it would be better place it in separate, new, net_helper.sh > > file > > and include such file from the various callers. Possibly > > additionally > > rename the function as wait_local_udp_port_listen. > > > Thanks, I'll move it over. I think it would be best though to leave > udp out of the name and to just pass the protocol as an argument. Indeed. I suggested the other option just to keep it the simpler, but if you have time and will, please go ahead! Cheers, Paolo
Re: [RFC PATCH v3 04/12] netdev: support binding dma-buf to netdevice
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote: [...] > +int netdev_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, > +struct netdev_dmabuf_binding **out) > +{ > + struct netdev_dmabuf_binding *binding; > + struct scatterlist *sg; > + struct dma_buf *dmabuf; > + unsigned int sg_idx, i; > + unsigned long virtual; > + int err; > + > + if (!capable(CAP_NET_ADMIN)) > + return -EPERM; > + > + dmabuf = dma_buf_get(dmabuf_fd); > + if (IS_ERR_OR_NULL(dmabuf)) > + return -EBADFD; > + > + binding = kzalloc_node(sizeof(*binding), GFP_KERNEL, > +dev_to_node(&dev->dev)); > + if (!binding) { > + err = -ENOMEM; > + goto err_put_dmabuf; > + } > + > + xa_init_flags(&binding->bound_rxq_list, XA_FLAGS_ALLOC); > + > + refcount_set(&binding->ref, 1); > + > + binding->dmabuf = dmabuf; > + > + binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent); > + if (IS_ERR(binding->attachment)) { > + err = PTR_ERR(binding->attachment); > + goto err_free_binding; > + } > + > + binding->sgt = dma_buf_map_attachment(binding->attachment, > + DMA_BIDIRECTIONAL); > + if (IS_ERR(binding->sgt)) { > + err = PTR_ERR(binding->sgt); > + goto err_detach; > + } > + > + /* For simplicity we expect to make PAGE_SIZE allocations, but the > + * binding can be much more flexible than that. We may be able to > + * allocate MTU sized chunks here. Leave that for future work... > + */ > + binding->chunk_pool = gen_pool_create(PAGE_SHIFT, > + dev_to_node(&dev->dev)); > + if (!binding->chunk_pool) { > + err = -ENOMEM; > + goto err_unmap; > + } > + > + virtual = 0; > + for_each_sgtable_dma_sg(binding->sgt, sg, sg_idx) { > + dma_addr_t dma_addr = sg_dma_address(sg); > + struct dmabuf_genpool_chunk_owner *owner; > + size_t len = sg_dma_len(sg); > + struct page_pool_iov *ppiov; > + > + owner = kzalloc_node(sizeof(*owner), GFP_KERNEL, > + dev_to_node(&dev->dev)); > + owner->base_virtual = virtual; > + owner->base_dma_addr = dma_addr; > + owner->num_ppiovs = len / PAGE_SIZE; > + owner->binding = binding; > + > + err = gen_pool_add_owner(binding->chunk_pool, dma_addr, > + dma_addr, len, dev_to_node(&dev->dev), > + owner); > + if (err) { > + err = -EINVAL; > + goto err_free_chunks; > + } > + > + owner->ppiovs = kvmalloc_array(owner->num_ppiovs, > +sizeof(*owner->ppiovs), > +GFP_KERNEL); > + if (!owner->ppiovs) { > + err = -ENOMEM; > + goto err_free_chunks; > + } > + > + for (i = 0; i < owner->num_ppiovs; i++) { > + ppiov = &owner->ppiovs[i]; > + ppiov->owner = owner; > + refcount_set(&ppiov->refcount, 1); > + } > + > + dma_addr += len; I'm trying to wrap my head around the whole infra... the above line is confusing. Why do you increment dma_addr? it will be re-initialized in the next iteration. Cheers, Paolo
Re: [RFC PATCH v3 05/12] netdev: netdevice devmem allocator
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote: [...] > +void netdev_free_devmem(struct page_pool_iov *ppiov) > +{ > + struct netdev_dmabuf_binding *binding = page_pool_iov_binding(ppiov); > + > + refcount_set(&ppiov->refcount, 1); > + > + if (gen_pool_has_addr(binding->chunk_pool, > + page_pool_iov_dma_addr(ppiov), PAGE_SIZE)) > + gen_pool_free(binding->chunk_pool, > + page_pool_iov_dma_addr(ppiov), PAGE_SIZE); Minor nit: what about caching the dma_addr value to make the above more readable? Cheers, Paolo
Re: [RFC PATCH v3 07/12] page-pool: device memory support
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote: > Overload the LSB of struct page* to indicate that it's a page_pool_iov. > > Refactor mm calls on struct page* into helpers, and add page_pool_iov > handling on those helpers. Modify callers of these mm APIs with calls to > these helpers instead. > > In areas where struct page* is dereferenced, add a check for special > handling of page_pool_iov. > > Signed-off-by: Mina Almasry > > --- > include/net/page_pool/helpers.h | 74 - > net/core/page_pool.c| 63 > 2 files changed, 118 insertions(+), 19 deletions(-) > > diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h > index b93243c2a640..08f1a2cc70d2 100644 > --- a/include/net/page_pool/helpers.h > +++ b/include/net/page_pool/helpers.h > @@ -151,6 +151,64 @@ static inline struct page_pool_iov > *page_to_page_pool_iov(struct page *page) > return NULL; > } > > +static inline int page_pool_page_ref_count(struct page *page) > +{ > + if (page_is_page_pool_iov(page)) > + return page_pool_iov_refcount(page_to_page_pool_iov(page)); > + > + return page_ref_count(page); > +} > + > +static inline void page_pool_page_get_many(struct page *page, > +unsigned int count) > +{ > + if (page_is_page_pool_iov(page)) > + return page_pool_iov_get_many(page_to_page_pool_iov(page), > + count); > + > + return page_ref_add(page, count); > +} > + > +static inline void page_pool_page_put_many(struct page *page, > +unsigned int count) > +{ > + if (page_is_page_pool_iov(page)) > + return page_pool_iov_put_many(page_to_page_pool_iov(page), > + count); > + > + if (count > 1) > + page_ref_sub(page, count - 1); > + > + put_page(page); > +} > + > +static inline bool page_pool_page_is_pfmemalloc(struct page *page) > +{ > + if (page_is_page_pool_iov(page)) > + return false; > + > + return page_is_pfmemalloc(page); > +} > + > +static inline bool page_pool_page_is_pref_nid(struct page *page, int > pref_nid) > +{ > + /* Assume page_pool_iov are on the preferred node without actually > + * checking... > + * > + * This check is only used to check for recycling memory in the page > + * pool's fast paths. Currently the only implementation of page_pool_iov > + * is dmabuf device memory. It's a deliberate decision by the user to > + * bind a certain dmabuf to a certain netdev, and the netdev rx queue > + * would not be able to reallocate memory from another dmabuf that > + * exists on the preferred node, so, this check doesn't make much sense > + * in this case. Assume all page_pool_iovs can be recycled for now. > + */ > + if (page_is_page_pool_iov(page)) > + return true; > + > + return page_to_nid(page) == pref_nid; > +} > + > /** > * page_pool_dev_alloc_pages() - allocate a page. > * @pool:pool from which to allocate > @@ -301,6 +359,9 @@ static inline long page_pool_defrag_page(struct page > *page, long nr) > { > long ret; > > + if (page_is_page_pool_iov(page)) > + return -EINVAL; > + > /* If nr == pp_frag_count then we have cleared all remaining >* references to the page: >* 1. 'n == 1': no need to actually overwrite it. > @@ -431,7 +492,12 @@ static inline void page_pool_free_va(struct page_pool > *pool, void *va, > */ > static inline dma_addr_t page_pool_get_dma_addr(struct page *page) > { > - dma_addr_t ret = page->dma_addr; > + dma_addr_t ret; > + > + if (page_is_page_pool_iov(page)) > + return page_pool_iov_dma_addr(page_to_page_pool_iov(page)); Should the above conditional be guarded by the page_pool_mem_providers static key? this looks like fast-path. Same question for the refcount helper above. Minor nit: possibly cache 'page_is_page_pool_iov(page)' to make the code more readable. > + > + ret = page->dma_addr; > > if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) > ret <<= PAGE_SHIFT; > @@ -441,6 +507,12 @@ static inline dma_addr_t page_pool_get_dma_addr(struct > page *page) > > static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr) > { > + /* page_pool_iovs are mapped and their dma-addr can't be modified. */ > + if (page_is_page_pool_iov(page)) { > + DEBUG_NET_WARN_ON_ONCE(true); > + return false; > + } Quickly skimming over the page_pool_code it looks like page_pool_set_dma_addr() usage is guarded by the PP_FLAG_DMA_MAP page pool flag. Could the device mem provider enforce such flag being cleared on the page pool? > + > if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) { > page->dma_addr = addr >> PAGE_SHIFT; >
Re: [RFC PATCH v3 08/12] net: support non paged skb frags
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote: [...] > @@ -3421,7 +3446,7 @@ static inline struct page *skb_frag_page(const > skb_frag_t *frag) > */ > static inline void __skb_frag_ref(skb_frag_t *frag) > { > - get_page(skb_frag_page(frag)); > + page_pool_page_get_many(frag->bv_page, 1); I guess the above needs #ifdef CONFIG_PAGE_POOL guards and explicit skb_frag_is_page_pool_iov() check ? Cheers, Paolo
Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote: [...] > +/* On error, returns the -errno. On success, returns number of bytes sent to > the > + * user. May not consume all of @remaining_len. > + */ > +static int tcp_recvmsg_devmem(const struct sock *sk, const struct sk_buff > *skb, > + unsigned int offset, struct msghdr *msg, > + int remaining_len) > +{ > + struct cmsg_devmem cmsg_devmem = { 0 }; > + unsigned int start; > + int i, copy, n; > + int sent = 0; > + int err = 0; > + > + do { > + start = skb_headlen(skb); > + > + if (!skb_frags_not_readable(skb)) { As 'skb_frags_not_readable()' is intended to be a possibly wider scope test then skb->devmem, should the above test explicitly skb->devmem? > + err = -ENODEV; > + goto out; > + } > + > + /* Copy header. */ > + copy = start - offset; > + if (copy > 0) { > + copy = min(copy, remaining_len); > + > + n = copy_to_iter(skb->data + offset, copy, > + &msg->msg_iter); > + if (n != copy) { > + err = -EFAULT; > + goto out; > + } > + > + offset += copy; > + remaining_len -= copy; > + > + /* First a cmsg_devmem for # bytes copied to user > + * buffer. > + */ > + memset(&cmsg_devmem, 0, sizeof(cmsg_devmem)); > + cmsg_devmem.frag_size = copy; > + err = put_cmsg(msg, SOL_SOCKET, SO_DEVMEM_HEADER, > +sizeof(cmsg_devmem), &cmsg_devmem); > + if (err || msg->msg_flags & MSG_CTRUNC) { > + msg->msg_flags &= ~MSG_CTRUNC; > + if (!err) > + err = -ETOOSMALL; > + goto out; > + } > + > + sent += copy; > + > + if (remaining_len == 0) > + goto out; > + } > + > + /* after that, send information of devmem pages through a > + * sequence of cmsg > + */ > + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { > + const skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; > + struct page_pool_iov *ppiov; > + u64 frag_offset; > + u32 user_token; > + int end; > + > + /* skb_frags_not_readable() should indicate that ALL the > + * frags in this skb are unreadable page_pool_iovs. > + * We're checking for that flag above, but also check > + * individual pages here. If the tcp stack is not > + * setting skb->devmem correctly, we still don't want to > + * crash here when accessing pgmap or priv below. > + */ > + if (!skb_frag_page_pool_iov(frag)) { > + net_err_ratelimited("Found non-devmem skb with > page_pool_iov"); > + err = -ENODEV; > + goto out; > + } > + > + ppiov = skb_frag_page_pool_iov(frag); > + end = start + skb_frag_size(frag); > + copy = end - offset; > + > + if (copy > 0) { > + copy = min(copy, remaining_len); > + > + frag_offset = page_pool_iov_virtual_addr(ppiov) > + > + skb_frag_off(frag) + offset - > + start; > + cmsg_devmem.frag_offset = frag_offset; > + cmsg_devmem.frag_size = copy; > + err = xa_alloc((struct xarray > *)&sk->sk_user_pages, > +&user_token, frag->bv_page, > +xa_limit_31b, GFP_KERNEL); > + if (err) > + goto out; > + > + cmsg_devmem.frag_token = user_token; > + > + offset += copy; > + remaining_len -= copy; > + > + err = put_cmsg(msg, SOL_SOCKET, > +SO_DEVMEM_OFFSET, > +sizeof(cmsg_devmem), > +&cmsg_devmem); > + if (err || msg->msg_flags & MSG_CTRU
Re: [RFC PATCH v3 12/12] selftests: add ncdevmem, netcat for devmem TCP
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote: > @@ -91,6 +95,7 @@ TEST_PROGS += test_bridge_neigh_suppress.sh > TEST_PROGS += test_vxlan_nolocalbypass.sh > TEST_PROGS += test_bridge_backup_port.sh > TEST_PROGS += fdb_flush.sh > +TEST_GEN_FILES += ncdevmem I guess we want something added to TEST_PROGS, too ;) > TEST_FILES := settings > > diff --git a/tools/testing/selftests/net/ncdevmem.c > b/tools/testing/selftests/net/ncdevmem.c > new file mode 100644 > index ..78bc3ad767ca > --- /dev/null > +++ b/tools/testing/selftests/net/ncdevmem.c > @@ -0,0 +1,546 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#define _GNU_SOURCE > +#define __EXPORTED_HEADERS__ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#define __iovec_defined > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "netdev-user.h" > +#include > + > +#define PAGE_SHIFT 12 > +#define TEST_PREFIX "ncdevmem" > +#define NUM_PAGES 16000 > + > +#ifndef MSG_SOCK_DEVMEM > +#define MSG_SOCK_DEVMEM 0x200 > +#endif > + > +/* > + * tcpdevmem netcat. Works similarly to netcat but does device memory TCP > + * instead of regular TCP. Uses udmabuf to mock a dmabuf provider. > + * > + * Usage: > + * > + * * Without validation: > + * > + * On server: > + * ncdevmem -s -c -f eth1 -n :06:00.0 -l \ > + * -p 5201 > + * > + * On client: > + * ncdevmem -s -c -f eth1 -n :06:00.0 -p 5201 > + * > + * * With Validation: > + * On server: > + * ncdevmem -s -c -l -f eth1 -n :06:00.0 \ > + * -p 5202 -v 1 > + * > + * On client: > + * ncdevmem -s -c -f eth1 -n :06:00.0 -p 5202 \ > + * -v 10 > + * > + * Note this is compatible with regular netcat. i.e. the sender or receiver > can > + * be replaced with regular netcat to test the RX or TX path in isolation. > + */ > + > +static char *server_ip = "192.168.1.4"; > +static char *client_ip = "192.168.1.2"; > +static char *port = "5201"; > +static size_t do_validation; > +static int queue_num = 15; > +static char *ifname = "eth1"; > +static char *nic_pci_addr = ":06:00.0"; > +static unsigned int iterations; > + > +void print_bytes(void *ptr, size_t size) > +{ > + unsigned char *p = ptr; > + int i; > + > + for (i = 0; i < size; i++) { > + printf("%02hhX ", p[i]); > + } > + printf("\n"); > +} > + > +void print_nonzero_bytes(void *ptr, size_t size) > +{ > + unsigned char *p = ptr; > + unsigned int i; > + > + for (i = 0; i < size; i++) > + putchar(p[i]); > + printf("\n"); > +} > + > +void validate_buffer(void *line, size_t size) > +{ > + static unsigned char seed = 1; > + unsigned char *ptr = line; > + int errors = 0; > + size_t i; > + > + for (i = 0; i < size; i++) { > + if (ptr[i] != seed) { > + fprintf(stderr, > + "Failed validation: expected=%u, actual=%u, > index=%lu\n", > + seed, ptr[i], i); > + errors++; > + if (errors > 20) > + exit(1); > + } > + seed++; > + if (seed == do_validation) > + seed = 0; > + } > + > + fprintf(stdout, "Validated buffer\n"); > +} > + > +static void reset_flow_steering(void) > +{ > + char command[256]; > + > + memset(command, 0, sizeof(command)); > + snprintf(command, sizeof(command), "sudo ethtool -K %s ntuple off", > + "eth1"); > + system(command); > + > + memset(command, 0, sizeof(command)); > + snprintf(command, sizeof(command), "sudo ethtool -K %s ntuple on", > + "eth1"); > + system(command); > +} > + > +static void configure_flow_steering(void) > +{ > + char command[256]; > + > + memset(command, 0, sizeof(command)); > + snprintf(command, sizeof(command), > + "sudo ethtool -N %s flow-type tcp4 src-ip %s dst-ip %s > src-port %s dst-port %s queue %d", > + ifname, client_ip, server_ip, port, port, queue_num); > + system(command); > +} > + > +/* Triggers a driver reset... > + * > + * The proper way to do this is probably 'ethtool --reset', but I don't have > + * that supported on my current test bed. I resort to changing this > + * configuration in the driver which also causes a driver reset... > + */ > +static void trigger_device_reset(void) > +{ > + char command[256]; > + > + memset(command, 0, sizeof(command)); > + snprintf(command, sizeof(command), > + "sudo ethtool --set-priv-flags %s enable-header-split off", > + ifname); > + system(command); > + > + memset(command, 0, sizeof(command)); > + snprintf(command, sizeof(c
Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
On Mon, 2023-11-06 at 14:55 -0800, Willem de Bruijn wrote: > On Mon, Nov 6, 2023 at 2:34 PM Stanislav Fomichev wrote: > > > > On 11/06, Willem de Bruijn wrote: > > > > > IMHO, we need a better UAPI to receive the tokens and give them back > > > > > to > > > > > the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done, > > > > > but look dated and hacky :-( > > > > > > > > > > We should either do some kind of user/kernel shared memory queue to > > > > > receive/return the tokens (similar to what Jonathan was doing in his > > > > > proposal?) > > > > > > > > I'll take a look at Jonathan's proposal, sorry, I'm not immediately > > > > familiar but I wanted to respond :-) But is the suggestion here to > > > > build a new kernel-user communication channel primitive for the > > > > purpose of passing the information in the devmem cmsg? IMHO that seems > > > > like an overkill. Why add 100-200 lines of code to the kernel to add > > > > something that can already be done with existing primitives? I don't > > > > see anything concretely wrong with cmsg & setsockopt approach, and if > > > > we switch to something I'd prefer to switch to an existing primitive > > > > for simplicity? > > > > > > > > The only other existing primitive to pass data outside of the linear > > > > buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that > > > > preferred? Any other suggestions or existing primitives I'm not aware > > > > of? > > > > > > > > > or bite the bullet and switch to io_uring. > > > > > > > > > > > > > IMO io_uring & socket support are orthogonal, and one doesn't preclude > > > > the other. As you know we like to use sockets and I believe there are > > > > issues with io_uring adoption at Google that I'm not familiar with > > > > (and could be wrong). I'm interested in exploring io_uring support as > > > > a follow up but I think David Wei will be interested in io_uring > > > > support as well anyway. > > > > > > I also disagree that we need to replace a standard socket interface > > > with something "faster", in quotes. > > > > > > This interface is not the bottleneck to the target workload. > > > > > > Replacing the synchronous sockets interface with something more > > > performant for workloads where it is, is an orthogonal challenge. > > > However we do that, I think that traditional sockets should continue > > > to be supported. > > > > > > The feature may already even work with io_uring, as both recvmsg with > > > cmsg and setsockopt have io_uring support now. > > > > I'm not really concerned with faster. I would prefer something cleaner :-) > > > > Or maybe we should just have it documented. With some kind of path > > towards beautiful world where we can create dynamic queues.. > > I suppose we just disagree on the elegance of the API. > > The concise notification API returns tokens as a range for > compression, encoding as two 32-bit unsigned integers start + length. > It allows for even further batching by returning multiple such ranges > in a single call. > > This is analogous to the MSG_ZEROCOPY notification mechanism from > kernel to user. > > The synchronous socket syscall interface can be replaced by something > asynchronous like io_uring. This already works today? Whatever > asynchronous ring-based API would be selected, io_uring or otherwise, > I think the concise notification encoding would remain as is. > > Since this is an operation on a socket, I find a setsockopt the > fitting interface. FWIW, I think sockopt +cmsg is the right API. It would deserve some explicit addition to the documentation, both in the kernel and in the man-pages. Cheers, Paolo
Re: [RFC PATCH v3 02/12] net: page_pool: create hooks for custom page providers
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote: > diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h > index 6fc5134095ed..d4bea053bb7e 100644 > --- a/include/net/page_pool/types.h > +++ b/include/net/page_pool/types.h > @@ -60,6 +60,8 @@ struct page_pool_params { > int nid; > struct device *dev; > struct napi_struct *napi; > + u8 memory_provider; > + void*mp_priv; Minor nit: swapping the above 2 fields should make the struct smaller. > enum dma_data_direction dma_dir; > unsigned intmax_len; > unsigned intoffset; > @@ -118,6 +120,19 @@ struct page_pool_stats { > }; > #endif > > +struct mem_provider; > + > +enum pp_memory_provider_type { > + __PP_MP_NONE, /* Use system allocator directly */ > +}; > + > +struct pp_memory_provider_ops { > + int (*init)(struct page_pool *pool); > + void (*destroy)(struct page_pool *pool); > + struct page *(*alloc_pages)(struct page_pool *pool, gfp_t gfp); > + bool (*release_page)(struct page_pool *pool, struct page *page); > +}; > + > struct page_pool { > struct page_pool_params p; > > @@ -165,6 +180,9 @@ struct page_pool { >*/ > struct ptr_ring ring; > > + const struct pp_memory_provider_ops *mp_ops; > + void *mp_priv; Why the mp_ops are not part of page_pool_params? why mp_priv is duplicated here? Cheers, Paolo
Re: [Discuss] Seeking advice on net selftests netns naming method
On Tue, 2023-11-14 at 17:55 +0800, Hangbin Liu wrote: > Good day! Following Guillaume's suggestion, I've been working on updating all > net self-tests to run in their respective netns. This modification allows us > to execute all tests in parallel, potentially saving a significant amount of > test time. > > However, I've encountered a challenge while making these modifications. The > net selftest folder contains around 80 tests (excluding the forwarding test), > with some tests using common netns names and others using self-defined names. > I've considered two methods to address this issue: > > One approach is to retain the original names but append a unique suffix using > $(mktemp -u XX). While this is a straightforward solution, it may not > prevent future tests from using common names. > > Another option is to establish a general netns lib. Similar to the NUM_NETIFS > variable in the forwarding test, we could introduce a variable like NUM_NS. > This variable would define the number of netns instances, and all tests would > use the netns lib to set up and clean up netns accordingly. However, this > approach might complicate test debugging, especially for tests like > fib_nexthops.sh, which relies on clear and visually netns names > (e.g., me/peer/remote). I personally would like sort of both :) e.g. lib function(s) to automatically create and dispose netns, and retain a script- specific/related name prefix. The library function could optionally set the newly created namespaces name in global variables provided by the caller, e.g.: # create 3 namespaces: netns_init 3 # create 3 namespaces and set the global variables: # $remote, $local $me # to their respective names netns_init 3 remote local me The trick to do such assignment would be using the 'eval' statement, something alike netns_init() { # create the netns shift while [ -n "$1" ]; do eval $1=$NETNS_NAMES[0] shift done } While at that, it would be useful to package some common helper e.g. to wait for a (tcp) listener to be created (available) on a given port. WDYT? Thanks! Paolo
Re: [Discuss] Seeking advice on net selftests netns naming method
On Wed, 2023-11-15 at 15:51 +0800, Hangbin Liu wrote: > On Tue, Nov 14, 2023 at 12:02:00PM +0100, Paolo Abeni wrote: > > I personally would like sort of both :) e.g. lib function(s) to > > automatically create and dispose netns, and retain a script- > > specific/related name prefix. > > > > The library function could optionally set the newly created namespaces > > name in global variables provided by the caller, e.g.: > > > > # create 3 namespaces: > > netns_init 3 > > > > # create 3 namespaces and set the global variables: > > # $remote, $local $me > > # to their respective names > > netns_init 3 remote local me > > > > The trick to do such assignment would be using the 'eval' statement, > > something alike > > > > netns_init() > > { > > # create the netns > > > > shift > > while [ -n "$1" ]; do > > eval $1=$NETNS_NAMES[0] > > shift > > done > > } > > > > While at that, it would be useful to package some common helper e.g. to > > wait for a (tcp) listener to be created (available) on a given port. > > > > WDYT? > > Thanks, this is a good idea. I reviewed all the test cases and it should works > for most of them. Only the SRv6 tests are a little complex as they use 2 id > number for netns name. e.g. the setup_hs() in > srv6_end_dt46_l3vpn_test.sh. I plan to add the tmp string between the hs/rt > and > ids. e.g. hs-xx-t100-1, rt-xx-1. I will have a try first. Supposing netns_init() creates a namespace named , I think the following (very hackish thing) would work: # create an alias for the namespace ln -s /var/run/netns/ /var/run/netns/hs-t${tid}-${hs} # using the alias should work ip -n hs-t${tid}-${hs} link #delete the alias at exit time rm -f /var/run/netns/hs-t${tid}-${hs} The troublesome part is that the '/var/run/netns/' prefix could be configured to something else at iproute build time. @David, @Stephen: I'm wondering if it would make sense adding a new 'ip netns' sub-command to implement the 'alias' feature above? Cheers, Paolo
Re: [PATCHv3 net-next 01/14] selftests/net: add lib.sh
On Sat, 2023-12-02 at 10:00 +0800, Hangbin Liu wrote: > Add a lib.sh for net selftests. This file can be used to define commonly > used variables and functions. Some commonly used functions can be moved > from forwarding/lib.sh to this lib file. e.g. busywait(). > > Add function setup_ns() for user to create unique namespaces with given > prefix name. > > Reviewed-by: Petr Machata > Signed-off-by: Hangbin Liu > --- > tools/testing/selftests/net/Makefile | 2 +- > tools/testing/selftests/net/forwarding/lib.sh | 27 +- > tools/testing/selftests/net/lib.sh| 85 +++ > 3 files changed, 87 insertions(+), 27 deletions(-) > create mode 100644 tools/testing/selftests/net/lib.sh > > diff --git a/tools/testing/selftests/net/Makefile > b/tools/testing/selftests/net/Makefile > index 9274edfb76ff..14bd68da7466 100644 > --- a/tools/testing/selftests/net/Makefile > +++ b/tools/testing/selftests/net/Makefile > @@ -54,7 +54,7 @@ TEST_PROGS += ip_local_port_range.sh > TEST_PROGS += rps_default_mask.sh > TEST_PROGS += big_tcp.sh > TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh > -TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh > +TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh lib.sh > TEST_GEN_FILES = socket nettest > TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any > TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd txring_overwrite > diff --git a/tools/testing/selftests/net/forwarding/lib.sh > b/tools/testing/selftests/net/forwarding/lib.sh > index e37a15eda6c2..8f6ca458af9a 100755 > --- a/tools/testing/selftests/net/forwarding/lib.sh > +++ b/tools/testing/selftests/net/forwarding/lib.sh > @@ -4,9 +4,6 @@ > > ## > # Defines > > -# Kselftest framework requirement - SKIP code is 4. > -ksft_skip=4 > - > # Can be overridden by the configuration file. > PING=${PING:=ping} > PING6=${PING6:=ping6} > @@ -41,6 +38,7 @@ if [[ -f $relative_path/forwarding.config ]]; then > source "$relative_path/forwarding.config" > fi > > +source ../lib.sh > > ## > # Sanity checks > > @@ -395,29 +393,6 @@ log_info() > echo "INFO: $msg" > } > > -busywait() > -{ > - local timeout=$1; shift > - > - local start_time="$(date -u +%s%3N)" > - while true > - do > - local out > - out=$("$@") > - local ret=$? > - if ((!ret)); then > - echo -n "$out" > - return 0 > - fi > - > - local current_time="$(date -u +%s%3N)" > - if ((current_time - start_time > timeout)); then > - echo -n "$out" > - return 1 > - fi > - done > -} > - > not() > { > "$@" > diff --git a/tools/testing/selftests/net/lib.sh > b/tools/testing/selftests/net/lib.sh > new file mode 100644 > index ..518eca57b815 > --- /dev/null > +++ b/tools/testing/selftests/net/lib.sh > @@ -0,0 +1,85 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-2.0 > + > +## > +# Defines > + > +# Kselftest framework requirement - SKIP code is 4. > +ksft_skip=4 > + > +## > +# Helpers > +busywait() > +{ > + local timeout=$1; shift > + > + local start_time="$(date -u +%s%3N)" > + while true > + do > + local out > + out=$("$@") > + local ret=$? > + if ((!ret)); then > + echo -n "$out" > + return 0 > + fi > + > + local current_time="$(date -u +%s%3N)" > + if ((current_time - start_time > timeout)); then > + echo -n "$out" > + return 1 > + fi > + done > +} > + > +cleanup_ns() > +{ > + local ns="" > + local errexit=0 > + local ret=0 > + > + # disable errexit temporary > + if [[ $- =~ "e" ]]; then > + errexit=1 > + set +e > + fi > + > + for ns in "$@"; do > + ip netns delete "${ns}" &> /dev/null > + if ! busywait 2 ip netns list \| grep -vq "^$ns$" &> /dev/null; > then > + echo "Warn: Failed to remove namespace $ns" > + ret=1 > + fi > + done > + > + [ $errexit -eq 1 ] && set -e > + return $ret > +} > + > +# setup netns with given names as prefix. e.g > +# setup_ns local remote > +setup_ns() > +{ > + local ns="" > + local ns_name="" > + local ns_list="" > + for ns_name in "$@"; do > + # Some test may setup/remove same netns multi times > + if unset ${ns_name} 2> /dev/null; then > + ns="${n
Re: [PATCHv3 net-next 01/14] selftests/net: add lib.sh
On Wed, 2023-12-06 at 13:32 +0100, Petr Machata wrote: > Paolo Abeni writes: > > > Side note for a possible follow-up: if you maintain $ns_list as global > > variable, and remove from such list the ns deleted by cleanup_ns, you > > could remove the cleanup trap from the individual test with something > > alike: > > > > final_cleanup_ns() > > { > > cleanup_ns $ns_list > > } > > > > trap final_cleanup_ns EXIT > > > > No respin needed for the above, could be a follow-up if agreed upon. > > If you propose this for the library then I'm against it. The exit trap > is a global resource that the client scripts sometimes need to use as > well, to do topology teardowns or just general cleanups. > So either the library would have to provide APIs for cleanup management, or > the trap > is for exclusive use by clients. The latter is IMHO simpler. Even the former would not be very complex: TRAPS="" do_at_exit() { TRAPS="${TRAPS}$@;" trap "${TRAPS}" EXIT } And then use "do_at_exit " instead of "trap EXIT" > It also puts the cleanups at the same place where the acquisition is > prompted: the client allocates the NS, the client should prompt its > cleanup. I guess I could argue that the the script is asking the library to allocate the namespaces, and the library could take care of disposing them. But I'm not pushing the proposed option, if there is no agreement no need for additional work ;) Cheers, Paolo
Re: [PATCH] selftests/net/tcp-ao: Use LDLIBS instead of LDFLAGS
On Wed, 2024-01-10 at 21:34 +, Dmitry Safonov wrote: > The rules to link selftests are: > > > $(OUTPUT)/%_ipv4: %.c > > $(LINK.c) $^ $(LDLIBS) -o $@ > > > > $(OUTPUT)/%_ipv6: %.c > > $(LINK.c) -DIPV6_TEST $^ $(LDLIBS) -o $@ > > The intel test robot uses only selftest's Makefile, not the top linux > Makefile: > > > make W=1 O=/tmp/kselftest -C tools/testing/selftests > > So, $(LINK.c) is determined by environment, rather than by kernel > Makefiles. On my machine (as well as other people that ran tcp-ao > selftests) GNU/Make implicit definition does use $(LDFLAGS): > > > [dima@Mindolluin ~]$ make -p -f/dev/null | grep '^LINK.c\>' > > make: *** No targets. Stop. > > LINK.c = $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) > > But, according to build robot report, it's not the case for them. > While I could just avoid using pre-defined $(LINK.c), it's also used by > selftests/lib.mk by default. > > Anyways, according to GNU/Make documentation [1], I should have used > $(LDLIBS) instead of $(LDFLAGS) in the first place, so let's just do it: > > > LDFLAGS > > Extra flags to give to compilers when they are supposed to invoke > > the linker, ‘ld’, such as -L. Libraries (-lfoo) should be added > > to the LDLIBS variable instead. > > LDLIBS > > Library flags or names given to compilers when they are supposed > > to invoke the linker, ‘ld’. LOADLIBES is a deprecated (but still > > supported) alternative to LDLIBS. Non-library linker flags, such > > as -L, should go in the LDFLAGS variable. > > [1]: > https://www.gnu.org/software/make/manual/html_node/Implicit-Variables.html > > Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library") > Reported-by: kernel test robot > Closes: > https://lore.kernel.org/oe-kbuild-all/202401011151.veyytjzq-...@intel.com/ > Signed-off-by: Dmitry Safonov Acked-by: Paolo Abeni
Re: [PATCH net 2/2] selftests: forwarding: Remove executable bits from lib.sh
On Wed, 2024-01-10 at 09:14 -0500, Benjamin Poirier wrote: > The lib.sh script is meant to be sourced from other scripts, not executed > directly. Therefore, remove the executable bits from lib.sh's permissions. LGTM, but there is any special reason to not fix the permissions of other *lib.sh files in the same directory? Could be a follow-up. Thanks! Paolo
Re: [PATCH net] selftests: netdevsim: add a config file
On Tue, 2024-01-16 at 07:43 -0800, Jakub Kicinski wrote: > netdevsim tests aren't very well integrated with kselftest, > which has its advantages and disadvantages. Out of sheer ignorance I don't see the advantage?!? > But regardless > of the intended integration - a config file to know what kernel > to build is very useful, add one. With a complete integration we could more easily ask kbuild to generate automatically the kernel config suitable for testing; what about completing such integration? Cheers, Paolo
Re: [PATCH net] selftests: netdevsim: add a config file
On Tue, 2024-01-16 at 10:34 -0800, Jakub Kicinski wrote: > On Tue, 16 Jan 2024 18:40:49 +0100 Paolo Abeni wrote: > > On Tue, 2024-01-16 at 07:43 -0800, Jakub Kicinski wrote: > > > netdevsim tests aren't very well integrated with kselftest, > > > which has its advantages and disadvantages. > > > > Out of sheer ignorance I don't see the advantage?!? > > > > > But regardless > > > of the intended integration - a config file to know what kernel > > > to build is very useful, add one. > > > > With a complete integration we could more easily ask kbuild to generate > > automatically the kernel config suitable for testing; what about > > completing such integration? > > My bad, I didn't have the right words at my fingertips so I deleted > the explanation of advantages. > > make run_tests doesn't give us the ability to inject logic between > each test, AFAIU. The runner for netdevsim I typed up checks after > each test whether the VM has any crashes or things got otherwise > out of whack. And if so kills the VM and starts a new one to run > the next test. For make run_tests we can still more or less zero > in on which test caused an oops or crash, but the next test will > try to keep going. I see. > Even if we force kill it after we see a crash > I didn't see in the docs how to continue testing from a specific > point. I think something like the following should do: cd tools/testing/selftests make TARGETS="net drivers/net/bonding <...full relevant targets list>" O= install cd ARGS="" for t in $(./run_kselftest.sh -l| sed -n '//,$p'); do ARGS="$ARGS -t $t" done ./run_kselftest.sh $ARGS # run all tests after Probably it would be nice to add to the kselftest runner the ability to check for kernel oops after each test and ev. stop. > So all in all, yeah, uniformity is good, the hacky approach kinda > works. Converting netdevsim to make run_tests is not a priority.. I agree, but also will put the all the above possible improvements in my wishlist ;) Cheers, Paolo
Re: [PATCH net] selftests: netdevsim: add a config file
On Wed, 2024-01-17 at 07:39 -0800, Jakub Kicinski wrote: > On Wed, 17 Jan 2024 10:32:19 +0100 Paolo Abeni wrote: > > > > I think something like the following should do: > > > > cd tools/testing/selftests > > make TARGETS="net drivers/net/bonding <...full relevant targets list>" > > O= install > > cd > > > > ARGS="" > > for t in $(./run_kselftest.sh -l| sed -n '//,$p'); do > > ARGS="$ARGS -t $t" > > done > > ./run_kselftest.sh $ARGS # run all tests after > > > > Probably it would be nice to add to the kselftest runner the ability to > > check for kernel oops after each test and ev. stop. > > I wasn't aware there's a way to list tests! That should work well > enough we can run them one by one with make, that's fine. > > ./run_kselftest.sh only seems to work for installed tests, tho, > in tree it says: > ./run_kselftest.sh: Could not find list of tests to run > (~linux/tools/testing/selftests/kselftest-list.txt) > So perhaps the wishlist item would be "make list_tests"? Yes, kselftest-list.txt is created on the fly by make install. As such step could be constrained to the relevant selftests directories (see the above code snippet), and a 'make' step is required anyway to run the tests, what about using directly such target? In any case I don't mean to block this patch, let me apply it... Cheers, Paolo
Re: [PATCH net] selftests: netdevsim: fix the udp_tunnel_nic test
On Mon, 2024-01-22 at 22:05 -0800, Jakub Kicinski wrote: > This test is missing a whole bunch of checks for interface > renaming and one ifup. Presumably it was only used on a system > with renaming disabled and NetworkManager running. > > Fixes: 91f430b2c49d ("selftests: net: add a test for UDP tunnel info infra") > Signed-off-by: Jakub Kicinski > --- > CC: sh...@kernel.org > CC: ho...@kernel.org > CC: linux-kselftest@vger.kernel.org > --- > .../selftests/drivers/net/netdevsim/udp_tunnel_nic.sh| 9 + > 1 file changed, 9 insertions(+) > > diff --git a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh > b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh > index 4855ef597a15..f98435c502f6 100755 > --- a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh > +++ b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh > @@ -270,6 +270,7 @@ for port in 0 1; do > echo 1 > $NSIM_DEV_SYS/new_port > fi > NSIM_NETDEV=`get_netdev_name old_netdevs` > +ifconfig $NSIM_NETDEV up WoW! I initially thought the above was a typo, before noticing it's actually consistent with the whole script :) Do you think we should look at dropping ifconfig usage from self-tests? I guess that in the long run most systems should not have such command available in the default install. In any case the patch LGTM. Acked-by: Paolo Abeni Cheers, Paolo
Re: [PATCH net] selftests/net/lib: update busywait timeout value
On Mon, 2024-01-22 at 17:05 +0800, Hangbin Liu wrote: > The busywait timeout value is a millisecond, not a second. So the > current setting 2 is meaningless. Let's copy the WAIT_TIMEOUT from > forwarding/lib.sh and set a BUSYWAIT_TIMEOUT here. > > Signed-off-by: Hangbin Liu > --- > Not sure if the default WAIT_TIMEOUT 20s is too large. But since > we usually don't need to wait for that long. I think it's OK to > stay the same value with forwarding/lib.sh. Please tell me if you > think we need to set a more proper value. > > BTW, This doesn't look like a fix. But also not a feature. So I just > post it to net tree. I think the 20s max timeout is fine. I also think this is really a fix: on slow/busy host (or VMs) the current timeout can expire even on "correct" execution, causing random failures. We had a lot of similar case in the mptcp self-tests in the past. Could you please send a v2 with a suitable fixes tag and an update changelog to point out the possible failures? Thanks! Paolo
[PATCH net 0/3] selftests: net: a few fixes
This series address self-tests failures for udp gro-related tests. The first patch addresses the main problem I observe locally - the XDP program required by such tests, xdp_dummy, is currently build in the ebpf self-tests directory, not available if/when the user targets net only. Arguably is more a refactor than a fix, but still targeting net to hopefully The second patch fixes the integration of such tests with the build system. Patch 3/3 fixes sporadic failures due to races. Tested with: make -C tools/testing/selftests/ TARGETS=net install ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t "net:udpgro_bench.sh net:udpgro.sh net:udpgro_fwd.sh \ net:udpgro_frglist.sh net:veth.sh" no failures. Paolo Abeni (3): selftests: net: remove dependency on ebpf tests selftests: net: included needed helper in the install targets selftests: net: explicitly wait for listener ready tools/testing/selftests/net/Makefile | 6 -- tools/testing/selftests/net/udpgro.sh | 4 ++-- tools/testing/selftests/net/udpgro_bench.sh | 4 ++-- tools/testing/selftests/net/udpgro_frglist.sh | 6 +++--- tools/testing/selftests/net/udpgro_fwd.sh | 8 +--- tools/testing/selftests/net/veth.sh | 4 ++-- tools/testing/selftests/net/xdp_dummy.c | 13 + 7 files changed, 31 insertions(+), 14 deletions(-) create mode 100644 tools/testing/selftests/net/xdp_dummy.c -- 2.43.0
[PATCH net 1/3] selftests: net: remove dependency on ebpf tests
Several net tests requires an XDP program build under the ebpf directory, and error out if such program is not available. That makes running successful net test hard, let's duplicate into the net dir the [very small] program, re-using the existing rules to build it, and finally dropping the bogus dependency. Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/Makefile | 5 +++-- tools/testing/selftests/net/udpgro.sh | 4 ++-- tools/testing/selftests/net/udpgro_bench.sh | 4 ++-- tools/testing/selftests/net/udpgro_frglist.sh | 6 +++--- tools/testing/selftests/net/udpgro_fwd.sh | 2 +- tools/testing/selftests/net/veth.sh | 4 ++-- tools/testing/selftests/net/xdp_dummy.c | 13 + 7 files changed, 26 insertions(+), 12 deletions(-) create mode 100644 tools/testing/selftests/net/xdp_dummy.c diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 50818075e566..304d8b852ef0 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -84,6 +84,7 @@ TEST_PROGS += sctp_vrf.sh TEST_GEN_FILES += sctp_hello TEST_GEN_FILES += csum TEST_GEN_FILES += nat6to4.o +TEST_GEN_FILES += xdp_dummy.o TEST_GEN_FILES += ip_local_port_range TEST_GEN_FILES += bind_wildcard TEST_PROGS += test_vxlan_mdb.sh @@ -104,7 +105,7 @@ $(OUTPUT)/tcp_inq: LDLIBS += -lpthread $(OUTPUT)/bind_bhash: LDLIBS += -lpthread $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/ -# Rules to generate bpf obj nat6to4.o +# Rules to generate bpf objs CLANG ?= clang SCRATCH_DIR := $(OUTPUT)/tools BUILD_DIR := $(SCRATCH_DIR)/build @@ -139,7 +140,7 @@ endif CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH)) -$(OUTPUT)/nat6to4.o: nat6to4.c $(BPFOBJ) | $(MAKE_DIRS) +$(OUTPUT)/nat6to4.o $(OUTPUT)/xdp_dummy.o: $(OUTPUT)/%.o : %.c $(BPFOBJ) | $(MAKE_DIRS) $(CLANG) -O2 --target=bpf -c $< $(CCINCLUDE) $(CLANG_SYS_INCLUDES) -o $@ $(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)\ diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh index af5dc57c8ce9..8802604148dd 100755 --- a/tools/testing/selftests/net/udpgro.sh +++ b/tools/testing/selftests/net/udpgro.sh @@ -7,7 +7,7 @@ source net_helper.sh readonly PEER_NS="ns-peer-$(mktemp -u XX)" -BPF_FILE="../bpf/xdp_dummy.bpf.o" +BPF_FILE="xdp_dummy.o" # set global exit status, but never reset nonzero one. check_err() @@ -197,7 +197,7 @@ run_all() { } if [ ! -f ${BPF_FILE} ]; then - echo "Missing ${BPF_FILE}. Build bpf selftest first" + echo "Missing ${BPF_FILE}. Run 'make' first" exit -1 fi diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh index cb664679b434..7080eae5312b 100755 --- a/tools/testing/selftests/net/udpgro_bench.sh +++ b/tools/testing/selftests/net/udpgro_bench.sh @@ -7,7 +7,7 @@ source net_helper.sh readonly PEER_NS="ns-peer-$(mktemp -u XX)" -BPF_FILE="../bpf/xdp_dummy.bpf.o" +BPF_FILE="xdp_dummy.o" cleanup() { local -r jobs="$(jobs -p)" @@ -84,7 +84,7 @@ run_all() { } if [ ! -f ${BPF_FILE} ]; then - echo "Missing ${BPF_FILE}. Build bpf selftest first" + echo "Missing ${BPF_FILE}. Run 'make' first" exit -1 fi diff --git a/tools/testing/selftests/net/udpgro_frglist.sh b/tools/testing/selftests/net/udpgro_frglist.sh index dd47fa96f6b3..e1ff645bd3d1 100755 --- a/tools/testing/selftests/net/udpgro_frglist.sh +++ b/tools/testing/selftests/net/udpgro_frglist.sh @@ -7,7 +7,7 @@ source net_helper.sh readonly PEER_NS="ns-peer-$(mktemp -u XX)" -BPF_FILE="../bpf/xdp_dummy.bpf.o" +BPF_FILE="xdp_dummy.o" cleanup() { local -r jobs="$(jobs -p)" @@ -85,12 +85,12 @@ run_all() { } if [ ! -f ${BPF_FILE} ]; then - echo "Missing ${BPF_FILE}. Build bpf selftest first" + echo "Missing ${BPF_FILE}. Run 'make' first" exit -1 fi if [ ! -f nat6to4.o ]; then - echo "Missing nat6to4 helper. Build bpf nat6to4.o selftest first" + echo "Missing nat6to4 helper. Run 'make' first" exit -1 fi diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh index c079565add39..5fa8659ab13d 100755 --- a/tools/testing/selftests/net/udpgro_fwd.sh +++ b/tools/testing/selftests/net/udpgro_fwd.sh @@ -1,7 +1,7 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 -BPF_FILE="../bpf/xdp_dummy.bpf.o" +BPF_FILE="xdp_dummy.o" readonly BASE="ns-$(mktemp -u XX)" readonly SRC=2 readonly DST=1 diff --git a/tools/testing/selftests/net/veth.sh b/tools/testing/selftests/net/vet
[PATCH net 3/3] selftests: net: explicitly wait for listener ready
The UDP GRO forwarding test still hard-code an arbitrary pause to wait for the UDP listener becoming ready in background. That causes sporadic failures depending on the host load. Replace the sleep with the existing helper waiting for the desired port being exposed. Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/udpgro_fwd.sh | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh index 5fa8659ab13d..d6b9c759043c 100755 --- a/tools/testing/selftests/net/udpgro_fwd.sh +++ b/tools/testing/selftests/net/udpgro_fwd.sh @@ -1,6 +1,8 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 +source net_helper.sh + BPF_FILE="xdp_dummy.o" readonly BASE="ns-$(mktemp -u XX)" readonly SRC=2 @@ -119,7 +121,7 @@ run_test() { ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 8000 ip netns exec $NS_DST ./udpgso_bench_rx -C 1000 -R 10 -n 10 -l 1300 $rx_args & local spid=$! - sleep 0.1 + wait_local_port_listen "$NS_DST" 8000 udp ip netns exec $NS_SRC ./udpgso_bench_tx $family -M 1 -s 13000 -S 1300 -D $dst local retc=$? wait $spid @@ -168,7 +170,7 @@ run_bench() { ip netns exec $NS_DST bash -c "echo 2 > /sys/class/net/veth$DST/queues/rx-0/rps_cpus" ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 1000 -R 10 & local spid=$! - sleep 0.1 + wait_local_port_listen "$NS_DST" 8000 udp ip netns exec $NS_SRC taskset 0x1 ./udpgso_bench_tx $family -l 3 -S 1300 -D $dst local retc=$? wait $spid -- 2.43.0
[PATCH net 2/3] selftests: net: included needed helper in the install targets
The blamed commit below introduce a dependency in some net self-tests towards a newly introduce helper script. Such script is currently not included into the TEST_PROGS_EXTENDED list and thus is not installed, causing failure for the relevant tests when executed from the install dir. Fix the issue updating the install targets. Fixes: 3bdd9fd29cb0 ("selftests/net: synchronize udpgro tests' tx and rx connection") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 304d8b852ef0..48c6f93b8149 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -55,6 +55,7 @@ TEST_PROGS += rps_default_mask.sh TEST_PROGS += big_tcp.sh TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh lib.sh +TEST_PROGS_EXTENDED += net_helper.sh TEST_GEN_FILES = socket nettest TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd txring_overwrite -- 2.43.0
Re: [PATCH net 1/3] selftests: net: remove dependency on ebpf tests
On Wed, 2024-01-24 at 20:10 -0500, Willem de Bruijn wrote: > Paolo Abeni wrote: > > Several net tests requires an XDP program build under the ebpf > > directory, and error out if such program is not available. > > > > That makes running successful net test hard, let's duplicate into the > > net dir the [very small] program, re-using the existing rules to build > > it, and finally dropping the bogus dependency. > > > > Signed-off-by: Paolo Abeni > > --- > > tools/testing/selftests/net/Makefile | 5 +++-- > > tools/testing/selftests/net/udpgro.sh | 4 ++-- > > tools/testing/selftests/net/udpgro_bench.sh | 4 ++-- > > tools/testing/selftests/net/udpgro_frglist.sh | 6 +++--- > > tools/testing/selftests/net/udpgro_fwd.sh | 2 +- > > tools/testing/selftests/net/veth.sh | 4 ++-- > > tools/testing/selftests/net/xdp_dummy.c | 13 + > > 7 files changed, 26 insertions(+), 12 deletions(-) > > create mode 100644 tools/testing/selftests/net/xdp_dummy.c > > > > diff --git a/tools/testing/selftests/net/Makefile > > b/tools/testing/selftests/net/Makefile > > index 50818075e566..304d8b852ef0 100644 > > --- a/tools/testing/selftests/net/Makefile > > +++ b/tools/testing/selftests/net/Makefile > > @@ -84,6 +84,7 @@ TEST_PROGS += sctp_vrf.sh > > TEST_GEN_FILES += sctp_hello > > TEST_GEN_FILES += csum > > TEST_GEN_FILES += nat6to4.o > > +TEST_GEN_FILES += xdp_dummy.o > > TEST_GEN_FILES += ip_local_port_range > > TEST_GEN_FILES += bind_wildcard > > TEST_PROGS += test_vxlan_mdb.sh > > @@ -104,7 +105,7 @@ $(OUTPUT)/tcp_inq: LDLIBS += -lpthread > > $(OUTPUT)/bind_bhash: LDLIBS += -lpthread > > $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/ > > > > -# Rules to generate bpf obj nat6to4.o > > +# Rules to generate bpf objs > > CLANG ?= clang > > SCRATCH_DIR := $(OUTPUT)/tools > > BUILD_DIR := $(SCRATCH_DIR)/build > > @@ -139,7 +140,7 @@ endif > > > > CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH)) > > > > -$(OUTPUT)/nat6to4.o: nat6to4.c $(BPFOBJ) | $(MAKE_DIRS) > > +$(OUTPUT)/nat6to4.o $(OUTPUT)/xdp_dummy.o: $(OUTPUT)/%.o : %.c $(BPFOBJ) | > > $(MAKE_DIRS) > > $(CLANG) -O2 --target=bpf -c $< $(CCINCLUDE) $(CLANG_SYS_INCLUDES) -o $@ > > is the "$(OUTPUT)/%.o :" intentional or a leftover from editing? Is intentional and AFAICS required to let this rule being selected when the output directory is not an empty string (the target and the pre-req will be in different directories). Cheers, Paolo
[PATCH net] selftests: net: add missing required classifier
the udpgro_fraglist self-test uses the BPF classifiers, but the current net self-test configuration does not include it, causing CI failures: # selftests: net: udpgro_frglist.sh # ipv6 # tcp - over veth touching data # -l 4 -6 -D 2001:db8::1 -t rx -4 -t # Error: TC classifier not found. # We have an error talking to the kernel # Error: TC classifier not found. # We have an error talking to the kernel Add the missing knob. Fixes: edae34a3ed92 ("selftests net: add UDP GRO fraglist + bpf self-tests") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/config | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 8da562a9ae87..ca4423ee6dc9 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -42,6 +42,7 @@ CONFIG_MPLS_ROUTING=m CONFIG_MPLS_IPTUNNEL=m CONFIG_NET_SCH_INGRESS=m CONFIG_NET_CLS_FLOWER=m +CONFIG_NET_CLS_BPF=m CONFIG_NET_ACT_TUNNEL_KEY=m CONFIG_NET_ACT_MIRRED=m CONFIG_BAREUDP=m -- 2.43.0
Re: [PATCH net] selftests: net: add missing required classifier
On Thu, 2024-01-25 at 09:48 +0100, Eric Dumazet wrote: > On Thu, Jan 25, 2024 at 9:23 AM Paolo Abeni wrote: > > > > the udpgro_fraglist self-test uses the BPF classifiers, but the > > current net self-test configuration does not include it, causing > > CI failures: > > > > # selftests: net: udpgro_frglist.sh > > # ipv6 > > # tcp - over veth touching data > > # -l 4 -6 -D 2001:db8::1 -t rx -4 -t > > # Error: TC classifier not found. > > # We have an error talking to the kernel > > # Error: TC classifier not found. > > # We have an error talking to the kernel > > > > Add the missing knob. > > > > Fixes: edae34a3ed92 ("selftests net: add UDP GRO fraglist + bpf self-tests") > > Signed-off-by: Paolo Abeni > > FYI, while looking at the gro test, I found that using strace was > making it failing as well. It looks like the gro.sh (large) tests send the to-be-aggregate segments individually and relay on the gro flush timeout being large enough to fit all the relevant write operations. I suspect/hope something alike: --- diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh index a9a1759e035c..1f78a87f6f37 100644 --- a/tools/testing/selftests/net/setup_veth.sh +++ b/tools/testing/selftests/net/setup_veth.sh @@ -11,7 +11,7 @@ setup_veth_ns() { local -r ns_mac="$4" [[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}" - echo 10 > "/sys/class/net/${ns_dev}/gro_flush_timeout" + echo 100 > "/sys/class/net/${ns_dev}/gro_flush_timeout" ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535 ip -netns "${ns_name}" link set dev "${ns_dev}" up --- should solve the sporadic issues. > Not sure about this one... All the udpgro* test should write a single UDP GSO packet and let the veth segmenting it, hopefully slowing down either ends should not impact them - but I did not check yet! Perhaps even the gro.sh tests could be modified alike? Cheers, Paolo
Re: [PATCH net 1/3] selftests: net: remove dependency on ebpf tests
On Thu, 2024-01-25 at 09:27 -0500, Willem de Bruijn wrote: > Paolo Abeni wrote: > > On Wed, 2024-01-24 at 20:10 -0500, Willem de Bruijn wrote: > > > Paolo Abeni wrote: > > > > Several net tests requires an XDP program build under the ebpf > > > > directory, and error out if such program is not available. > > > > > > > > That makes running successful net test hard, let's duplicate into the > > > > net dir the [very small] program, re-using the existing rules to build > > > > it, and finally dropping the bogus dependency. > > > > > > > > Signed-off-by: Paolo Abeni > > > > --- > > > > tools/testing/selftests/net/Makefile | 5 +++-- > > > > tools/testing/selftests/net/udpgro.sh | 4 ++-- > > > > tools/testing/selftests/net/udpgro_bench.sh | 4 ++-- > > > > tools/testing/selftests/net/udpgro_frglist.sh | 6 +++--- > > > > tools/testing/selftests/net/udpgro_fwd.sh | 2 +- > > > > tools/testing/selftests/net/veth.sh | 4 ++-- > > > > tools/testing/selftests/net/xdp_dummy.c | 13 + > > > > 7 files changed, 26 insertions(+), 12 deletions(-) > > > > create mode 100644 tools/testing/selftests/net/xdp_dummy.c > > > > > > > > diff --git a/tools/testing/selftests/net/Makefile > > > > b/tools/testing/selftests/net/Makefile > > > > index 50818075e566..304d8b852ef0 100644 > > > > --- a/tools/testing/selftests/net/Makefile > > > > +++ b/tools/testing/selftests/net/Makefile > > > > @@ -84,6 +84,7 @@ TEST_PROGS += sctp_vrf.sh > > > > TEST_GEN_FILES += sctp_hello > > > > TEST_GEN_FILES += csum > > > > TEST_GEN_FILES += nat6to4.o > > > > +TEST_GEN_FILES += xdp_dummy.o > > > > TEST_GEN_FILES += ip_local_port_range > > > > TEST_GEN_FILES += bind_wildcard > > > > TEST_PROGS += test_vxlan_mdb.sh > > > > @@ -104,7 +105,7 @@ $(OUTPUT)/tcp_inq: LDLIBS += -lpthread > > > > $(OUTPUT)/bind_bhash: LDLIBS += -lpthread > > > > $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/ > > > > > > > > -# Rules to generate bpf obj nat6to4.o > > > > +# Rules to generate bpf objs > > > > CLANG ?= clang > > > > SCRATCH_DIR := $(OUTPUT)/tools > > > > BUILD_DIR := $(SCRATCH_DIR)/build > > > > @@ -139,7 +140,7 @@ endif > > > > > > > > CLANG_SYS_INCLUDES = $(call > > > > get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH)) > > > > > > > > -$(OUTPUT)/nat6to4.o: nat6to4.c $(BPFOBJ) | $(MAKE_DIRS) > > > > +$(OUTPUT)/nat6to4.o $(OUTPUT)/xdp_dummy.o: $(OUTPUT)/%.o : %.c > > > > $(BPFOBJ) | $(MAKE_DIRS) > > > > $(CLANG) -O2 --target=bpf -c $< $(CCINCLUDE) > > > > $(CLANG_SYS_INCLUDES) -o $@ > > > > > > is the "$(OUTPUT)/%.o :" intentional or a leftover from editing? > > > > Is intentional and AFAICS required to let this rule being selected when > > the output directory is not an empty string (the target and the pre-req > > will be in different directories). > > Thanks. I don't understand why. Sorry to harp on this small point, but > you've verified that the build fails without? Is it perhaps due to that > "$(MAKE_DIRS)" order-only-prerequisite? But nat6to4 on its own did not > need this. I tried quite a bit of permutation (all others failing) before selecting this one (shame on me, with a stackoverflow hint [!!!]). But I finally found the relevant documentation reference: https://www.gnu.org/software/make/manual/make.html#Static-Pattern A simpler wildcard rule would not be enough, as the already existing wildcard used to build plain c files will take precedence. nat6to4 did not need this fancy syntax, as it was a simple, single target single pre-req rule - that takes precedence on the mentioned wildcard. Please let me know if the above clarifies a bit the scenario. Cheers, Paolo
Re: [PATCH net] selftests: net: add missing required classifier
On Thu, 2024-01-25 at 15:10 +0100, Eric Dumazet wrote: > On Thu, Jan 25, 2024 at 12:38 PM Paolo Abeni wrote: > > > > On Thu, 2024-01-25 at 09:48 +0100, Eric Dumazet wrote: > > > On Thu, Jan 25, 2024 at 9:23 AM Paolo Abeni wrote: > > > > > > > > the udpgro_fraglist self-test uses the BPF classifiers, but the > > > > current net self-test configuration does not include it, causing > > > > CI failures: > > > > > > > > # selftests: net: udpgro_frglist.sh > > > > # ipv6 > > > > # tcp - over veth touching data > > > > # -l 4 -6 -D 2001:db8::1 -t rx -4 -t > > > > # Error: TC classifier not found. > > > > # We have an error talking to the kernel > > > > # Error: TC classifier not found. > > > > # We have an error talking to the kernel > > > > > > > > Add the missing knob. > > > > > > > > Fixes: edae34a3ed92 ("selftests net: add UDP GRO fraglist + bpf > > > > self-tests") > > > > Signed-off-by: Paolo Abeni > > > > > > FYI, while looking at the gro test, I found that using strace was > > > making it failing as well. > > > > It looks like the gro.sh (large) tests send the to-be-aggregate > > segments individually and relay on the gro flush timeout being large > > enough to fit all the relevant write operations. I suspect/hope > > something alike: > > > > --- > > diff --git a/tools/testing/selftests/net/setup_veth.sh > > b/tools/testing/selftests/net/setup_veth.sh > > index a9a1759e035c..1f78a87f6f37 100644 > > --- a/tools/testing/selftests/net/setup_veth.sh > > +++ b/tools/testing/selftests/net/setup_veth.sh > > @@ -11,7 +11,7 @@ setup_veth_ns() { > > local -r ns_mac="$4" > > > > [[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}" > > - echo 10 > "/sys/class/net/${ns_dev}/gro_flush_timeout" > > + echo 100 > "/sys/class/net/${ns_dev}/gro_flush_timeout" > > ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535 > > ip -netns "${ns_name}" link set dev "${ns_dev}" up > > --- > > should solve the sporadic issues. > > I think you are right. > > I tried multiple values, and found 600,000 was not enough in some cases. > > With 1,000,000, I was able to run the test (with the strace overhead) > 100 times without a single failure. Thank you for testing! Do you prefer I'll send the formal patch or do you prefer otherwise? Cheers, Paolo
[PATCH net] selftests: net: give more time for GRO aggregation
The gro.sh test-case relay on the gro_flush_timeout to ensure that all the segments belonging to any given batch are properly aggregated. The other end, the sender is a user-space program transmitting each packet with a separate write syscall. A busy host and/or stracing the sender program can make the relevant segments reach the GRO engine after the flush timeout triggers. Give the GRO flush timeout more slack, to avoid sporadic self-tests failures. Fixes: 9af771d2ec04 ("selftests/net: allow GRO coalesce test on veth") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/setup_veth.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh index a9a1759e035c..1f78a87f6f37 100644 --- a/tools/testing/selftests/net/setup_veth.sh +++ b/tools/testing/selftests/net/setup_veth.sh @@ -11,7 +11,7 @@ setup_veth_ns() { local -r ns_mac="$4" [[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}" - echo 10 > "/sys/class/net/${ns_dev}/gro_flush_timeout" + echo 100 > "/sys/class/net/${ns_dev}/gro_flush_timeout" ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535 ip -netns "${ns_name}" link set dev "${ns_dev}" up -- 2.43.0
[PATCH net] selftests: net: add missing config for big tcp tests
The big_tcp test-case requires a few kernel knobs currently not specified in the net selftests config, causing the following failure: # selftests: net: big_tcp.sh # Error: Failed to load TC action module. # We have an error talking to the kernel ... # Testing for BIG TCP: # CLI GSO | GW GRO | GW GSO | SER GRO # ./big_tcp.sh: line 107: test: !=: unary operator expected ... # onon on on : [FAIL_on_link1] Add the missing configs Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/config | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index b511f43cd197..c0e8482f82d3 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -29,6 +29,7 @@ CONFIG_NF_NAT=m CONFIG_IP6_NF_IPTABLES=m CONFIG_IP_NF_IPTABLES=m CONFIG_IP6_NF_NAT=m +CONFIG_IP6_NF_RAW=m CONFIG_IP_NF_NAT=m CONFIG_IPV6_GRE=m CONFIG_IPV6_SEG6_LWTUNNEL=y @@ -41,9 +42,11 @@ CONFIG_MACVLAN=y CONFIG_MACVTAP=y CONFIG_MPLS=y CONFIG_MPTCP=y +CONFIG_IP_NF_RAW=m CONFIG_NF_TABLES=m CONFIG_NF_TABLES_IPV6=y CONFIG_NF_TABLES_IPV4=y +CONFIG_NF_FLOW_TABLE=m CONFIG_NFT_NAT=m CONFIG_NET_ACT_GACT=m CONFIG_NET_CLS_BASIC=m @@ -71,6 +74,7 @@ CONFIG_NET_CLS_FLOWER=m CONFIG_NET_CLS_BPF=m CONFIG_NET_ACT_TUNNEL_KEY=m CONFIG_NET_ACT_MIRRED=m +CONFIG_NET_ACT_CT=m CONFIG_BAREUDP=m CONFIG_IPV6_IOAM6_LWTUNNEL=y CONFIG_CRYPTO_SM4_GENERIC=y @@ -79,5 +83,6 @@ CONFIG_TUN=y CONFIG_VXLAN=m CONFIG_IP_SCTP=m CONFIG_NETFILTER_XT_MATCH_POLICY=m +CONFIG_NETFILTER_XT_MATCH_LENGTH=m CONFIG_CRYPTO_ARIA=y CONFIG_XFRM_INTERFACE=m -- 2.43.0
Re: [PATCH net] selftests: net: add missing config for big tcp tests
On Fri, 2024-01-26 at 11:18 -0500, Aaron Conole wrote: > Paolo Abeni writes: > > > The big_tcp test-case requires a few kernel knobs currently > > not specified in the net selftests config, causing the > > following failure: > > > > # selftests: net: big_tcp.sh > > # Error: Failed to load TC action module. > > # We have an error talking to the kernel > > ... > > # Testing for BIG TCP: > > # CLI GSO | GW GRO | GW GSO | SER GRO > > # ./big_tcp.sh: line 107: test: !=: unary operator expected > > ... > > # onon on on : [FAIL_on_link1] > > > > Add the missing configs > > > > Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") > > Signed-off-by: Paolo Abeni > > --- > > Thanks for the fix. > > Maybe we should also add the config for NET_ACT_CT since we will > invoke it on setup. I guess there's some dependency that must be > pulling it in for us already so we don't explicitly call for it, but we > do require it in setup() if I understand correctly. This patch already adds NET_ACT_CT=m: > @@ -71,6 +74,7 @@ CONFIG_NET_CLS_FLOWER=m > CONFIG_NET_CLS_BPF=m > CONFIG_NET_ACT_TUNNEL_KEY=m > CONFIG_NET_ACT_MIRRED=m > +CONFIG_NET_ACT_CT=m > CONFIG_BAREUDP=m > CONFIG_IPV6_IOAM6_LWTUNNEL=y > CONFIG_CRYPTO_SM4_GENERIC=y Cheers, Paolo
Re: [PATCH net] selftests: net: add missing config for big tcp tests
On Fri, 2024-01-26 at 11:55 -0800, Jakub Kicinski wrote: > On Fri, 26 Jan 2024 16:32:36 +0100 Paolo Abeni wrote: > > The big_tcp test-case requires a few kernel knobs currently > > not specified in the net selftests config, causing the > > following failure: > > > > # selftests: net: big_tcp.sh > > # Error: Failed to load TC action module. > > # We have an error talking to the kernel > > ... > > # Testing for BIG TCP: > > # CLI GSO | GW GRO | GW GSO | SER GRO > > # ./big_tcp.sh: line 107: test: !=: unary operator expected > > ... > > # onon on on : [FAIL_on_link1] > > > > Add the missing configs > > > > Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") > > Signed-off-by: Paolo Abeni > > Ah, great, I was missing RF_RAW in the local hack. > I applied manually because looks like this change is on top of > something: > > patching file tools/testing/selftests/net/config > Hunk #3 succeeded at 73 with fuzz 1 (offset -1 lines). > Hunk #4 succeeded at 82 (offset -1 lines). Ooops... yes, indeed it's on top of the few patches I sent in the past days. > While at it I reordered the values a little bit to be closer to what > I think would get us closer to alphasort. Hope you don't mind. Sure thing I don't mind! I'm sorry for the extra work on you. Cheers, Paolo
Re: [PATCH net] selftests: net: add missing config for big tcp tests
On Mon, 2024-01-29 at 10:11 +0100, Paolo Abeni wrote: > On Fri, 2024-01-26 at 11:55 -0800, Jakub Kicinski wrote: > > On Fri, 26 Jan 2024 16:32:36 +0100 Paolo Abeni wrote: > > > The big_tcp test-case requires a few kernel knobs currently > > > not specified in the net selftests config, causing the > > > following failure: > > > > > > # selftests: net: big_tcp.sh > > > # Error: Failed to load TC action module. > > > # We have an error talking to the kernel > > > ... > > > # Testing for BIG TCP: > > > # CLI GSO | GW GRO | GW GSO | SER GRO > > > # ./big_tcp.sh: line 107: test: !=: unary operator expected > > > ... > > > # onon on on : [FAIL_on_link1] > > > > > > Add the missing configs > > > > > > Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") > > > Signed-off-by: Paolo Abeni > > > > Ah, great, I was missing RF_RAW in the local hack. > > I applied manually because looks like this change is on top of > > something: > > > > patching file tools/testing/selftests/net/config > > Hunk #3 succeeded at 73 with fuzz 1 (offset -1 lines). > > Hunk #4 succeeded at 82 (offset -1 lines). > > Ooops... yes, indeed it's on top of the few patches I sent in the past > days. > > > While at it I reordered the values a little bit to be closer to what > > I think would get us closer to alphasort. Hope you don't mind. > > Sure thing I don't mind! I'm sorry for the extra work on you. Uhm... while the self-test doesn't emit anymore the message related to the missing modules, it still fails in the CI env and I can't reproduce the failures in my local env (the same for the gro.sh script). If I understand correctly, the tests run under double virtualization (a VM on top AWS?), is that correct? I guess the extra slowdown/overhead will need more care. Thanks, Paolo
[PATCH net 0/3] selftests: net: a few pmtu.sh fixes
This series try to address CI failures for the pmtu.sh tests. It does _not_ attempt to enable all the currently skipped cases, to avoid adding more entropy. Tested with: make -C tools/testing/selftests/ TARGETS=net install vng --build --config tools/testing/selftests/net/config vng --run . --user root -- \ ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t net:pmtu.sh Paolo Abeni (3): selftests: net: add missing config for pmtu.sh tests selftests: net: fix available tunnels detection selftests: net: don't access /dev/stdout in pmtu.sh tools/testing/selftests/net/config | 3 +++ tools/testing/selftests/net/pmtu.sh | 18 +- 2 files changed, 12 insertions(+), 9 deletions(-) -- 2.43.0
[PATCH net 1/3] selftests: net: add missing config for pmtu.sh tests
The mentioned test uses a few Kconfig still missing the net config, add them. Before: # Error: Specified qdisc kind is unknown. # Error: Specified qdisc kind is unknown. # Error: Qdisc not classful. # We have an error talking to the kernel # Error: Qdisc not classful. # We have an error talking to the kernel # policy_routing not supported # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [SKIP] After: # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [ OK ] Fixes: ec730c3e1f0e ("selftest: net: Test IPv4 PMTU exceptions with DSCP and ECN") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/config | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index d4b38177ed04..88835cdbf0ea 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -60,6 +60,7 @@ CONFIG_NET_SCH_HTB=m CONFIG_NET_SCH_FQ=m CONFIG_NET_SCH_ETF=m CONFIG_NET_SCH_NETEM=y +CONFIG_NET_SCH_PRIO=m CONFIG_NFT_COMPAT=m CONFIG_NF_FLOW_TABLE=m CONFIG_PSAMPLE=m @@ -77,6 +78,8 @@ CONFIG_NET_SCH_INGRESS=m CONFIG_NET_CLS_FLOWER=m CONFIG_NET_ACT_TUNNEL_KEY=m CONFIG_NET_ACT_MIRRED=m +CONFIG_NET_ACT_PEDIT=m +CONFIG_NET_ACT_CSUM=m CONFIG_BAREUDP=m CONFIG_IPV6_IOAM6_LWTUNNEL=y CONFIG_CRYPTO_SM4_GENERIC=y -- 2.43.0
[PATCH net 2/3] selftests: net: fix available tunnels detection
The pmtu.sh test tries to detect the tunnel protocols available in the running kernel and properly skip the unsupported cases. In a few more complex setup, such detection is unsuccessful, as the script currently ignores some intermediate error code at setup time. Before: # which: no nettest in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin) # TEST: vti6: PMTU exceptions (ESP-in-UDP)[FAIL] # PMTU exception wasn't created after creating tunnel exceeding link layer MTU # ./pmtu.sh: line 931: kill: (7543) - No such process # ./pmtu.sh: line 931: kill: (7544) - No such process After: # xfrm4 not supported # TEST: vti4: PMTU exceptions [SKIP] Fixes: ece1278a9b81 ("selftests: net: add ESP-in-UDP PMTU test") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/pmtu.sh | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index f10879788f61..31892b366913 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -707,23 +707,23 @@ setup_xfrm6() { } setup_xfrm4udp() { - setup_xfrm 4 ${veth4_a_addr} ${veth4_b_addr} "encap espinudp 4500 4500 0.0.0.0" - setup_nettest_xfrm 4 4500 + setup_xfrm 4 ${veth4_a_addr} ${veth4_b_addr} "encap espinudp 4500 4500 0.0.0.0" && \ + setup_nettest_xfrm 4 4500 } setup_xfrm6udp() { - setup_xfrm 6 ${veth6_a_addr} ${veth6_b_addr} "encap espinudp 4500 4500 0.0.0.0" - setup_nettest_xfrm 6 4500 + setup_xfrm 6 ${veth6_a_addr} ${veth6_b_addr} "encap espinudp 4500 4500 0.0.0.0" && \ + setup_nettest_xfrm 6 4500 } setup_xfrm4udprouted() { - setup_xfrm 4 ${prefix4}.${a_r1}.1 ${prefix4}.${b_r1}.1 "encap espinudp 4500 4500 0.0.0.0" - setup_nettest_xfrm 4 4500 + setup_xfrm 4 ${prefix4}.${a_r1}.1 ${prefix4}.${b_r1}.1 "encap espinudp 4500 4500 0.0.0.0" && \ + setup_nettest_xfrm 4 4500 } setup_xfrm6udprouted() { - setup_xfrm 6 ${prefix6}:${a_r1}::1 ${prefix6}:${b_r1}::1 "encap espinudp 4500 4500 0.0.0.0" - setup_nettest_xfrm 6 4500 + setup_xfrm 6 ${prefix6}:${a_r1}::1 ${prefix6}:${b_r1}::1 "encap espinudp 4500 4500 0.0.0.0" && \ + setup_nettest_xfrm 6 4500 } setup_routing_old() { -- 2.43.0
[PATCH net 3/3] selftests: net: don't access /dev/stdout in pmtu.sh
When running the pmtu.sh via the kselftest infra, accessing /dev/stdout gives unexpected results: # dd: failed to open '/dev/stdout': Device or resource busy # TEST: IPv4, bridged vxlan4: PMTU exceptions [FAIL] Let dd use directly the standard output to fix the above: # TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects [ OK ] Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/pmtu.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 31892b366913..3f118e3f1c66 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -1339,7 +1339,7 @@ test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() { sleep 1 - dd if=/dev/zero of=/dev/stdout status=none bs=1M count=1 | ${target} socat -T 3 -u STDIN $TCPDST,connect-timeout=3 + dd if=/dev/zero status=none bs=1M count=1 | ${target} socat -T 3 -u STDIN $TCPDST,connect-timeout=3 size=$(du -sb $tmpoutfile) size=${size%%/tmp/*} -- 2.43.0
Re: [PATCH net] selftests: net: add missing config for big tcp tests
On Mon, 2024-01-29 at 08:39 -0800, Jakub Kicinski wrote: > On Mon, 29 Jan 2024 17:31:33 +0100 Paolo Abeni wrote: > > Uhm... while the self-test doesn't emit anymore the message related to > > the missing modules, it still fails in the CI env and I can't reproduce > > the failures in my local env (the same for the gro.sh script). > > > > If I understand correctly, the tests run under double virtualization (a > > VM on top AWS?), is that correct? I guess the extra slowdown/overhead > > will need more care. > > Yes, it's VM inside a VM without nested virtualization support. > A weird setup, granted, but when we move to bare metal I'd like > to enable KASAN, which will probably cause a similar slowdown.. > > You could possibly get a similar slowdown by disabling HW virt / > KVM? Thanks, the above helped - that is, I can reproduce the failure running the self-tests in a VM with KVM disabled in the host. Funnily enough I can't use plain virtme for that - the virtme VM crashes on boot, possibly due to the wrong 'machine' argument passed to qemu. In any case I can't see a sane way to cope with such slow environments except skipping the sensitive cases. > FWIW far the 4 types of issues we've seen were: > - config missing > - OS doesn't ifup by default > - OS tools are old / buggy > - VM-in-VM is just too slow. > > There's a bunch of failures in forwarding which look like perf issues. > I wonder if we should introduce something in the settings file to let > tests know that they are running in very slow env? I was wondering about passing such info to the test e.g. via an env variable: vng --run . --user root -- HOST_IS_DAMN_SLOW=true ./tools/testing/selftests/kselftest_install/run_kselftest.sh -t In any case some tests should be updated to skip the relevant cases accordingly, right? Cheers, Paolo
[PATCH net 0/3] selftests: net: more fixes
Another small bunch of fixes, addressing issues outlined by the netdev CI. Paolo Abeni (3): selftests: net: cut more slack for gro fwd tests. selftests: net: fix setup_ns usage in rtnetlink.sh selftests: net: enable some more knobs tools/testing/selftests/net/config| 3 +++ tools/testing/selftests/net/rtnetlink.sh | 6 ++ tools/testing/selftests/net/udpgro_fwd.sh | 14 -- tools/testing/selftests/net/udpgso_bench_rx.c | 2 +- 4 files changed, 18 insertions(+), 7 deletions(-) -- 2.43.0
[PATCH net 2/3] selftests: net: fix setup_ns usage in rtnetlink.sh
The setup_ns helper marks the testns global variable as readonly. Later attempts to set such variable are unsuccessful, causing a couple test failures. Avoid completely the variable re-initialization and let the function access the global value. Fixes: ("selftests: rtnetlink: use setup_ns in bonding test") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/rtnetlink.sh | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 4667d74579d1..874a2952aa8e 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -440,7 +440,6 @@ kci_test_encap_vxlan() local ret=0 vxlan="test-vxlan0" vlan="test-vlan0" - testns="$1" run_cmd ip -netns "$testns" link add "$vxlan" type vxlan id 42 group 239.1.1.1 \ dev "$devdummy" dstport 4789 if [ $? -ne 0 ]; then @@ -485,7 +484,6 @@ kci_test_encap_fou() { local ret=0 name="test-fou" - testns="$1" run_cmd_grep 'Usage: ip fou' ip fou help if [ $? -ne 0 ];then end_test "SKIP: fou: iproute2 too old" @@ -526,8 +524,8 @@ kci_test_encap() run_cmd ip -netns "$testns" link set lo up run_cmd ip -netns "$testns" link add name "$devdummy" type dummy run_cmd ip -netns "$testns" link set "$devdummy" up - run_cmd kci_test_encap_vxlan "$testns" - run_cmd kci_test_encap_fou "$testns" + run_cmd kci_test_encap_vxlan + run_cmd kci_test_encap_fou ip netns del "$testns" return $ret -- 2.43.0
[PATCH net 1/3] selftests: net: cut more slack for gro fwd tests.
The udpgro_fwd.sh self-tests are somewhat unstable. There are a few timing constraints the we struggle to meet on very slow environments. Instead of skipping the whole tests in such envs, increase the test resilience WRT very slow hosts: increase the inter-packets timeouts, avoid resetting the counters every second and finally disable reduce the background traffic noise. Tested with: for I in $(seq 1 100); do ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t net:udpgro_fwd.sh || exit -1 done in a slow environment. Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/udpgro_fwd.sh | 14 -- tools/testing/selftests/net/udpgso_bench_rx.c | 2 +- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh index d6b9c759043c..9cd5e885e91f 100755 --- a/tools/testing/selftests/net/udpgro_fwd.sh +++ b/tools/testing/selftests/net/udpgro_fwd.sh @@ -39,6 +39,10 @@ create_ns() { for ns in $NS_SRC $NS_DST; do ip netns add $ns ip -n $ns link set dev lo up + + # disable route solicitations to decrease 'noise' traffic + ip netns exec $ns sysctl -qw net.ipv6.conf.default.router_solicitations=0 + ip netns exec $ns sysctl -qw net.ipv6.conf.all.router_solicitations=0 done ip link add name veth$SRC type veth peer name veth$DST @@ -80,6 +84,12 @@ create_vxlan_pair() { create_vxlan_endpoint $BASE$ns veth$ns $BM_NET_V6$((3 - $ns)) vxlan6$ns 6 ip -n $BASE$ns addr add dev vxlan6$ns $OL_NET_V6$ns/24 nodad done + + # preload neighbur cache, do avoid some noisy traffic + local addr_dst=$(ip -j -n $BASE$DST link show dev vxlan6$DST |jq -r '.[]["address"]') + local addr_src=$(ip -j -n $BASE$SRC link show dev vxlan6$SRC |jq -r '.[]["address"]') + ip -n $BASE$DST neigh add dev vxlan6$DST lladdr $addr_src $OL_NET_V6$SRC + ip -n $BASE$SRC neigh add dev vxlan6$SRC lladdr $addr_dst $OL_NET_V6$DST } is_ipv6() { @@ -119,7 +129,7 @@ run_test() { # not enable GRO ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 4789 ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 8000 - ip netns exec $NS_DST ./udpgso_bench_rx -C 1000 -R 10 -n 10 -l 1300 $rx_args & + ip netns exec $NS_DST ./udpgso_bench_rx -C 2000 -R 100 -n 10 -l 1300 $rx_args & local spid=$! wait_local_port_listen "$NS_DST" 8000 udp ip netns exec $NS_SRC ./udpgso_bench_tx $family -M 1 -s 13000 -S 1300 -D $dst @@ -168,7 +178,7 @@ run_bench() { # bind the sender and the receiver to different CPUs to try # get reproducible results ip netns exec $NS_DST bash -c "echo 2 > /sys/class/net/veth$DST/queues/rx-0/rps_cpus" - ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 1000 -R 10 & + ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 2000 -R 100 & local spid=$! wait_local_port_listen "$NS_DST" 8000 udp ip netns exec $NS_SRC taskset 0x1 ./udpgso_bench_tx $family -l 3 -S 1300 -D $dst diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c index f35a924d4a30..1cbadd267c96 100644 --- a/tools/testing/selftests/net/udpgso_bench_rx.c +++ b/tools/testing/selftests/net/udpgso_bench_rx.c @@ -375,7 +375,7 @@ static void do_recv(void) do_flush_udp(fd); tnow = gettimeofday_ms(); - if (tnow > treport) { + if (!cfg_expected_pkt_nr && tnow > treport) { if (packets) fprintf(stderr, "%s rx: %6lu MB/s %8lu calls/s\n", -- 2.43.0
[PATCH net 3/3] selftests: net: enable some more knobs
The rtnetlink tests require additional options currently off by default. Fixes: 2766a11161cc ("selftests: rtnetlink: add ipsec offload API test") Fixes: 5e596ee171ba ("selftests: add xfrm state-policy-monitor to rtnetlink.sh") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/config | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 58a716a6bf2a..a50b96bb3d80 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -21,6 +21,8 @@ CONFIG_CRYPTO_CHACHA20POLY1305=m CONFIG_VLAN_8021Q=y CONFIG_IFB=y CONFIG_INET_DIAG=y +CONFIG_INET_ESP=y +CONFIG_INET_ESP_OFFLOAD=y CONFIG_IP_GRE=m CONFIG_NETFILTER=y CONFIG_NETFILTER_ADVANCED=y @@ -91,3 +93,4 @@ CONFIG_IP_SCTP=m CONFIG_NETFILTER_XT_MATCH_POLICY=m CONFIG_CRYPTO_ARIA=y CONFIG_XFRM_INTERFACE=m +CONFIG_XFRM_USER=m -- 2.43.0
Re: [PATCH net-next] selftests/net: calibrate txtimestamp
On Wed, 2024-01-31 at 10:06 -0500, Willem de Bruijn wrote: > Otherwise I'll start with the gro and so-txtime tests. They may not > be so easily calibrated. As we cannot control the gro timeout, nor > the FQ max horizon. Note that we can control the GRO timeout to some extent, via gro_flush_timeout, see commit 89abe628375301fedb68770644df845d49018d8b. Unfortunately that is not enough for 'large' gro tests. I think the root cause is that the process sending the packets can be de-scheduled - even the qemu VM from the hypervisor CPU - causing an extremely large gap between consecutive pkts. I guess/hope that replacing multiple sendmsg() with a sendmmsg() could improve a bit the scenario, but I fear it will not solve the issue completely. > In such cases we can use the environment variable to either skip the > test entirely or --my preference-- run it to get code coverage, but > suppress a failure if due to timing (only). Sounds good? Sounds good to me! I was wondering about skipping the 'large' test only, but suppressing the failure when KSFT_MACHINE_SLOW=yes only for such test looks a better option. Thanks! Paolo
Re: [PATCH net-next] selftests: net: Fix bridge backup port test flakiness
On Thu, 2024-02-01 at 10:05 +0200, Ido Schimmel wrote: > The test toggles the carrier of a bridge port in order to test the > bridge backup port feature. > > Due to the linkwatch delayed work the carrier change is not always > reflected fast enough to the bridge driver and packets are not forwarded > as the test expects, resulting in failures [1]. > > Fix by adding a one second delay after a carrier change in places where > a packet is sent immediately after the carrier change. What about adding an helper alike wait_local_port_listen(), checking for bridge link status in short intervals, to likely reduce the overall wait time? Thanks, Paolo
Re: [PATCH net-next] selftests: net: Fix bridge backup port test flakiness
On Thu, 2024-02-01 at 13:16 +0200, Ido Schimmel wrote: > On Thu, Feb 01, 2024 at 10:34:52AM +0100, Paolo Abeni wrote: > > What about adding an helper alike wait_local_port_listen(), checking > > for bridge link status in short intervals, to likely reduce the overall > > wait time? > > What about the below? > > diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh > b/tools/testing/selftests/net/test_bridge_backup_port.sh > index 70a7d87ba2d2..1b3f89e2b86e 100755 > --- a/tools/testing/selftests/net/test_bridge_backup_port.sh > +++ b/tools/testing/selftests/net/test_bridge_backup_port.sh > @@ -124,6 +124,16 @@ tc_check_packets() > [[ $pkts == $count ]] > } > > +bridge_link_check() > +{ > + local ns=$1; shift > + local dev=$1; shift > + local state=$1; shift > + > + bridge -n $ns -d -j link show dev $dev | \ > + jq -e ".[][\"state\"] == \"$state\"" &> /dev/null > +} I was wondering more about a sleeping loop, something alike: wait_bridge_link() { for i in $(seq 10); do if bridge_link_check $1 $2 $3; then break fi sleep 0.1 done } but no strong preference Cheers, Paolo
[PATCH v2 net 0/4] selftests: net: more fixes
Another small bunch of fixes, addressing issues outlined by the netdev CI. The first 2 patches are just rebased. The following 2 are new fixes, for even more problems that surfaced meanwhile. Paolo Abeni (4): selftests: net: cut more slack for gro fwd tests. selftests: net: fix setup_ns usage in rtnetlink.sh selftests: net: fix tcp listener handling in pmtu.sh selftests: net: avoid just another constant wait tools/testing/selftests/net/pmtu.sh | 23 ++- tools/testing/selftests/net/rtnetlink.sh | 6 ++--- tools/testing/selftests/net/udpgro_fwd.sh | 14 +-- tools/testing/selftests/net/udpgso_bench_rx.c | 2 +- 4 files changed, 32 insertions(+), 13 deletions(-) -- 2.43.0
[PATCH v2 net 1/4] selftests: net: cut more slack for gro fwd tests.
The udpgro_fwd.sh self-tests are somewhat unstable. There are a few timing constraints the we struggle to meet on very slow environments. Instead of skipping the whole tests in such envs, increase the test resilience WRT very slow hosts: increase the inter-packets timeouts, avoid resetting the counters every second and finally disable reduce the background traffic noise. Tested with: for I in $(seq 1 100); do ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t net:udpgro_fwd.sh || exit -1 done in a slow environment. Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/udpgro_fwd.sh | 14 -- tools/testing/selftests/net/udpgso_bench_rx.c | 2 +- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh index d6b9c759043c..9cd5e885e91f 100755 --- a/tools/testing/selftests/net/udpgro_fwd.sh +++ b/tools/testing/selftests/net/udpgro_fwd.sh @@ -39,6 +39,10 @@ create_ns() { for ns in $NS_SRC $NS_DST; do ip netns add $ns ip -n $ns link set dev lo up + + # disable route solicitations to decrease 'noise' traffic + ip netns exec $ns sysctl -qw net.ipv6.conf.default.router_solicitations=0 + ip netns exec $ns sysctl -qw net.ipv6.conf.all.router_solicitations=0 done ip link add name veth$SRC type veth peer name veth$DST @@ -80,6 +84,12 @@ create_vxlan_pair() { create_vxlan_endpoint $BASE$ns veth$ns $BM_NET_V6$((3 - $ns)) vxlan6$ns 6 ip -n $BASE$ns addr add dev vxlan6$ns $OL_NET_V6$ns/24 nodad done + + # preload neighbur cache, do avoid some noisy traffic + local addr_dst=$(ip -j -n $BASE$DST link show dev vxlan6$DST |jq -r '.[]["address"]') + local addr_src=$(ip -j -n $BASE$SRC link show dev vxlan6$SRC |jq -r '.[]["address"]') + ip -n $BASE$DST neigh add dev vxlan6$DST lladdr $addr_src $OL_NET_V6$SRC + ip -n $BASE$SRC neigh add dev vxlan6$SRC lladdr $addr_dst $OL_NET_V6$DST } is_ipv6() { @@ -119,7 +129,7 @@ run_test() { # not enable GRO ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 4789 ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 8000 - ip netns exec $NS_DST ./udpgso_bench_rx -C 1000 -R 10 -n 10 -l 1300 $rx_args & + ip netns exec $NS_DST ./udpgso_bench_rx -C 2000 -R 100 -n 10 -l 1300 $rx_args & local spid=$! wait_local_port_listen "$NS_DST" 8000 udp ip netns exec $NS_SRC ./udpgso_bench_tx $family -M 1 -s 13000 -S 1300 -D $dst @@ -168,7 +178,7 @@ run_bench() { # bind the sender and the receiver to different CPUs to try # get reproducible results ip netns exec $NS_DST bash -c "echo 2 > /sys/class/net/veth$DST/queues/rx-0/rps_cpus" - ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 1000 -R 10 & + ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 2000 -R 100 & local spid=$! wait_local_port_listen "$NS_DST" 8000 udp ip netns exec $NS_SRC taskset 0x1 ./udpgso_bench_tx $family -l 3 -S 1300 -D $dst diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c index f35a924d4a30..1cbadd267c96 100644 --- a/tools/testing/selftests/net/udpgso_bench_rx.c +++ b/tools/testing/selftests/net/udpgso_bench_rx.c @@ -375,7 +375,7 @@ static void do_recv(void) do_flush_udp(fd); tnow = gettimeofday_ms(); - if (tnow > treport) { + if (!cfg_expected_pkt_nr && tnow > treport) { if (packets) fprintf(stderr, "%s rx: %6lu MB/s %8lu calls/s\n", -- 2.43.0
[PATCH v2 net 2/4] selftests: net: fix setup_ns usage in rtnetlink.sh
The setup_ns helper marks the testns global variable as readonly. Later attempts to set such variable are unsuccessful, causing a couple test failures. Avoid completely the variable re-initialization and let the function access the global value. Fixes: ("selftests: rtnetlink: use setup_ns in bonding test") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/rtnetlink.sh | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 4667d74579d1..874a2952aa8e 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -440,7 +440,6 @@ kci_test_encap_vxlan() local ret=0 vxlan="test-vxlan0" vlan="test-vlan0" - testns="$1" run_cmd ip -netns "$testns" link add "$vxlan" type vxlan id 42 group 239.1.1.1 \ dev "$devdummy" dstport 4789 if [ $? -ne 0 ]; then @@ -485,7 +484,6 @@ kci_test_encap_fou() { local ret=0 name="test-fou" - testns="$1" run_cmd_grep 'Usage: ip fou' ip fou help if [ $? -ne 0 ];then end_test "SKIP: fou: iproute2 too old" @@ -526,8 +524,8 @@ kci_test_encap() run_cmd ip -netns "$testns" link set lo up run_cmd ip -netns "$testns" link add name "$devdummy" type dummy run_cmd ip -netns "$testns" link set "$devdummy" up - run_cmd kci_test_encap_vxlan "$testns" - run_cmd kci_test_encap_fou "$testns" + run_cmd kci_test_encap_vxlan + run_cmd kci_test_encap_fou ip netns del "$testns" return $ret -- 2.43.0
[PATCH v2 net 3/4] selftests: net: fix tcp listener handling in pmtu.sh
The pmtu.sh test uses a few TCP listener in a problematic way: It hard-codes a constant timeout to wait for the listener starting-up in background. That introduces unneeded latency and on very slow and busy host it can fail. Additionally the test starts again the same listener in the same namespace on the same port, just after the previous connection completed. Fast host can attempt starting the new server before the old one really closed the socket. Address the issues using the wait_local_port_listen helper and explicitly waiting for the background listener process exit. Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/pmtu.sh | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 3f118e3f1c66..f0febc19baae 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -199,6 +199,7 @@ # Same as above but with IPv6 source lib.sh +source net_helper.sh PAUSE_ON_FAIL=no VERBOSE=0 @@ -1336,13 +1337,15 @@ test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() { TCPDST="TCP:[${dst}]:5" fi ${ns_b} socat -T 3 -u -6 TCP-LISTEN:5 STDOUT > $tmpoutfile & + local socat_pid=$! - sleep 1 + wait_local_port_listen ${NS_B} 5 tcp dd if=/dev/zero status=none bs=1M count=1 | ${target} socat -T 3 -u STDIN $TCPDST,connect-timeout=3 size=$(du -sb $tmpoutfile) size=${size%%/tmp/*} + wait ${socat_pid} [ $size -ne 1048576 ] && err "File size $size mismatches exepcted value in locally bridged vxlan test" && return 1 done -- 2.43.0
[PATCH v2 net 4/4] selftests: net: avoid just another constant wait
Using hard-coded constant timeout to wait for some expected event is deemed to fail sooner or later, especially in slow env. Our CI has spotted another of such race: # TEST: ipv6: cleanup of cached exceptions - nexthop objects [FAIL] # can't delete veth device in a timely manner, PMTU dst likely leaked Replace the crude sleep with a loop looking for the expected condition at low interval for a much longer range. Fixes: b3cc4f8a8a41 ("selftests: pmtu: add explicit tests for PMTU exceptions cleanup") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/pmtu.sh | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index f0febc19baae..d65fdd407d73 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -1957,6 +1957,13 @@ check_command() { return 0 } +check_running() { + pid=${1} + cmd=${2} + + [ "$(cat /proc/${pid}/cmdline 2>/dev/null | tr -d '\0')" = "{cmd}" ] +} + test_cleanup_vxlanX_exception() { outer="${1}" encap="vxlan" @@ -1987,11 +1994,12 @@ test_cleanup_vxlanX_exception() { ${ns_a} ip link del dev veth_A-R1 & iplink_pid=$! - sleep 1 - if [ "$(cat /proc/${iplink_pid}/cmdline 2>/dev/null | tr -d '\0')" = "iplinkdeldevveth_A-R1" ]; then - err " can't delete veth device in a timely manner, PMTU dst likely leaked" - return 1 - fi + for i in $(seq 1 20); do + check_running ${iplink_pid} "iplinkdeldevveth_A-R1" || return 0 + sleep 0.1 + done + err " can't delete veth device in a timely manner, PMTU dst likely leaked" + return 1 } test_cleanup_ipv6_exception() { -- 2.43.0
[PATCH net] selftests: net: let big_tcp test cope with slow env
In very slow environments, most big TCP cases including segmentation and reassembly of big TCP packets have a good chance to fail: by default the TCP client uses write size well below 64K. If the host is low enough autocorking is unable to build real big TCP packets. Address the issue using much larger write operations. Note that is hard to observe the issue without an extremely slow and/or overloaded environment; reduce the TCP transfer time to allow for much easier/faster reproducibility. Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/big_tcp.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh index cde9a91c4797..2db9d15cd45f 100755 --- a/tools/testing/selftests/net/big_tcp.sh +++ b/tools/testing/selftests/net/big_tcp.sh @@ -122,7 +122,9 @@ do_netperf() { local netns=$1 [ "$NF" = "6" ] && serip=$SERVER_IP6 - ip net exec $netns netperf -$NF -t TCP_STREAM -H $serip 2>&1 >/dev/null + + # use large write to be sure to generate big tcp packets + ip net exec $netns netperf -$NF -t TCP_STREAM -l 1 -H $serip -- -m 262144 2>&1 >/dev/null } do_test() { -- 2.43.0
Re: [PATCH net] selftests: net: let big_tcp test cope with slow env
On Fri, 2024-02-02 at 17:13 +0100, Eric Dumazet wrote: > On Fri, Feb 2, 2024 at 5:07 PM Paolo Abeni wrote: > > > > In very slow environments, most big TCP cases including > > segmentation and reassembly of big TCP packets have a good > > chance to fail: by default the TCP client uses write size > > well below 64K. If the host is low enough autocorking is > > unable to build real big TCP packets. > > > > Address the issue using much larger write operations. > > > > Note that is hard to observe the issue without an extremely > > slow and/or overloaded environment; reduce the TCP transfer > > time to allow for much easier/faster reproducibility. > > > > Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") > > Signed-off-by: Paolo Abeni > > --- > > tools/testing/selftests/net/big_tcp.sh | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/tools/testing/selftests/net/big_tcp.sh > > b/tools/testing/selftests/net/big_tcp.sh > > index cde9a91c4797..2db9d15cd45f 100755 > > --- a/tools/testing/selftests/net/big_tcp.sh > > +++ b/tools/testing/selftests/net/big_tcp.sh > > @@ -122,7 +122,9 @@ do_netperf() { > > local netns=$1 > > > > [ "$NF" = "6" ] && serip=$SERVER_IP6 > > - ip net exec $netns netperf -$NF -t TCP_STREAM -H $serip 2>&1 > > >/dev/null > > + > > + # use large write to be sure to generate big tcp packets > > + ip net exec $netns netperf -$NF -t TCP_STREAM -l 1 -H $serip -- -m > > 262144 2>&1 >/dev/null > > } > > Interesting. > > I think we set tcp_wmem[1] to 262144 in our hosts. I think netperf > default depends on tcp_wmem[1] I haven't dug into the netperf source, but the above would be consistent with what I observe: in my VM I see 16Kb writes and tcp_wmem[1] is 16K. > Reviewed-by: Eric Dumazet Thanks! Paolo
Re: [PATCH v2 net 2/4] selftests: net: fix setup_ns usage in rtnetlink.sh
On Thu, 2024-02-01 at 19:42 +0100, Paolo Abeni wrote: > The setup_ns helper marks the testns global variable as > readonly. Later attempts to set such variable are unsuccessful, > causing a couple test failures. > > Avoid completely the variable re-initialization and let the > function access the global value. > > Fixes: ("selftests: rtnetlink: use setup_ns in bonding test") The above should be: Fixes: e9ce7ededf14 ("selftests: rtnetlink: use setup_ns in bonding test")
Re: [PATCH net-next] selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW
On Fri, 2024-02-02 at 19:31 -0500, Willem de Bruijn wrote: > Jakub Kicinski wrote: > > On Thu, 1 Feb 2024 11:21:19 -0500 Willem de Bruijn wrote: > > > This test is time sensitive. It may fail on virtual machines and for > > > debug builds. > > > > > > Continue to run in these environments to get code coverage. But > > > optionally suppress failure for timing errors (only). This is > > > controlled with environment variable KSFT_MACHINE_SLOW. > > > > > > The test continues to return 0 (KSFT_PASS), rather than KSFT_XFAIL > > > as previously discussed. Because making so_txtime.c return that and > > > then making so_txtime.sh capture runs that pass that vs KSFT_FAIL > > > and pass it on added a bunch of (fragile bash) boilerplate, while the > > > result is interpreted the same as KSFT_PASS anyway. > > > > FWIW another idea that came up when talking to Matthieu - > > isolate the VMs which run time-sensitive tests to dedicated > > CPUs. Right now we kick off around 70 4 CPU VMs and let them > > battle for 72 cores. The machines don't look overloaded but > > there can be some latency spikes (CPU use diagram attached). > > > > So the idea would be to have a handful of special VMs running > > on dedicated CPUs without any CPU time competition. That could help > > with latency spikes. But we'd probably need to annotate the tests > > which need some special treatment. > > > > Probably too much work both to annotate tests and set up env, > > but I thought I'd bring it up here in case you had an opinion. > > I'm not sure whether the issue with timing in VMs is CPU affinity. > Variance may just come from expensive hypercalls, even with a > dedicated CPU. Though tests can tell. FTR, I think the CPU affinity setup is a bit too complex, and hard to reproduce for 3rd parties willing to investigate eventual future CI failures, I think the current env-variable-based approach would help with reproducibility. > There's still the debug builds, as well. I understand/hope you are investigating it? Cheers, Paolo
[PATCH net] selftests: net: cope with slow env in gro.sh test
The gro self-tests sends the packets to be aggregated with multiple write operations. When running is slow environment, it's hard to guarantee that the GRO engine will wait for the last packet in an intended train. The above causes almost deterministic failures in our CI for the 'large' test-case. Address the issue explicitly ignoring failures for such case in slow environments (KSFT_MACHINE_SLOW==true). Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") Signed-off-by: Paolo Abeni --- Note that the fixes tag is there mainly to justify targeting the net tree, and this is aiming at net to hopefully make the test more stable ASAP for both trees. I experimented with a largish refactory replacing the multiple writes with a single GSO packet, but exhausted by time budget before reaching any good result. --- tools/testing/selftests/net/gro.sh | 4 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh index 19352f106c1d..114b5281a3f5 100755 --- a/tools/testing/selftests/net/gro.sh +++ b/tools/testing/selftests/net/gro.sh @@ -31,6 +31,10 @@ run_test() { 1>>log.txt wait "${server_pid}" exit_code=$? +if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then +echo "Ignoring errors due to slow environment" 1>&2 +exit_code=0 +fi if [[ "${exit_code}" -eq 0 ]]; then break; fi -- 2.43.0
[PATCH net-next] selftests: net: include forwarding lib
The altnames test uses the forwarding/lib.sh and that dependency currently causes failures when running the test after install: make -C tools/testing/selftests/ TARGETS=net install ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t net:altnames.sh # ... # ./altnames.sh: line 8: ./forwarding/lib.sh: No such file or directory # RTNETLINK answers: Operation not permitted # ./altnames.sh: line 73: tests_run: command not found # ./altnames.sh: line 65: pre_cleanup: command not found Address the issue leveraging the TEST_INCLUDES infrastructure provided by commit 2a0683be5b4c ("selftests: Introduce Makefile variable to list shared bash scripts") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 211753756bde..7b6918d5f4af 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -97,6 +97,8 @@ TEST_PROGS += vlan_hw_filter.sh TEST_FILES := settings TEST_FILES += in_netns.sh lib.sh net_helper.sh setup_loopback.sh setup_veth.sh +TEST_INCLUDES := forwarding/lib.sh + include ../lib.mk $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma -- 2.43.0
Re: [PATCH net] selftests: net: cope with slow env in gro.sh test
On Wed, 2024-02-07 at 12:16 +0100, Matthieu Baerts wrote: > Hi Paolo, > > On 06/02/2024 16:27, Paolo Abeni wrote: > > The gro self-tests sends the packets to be aggregated with > > multiple write operations. > > > > When running is slow environment, it's hard to guarantee that > > the GRO engine will wait for the last packet in an intended > > train. > > > > The above causes almost deterministic failures in our CI for > > the 'large' test-case. > > > > Address the issue explicitly ignoring failures for such case > > in slow environments (KSFT_MACHINE_SLOW==true). > > To what value is KSFT_MACHINE_SLOW set in the CI? AFAIK, the CI initialize KSFT_MACHINE_SLOW (to true) only on slow env. > Is it set to a different value if the machine is not slow? e.g. > > KSFT_MACHINE_SLOW == false > > (please see below) > > > Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") > > Signed-off-by: Paolo Abeni > > --- > > Note that the fixes tag is there mainly to justify targeting the net > > tree, and this is aiming at net to hopefully make the test more stable > > ASAP for both trees. > > > > I experimented with a largish refactory replacing the multiple writes > > with a single GSO packet, but exhausted by time budget before reaching > > any good result. > > --- > > tools/testing/selftests/net/gro.sh | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/tools/testing/selftests/net/gro.sh > > b/tools/testing/selftests/net/gro.sh > > index 19352f106c1d..114b5281a3f5 100755 > > --- a/tools/testing/selftests/net/gro.sh > > +++ b/tools/testing/selftests/net/gro.sh > > @@ -31,6 +31,10 @@ run_test() { > >1>>log.txt > > wait "${server_pid}" > > exit_code=$? > > +if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then > > Maybe best to avoid using: > > -n "${KSFT_MACHINE_SLOW}" > > Otherwise, we have the same behaviour if KSFT_MACHINE_SLOW is set to > 1/yes/true or 0/no/false. For consistency, I followed the logic already in place in commit c41dfb0dfbec ("selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW"). > But maybe it is fine like that, and what is just missing is adding > somewhere how KSFT_MACHINE_SLOW is supposed to be set/used? :) > > > Not linked to that, but a small detail about the new line, just in case > you need to send a v2: it looks like it is better to avoid using '-a': > > https://www.shellcheck.net/wiki/SC2166 Thank for the pointer, I was not aware of that. I guess a v2 dropping '-a' would be better. Thanks, Paolo
[PATCH net] selftests: net: add more missing kernel config
The reuseport_addr_any.sh is currently skipping DCCP tests and pmtu.sh is skipping all the FOU/GUE related cases: add the missing options. Signed-off-by: Paolo Abeni --- Note that this does not include the - still missing - OVS-related option and pmtu.sh is will keep skipping such cases. Such tests will still fail in the virtme environment even with the relevant kernel options enabled, as they have an hard to solve dependency on systemd/dbus. The longer term plan is to move such test cases in the openvswitch directory. One short term option to avoid skips in selftests results while retaining the potential code coverage would be making the ovs tests disabled by default but reachable via pmtu.sh command line arguments. --- tools/testing/selftests/net/config | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 3b749addd364..54d21e2911a9 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -13,6 +13,10 @@ CONFIG_IPV6=y CONFIG_IPV6_MULTIPLE_TABLES=y CONFIG_VETH=y CONFIG_NET_IPVTI=y +CONFIG_NET_FOU=y +CONFIG_NET_FOU_IP_TUNNELS=y +CONFIG_NET_IPIP=y +CONFIG_IPV6_SIT=y CONFIG_IPV6_VTI=y CONFIG_DUMMY=y CONFIG_BRIDGE_VLAN_FILTERING=y @@ -24,6 +28,7 @@ CONFIG_IFB=y CONFIG_INET_DIAG=y CONFIG_INET_ESP=y CONFIG_INET_ESP_OFFLOAD=y +CONFIG_IP_DCCP=m CONFIG_IP_GRE=m CONFIG_NETFILTER=y CONFIG_NETFILTER_ADVANCED=y -- 2.43.0
[PATCH net v2] selftests: net: cope with slow env in gro.sh test
The gro self-tests sends the packets to be aggregated with multiple write operations. When running is slow environment, it's hard to guarantee that the GRO engine will wait for the last packet in an intended train. The above causes almost deterministic failures in our CI for the 'large' test-case. Address the issue explicitly ignoring failures for such case in slow environments (KSFT_MACHINE_SLOW==true). Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") Reviewed-by: Willem de Bruijn Signed-off-by: Paolo Abeni --- v1 -> v2: - replace the '-a' operator with '&&' - Mattbe Note that the fixes tag is there mainly to justify targeting the net tree, and this is aiming at net to hopefully make the test more stable ASAP for both trees. I experimented with a largish refactory replacing the multiple writes with a single GSO packet, but exhausted by time budget before reaching any good result. --- tools/testing/selftests/net/gro.sh | 4 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh index 19352f106c1d..3190b41e8bfc 100755 --- a/tools/testing/selftests/net/gro.sh +++ b/tools/testing/selftests/net/gro.sh @@ -31,6 +31,10 @@ run_test() { 1>>log.txt wait "${server_pid}" exit_code=$? +if [[ ${test} == "large" && -n "${KSFT_MACHINE_SLOW}" ]]; then +echo "Ignoring errors due to slow environment" 1>&2 +exit_code=0 +fi if [[ "${exit_code}" -eq 0 ]]; then break; fi -- 2.43.0
[PATCH net] selftests: net: wait for receiver startup in so_txtime.sh
The mentioned test is failing in slow environments: # SO_TXTIME ipv4 clock monotonic # ./so_txtime: recv: timeout: Resource temporarily unavailable not ok 1 selftests: net: so_txtime.sh # exit=1 The receiver is started in background and the sender could end-up transmitting the packet before the receiver is ready, so that the later recv times out. Address the issue explcitly waiting for the socket being bound to the relevant port. Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ") Signed-off-by: Paolo Abeni --- Note that to really cope with slow env the mentioned self-tests also need net-next commit c41dfb0dfbec ("selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to net-next, too --- tools/testing/selftests/net/so_txtime.sh | 6 ++ 1 file changed, 6 insertions(+) diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh index 3f06f4d286a9..ade0e5755099 100755 --- a/tools/testing/selftests/net/so_txtime.sh +++ b/tools/testing/selftests/net/so_txtime.sh @@ -5,6 +5,8 @@ set -e +source net_helper.sh + readonly DEV="veth0" readonly BIN="./so_txtime" @@ -51,13 +53,16 @@ do_test() { local readonly CLOCK="$2" local readonly TXARGS="$3" local readonly RXARGS="$4" + local PROTO if [[ "${IP}" == "4" ]]; then local readonly SADDR="${SADDR4}" local readonly DADDR="${DADDR4}" + PROTO=udp elif [[ "${IP}" == "6" ]]; then local readonly SADDR="${SADDR6}" local readonly DADDR="${DADDR6}" + PROTO=udp6 else echo "Invalid IP version ${IP}" exit 1 @@ -65,6 +70,7 @@ do_test() { local readonly START="$(date +%s%N --date="+ 0.1 seconds")" ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r & + wait_local_port_listen "${NS2}" 8000 "${PROTO}" ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}" wait "$!" } -- 2.43.0
Re: [PATCH net v2] selftests: net: Fix bridge backup port test flakiness
On Thu, 2024-02-08 at 14:31 +0200, Ido Schimmel wrote: > The test toggles the carrier of a bridge port in order to test the > bridge backup port feature. > > Due to the linkwatch delayed work the carrier change is not always > reflected fast enough to the bridge driver and packets are not forwarded > as the test expects, resulting in failures [1]. > > Fix by busy waiting on the bridge port state until it changes to the > desired state following the carrier change. > > [1] > # Backup port > # --- > [...] > # TEST: swp1 carrier off [ OK ] > # TEST: No forwarding out of swp1 [FAIL] > [ 641.995910] br0: port 1(swp1) entered disabled state > # TEST: No forwarding out of vx0 [ OK ] > > Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup > nexthop ID test") > Signed-off-by: Ido Schimmel > Reviewed-by: Petr Machata > --- > > Notes: > v2: > * Use busy waiting instead of 1 second sleep. Fine by be, thanks! Acked-by: Paolo Abeni
[PATCH net-next] selftests: net: ignore timing errors in txtimestamp if KSFT_MACHINE_SLOW
This test is time sensitive. It may fail on virtual machines and for debug builds. Similar to commit c41dfb0dfbec ("selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW"), optionally suppress failure for timing errors (only). Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/txtimestamp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/txtimestamp.c b/tools/testing/selftests/net/txtimestamp.c index 10f2fde3686b..ec60a16c9307 100644 --- a/tools/testing/selftests/net/txtimestamp.c +++ b/tools/testing/selftests/net/txtimestamp.c @@ -163,7 +163,8 @@ static void validate_timestamp(struct timespec *cur, int min_delay) if (cur64 < start64 + min_delay || cur64 > start64 + max_delay) { fprintf(stderr, "ERROR: %" PRId64 " us expected between %d and %d\n", cur64 - start64, min_delay, max_delay); - test_failed = true; + if (!getenv("KSFT_MACHINE_SLOW")) + test_failed = true; } } -- 2.43.0
Re: [PATCH net] selftests: net: wait for receiver startup in so_txtime.sh
On Thu, 2024-02-08 at 16:45 +0100, Paolo Abeni wrote: > The mentioned test is failing in slow environments: > > # SO_TXTIME ipv4 clock monotonic > # ./so_txtime: recv: timeout: Resource temporarily unavailable > not ok 1 selftests: net: so_txtime.sh # exit=1 > > The receiver is started in background and the sender could end-up > transmitting the packet before the receiver is ready, so that the > later recv times out. > > Address the issue explcitly waiting for the socket being bound to > the relevant port. > > Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ") > Signed-off-by: Paolo Abeni > --- > Note that to really cope with slow env the mentioned self-tests also > need net-next commit c41dfb0dfbec ("selftests/net: ignore timing > errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to > net-next, too Oops... CI is saying the above is not enough... > @@ -65,6 +70,7 @@ do_test() { > > local readonly START="$(date +%s%N --date="+ 0.1 seconds")" > ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S > "${SADDR}" -D "${DADDR}" "${RXARGS}" -r & > + wait_local_port_listen "${NS2}" 8000 "${PROTO}" > ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S > "${SADDR}" -D "${DADDR}" "${TXARGS}" The binary explicitly waits up to $START time, and that conflicts with the wait_local_port_listen, something different is needed. Apparently I was just "lucky" during my local testing. Cheers, Paolo
Re: [PATCH net] selftests: net: wait for receiver startup in so_txtime.sh
On Fri, 2024-02-09 at 15:51 +0100, Paolo Abeni wrote: > On Thu, 2024-02-08 at 16:45 +0100, Paolo Abeni wrote: > > The mentioned test is failing in slow environments: > > > > # SO_TXTIME ipv4 clock monotonic > > # ./so_txtime: recv: timeout: Resource temporarily unavailable > > not ok 1 selftests: net: so_txtime.sh # exit=1 > > > > The receiver is started in background and the sender could end-up > > transmitting the packet before the receiver is ready, so that the > > later recv times out. > > > > Address the issue explcitly waiting for the socket being bound to > > the relevant port. > > > > Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ") > > Signed-off-by: Paolo Abeni > > --- > > Note that to really cope with slow env the mentioned self-tests also > > need net-next commit c41dfb0dfbec ("selftests/net: ignore timing > > errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to > > net-next, too > > Oops... CI is saying the above is not enough... > > > @@ -65,6 +70,7 @@ do_test() { > > > > local readonly START="$(date +%s%N --date="+ 0.1 seconds")" > > ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S > > "${SADDR}" -D "${DADDR}" "${RXARGS}" -r & > > + wait_local_port_listen "${NS2}" 8000 "${PROTO}" > > ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S > > "${SADDR}" -D "${DADDR}" "${TXARGS}" > > The binary explicitly waits up to $START time, and that conflicts with > the wait_local_port_listen, something different is needed. Apparently I > was just "lucky" during my local testing. I experimented a few different solutions and so far the only option that gave some positive result is increasing start delay and the etf delta by an order of magnitude, see below. But I'm pretty sure that even with that there will be sporadic failures in slow enough environments. When the host-induced jitter/delay is high enough, packets are dropped and there are functional failures. I'm wondering if we should skip this test entirely when KSFT_MACHINE_SLOW=yes. Do you see any other options? Paolo --- diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh index 3f06f4d286a9..6445580f0a66 100755 --- a/tools/testing/selftests/net/so_txtime.sh +++ b/tools/testing/selftests/net/so_txtime.sh @@ -63,7 +63,9 @@ do_test() { exit 1 fi - local readonly START="$(date +%s%N --date="+ 0.1 seconds")" + local delta=0.1 + [ -n "${KSFT_MACHINE_SLOW}" ] && delta=1 + local readonly START="$(date +%s%N --date="+ ${delta} seconds")" ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r & ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}" wait "$!" @@ -76,7 +78,9 @@ do_test 6 mono a,10 a,10 do_test 4 mono a,10,b,20 a,10,b,20 do_test 6 mono a,20,b,10 b,20,a,20 -if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta 40; then +delta=40 +[ -n "${KSFT_MACHINE_SLOW}" ] && delta=$((delta*10)) +if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta "${delta}"; then ! do_test 4 tai a,-1 a,-1 ! do_test 6 tai a,0 a,0 do_test 6 tai a,10 a,10
Re: [PATCH net] selftests: net: wait for receiver startup in so_txtime.sh
On Fri, 2024-02-09 at 11:17 -0800, Jakub Kicinski wrote: > On Fri, 09 Feb 2024 17:45:28 +0100 Paolo Abeni wrote: > > But I'm pretty sure that even with that there will be sporadic failures > > in slow enough environments. > > > > When the host-induced jitter/delay is high enough, packets are dropped > > and there are functional failures. I'm wondering if we should skip this > > test entirely when KSFT_MACHINE_SLOW=yes. > > By skip do you mean the same approach as to the gro test? > Ignore errors? Because keeping the code coverage for KASAN etc. > would still be good (stating the obvious, sorry). I see my wording was not clear/misleading, I'm sorry. Yes, I mean checking KSFT_MACHINE_SLOW in the caller script and ignoring errors. Cheers, Paolo
Re: [PATCH net v2] selftests: net: cope with slow env in gro.sh test
On Fri, 2024-02-09 at 11:15 -0800, Jakub Kicinski wrote: > On Wed, 7 Feb 2024 19:36:46 +0100 Paolo Abeni wrote: > > +if [[ ${test} == "large" && -n "${KSFT_MACHINE_SLOW}" ]]; then > > +echo "Ignoring errors due to slow environment" 1>&2 > > +exit_code=0 > > +fi > > Would it make sense to also add "&& $exit_code -ne 0" ? > It may be useful to see in logs how many times we actually > ignored the error? Yep, I'll send a v3. Thanks! Paolo
[PATCH v3 net] selftests: net: cope with slow env in gro.sh test
The gro self-tests sends the packets to be aggregated with multiple write operations. When running is slow environment, it's hard to guarantee that the GRO engine will wait for the last packet in an intended train. The above causes almost deterministic failures in our CI for the 'large' test-case. Address the issue explicitly ignoring failures for such case in slow environments (KSFT_MACHINE_SLOW==true). Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") Reviewed-by: Willem de Bruijn Signed-off-by: Paolo Abeni --- v2 -> v3: - dump the error suppression message only on actual errors (Jakub) v1 -> v2: - replace the '-a' operator with '&&' - Mattbe Note that the fixes tag is there mainly to justify targeting the net tree, and this is aiming at net to hopefully make the test more stable ASAP for both trees. --- tools/testing/selftests/net/gro.sh | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh index 19352f106c1d..02c21ff4ca81 100755 --- a/tools/testing/selftests/net/gro.sh +++ b/tools/testing/selftests/net/gro.sh @@ -31,6 +31,11 @@ run_test() { 1>>log.txt wait "${server_pid}" exit_code=$? +if [[ ${test} == "large" && -n "${KSFT_MACHINE_SLOW}" && \ + ${exit_code} -ne 0 ]]; then +echo "Ignoring errors due to slow environment" 1>&2 +exit_code=0 +fi if [[ "${exit_code}" -eq 0 ]]; then break; fi -- 2.43.0
[PATCH net] selftests: net: cope with slow env in so_txtime.sh test
The mentioned test is failing in slow environments: # SO_TXTIME ipv4 clock monotonic # ./so_txtime: recv: timeout: Resource temporarily unavailable not ok 1 selftests: net: so_txtime.sh # exit=1 Tuning the tolerance in the test binary is error-prone and doomed to failures is slow-enough environment. Just resort to suppress any error in such cases. Note to suppress them we need first to refactor a bit the code moving it to explicit error handling. Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/so_txtime.sh | 29 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh index 3f06f4d286a9..5e861ad32a42 100755 --- a/tools/testing/selftests/net/so_txtime.sh +++ b/tools/testing/selftests/net/so_txtime.sh @@ -5,6 +5,7 @@ set -e +readonly ksft_skip=4 readonly DEV="veth0" readonly BIN="./so_txtime" @@ -46,7 +47,7 @@ ip -netns "${NS2}" addr add 192.168.1.2/24 dev "${DEV}" ip -netns "${NS1}" addr add fd::1/64 dev "${DEV}" nodad ip -netns "${NS2}" addr add fd::2/64 dev "${DEV}" nodad -do_test() { +run_test() { local readonly IP="$1" local readonly CLOCK="$2" local readonly TXARGS="$3" @@ -64,12 +65,25 @@ do_test() { fi local readonly START="$(date +%s%N --date="+ 0.1 seconds")" + ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r & ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}" wait "$!" } +do_test() { + run_test $@ + [ $? -ne 0 ] && ret=1 +} + +do_fail_test() { + run_test $@ + [ $? -eq 0 ] && ret=1 +} + ip netns exec "${NS1}" tc qdisc add dev "${DEV}" root fq +set +e +ret=0 do_test 4 mono a,-1 a,-1 do_test 6 mono a,0 a,0 do_test 6 mono a,10 a,10 @@ -77,13 +91,20 @@ do_test 4 mono a,10,b,20 a,10,b,20 do_test 6 mono a,20,b,10 b,20,a,20 if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta 40; then - ! do_test 4 tai a,-1 a,-1 - ! do_test 6 tai a,0 a,0 + do_fail_test 4 tai a,-1 a,-1 + do_fail_test 6 tai a,0 a,0 do_test 6 tai a,10 a,10 do_test 4 tai a,10,b,20 a,10,b,20 do_test 6 tai a,20,b,10 b,10,a,20 else echo "tc ($(tc -V)) does not support qdisc etf. skipping" + [ $ret -eq 0 ] && ret=$ksft_skip fi -echo OK. All tests passed +if [ $ret -eq 0 ]; then + echo OK. All tests passed +elif [[ $ret -ne $ksft_skip && -n "$KSFT_MACHINE_SLOW" ]]; then + echo "Ignoring errors due to slow environment" 1>&2 + ret=0 +fi +exit $ret -- 2.43.0
[PATCH net 0/2] selftests: net: more pmtu.sh fixes
The mentioned test is still flaky, unusally enough in 'fast' environments. Patch 2/2 [try to] address the existing issues, while patch 1/2 introduces more strict tests for the existing net helpers, to hopefully prevent future pain. Paolo Abeni (2): selftests: net: more strict check in net_helper selftests: net: more pmtu.sh fixes tools/testing/selftests/net/net_helper.sh | 11 +++ tools/testing/selftests/net/pmtu.sh | 4 ++-- 2 files changed, 9 insertions(+), 6 deletions(-) -- 2.43.0
[PATCH net 1/2] selftests: net: more strict check in net_helper
The helper waiting for a listener port can match any socket whose hexadecimal representation of source or destination addresses matches that of the given port. Additionally, any socket state is accepted. All the above can let the helper return successfully before the relevant listener is actually ready, with unexpected results. So far I could not find any related failure in the netdev CI, but the next patch is going to make the critical event more easily reproducible. Address the issue matching the port hex only vs the relevant socket field and additionally checking the socket state for TCP sockets. Fixes: 3bdd9fd29cb0 ("selftests/net: synchronize udpgro tests' tx and rx connection") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/net_helper.sh | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/net/net_helper.sh b/tools/testing/selftests/net/net_helper.sh index 4fe0befa13fb..6596fe03c77f 100644 --- a/tools/testing/selftests/net/net_helper.sh +++ b/tools/testing/selftests/net/net_helper.sh @@ -8,13 +8,16 @@ wait_local_port_listen() local listener_ns="${1}" local port="${2}" local protocol="${3}" - local port_hex + local pattern local i - port_hex="$(printf "%04X" "${port}")" + pattern=":$(printf "%04X" "${port}") " + + # for tcp protocol additionally check the socket state + [ ${protocol} = "tcp" ] && pattern="${pattern}0A" for i in $(seq 10); do - if ip netns exec "${listener_ns}" cat /proc/net/"${protocol}"* | \ - grep -q "${port_hex}"; then + if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \ + /proc/net/"${protocol}"* | grep -q "${pattern}"; then break fi sleep 0.1 -- 2.43.0
[PATCH net 2/2] selftests: net: more pmtu.sh fixes
The netdev CI is reporting failures for the pmtu test: [ 115.929264] br0: port 2(vxlan_a) entered forwarding state # 2024/02/08 17:33:22 socat[7871] E bind(7, {AF=10 [:::::::]:5}, 28): Address already in use # 2024/02/08 17:33:22 socat[7877] E write(7, 0x5598fb6ff000, 8192): Connection refused # TEST: IPv6, bridged vxlan4: PMTU exceptions [FAIL] # File size 0 mismatches exepcted value in locally bridged vxlan test The root cause is apparently a socket created by a previous iteration of the relevant loop still lasting in LAST_ACK state. Note that even the file size check is racy, the receiver process dumping the file could still be running in background Allow the listener to bound on the same local port via SO_REUSEADDR and collect file output file size only after the listener completion. Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp") Signed-off-by: Paolo Abeni --- tools/testing/selftests/net/pmtu.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index d65fdd407d73..cfc84958025a 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -1336,16 +1336,16 @@ test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() { else TCPDST="TCP:[${dst}]:5" fi - ${ns_b} socat -T 3 -u -6 TCP-LISTEN:5 STDOUT > $tmpoutfile & + ${ns_b} socat -T 3 -u -6 TCP-LISTEN:5,reuseaddr STDOUT > $tmpoutfile & local socat_pid=$! wait_local_port_listen ${NS_B} 5 tcp dd if=/dev/zero status=none bs=1M count=1 | ${target} socat -T 3 -u STDIN $TCPDST,connect-timeout=3 + wait ${socat_pid} size=$(du -sb $tmpoutfile) size=${size%%/tmp/*} - wait ${socat_pid} [ $size -ne 1048576 ] && err "File size $size mismatches exepcted value in locally bridged vxlan test" && return 1 done -- 2.43.0
Re: [RFC 4/7] selftests: openvswitch: delete previously allocated netns
On Fri, 2024-02-16 at 10:28 -0500, Aaron Conole wrote: > Many openvswitch test cases reused netns and interface names. This works > fine as long as the test case cleans up gracefully. However, if there is > some kind of ungraceful termination (such as an external signal) the netns > or interfaces can be left lingering. It looks the openvswitch.sh test script is already trying quite hard to delete the allocated resources on ungraceful termination via "trap...". That is usually enough for other self-tests, could you please detail when it fails here? > This happens when the selftest > timeout gets exceeded, while running under very slow debugging conditions. 'timeout' should send SIG_TERM, and the script already handle that gracefully? > The solution here is to cleanup the netns on executing the next test. I suggest avoiding this, it could end up killing innocent alias netns. You could consider using the 'setup_ns' helper from the tools/testing/selftests/net/lib.sh library to always generate unique netns names. > Signed-off-by: Aaron Conole > --- > tools/testing/selftests/net/openvswitch/openvswitch.sh | 4 > 1 file changed, 4 insertions(+) > > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh > b/tools/testing/selftests/net/openvswitch/openvswitch.sh > index 678a72ad47c1..8dc315585710 100755 > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh > @@ -115,6 +115,10 @@ ovs_netns_spawn_daemon() { > > ovs_add_netns_and_veths () { > info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}" > + ntns_e=`ip netns list | grep $3` > + [ "$ntns_e" != "" ] && ip netns del "$3" > + if4_e=`ip link show $4 2>/dev/null` Minor unrelated note: $() is preferable to `` for sub-shells, as it's more friendly to nesting, string expansing, quotes, etc. Cheers, Paolo
Re: [PATCH net 1/2] ioam6: fix write to cloned skb in ipv6_hop_ioam()
On Sat, 2024-02-17 at 00:43 +0100, Justin Iurman wrote: > ioam6_fill_trace_data() writes inside the skb payload without ensuring > it's writeable (e.g., not cloned). This function is called both from the > input and output path. The output path (ioam6_iptunnel) already does the > check. This commit provides a fix for the input path, inside > ipv6_hop_ioam(). > > Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace > ") > Reported-by: Paolo Abeni > Signed-off-by: Justin Iurman > --- > net/ipv6/exthdrs.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c > index 4952ae792450..f68e5faab3aa 100644 > --- a/net/ipv6/exthdrs.c > +++ b/net/ipv6/exthdrs.c > @@ -943,6 +943,14 @@ static bool ipv6_hop_ioam(struct sk_buff *skb, int > optoff) > if (!skb_valid_dst(skb)) > ip6_route_input(skb); > > + if (skb_cloned(skb)) { > + if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) > + goto drop; My personal preference would be for using skb_ensure_writable() here, with write_len == optoff + hdr->opt_len. > + > + hdr = (struct ioam6_hdr *)(skb_network_header(skb) + > optoff); > + trace = (struct ioam6_trace_hdr *)((u8 *)hdr + > sizeof(*hdr)); Note that this can potentially change the network header ptr and the caller - ip6_parse_tlv() - has cached such value in 'nh'. You also need to update ip6_parse_tlv() to reload such pointer. Side note: a bunch of self-tests are apparently stuck after this series. I think it's an unrelated problem. I'll try to have a better look. Cheers, Paolo
Re: [PATCH net 03/13] mptcp: fix lockless access in subflow ULP diag
On Mon, 2024-02-19 at 18:35 +0100, Eric Dumazet wrote: > On Mon, Feb 19, 2024 at 6:21 PM Eric Dumazet wrote: > > > > On Thu, Feb 15, 2024 at 7:25 PM Matthieu Baerts (NGI0) > > wrote: > > > > > > From: Paolo Abeni > > > > > > Since the introduction of the subflow ULP diag interface, the > > > dump callback accessed all the subflow data with lockless. > > > > > > We need either to annotate all the read and write operation accordingly, > > > or acquire the subflow socket lock. Let's do latter, even if slower, to > > > avoid a diffstat havoc. > > > > > > Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace") > > > Cc: sta...@vger.kernel.org > > > Signed-off-by: Paolo Abeni > > > Reviewed-by: Mat Martineau > > > Signed-off-by: Matthieu Baerts (NGI0) > > > --- > > > Notes: > > > - This patch modifies the existing ULP API. No better solutions have > > > been found for -net, and there is some similar prior art, see > > > commit 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info"). > > > > > > Please also note that TLS ULP Diag has likely the same issue. > > > To: Boris Pismenny > > > To: John Fastabend > > > --- > > > include/net/tcp.h | 2 +- > > > net/mptcp/diag.c | 6 +- > > > net/tls/tls_main.c | 2 +- > > > 3 files changed, 7 insertions(+), 3 deletions(-) > > > > > > diff --git a/include/net/tcp.h b/include/net/tcp.h > > > index dd78a1181031..f6eba9652d01 100644 > > > --- a/include/net/tcp.h > > > +++ b/include/net/tcp.h > > > @@ -2506,7 +2506,7 @@ struct tcp_ulp_ops { > > > /* cleanup ulp */ > > > void (*release)(struct sock *sk); > > > /* diagnostic */ > > > - int (*get_info)(const struct sock *sk, struct sk_buff *skb); > > > + int (*get_info)(struct sock *sk, struct sk_buff *skb); > > > size_t (*get_info_size)(const struct sock *sk); > > > /* clone ulp */ > > > void (*clone)(const struct request_sock *req, struct sock *newsk, > > > diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c > > > index a536586742f2..e57c5f47f035 100644 > > > --- a/net/mptcp/diag.c > > > +++ b/net/mptcp/diag.c > > > @@ -13,17 +13,19 @@ > > > #include > > > #include "protocol.h" > > > > > > -static int subflow_get_info(const struct sock *sk, struct sk_buff *skb) > > > +static int subflow_get_info(struct sock *sk, struct sk_buff *skb) > > > { > > > struct mptcp_subflow_context *sf; > > > struct nlattr *start; > > > u32 flags = 0; > > > + bool slow; > > > int err; > > > > > > start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP); > > > if (!start) > > > return -EMSGSIZE; > > > > > > + slow = lock_sock_fast(sk); > > > rcu_read_lock(); > > > > I am afraid lockdep is not happy with this change. > > > > Paolo, we probably need the READ_ONCE() annotations after all. > > Or perhaps something like the following would be enough. > > diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c > index > 6ff6f14674aa2941bc04c680bacd9f79fc65060d..7017dd60659dc7133318c1c82e3f429bea3a5d57 > 100644 > --- a/net/mptcp/diag.c > +++ b/net/mptcp/diag.c > @@ -21,6 +21,9 @@ static int subflow_get_info(struct sock *sk, struct > sk_buff *skb) > bool slow; > int err; > > + if (inet_sk_state_load(sk) == TCP_LISTEN) > + return 0; > + > start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP); > if (!start) > return -EMSGSIZE; Thanks for the head-up. This later option looks preferable, to avoid quit a bit of noise with _ONCE annotation. Is there a syzkaller splat I could look at? if it landed on the ML, I missed it. Thanks! Paolo
Re: [PATCH net 03/13] mptcp: fix lockless access in subflow ULP diag
On Mon, 2024-02-19 at 19:33 +0100, Eric Dumazet wrote: > On Mon, Feb 19, 2024 at 7:04 PM Paolo Abeni wrote: > > Thanks for the head-up. This later option looks preferable, to avoid > > quit a bit of noise with _ONCE annotation. Is there a syzkaller splat I > > could look at? if it landed on the ML, I missed it. > > > > Not landed yet, here is the splat : > > == > WARNING: possible circular locking dependency detected > 6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted > -- > syz-executor.2/24141 is trying to acquire lock: > 888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: > tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] > 888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: > tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 > > but task is already holding lock: > c9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock > include/linux/spinlock.h:351 [inline] > c9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: > inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038 [Sorry for the latency]. Yes it looks like that checking the listener status will work. I can test and send the formal patch - with the due credits! - or do you prefer otherwise? Thanks! Paolo
Re: [PATCH net-next v3 00/11] selftests: kselftest_harness: support using xfail
On Tue, 2024-02-20 at 11:22 -0800, Jakub Kicinski wrote: > When running selftests for our subsystem in our CI we'd like all > tests to pass. Currently some tests use SKIP for cases they > expect to fail, because the kselftest_harness limits the return > codes to pass/fail/skip. > > Clean up and support the use of the full range of ksft exit codes > under kselftest_harness. > > Merge plan is to put it on top of -rc4 and merge into net-next. > That way others should be able to pull the patches without > any networking changes. > > v2: https://lore.kernel.org/all/20240216002619.1999225-1-k...@kernel.org/ > - fix alignment > follow up RFC: > https://lore.kernel.org/all/20240216004122.2004689-1-k...@kernel.org/ > v1: https://lore.kernel.org/all/20240213154416.422739-1-k...@kernel.org/ @Shuah: it's not clear to me if you prefer to take this series via the kselftests tree or we can take it via the net-next tree. Could you please advise? thanks! Paolo p.s. if this was already clarified in the past, I'm sorry: I lost track of it.
Re: [PATCH net-next v4 4/4] net: gro: move L3 flush checks to tcp_gro_receive
Hi, On Tue, 2024-03-26 at 16:02 +0100, Richard Gobert wrote: > This patch is meaningful by itself - removing checks against non-relevant > packets and making the flush/flush_id checks in a single place. I'm personally not sure this patch is a win. The code churn is significant. I understand this is for performance's sake, but I don't see the benefit??? The changelog shows that perf reports slightly lower figures for inet_gro_receive(). That is expected, as this patch move code out of such functio. What about inet_gro_flush()/tcp_gro_receive() where such code is moved? Additionally the reported deltas is within noise level according to my personal experience with similar tests. I think we are better off without this patch. Paolo
Re: [PATCH net-next v4 4/4] net: gro: move L3 flush checks to tcp_gro_receive
On Tue, 2024-03-26 at 18:25 +0100, Richard Gobert wrote: > Paolo Abeni wrote: > > Hi, > > > > On Tue, 2024-03-26 at 16:02 +0100, Richard Gobert wrote: > > > This patch is meaningful by itself - removing checks against non-relevant > > > packets and making the flush/flush_id checks in a single place. > > > > I'm personally not sure this patch is a win. The code churn is > > significant. I understand this is for performance's sake, but I don't > > see the benefit??? > > > > Could you clarify what do you mean by code churn? The diffstat of this patch is not negligible and touches very sensitive areas. > > he changelog shows that perf reports slightly lower figures for > > inet_gro_receive(). That is expected, as this patch move code out of > > such functio. What about inet_gro_flush()/tcp_gro_receive() where such > > code is moved? > > > > Please consider the following 2 common scenarios: > > 1) Multiple packets in the GRO bucket - the common case with multiple >packets in the bucket (i.e. running super_netperf TCP_STREAM) - each layer >executes a for loop - going over each packet in the bucket. Specifically, >L3 gro_receive loops over the bucket making flush,flush_id,is_atomic >checks. Only for packets with the same rx hash. > For most packets in the bucket, these checks are not >relevant. (possibly also dirtying cache lines with non-relevant p >packets). Removing code in the for loop for this case is significant. > > 2) UDP/TCP streams which do not coalesce in GRO. This is the common case >for regular UDP connections (i.e. running netperf UDP_STREAM). In this >case, GRO is just overhead. Removing any code from these layers >is good (shown in the first measurement of the commit message). If UDP GRO is not enabled, there are no UDP packet staging in the UDP gro engine, the bucket list is empty. > > Additionally the reported deltas is within noise level according to my > > personal experience with similar tests. > > > > I've tested the difference between net-next and this patch repetitively, > which showed stable results each time. Is there any specific test you > think would be helpful to show the result? Anything that show measurable gain. Reporting the CPU utilization in the inet_gro_receive() function alone is not enough, as part of the load has been moved into gro_network_flush()/tcp_gro_receive(). Cheers, Paolo
Re: [PATCH net-next v2 5/7] netdevsim: report stats by default, like a real device
On Tue, 2024-04-02 at 19:34 -0700, Jakub Kicinski wrote: > Real devices should implement qstats. Devices which support > pause or FEC configuration should also report the relevant stats. > > nsim was missing FEC stats completely, some of the qstats > and pause stats required toggling a debugfs knob. > > Note that the tests which used pause always initialize the setting > so they shouldn't be affected by the different starting value. > > Signed-off-by: Jakub Kicinski > --- > drivers/net/netdevsim/ethtool.c | 11 > drivers/net/netdevsim/netdev.c | 45 + > 2 files changed, 56 insertions(+) > > diff --git a/drivers/net/netdevsim/ethtool.c b/drivers/net/netdevsim/ethtool.c > index bd546d4d26c6..3f9c9327f149 100644 > --- a/drivers/net/netdevsim/ethtool.c > +++ b/drivers/net/netdevsim/ethtool.c > @@ -140,6 +140,13 @@ nsim_set_fecparam(struct net_device *dev, struct > ethtool_fecparam *fecparam) > return 0; > } > > +static void > +nsim_get_fec_stats(struct net_device *dev, struct ethtool_fec_stats > *fec_stats) > +{ > + fec_stats->corrected_blocks.total = 123; > + fec_stats->uncorrectable_blocks.total = 4; > +} > + > static int nsim_get_ts_info(struct net_device *dev, > struct ethtool_ts_info *info) > { > @@ -163,6 +170,7 @@ static const struct ethtool_ops nsim_ethtool_ops = { > .set_channels = nsim_set_channels, > .get_fecparam = nsim_get_fecparam, > .set_fecparam = nsim_set_fecparam, > + .get_fec_stats = nsim_get_fec_stats, > .get_ts_info= nsim_get_ts_info, > }; > > @@ -182,6 +190,9 @@ void nsim_ethtool_init(struct netdevsim *ns) > > nsim_ethtool_ring_init(ns); > > + ns->ethtool.pauseparam.report_stats_rx = true; > + ns->ethtool.pauseparam.report_stats_tx = true; > + > ns->ethtool.fec.fec = ETHTOOL_FEC_NONE; > ns->ethtool.fec.active_fec = ETHTOOL_FEC_NONE; > > diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c > index 8330bc0bcb7e..096ac0abbc02 100644 > --- a/drivers/net/netdevsim/netdev.c > +++ b/drivers/net/netdevsim/netdev.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -330,6 +331,49 @@ static const struct net_device_ops nsim_vf_netdev_ops = { > .ndo_set_features = nsim_set_features, > }; > > +/* We don't have true par-queue stats, yet, so do some random fakery here. */ > +static void nsim_get_queue_stats_rx(struct net_device *dev, int idx, > + struct netdev_queue_stats_rx *stats) > +{ > + struct rtnl_link_stats64 rtstats = {}; > + > + nsim_get_stats64(dev, &rtstats); > + > + stats->packets = rtstats.rx_packets - !!rtstats.rx_packets; > + stats->bytes = rtstats.rx_bytes; > +} > + > +static void nsim_get_queue_stats_tx(struct net_device *dev, int idx, > + struct netdev_queue_stats_tx *stats) > +{ > + struct rtnl_link_stats64 rtstats = {}; > + > + nsim_get_stats64(dev, &rtstats); > + > + stats->packets = rtstats.tx_packets - !!rtstats.tx_packets; > + stats->bytes = rtstats.tx_bytes; It looks the stats will not be self-consistent with multiple queues enabled. What about zeroing 'stats' when idx > 0 ? /P
Re: [PATCH v2 net-next 0/7] selftests: net: Switch pmtu.sh to use the internal ovs script.
On Mon, 2024-06-24 at 12:53 -0400, Aaron Conole wrote: > Aaron Conole writes: > > > Jakub Kicinski writes: > > > > > On Thu, 20 Jun 2024 08:55:54 -0400 Aaron Conole wrote: > > > > This series enhances the ovs-dpctl utility to provide support for set() > > > > and tunnel() flow specifiers, better ipv6 handling support, and the > > > > ability to add tunnel vports, and LWT interfaces. Finally, it modifies > > > > the pmtu.sh script to call the ovs-dpctl.py utility rather than the > > > > typical OVS userspace utilities. > > > > > > Thanks for the work! > > > > > > Looks like the series no longer applies because of other changes > > > to the kernel config. Before it stopped applying we got some runs, > > > here's what I see: > > > > > > https://netdev-3.bots.linux.dev/vmksft-net/results/648440/3-pmtu-sh/stdout > > > > > > # Cannot find device "ovs_br0" > > > # TEST: IPv4, OVS vxlan4: PMTU exceptions > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv4, OVS vxlan4: PMTU exceptions - nexthop objects > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv6, OVS vxlan4: PMTU exceptions > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv6, OVS vxlan4: PMTU exceptions - nexthop objects > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv4, OVS vxlan6: PMTU exceptions > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv4, OVS vxlan6: PMTU exceptions - nexthop objects > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv6, OVS vxlan6: PMTU exceptions > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv6, OVS vxlan6: PMTU exceptions - nexthop objects > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv4, OVS geneve4: PMTU exceptions > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv4, OVS geneve4: PMTU exceptions - nexthop objects > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv6, OVS geneve4: PMTU exceptions > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv6, OVS geneve4: PMTU exceptions - nexthop objects > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv4, OVS geneve6: PMTU exceptions > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv4, OVS geneve6: PMTU exceptions - nexthop objects > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > # TEST: IPv6, OVS geneve6: PMTU exceptions > > > [FAIL] > > > # Cannot find device "ovs_br0" > > > > > > Any idea why? Looks like kernel config did include OVS, perhaps we need > > > explicit modprobe now? I don't see any more details in the logs. > > > > Strange. I expected that the module should have automatically been > > loaded when attempting to communicate with the OVS genetlink family > > type. At least, that is how it had been working previously. > > > > I'll spend some time looking into it and resubmit a rebased version. > > Thanks, Jakub! > > If the ovs module isn't available, then I see: > > # ovs_bridge not supported > # TEST: IPv4, OVS vxlan4: PMTU exceptions [SKIP] > > But if it is available, I haven't been able to reproduce such ovs_br0 > setup failure - things work. I'm still wondering if the issue is Kconfig-related (plus possibly bad interaction with vng). I don't see the OVS knob enabled in the self- tests config. If it's implied by some other knob, and ends-up being selected as a module, vng could stumble upon loading the module at runtime, especially on incremental build (at least I experience that problem locally). I'm not even sure if the KCI is building incrementally or not, so all the above could is quite a wild guess. In any case I think adding the explicit CONFIG_OPENVSWITCH=y the selftest config would make the scenario more well defined. Cheers, Paolo
Re: [PATCH net-next v15 03/14] netdev: support binding dma-buf to netdevice
On Fri, 2024-06-28 at 00:32 +, Mina Almasry wrote: > +int net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, > +struct net_devmem_dmabuf_binding **out) > +{ > + struct net_devmem_dmabuf_binding *binding; > + static u32 id_alloc_next; > + struct scatterlist *sg; > + struct dma_buf *dmabuf; > + unsigned int sg_idx, i; > + unsigned long virtual; > + int err; > + > + dmabuf = dma_buf_get(dmabuf_fd); > + if (IS_ERR(dmabuf)) > + return -EBADFD; > + > + binding = kzalloc_node(sizeof(*binding), GFP_KERNEL, > +dev_to_node(&dev->dev)); > + if (!binding) { > + err = -ENOMEM; > + goto err_put_dmabuf; > + } > + > + binding->dev = dev; > + > + err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id, > + binding, xa_limit_32b, &id_alloc_next, > + GFP_KERNEL); > + if (err < 0) > + goto err_free_binding; > + > + xa_init_flags(&binding->bound_rxq_list, XA_FLAGS_ALLOC); > + > + refcount_set(&binding->ref, 1); > + > + binding->dmabuf = dmabuf; > + > + binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent); > + if (IS_ERR(binding->attachment)) { > + err = PTR_ERR(binding->attachment); > + goto err_free_id; > + } > + > + binding->sgt = > + dma_buf_map_attachment(binding->attachment, DMA_FROM_DEVICE); > + if (IS_ERR(binding->sgt)) { > + err = PTR_ERR(binding->sgt); > + goto err_detach; > + } > + > + /* For simplicity we expect to make PAGE_SIZE allocations, but the > + * binding can be much more flexible than that. We may be able to > + * allocate MTU sized chunks here. Leave that for future work... > + */ > + binding->chunk_pool = > + gen_pool_create(PAGE_SHIFT, dev_to_node(&dev->dev)); > + if (!binding->chunk_pool) { > + err = -ENOMEM; > + goto err_unmap; > + } > + > + virtual = 0; > + for_each_sgtable_dma_sg(binding->sgt, sg, sg_idx) { > + dma_addr_t dma_addr = sg_dma_address(sg); > + struct dmabuf_genpool_chunk_owner *owner; > + size_t len = sg_dma_len(sg); > + struct net_iov *niov; > + > + owner = kzalloc_node(sizeof(*owner), GFP_KERNEL, > + dev_to_node(&dev->dev)); I'm sorry for not catching this earlier, but it looks like the above allocation lacks a NULL check. Thanks, Paolo
Re: [PATCH net-next v15 03/14] netdev: support binding dma-buf to netdevice
On Tue, 2024-07-02 at 13:08 +0200, Paolo Abeni wrote: > On Fri, 2024-06-28 at 00:32 +, Mina Almasry wrote: > > +int net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, > > + struct net_devmem_dmabuf_binding **out) > > +{ > > + struct net_devmem_dmabuf_binding *binding; > > + static u32 id_alloc_next; > > + struct scatterlist *sg; > > + struct dma_buf *dmabuf; > > + unsigned int sg_idx, i; > > + unsigned long virtual; > > + int err; > > + > > + dmabuf = dma_buf_get(dmabuf_fd); > > + if (IS_ERR(dmabuf)) > > + return -EBADFD; > > + > > + binding = kzalloc_node(sizeof(*binding), GFP_KERNEL, > > + dev_to_node(&dev->dev)); > > + if (!binding) { > > + err = -ENOMEM; > > + goto err_put_dmabuf; > > + } > > + > > + binding->dev = dev; > > + > > + err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id, > > + binding, xa_limit_32b, &id_alloc_next, > > + GFP_KERNEL); > > + if (err < 0) > > + goto err_free_binding; > > + > > + xa_init_flags(&binding->bound_rxq_list, XA_FLAGS_ALLOC); > > + > > + refcount_set(&binding->ref, 1); > > + > > + binding->dmabuf = dmabuf; > > + > > + binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent); > > + if (IS_ERR(binding->attachment)) { > > + err = PTR_ERR(binding->attachment); > > + goto err_free_id; > > + } > > + > > + binding->sgt = > > + dma_buf_map_attachment(binding->attachment, DMA_FROM_DEVICE); > > + if (IS_ERR(binding->sgt)) { > > + err = PTR_ERR(binding->sgt); > > + goto err_detach; > > + } > > + > > + /* For simplicity we expect to make PAGE_SIZE allocations, but the > > +* binding can be much more flexible than that. We may be able to > > +* allocate MTU sized chunks here. Leave that for future work... > > +*/ > > + binding->chunk_pool = > > + gen_pool_create(PAGE_SHIFT, dev_to_node(&dev->dev)); > > + if (!binding->chunk_pool) { > > + err = -ENOMEM; > > + goto err_unmap; > > + } > > + > > + virtual = 0; > > + for_each_sgtable_dma_sg(binding->sgt, sg, sg_idx) { > > + dma_addr_t dma_addr = sg_dma_address(sg); > > + struct dmabuf_genpool_chunk_owner *owner; > > + size_t len = sg_dma_len(sg); > > + struct net_iov *niov; > > + > > + owner = kzalloc_node(sizeof(*owner), GFP_KERNEL, > > +dev_to_node(&dev->dev)); > > I'm sorry for not catching this earlier, but it looks like the above > allocation lacks a NULL check. FTR, given the size of the series, the number of iterations already done and the issue not preventing the functionality, I agree to merge the series as-is, and handle this with a follow-up. Thanks, Paolo
Re: [PATCH] selftests/net: missing close function in randomize function
On Fri, 2024-07-05 at 16:24 +0800, Liu Jing wrote: > in randomize function, there is a open function, but there is no > close function in the randomize, which is easy to cause memory leaks. Please not that the file descriptor is not leaked, the kernel will dispose it at process exit() time. > > Signed-off-by: Liu Jing > --- > tools/testing/selftests/net/tcp_mmap.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/testing/selftests/net/tcp_mmap.c > b/tools/testing/selftests/net/tcp_mmap.c > index 4fcce5150850..ab305e262d0a 100644 > --- a/tools/testing/selftests/net/tcp_mmap.c > +++ b/tools/testing/selftests/net/tcp_mmap.c > @@ -438,6 +438,7 @@ static void randomize(void *target, size_t count) > perror("read /dev/urandom"); > exit(1); > } > + close(urandom); > } > > int main(int argc, char *argv[]) This is outside any loop, so the overall effect WRT the self-test itself is not visible. I think we are better off without this kind of changes, unless they are part of a largish re-factor. Thanks, Paolo
Re: [PATCH net] selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
On 7/23/24 10:30, Nikolay Aleksandrov wrote: On 23/07/2024 11:22, Hangbin Liu wrote: If the testing kernel doesn't support setting fdb_max_learned or show fdb_n_learned, just skip it. Or we will get errors like ./bridge_fdb_learning_limit.sh: line 218: [: null: integer expression expected ./bridge_fdb_learning_limit.sh: line 225: [: null: integer expression expected Fixes: 6f84090333bb ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest") Signed-off-by: Hangbin Liu --- .../forwarding/bridge_fdb_learning_limit.sh| 18 ++ 1 file changed, 18 insertions(+) diff --git a/tools/testing/selftests/net/forwarding/bridge_fdb_learning_limit.sh b/tools/testing/selftests/net/forwarding/bridge_fdb_learning_limit.sh index 0760a34b7114..a21b7085da2e 100755 --- a/tools/testing/selftests/net/forwarding/bridge_fdb_learning_limit.sh +++ b/tools/testing/selftests/net/forwarding/bridge_fdb_learning_limit.sh @@ -178,6 +178,22 @@ fdb_del() check_err $? "Failed to remove a FDB entry of type ${type}" } +check_fdb_n_learned_support() +{ + if ! ip link help bridge 2>&1 | grep -q "fdb_max_learned"; then + echo "SKIP: iproute2 too old, missing bridge max learned support" + exit $ksft_skip + fi + + ip link add dev br0 type bridge + local learned=$(fdb_get_n_learned) + ip link del dev br0 + if [ "$learned" == "null" ]; then + echo "SKIP: kernel too old; bridge fdb_n_learned feature not supported." + exit $ksft_skip + fi +} + check_accounting_one_type() { local type=$1 is_counted=$2 overrides_learned=$3 @@ -274,6 +290,8 @@ check_limit() done } +check_fdb_n_learned_support + trap cleanup EXIT setup_prepare Isn't the selftest supposed to be added after the feature was included? I don't understand why this one is special, we should have the same issue with all new features. I must admit I was surprised when I learned the fact, but the stable team routinely runs up2date upstream self-tests on top of stable/older kernels: https://lore.kernel.org/mptcp/zahlyvopeyghr...@kroah.com/ The expected self-test design is to probe the tested feature and skip if not available in the running kernel. The self-test should not break when run on an older kernel not offering such feature. I understand some (most?) of the self-tests do not cope with the above perfectly, but we can improve ;). Thanks, Paolo
Re: [PATCH net-next 0/7] net/selftests: TCP-AO selftests updates
On 7/30/24 04:12, Dmitry Safonov wrote: First 4 patches are more-or-less cleanups/preparations. Patch 5 was sent to me/contributed off-list by Mohammad, who wants 32-bit kernels to run TCP-AO. Patch 6 is a workaround/fix for slow VMs. Albeit, I can't reproduce the issue, but I hope it will fix netdev flakes for connect-deny-* tests. And the biggest change is adding TCP-AO tracepoints to selftests. I think it's a good addition by the following reasons: - The related tracepoints are now tested; - It allows tcp-ao selftests to raise expectations on the kernel behavior - up from the syscalls exit statuses + net counters. - Provides tracepoints usage samples. As tracepoints are not a stable ABI, any kernel changes done to them will be reflected to the selftests, which also will allow users to see how to change their code. It's quite better than parsing dmesg (what BGP was doing pre-tracepoints, ugh). Somewhat arguably, the code parses trace_pipe, rather than uses libtraceevent (which any sane user should do). The reason behind that is the same as for rt-netlink macros instead of libmnl: I'm trying to minimize the library dependencies of the selftests. And the performance of formatting text in kernel and parsing it again in a test is not critical. Current output sample: ok 73 Trace events matched expectations: 13 tcp_hash_md5_required[2] tcp_hash_md5_unexpected[4] tcp_hash_ao_required[3] tcp_ao_key_not_found[4] Previously, tracepoints selftests were part of kernel tcp tracepoints submission [1], but since then the code was quite changed: - Now generic tracing setup is in lib/ftrace.c, separate from lib/ftrace-tcp.c which utilizes TCP trace points. This separation allows future selftests to trace non-TCP events, i.e. to find out an skb's drop reason, which was useful in the creation of TCP-CLOSE stress-test (not in this patch set, but used in attempt to reproduce the issue from [2]). - Another change is that in the previous submission the trace events where used only to detect unexpected TCP-AO/TCP-MD5 events. In this version the selftests will fail if an expected trace event didn't appear. Let's see how reliable this is on the netdev bot - it obviously passes on my testing, but potentially may require a temporary XFAIL patch if it misbehaves on a slow VM. It looks like this is not well digested by the CI, e.g.: https://netdev.bots.linux.dev/flakes.html?tn-needle=tcp-ao https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/705502/8-restore-ipv4/stdout BTW wearing for a moment Cato the censor's shoes, I note that patch 1 && 2 commit messages are quite more informal and less informative than the average;) Thanks, Paolo
Re: [PATCH net-next 1/5] selftests: drv-net: define endpoint structures
On Fri, 2024-04-12 at 16:37 -0700, Jakub Kicinski wrote: > +class Endpoint: > +def __init__(self, name): > +self.name = name > +self._tmpdir = None > + > +def __del__(self): > +if self._tmpdir: > +self.cmd("rm -rf " + self._tmpdir) > +self._tmpdir = None > + > +def cmd(self, comm, *args): > +c = cmd("ssh " + self.name + " " + shlex.quote(comm), *args) > +return c.stdout, c.stderr, c.ret If I read correctly the above will do a full ssh handshake for each command. If the test script/setup is complex, I think/fear the overhead could become a bit cumbersome. Would using something alike Fabric to create a single connection at endpoint instantiation time and re-using it for all the command be too much? Thanks, Paolo
Re: [PATCH net-next 4/5] selftests: drv-net: construct environment for running tests which require an endpoint
On Fri, 2024-04-12 at 16:37 -0700, Jakub Kicinski wrote: > Nothing surprising here, hopefully. Wrap the variables from > the environment into a class or spawn a netdevsim based env > and pass it to the tests. > > Signed-off-by: Jakub Kicinski > --- > .../testing/selftests/drivers/net/README.rst | 31 +++ > .../selftests/drivers/net/lib/py/env.py | 93 ++- > .../testing/selftests/net/lib/py/__init__.py | 1 + > tools/testing/selftests/net/lib/py/netns.py | 31 +++ > 4 files changed, 155 insertions(+), 1 deletion(-) > create mode 100644 tools/testing/selftests/net/lib/py/netns.py > > diff --git a/tools/testing/selftests/drivers/net/README.rst > b/tools/testing/selftests/drivers/net/README.rst > index 5ef7c417d431..ffc15fe5d555 100644 > --- a/tools/testing/selftests/drivers/net/README.rst > +++ b/tools/testing/selftests/drivers/net/README.rst > @@ -23,8 +23,39 @@ Variables can be set in the environment or by creating a > net.config ># Variable set in a file >NETIF=eth0 > > +Please note that the config parser is very simple, if there are > +any non-alphanumeric characters in the value it needs to be in > +double quotes. > + > NETIF > ~ > > Name of the netdevice against which the test should be executed. > When empty or not set software devices will be used. > + > +LOCAL_V4, LOCAL_V6, EP_V4, EP_V6 > + > + > +Local and remote (endpoint) IP addresses. > + > +EP_TYPE > +~~~ > + > +Communication method used to run commands on the endpoint. > +Test framework supports using ``netns`` and ``ssh`` channels. > +``netns`` assumes the "remote" interface is part of the same > +host, just moved to the specified netns. > +``ssh`` communicates with remote endpoint over ``ssh`` and ``scp``. > + > +Communication methods are defined by classes in ``lib/py/ep_{name}.py``. > +It should be possible to add a new method without modifying any of > +the framework, by simply adding an appropriately named file to ``lib/py``. > + > +EP_ARGS > +~~~ > + > +Arguments used to construct the communication channel. > +Communication channel dependent:: > + > + for netns - name of the "remote" namespace > + for ssh - name/address of the remote host > diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py > b/tools/testing/selftests/drivers/net/lib/py/env.py > index a081e168f3db..f63be0a72a53 100644 > --- a/tools/testing/selftests/drivers/net/lib/py/env.py > +++ b/tools/testing/selftests/drivers/net/lib/py/env.py > @@ -4,7 +4,8 @@ import os > import shlex > from pathlib import Path > from lib.py import ip > -from lib.py import NetdevSimDev > +from lib.py import NetNS, NetdevSimDev > +from .endpoint import Endpoint > > > def _load_env_file(src_path): > @@ -59,3 +60,93 @@ from lib.py import NetdevSimDev > self._ns = None > > > +class NetDrvEpEnv: > +""" > +Class for an environment with a local device and "remote endpoint" > +which can be used to send traffic in. > + > +For local testing it creates two network namespaces and a pair > +of netdevsim devices. > +""" > +def __init__(self, src_path): > + > +self.env = _load_env_file(src_path) > + > +# Things we try to destroy > +self.endpoint = None > +# These are for local testing state > +self._netns = None > +self._ns = None > +self._ns_peer = None > + > +if "NETIF" in self.env: > +self.dev = ip("link show dev " + self.env['NETIF'], json=True)[0] > + > +self.v4 = self.env.get("LOCAL_V4") > +self.v6 = self.env.get("LOCAL_V6") > +self.ep_v4 = self.env.get("EP_V4") > +self.ep_v6 = self.env.get("EP_V6") > +ep_type = self.env["EP_TYPE"] > +ep_args = self.env["EP_ARGS"] > +else: > +self.create_local() > + > +self.dev = self._ns.nsims[0].dev > + > +self.v4 = "192.0.2.1" > +self.v6 ="0100::1" Minor nit, what about using 2001:db8:, for more consistency with existing 'net' self-tests? Also +1 on Willem suggestion to possibly have both endpoints remote. (Very cool stuff, ça va sans dire ;) Thanks, Paolo
Re: [PATCH net-next 5/5] selftests: drv-net: add a trivial ping test
On Fri, 2024-04-12 at 16:37 -0700, Jakub Kicinski wrote: > +def ping_v4(cfg) -> None: > +if not cfg.v4: > +raise KsftXfailEx() > + > +cmd(f"ping -c 1 -W0.5 {cfg.ep_v4}") > +cmd(f"ping -c 1 -W0.5 {cfg.v4}", host=cfg.endpoint) Very minor nit, I personally find a bit more readable: cfg.endpoint.cmd() Which is already supported by the current infra, right? With both endpoint possibly remote could be: cfg.ep1.cmd() cfg.ep2.cmd() Thanks! Paolo
Re: [PATCH net-next 1/5] selftests: drv-net: define endpoint structures
On Mon, 2024-04-15 at 07:19 -0700, Jakub Kicinski wrote: > On Mon, 15 Apr 2024 10:57:31 +0200 Paolo Abeni wrote: > > If I read correctly the above will do a full ssh handshake for each > > command. If the test script/setup is complex, I think/fear the overhead > > could become a bit cumbersome. > > Connection reuse. I wasn't sure if I should add a hint to the README, > let me do so. > > > Would using something alike Fabric to create a single connection at > > endpoint instantiation time and re-using it for all the command be too > > much? > > IDK what "Fabric" is, if its commonly used we can add the option > in tree. If less commonly - I hope the dynamic loading scheme > will allow users to very easily drop in their own class that > integrates with Fabric, without dirtying the tree? :) I'm really a python-expert. 'Fabric' a python library to execute commands over ssh: https://www.fabfile.org/ > No idea how much commont it is. I'm fine with ssh connection sharing. Thanks, Paolo
Re: [PATCH net-next 5/5] selftests: drv-net: add a trivial ping test
On Mon, 2024-04-15 at 07:33 -0700, Jakub Kicinski wrote: > On Mon, 15 Apr 2024 11:31:05 +0200 Paolo Abeni wrote: > > On Fri, 2024-04-12 at 16:37 -0700, Jakub Kicinski wrote: > > > +def ping_v4(cfg) -> None: > > > +if not cfg.v4: > > > +raise KsftXfailEx() > > > + > > > +cmd(f"ping -c 1 -W0.5 {cfg.ep_v4}") > > > +cmd(f"ping -c 1 -W0.5 {cfg.v4}", host=cfg.endpoint) > > > > Very minor nit, I personally find a bit more readable: > > > > cfg.endpoint.cmd() > > > > Which is already supported by the current infra, right? > > > > With both endpoint possibly remote could be: > > > > cfg.ep1.cmd() > > cfg.ep2.cmd() > > As I said in the cover letter, I don't want to push us too much towards > classes. The argument format make local and local+remote tests look more > similar. I guess it's a matter of personal preferences. I know mine are usually quite twisted ;) I'm fine with either syntax. Cheers, Paolo
Re: [PATCH net-next 1/5] selftests: drv-net: define endpoint structures
On Mon, 2024-04-15 at 07:19 -0700, Jakub Kicinski wrote: > On Mon, 15 Apr 2024 10:57:31 +0200 Paolo Abeni wrote: > > If I read correctly the above will do a full ssh handshake for each > > command. If the test script/setup is complex, I think/fear the overhead > > could become a bit cumbersome. > > Connection reuse. I wasn't sure if I should add a hint to the README, > let me do so. I'm sorry for the multiple, incremental feedback. I think such hinto the readme will be definitely useful, thanks! Paolo
Re: [PATCH net-next v7 2/3] net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment
On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote: > {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags, > iph->id, ...) against all packets in a loop. These flush checks are used > currently in all tcp flows and in some UDP flows in GRO. > > These checks need to be done only once and only against the found p skb, > since they only affect flush and not same_flow. > > Leveraging the previous commit in the series, in which correct network > header offsets are saved for both outer and inner network headers - > allowing these checks to be done only once, in tcp_gro_receive and > udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at > all. In addition, flush_id checks are more declarative and contained in > inet_gro_flush, thus removing the need for flush_id in napi_gro_cb. > > This results in less parsing code for UDP flows and non-loop flush tests > for TCP flows. > > To make sure results are not within noise range - I've made netfilter drop > all TCP packets, and measured CPU performance in GRO (in this case GRO is > responsible for about 50% of the CPU utilization). > > L3 flush/flush_id checks are not relevant to UDP connections where > skb_gro_receive_list is called. The only code change relevant to this flow > is inet_gro_receive. The rest of the code parsing this flow stays the > same. > > All concurrent connections tested are with the same ip srcaddr and > dstaddr. > > perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow): > net-next: > 3.03% [kernel] [k] inet_gro_receive > > patch applied: > 2.78% [kernel] [k] inet_gro_receive Why there are no figures for udp_gro_receive_segment()/gro_network_flush() here? Also you should be able to observer a very high amount of CPU usage by GRO even with TCP with very high speed links, keeping the BH/GRO on a CPU and the user-space/data copy on a different one (or using rx zero copy). Thanks, Paolo
Re: [PATCH net-next v7 1/3] net: gro: add {inner_}network_offset to napi_gro_cb
On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote: > This patch adds network_offset and inner_network_offset to napi_gro_cb, and > makes sure both are set correctly. In the common path there's only one > write (skb_gro_reset_offset). > > Signed-off-by: Richard Gobert Does not apply cleanly to net-next. You have to wait until the net dependency is merged into net-next before posting. > --- > drivers/net/geneve.c | 1 + > drivers/net/vxlan/vxlan_core.c | 1 + > include/net/gro.h | 18 -- > net/8021q/vlan_core.c | 2 ++ > net/core/gro.c | 1 + > net/ethernet/eth.c | 1 + > net/ipv4/af_inet.c | 5 + > net/ipv4/gre_offload.c | 1 + > net/ipv6/ip6_offload.c | 8 > 9 files changed, 28 insertions(+), 10 deletions(-) > > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c > index 9c18a39b0d0c..a6256ea1f5bc 100644 > --- a/drivers/net/geneve.c > +++ b/drivers/net/geneve.c > @@ -545,6 +545,7 @@ static struct sk_buff *geneve_gro_receive(struct sock *sk, > if (!ptype) > goto out; > > + NAPI_GRO_CB(skb)->inner_network_offset = hlen; > pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb); > flush = 0; > > diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c > index 6fb182d9d6e7..9fb93c3953c1 100644 > --- a/drivers/net/vxlan/vxlan_core.c > +++ b/drivers/net/vxlan/vxlan_core.c > @@ -754,6 +754,7 @@ static struct sk_buff *vxlan_gpe_gro_receive(struct sock > *sk, > > vh = vxlan_gro_prepare_receive(sk, head, skb, &grc); > if (vh) { > + NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb); > if (!vxlan_parse_gpe_proto(vh, &protocol)) > goto out; > ptype = gro_find_receive_by_type(protocol); What about vxlan_gro_receive? and fou/gue? Side note: the latter apparently exist mainly to make UDP-related changes more difficult, can we deprecated them once for all? Thank, Paolo
Re: [PATCH net-next v7 2/3] net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment
On Tue, 2024-04-16 at 11:21 +0200, Paolo Abeni wrote: > On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote: > > {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags, > > iph->id, ...) against all packets in a loop. These flush checks are used > > currently in all tcp flows and in some UDP flows in GRO. > > > > These checks need to be done only once and only against the found p skb, > > since they only affect flush and not same_flow. > > > > Leveraging the previous commit in the series, in which correct network > > header offsets are saved for both outer and inner network headers - > > allowing these checks to be done only once, in tcp_gro_receive and > > udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at > > all. In addition, flush_id checks are more declarative and contained in > > inet_gro_flush, thus removing the need for flush_id in napi_gro_cb. > > > > This results in less parsing code for UDP flows and non-loop flush tests > > for TCP flows. > > > > To make sure results are not within noise range - I've made netfilter drop > > all TCP packets, and measured CPU performance in GRO (in this case GRO is > > responsible for about 50% of the CPU utilization). > > > > L3 flush/flush_id checks are not relevant to UDP connections where > > skb_gro_receive_list is called. The only code change relevant to this flow > > is inet_gro_receive. The rest of the code parsing this flow stays the > > same. > > > > All concurrent connections tested are with the same ip srcaddr and > > dstaddr. > > > > perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow): > > net-next: > > 3.03% [kernel] [k] inet_gro_receive > > > > patch applied: > > 2.78% [kernel] [k] inet_gro_receive > > Why there are no figures for > udp_gro_receive_segment()/gro_network_flush() here? > > Also you should be able to observer a very high amount of CPU usage by > GRO even with TCP with very high speed links, keeping the BH/GRO on a > CPU and the user-space/data copy on a different one (or using rx zero > copy). To be more explicit: I think at least the above figures are required, and I still fear the real gain in that case would range from zero to negative. If you can't do the TCP part of the testing, please provide at least the figures for a single UDP flow, that should give more indication WRT the result we can expect with TCP. Note that GRO is used mainly by TCP and TCP packets with different src/dst port will land into different GRO hash buckets, having different RX hash. That will happen even for UDP, at least for some (most?) nics include the UDP ports in the RX hash. Thanks, Paolo