Re: [PATCH 065/141] airo: Fix fall-through warnings for Clang
"Gustavo A. R. Silva" wrote: > In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning > by explicitly adding a break statement instead of letting the code fall > through to the next case. > > Link: https://github.com/KSPP/linux/issues/115 > Signed-off-by: Gustavo A. R. Silva 4 patches applied to wireless-drivers-next.git, thanks. 48264b23fade airo: Fix fall-through warnings for Clang f48d7dccb3e4 rt2x00: Fix fall-through warnings for Clang 0662fbebf4fb rtw88: Fix fall-through warnings for Clang 18572b0b5493 zd1201: Fix fall-through warnings for Clang -- https://patchwork.kernel.org/project/linux-wireless/patch/b3c0f74f5b6e6bff9f1609b310319b6fdd9ee205.1605896059.git.gustavo...@kernel.org/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH v1 bpf-next 03/11] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.
On Tue, Dec 08, 2020 at 03:27:14PM +0900, Kuniyuki Iwashima wrote: > From: Martin KaFai Lau > Date: Mon, 7 Dec 2020 12:14:38 -0800 > > On Sun, Dec 06, 2020 at 01:03:07AM +0900, Kuniyuki Iwashima wrote: > > > From: Martin KaFai Lau > > > Date: Fri, 4 Dec 2020 17:42:41 -0800 > > > > On Tue, Dec 01, 2020 at 11:44:10PM +0900, Kuniyuki Iwashima wrote: > > > > [ ... ] > > > > > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c > > > > > index fd133516ac0e..60d7c1f28809 100644 > > > > > --- a/net/core/sock_reuseport.c > > > > > +++ b/net/core/sock_reuseport.c > > > > > @@ -216,9 +216,11 @@ int reuseport_add_sock(struct sock *sk, struct > > > > > sock *sk2, bool bind_inany) > > > > > } > > > > > EXPORT_SYMBOL(reuseport_add_sock); > > > > > > > > > > -void reuseport_detach_sock(struct sock *sk) > > > > > +struct sock *reuseport_detach_sock(struct sock *sk) > > > > > { > > > > > struct sock_reuseport *reuse; > > > > > + struct bpf_prog *prog; > > > > > + struct sock *nsk = NULL; > > > > > int i; > > > > > > > > > > spin_lock_bh(&reuseport_lock); > > > > > @@ -242,8 +244,12 @@ void reuseport_detach_sock(struct sock *sk) > > > > > > > > > > reuse->num_socks--; > > > > > reuse->socks[i] = reuse->socks[reuse->num_socks]; > > > > > + prog = rcu_dereference(reuse->prog); > > > > Is it under rcu_read_lock() here? > > > > > > reuseport_lock is locked in this function, and we do not modify the prog, > > > but is rcu_dereference_protected() preferable? > > > > > > ---8<--- > > > prog = rcu_dereference_protected(reuse->prog, > > >lockdep_is_held(&reuseport_lock)); > > > ---8<--- > > It is not only reuse->prog. Other things also require rcu_read_lock(), > > e.g. please take a look at __htab_map_lookup_elem(). > > > > The TCP_LISTEN sk (selected by bpf to be the target of the migration) > > is also protected by rcu. > > Thank you, I will use rcu_read_lock() and rcu_dereference() in v3 patchset. > > > > I am surprised there is no WARNING in the test. 
> > Do you have the needed DEBUG_LOCK* config enabled? > > Yes, DEBUG_LOCK* was 'y', but rcu_dereference() without rcu_read_lock() > does not show warnings... I would at least expect the "WARN_ON_ONCE(!rcu_read_lock_held() ...)" from __htab_map_lookup_elem() should fire in your test example in the last patch. It is better to check the config before sending v3. [ ... ] > > > > > diff --git a/net/ipv4/inet_connection_sock.c > > > > > b/net/ipv4/inet_connection_sock.c > > > > > index 1451aa9712b0..b27241ea96bd 100644 > > > > > --- a/net/ipv4/inet_connection_sock.c > > > > > +++ b/net/ipv4/inet_connection_sock.c > > > > > @@ -992,6 +992,36 @@ struct sock *inet_csk_reqsk_queue_add(struct > > > > > sock *sk, > > > > > } > > > > > EXPORT_SYMBOL(inet_csk_reqsk_queue_add); > > > > > > > > > > +void inet_csk_reqsk_queue_migrate(struct sock *sk, struct sock *nsk) > > > > > +{ > > > > > + struct request_sock_queue *old_accept_queue, *new_accept_queue; > > > > > + > > > > > + old_accept_queue = &inet_csk(sk)->icsk_accept_queue; > > > > > + new_accept_queue = &inet_csk(nsk)->icsk_accept_queue; > > > > > + > > > > > + spin_lock(&old_accept_queue->rskq_lock); > > > > > + spin_lock(&new_accept_queue->rskq_lock); > > > > I am also not very thrilled on this double spin_lock. > > > > Can this be done in (or like) inet_csk_listen_stop() instead? > > > > > > It will be possible to migrate sockets in inet_csk_listen_stop(), but I > > > think it is better to do it just after reuseport_detach_sock() becuase we > > > can select a different listener (almost) every time at a lower cost by > > > selecting the moved socket and pass it to inet_csk_reqsk_queue_migrate() > > > easily. > > I don't see the "lower cost" point. Please elaborate. > > In reuseport_select_sock(), we pass sk_hash of the request socket to > reciprocal_scale() and generate a random index for socks[] to select > a different listener every time. 
> On the other hand, we do not have request sockets in unhash path and > sk_hash of the listener is always 0, so we have to generate a random number > in another way. In reuseport_detach_sock(), we can use the index of the > moved socket, but we do not have it in inet_csk_listen_stop(), so we have > to generate a random number in inet_csk_listen_stop(). > I think it is at lower cost to use the index of the moved socket. Generate a random number is not a big deal for the migration code path. Also, I really still failed to see a particular way that the kernel pick will help in the migration case. The kernel has no clue on how to select the right process to migrate to without a proper policy signal from the user. They are all as bad as a random pick. I am not sure this migration feature is even useful if there is no bpf prog attached to define the policy. That said, if it is still desired to do a random pick by kernel when there is no
Re: [PATCH 00/20] ethernet: ucc_geth: assorted fixes and simplifications
On 08/12/2020 04.07, Qiang Zhao wrote:
> On 06/12/2020 05:12, Rasmus Villemoes wrote:
>
>> I think patch 2 is a bug fix as well, but I'd like someone from NXP to
>> comment.
>
> It's ok for me.

I was hoping for something a bit more than that. Can you please go check
with the people who made the hardware and those who wrote the manual
(probably not the same ones) what is actually up and down, and then
report on what they said.

It's fairly obvious that allocating 192 bytes instead of 128 should
never hurt (unless we run out of muram), but it would be nice with an
official "Yes, table 8-111 is wrong, it should say 192", or
alternatively, "No, table 8-53 is wrong, those MTU etc. fields don't
really exist". Extra points for providing details such as "first
revision of the IP had $foo, but that was never shipped in real
products, then $bar was changed", etc.

Thanks,
Rasmus
Re: [PATCH v1 bpf-next 03/11] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.
From: Martin KaFai Lau Date: Mon, 7 Dec 2020 23:34:41 -0800 > On Tue, Dec 08, 2020 at 03:31:34PM +0900, Kuniyuki Iwashima wrote: > > From: Martin KaFai Lau > > Date: Mon, 7 Dec 2020 12:33:15 -0800 > > > On Thu, Dec 03, 2020 at 11:14:24PM +0900, Kuniyuki Iwashima wrote: > > > > From: Eric Dumazet > > > > Date: Tue, 1 Dec 2020 16:25:51 +0100 > > > > > On 12/1/20 3:44 PM, Kuniyuki Iwashima wrote: > > > > > > This patch lets reuseport_detach_sock() return a pointer of struct > > > > > > sock, > > > > > > which is used only by inet_unhash(). If it is not NULL, > > > > > > inet_csk_reqsk_queue_migrate() migrates TCP_ESTABLISHED/TCP_SYN_RECV > > > > > > sockets from the closing listener to the selected one. > > > > > > > > > > > > Listening sockets hold incoming connections as a linked list of > > > > > > struct > > > > > > request_sock in the accept queue, and each request has reference to > > > > > > a full > > > > > > socket and its listener. In inet_csk_reqsk_queue_migrate(), we only > > > > > > unlink > > > > > > the requests from the closing listener's queue and relink them to > > > > > > the head > > > > > > of the new listener's queue. We do not process each request and its > > > > > > reference to the listener, so the migration completes in O(1) time > > > > > > complexity. However, in the case of TCP_SYN_RECV sockets, we take > > > > > > special > > > > > > care in the next commit. > > > > > > > > > > > > By default, the kernel selects a new listener randomly. In order to > > > > > > pick > > > > > > out a different socket every time, we select the last element of > > > > > > socks[] as > > > > > > the new listener. This behaviour is based on how the kernel moves > > > > > > sockets > > > > > > in socks[]. 
(See also [1]) > > > > > > > > > > > > Basically, in order to redistribute sockets evenly, we have to use > > > > > > an eBPF > > > > > > program called in the later commit, but as the side effect of such > > > > > > default > > > > > > selection, the kernel can redistribute old requests evenly to new > > > > > > listeners > > > > > > for a specific case where the application replaces listeners by > > > > > > generations. > > > > > > > > > > > > For example, we call listen() for four sockets (A, B, C, D), and > > > > > > close the > > > > > > first two by turns. The sockets move in socks[] like below. > > > > > > > > > > > > socks[0] : A <-. socks[0] : D socks[0] : D > > > > > > socks[1] : B | => socks[1] : B <-. => socks[1] : C > > > > > > socks[2] : C | socks[2] : C --' > > > > > > socks[3] : D --' > > > > > > > > > > > > Then, if C and D have newer settings than A and B, and each socket > > > > > > has a > > > > > > request (a, b, c, d) in their accept queue, we can redistribute old > > > > > > requests evenly to new listeners. > > > > > > > > > > > > socks[0] : A (a) <-. socks[0] : D (a + d) socks[0] : D > > > > > > (a + d) > > > > > > socks[1] : B (b) | => socks[1] : B (b) <-. => socks[1] : C > > > > > > (b + c) > > > > > > socks[2] : C (c) | socks[2] : C (c) --' > > > > > > socks[3] : D (d) --' > > > > > > > > > > > > Here, (A, D) or (B, C) can have different application settings, but > > > > > > they > > > > > > MUST have the same settings at the socket API level; otherwise, > > > > > > unexpected > > > > > > error may happen. For instance, if only the new listeners have > > > > > > TCP_SAVE_SYN, old requests do not have SYN data, so the application > > > > > > will > > > > > > face inconsistency and cause an error. > > > > > > > > > > > > Therefore, if there are different kinds of sockets, we must attach > > > > > > an eBPF > > > > > > program described in later commits. 
> > > > > > > > > > > > Link: > > > > > > https://lore.kernel.org/netdev/CAEfhGiyG8Y_amDZ2C8dQoQqjZJMHjTY76b=KBkTKcBtA=dh...@mail.gmail.com/ > > > > > > Reviewed-by: Benjamin Herrenschmidt > > > > > > Signed-off-by: Kuniyuki Iwashima > > > > > > --- > > > > > > include/net/inet_connection_sock.h | 1 + > > > > > > include/net/sock_reuseport.h | 2 +- > > > > > > net/core/sock_reuseport.c | 10 +- > > > > > > net/ipv4/inet_connection_sock.c| 30 > > > > > > ++ > > > > > > net/ipv4/inet_hashtables.c | 9 +++-- > > > > > > 5 files changed, 48 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/include/net/inet_connection_sock.h > > > > > > b/include/net/inet_connection_sock.h > > > > > > index 7338b3865a2a..2ea2d743f8fc 100644 > > > > > > --- a/include/net/inet_connection_sock.h > > > > > > +++ b/include/net/inet_connection_sock.h > > > > > > @@ -260,6 +260,7 @@ struct dst_entry > > > > > > *inet_csk_route_child_sock(const struct sock *sk, > > > > > > struct sock *inet_csk_reqsk_queue_add(struct sock *sk, > > > > > > struct request_sock *req, > > > > > > struct
[PATCHv3 bpf-next] samples/bpf: add xdp program on egress for xdp_redirect_map
This patch add a xdp program on egress to show that we can modify the packet on egress. In this sample we will set the pkt's src mac to egress's mac address. The xdp_prog will be attached when -X option supplied. Signed-off-by: Hangbin Liu --- v3: a) modify the src mac address based on egress mac v2: a) use pkt counter instead of IP ttl modification on egress program b) make the egress program selectable by option -X --- samples/bpf/xdp_redirect_map_kern.c | 60 ++- samples/bpf/xdp_redirect_map_user.c | 153 2 files changed, 168 insertions(+), 45 deletions(-) diff --git a/samples/bpf/xdp_redirect_map_kern.c b/samples/bpf/xdp_redirect_map_kern.c index 6489352ab7a4..6b2164722649 100644 --- a/samples/bpf/xdp_redirect_map_kern.c +++ b/samples/bpf/xdp_redirect_map_kern.c @@ -19,12 +19,22 @@ #include #include +/* The 2nd xdp prog on egress does not support skb mode, so we define two + * maps, tx_port_general and tx_port_native. + */ struct { __uint(type, BPF_MAP_TYPE_DEVMAP); __uint(key_size, sizeof(int)); __uint(value_size, sizeof(int)); __uint(max_entries, 100); -} tx_port SEC(".maps"); +} tx_port_general SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(struct bpf_devmap_val)); + __uint(max_entries, 100); +} tx_port_native SEC(".maps"); /* Count RX packets, as XDP bpf_prog doesn't get direct TX-success * feedback. Redirect TX errors can be caught via a tracepoint. 
@@ -36,6 +46,14 @@ struct { __uint(max_entries, 1); } rxcnt SEC(".maps"); +/* map to stroe egress interface mac address */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, u32); + __type(value, __be64); + __uint(max_entries, 1); +} tx_mac SEC(".maps"); + static void swap_src_dst_mac(void *data) { unsigned short *p = data; @@ -52,17 +70,16 @@ static void swap_src_dst_mac(void *data) p[5] = dst[2]; } -SEC("xdp_redirect_map") -int xdp_redirect_map_prog(struct xdp_md *ctx) +static int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map) { void *data_end = (void *)(long)ctx->data_end; void *data = (void *)(long)ctx->data; struct ethhdr *eth = data; int rc = XDP_DROP; - int vport, port = 0, m = 0; long *value; u32 key = 0; u64 nh_off; + int vport; nh_off = sizeof(*eth); if (data + nh_off > data_end) @@ -79,7 +96,40 @@ int xdp_redirect_map_prog(struct xdp_md *ctx) swap_src_dst_mac(data); /* send packet out physical port */ - return bpf_redirect_map(&tx_port, vport, 0); + return bpf_redirect_map(redirect_map, vport, 0); +} + +SEC("xdp_redirect_general") +int xdp_redirect_map_general(struct xdp_md *ctx) +{ + return xdp_redirect_map(ctx, &tx_port_general); +} + +SEC("xdp_redirect_native") +int xdp_redirect_map_native(struct xdp_md *ctx) +{ + return xdp_redirect_map(ctx, &tx_port_native); +} + +SEC("xdp_devmap/map_prog") +int xdp_redirect_map_egress(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + struct ethhdr *eth = data; + __be64 *mac; + u32 key = 0; + u64 nh_off; + + nh_off = sizeof(*eth); + if (data + nh_off > data_end) + return XDP_DROP; + + mac = bpf_map_lookup_elem(&tx_mac, &key); + if (mac) + __builtin_memcpy(eth->h_source, mac, ETH_ALEN); + + return XDP_PASS; } /* Redirect require an XDP bpf_prog loaded on the TX device */ diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c index 31131b6e7782..19636045c8dc 100644 --- a/samples/bpf/xdp_redirect_map_user.c 
+++ b/samples/bpf/xdp_redirect_map_user.c @@ -14,6 +14,10 @@ #include #include #include +#include +#include +#include +#include #include "bpf_util.h" #include @@ -21,7 +25,8 @@ static int ifindex_in; static int ifindex_out; -static bool ifindex_out_xdp_dummy_attached = true; +static bool ifindex_out_xdp_dummy_attached = false; +static bool xdp_devmap_attached = false; static __u32 prog_id; static __u32 dummy_prog_id; @@ -83,6 +88,29 @@ static void poll_stats(int interval, int ifindex) } } +static int get_mac_addr(unsigned int ifindex_out, void *mac_addr) +{ + struct ifreq ifr; + char ifname[IF_NAMESIZE]; + int fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); + + if (fd < 0) + return -1; + + if (!if_indextoname(ifindex_out, ifname)) + return -1; + + strcpy(ifr.ifr_name, ifname); + + if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0) + return -1; + + memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char)); + close(fd); + + return 0; +} + static void usage(const char *prog) { fprintf(stderr, @@ -
Re: [PATCH] net: rmnet: Adjust virtual device MTU on real device capability
What about just returning an error on NETDEV_PRECHANGEMTU notification to prevent real device MTU change while virtual rmnet devices are linked? Not sure there is a more proper and thread safe way to manager that otherwise. Can't you copy what vlan devices do? That'd seem like a reasonable and well tested precedent, no? Could you try this patch. I've tried addressing most of the conditions here. I haven't seen any issues with updating the MTU when rmnet devices are linked. diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c index fcdecdd..8d51b0c 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c @@ -26,7 +26,7 @@ static int rmnet_is_real_dev_registered(const struct net_device *real_dev) } /* Needs rtnl lock */ -static struct rmnet_port* +struct rmnet_port* rmnet_get_port_rtnl(const struct net_device *real_dev) { return rtnl_dereference(real_dev->rx_handler_data); @@ -253,7 +253,10 @@ static int rmnet_config_notify_cb(struct notifier_block *nb, netdev_dbg(real_dev, "Kernel unregister\n"); rmnet_force_unassociate_device(real_dev); break; - + case NETDEV_CHANGEMTU: + if (rmnet_vnd_validate_real_dev_mtu(real_dev)) + return NOTIFY_BAD; + break; default: break; } @@ -329,9 +332,17 @@ static int rmnet_changelink(struct net_device *dev, struct nlattr *tb[], if (data[IFLA_RMNET_FLAGS]) { struct ifla_rmnet_flags *flags; + u32 old_data_format; + old_data_format = port->data_format; flags = nla_data(data[IFLA_RMNET_FLAGS]); port->data_format = flags->flags & flags->mask; + + if (rmnet_vnd_update_dev_mtu(port, real_dev)) { + port->data_format = old_data_format; + NL_SET_ERR_MSG_MOD(extack, "Invalid MTU on real dev"); + return -EINVAL; + } } return 0; diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h index be51598..8d8d469 100644 --- 
a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h @@ -73,4 +73,6 @@ int rmnet_add_bridge(struct net_device *rmnet_dev, struct netlink_ext_ack *extack); int rmnet_del_bridge(struct net_device *rmnet_dev, struct net_device *slave_dev); +struct rmnet_port* +rmnet_get_port_rtnl(const struct net_device *real_dev); #endif /* _RMNET_CONFIG_H_ */ diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c index d58b51d..df87883 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c @@ -58,9 +58,30 @@ static netdev_tx_t rmnet_vnd_start_xmit(struct sk_buff *skb, return NETDEV_TX_OK; } +static int rmnet_vnd_headroom(struct net_device *real_dev) +{ + struct rmnet_port *port; + u32 headroom; + + port = rmnet_get_port_rtnl(real_dev); + + headroom = sizeof(struct rmnet_map_header); + + if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV4) + headroom += sizeof(struct rmnet_map_dl_csum_trailer); + + return headroom; +} + static int rmnet_vnd_change_mtu(struct net_device *rmnet_dev, int new_mtu) { - if (new_mtu < 0 || new_mtu > RMNET_MAX_PACKET_SIZE) + struct rmnet_priv *priv = netdev_priv(rmnet_dev); + u32 headroom; + + headroom = rmnet_vnd_headroom(priv->real_dev); + + if (new_mtu < 0 || new_mtu > RMNET_MAX_PACKET_SIZE || + new_mtu > (priv->real_dev->mtu - headroom)) return -EINVAL; rmnet_dev->mtu = new_mtu; @@ -229,6 +250,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev, { struct rmnet_priv *priv = netdev_priv(rmnet_dev); + u32 headroom; int rc; if (rmnet_get_endpoint(port, id)) { @@ -242,6 +264,13 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev, priv->real_dev = real_dev; + headroom = rmnet_vnd_headroom(real_dev); + + if (rmnet_vnd_change_mtu(rmnet_dev, real_dev->mtu - headroom)) { + NL_SET_ERR_MSG_MOD(extack, "Invalid MTU on real dev"); + return -EINVAL; + } + rc = 
register_netdevice(rmnet_dev); if (!rc) { ep->egress_dev = rmnet_dev; @@ -283,3 +312,51 @@ int rmnet_vnd_do_flow_control(struct net_device *rmnet_dev, int enable) return 0; } + +int rmnet_vnd_validate_real_dev_mtu(struct net_device *real_dev) +{ + struct hlist_node *tmp_ep; + struct rmnet_endpoint *ep; + struct rmnet_port *port; + unsigned long bkt_ep; +
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
On Mon, 7 Dec 2020 at 15:15, David Howells wrote: > > Ard Biesheuvel wrote: > > > > I wonder if it would help if the input buffer and output buffer didn't > > > have to correspond exactly in usage - ie. the output buffer could be used > > > at a slower rate than the input to allow for buffering inside the crypto > > > algorithm. > > > > > > > I don't follow - how could one be used at a slower rate? > > I mean that the crypto algorithm might need to buffer the last part of the > input until it has a block's worth before it can write to the output. > This is what is typically handled transparently by the driver. When you populate a scatterlist, it doesn't matter how misaligned the individual elements are, the scatterlist walker will always present the data in chunks that the crypto algorithm can manage. This is why using a single scatterlist for the entire input is preferable in general. > > > The hashes corresponding to the kerberos enctypes I'm supporting are: > > > > > > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96. > > > > > > HMAC-SHA256 for aes128-cts-hmac-sha256-128 > > > > > > HMAC-SHA384 for aes256-cts-hmac-sha384-192 > > > > > > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac > > > > > > I'm not sure you can support all of those with the instructions available. > > > > It depends on whether the caller can make use of the authenc() > > pattern, which is a type of AEAD we support. > > Interesting. I didn't realise AEAD was an API. > > > There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)), > > including h/w accelerated ones, but none that implement ciphertext > > stealing. 
So that means that, even if you manage to use the AEAD layer to > > perform both at the same time, the generic authenc() template will perform > > the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes, > > respectively, which won't give you any benefit until accelerated > > implementations turn up that perform the whole operation in one pass over > > the input. And even then, I don't think the performance benefit will be > > worth it. > > Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext, > not the plaintext. > > For the moment, it's probably not worth worrying about, then. If I can manage > to abstract the sunrpc bits out into a krb5 library, we can improve the > library later. >
Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
On Mon, 7 Dec 2020 18:01:00 -0700 David Ahern wrote: > On 12/7/20 1:52 PM, John Fastabend wrote: > >> > >> I think we need to keep XDP_TX action separate, because I think that > >> there are use-cases where the we want to disable XDP_TX due to end-user > >> policy or hardware limitations. > > > > How about we discover this at load time though. Nitpick at XDP "attach" time. The general disconnect between BPF and XDP is that BPF can verify at "load" time (as kernel knows what it support) while XDP can have different support/features per driver, and cannot do this until attachment time. (See later issue with tail calls). (All other BPF-hooks don't have this issue) > > Meaning if the program > > doesn't use XDP_TX then the hardware can skip resource allocations for > > it. I think we could have verifier or extra pass discover the use of > > XDP_TX and then pass a bit down to driver to enable/disable TX caps. > > > > This was discussed in the context of virtio_net some months back - it is > hard to impossible to know a program will not return XDP_TX (e.g., value > comes from a map). It is hard, and sometimes not possible. For maps the workaround is that BPF-programmer adds a bound check on values from the map. If not doing that the verifier have to assume all possible return codes are used by BPF-prog. The real nemesis is program tail calls, that can be added dynamically after the XDP program is attached. It is at attachment time that changing the NIC resources is possible. So, for program tail calls the verifier have to assume all possible return codes are used by BPF-prog. BPF now have function calls and function replace right(?) How does this affect this detection of possible return codes? > Flipping that around, what if the program attach indicates whether > XDP_TX could be returned. If so, driver manages the resource needs. If > not, no resource needed and if the program violates that and returns > XDP_TX the packet is dropped. 
I do like this idea, as IMHO we do need something that is connected with
the BPF-prog and describes what resources the program requests (either as
above, by detecting this in the verifier, or simply by manually
configuring it in the BPF-prog ELF file).

The main idea is that we all (I assume) want to provide a better end-user
interface/experience, by giving direct feedback to the end-user that
"loading+attaching" this XDP BPF-prog will not work, e.g. because the
driver doesn't support a specific return code. Thus, we need to reject
"loading+attaching" if the features cannot be satisfied.

We need a solution, as today it is causing frustration for end-users that
packets can be (silently) dropped by XDP, e.g. if the driver doesn't
support XDP_REDIRECT.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
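The bound check on map values mentioned earlier in this message, where the BPF programmer restricts what a map lookup may turn into a return code, might look roughly like the sketch below. It is plain userspace C for illustration only: the enum mirrors the XDP action values from the kernel UAPI header, while clamp_action() and its drop-on-unknown policy are hypothetical choices, not an existing helper.

```c
#include <stdint.h>

/* XDP action values as defined in the kernel UAPI header <linux/bpf.h>. */
enum xdp_action {
	XDP_ABORTED = 0,
	XDP_DROP,
	XDP_PASS,
	XDP_TX,
	XDP_REDIRECT,
};

/*
 * Hypothetical helper: restrict an action read from a map to the set
 * the program is meant to return. Anything else becomes XDP_DROP, so
 * XDP_TX and XDP_REDIRECT can never be returned from this path.
 */
static enum xdp_action clamp_action(uint32_t map_value)
{
	switch (map_value) {
	case XDP_DROP:
	case XDP_PASS:
		return (enum xdp_action)map_value;
	default:
		return XDP_DROP;
	}
}
```

Whether the verifier can actually derive the reduced set of return codes from such a check, and how tail calls interact with it, is the open question in this thread; the sketch only shows the programmer-side half.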
Re: [PATCH net] net: ll_temac: Fix potential NULL dereference in temac_probe()
Zhang Changzhong writes: > platform_get_resource() may fail and in this case a NULL dereference > will occur. > > Fix it to use devm_platform_ioremap_resource() instead of calling > platform_get_resource() and devm_ioremap(). > > This is detected by Coccinelle semantic patch. > > @@ > expression pdev, res, n, t, e, e1, e2; > @@ > > res = \(platform_get_resource\|platform_get_resource_byname\)(pdev, t, n); > + if (!res) > + return -EINVAL; > ... when != res == NULL > e = devm_ioremap(e1, res->start, e2); > > Fixes: 8425c41d1ef7 ("net: ll_temac: Extend support to non-device-tree > platforms") > Signed-off-by: Zhang Changzhong > --- > drivers/net/ethernet/xilinx/ll_temac_main.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c > b/drivers/net/ethernet/xilinx/ll_temac_main.c > index 60c199f..0301853 100644 > --- a/drivers/net/ethernet/xilinx/ll_temac_main.c > +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c > @@ -1351,7 +1351,6 @@ static int temac_probe(struct platform_device *pdev) > struct device_node *temac_np = dev_of_node(&pdev->dev), *dma_np; > struct temac_local *lp; > struct net_device *ndev; > - struct resource *res; > const void *addr; > __be32 *p; > bool little_endian; > @@ -1500,13 +1499,11 @@ static int temac_probe(struct platform_device *pdev) > of_node_put(dma_np); > } else if (pdata) { > /* 2nd memory resource specifies DMA registers */ > - res = platform_get_resource(pdev, IORESOURCE_MEM, 1); > - lp->sdma_regs = devm_ioremap(&pdev->dev, res->start, > - resource_size(res)); > - if (!lp->sdma_regs) { > + lp->sdma_regs = devm_platform_ioremap_resource(pdev, 1); > + if (IS_ERR(lp->sdma_regs)) { > dev_err(&pdev->dev, > "could not map DMA registers\n"); > - return -ENOMEM; > + return PTR_ERR(lp->sdma_regs); > } > if (pdata->dma_little_endian) { > lp->dma_in = temac_dma_in32_le; Acked-by: Esben Haabendal
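The switch from a NULL check to IS_ERR()/PTR_ERR() in the patch above relies on the kernel convention of encoding a negative errno in the pointer value itself. A minimal userspace sketch of that convention follows. The three helpers are simplified re-implementations written for illustration (the kernel's own versions live in include/linux/err.h), and fake_ioremap_resource() is a hypothetical stand-in for devm_platform_ioremap_resource().

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static char fake_regs[16];

/* Hypothetical stand-in: fails with -EINVAL when the resource is missing. */
static void *fake_ioremap_resource(int have_resource)
{
	if (!have_resource)
		return ERR_PTR(-EINVAL);
	return fake_regs;
}
```

Note that a NULL pointer is not an error under IS_ERR(), which is why the `!lp->sdma_regs` test had to change together with the call.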
[PATCH] net: 8021q: vlan: reduce noise in driver initialization
If drivers work properly, they should be silent. Thus remove the
unnecessary noise on initialization.

Signed-off-by: Enrico Weigelt, metux IT consult
---
 net/8021q/vlan.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index f292e0267bb9..9f4b1b9a37e4 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -42,9 +42,6 @@
 
 unsigned int vlan_net_id __read_mostly;
 
-const char vlan_fullname[] = "802.1Q VLAN Support";
-const char vlan_version[] = DRV_VERSION;
-
 /* End of global variables definitions. */
 
 static int vlan_group_prealloc_vid(struct vlan_group *vg,
@@ -687,8 +684,6 @@ static int __init vlan_proto_init(void)
 {
 	int err;
 
-	pr_info("%s v%s\n", vlan_fullname, vlan_version);
-
 	err = register_pernet_subsys(&vlan_net_ops);
 	if (err < 0)
 		goto err0;
--
2.11.0
Re: [PATCH 1/7] net: 8021q: remove unneeded MODULE_VERSION() usage
On 05.12.20 16:53, Greg KH wrote:

>> How do we feel about deleting this not really informative message
>> altogether in a future patch?
>
> It too should be removed. If drivers are working properly, they are
> quiet.

Just sent a separate patch for removing this message.

I'll rebase my patch queue once this patch has gone through.

--mtx

--
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287
[PATCH net-next] net: dsa: mv88e6xxx: don't set non-existing learn2all bit for 6220/6250
The 6220 and 6250 switches do not have a learn2all bit in global1, ATU control register; bit 3 is reserved. On the switches that do have that bit, it is used to control whether learning frames are sent out the ports that have the message_port bit set. So rather than adding yet another chip method, use the existence of the ->port_setup_message_port method as a proxy for determining whether the learn2all bit exists (and should be set). Signed-off-by: Rasmus Villemoes --- This doesn't fix anything from what I can tell, in particular not the VLAN problems I'm having, so just tagging for net-next. But I do think it's worth it on the general principle of not poking around in undocumented/reserved bits. drivers/net/dsa/mv88e6xxx/chip.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 25449f634889..0245f3dfc1cd 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -1347,9 +1347,11 @@ static int mv88e6xxx_atu_setup(struct mv88e6xxx_chip *chip) if (err) return err; - err = mv88e6xxx_g1_atu_set_learn2all(chip, true); - if (err) - return err; + if (chip->info->ops->port_setup_message_port) { + err = mv88e6xxx_g1_atu_set_learn2all(chip, true); + if (err) + return err; + } return mv88e6xxx_g1_atu_set_age_time(chip, 30); } -- 2.23.0
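The idea in the patch above, using the presence of an ops hook as a proxy for a hardware capability, can be sketched outside the driver as follows. Everything here is a hypothetical, trimmed-down model (struct chip, struct chip_ops, atu_setup() and the learn2all field are stand-ins, not the mv88e6xxx definitions); only the "NULL hook means the feature is absent" test mirrors the patch.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down ops table: a NULL hook means "not supported". */
struct chip_ops {
	int (*port_setup_message_port)(int port);
};

struct chip {
	const struct chip_ops *ops;
	int learn2all;	/* models the ATU control bit */
};

static int setup_message_port(int port)
{
	(void)port;
	return 0;
}

static const struct chip_ops ops_with_message_port = {
	.port_setup_message_port = setup_message_port,
};

static const struct chip_ops ops_without_message_port = {
	.port_setup_message_port = NULL,
};

/* Only touch learn2all on chips that implement the message-port hook. */
static int atu_setup(struct chip *chip)
{
	if (chip->ops->port_setup_message_port)
		chip->learn2all = 1;
	return 0;
}
```

The trade-off of such a proxy is that it couples two features that merely happen to coincide on current chips; a dedicated ops method would be more explicit at the cost of more boilerplate, which is the choice the commit message argues against.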
Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
On Mon, 07 Dec 2020 12:52:22 -0800 John Fastabend wrote:

> > Use-case(1): Cloud-provider want to give customers (running VMs) ability
> > to load XDP program for DDoS protection (only), but don't want to allow
> > customer to use XDP_TX (that can implement LB or cheat their VM
> > isolation policy).
>
> Not following. What interface do they want to allow loading on? If its
> the VM interface then I don't see how it matters. From outside the
> VM there should be no way to discover if its done in VM or in tc or
> some other stack.
>
> If its doing some onloading/offloading I would assume they need to
> ensure the isolation, etc. is still maintained because you can't
> let one VMs program work on other VMs packets safely.
>
> So what did I miss, above doesn't make sense to me.

The Cloud-provider wants to load customer-provided BPF-code on the
physical Host-OS NIC (that supports XDP). The customer can get access to
a web-interface where they can write or upload their BPF-prog.

As multiple customers can upload BPF-progs, the Cloud-provider has to
write a BPF-prog dispatcher that runs these multiple programs. This could
be done via BPF tail-calls, or via Toke's libxdp[1], or via devmap
XDP-progs per egress port.

The Cloud-provider doesn't fully trust the customers' BPF-progs. They
have already pre-filtered traffic to the given VM, so they can allow
customers the freedom to see traffic and do XDP_PASS and XDP_DROP. They
administratively (via ethtool) want to disable the XDP_REDIRECT and
XDP_TX driver features, as these can be used to violate their VM
isolation policy between customers.

Is the use-case more clear now?

[1] https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v2] xfrm: interface: Don't hide plain packets from netfilter
On 07/12/2020 at 14:43, Phil Sutter wrote:
> With an IPsec tunnel without dedicated interface, netfilter sees locally
> generated packets twice as they exit the physical interface: Once as "the
> inner packet" with IPsec context attached and once as the encrypted
> (ESP) packet.
>
> With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
> hook anymore, making it impossible to match on both inner header values
> and associated IPsec data from that hook.
>
> Fix this by looping packets transmitted from xfrm_interface through
> NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
> behaviour consistent again from netfilter's point of view.
>
> Fixes: f203b76d78092 ("xfrm: Add virtual xfrm interfaces")
> Signed-off-by: Phil Sutter
> ---
> Changes since v1:
> - Extend recipients list, no code changes.
> ---
>  net/xfrm/xfrm_interface.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
> index aa4cdcf69d471..24af61c95b4d4 100644
> --- a/net/xfrm/xfrm_interface.c
> +++ b/net/xfrm/xfrm_interface.c
> @@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
>  	skb_dst_set(skb, dst);
>  	skb->dev = tdev;
>
> -	err = dst_output(xi->net, skb->sk, skb);
> +	err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
skb->protocol must be correctly set, maybe better to use it instead of
skb_dst(skb)->ops->family?

> +		      skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
And here, tdev instead of skb_dst(skb)->dev?

>  	if (net_xmit_eval(err) == 0) {
>  		struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
>
>
Re: [PATCH v1 bpf-next 03/11] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.
From: Martin KaFai Lau Date: Tue, 8 Dec 2020 00:13:28 -0800 > On Tue, Dec 08, 2020 at 03:27:14PM +0900, Kuniyuki Iwashima wrote: > > From: Martin KaFai Lau > > Date: Mon, 7 Dec 2020 12:14:38 -0800 > > > On Sun, Dec 06, 2020 at 01:03:07AM +0900, Kuniyuki Iwashima wrote: > > > > From: Martin KaFai Lau > > > > Date: Fri, 4 Dec 2020 17:42:41 -0800 > > > > > On Tue, Dec 01, 2020 at 11:44:10PM +0900, Kuniyuki Iwashima wrote: > > > > > [ ... ] > > > > > > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c > > > > > > index fd133516ac0e..60d7c1f28809 100644 > > > > > > --- a/net/core/sock_reuseport.c > > > > > > +++ b/net/core/sock_reuseport.c > > > > > > @@ -216,9 +216,11 @@ int reuseport_add_sock(struct sock *sk, struct > > > > > > sock *sk2, bool bind_inany) > > > > > > } > > > > > > EXPORT_SYMBOL(reuseport_add_sock); > > > > > > > > > > > > -void reuseport_detach_sock(struct sock *sk) > > > > > > +struct sock *reuseport_detach_sock(struct sock *sk) > > > > > > { > > > > > > struct sock_reuseport *reuse; > > > > > > + struct bpf_prog *prog; > > > > > > + struct sock *nsk = NULL; > > > > > > int i; > > > > > > > > > > > > spin_lock_bh(&reuseport_lock); > > > > > > @@ -242,8 +244,12 @@ void reuseport_detach_sock(struct sock *sk) > > > > > > > > > > > > reuse->num_socks--; > > > > > > reuse->socks[i] = reuse->socks[reuse->num_socks]; > > > > > > + prog = rcu_dereference(reuse->prog); > > > > > Is it under rcu_read_lock() here? > > > > > > > > reuseport_lock is locked in this function, and we do not modify the > > > > prog, > > > > but is rcu_dereference_protected() preferable? > > > > > > > > ---8<--- > > > > prog = rcu_dereference_protected(reuse->prog, > > > > lockdep_is_held(&reuseport_lock)); > > > > ---8<--- > > > It is not only reuse->prog. Other things also require rcu_read_lock(), > > > e.g. please take a look at __htab_map_lookup_elem(). 
> > > > > > The TCP_LISTEN sk (selected by bpf to be the target of the migration) > > > is also protected by rcu. > > > > Thank you, I will use rcu_read_lock() and rcu_dereference() in v3 patchset. > > > > > > > I am surprised there is no WARNING in the test. > > > Do you have the needed DEBUG_LOCK* config enabled? > > > > Yes, DEBUG_LOCK* was 'y', but rcu_dereference() without rcu_read_lock() > > does not show warnings... > I would at least expect the "WARN_ON_ONCE(!rcu_read_lock_held() ...)" > from __htab_map_lookup_elem() should fire in your test > example in the last patch. > > It is better to check the config before sending v3. It seems ok, but I will check it again. ---8<--- [ec2-user@ip-10-0-0-124 bpf-next]$ cat .config | grep DEBUG_LOCK CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_DEBUG_LOCKDEP=y CONFIG_DEBUG_LOCKING_API_SELFTESTS=y ---8<--- > > > > > > diff --git a/net/ipv4/inet_connection_sock.c > > > > > > b/net/ipv4/inet_connection_sock.c > > > > > > index 1451aa9712b0..b27241ea96bd 100644 > > > > > > --- a/net/ipv4/inet_connection_sock.c > > > > > > +++ b/net/ipv4/inet_connection_sock.c > > > > > > @@ -992,6 +992,36 @@ struct sock *inet_csk_reqsk_queue_add(struct > > > > > > sock *sk, > > > > > > } > > > > > > EXPORT_SYMBOL(inet_csk_reqsk_queue_add); > > > > > > > > > > > > +void inet_csk_reqsk_queue_migrate(struct sock *sk, struct sock > > > > > > *nsk) > > > > > > +{ > > > > > > + struct request_sock_queue *old_accept_queue, *new_accept_queue; > > > > > > + > > > > > > + old_accept_queue = &inet_csk(sk)->icsk_accept_queue; > > > > > > + new_accept_queue = &inet_csk(nsk)->icsk_accept_queue; > > > > > > + > > > > > > + spin_lock(&old_accept_queue->rskq_lock); > > > > > > + spin_lock(&new_accept_queue->rskq_lock); > > > > > I am also not very thrilled on this double spin_lock. > > > > > Can this be done in (or like) inet_csk_listen_stop() instead? 
> > > > > > > > It will be possible to migrate sockets in inet_csk_listen_stop(), but I > > > > think it is better to do it just after reuseport_detach_sock() becuase > > > > we > > > > can select a different listener (almost) every time at a lower cost by > > > > selecting the moved socket and pass it to inet_csk_reqsk_queue_migrate() > > > > easily. > > > I don't see the "lower cost" point. Please elaborate. > > > > In reuseport_select_sock(), we pass sk_hash of the request socket to > > reciprocal_scale() and generate a random index for socks[] to select > > a different listener every time. > > On the other hand, we do not have request sockets in unhash path and > > sk_hash of the listener is always 0, so we have to generate a random number > > in another way. In reuseport_detach_sock(), we can use the index of the > > moved socket, but we do not have it in inet_csk_listen_stop(), so we have > > to generate a random number in inet_csk_listen_stop(). > > I think it is at lower cost to use the index of the moved socket. > Generate a random number is not a big deal for the
Re: [PATCH v3 09/11] dt-bindings: usb: convert mediatek,mtk-xhci.txt to YAML schema
On Mon, 2020-12-07 at 15:24 -0600, Rob Herring wrote: > On Wed, Nov 18, 2020 at 04:21:24PM +0800, Chunfeng Yun wrote: > > Convert mediatek,mtk-xhci.txt to YAML schema mediatek,mtk-xhci.yaml > > > > Signed-off-by: Chunfeng Yun > > --- > > v3: > > 1. fix yamllint warning > > 2. remove pinctrl* properties supported by default suggested by Rob > > 3. drop unused labels > > 4. modify description of mediatek,syscon-wakeup > > 5. remove type of imod-interval-ns > > > > v2: new patch > > --- > > .../bindings/usb/mediatek,mtk-xhci.txt| 121 - > > .../bindings/usb/mediatek,mtk-xhci.yaml | 171 ++ > > 2 files changed, 171 insertions(+), 121 deletions(-) > > delete mode 100644 > > Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.txt > > create mode 100644 > > Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.yaml [...] > > diff --git a/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.yaml > > b/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.yaml > > new file mode 100644 > > index ..4a36ad5c4d25 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.yaml > > @@ -0,0 +1,171 @@ > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > > +# Copyright (c) 2020 MediaTek > > +%YAML 1.2 > > +--- > > +$id: http://devicetree.org/schemas/usb/mediatek,mtk-xhci.yaml# > > +$schema: http://devicetree.org/meta-schemas/core.yaml# > > + > > +title: MediaTek USB3 xHCI Device Tree Bindings > > + > > +maintainers: > > + - Chunfeng Yun > > + > > +allOf: > > + - $ref: "usb-hcd.yaml" > > + > > +description: | > > + There are two scenarios: > > + case 1: only supports xHCI driver; > > + case 2: supports dual-role mode, and the host is based on xHCI driver. 
> > + > > +properties: > > + # common properties for both case 1 and case 2 > > + compatible: > > +items: > > + - enum: > > + - mediatek,mt2712-xhci > > + - mediatek,mt7622-xhci > > + - mediatek,mt7629-xhci > > + - mediatek,mt8173-xhci > > + - mediatek,mt8183-xhci > > + - const: mediatek,mtk-xhci > > + > > + reg: > > +minItems: 1 > > +maxItems: 2 > > +items: > > + - description: the registers of xHCI MAC > > + - description: the registers of IP Port Control > > + > > + reg-names: > > +minItems: 1 > > +maxItems: 2 > > +items: > > + - const: mac > > + - const: ippc # optional, only needed for case 1. > > + > > + interrupts: > > +maxItems: 1 > > + > > + power-domains: > > +description: A phandle to USB power domain node to control USB's MTCMOS > > +maxItems: 1 > > + > > + clocks: > > +minItems: 1 > > +maxItems: 5 > > +items: > > + - description: Controller clock used by normal mode > > + - description: Reference clock used by low power mode etc > > + - description: Mcu bus clock for register access > > + - description: DMA bus clock for data transfer > > + - description: controller clock > > + > > + clock-names: > > +minItems: 1 > > +maxItems: 5 > > +items: > > + - const: sys_ck # required, the following ones are optional > > + - const: ref_ck > > + - const: mcu_ck > > + - const: dma_ck > > + - const: xhci_ck > > + > > + phys: > > +$ref: /usb/usb-hcd.yaml# > > That's not right. > > You need 'items' and list each entry. Will add minItems/maxItems instead due to it's variable and phy-names is not used > > > +description: List of all the USB PHYs on this HCD > > + > > + vusb33-supply: > > +description: Regulator of USB AVDD3.3v > > + > > + vbus-supply: > > +description: Regulator of USB VBUS5v > > + > > + usb3-lpm-capable: > > +description: supports USB3.0 LPM > > +type: boolean > > + > > + imod-interval-ns: > > +description: > > + Interrupt moderation interval value, it is 8 times as much as that > > + defined in the xHCI spec on MTK's controller. 
> > +default: 5000 > > + > > + # the following properties are only used for case 1 > > + wakeup-source: > > +description: enable USB remote wakeup, see power/wakeup-source.txt > > +type: boolean > > + > > + mediatek,syscon-wakeup: > > +$ref: /schemas/types.yaml#/definitions/phandle-array > > +maxItems: 1 > > +description: | > > + A phandle to syscon used to access the register of the USB wakeup > > glue > > + layer between xHCI and SPM, the field should always be 3 cells long. > > + > > + items: > > Indentation is wrong here. Should be 2 fewer spaces. Will fix it > > > +- description: > > +The first cell represents a phandle to syscon > > +- description: > > +The second cell represents the register base address of the > > glue > > +layer in syscon > > +- description: > > +The third cell represents the hardware version of the glue > > layer, > > +
Re: [PATCH v3 10/11] dt-bindings: usb: convert mediatek,mtu3.txt to YAML schema
On Mon, 2020-12-07 at 15:30 -0600, Rob Herring wrote: > On Wed, Nov 18, 2020 at 04:21:25PM +0800, Chunfeng Yun wrote: > > Convert mediatek,mtu3.txt to YAML schema mediatek,mtu3.yaml > > > > Signed-off-by: Chunfeng Yun > > --- > > v3: > > 1. fix yamllint warning > > 2. remove pinctrl* properties > > 3. remove unnecessary '|' > > 4. drop unused labels in example > > > > v2: new patch > > --- > > .../devicetree/bindings/usb/mediatek,mtu3.txt | 108 - > > .../bindings/usb/mediatek,mtu3.yaml | 218 ++ > > 2 files changed, 218 insertions(+), 108 deletions(-) > > delete mode 100644 Documentation/devicetree/bindings/usb/mediatek,mtu3.txt > > create mode 100644 Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml > > [...] > > diff --git a/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml > > b/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml > > new file mode 100644 > > index ..290e97a06f2a > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml > > @@ -0,0 +1,218 @@ > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > > +# Copyright (c) 2020 MediaTek > > +%YAML 1.2 > > +--- > > +$id: http://devicetree.org/schemas/usb/mediatek,mtu3.yaml# > > +$schema: http://devicetree.org/meta-schemas/core.yaml# > > + > > +title: MediaTek USB3 DRD Controller Device Tree Bindings > > + > > +maintainers: > > + - Chunfeng Yun > > + > > +description: | > > + The DRD controller has a glue layer IPPC (IP Port Control), and its host > > is > > + based on xHCI. 
> > + > > +properties: > > + compatible: > > +items: > > + - enum: > > + - mediatek,mt2712-mtu3 > > + - mediatek,mt8173-mtu3 > > + - mediatek,mt8183-mtu3 > > + - const: mediatek,mtu3 > > + > > + reg: > > +items: > > + - description: the registers of device MAC > > + - description: the registers of IP Port Control > > + > > + reg-names: > > +items: > > + - const: mac > > + - const: ippc > > + > > + interrupts: > > +maxItems: 1 > > + > > + power-domains: > > +description: A phandle to USB power domain node to control USB's MTCMOS > > +maxItems: 1 > > + > > + clocks: > > +minItems: 1 > > +maxItems: 4 > > +items: > > + - description: Controller clock used by normal mode > > + - description: Reference clock used by low power mode etc > > + - description: Mcu bus clock for register access > > + - description: DMA bus clock for data transfer > > + > > + clock-names: > > +minItems: 1 > > +maxItems: 4 > > +items: > > + - const: sys_ck # required, the following ones are optional > > + - const: ref_ck > > + - const: mcu_ck > > + - const: dma_ck > > + > > + phys: > > +$ref: /schemas/types.yaml#/definitions/phandle-array > > Drop. Need to say how many entries and what each one is if more than 1. Ok > > > +description: List of all the USB PHYs used > > + > > + vusb33-supply: > > +description: Regulator of USB AVDD3.3v > > + > > + vbus-supply: > > +$ref: /connector/usb-connector.yaml# > > Nope. Will remove it > > > +deprecated: true > > +description: | > > + Regulator of USB VBUS5v, needed when supports dual-role mode. > > + Particularly, if use an output GPIO to control a VBUS regulator, > > should > > + model it as a regulator. See bindings/regulator/fixed-regulator.yaml > > + It's considered valid for compatibility reasons, not allowed for > > + new bindings, and put into a usb-connector node. 
> > + > > + dr_mode: > > +description: See usb/generic.txt > > +enum: [host, peripheral, otg] > > +default: otg > > + > > + maximum-speed: > > +description: See usb/generic.txt > > +enum: [super-speed-plus, super-speed, high-speed, full-speed] > > + > > + "#address-cells": > > +enum: [1, 2] > > + > > + "#size-cells": > > +enum: [1, 2] > > + > > + ranges: true > > + > > + extcon: > > +deprecated: true > > +description: | > > + Phandle to the extcon device detecting the IDDIG/VBUS state, neede > > + when supports dual-role mode. > > + It's considered valid for compatibility reasons, not allowed for > > + new bindings, and use "usb-role-switch" property instead. > > + > > + usb-role-switch: > > +$ref: /schemas/types.yaml#/definitions/flag > > +description: Support role switch. See usb/generic.txt > > +type: boolean > > + > > + connector: > > +$ref: /connector/usb-connector.yaml# > > +description: > > + Connector for dual role switch, especially for "gpio-usb-b-connector" > > +type: object > > + > > + port: > > +description: > > + Any connector to the data bus of this controller should be modelled > > + using the OF graph bindings specified, if the "usb-role-switch" > > + property is used. See graph.txt > > +type: object > > Please include port and connector i
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
Ard Biesheuvel wrote: Ard Biesheuvel wrote: > > > > I wonder if it would help if the input buffer and output buffer didn't > > > > have to correspond exactly in usage - ie. the output buffer could be > > > > used at a slower rate than the input to allow for buffering inside the > > > > crypto algorithm. > > > > > > I don't follow - how could one be used at a slower rate? > > > > I mean that the crypto algorithm might need to buffer the last part of the > > input until it has a block's worth before it can write to the output. > > This is what is typically handled transparently by the driver. When > you populate a scatterlist, it doesn't matter how misaligned the > individual elements are, the scatterlist walker will always present > the data in chunks that the crypto algorithm can manage. This is why > using a single scatterlist for the entire input is preferable in > general. Yep - but the assumption currently on the part of the callers is that they provide the input buffer and corresponding output buffer - and that the algorithm will transfer data from one to the other, such that the same amount of input and output bufferage will be used. However, if we start pushing data in progressively, this would no longer hold true unless we also require the caller to only present in block-size chunks. For example, if I gave the encryption function 120 bytes of data and a 120 byte output buffer, but the algorithm has a 16-byte blocksize, it will, presumably, consume 120 bytes of input, but it can only write 112 bytes of output at this time. So the current interface would need to evolve to indicate separately how much input has been consumed and how much output has been produced - in which case it can't be handled transparently. 
For krb5, it's actually worse than that, since we want to be able to insert/remove a header and a trailer (and might need to go back and update the header after) - but I think in the krb5 case, we need to treat the header and trailer specially and update them after the fact in the wrapping case (unwrapping is not a problem, since we can just cache the header). David
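The accounting in the 120-byte example above can be sketched in a few lines of C. This is only a model of the byte counts, not the kernel crypto API: `sketch_progress` and `sketch_feed` are hypothetical names, and the sketch assumes a cipher that accepts all the input it is given but can only emit whole blocks, holding the remainder internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of one progressive "feed" call. */
struct sketch_progress {
	size_t consumed;	/* bytes of input accepted by the cipher */
	size_t produced;	/* bytes of output written so far by this call */
	size_t buffered;	/* bytes held back until a full block arrives */
};

/*
 * Model one call: 'buffered' bytes are already held from earlier calls,
 * 'in_len' new bytes arrive, and only whole blocks can be written out.
 */
static struct sketch_progress sketch_feed(size_t buffered, size_t in_len,
					   size_t blocksize)
{
	struct sketch_progress p;
	size_t total;

	assert(blocksize > 0 && buffered < blocksize);

	total = buffered + in_len;
	p.consumed = in_len;
	p.buffered = total % blocksize;
	p.produced = total - p.buffered;
	return p;
}
```

With a 16-byte block size, feeding 120 bytes reports 120 consumed, 112 produced and 8 buffered, which is why a caller would need both counts reported separately.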
[PATCH net-next] net: Limit logical shift left of TCP probe0 timeout
For each TCP zero window probe, the icsk_backoff is increased by one and its max value is tcp_retries2. If tcp_retries2 is greater than 63, the probe0 timeout shift may exceed its max bits. On x86_64/ARMv8/MIPS, the shift count would be masked to range 0 to 63. And on ARMv7 the result is zero. If the shift count is masked, only several probes will be sent with timeout shorter than TCP_RTO_MAX. But if the timeout is zero, it needs tcp_retries2 times probes to end this false timeout. Besides, bitwise shift greater than or equal to the width is an undefined behavior. This patch adds a limit to the backoff. The max value of max_when is TCP_RTO_MAX and the min value of timeout base is TCP_RTO_MIN. The limit is the backoff from TCP_RTO_MIN to TCP_RTO_MAX. Signed-off-by: Cambda Zhu --- include/net/tcp.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index d4ef5bf94168..82044179c345 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1321,7 +1321,9 @@ static inline unsigned long tcp_probe0_base(const struct sock *sk) static inline unsigned long tcp_probe0_when(const struct sock *sk, unsigned long max_when) { - u64 when = (u64)tcp_probe0_base(sk) << inet_csk(sk)->icsk_backoff; + u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1, + inet_csk(sk)->icsk_backoff); + u64 when = (u64)tcp_probe0_base(sk) << backoff; return (unsigned long)min_t(u64, when, max_when); } -- 2.16.6
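The effect of the clamp can be tried outside the kernel with a small userspace sketch. It is an approximation under stated assumptions: the `SKETCH_*` constants stand in for TCP_RTO_MAX (120 s) and TCP_RTO_MIN (200 ms) with HZ assumed to be 1000, and `sketch_ilog2()` and `sketch_probe0_when()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: jiffies with HZ = 1000. */
#define SKETCH_HZ	1000u
#define SKETCH_RTO_MAX	(120u * SKETCH_HZ)	/* stand-in for TCP_RTO_MAX */
#define SKETCH_RTO_MIN	(SKETCH_HZ / 5u)	/* stand-in for TCP_RTO_MIN */

/* Floor of log2(v) for v > 0; a simple stand-in for the kernel's ilog2(). */
static unsigned int sketch_ilog2(uint64_t v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/*
 * Clamp the backoff before shifting so the shift count can never reach
 * the width of the type, then cap the result at max_when.
 */
static uint64_t sketch_probe0_when(uint64_t base, uint8_t backoff,
				   uint64_t max_when)
{
	uint8_t limit = (uint8_t)(sketch_ilog2(SKETCH_RTO_MAX / SKETCH_RTO_MIN) + 1);
	uint8_t shift = backoff < limit ? backoff : limit;
	uint64_t when = base << shift;

	return when < max_when ? when : max_when;
}
```

With these assumed constants the limit works out to ilog2(600) + 1 = 10, so any backoff of 10 or more (including values above 63) yields the capped timeout instead of an out-of-range shift.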
RE: [EXT] Re: [PATCH v2] MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver
Hi Jakub, thanks for the guidelines.

> On Sat, 5 Dec 2020 18:43:00 +0200 Mickey Rachamim wrote:
> > Add maintainers info for new Marvell Prestera Ethernet switch driver.
> >
> > Signed-off-by: Mickey Rachamim
> > ---
> > v2:
> > Update the maintainers list according to community recommendation.
> >
> > MAINTAINERS | 8
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS index
> > 061e64b2423a..c92b44754436 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -10550,6 +10550,14 @@ S: Supported
> > F: Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
> > F: drivers/net/ethernet/marvell/octeontx2/af/
> >
> > +MARVELL PRESTERA ETHERNET SWITCH DRIVER
> > +M: Vadym Kochan
> > +M: Taras Chornyi
>
> Just a heads up, again, we'll start removing maintainers who aren't
> participating, so Taras needs to be active. We haven't seen a single email
> from him so far AFAICT.
>

Fully clear. Taras is a Linux kernel expert working at PLVision under
contract with Marvell. He will become active in contributions and reviews
very soon.

> > +L: netdev@vger.kernel.org
>
> nit: I don't think you need to list netdev, it'll get inherited from the
> general entry for networking drivers (you can test running get_maintainer.pl
> on a patch to the driver and see if it reports it).

Right, will remove.

> > +S: Supported
> > +W: http://www.marvell.com
>
> The website entry is for a project-specific website. If you have a link to a
> site with open resources about the chips/driver that'd be great, otherwise
> please drop it. Also https is expected these days ;)

Can I put the GitHub project link here?
https://github.com/Marvell-switching/switchdev-prestera
[PATCH net-next 00/13] mlxsw: Add support for Q-in-VNI
From: Ido Schimmel

This patch set adds support for Q-in-VNI over Spectrum-{2,3} ASICs.
Q-in-VNI is like regular VxLAN encapsulation with the sole difference that
overlay packets can contain a VLAN tag. In Linux, this is achieved by
adding the VxLAN device to a 802.1ad bridge instead of a 802.1q bridge.

From mlxsw perspective, Q-in-VNI support entails two main changes:

1. An outer VLAN tag should always be pushed to the overlay packet during
   decapsulation

2. The EtherType used during decapsulation should be 802.1ad (0x88a8)
   instead of the default 802.1q (0x8100)

Patch set overview:

Patches #1-#3 add required device registers and fields

Patch #4 performs small refactoring to allow code re-use

Patches #5-#7 make the EtherType used during decapsulation a property of
the tunnel port (i.e., VxLAN). This leads to the driver vetoing
configurations in which VxLAN devices are member in both 802.1ad and
802.1q/802.1d bridges. Will be handled in the future by determining the
overlay EtherType on the egress port instead

Patch #8 adds support for Q-in-VNI for Spectrum-2 and newer ASICs

Patches #9-#10 veto Q-in-VNI for Spectrum-1 ASICs due to some hardware
limitations. Can be worked around, but decided not to support it for now

Patch #11 adjusts mlxsw to stop vetoing addition of VXLAN devices to
802.1ad bridges

Patch #12 adds a generic forwarding test that can be used with both veth
pairs and physical ports with a loopback

Patch #13 adds a test to make sure mlxsw vetoes unsupported Q-in-VNI
configurations

Amit Cohen (12):
  mlxsw: Use one enum for all registers that contain tunnel_port field
  mlxsw: reg: Add Switch Port VLAN Stacking Register
  mlxsw: reg: Add support for tunnel port in SPVID register
  mlxsw: spectrum_switchdev: Create common function for joining VxLAN to
    VLAN-aware bridge
  mlxsw: Save EtherType as part of mlxsw_sp_nve_params
  mlxsw: Save EtherType as part of mlxsw_sp_nve_config
  mlxsw: spectrum: Publish mlxsw_sp_ethtype_to_sver_type()
  mlxsw: spectrum_nve_vxlan: Add support for Q-in-VNI for Spectrum-2 ASIC
  mlxsw: spectrum_switchdev: Use ops->vxlan_join() when adding VLAN to
    VxLAN device
  mlxsw: Veto Q-in-VNI for Spectrum-1 ASIC
  mlxsw: spectrum_switchdev: Allow joining VxLAN to 802.1ad bridge
  selftests: mlxsw: Add Q-in-VNI veto tests

Petr Machata (1):
  selftests: forwarding: Add Q-in-VNI test

 drivers/net/ethernet/mellanox/mlxsw/reg.h | 146 ++--
 .../net/ethernet/mellanox/mlxsw/spectrum.c | 2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.h | 2 +
 .../ethernet/mellanox/mlxsw/spectrum_nve.c | 6 +-
 .../ethernet/mellanox/mlxsw/spectrum_nve.h | 5 +-
 .../mellanox/mlxsw/spectrum_nve_vxlan.c | 67 +++-
 .../mellanox/mlxsw/spectrum_switchdev.c | 32 +-
 .../net/mlxsw/spectrum-2/q_in_vni_veto.sh | 77
 .../net/mlxsw/spectrum/q_in_vni_veto.sh | 66
 .../selftests/net/forwarding/q_in_vni.sh | 347 ++
 10 files changed, 703 insertions(+), 47 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh
 create mode 100755 tools/testing/selftests/net/forwarding/q_in_vni.sh

--
2.28.0
[PATCH net-next 04/13] mlxsw: spectrum_switchdev: Create common function for joining VxLAN to VLAN-aware bridge
From: Amit Cohen The code in mlxsw_sp_bridge_8021q_vxlan_join() can be used also for 802.1ad bridge. Move the code to function called mlxsw_sp_bridge_vlan_aware_vxlan_join() and call it from mlxsw_sp_bridge_8021q_vxlan_join() to enable code reuse. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- .../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c index 9c4e17607e6a..c53e0ab9f971 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c @@ -2053,9 +2053,9 @@ mlxsw_sp_bridge_8021q_port_leave(struct mlxsw_sp_bridge_device *bridge_device, } static int -mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device, -const struct net_device *vxlan_dev, u16 vid, -struct netlink_ext_ack *extack) +mlxsw_sp_bridge_vlan_aware_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device, + const struct net_device *vxlan_dev, + u16 vid, struct netlink_ext_ack *extack) { struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_device->dev); struct vxlan_dev *vxlan = netdev_priv(vxlan_dev); @@ -2101,6 +2101,15 @@ mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device, return err; } +static int +mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device, +const struct net_device *vxlan_dev, u16 vid, +struct netlink_ext_ack *extack) +{ + return mlxsw_sp_bridge_vlan_aware_vxlan_join(bridge_device, vxlan_dev, +vid, extack); +} + static struct net_device * mlxsw_sp_bridge_8021q_vxlan_dev_find(struct net_device *br_dev, u16 vid) { -- 2.28.0
[PATCH net-next 02/13] mlxsw: reg: Add Switch Port VLAN Stacking Register
From: Amit Cohen SPVTR register configures the VLAN mode of the port to enable VLAN stacking. It will be used to configure VxLAN to push VLAN to the decapsulated packet. Without this setting, Spectrum-2 overtakes the VLAN tag of decapsulated packet for bridging. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/reg.h | 104 ++ 1 file changed, 104 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index 0a3c5f89268c..ad6798c2169d 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -1693,6 +1693,109 @@ static inline void mlxsw_reg_svfa_pack(char *payload, u8 local_port, mlxsw_reg_svfa_vid_set(payload, vid); } +/* SPVTR - Switch Port VLAN Stacking Register + * -- + * The Switch Port VLAN Stacking register configures the VLAN mode of the port + * to enable VLAN stacking. + */ +#define MLXSW_REG_SPVTR_ID 0x201D +#define MLXSW_REG_SPVTR_LEN 0x10 + +MLXSW_REG_DEFINE(spvtr, MLXSW_REG_SPVTR_ID, MLXSW_REG_SPVTR_LEN); + +/* reg_spvtr_tport + * Port is tunnel port. + * Access: Index + * + * Note: Reserved when SwitchX/-2 or Spectrum-1. + */ +MLXSW_ITEM32(reg, spvtr, tport, 0x00, 24, 1); + +/* reg_spvtr_local_port + * When tport = 0: local port number (Not supported from/to CPU). + * When tport = 1: tunnel port. + * Access: Index + */ +MLXSW_ITEM32(reg, spvtr, local_port, 0x00, 16, 8); + +/* reg_spvtr_ippe + * Ingress Port Prio Mode Update Enable. + * When set, the Port Prio Mode is updated with the provided ipprio_mode field. + * Reserved on Get operations. + * Access: OP + */ +MLXSW_ITEM32(reg, spvtr, ippe, 0x04, 31, 1); + +/* reg_spvtr_ipve + * Ingress Port VID Mode Update Enable. + * When set, the Ingress Port VID Mode is updated with the provided ipvid_mode + * field. + * Reserved on Get operations. 
+ * Access: OP + */ +MLXSW_ITEM32(reg, spvtr, ipve, 0x04, 30, 1); + +/* reg_spvtr_epve + * Egress Port VID Mode Update Enable. + * When set, the Egress Port VID Mode is updated with the provided epvid_mode + * field. + * Access: OP + */ +MLXSW_ITEM32(reg, spvtr, epve, 0x04, 29, 1); + +/* reg_spvtr_ipprio_mode + * Ingress Port Priority Mode. + * This controls the PCP and DEI of the new outer VLAN + * Note: for SwitchX/-2 the DEI is not affected. + * 0: use port default PCP and DEI (configured by QPDPC). + * 1: use C-VLAN PCP and DEI. + * Has no effect when ipvid_mode = 0. + * Reserved when tport = 1. + * Access: RW + */ +MLXSW_ITEM32(reg, spvtr, ipprio_mode, 0x04, 20, 4); + +enum mlxsw_reg_spvtr_ipvid_mode { + /* IEEE Compliant PVID (default) */ + MLXSW_REG_SPVTR_IPVID_MODE_IEEE_COMPLIANT_PVID, + /* Push VLAN (for VLAN stacking, except prio tagged packets) */ + MLXSW_REG_SPVTR_IPVID_MODE_PUSH_VLAN_FOR_UNTAGGED_PACKET, + /* Always push VLAN (also for prio tagged packets) */ + MLXSW_REG_SPVTR_IPVID_MODE_ALWAYS_PUSH_VLAN, +}; + +/* reg_spvtr_ipvid_mode + * Ingress Port VLAN-ID Mode. + * For Spectrum family, this affects the values of SPVM.i + * Access: RW + */ +MLXSW_ITEM32(reg, spvtr, ipvid_mode, 0x04, 16, 4); + +enum mlxsw_reg_spvtr_epvid_mode { + /* IEEE Compliant VLAN membership */ + MLXSW_REG_SPVTR_EPVID_MODE_IEEE_COMPLIANT_VLAN_MEMBERSHIP, + /* Pop VLAN (for VLAN stacking) */ + MLXSW_REG_SPVTR_EPVID_MODE_POP_VLAN, +}; + +/* reg_spvtr_epvid_mode + * Egress Port VLAN-ID Mode. + * For Spectrum family, this affects the values of SPVM.e,u,pt. 
+ * Access: WO + */ +MLXSW_ITEM32(reg, spvtr, epvid_mode, 0x04, 0, 4); + +static inline void mlxsw_reg_spvtr_pack(char *payload, bool tport, + u8 local_port, + enum mlxsw_reg_spvtr_ipvid_mode ipvid_mode) +{ + MLXSW_REG_ZERO(spvtr, payload); + mlxsw_reg_spvtr_tport_set(payload, tport); + mlxsw_reg_spvtr_local_port_set(payload, local_port); + mlxsw_reg_spvtr_ipvid_mode_set(payload, ipvid_mode); + mlxsw_reg_spvtr_ipve_set(payload, true); +} + /* SVPE - Switch Virtual-Port Enabling Register * * Enables port virtualization. @@ -11306,6 +11409,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = { MLXSW_REG(slcor), MLXSW_REG(spmlr), MLXSW_REG(svfa), + MLXSW_REG(spvtr), MLXSW_REG(svpe), MLXSW_REG(sfmr), MLXSW_REG(spvmlr), -- 2.28.0
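For readers unfamiliar with the MLXSW_ITEM32() layout, the field positions used by mlxsw_reg_spvtr_pack() above can be illustrated with a standalone sketch. It is not the driver code: `sketch_field_set()` and `sketch_spvtr_pack()` are hypothetical helpers, and the sketch assumes each 32-bit word of the payload is stored big-endian. Only the offsets, bit positions and widths are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SPVTR_LEN 0x10	/* same length as MLXSW_REG_SPVTR_LEN */

/* Set a 'width'-bit field at bit 'shift' of the big-endian word at 'off'. */
static void sketch_field_set(uint8_t *payload, size_t off, unsigned int shift,
			     unsigned int width, uint32_t val)
{
	uint32_t mask, word;

	assert(width >= 1 && width < 32 && shift + width <= 32);

	mask = ((1u << width) - 1u) << shift;
	word = ((uint32_t)payload[off] << 24) |
	       ((uint32_t)payload[off + 1] << 16) |
	       ((uint32_t)payload[off + 2] << 8) |
	       (uint32_t)payload[off + 3];
	word = (word & ~mask) | ((val << shift) & mask);
	payload[off] = (uint8_t)(word >> 24);
	payload[off + 1] = (uint8_t)(word >> 16);
	payload[off + 2] = (uint8_t)(word >> 8);
	payload[off + 3] = (uint8_t)word;
}

/* Mirror of the pack helper: zero the payload, then set four fields. */
static void sketch_spvtr_pack(uint8_t *payload, int tport, uint8_t local_port,
			      uint32_t ipvid_mode)
{
	memset(payload, 0, SKETCH_SPVTR_LEN);
	sketch_field_set(payload, 0x00, 24, 1, tport ? 1u : 0u);	/* tport */
	sketch_field_set(payload, 0x00, 16, 8, local_port);		/* local_port */
	sketch_field_set(payload, 0x04, 16, 4, ipvid_mode);		/* ipvid_mode */
	sketch_field_set(payload, 0x04, 30, 1, 1u);			/* ipve */
}
```

Packing a tunnel port with ipvid_mode 2 (the "always push VLAN" enumerator) sets bit 24 of the first word and bits 30 and 17 of the second, leaving the egress fields untouched.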
[PATCH net-next 06/13] mlxsw: Save EtherType as part of mlxsw_sp_nve_config
From: Amit Cohen Add EtherType field to mlxsw_sp_nve_config struct. Set EtherType according to mlxsw_sp_nve_params.ethertype. Pass 'mlxsw_sp_nve_params' instead of 'mlxsw_sp_nve_params->dev' to the function which initializes mlxsw_sp_nve_config struct to know which EtherType to use. This field is needed to configure which EtherType will be used when VLAN is pushed at ingress of the tunnel port. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c | 2 +- drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h | 3 ++- drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c | 5 +++-- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c index ed0d334b5fd1..adf499665f87 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c @@ -802,7 +802,7 @@ int mlxsw_sp_nve_fid_enable(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fid *fid, return -EINVAL; memset(&config, 0, sizeof(config)); - ops->nve_config(nve, params->dev, &config); + ops->nve_config(nve, params, &config); if (nve->num_nve_tunnels && memcmp(&config, &nve->config, sizeof(config))) { NL_SET_ERR_MSG_MOD(extack, "Conflicting NVE tunnels configuration"); diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h index 12f664f42f21..68bd9422be2a 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h @@ -18,6 +18,7 @@ struct mlxsw_sp_nve_config { u32 ul_tb_id; enum mlxsw_sp_l3proto ul_proto; union mlxsw_sp_l3addr ul_sip; + u16 ethertype; }; struct mlxsw_sp_nve { @@ -38,7 +39,7 @@ struct mlxsw_sp_nve_ops { const struct net_device *dev, struct netlink_ext_ack *extack); void (*nve_config)(const struct mlxsw_sp_nve *nve, - const struct net_device *dev, + 
const struct mlxsw_sp_nve_params *params, struct mlxsw_sp_nve_config *config); int (*init)(struct mlxsw_sp_nve *nve, const struct mlxsw_sp_nve_config *config); diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c index e9bff13ec264..f9a48a0109ff 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c @@ -87,10 +87,10 @@ static bool mlxsw_sp_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve, } static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve, - const struct net_device *dev, + const struct mlxsw_sp_nve_params *params, struct mlxsw_sp_nve_config *config) { - struct vxlan_dev *vxlan = netdev_priv(dev); + struct vxlan_dev *vxlan = netdev_priv(params->dev); struct vxlan_config *cfg = &vxlan->cfg; config->type = MLXSW_SP_NVE_TYPE_VXLAN; @@ -101,6 +101,7 @@ static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve, config->ul_proto = MLXSW_SP_L3_PROTO_IPV4; config->ul_sip.addr4 = cfg->saddr.sin.sin_addr.s_addr; config->udp_dport = cfg->dst_port; + config->ethertype = params->ethertype; } static int __mlxsw_sp_nve_parsing_set(struct mlxsw_sp *mlxsw_sp, -- 2.28.0
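A note on why the new field matters for the conflict check in mlxsw_sp_nve_fid_enable(): the driver zeroes a fresh config, fills it, and compares it against the stored one with memcmp() when tunnels already exist. Adding ethertype to the struct therefore makes two tunnels that differ only in EtherType conflict. Below is a minimal userspace sketch of that pattern, assuming a simplified struct; the names nve_config_fill() and nve_config_conflicts() and the field set are made up for illustration and are not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct mlxsw_sp_nve_config. */
struct nve_config {
	uint8_t ttl;
	uint32_t ul_sip;
	uint16_t ethertype;
};

/* Zero the whole struct first so that padding bytes are deterministic;
 * otherwise a byte-wise comparison could report spurious differences.
 * (In the driver the caller does the memset before calling nve_config.)
 */
static void nve_config_fill(struct nve_config *config, uint8_t ttl,
			    uint32_t ul_sip, uint16_t ethertype)
{
	memset(config, 0, sizeof(*config));
	config->ttl = ttl;
	config->ul_sip = ul_sip;
	config->ethertype = ethertype;
}

/* Two configurations conflict if any byte differs. */
static bool nve_config_conflicts(const struct nve_config *cur,
				 const struct nve_config *new_cfg)
{
	return memcmp(cur, new_cfg, sizeof(*cur)) != 0;
}
```

With this layout, a config filled with 0x8100 and one filled with 0x88A8 compare as different even when every other field matches.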
[PATCH net-next 03/13] mlxsw: reg: Add support for tunnel port in SPVID register
From: Amit Cohen

Add the spvid_tport field, which indicates whether the port is a tunnel port. When spvid_tport is true, the local_port field is expected to hold a tunnel port type. It will be used to configure which EtherType is used when a VLAN is pushed at ingress for a tunnel port.

Signed-off-by: Amit Cohen
Reviewed-by: Petr Machata
Signed-off-by: Ido Schimmel
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index ad6798c2169d..2a89b3261f00 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -821,8 +821,16 @@ static inline void mlxsw_reg_spms_vid_pack(char *payload, u16 vid,

 MLXSW_REG_DEFINE(spvid, MLXSW_REG_SPVID_ID, MLXSW_REG_SPVID_LEN);

+/* reg_spvid_tport
+ * Port is tunnel port.
+ * Reserved when SwitchX/-2 or Spectrum-1.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, spvid, tport, 0x00, 24, 1);
+
 /* reg_spvid_local_port
- * Local port number.
+ * When tport = 0: Local port number. Not supported for CPU port.
+ * When tport = 1: Tunnel port.
  * Access: Index
  */
 MLXSW_ITEM32(reg, spvid, local_port, 0x00, 16, 8);
--
2.28.0
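For readers not familiar with the MLXSW_ITEM32() arguments: the two SPVID items above describe a 1-bit field at bit 24 and an 8-bit field at bit 16 of the 32-bit word at payload offset 0x00. A rough userspace sketch of such field access follows, assuming the payload stores register words in big-endian byte order. The helpers item32_set() and item32_get() are hypothetical names for illustration, not the mlxsw accessors.

```c
#include <stdint.h>

/* Load a 32-bit big-endian word from the payload. */
static uint32_t word_load(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit word into the payload in big-endian order. */
static void word_store(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Mask covering `size` bits starting at bit `shift`. */
static uint32_t item32_mask(unsigned int shift, unsigned int size)
{
	uint32_t ones = size >= 32 ? UINT32_MAX : (UINT32_C(1) << size) - 1;

	return ones << shift;
}

/* Hypothetical setter: write `val` into the field, leaving other bits alone. */
static void item32_set(unsigned char *payload, unsigned int offset,
		       unsigned int shift, unsigned int size, uint32_t val)
{
	uint32_t mask = item32_mask(shift, size);
	uint32_t word = word_load(payload + offset);

	word = (word & ~mask) | ((val << shift) & mask);
	word_store(payload + offset, word);
}

/* Hypothetical getter: read the field back. */
static uint32_t item32_get(const unsigned char *payload, unsigned int offset,
			   unsigned int shift, unsigned int size)
{
	return (word_load(payload + offset) & item32_mask(shift, size)) >> shift;
}
```

Under these assumptions, item32_set(pl, 0x00, 24, 1, 1) sets the tport bit and item32_set(pl, 0x00, 16, 8, port) fills local_port without disturbing it.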
[PATCH net-next 07/13] mlxsw: spectrum: Publish mlxsw_sp_ethtype_to_sver_type()
From: Amit Cohen

Declare mlxsw_sp_ethtype_to_sver_type() in spectrum.h so that it can be used in other files. The next patch uses it to map an EtherType to the corresponding value configured by the SVER register.

Signed-off-by: Amit Cohen
Reviewed-by: Petr Machata
Signed-off-by: Ido Schimmel
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 963eb0b1d9dd..df8175cd44ab 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -384,7 +384,7 @@ int mlxsw_sp_port_vid_learning_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
 	return err;
 }

-static int mlxsw_sp_ethtype_to_sver_type(u16 ethtype, u8 *p_sver_type)
+int mlxsw_sp_ethtype_to_sver_type(u16 ethtype, u8 *p_sver_type)
 {
 	switch (ethtype) {
 	case ETH_P_8021Q:
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 7e728a8a9fb3..a6956cfc9cb1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -584,6 +584,7 @@ int mlxsw_sp_port_vid_stp_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
 int mlxsw_sp_port_vp_mode_set(struct mlxsw_sp_port *mlxsw_sp_port, bool enable);
 int mlxsw_sp_port_vid_learning_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
 				   bool learn_enable);
+int mlxsw_sp_ethtype_to_sver_type(u16 ethtype, u8 *p_sver_type);
 int mlxsw_sp_port_pvid_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
 			   u16 ethtype);
 struct mlxsw_sp_port_vlan *
--
2.28.0
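The hunk above only shows the first lines of the function, so here is a hedged userspace sketch of what an EtherType to SVER-type mapping of this shape could look like. The EtherType values 0x8100 and 0x88A8 are the standard 802.1Q and 802.1ad ones; the indices 0 and 1 and the name ethtype_to_sver_type() are assumptions for illustration and are not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

#define ETH_P_8021Q	0x8100	/* IEEE 802.1Q VLAN tag */
#define ETH_P_8021AD	0x88A8	/* IEEE 802.1ad service VLAN tag */

/* Assumed SVER type indices, for illustration only. */
enum {
	SVER_TYPE_8021Q = 0,
	SVER_TYPE_8021AD = 1,
};

/* Map an EtherType to a register index; reject anything else. */
static int ethtype_to_sver_type(uint16_t ethtype, uint8_t *p_sver_type)
{
	switch (ethtype) {
	case ETH_P_8021Q:
		*p_sver_type = SVER_TYPE_8021Q;
		break;
	case ETH_P_8021AD:
		*p_sver_type = SVER_TYPE_8021AD;
		break;
	default:
		return -EINVAL;
	}

	return 0;
}
```

Returning an error for unknown EtherTypes lets callers propagate the failure instead of writing an arbitrary index to the hardware.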
[PATCH net-next 05/13] mlxsw: Save EtherType as part of mlxsw_sp_nve_params
From: Amit Cohen Add EtherType field to mlxsw_sp_nve_params struct. Set it when VxLAN device is added to bridge device. This field is needed to configure which EtherType will be used when VLAN is pushed at ingress of the tunnel port. Use ETH_P_8021Q for tunnel port enslaved to 802.1d and 802.1q bridges. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 + drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 7 +-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h index 6092243a69cb..7e728a8a9fb3 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h @@ -1202,6 +1202,7 @@ struct mlxsw_sp_nve_params { enum mlxsw_sp_nve_type type; __be32 vni; const struct net_device *dev; + u16 ethertype; }; extern const struct mlxsw_sp_nve_ops *mlxsw_sp1_nve_ops_arr[]; diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c index c53e0ab9f971..051a77440afe 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c @@ -2055,7 +2055,8 @@ mlxsw_sp_bridge_8021q_port_leave(struct mlxsw_sp_bridge_device *bridge_device, static int mlxsw_sp_bridge_vlan_aware_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device, const struct net_device *vxlan_dev, - u16 vid, struct netlink_ext_ack *extack) + u16 vid, u16 ethertype, + struct netlink_ext_ack *extack) { struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_device->dev); struct vxlan_dev *vxlan = netdev_priv(vxlan_dev); @@ -2063,6 +2064,7 @@ mlxsw_sp_bridge_vlan_aware_vxlan_join(struct mlxsw_sp_bridge_device *bridge_devi .type = MLXSW_SP_NVE_TYPE_VXLAN, .vni = vxlan->cfg.vni, .dev = vxlan_dev, + .ethertype = ethertype, }; struct mlxsw_sp_fid *fid; 
int err; @@ -2107,7 +2109,7 @@ mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device, struct netlink_ext_ack *extack) { return mlxsw_sp_bridge_vlan_aware_vxlan_join(bridge_device, vxlan_dev, -vid, extack); +vid, ETH_P_8021Q, extack); } static struct net_device * @@ -2240,6 +2242,7 @@ mlxsw_sp_bridge_8021d_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device, .type = MLXSW_SP_NVE_TYPE_VXLAN, .vni = vxlan->cfg.vni, .dev = vxlan_dev, + .ethertype = ETH_P_8021Q, }; struct mlxsw_sp_fid *fid; int err; -- 2.28.0
[PATCH net-next 09/13] mlxsw: spectrum_switchdev: Use ops->vxlan_join() when adding VLAN to VxLAN device
From: Amit Cohen

Currently mlxsw_sp_switchdev_vxlan_vlan_add() always calls mlxsw_sp_bridge_8021q_vxlan_join(), because VLANs were only ever added to a VLAN-filtering bridge, which could only be an 802.1q bridge.

This set adds support for VxLAN with an 802.1ad bridge, so a VLAN-filtering bridge is no longer necessarily 802.1q. Call ops->vxlan_join(), so that mlxsw_sp_bridge_8021q_vxlan_join() or mlxsw_sp_bridge_8021ad_vxlan_join() is called according to the bridge type.

This is needed to ensure that VxLAN with an 802.1ad bridge will be vetoed on Spectrum-1 by the next patch.

Signed-off-by: Amit Cohen
Reviewed-by: Petr Machata
Signed-off-by: Ido Schimmel
---
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 051a77440afe..73290f71eb9c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -3320,8 +3320,8 @@ mlxsw_sp_switchdev_vxlan_vlan_add(struct mlxsw_sp *mlxsw_sp,
 	if (!fid) {
 		if (!flag_untagged || !flag_pvid)
 			return 0;
-		return mlxsw_sp_bridge_8021q_vxlan_join(bridge_device,
-							vxlan_dev, vid, extack);
+		return bridge_device->ops->vxlan_join(bridge_device, vxlan_dev,
+						      vid, extack);
 	}

 	/* Second case: FID is associated with the VNI and the VLAN associated
@@ -3360,16 +3360,14 @@ mlxsw_sp_switchdev_vxlan_vlan_add(struct mlxsw_sp *mlxsw_sp,
 	if (!flag_untagged)
 		return 0;

-	err = mlxsw_sp_bridge_8021q_vxlan_join(bridge_device, vxlan_dev, vid,
-					       extack);
+	err = bridge_device->ops->vxlan_join(bridge_device, vxlan_dev, vid, extack);
 	if (err)
 		goto err_vxlan_join;

 	return 0;

 err_vxlan_join:
-	mlxsw_sp_bridge_8021q_vxlan_join(bridge_device, vxlan_dev, old_vid,
-					 NULL);
+	bridge_device->ops->vxlan_join(bridge_device, vxlan_dev, old_vid, NULL);
 	return err;
 }

--
2.28.0
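To illustrate the dispatch this patch switches to, here is a small self-contained sketch of a per-bridge-type ops table, where the caller no longer hard-codes the 802.1q variant. All struct and function names are simplified stand-ins rather than the driver's definitions, and the sketch assumes both bridge types funnel into one VLAN-aware helper that only differs in the EtherType passed.

```c
#include <stdint.h>

#define ETH_P_8021Q	0x8100
#define ETH_P_8021AD	0x88A8

struct bridge_device;

/* Hypothetical ops table: one join callback per bridge flavour. */
struct bridge_ops {
	int (*vxlan_join)(struct bridge_device *br, uint16_t vid);
};

struct bridge_device {
	const struct bridge_ops *ops;
	uint16_t vid;		/* recorded by the join helper */
	uint16_t ethertype;	/* recorded by the join helper */
};

/* Shared helper: the only per-type difference is the EtherType. */
static int vlan_aware_vxlan_join(struct bridge_device *br, uint16_t vid,
				 uint16_t ethertype)
{
	br->vid = vid;
	br->ethertype = ethertype;
	return 0;
}

static int bridge_8021q_vxlan_join(struct bridge_device *br, uint16_t vid)
{
	return vlan_aware_vxlan_join(br, vid, ETH_P_8021Q);
}

static int bridge_8021ad_vxlan_join(struct bridge_device *br, uint16_t vid)
{
	return vlan_aware_vxlan_join(br, vid, ETH_P_8021AD);
}

static const struct bridge_ops bridge_8021q_ops = {
	.vxlan_join = bridge_8021q_vxlan_join,
};

static const struct bridge_ops bridge_8021ad_ops = {
	.vxlan_join = bridge_8021ad_vxlan_join,
};

/* Caller dispatches through the ops pointer instead of naming a variant. */
static int vxlan_vlan_add(struct bridge_device *br, uint16_t vid)
{
	return br->ops->vxlan_join(br, vid);
}
```

The caller stays unchanged when a new bridge type is added; only a new ops instance is needed.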
[PATCH net-next 01/13] mlxsw: Use one enum for all registers that contain tunnel_port field
From: Amit Cohen Currently SFN, TNUMT and TNPC registers use separate enums for tunnel_port. Create one enum with a neutral name and use it. Remove the enums that are not currently required. The next patches add two more registers that contain tunnel_port field, the new enum can be used for them also. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/reg.h | 32 ++- .../ethernet/mellanox/mlxsw/spectrum_nve.c| 2 +- .../mellanox/mlxsw/spectrum_nve_vxlan.c | 2 +- 3 files changed, 11 insertions(+), 25 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index 1077ed2046fe..0a3c5f89268c 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -581,6 +581,13 @@ mlxsw_reg_sfd_uc_tunnel_pack(char *payload, int rec_index, mlxsw_reg_sfd_uc_tunnel_protocol_set(payload, rec_index, proto); } +enum mlxsw_reg_tunnel_port { + MLXSW_REG_TUNNEL_PORT_NVE, + MLXSW_REG_TUNNEL_PORT_VPLS, + MLXSW_REG_TUNNEL_PORT_FLEX_TUNNEL0, + MLXSW_REG_TUNNEL_PORT_FLEX_TUNNEL1, +}; + /* SFN - Switch FDB Notification Register * --- * The switch provides notifications on newly learned FDB entries and @@ -738,13 +745,6 @@ MLXSW_ITEM32_INDEXED(reg, sfn, uc_tunnel_protocol, MLXSW_REG_SFN_BASE_LEN, 27, MLXSW_ITEM32_INDEXED(reg, sfn, uc_tunnel_uip_lsb, MLXSW_REG_SFN_BASE_LEN, 0, 24, MLXSW_REG_SFN_REC_LEN, 0x0C, false); -enum mlxsw_reg_sfn_tunnel_port { - MLXSW_REG_SFN_TUNNEL_PORT_NVE, - MLXSW_REG_SFN_TUNNEL_PORT_VPLS, - MLXSW_REG_SFN_TUNNEL_FLEX_TUNNEL0, - MLXSW_REG_SFN_TUNNEL_FLEX_TUNNEL1, -}; - /* reg_sfn_uc_tunnel_port * Tunnel port. * Reserved on Spectrum. 
@@ -10507,13 +10507,6 @@ enum mlxsw_reg_tnumt_record_type { */ MLXSW_ITEM32(reg, tnumt, record_type, 0x00, 28, 4); -enum mlxsw_reg_tnumt_tunnel_port { - MLXSW_REG_TNUMT_TUNNEL_PORT_NVE, - MLXSW_REG_TNUMT_TUNNEL_PORT_VPLS, - MLXSW_REG_TNUMT_TUNNEL_FLEX_TUNNEL0, - MLXSW_REG_TNUMT_TUNNEL_FLEX_TUNNEL1, -}; - /* reg_tnumt_tunnel_port * Tunnel port. * Access: RW @@ -10561,7 +10554,7 @@ MLXSW_ITEM32_INDEXED(reg, tnumt, udip_ptr, 0x0C, 0, 24, 0x04, 0x00, false); static inline void mlxsw_reg_tnumt_pack(char *payload, enum mlxsw_reg_tnumt_record_type type, - enum mlxsw_reg_tnumt_tunnel_port tport, + enum mlxsw_reg_tunnel_port tport, u32 underlay_mc_ptr, bool vnext, u32 next_underlay_mc_ptr, u8 record_size) @@ -10725,13 +10718,6 @@ static inline void mlxsw_reg_tndem_pack(char *payload, u8 underlay_ecn, MLXSW_REG_DEFINE(tnpc, MLXSW_REG_TNPC_ID, MLXSW_REG_TNPC_LEN); -enum mlxsw_reg_tnpc_tunnel_port { - MLXSW_REG_TNPC_TUNNEL_PORT_NVE, - MLXSW_REG_TNPC_TUNNEL_PORT_VPLS, - MLXSW_REG_TNPC_TUNNEL_FLEX_TUNNEL0, - MLXSW_REG_TNPC_TUNNEL_FLEX_TUNNEL1, -}; - /* reg_tnpc_tunnel_port * Tunnel port. 
* Access: Index @@ -10751,7 +10737,7 @@ MLXSW_ITEM32(reg, tnpc, learn_enable_v6, 0x04, 1, 1); MLXSW_ITEM32(reg, tnpc, learn_enable_v4, 0x04, 0, 1); static inline void mlxsw_reg_tnpc_pack(char *payload, - enum mlxsw_reg_tnpc_tunnel_port tport, + enum mlxsw_reg_tunnel_port tport, bool learn_enable) { MLXSW_REG_ZERO(tnpc, payload); diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c index 54d3e7dcd303..ed0d334b5fd1 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c @@ -368,7 +368,7 @@ mlxsw_sp_nve_mc_record_refresh(struct mlxsw_sp_nve_mc_record *mc_record) next_valid = true; } - mlxsw_reg_tnumt_pack(tnumt_pl, type, MLXSW_REG_TNUMT_TUNNEL_PORT_NVE, + mlxsw_reg_tnumt_pack(tnumt_pl, type, MLXSW_REG_TUNNEL_PORT_NVE, mc_record->kvdl_index, next_valid, next_kvdl_index, mc_record->num_entries); diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c index 05517c7feaa5..e9bff13ec264 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c @@ -299,7 +299,7 @@ static bool mlxsw_sp2_nve_vxlan_learning_set(struct mlxsw_sp *mlxsw_sp, { char tnpc_pl[MLXSW_REG_TNPC_LEN]; - mlxsw_reg_tnpc_pack(tnpc_pl, MLXSW_REG_TNPC_TUNNEL_PORT_NVE, + mlxsw_reg_tnpc_pack(tnpc_pl, M
[PATCH net-next 11/13] mlxsw: spectrum_switchdev: Allow joining VxLAN to 802.1ad bridge
From: Amit Cohen

The previous patches added support for a VxLAN device enslaved to an 802.1ad bridge on the Spectrum-2 ASIC and vetoed it on Spectrum-1. Do not veto VxLAN with an 802.1ad bridge anymore.

Signed-off-by: Amit Cohen
Reviewed-by: Petr Machata
Signed-off-by: Ido Schimmel
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 73290f71eb9c..cea42f6ed89b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2347,8 +2347,8 @@ mlxsw_sp_bridge_8021ad_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
 				  const struct net_device *vxlan_dev, u16 vid,
 				  struct netlink_ext_ack *extack)
 {
-	NL_SET_ERR_MSG_MOD(extack, "VXLAN is not supported with 802.1ad");
-	return -EOPNOTSUPP;
+	return mlxsw_sp_bridge_vlan_aware_vxlan_join(bridge_device, vxlan_dev,
+						     vid, ETH_P_8021AD, extack);
 }

 static const struct mlxsw_sp_bridge_ops mlxsw_sp_bridge_8021ad_ops = {
--
2.28.0
[PATCH net-next 08/13] mlxsw: spectrum_nve_vxlan: Add support for Q-in-VNI for Spectrum-2 ASIC
From: Amit Cohen On Spectrum-2, the default setting is not to push VLAN to the decapsulated packet. This is controlled by SPVTR.ipvid_mode. Set SPVTR.ipvid_mode to always push VLAN. Without this setting, Spectrum-2 overtakes the VLAN tag of decapsulated packet for bridging. In addition, set SPVID register to use EtherType saved in mlxsw_sp_nve_config when VLAN is pushed for the NVE tunnel. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- .../mellanox/mlxsw/spectrum_nve_vxlan.c | 42 +++ 1 file changed, 42 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c index f9a48a0109ff..b586c8f34d49 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c @@ -305,11 +305,30 @@ static bool mlxsw_sp2_nve_vxlan_learning_set(struct mlxsw_sp *mlxsw_sp, return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tnpc), tnpc_pl); } +static int +mlxsw_sp2_nve_decap_ethertype_set(struct mlxsw_sp *mlxsw_sp, u16 ethertype) +{ + char spvid_pl[MLXSW_REG_SPVID_LEN] = {}; + u8 sver_type; + int err; + + mlxsw_reg_spvid_tport_set(spvid_pl, true); + mlxsw_reg_spvid_local_port_set(spvid_pl, + MLXSW_REG_TUNNEL_PORT_NVE); + err = mlxsw_sp_ethtype_to_sver_type(ethertype, &sver_type); + if (err) + return err; + + mlxsw_reg_spvid_et_vlan_set(spvid_pl, sver_type); + return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(spvid), spvid_pl); +} + static int mlxsw_sp2_nve_vxlan_config_set(struct mlxsw_sp *mlxsw_sp, const struct mlxsw_sp_nve_config *config) { char tngcr_pl[MLXSW_REG_TNGCR_LEN]; + char spvtr_pl[MLXSW_REG_SPVTR_LEN]; u16 ul_rif_index; int err; @@ -330,8 +349,25 @@ mlxsw_sp2_nve_vxlan_config_set(struct mlxsw_sp *mlxsw_sp, if (err) goto err_tngcr_write; + mlxsw_reg_spvtr_pack(spvtr_pl, true, MLXSW_REG_TUNNEL_PORT_NVE, +MLXSW_REG_SPVTR_IPVID_MODE_ALWAYS_PUSH_VLAN); + err = mlxsw_reg_write(mlxsw_sp->core, 
MLXSW_REG(spvtr), spvtr_pl); + if (err) + goto err_spvtr_write; + + err = mlxsw_sp2_nve_decap_ethertype_set(mlxsw_sp, config->ethertype); + if (err) + goto err_decap_ethertype_set; + return 0; +err_decap_ethertype_set: + mlxsw_reg_spvtr_pack(spvtr_pl, true, MLXSW_REG_TUNNEL_PORT_NVE, +MLXSW_REG_SPVTR_IPVID_MODE_IEEE_COMPLIANT_PVID); + mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(spvtr), spvtr_pl); +err_spvtr_write: + mlxsw_reg_tngcr_pack(tngcr_pl, MLXSW_REG_TNGCR_TYPE_VXLAN, false, 0); + mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tngcr), tngcr_pl); err_tngcr_write: mlxsw_sp2_nve_vxlan_learning_set(mlxsw_sp, false); err_vxlan_learning_set: @@ -341,8 +377,14 @@ mlxsw_sp2_nve_vxlan_config_set(struct mlxsw_sp *mlxsw_sp, static void mlxsw_sp2_nve_vxlan_config_clear(struct mlxsw_sp *mlxsw_sp) { + char spvtr_pl[MLXSW_REG_SPVTR_LEN]; char tngcr_pl[MLXSW_REG_TNGCR_LEN]; + /* Set default EtherType */ + mlxsw_sp2_nve_decap_ethertype_set(mlxsw_sp, ETH_P_8021Q); + mlxsw_reg_spvtr_pack(spvtr_pl, true, MLXSW_REG_TUNNEL_PORT_NVE, +MLXSW_REG_SPVTR_IPVID_MODE_IEEE_COMPLIANT_PVID); + mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(spvtr), spvtr_pl); mlxsw_reg_tngcr_pack(tngcr_pl, MLXSW_REG_TNGCR_TYPE_VXLAN, false, 0); mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tngcr), tngcr_pl); mlxsw_sp2_nve_vxlan_learning_set(mlxsw_sp, false); -- 2.28.0
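The error path added to mlxsw_sp2_nve_vxlan_config_set() follows the usual kernel goto-unwind idiom: each label undoes the step that succeeded just before the failing one, in reverse order. A minimal userspace sketch of that idiom follows. The struct hw, the step names and hw_enable()/hw_disable() are invented for illustration and only mimic the ordering learning, TNGCR, SPVTR, EtherType seen in the patch.

```c
#include <errno.h>
#include <stdbool.h>

enum step {
	STEP_LEARNING,
	STEP_TNGCR,
	STEP_SPVTR,
	STEP_ETHERTYPE,
	STEP_COUNT,
};

/* Hypothetical hardware model: which steps are currently applied. */
struct hw {
	bool enabled[STEP_COUNT];
	int fail_at;	/* step that should fail, or -1 for none */
};

static int hw_enable(struct hw *hw, enum step s)
{
	if (hw->fail_at == (int)s)
		return -EIO;
	hw->enabled[s] = true;
	return 0;
}

static void hw_disable(struct hw *hw, enum step s)
{
	hw->enabled[s] = false;
}

/* Apply steps in order; on failure, roll back in reverse order. */
static int config_set(struct hw *hw)
{
	int err;

	err = hw_enable(hw, STEP_LEARNING);
	if (err)
		return err;

	err = hw_enable(hw, STEP_TNGCR);
	if (err)
		goto err_tngcr;

	err = hw_enable(hw, STEP_SPVTR);
	if (err)
		goto err_spvtr;

	err = hw_enable(hw, STEP_ETHERTYPE);
	if (err)
		goto err_ethertype;

	return 0;

err_ethertype:
	hw_disable(hw, STEP_SPVTR);
err_spvtr:
	hw_disable(hw, STEP_TNGCR);
err_tngcr:
	hw_disable(hw, STEP_LEARNING);
	return err;
}
```

Because the labels fall through, a failure at any step leaves none of the earlier steps applied, which is the property the rollback writes in the patch aim for.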
[PATCH net-next 10/13] mlxsw: Veto Q-in-VNI for Spectrum-1 ASIC
From: Amit Cohen Implementation of Q-in-VNI is different between ASIC types, this set adds support only for Spectrum-2. Return an error when trying to create VxLAN device and enslave it to 802.1ad bridge in Spectrum-1. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- .../net/ethernet/mellanox/mlxsw/spectrum_nve.c | 2 +- .../net/ethernet/mellanox/mlxsw/spectrum_nve.h | 2 +- .../mellanox/mlxsw/spectrum_nve_vxlan.c| 18 +++--- 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c index adf499665f87..e5ec595593f4 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c @@ -798,7 +798,7 @@ int mlxsw_sp_nve_fid_enable(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fid *fid, ops = nve->nve_ops_arr[params->type]; - if (!ops->can_offload(nve, params->dev, extack)) + if (!ops->can_offload(nve, params, extack)) return -EINVAL; memset(&config, 0, sizeof(config)); diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h index 68bd9422be2a..2796d3659979 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h @@ -36,7 +36,7 @@ struct mlxsw_sp_nve { struct mlxsw_sp_nve_ops { enum mlxsw_sp_nve_type type; bool (*can_offload)(const struct mlxsw_sp_nve *nve, - const struct net_device *dev, + const struct mlxsw_sp_nve_params *params, struct netlink_ext_ack *extack); void (*nve_config)(const struct mlxsw_sp_nve *nve, const struct mlxsw_sp_nve_params *params, diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c index b586c8f34d49..3e2bb22e9ca6 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c @@ -22,10 
+22,10 @@ VXLAN_F_LEARN) static bool mlxsw_sp_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve, - const struct net_device *dev, + const struct mlxsw_sp_nve_params *params, struct netlink_ext_ack *extack) { - struct vxlan_dev *vxlan = netdev_priv(dev); + struct vxlan_dev *vxlan = netdev_priv(params->dev); struct vxlan_config *cfg = &vxlan->cfg; if (cfg->saddr.sa.sa_family != AF_INET) { @@ -86,6 +86,18 @@ static bool mlxsw_sp_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve, return true; } +static bool mlxsw_sp1_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve, + const struct mlxsw_sp_nve_params *params, + struct netlink_ext_ack *extack) +{ + if (params->ethertype == ETH_P_8021AD) { + NL_SET_ERR_MSG_MOD(extack, "VxLAN: 802.1ad bridge is not supported with VxLAN"); + return false; + } + + return mlxsw_sp_nve_vxlan_can_offload(nve, params, extack); +} + static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve, const struct mlxsw_sp_nve_params *params, struct mlxsw_sp_nve_config *config) @@ -287,7 +299,7 @@ mlxsw_sp_nve_vxlan_clear_offload(const struct net_device *nve_dev, __be32 vni) const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops = { .type = MLXSW_SP_NVE_TYPE_VXLAN, - .can_offload= mlxsw_sp_nve_vxlan_can_offload, + .can_offload= mlxsw_sp1_nve_vxlan_can_offload, .nve_config = mlxsw_sp_nve_vxlan_config, .init = mlxsw_sp1_nve_vxlan_init, .fini = mlxsw_sp1_nve_vxlan_fini, -- 2.28.0
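The Spectrum-1 veto is implemented as a thin wrapper: reject the 802.1ad case first, then defer to the checks shared by both ASICs. A simplified sketch of that layering is shown below. The struct nve_params fields, the ipv4_underlay flag and the error strings are placeholders rather than the driver's actual checks, and a plain string pointer stands in for the netlink extack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q	0x8100
#define ETH_P_8021AD	0x88A8

/* Hypothetical, reduced parameter set. */
struct nve_params {
	uint16_t ethertype;
	bool ipv4_underlay;
};

/* Checks common to all ASIC generations (one placeholder check here). */
static bool common_can_offload(const struct nve_params *params,
			       const char **errmsg)
{
	if (!params->ipv4_underlay) {
		*errmsg = "only IPv4 underlay is supported";
		return false;
	}

	return true;
}

/* ASIC-specific wrapper: veto 802.1ad, then run the common checks. */
static bool sp1_can_offload(const struct nve_params *params,
			    const char **errmsg)
{
	if (params->ethertype == ETH_P_8021AD) {
		*errmsg = "802.1ad bridge is not supported with VxLAN";
		return false;
	}

	return common_can_offload(params, errmsg);
}
```

An ASIC without the restriction can keep pointing its ops entry at the common function, so the veto stays local to the one generation that needs it.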
[PATCH net-next 13/13] selftests: mlxsw: Add Q-in-VNI veto tests
From: Amit Cohen Add tests to ensure that the forbidden and unsupported cases are indeed vetoed by mlxsw driver. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- .../net/mlxsw/spectrum-2/q_in_vni_veto.sh | 77 +++ .../net/mlxsw/spectrum/q_in_vni_veto.sh | 66 2 files changed, 143 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh new file mode 100755 index ..0231205a7147 --- /dev/null +++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh @@ -0,0 +1,77 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +lib_dir=$(dirname $0)/../../../../net/forwarding + +VXPORT=4789 + +ALL_TESTS=" + create_dot1d_and_dot1ad_vxlans +" +NUM_NETIFS=2 +source $lib_dir/lib.sh + +setup_prepare() +{ + swp1=${NETIFS[p1]} + swp2=${NETIFS[p2]} + + ip link set dev $swp1 up + ip link set dev $swp2 up +} + +cleanup() +{ + pre_cleanup + + ip link set dev $swp2 down + ip link set dev $swp1 down +} + +create_dot1d_and_dot1ad_vxlans() +{ + RET=0 + + ip link add dev br0 type bridge vlan_filtering 1 vlan_protocol 802.1ad \ + vlan_default_pvid 0 mcast_snooping 0 + ip link set dev br0 up + + ip link add name vx100 type vxlan id 1000 local 192.0.2.17 dstport \ + "$VXPORT" nolearning noudpcsum tos inherit ttl 100 + ip link set dev vx100 up + + ip link set dev $swp1 master br0 + ip link set dev vx100 master br0 + bridge vlan add vid 100 dev vx100 pvid untagged + + ip link add dev br1 type bridge vlan_filtering 0 mcast_snooping 0 + ip link set dev br1 up + + ip link add name vx200 type vxlan id 2000 local 192.0.2.17 dstport \ + "$VXPORT" nolearning noudpcsum tos inherit ttl 100 + ip link set dev vx200 up + + ip link set dev $swp2 master br1 + ip link set dev 
vx200 master br1 2>/dev/null + check_fail $? "802.1d and 802.1ad VxLANs at the same time not rejected" + + ip link set dev vx200 master br1 2>&1 >/dev/null \ + | grep -q mlxsw_spectrum + check_err $? "802.1d and 802.1ad VxLANs at the same time rejected without extack" + + log_test "create 802.1d and 802.1ad VxLANs" + + ip link del dev vx200 + ip link del dev br1 + ip link del dev vx100 + ip link del dev br0 +} + +trap cleanup EXIT + +setup_prepare +setup_wait + +tests_run + +exit $EXIT_STATUS diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh new file mode 100755 index ..f0443b1b05b9 --- /dev/null +++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh @@ -0,0 +1,66 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +lib_dir=$(dirname $0)/../../../../net/forwarding + +VXPORT=4789 + +ALL_TESTS=" + create_vxlan_on_top_of_8021ad_bridge +" +NUM_NETIFS=2 +source $lib_dir/lib.sh + +setup_prepare() +{ + swp1=${NETIFS[p1]} + swp2=${NETIFS[p2]} + + ip link set dev $swp1 up + ip link set dev $swp2 up +} + +cleanup() +{ + pre_cleanup + + ip link set dev $swp2 down + ip link set dev $swp1 down +} + +create_vxlan_on_top_of_8021ad_bridge() +{ + RET=0 + + ip link add dev br0 type bridge vlan_filtering 1 vlan_protocol 802.1ad \ + vlan_default_pvid 0 mcast_snooping 0 + ip link set dev br0 up + + ip link add name vx100 type vxlan id 1000 local 192.0.2.17 dstport \ + "$VXPORT" nolearning noudpcsum tos inherit ttl 100 + ip link set dev vx100 up + + ip link set dev $swp1 master br0 + ip link set dev vx100 master br0 + + bridge vlan add vid 100 dev vx100 pvid untagged 2>/dev/null + check_fail $? "802.1ad bridge with VxLAN in Spectrum-1 not rejected" + + bridge vlan add vid 100 dev vx100 pvid untagged 2>&1 >/dev/null \ + | grep -q mlxsw_spectrum + check_err $? 
"802.1ad bridge with VxLAN in Spectrum-1 rejected without extack" + + log_test "create VxLAN on top of 802.1ad bridge" + + ip link del dev vx100 + ip link del dev br0 +} + +trap cleanup EXIT + +setup_prepare +setup_wait + +tests_run + +exit $EXIT_STATUS -- 2.28.0
[PATCH net-next 12/13] selftests: forwarding: Add Q-in-VNI test
From: Petr Machata Add test to check Q-in-VNI traffic. Signed-off-by: Petr Machata Signed-off-by: Ido Schimmel --- .../selftests/net/forwarding/q_in_vni.sh | 347 ++ 1 file changed, 347 insertions(+) create mode 100755 tools/testing/selftests/net/forwarding/q_in_vni.sh diff --git a/tools/testing/selftests/net/forwarding/q_in_vni.sh b/tools/testing/selftests/net/forwarding/q_in_vni.sh new file mode 100755 index ..4c50c0234bce --- /dev/null +++ b/tools/testing/selftests/net/forwarding/q_in_vni.sh @@ -0,0 +1,347 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# +---+ ++ +# | H1 (vrf) | | H2 (vrf) | +# | + $h1.10 | | + $h2.10 | +# | | 192.0.2.1/28 | | | 192.0.2.2/28| +# | || | | | +# | | + $h1.20 | | | + $h2.20| +# | \ | 198.51.100.1/24 | | \ | 198.51.100.2/24 | +# | \| | | \| | +# |+ $h1 | |+ $h2 | +# +|--+ +|---+ +# | | +# +|--|---+ +# | SW | | | +# | +--|--|-+ | +# | | + $swp1 BR1 (802.1ad)+ $swp2 | | +# | |vid 100 pvid untagged vid 100 pvid| | +# | | untagged| | +# | | + vx100 (vxlan) | | +# | |local 192.0.2.17 | | +# | |remote 192.0.2.34 192.0.2.50 | | +# | |id 1000 dstport $VXPORT| | +# | |vid 100 pvid untagged | | +# | +---+ | +# | | +# | 192.0.2.32/28 via 192.0.2.18 | +# | 192.0.2.48/28 via 192.0.2.18 | +# | | +# |+ $rp1 | +# || 192.0.2.17/28| +# +|--+ +# | +# +|+ +# || VRP2 (vrf) | +# |+ $rp2 | +# | 192.0.2.18/28 | +# | | (maybe) HW +# = +# | | (likely) SW +# |+ v1 (veth) + v3 (veth) | +# || 192.0.2.33/28 | 192.0.2.49/28 | +# +|---|+ +# | | +# +|--+ +|--+ +# |+ v2 (veth)NS1 (netns) | |+ v4 (veth)NS2 (netns) | +# | 192.0.2.34/28| | 192.0.2.50/28| +# | | | | +# | 192.0.2.16/28 via 192.0.2.33| | 192.0.2.16/28 via 192.0.2.49| +# | 192.0.2.50/32 via 192.0.2.33| | 192.0.2.34/32 via 192.0.2.49| +# | | | | +# | +---+ | | +---+ | +# | | BR2 (802.1ad) | | | | BR2 (802.1ad) | | +# | | + vx100 (vxlan) | | | | + vx100 (vxlan) | | +# | |local 192.0.2.34 | | | |local 192.0.2.50 | | +# | |remote 192.0.2.17 | | | |remote 192.0.2.17 | | +# | |remote 192.0.2.50 | | | 
|remote 192.0.2.34 | | +# | |id 1000 dstport $VXPORT| | | |id 1000 dstport $VXPORT| | +# | |vid 100 pvid untagged | | | |vid 100 pvid untagged
Re: [PATCH v3 0/7] Improve s0ix flows for systems i219LM
Hi,

On 12/8/20 6:08 AM, Neftin, Sasha wrote:
> On 12/7/2020 17:41, Limonciello, Mario wrote:
>>> First of all thank you for working on this.
>>>
>>> I must say though that I don't like the approach taken here very
>>> much.
>>>
>>> This is not so much a criticism of this series as it is a criticism
>>> of the earlier decision to simply disable s0ix on all devices
>>> with the i219-LM + and active ME.
>>
>> I was not happy with that decision either as it did cause regressions
>> on all of the "named" Comet Lake laptops that were in the market at
>> the time. The "unnamed" ones are not yet released, and I don't feel
>> it's fair to call it a regression on "unreleased" hardware.
>>
>>>
>>> AFAIK there was a perfectly acceptable patch to workaround those
>>> broken devices, which increased a timeout:
>>> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20200323191639.48826-1-aaron...@canonical.com/
>>>
>>> That patch was nacked because it increased the resume time
>>> *on broken devices*.
>>>
> Officially CSME/ME not POR for Linux and we haven't interface to the ME.
> Nobody can tell how long (and why) ME will hold PHY access semaphore and just
> increasing the resuming time (ULP configure) won't be solve the problem. This
> is not reliable approach.
> I would agree users can add ME system on their responsibilities.

It is not clear to me what you are trying to say here.

Are you saying that you insist on keeping the e1000e_check_me check and thus needlessly penalizing 100s of laptop models with higher power-consumption unless these 100s of laptops are added manually to an allow list for this?

I'm sorry but that is simply unacceptable, the maintenance burden of that is just way too high.

Testing on the models where the timeout issue was first hit has shown that increasing the timeout does actually fix it on those models. Sure, in theory the ME on some buggy model could hold the semaphore even longer, but then the right thing would be to have a deny-list for s0ix where we can add those buggy models (none of which we have encountered so far). Just like we have deny-lists for buggy hw in other places in the kernel.

Maintaining an ever-growing allow list for the *theoretical* case of encountering a model where things do not work with the increased timeout is not workable and thus not an acceptable solution.

The initial addition of the e1000e_check_me check instead of just going with the confirmed fix of bumping the timeout was already highly controversial and should IMHO never have been done.

Combining this with an ever-growing allow-list on which every new laptop model needs to be added separately + a new "s0ix-enabled" ethtool flag, whose existence is basically an admission that the allow-list approach is flawed, goes from controversial to just plain not acceptable.

Regards,

Hans

>>> So it seems to me that we have a simple choice here:
>>>
>>> 1. Longer resume time on devices with an improperly configured ME
>>> 2. Higher power-consumption on all non-buggy devices
>>>
>>> Your patches 4-7 try to workaround 2. but IMHO those are just
>>> bandaids for getting the initial priorities *very* wrong.
>>
>> They were done based upon the discussion in that thread you linked and others.
>> If the owners of this driver feel it's possible/scalable to follow your proposal
>> I'm happy to resubmit a new v4 series with these sets of patches:
>>
>> 1) Fixup for the exception corner case referenced in this thread
>> 2) Patch 1 from this series that fixes cable connected case
>> 3) Increase the timeout (from your referenced link)
>> 4) Revert the ME disallow list
>>
>>>
>>> Instead of penalizing non-buggy devices with a higher power-consumption,
>>> we should default to penalizing the buggy devices with a higher
>>> resume time. And if it is decided that the higher resume time is
>>> a worse problem than the higher power-consumption, then there
>>> should be a list of broken devices and s0ix can be disabled on those.
>>
>> I'm perfectly happy either way, my primary goal is that Dell's notebooks and
>> desktops that meet the architectural and firmware guidelines for appropriate
>> low power consumption over s0ix are not penalized.
>>
>>>
>>> The current allow-list approach is simply never going to work well
>>> leading to too high power-consumption on countless devices.
>>> This is going to be an endless game of whack-a-mole and as
>>> such really is a bad idea.
>>
>> I envisioned that it would evolve over time. For example if by the time Dell
>> finished shipping new CML models it was deemed that all the CML hardware was done
>> properly it could instead be an allow list of Dell + Comet Point.
>> If all of Tiger Lake are done properly 'maybe' by the time the ML ships maybe it
>> could be an allow list of Dell + CML or newer.
>>
>> But even if the heuristic changed - this particular configuration needs to be tested
>> on every single new model. All of the notebooks that have a T
Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
On 12/8/20 10:00 AM, Jesper Dangaard Brouer wrote: On Mon, 07 Dec 2020 12:52:22 -0800 John Fastabend wrote: Use-case(1): Cloud-provider want to give customers (running VMs) ability to load XDP program for DDoS protection (only), but don't want to allow customer to use XDP_TX (that can implement LB or cheat their VM isolation policy). Not following. What interface do they want to allow loading on? If its the VM interface then I don't see how it matters. From outside the VM there should be no way to discover if its done in VM or in tc or some other stack. If its doing some onloading/offloading I would assume they need to ensure the isolation, etc. is still maintained because you can't let one VMs program work on other VMs packets safely. So what did I miss, above doesn't make sense to me. The Cloud-provider want to load customer provided BPF-code on the physical Host-OS NIC (that support XDP). The customer can get access to a web-interface where they can write or upload their BPF-prog. As multiple customers can upload BPF-progs, the Cloud-provider have to write a BPF-prog dispatcher that runs these multiple program. This could be done via BPF tail-calls, or via Toke's libxdp[1], or via devmap XDP-progs per egress port. The Cloud-provider don't fully trust customers BPF-prog. They already pre-filtered traffic to the given VM, so they can allow customers freedom to see traffic and do XDP_PASS and XDP_DROP. They administratively (via ethtool) want to disable the XDP_REDIRECT and XDP_TX driver feature, as it can be used for violation their VM isolation policy between customers. Is the use-case more clear now? I think we're talking about two different things. The use case as I understood it in (1) mentioned to be able to disable XDP_TX for NICs that are deployed in the VM. 
This would be a no-go as-is since that would mean my basic assumption for attaching XDP progs is gone in that today return codes pass/drop/tx is pretty much available everywhere on native XDP supported NICs. And if you've tried it on major cloud providers like AWS or Azure that offer SRIOV-based networking that works okay and further restricting this would mean breakage of existing programs. What you mean here is "offload" from guest to host which is a different use case than what likely John and I read from your description in (1). Such program should then be loaded via BPF offload API. Meaning, if offload is used and the host is then configured to disallow XDP_TX for such requests from guests, then these get rejected through such facility, but if the /same/ program was loaded as regular native XDP where it's still running in the guest, then it must succeed. These are two entirely different things. It's not clear to me whether some ethtool XDP properties flag is the right place to describe this (plus this needs to differ between offloaded / non-offloaded progs) or whether this should be an implementation detail for things like virtio_net e.g. via virtio_has_feature(). Feels more like the latter to me which already has such a facility in place.
[PATCH 0/1] net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)'
From: SeongJae Park

On a few of our systems, I found frequent 'unshare(CLONE_NEWNET)' calls make the number of active slab objects including 'sock_inode_cache' type rapidly and continuously increase. As a result, memory pressure occurs.

'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the relevant memory objects. They are asynchronously invoked by the work queues and internally use 'rcu_barrier()' to ensure safe destructions. 'cleanup_net()' works in a batched manner in a single thread worker, while 'fqdir_work_fn()' works for each 'fqdir_exit()' call in the 'system_wq'.

Therefore, 'fqdir_work_fn()' was called frequently under the workload and made the contention for 'rcu_barrier()' high. In more detail, the global mutex, 'rcu_state.barrier_mutex', became the bottleneck.

I tried making 'fqdir_work_fn()' batched and confirmed it works. The following patch is for the change. I think this is the right solution for a point fix of this issue, but someone might blame different parts.

1. User: Frequent 'unshare()' calls

From some point of view, such frequent 'unshare()' calls might seem only insane.

2. Global mutex in 'rcu_barrier()'

Because of the global mutex, 'rcu_barrier()' callers could wait long even after the callbacks started before the call finished. Therefore, similar issues could happen in other 'rcu_barrier()' usages. Maybe we can use some wait queue like mechanism to notify the waiters when the desired time came.

I personally believe applying the point fix for now and making the 'rcu_barrier()' improvement in the long term makes sense. If I'm missing something or you have different opinions, please feel free to let me know.

SeongJae Park (1):
  net/ipv4/inet_fragment: Batch fqdir destroy works

 include/net/inet_frag.h  |  2 +-
 net/ipv4/inet_fragment.c | 28
 2 files changed, 21 insertions(+), 9 deletions(-)

--
2.17.1
[PATCH 1/1] net/ipv4/inet_fragment: Batch fqdir destroy works
From: SeongJae Park

In 'fqdir_exit()', a work for destruction of the 'fqdir' is enqueued. The work function, 'fqdir_work_fn()', calls 'rcu_barrier()'. In case of intensive 'fqdir_exit()' (e.g., frequent 'unshare(CLONE_NEWNET)' system calls), this increased contention could result in unacceptably high latency of 'rcu_barrier()'. This commit avoids such contention by doing the destruction in a batched manner, similar to that of 'cleanup_net()'.

Signed-off-by: SeongJae Park
---
 include/net/inet_frag.h  |  2 +-
 net/ipv4/inet_fragment.c | 28
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index bac79e817776..558893d8810c 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -20,7 +20,7 @@ struct fqdir {
 
 	/* Keep atomic mem on separate cachelines in structs that include it */
 	atomic_long_t		mem ____cacheline_aligned_in_smp;
-	struct work_struct	destroy_work;
+	struct llist_node	destroy_list;
 };
 
 /**
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 10d31733297d..796b559137c5 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -145,12 +145,19 @@ static void inet_frags_free_cb(void *ptr, void *arg)
 	inet_frag_destroy(fq);
 }
 
+static LLIST_HEAD(destroy_list);
+
 static void fqdir_work_fn(struct work_struct *work)
 {
-	struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work);
-	struct inet_frags *f = fqdir->f;
+	struct llist_node *kill_list;
+	struct fqdir *fqdir;
+	struct inet_frags *f;
+
+	/* Atomically snapshot the list of fqdirs to destroy */
+	kill_list = llist_del_all(&destroy_list);
 
-	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
+	llist_for_each_entry(fqdir, kill_list, destroy_list)
+		rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
 
 	/* We need to make sure all ongoing call_rcu(..., inet_frag_destroy_rcu)
 	 * have completed, since they need to dereference fqdir.
@@ -158,10 +165,13 @@ static void fqdir_work_fn(struct work_struct *work)
 	 */
 	rcu_barrier();
 
-	if (refcount_dec_and_test(&f->refcnt))
-		complete(&f->completion);
+	llist_for_each_entry(fqdir, kill_list, destroy_list) {
+		f = fqdir->f;
+		if (refcount_dec_and_test(&f->refcnt))
+			complete(&f->completion);
 
-	kfree(fqdir);
+		kfree(fqdir);
+	}
 }
 
 int fqdir_init(struct fqdir **fqdirp, struct inet_frags *f, struct net *net)
@@ -184,10 +194,12 @@ int fqdir_init(struct fqdir **fqdirp, struct inet_frags *f, struct net *net)
 }
 EXPORT_SYMBOL(fqdir_init);
 
+static DECLARE_WORK(fqdir_destroy_work, fqdir_work_fn);
+
 void fqdir_exit(struct fqdir *fqdir)
 {
-	INIT_WORK(&fqdir->destroy_work, fqdir_work_fn);
-	queue_work(system_wq, &fqdir->destroy_work);
+	if (llist_add(&fqdir->destroy_list, &destroy_list))
+		queue_work(system_wq, &fqdir_destroy_work);
 }
 EXPORT_SYMBOL(fqdir_exit);
 
--
2.17.1
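For readers who want to see the batching idea in isolation, here is a minimal userspace sketch. It is not the kernel code: `struct node`, `lf_add()`, `lf_del_all()`, `producer_exit()` and `worker()` are hypothetical stand-ins built on C11 atomics, and the counters only model how often the worker gets scheduled and how often the expensive barrier would run. The one property it relies on is the one the patch uses from llist_add(): the push reports whether the list was empty beforehand, so only the first producer of a batch schedules the worker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for llist_node / llist_head. */
struct node {
	struct node *next;
};

struct lf_head {
	_Atomic(struct node *) first;
};

/* Push one node. Returns true if the list was empty before the push. */
static bool lf_add(struct node *n, struct lf_head *h)
{
	struct node *old = atomic_load(&h->first);

	assert(n != NULL);
	do {
		n->next = old;	/* old is refreshed by a failed CAS */
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));

	return old == NULL;
}

/* Detach the whole list in one atomic step. */
static struct node *lf_del_all(struct lf_head *h)
{
	return atomic_exchange(&h->first, NULL);
}

static struct lf_head pending;	/* zero-initialized: empty list */
static int works_queued;	/* times the worker was scheduled */
static int barriers_issued;	/* models the expensive barrier calls */
static int destroyed;		/* objects torn down so far */

/* Models the exit path: enqueue, schedule only on empty -> non-empty. */
static void producer_exit(struct node *n)
{
	if (lf_add(n, &pending))
		works_queued++;
}

/* Models the work function: one barrier for the whole snapshot. */
static void worker(void)
{
	struct node *batch = lf_del_all(&pending);
	struct node *next;

	if (!batch)
		return;

	barriers_issued++;
	for (struct node *n = batch; n; n = next) {
		next = n->next;	/* read next before the object is released */
		destroyed++;
	}
}
```

With three producer_exit() calls followed by one worker() run, the worker is scheduled once and the barrier counter is bumped once, rather than three times each. The loop in worker() reads the next pointer before touching the object, which is the usual precaution when entries are freed during iteration.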
Re: [PATCH v5 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame
On Mon, 07 Dec 2020 22:49:55 -0800 Saeed Mahameed wrote: > On Mon, 2020-12-07 at 19:16 -0800, Alexander Duyck wrote: > > On Mon, Dec 7, 2020 at 3:03 PM Saeed Mahameed > > wrote: > > > On Mon, 2020-12-07 at 13:16 -0800, Alexander Duyck wrote: > > > > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi < > > > > lore...@kernel.org> > > > > wrote: > > > > > Introduce multi-buffer bit (mb) in xdp_frame/xdp_buffer data > > > > > structure > > > > > in order to specify if this is a linear buffer (mb = 0) or a > > > > > multi- > > > > > buffer > > > > > frame (mb = 1). In the latter case the shared_info area at the > > > > > end > > > > > of the > > > > > first buffer is been properly initialized to link together > > > > > subsequent > > > > > buffers. > > > > > > > > > > Signed-off-by: Lorenzo Bianconi > > > > > --- > > > > > include/net/xdp.h | 8 ++-- > > > > > net/core/xdp.c| 1 + > > > > > 2 files changed, 7 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h > > > > > index 700ad5db7f5d..70559720ff44 100644 > > > > > --- a/include/net/xdp.h > > > > > +++ b/include/net/xdp.h > > > > > @@ -73,7 +73,8 @@ struct xdp_buff { > > > > > void *data_hard_start; > > > > > struct xdp_rxq_info *rxq; > > > > > struct xdp_txq_info *txq; > > > > > - u32 frame_sz; /* frame size to deduce > > > > > data_hard_end/reserved tailroom*/ > > > > > + u32 frame_sz:31; /* frame size to deduce > > > > > data_hard_end/reserved tailroom*/ > > > > > + u32 mb:1; /* xdp non-linear buffer */ > > > > > }; > > > > > > > > > > > > > If we are really going to do something like this I say we should > > > > just > > > > rip a swath of bits out instead of just grabbing one. We are > > > > already > > > > cutting the size down then we should just decide on the minimum > > > > size > > > > that is acceptable and just jump to that instead of just stealing > > > > one > > > > bit at a time. 
> > > > It looks like we already have differences between the
> > > > size here and frame_size in xdp_frame.
> > >
> > > +1
> > >
> > > > If we have to steal a bit why not look at something like one of the
> > > > lower 2/3 bits in rxq? You could then do the same thing using dev_rx
> > > > in a similar fashion instead of stealing from a bit that is likely to
> > > > be used in multiple spots and modifying like this adds extra overhead
> > > > to?
> > >
> > > What do you mean in rxq ? from the pointer ?
> >
> > Yeah, the pointers have a few bits that are guaranteed 0 and in my
> > mind reusing the lower bits from a 4 or 8 byte aligned pointer would
> > make more sense then stealing the upper bits from the size of the
> > frame.
>
> Ha, i can't imagine how accessing that pointer would look like ..
> is possible to define the pointer as a bit-field and just access it
> normally ? or do we need to fix it up every time we need to access it ?
> will gcc/static checkers complain about wrong pointer type ?

This is a pattern that is used all over the kernel. Yes, it needs to be fixed up every time we access it. In this case, we don't want to deploy this trick, for two reasons: (1) rxq is accessed by BPF byte-code rewrite (which would also need to handle masking out the bit), (2) this optimization is trading CPU cycles for saving space.

IIRC Alexei has already pointed out that the change to struct xdp_buff looks suboptimal. Why don't you simply add a u8 with the info?

The general point is that struct xdp_buff layout is for fast access, and struct xdp_frame is a state compressed version of xdp_buff. (Still, room in xdp_buff is limited to 64 bytes - one cacheline, which is rather close according to pahole)

Thus, it is more okay to do these bit tricks in struct xdp_frame. For xdp_frame, it might be better to take some room/space from the member 'mem' (struct xdp_mem_info).
(Would it help later that multi-buffer bit is officially part of struct xdp_mem_info, when later freeing the memory backing the frame?)

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

$ pahole -C xdp_buff
struct xdp_buff {
	void *                     data;                 /*     0     8 */
	void *                     data_end;             /*     8     8 */
	void *                     data_meta;            /*    16     8 */
	void *                     data_hard_start;      /*    24     8 */
	struct xdp_rxq_info *      rxq;                  /*    32     8 */
	struct xdp_txq_info *      txq;                  /*    40     8 */
	u32                        frame_sz;             /*    48     4 */

	/* size: 56, cachelines: 1, members: 7 */
	/* padding: 4 */
	/* last cacheline: 56 bytes */
};

$ pahole -C xdp_frame
struct xdp_frame {
	void *
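To make the "fix it up on every access" cost discussed above concrete, here is a small userspace sketch of stashing a flag in the low bit of an aligned pointer. Everything in it is hypothetical (`struct rxq_info`, `rxq_pack()`, `rxq_ptr()`, `rxq_mb()`, `MB_FLAG`); it only assumes that the pointed-to object is at least 2-byte aligned so bit 0 of its address is zero. It is meant as an illustration of the pattern, not as a proposal for struct xdp_buff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the object the pointer refers to. */
struct rxq_info {
	int queue_index;
};

/* Bit 0 of a pointer to an int-aligned object is always zero. */
#define MB_FLAG ((uintptr_t)1)

/* Combine pointer and flag into one word. */
static inline uintptr_t rxq_pack(struct rxq_info *rxq, bool mb)
{
	assert(((uintptr_t)rxq & MB_FLAG) == 0);	/* alignment assumption */
	return (uintptr_t)rxq | (mb ? MB_FLAG : 0);
}

/* Every reader has to mask the flag out before dereferencing. */
static inline struct rxq_info *rxq_ptr(uintptr_t v)
{
	return (struct rxq_info *)(v & ~MB_FLAG);
}

/* Extract just the flag. */
static inline bool rxq_mb(uintptr_t v)
{
	return (v & MB_FLAG) != 0;
}
```

The extra AND in rxq_ptr() is the CPU-cycles-for-space trade mentioned in point (2); point (1) is that any code generated outside C (such as a byte-code rewrite that loads the field directly) would have to apply the same mask.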
Re: [PATCH 2/7] net: batman-adv: remove unneeded MODULE_VERSION() usage
On Tuesday, 8 December 2020 08:48:56 CET Enrico Weigelt, metux IT consult wrote:
> > Is there some explanation besides an opinion? Some kind goal which you want to
> > achieve with it maybe?
>
> Just a cleanup. I've been under the impression that this version is just
> a relic from oot times.

There are various entities which love to use the distro kernel and replace the batman-adv module with a backport from a newer kernel version. Similar to what is done in OpenWrt for the wifi drivers.

> > At least for us it was an easy way to query the release cycle information via
> > batctl. Which made it easier for us to roughly figure out what a reporter/
> > inquirer was using - independent of whether he is using the in-kernel version
> > or a backported version.
>
> Is the OOT scenario still valid ?

Since the backport is OOT - yes, it is still valid.

Kind regards,
	Sven
Re: [PATCH RFC] ethernet: stmmac: clean up the code for release/suspend/resume function
On Mon, 7 Dec 2020 19:38:49 +0800 Joakim Zhang wrote: > > commit 1c35cc9cf6a0 ("net: stmmac: remove redundant null check before > clk_disable_unprepare()"), > have not clean up check NULL clock parameter completely, this patch did it. > > commit e8377e7a29efb ("net: stmmac: only call pmt() during suspend/resume if > HW enables PMT"), > after this patch, we use > if (device_may_wakeup(priv->device) && priv->plat->pmt) check MAC wakeup > if (device_may_wakeup(priv->device)) check PHY wakeup > Add oneline comment for readability. > > commit 77b2898394e3b ("net: stmmac: Speed down the PHY if WoL to save > energy"), > slow down phy speed when release net device under any condition. > > Slightly adjust the order of the codes so that suspend/resume look more > symmetrical, generally speaking they should appear symmetrically. > > Signed-off-by: Joakim Zhang > --- > .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 +-- > 1 file changed, 10 insertions(+), 12 deletions(-) > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index c33db79cdd0a..a46e865c4acc 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -2908,8 +2908,7 @@ static int stmmac_release(struct net_device *dev) > struct stmmac_priv *priv = netdev_priv(dev); > u32 chan; > > - if (device_may_wakeup(priv->device)) This check is to prevent link speed down if the stmmac isn't a wakeup device. 
> - phylink_speed_down(priv->phylink, false); > + phylink_speed_down(priv->phylink, false); > /* Stop and disconnect the PHY */ > phylink_stop(priv->phylink); > phylink_disconnect_phy(priv->phylink); > @@ -5183,6 +5182,7 @@ int stmmac_suspend(struct device *dev) > } else { > mutex_unlock(&priv->lock); > rtnl_lock(); > + /* For PHY wakeup case */ > if (device_may_wakeup(priv->device)) > phylink_speed_down(priv->phylink, false); > phylink_stop(priv->phylink); > @@ -5260,11 +5260,17 @@ int stmmac_resume(struct device *dev) > /* enable the clk previously disabled */ > clk_prepare_enable(priv->plat->stmmac_clk); > clk_prepare_enable(priv->plat->pclk); > - if (priv->plat->clk_ptp_ref) > - clk_prepare_enable(priv->plat->clk_ptp_ref); > + clk_prepare_enable(priv->plat->clk_ptp_ref); I think this 3 line modifications can be a separated patch. > /* reset the phy so that it's ready */ > if (priv->mii) > stmmac_mdio_reset(priv->mii); > + > + rtnl_lock(); > + phylink_start(priv->phylink); > + /* We may have called phylink_speed_down before */ > + if (device_may_wakeup(priv->device)) > + phylink_speed_up(priv->phylink); > + rtnl_unlock(); This is moving phylink op before mac setup, I'm not sure whether this is safe. > } > > if (priv->plat->serdes_powerup) { > @@ -5275,14 +5281,6 @@ int stmmac_resume(struct device *dev) > return ret; > } > > - if (!device_may_wakeup(priv->device) || !priv->plat->pmt) { > - rtnl_lock(); > - phylink_start(priv->phylink); > - /* We may have called phylink_speed_down before */ > - phylink_speed_up(priv->phylink); > - rtnl_unlock(); > - } > - > rtnl_lock(); > mutex_lock(&priv->lock); > > -- > 2.17.1 >
Re: [PATCH v5 bpf-next 02/14] xdp: initialize xdp_buff mb bit to 0 in all XDP drivers
> On Mon, 2020-12-07 at 22:37 +0100, Maciej Fijalkowski wrote:
> > On Mon, Dec 07, 2020 at 01:15:00PM -0800, Alexander Duyck wrote:
> > > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi wrote:
> > > > Initialize multi-buffer bit (mb) to 0 in all XDP-capable drivers.
> > > > This is a preliminary patch to enable xdp multi-buffer support.
> > > >
> > > > Signed-off-by: Lorenzo Bianconi
> > >
> > > I'm really not a fan of this design. Having to update every driver in
> > > order to initialize a field that was fragmented is a pain. At a
> > > minimum it seems like it might be time to consider introducing some
> > > sort of initializer function for this so that you can update things in
> > > one central place the next time you have to add a new field instead of
> > > having to update every individual driver that supports XDP. Otherwise
> > > this isn't going to scale going forward.
> >
> > Also, a good example of why this might be bothering for us is a fact that
> > in the meantime the dpaa driver got XDP support and this patch hasn't been
> > updated to include mb setting in that driver.
> >
> something like
> init_xdp_buff(hard_start, headroom, len, frame_sz, rxq);
>
> would work for most of the drivers.
>

ack, agree. I will add init_xdp_buff() in v6.

Regards,
Lorenzo
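A rough sketch of what such a central initializer could look like, written as standalone C so it can be compiled outside the kernel. The struct and function names (`sketch_xdp_buff`, `sketch_rxq`, `sketch_init_xdp_buff()`) are made up for illustration, the extra leading `xdp` argument is an assumption (the one-liner quoted above does not show where the buffer is passed), and setting data_meta equal to data is just one possible default. The point is only that a newly added field such as mb gets its default in one place instead of in every driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, loosely modelled on the fields named in the thread. */
struct sketch_rxq {
	int queue_index;
};

struct sketch_xdp_buff {
	void *data;
	void *data_end;
	void *data_meta;
	void *data_hard_start;
	struct sketch_rxq *rxq;
	unsigned int frame_sz:31;	/* frame size */
	unsigned int mb:1;		/* non-linear buffer flag */
};

/*
 * Single place that fills in every field. When a new member is added
 * (like mb), only this helper has to learn its default value.
 */
static inline void sketch_init_xdp_buff(struct sketch_xdp_buff *xdp,
					unsigned char *hard_start,
					int headroom, int len,
					uint32_t frame_sz,
					struct sketch_rxq *rxq)
{
	unsigned char *data;

	assert(headroom >= 0 && len >= 0);
	data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + len;
	xdp->data_meta = data;		/* assumption: no metadata by default */
	xdp->rxq = rxq;
	xdp->frame_sz = frame_sz;
	xdp->mb = 0;			/* linear buffer unless a driver says otherwise */
}
```

A driver's receive path would then call the helper once per buffer and only touch mb afterwards if it actually builds a multi-buffer frame.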
Re: [PATCHv3 bpf-next] samples/bpf: add xdp program on egress for xdp_redirect_map
On Tue, 8 Dec 2020 16:18:56 +0800 Hangbin Liu wrote: > This patch add a xdp program on egress to show that we can modify > the packet on egress. In this sample we will set the pkt's src > mac to egress's mac address. The xdp_prog will be attached when > -X option supplied. > > Signed-off-by: Hangbin Liu > --- > v3: > a) modify the src mac address based on egress mac > > v2: > a) use pkt counter instead of IP ttl modification on egress program > b) make the egress program selectable by option -X > --- > samples/bpf/xdp_redirect_map_kern.c | 60 ++- > samples/bpf/xdp_redirect_map_user.c | 153 > 2 files changed, 168 insertions(+), 45 deletions(-) > [...] > diff --git a/samples/bpf/xdp_redirect_map_user.c > b/samples/bpf/xdp_redirect_map_user.c > index 31131b6e7782..19636045c8dc 100644 > --- a/samples/bpf/xdp_redirect_map_user.c > +++ b/samples/bpf/xdp_redirect_map_user.c > @@ -14,6 +14,10 @@ > #include > #include > #include > +#include > +#include > +#include > +#include > > #include "bpf_util.h" > #include > @@ -21,7 +25,8 @@ > > static int ifindex_in; > static int ifindex_out; > -static bool ifindex_out_xdp_dummy_attached = true; > +static bool ifindex_out_xdp_dummy_attached = false; > +static bool xdp_devmap_attached = false; > static __u32 prog_id; > static __u32 dummy_prog_id; > > @@ -83,6 +88,29 @@ static void poll_stats(int interval, int ifindex) > } > } > > +static int get_mac_addr(unsigned int ifindex_out, void *mac_addr) > +{ > + struct ifreq ifr; > + char ifname[IF_NAMESIZE]; > + int fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); I would have expected (like ethtool): fd = socket(AF_INET, SOCK_DGRAM, 0); > + if (fd < 0) > + return -1; > + > + if (!if_indextoname(ifindex_out, ifname)) > + return -1; > + > + strcpy(ifr.ifr_name, ifname); > + > + if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0) > + return -1; > + > + memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char)); > + close(fd); > + > + return 0; > +} [...] 
> - /* Loading dummy XDP prog on out-device */ > - if (bpf_set_link_xdp_fd(ifindex_out, dummy_prog_fd, > - (xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST)) < 0) { > - printf("WARN: link set xdp fd failed on %d\n", ifindex_out); > - ifindex_out_xdp_dummy_attached = false; > - } > + /* If -X supplied, load 2nd xdp prog on egress. > + * If not, just load dummy prog on egress. > + */ The dummy prog need to be loaded, regardless of 2nd xdp prog on egress. > + if (xdp_devmap_attached) { > + unsigned char mac_addr[6]; > > - memset(&info, 0, sizeof(info)); > - ret = bpf_obj_get_info_by_fd(dummy_prog_fd, &info, &info_len); > - if (ret) { > - printf("can't get prog info - %s\n", strerror(errno)); > - return ret; > + devmap_prog = bpf_object__find_program_by_title(obj, > "xdp_devmap/map_prog"); > + if (!devmap_prog) { > + printf("finding devmap_prog in obj file failed\n"); > + goto out; > + } > + devmap_prog_fd = bpf_program__fd(devmap_prog); > + if (devmap_prog_fd < 0) { > + printf("finding devmap_prog fd failed\n"); > + goto out; > + } > + > + if (get_mac_addr(ifindex_out, mac_addr) < 0) { > + printf("get interface %d mac failed\n", ifindex_out); > + goto out; > + } > + > + ret = bpf_map_update_elem(tx_mac_map_fd, &key, mac_addr, 0); > + if (ret) { > + perror("bpf_update_elem tx_mac_map_fd"); > + goto out; > + } > + } else if (ifindex_in != ifindex_out) { > + dummy_prog = bpf_object__find_program_by_title(obj, > "xdp_redirect_dummy"); > + if (!dummy_prog) { > + printf("finding dummy_prog in obj file failed\n"); > + goto out; > + } > + > + dummy_prog_fd = bpf_program__fd(dummy_prog); > + if (dummy_prog_fd < 0) { > + printf("find dummy_prog fd failed\n"); > + goto out; > + } > + > + if (bpf_set_link_xdp_fd(ifindex_out, dummy_prog_fd, > + (xdp_flags | > XDP_FLAGS_UPDATE_IF_NOEXIST)) == 0) { > + ifindex_out_xdp_dummy_attached = true; > + } else { > + printf("WARN: link set xdp fd failed on %d\n", > ifindex_out); > + } > + > + memset(&info, 0, sizeof(info)); > + ret = 
bpf_obj_get_info_by_fd(dummy_prog_fd, &info, &info_len); > + if (ret) { > + printf("can't get prog info - %s\n", strerror(errno)); > + } > +
RE: [PATCH RFC] ethernet: stmmac: clean up the code for release/suspend/resume function
> -Original Message- > From: Jisheng Zhang > Sent: 2020年12月8日 18:24 > To: Joakim Zhang > Cc: peppe.cavall...@st.com; alexandre.tor...@st.com; > joab...@synopsys.com; da...@davemloft.net; k...@kernel.org; > netdev@vger.kernel.org; dl-linux-imx > Subject: Re: [PATCH RFC] ethernet: stmmac: clean up the code for > release/suspend/resume function > > On Mon, 7 Dec 2020 19:38:49 +0800 Joakim Zhang wrote: > > > > > > commit 1c35cc9cf6a0 ("net: stmmac: remove redundant null check before > > clk_disable_unprepare()"), have not clean up check NULL clock parameter > completely, this patch did it. > > > > commit e8377e7a29efb ("net: stmmac: only call pmt() during > > suspend/resume if HW enables PMT"), after this patch, we use if > > (device_may_wakeup(priv->device) && priv->plat->pmt) check MAC wakeup > > if (device_may_wakeup(priv->device)) check PHY wakeup Add oneline > > comment for readability. > > > > commit 77b2898394e3b ("net: stmmac: Speed down the PHY if WoL to save > > energy"), slow down phy speed when release net device under any condition. > > > > Slightly adjust the order of the codes so that suspend/resume look > > more symmetrical, generally speaking they should appear symmetrically. > > > > Signed-off-by: Joakim Zhang > > --- > > .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 > > +-- > > 1 file changed, 10 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index c33db79cdd0a..a46e865c4acc 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -2908,8 +2908,7 @@ static int stmmac_release(struct net_device *dev) > > struct stmmac_priv *priv = netdev_priv(dev); > > u32 chan; > > > > - if (device_may_wakeup(priv->device)) > > This check is to prevent link speed down if the stmmac isn't a wakeup device. When we invoke .ndo_stop, we down the net device. 
Per my understanding, we can speed down the phy, no matter whether it is a wakeup device or not, since when we invoke .ndo_open to bring up the net device, we will re-configure the mac and phy. Please point out to me if I misunderstand something. Thanks.

> > -		phylink_speed_down(priv->phylink, false);
> > +	phylink_speed_down(priv->phylink, false);
> >  	/* Stop and disconnect the PHY */
> >  	phylink_stop(priv->phylink);
> >  	phylink_disconnect_phy(priv->phylink);
> > @@ -5183,6 +5182,7 @@ int stmmac_suspend(struct device *dev)
> >  	} else {
> >  		mutex_unlock(&priv->lock);
> >  		rtnl_lock();
> > +		/* For PHY wakeup case */
> >  		if (device_may_wakeup(priv->device))
> >  			phylink_speed_down(priv->phylink, false);
> >  		phylink_stop(priv->phylink);
> > @@ -5260,11 +5260,17 @@ int stmmac_resume(struct device *dev)
> >  		/* enable the clk previously disabled */
> >  		clk_prepare_enable(priv->plat->stmmac_clk);
> >  		clk_prepare_enable(priv->plat->pclk);
> > -		if (priv->plat->clk_ptp_ref)
> > -			clk_prepare_enable(priv->plat->clk_ptp_ref);
> > +		clk_prepare_enable(priv->plat->clk_ptp_ref);
>
> I think this 3 line modifications can be a separated patch.

Yes, this is just an RFC to expose the issue.

> >  		/* reset the phy so that it's ready */
> >  		if (priv->mii)
> >  			stmmac_mdio_reset(priv->mii);
> > +
> > +		rtnl_lock();
> > +		phylink_start(priv->phylink);
> > +		/* We may have called phylink_speed_down before */
> > +		if (device_may_wakeup(priv->device))
> > +			phylink_speed_up(priv->phylink);
> > +		rtnl_unlock();
>
> This is moving phylink op before mac setup, I'm not sure whether this is safe.

We encountered an issue and need to move phylink before mac setup, please see the patch below.
https://www.spinics.net/lists/netdev/msg706458.html

We have not found problems after testing. Is there any risk?
Best Regards, Joakim Zhang > > } > > > > if (priv->plat->serdes_powerup) { @@ -5275,14 +5281,6 @@ int > > stmmac_resume(struct device *dev) > > return ret; > > } > > > > - if (!device_may_wakeup(priv->device) || !priv->plat->pmt) { > > - rtnl_lock(); > > - phylink_start(priv->phylink); > > - /* We may have called phylink_speed_down before */ > > - phylink_speed_up(priv->phylink); > > - rtnl_unlock(); > > - } > > - > > rtnl_lock(); > > mutex_lock(&priv->lock); > > > > -- > > 2.17.1 > >
RE: [PATCH v4 2/6] igb: take vlan double header into account
On Tue, Dec 01, 2020 at 09:58:52AM +0100, Jesper Dangaard Brouer wrote:
> > On Tue, 1 Dec 2020 08:23:23 +
> > "Penigalapati, Sandeep" wrote:
> >
> > > Tested-by: Sandeep Penigalapati
> >
> > Very happy that you are testing this.
> >
> > Have you also tested that samples/bpf/ xdp_redirect_cpu program works?
>
> Hi Jesper,
>
> I have tested the xdp routing example but it would be good if someone can
> double check this.
>
> Best
> Sven
>

Hi Jesper, Sven

I have tested xdp_redirect_cpu and it is working.

Thanks,
Sandeep

> > --
> > Best regards,
> > Jesper Dangaard Brouer
> > MSc.CS, Principal Kernel Engineer at Red Hat
> > LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v5 bpf-next 03/14] xdp: add xdp_shared_info data structure
> On Mon, 2020-12-07 at 17:32 +0100, Lorenzo Bianconi wrote: > > Introduce xdp_shared_info data structure to contain info about > > "non-linear" xdp frame. xdp_shared_info will alias skb_shared_info > > allowing to keep most of the frags in the same cache-line. > > Introduce some xdp_shared_info helpers aligned to skb_frag* ones > > > > is there or will be a more general purpose use to this xdp_shared_info > ? other than hosting frags ? I do not have other use-cases at the moment other than multi-buff but in theory it is possible I guess. The reason we introduced it is to have most of the frags in the first shared_info cache-line to avoid cache-misses. > > > Signed-off-by: Lorenzo Bianconi > > --- > > drivers/net/ethernet/marvell/mvneta.c | 62 +++ > > > > include/net/xdp.h | 52 -- > > 2 files changed, 82 insertions(+), 32 deletions(-) > > > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > > b/drivers/net/ethernet/marvell/mvneta.c > > index 1e5b5c69685a..d635463609ad 100644 > > --- a/drivers/net/ethernet/marvell/mvneta.c > > +++ b/drivers/net/ethernet/marvell/mvneta.c > > @@ -2033,14 +2033,17 @@ int mvneta_rx_refill_queue(struct mvneta_port > > *pp, struct mvneta_rx_queue *rxq) > > > > [...] 
> > > static void > > @@ -2278,7 +2281,7 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port > > *pp, > > struct mvneta_rx_desc *rx_desc, > > struct mvneta_rx_queue *rxq, > > struct xdp_buff *xdp, int *size, > > - struct skb_shared_info *xdp_sinfo, > > + struct xdp_shared_info *xdp_sinfo, > > struct page *page) > > { > > struct net_device *dev = pp->dev; > > @@ -2301,13 +2304,13 @@ mvneta_swbm_add_rx_fragment(struct > > mvneta_port *pp, > > if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) { > > skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo- > > >nr_frags++]; > > > > - skb_frag_off_set(frag, pp->rx_offset_correction); > > - skb_frag_size_set(frag, data_len); > > - __skb_frag_set_page(frag, page); > > + xdp_set_frag_offset(frag, pp->rx_offset_correction); > > + xdp_set_frag_size(frag, data_len); > > + xdp_set_frag_page(frag, page); > > > > why three separate setters ? why not just one > xdp_set_frag(page, offset, size) ? to be aligned with skb_frags helpers, but I guess we can have a single helper, I do not have a strong opinion on it > > > /* last fragment */ > > if (len == *size) { > > - struct skb_shared_info *sinfo; > > + struct xdp_shared_info *sinfo; > > > > sinfo = xdp_get_shared_info_from_buff(xdp); > > sinfo->nr_frags = xdp_sinfo->nr_frags; > > @@ -2324,10 +2327,13 @@ static struct sk_buff * > > mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue > > *rxq, > > struct xdp_buff *xdp, u32 desc_status) > > { [...] > > > > -static inline struct skb_shared_info * > > +struct xdp_shared_info { > > xdp_shared_info is a bad name, we need this to have a specific purpose > xdp_frags should the proper name, so people will think twice before > adding weird bits to this so called shared_info. I named the struct xdp_shared_info to recall skb_shared_info but I guess xdp_frags is fine too. Agree? > > > + u16 nr_frags; > > + u16 data_length; /* paged area length */ > > + skb_frag_t frags[MAX_SKB_FRAGS]; > > why MAX_SKB_FRAGS ? 
just use a flexible array member
> skb_frag_t frags[];
>
> and enforce size via the n_frags and on the construction of the
> tailroom preserved buffer, which is already being done.
>
> this is waste of unnecessary space, at least by definition of the
> struct, in your use case you do:
> memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * num_frags);
> And the tailroom space was already preserved for a full skb_shinfo.
> so i don't see why you need this array to be of a fixed MAX_SKB_FRAGS
> size.

In order to avoid cache-misses, xdp_shared_info is built as a variable on the mvneta_rx_swbm() stack and it is written to the "shared_info" area only on the last fragment in mvneta_swbm_add_rx_fragment(). I used MAX_SKB_FRAGS to be aligned with the skb_shared_info struct but probably we can use even a smaller value.

Another approach would be to define two different structs, e.g.

struct xdp_frag_metadata {
	u16 nr_frags;
	u16 data_length; /* paged area length */
};

struct xdp_frags {
	skb_frag_t frags[MAX_SKB_FRAGS];
};

and then define xdp_shared_info as

struct xdp_shared_info {
	struct xdp_frag_metadata meta;
	skb_frag_t frags[];
};

In this way we can probably optimize the space. What do you think?

>
> > +};
> > +
> > +static inline struct xdp_shared_info *
> > xdp_get_shared_info_from_buff(struct xdp_buff *x
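As a side note on the flexible-array suggestion, the following standalone sketch shows the general shape: a small header followed by a frags[] array whose capacity is decided by whoever reserves the memory, with the bound enforced at insert time. All names here (`sketch_frag`, `sketch_xdp_frags`, `sketch_frags_alloc()`, `sketch_add_frag()`) are hypothetical, and the heap allocation is only a userspace stand-in for the tailroom that the driver already reserves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fragment descriptor, standing in for skb_frag_t. */
struct sketch_frag {
	void *page;
	uint32_t offset;
	uint32_t size;
};

/* Header plus flexible array member: sizeof() covers only the header. */
struct sketch_xdp_frags {
	uint16_t nr_frags;
	uint16_t data_length;		/* paged area length */
	struct sketch_frag frags[];	/* capacity chosen at allocation time */
};

/* Reserve room for the header and max_frags descriptors, zeroed. */
static struct sketch_xdp_frags *sketch_frags_alloc(uint16_t max_frags)
{
	struct sketch_xdp_frags *f;

	f = calloc(1, sizeof(*f) + (size_t)max_frags * sizeof(f->frags[0]));
	return f;
}

/* Append one fragment; returns -1 when the reserved capacity is used up. */
static int sketch_add_frag(struct sketch_xdp_frags *f, uint16_t max_frags,
			   void *page, uint32_t offset, uint32_t size)
{
	if (f->nr_frags >= max_frags)
		return -1;

	f->frags[f->nr_frags++] = (struct sketch_frag){ page, offset, size };
	f->data_length = (uint16_t)(f->data_length + size);
	return 0;
}
```

The trade-off raised in the reply still applies: a flexible array cannot be placed on the stack as a plain local variable, so a design that first builds the frags in a stack variable needs either a fixed upper bound or a different staging area.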
Re: [PATCH 01/17] wil6210: wmi: Correct misnamed function parameter 'ptr_'
On Wed, 02 Dec 2020, Kalle Valo wrote:

> Lee Jones wrote:
>
> > Fixes the following W=1 kernel build warning(s):
> >
> >  drivers/net/wireless/ath/wil6210/wmi.c:279: warning: Function parameter or member 'ptr_' not described in 'wmi_buffer_block'
> >  drivers/net/wireless/ath/wil6210/wmi.c:279: warning: Excess function parameter 'ptr' description in 'wmi_buffer_block'
> >
> > Cc: Maya Erez
> > Cc: Kalle Valo
> > Cc: "David S. Miller"
> > Cc: Jakub Kicinski
> > Cc: linux-wirel...@vger.kernel.org
> > Cc: wil6...@qti.qualcomm.com
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Lee Jones
>
> Failed to apply:
>
> error: patch failed: drivers/net/wireless/ath/wil6210/wmi.c:262
> error: drivers/net/wireless/ath/wil6210/wmi.c: patch does not apply
> stg import: Diff does not apply cleanly
>
> Patch set to Changes Requested.

That's so strange. I just rebased my branch onto the latest -next with
no issue.

I will re-submit after the merge-window closes.

--
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
Re: [PATCHv3 bpf-next] samples/bpf: add xdp program on egress for xdp_redirect_map
On Tue, Dec 08, 2020 at 11:39:14AM +0100, Jesper Dangaard Brouer wrote:
> > +	/* If -X supplied, load 2nd xdp prog on egress.
> > +	 * If not, just load dummy prog on egress.
> > +	 */
>
> The dummy prog need to be loaded, regardless of 2nd xdp prog on egress.

Thanks for the reminder. Now I know why the packets are dropped when I
do perf tests on physical NICs.

Regards
Hangbin
Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support
Hi Tobias,

On Wed, Dec 02, 2020 at 10:13:54AM +0100, Tobias Waldekranz wrote:
> Monitor the following events and notify the driver when:
>
> - A DSA port joins/leaves a LAG.
> - A LAG, made up of DSA ports, joins/leaves a bridge.
> - A DSA port in a LAG is enabled/disabled (enabled meaning
>   "distributing" in 802.3ad LACP terms).
>
> Each LAG interface to which a DSA port is attached is represented by a
> `struct dsa_lag` which is globally reachable from the switch tree and
> from each associated port.
>
> When a LAG joins a bridge, the DSA subsystem will treat that as each
> individual port joining the bridge. The driver may look at the port's
> LAG pointer to see if it is associated with any LAG, if that is
> required. This is analogue to how switchdev events are replicated out
> to all lower devices when reaching e.g. a LAG.
>
> Signed-off-by: Tobias Waldekranz
> ---
>
> +struct dsa_lag {
> +	struct net_device *dev;
> +	int id;
> +
> +	struct list_head ports;
> +
> +	/* For multichip systems, we must ensure that each hash bucket
> +	 * is only enabled on a single egress port throughout the
> +	 * whole tree, lest we send duplicates. Therefore we must
> +	 * maintain a global list of active tx ports, so that each
> +	 * switch can figure out which buckets to enable on which
> +	 * ports.
> +	 */
> +	struct list_head tx_ports;
> +	int num_tx;
> +
> +	refcount_t refcount;
> +};

Sorry it took so long. I wanted to understand:

(a) where the challenges are for drivers to uniformly support software
    bridging when they already have code for bridge offloading. I found
    the following issues:
    - We have taggers that unconditionally set skb->offload_fwd_mark = 1,
      which kind of prevents software bridging. I'm not sure what the
      fix for these should be.
    - Source address is a big problem, but this time not in the sense
      that it traditionally has been. Specifically, due to address
      learning being enabled, the hardware FDB will set destinations to
      take the autonomous fast path. But surprise, the autonomous fast
      path is blocked, because as far as the switch is concerned, the
      ports are standalone and not offloading the bridge. We have
      drivers that don't disable address learning when they operate in
      standalone mode, which is something they definitely should do.
    There is nothing actionable for you in this patch set to resolve
    this. I just wanted to get an idea.
(b) Whether struct dsa_lag really brings us any significant benefit. I
    found that it doesn't. It's a lot of code added to the DSA core,
    that should not really belong in the middle layer. I need to go back
    and quote your motivation in the RFC:

| All LAG configuration is cached in `struct dsa_lag`s. I realize that
| the standard M.O. of DSA is to read back information from hardware
| when required. With LAGs this becomes very tricky though. For example,
| the change of a link state on one switch will require re-balancing of
| LAG hash buckets on another one, which in turn depends on the total
| number of active links in the LAG. Do you agree that this is
| motivated?

    After reimplementing bonding offload in ocelot, I have found
    struct dsa_lag to not provide any benefit. All the information a
    driver needs is already provided through the
    struct net_device *lag_dev argument given to lag_join and lag_leave,
    and through the struct netdev_lag_lower_state_info *info given to
    lag_change. I will send an RFC to you and the list shortly to prove
    that this information is absolutely sufficient for the driver to do
    decent internal bookkeeping, and that DSA should not really care
    beyond that.

    There are two points to be made:
    - Recently we have seen people with non-DSA (pure switchdev)
      hardware being compelled to write DSA drivers, because they
      noticed that a large part of the middle layer had already been
      written, and it presents an API with a lot of syntactic sugar.
      Maybe there is a larger issue here in that the switchdev
      offloading APIs are fairly bulky and repetitive, but that does not
      mean that we should be encouraging the attitude "come to DSA, we
      have cookies".
      https://lwn.net/ml/linux-kernel/20201125232459.378-1-lu...@denx.de/
    - Remember that the only reason why the DSA framework and the
      syntactic sugar exists is that we are presenting the hardware a
      unified view for the ports which have a struct net_device
      registered, and the ports which don't (DSA links and CPU ports).
      The argument really needs to be broken down into two:
      - For cross-chip DSA links, I can see why it was convenient for
        you to have the dsa_lag_by_dev(ds->dst, lag_dev) helper. But
        just as we currently have a struct net_device *bridge_dev in
        struct dsa_port, so we could have a struct net_device *bond,
        without the extra fat
[PATCH 1/1] mwifiex: Fix possible buffer overflows in mwifiex_uap_bss_param_prepare
From: Zhang Xiaohui

mwifiex_uap_bss_param_prepare() calls memcpy() without checking the
destination size, which may trigger a buffer overflow that a local user
could use to cause denial of service or the execution of arbitrary code.
Fix it by putting the length check before calling memcpy().

Signed-off-by: Zhang Xiaohui
---
 drivers/net/wireless/marvell/mwifiex/uap_cmd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/uap_cmd.c b/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
index b48a85d79..fb937c7ee 100644
--- a/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
+++ b/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
@@ -496,13 +496,16 @@ mwifiex_uap_bss_param_prepare(u8 *tlv, void *cmd_buf, u16 *param_size)
 	struct mwifiex_ie_types_wmmcap *wmm_cap;
 	struct mwifiex_uap_bss_param *bss_cfg = cmd_buf;
 	int i;
+	int ssid_size;
 	u16 cmd_size = *param_size;
 
 	if (bss_cfg->ssid.ssid_len) {
 		ssid = (struct host_cmd_tlv_ssid *)tlv;
 		ssid->header.type = cpu_to_le16(TLV_TYPE_UAP_SSID);
 		ssid->header.len = cpu_to_le16((u16)bss_cfg->ssid.ssid_len);
-		memcpy(ssid->ssid, bss_cfg->ssid.ssid, bss_cfg->ssid.ssid_len);
+		ssid_size = bss_cfg->ssid.ssid_len > strlen(ssid->ssid) ?
+				strlen(ssid->ssid) : bss_cfg->ssid.ssid_len;
+		memcpy(ssid->ssid, bss_cfg->ssid.ssid, ssid_size);
 		cmd_size += sizeof(struct mwifiex_ie_types_header) +
 			    bss_cfg->ssid.ssid_len;
 		tlv += sizeof(struct mwifiex_ie_types_header) +
--
2.17.1
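
For comparison, here is a minimal standalone sketch of a bounded copy.
It is not the driver's code: struct sketch_ssid_tlv, SKETCH_MAX_SSID_LEN
and sketch_copy_ssid() are hypothetical names, and the fixed 32-byte
array is an assumption made only so that the capacity is known at
compile time (32 bytes is the 802.11 SSID limit). The point it
illustrates is that the clamp is taken from the destination's declared
capacity, whereas strlen() on the destination, as used in the hunk
above, measures whatever bytes the buffer currently holds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity; 802.11 limits an SSID to 32 bytes. */
#define SKETCH_MAX_SSID_LEN 32

/* Hypothetical stand-in for the destination TLV, not the driver's struct. */
struct sketch_ssid_tlv {
	uint16_t type;
	uint16_t len;
	uint8_t ssid[SKETCH_MAX_SSID_LEN];
};

/*
 * Copy at most SKETCH_MAX_SSID_LEN bytes of src into tlv->ssid and
 * return the number of bytes actually copied. The limit comes from the
 * destination's declared capacity, not from its current contents.
 */
static size_t sketch_copy_ssid(struct sketch_ssid_tlv *tlv,
			       const uint8_t *src, size_t src_len)
{
	size_t n = src_len < sizeof(tlv->ssid) ? src_len : sizeof(tlv->ssid);

	memcpy(tlv->ssid, src, n);
	tlv->len = (uint16_t)n;	/* record the clamped length, not the request */
	return n;
}
```

Note that the sketch also stores the clamped length in the header, so
that the advertised length and the copied bytes cannot disagree.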
Re: [PATCH v4 2/6] igb: take vlan double header into account
On Tue, 8 Dec 2020 10:52:28 +0000
"Penigalapati, Sandeep" wrote:

> On Tue, Dec 01, 2020 at 09:58:52AM +0100, Jesper Dangaard Brouer wrote:
> > > On Tue, 1 Dec 2020 08:23:23 +0000
> > > "Penigalapati, Sandeep" wrote:
> > >
> > > > Tested-by: Sandeep Penigalapati
> > >
> > > Very happy that you are testing this.
> > >
> > > Have you also tested that samples/bpf/ xdp_redirect_cpu program works?
> >
> > Hi Jesper,
> >
> > I have tested the xdp routing example but it would be good if someone can
> > double check this.
> >
> Hi Jesper, Sven
>
> I have tested xdp_redirect_cpu and it is working.

Thanks, this is great to hear.

You have tested with large frames, right?

As cpumap just creates SKBs based on xdp_frame and sends them to the
normal network stack (on a remote CPU), you can just do a standard
TCP-stream throughput test with iperf or netperf. That should hopefully
blow up if we screwed up the boundaries of the two packets sharing the
same page.

(In principle we should verify the content of the TCP transfer, so
maybe scp + md5sum is a better test.)

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
[PATCH] [v11] wireless: Initial driver submission for pureLiFi STA devices
This introduces the pureLiFi LiFi driver for LiFi-X, LiFi-XC and LiFi-XL USB devices. This driver implementation has been based on the zd1211rw driver. Driver is based on 802.11 softMAC Architecture and uses native 802.11 for configuration and management. The driver is compiled and tested in ARM, x86 architectures and compiled in powerpc architecture. Signed-off-by: Srinivasan Raju --- v11, v10: - Addressed review comment on readability - Changed firmware names to match products and latest firmware v9: - Addressed review comments on style and content defects - Used kmemdup instead of alloc and memcpy v7 , v8: - Magic numbers removed and used IEEE80211 macors - usb.c is split into two files firmware.c and dbgfs.c - Other code style and timer function fixes (mod_timer) v6: - Code style fix patch from Joe Perches v5: - Code refactoring for clarity and redundnacy removal - Fix warnings from kernel test robot v4: - Code refactoring based on kernel code guidelines - Remove multi level macors and use kernel debug macros v3: - Code style fixes kconfig fix v2: - Driver submitted to wireless-next - Code style fixes and copyright statement fix v1: - Driver submitted to staging --- MAINTAINERS |5 + drivers/net/wireless/Kconfig |1 + drivers/net/wireless/Makefile|1 + drivers/net/wireless/purelifi/Kconfig| 27 + drivers/net/wireless/purelifi/Makefile |3 + drivers/net/wireless/purelifi/chip.c | 93 ++ drivers/net/wireless/purelifi/chip.h | 81 ++ drivers/net/wireless/purelifi/dbgfs.c| 150 +++ drivers/net/wireless/purelifi/firmware.c | 384 drivers/net/wireless/purelifi/intf.h | 38 + drivers/net/wireless/purelifi/mac.c | 873 ++ drivers/net/wireless/purelifi/mac.h | 189 drivers/net/wireless/purelifi/usb.c | 1075 ++ drivers/net/wireless/purelifi/usb.h | 199 14 files changed, 3119 insertions(+) create mode 100644 drivers/net/wireless/purelifi/Kconfig create mode 100644 drivers/net/wireless/purelifi/Makefile create mode 100644 drivers/net/wireless/purelifi/chip.c create mode 100644 
drivers/net/wireless/purelifi/chip.h create mode 100644 drivers/net/wireless/purelifi/dbgfs.c create mode 100644 drivers/net/wireless/purelifi/firmware.c create mode 100644 drivers/net/wireless/purelifi/intf.h create mode 100644 drivers/net/wireless/purelifi/mac.c create mode 100644 drivers/net/wireless/purelifi/mac.h create mode 100644 drivers/net/wireless/purelifi/usb.c create mode 100644 drivers/net/wireless/purelifi/usb.h diff --git a/MAINTAINERS b/MAINTAINERS index c80f87d7258c..17955b8497df 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14108,6 +14108,11 @@ T: git git://linuxtv.org/media_tree.git F: Documentation/admin-guide/media/pulse8-cec.rst F: drivers/media/cec/usb/pulse8/ +PUREILIFI USB DRIVER +M: Srinivasan Raju +S: Supported +F: drivers/net/wireless/purelifi + PVRUSB2 VIDEO4LINUX DRIVER M: Mike Isely L: pvru...@isely.net (subscribers-only) diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig index 170a64e67709..b87da3139f94 100644 --- a/drivers/net/wireless/Kconfig +++ b/drivers/net/wireless/Kconfig @@ -48,6 +48,7 @@ source "drivers/net/wireless/st/Kconfig" source "drivers/net/wireless/ti/Kconfig" source "drivers/net/wireless/zydas/Kconfig" source "drivers/net/wireless/quantenna/Kconfig" +source "drivers/net/wireless/purelifi/Kconfig" config PCMCIA_RAYCS tristate "Aviator/Raytheon 2.4GHz wireless support" diff --git a/drivers/net/wireless/Makefile b/drivers/net/wireless/Makefile index 80b324499786..e9fc770026f0 100644 --- a/drivers/net/wireless/Makefile +++ b/drivers/net/wireless/Makefile @@ -20,6 +20,7 @@ obj-$(CONFIG_WLAN_VENDOR_ST) += st/ obj-$(CONFIG_WLAN_VENDOR_TI) += ti/ obj-$(CONFIG_WLAN_VENDOR_ZYDAS) += zydas/ obj-$(CONFIG_WLAN_VENDOR_QUANTENNA) += quantenna/ +obj-$(CONFIG_WLAN_VENDOR_PURELIFI) += purelifi/ # 16-bit wireless PCMCIA client drivers obj-$(CONFIG_PCMCIA_RAYCS) += ray_cs.o diff --git a/drivers/net/wireless/purelifi/Kconfig b/drivers/net/wireless/purelifi/Kconfig new file mode 100644 index ..f6630791df9d --- 
/dev/null +++ b/drivers/net/wireless/purelifi/Kconfig @@ -0,0 +1,27 @@ +# SPDX-License-Identifier: GPL-2.0 +config WLAN_VENDOR_PURELIFI + bool "pureLiFi devices" + default y + help + If you have a pureLiFi device, say Y. + + Note that the answer to this question doesn't directly affect the + kernel: saying N will just cause the configurator to skip all the + questions about these cards. If you say Y, you will be asked for + your specific card in the following questions. + +if WLAN_VENDOR_PURELIFI + +config PURELIFI + + tristate "pureLiFi device support" + depends on CFG80211 && MAC80211 && USB + help + This driver makes th
Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
Jesper Dangaard Brouer writes: > On Mon, 7 Dec 2020 18:01:00 -0700 > David Ahern wrote: > >> On 12/7/20 1:52 PM, John Fastabend wrote: >> >> >> >> I think we need to keep XDP_TX action separate, because I think that >> >> there are use-cases where the we want to disable XDP_TX due to end-user >> >> policy or hardware limitations. >> > >> > How about we discover this at load time though. > > Nitpick at XDP "attach" time. The general disconnect between BPF and > XDP is that BPF can verify at "load" time (as kernel knows what it > support) while XDP can have different support/features per driver, and > cannot do this until attachment time. (See later issue with tail calls). > (All other BPF-hooks don't have this issue) > >> > Meaning if the program >> > doesn't use XDP_TX then the hardware can skip resource allocations for >> > it. I think we could have verifier or extra pass discover the use of >> > XDP_TX and then pass a bit down to driver to enable/disable TX caps. >> > >> >> This was discussed in the context of virtio_net some months back - it is >> hard to impossible to know a program will not return XDP_TX (e.g., value >> comes from a map). > > It is hard, and sometimes not possible. For maps the workaround is > that BPF-programmer adds a bound check on values from the map. If not > doing that the verifier have to assume all possible return codes are > used by BPF-prog. > > The real nemesis is program tail calls, that can be added dynamically > after the XDP program is attached. It is at attachment time that > changing the NIC resources is possible. So, for program tail calls the > verifier have to assume all possible return codes are used by BPF-prog. We actually had someone working on a scheme for how to express this for programs some months ago, but unfortunately that stalled out (Jesper already knows this, but FYI to the rest of you). In any case, I view this as a "next step". 
Just exposing the feature bits to userspace will help users today, and as a side effect, this also makes drivers declare what they support, which we can then incorporate into the core code to, e.g., reject attachment of programs that won't work anyway. But let's do this in increments and not make the perfect the enemy of the good here. > BPF now have function calls and function replace right(?) How does > this affect this detection of possible return codes? It does have the same issue as tail calls, in that the return code of the function being replaced can obviously change. However, the verifier knows the target of a replace, so it can propagate any constraints put upon the caller if we implement it that way. -Toke
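
As a rough illustration of the bound check on map values mentioned in
the quoted text, here is a plain C sketch. It is not a loadable BPF
program: the enum is redefined locally (with the same numeric values as
the kernel's enum xdp_action) so that the example stands alone, and
sketch_bounded_action() is a hypothetical helper name. Whether a
verifier pass would actually use such a check to rule out XDP_TX is the
open design question of this thread, not something the sketch
demonstrates.

```c
#include <stdint.h>

/*
 * Local copy of the XDP return codes, using the same numeric values as
 * enum xdp_action in the kernel UAPI header.
 */
enum sketch_xdp_action {
	SKETCH_XDP_ABORTED = 0,
	SKETCH_XDP_DROP = 1,
	SKETCH_XDP_PASS = 2,
	SKETCH_XDP_TX = 3,
	SKETCH_XDP_REDIRECT = 4,
};

/*
 * Hypothetical helper: take an action value that came from a map and
 * narrow it to PASS or DROP. Any other value, including XDP_TX, is
 * turned into DROP, so the set of possible return codes is known.
 */
static int sketch_bounded_action(uint32_t map_value)
{
	if (map_value == SKETCH_XDP_PASS)
		return SKETCH_XDP_PASS;

	return SKETCH_XDP_DROP;
}
```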
[PATCHv4 bpf-next] samples/bpf: add xdp program on egress for xdp_redirect_map
This patch add a xdp program on egress to show that we can modify the packet on egress. In this sample we will set the pkt's src mac to egress's mac address. The xdp_prog will be attached when -X option supplied. Signed-off-by: Hangbin Liu --- v4: a) Update get_mac_addr socket creation b) Load dummy prog regardless of 2nd xdp prog on egress v3: a) modify the src mac address based on egress mac v2: a) use pkt counter instead of IP ttl modification on egress program b) make the egress program selectable by option -X --- samples/bpf/xdp_redirect_map_kern.c | 60 ++-- samples/bpf/xdp_redirect_map_user.c | 104 +++- 2 files changed, 140 insertions(+), 24 deletions(-) diff --git a/samples/bpf/xdp_redirect_map_kern.c b/samples/bpf/xdp_redirect_map_kern.c index 6489352ab7a4..6b2164722649 100644 --- a/samples/bpf/xdp_redirect_map_kern.c +++ b/samples/bpf/xdp_redirect_map_kern.c @@ -19,12 +19,22 @@ #include #include +/* The 2nd xdp prog on egress does not support skb mode, so we define two + * maps, tx_port_general and tx_port_native. + */ struct { __uint(type, BPF_MAP_TYPE_DEVMAP); __uint(key_size, sizeof(int)); __uint(value_size, sizeof(int)); __uint(max_entries, 100); -} tx_port SEC(".maps"); +} tx_port_general SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(struct bpf_devmap_val)); + __uint(max_entries, 100); +} tx_port_native SEC(".maps"); /* Count RX packets, as XDP bpf_prog doesn't get direct TX-success * feedback. Redirect TX errors can be caught via a tracepoint. 
@@ -36,6 +46,14 @@ struct { __uint(max_entries, 1); } rxcnt SEC(".maps"); +/* map to stroe egress interface mac address */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, u32); + __type(value, __be64); + __uint(max_entries, 1); +} tx_mac SEC(".maps"); + static void swap_src_dst_mac(void *data) { unsigned short *p = data; @@ -52,17 +70,16 @@ static void swap_src_dst_mac(void *data) p[5] = dst[2]; } -SEC("xdp_redirect_map") -int xdp_redirect_map_prog(struct xdp_md *ctx) +static int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map) { void *data_end = (void *)(long)ctx->data_end; void *data = (void *)(long)ctx->data; struct ethhdr *eth = data; int rc = XDP_DROP; - int vport, port = 0, m = 0; long *value; u32 key = 0; u64 nh_off; + int vport; nh_off = sizeof(*eth); if (data + nh_off > data_end) @@ -79,7 +96,40 @@ int xdp_redirect_map_prog(struct xdp_md *ctx) swap_src_dst_mac(data); /* send packet out physical port */ - return bpf_redirect_map(&tx_port, vport, 0); + return bpf_redirect_map(redirect_map, vport, 0); +} + +SEC("xdp_redirect_general") +int xdp_redirect_map_general(struct xdp_md *ctx) +{ + return xdp_redirect_map(ctx, &tx_port_general); +} + +SEC("xdp_redirect_native") +int xdp_redirect_map_native(struct xdp_md *ctx) +{ + return xdp_redirect_map(ctx, &tx_port_native); +} + +SEC("xdp_devmap/map_prog") +int xdp_redirect_map_egress(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + struct ethhdr *eth = data; + __be64 *mac; + u32 key = 0; + u64 nh_off; + + nh_off = sizeof(*eth); + if (data + nh_off > data_end) + return XDP_DROP; + + mac = bpf_map_lookup_elem(&tx_mac, &key); + if (mac) + __builtin_memcpy(eth->h_source, mac, ETH_ALEN); + + return XDP_PASS; } /* Redirect require an XDP bpf_prog loaded on the TX device */ diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c index 31131b6e7782..9866d759bd11 100644 --- a/samples/bpf/xdp_redirect_map_user.c 
+++ b/samples/bpf/xdp_redirect_map_user.c @@ -14,6 +14,10 @@ #include #include #include +#include +#include +#include +#include #include "bpf_util.h" #include @@ -22,6 +26,7 @@ static int ifindex_in; static int ifindex_out; static bool ifindex_out_xdp_dummy_attached = true; +static bool xdp_devmap_attached = false; static __u32 prog_id; static __u32 dummy_prog_id; @@ -83,6 +88,29 @@ static void poll_stats(int interval, int ifindex) } } +static int get_mac_addr(unsigned int ifindex_out, void *mac_addr) +{ + struct ifreq ifr; + char ifname[IF_NAMESIZE]; + int fd = socket(AF_INET, SOCK_DGRAM, 0); + + if (fd < 0) + return -1; + + if (!if_indextoname(ifindex_out, ifname)) + return -1; + + strcpy(ifr.ifr_name, ifname); + + if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0) + return -1; + + memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char)); + close(fd); + + return 0; +} + static void usage(const ch
BUG: unable to handle kernel paging request in smc_nl_handle_smcr_dev
Hello, syzbot found the following issue on: HEAD commit:b1f7b098 Merge branch 's390-qeth-next' git tree: net-next console output: https://syzkaller.appspot.com/x/log.txt?x=164d246b50 kernel config: https://syzkaller.appspot.com/x/.config?x=2ac2dabe250b3a58 dashboard link: https://syzkaller.appspot.com/bug?extid=600fef7c414ee7e2d71b compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+600fef7c414ee7e2d...@syzkaller.appspotmail.com BUG: unable to handle page fault for address: ff84 #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD b08f067 P4D b08f067 PUD b091067 PMD 0 Oops: [#1] PREEMPT SMP KASAN CPU: 0 PID: 21334 Comm: syz-executor.1 Not tainted 5.10.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:smc_set_pci_values net/smc/smc_core.h:396 [inline] RIP: 0010:smc_nl_handle_smcr_dev.isra.0+0x4bd/0x11b0 net/smc/smc_ib.c:422 Code: 00 00 00 fc ff df 48 8d 7b 84 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 59 0c 00 00 <0f> b7 43 84 48 8d 7b 86 48 89 fa 48 c1 ea 03 66 89 84 24 ee 00 00 RSP: 0018:c900018b7228 EFLAGS: 00010246 RAX: 0005 RBX: RCX: RDX: RSI: RDI: ff84 RBP: 8ccc6120 R08: 0001 R09: c900018b7310 R10: f52000316e65 R11: R12: R13: 88802f52d540 R14: dc00 R15: 888062412014 FS: 7f9ce0405700() GS:8880b9e0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: ff84 CR3: 13c46000 CR4: 001506f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: smc_nl_prep_smcr_dev net/smc/smc_ib.c:469 [inline] smcr_nl_get_device+0xdf/0x1f0 net/smc/smc_ib.c:481 genl_lock_dumpit+0x60/0x90 net/netlink/genetlink.c:623 netlink_dump+0x4b9/0xb70 net/netlink/af_netlink.c:2268 __netlink_dump_start+0x642/0x900 net/netlink/af_netlink.c:2373 genl_family_rcv_msg_dumpit+0x2af/0x310 net/netlink/genetlink.c:686 
genl_family_rcv_msg net/netlink/genetlink.c:780 [inline] genl_rcv_msg+0x434/0x580 net/netlink/genetlink.c:800 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:671 sys_sendmsg+0x6e8/0x810 net/socket.c:2331 ___sys_sendmsg+0xf3/0x170 net/socket.c:2385 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2418 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45e0f9 Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7f9ce0404c68 EFLAGS: 0246 ORIG_RAX: 002e RAX: ffda RBX: 0003 RCX: 0045e0f9 RDX: RSI: 2040 RDI: 0003 RBP: 0119bfc0 R08: R09: R10: R11: 0246 R12: 0119bf8c R13: 7ffda3a6b65f R14: 7f9ce04059c0 R15: 0119bf8c Modules linked in: CR2: ff84 ---[ end trace 7323b30ca37a03b9 ]--- RIP: 0010:smc_set_pci_values net/smc/smc_core.h:396 [inline] RIP: 0010:smc_nl_handle_smcr_dev.isra.0+0x4bd/0x11b0 net/smc/smc_ib.c:422 Code: 00 00 00 fc ff df 48 8d 7b 84 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 59 0c 00 00 <0f> b7 43 84 48 8d 7b 86 48 89 fa 48 c1 ea 03 66 89 84 24 ee 00 00 RSP: 0018:c900018b7228 EFLAGS: 00010246 RAX: 0005 RBX: RCX: RDX: RSI: RDI: ff84 RBP: 8ccc6120 R08: 0001 R09: c900018b7310 R10: f52000316e65 R11: R12: R13: 88802f52d540 R14: dc00 R15: 888062412014 FS: 7f9ce0405700() GS:8880b9e0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: ff84 CR3: 13c46000 CR4: 001506f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 --- This report is generated by a bot. It may contain errors. 
See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be r
Re: [PATCH net-next 2/4] net: mvpp2: add mvpp2_phylink_to_port() helper
Hi Greg, Apologies for delayed response:. pon., 2 lis 2020 o 19:02 Greg Kroah-Hartman napisał(a): > > On Mon, Nov 02, 2020 at 06:38:54PM +0100, Marcin Wojtas wrote: > > Hi Greg and Sasha, > > > > pt., 9 paź 2020 o 05:43 Marcin Wojtas napisał(a): > > > > > > Hi, > > > > > > sob., 20 cze 2020 o 11:21 Russell King > > > napisał(a): > > > > > > > > Add a helper to convert the struct phylink_config pointer passed in > > > > from phylink to the drivers internal struct mvpp2_port. > > > > > > > > Signed-off-by: Russell King > > > > --- > > > > .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 29 +-- > > > > 1 file changed, 14 insertions(+), 15 deletions(-) > > > > > > > > diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > index 7653277d03b7..313f5a60a605 100644 > > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > @@ -4767,12 +4767,16 @@ static void mvpp2_port_copy_mac_addr(struct > > > > net_device *dev, struct mvpp2 *priv, > > > > eth_hw_addr_random(dev); > > > > } > > > > > > > > +static struct mvpp2_port *mvpp2_phylink_to_port(struct phylink_config > > > > *config) > > > > +{ > > > > + return container_of(config, struct mvpp2_port, phylink_config); > > > > +} > > > > + > > > > static void mvpp2_phylink_validate(struct phylink_config *config, > > > >unsigned long *supported, > > > >struct phylink_link_state *state) > > > > { > > > > - struct mvpp2_port *port = container_of(config, struct > > > > mvpp2_port, > > > > - phylink_config); > > > > + struct mvpp2_port *port = mvpp2_phylink_to_port(config); > > > > __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, }; > > > > > > > > /* Invalid combinations */ > > > > @@ -4913,8 +4917,7 @@ static void mvpp2_gmac_pcs_get_state(struct > > > > mvpp2_port *port, > > > > static void mvpp2_phylink_mac_pcs_get_state(struct phylink_config > > > > *config, > > > > struct phylink_link_state > > 
> > *state) > > > > { > > > > - struct mvpp2_port *port = container_of(config, struct > > > > mvpp2_port, > > > > - phylink_config); > > > > + struct mvpp2_port *port = mvpp2_phylink_to_port(config); > > > > > > > > if (port->priv->hw_version == MVPP22 && port->gop_id == 0) { > > > > u32 mode = readl(port->base + MVPP22_XLG_CTRL3_REG); > > > > @@ -4931,8 +4934,7 @@ static void > > > > mvpp2_phylink_mac_pcs_get_state(struct phylink_config *config, > > > > > > > > static void mvpp2_mac_an_restart(struct phylink_config *config) > > > > { > > > > - struct mvpp2_port *port = container_of(config, struct > > > > mvpp2_port, > > > > - phylink_config); > > > > + struct mvpp2_port *port = mvpp2_phylink_to_port(config); > > > > u32 val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG); > > > > > > > > writel(val | MVPP2_GMAC_IN_BAND_RESTART_AN, > > > > @@ -5105,13 +5107,12 @@ static void mvpp2_gmac_config(struct mvpp2_port > > > > *port, unsigned int mode, > > > > static void mvpp2_mac_config(struct phylink_config *config, unsigned > > > > int mode, > > > > const struct phylink_link_state *state) > > > > { > > > > - struct net_device *dev = to_net_dev(config->dev); > > > > - struct mvpp2_port *port = netdev_priv(dev); > > > > + struct mvpp2_port *port = mvpp2_phylink_to_port(config); > > > > bool change_interface = port->phy_interface != state->interface; > > > > > > > > /* Check for invalid configuration */ > > > > if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) { > > > > - netdev_err(dev, "Invalid mode on %s\n", dev->name); > > > > + netdev_err(port->dev, "Invalid mode on %s\n", > > > > port->dev->name); > > > > return; > > > > } > > > > > > > > @@ -5151,8 +5152,7 @@ static void mvpp2_mac_link_up(struct > > > > phylink_config *config, > > > > int speed, int duplex, > > > > bool tx_pause, bool rx_pause) > > > > { > > > > - struct net_device *dev = to_net_dev(config->dev); > > > > - struct mvpp2_port *port = netdev_priv(dev); > > > > + struct mvpp2_port *port = 
mvpp2_phylink_to_port(config); > > > > u32 val; > > > > > > > > if (mvpp2_is_xlg(interface)) { > > > > @@ -5199,14 +5199,13 @@ static void mvpp2_mac_link_up(struct > > > > phylink_config *config, > > > > > > > > mvpp2_egress_enable(port); > > > > mvpp2_ingress_enable(port); > > > > - netif_tx_wake_all_queues(dev); > > > > + netif_tx_wake_all_queues(p
[PATCH net-next] net/sched: cls_u32: simplify the return expression of u32_reoffload_knode()
Simplify the return expression.

Signed-off-by: Zheng Yongjun
---
 net/sched/cls_u32.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 54209a18d7fe..6e1abe805448 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -1171,7 +1171,6 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 	struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
 	struct tcf_block *block = tp->chain->block;
 	struct tc_cls_u32_offload cls_u32 = {};
-	int err;
 
 	tc_cls_common_offload_init(&cls_u32.common, tp, n->flags, extack);
 	cls_u32.command = add ?
@@ -1194,13 +1193,9 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 		cls_u32.knode.link_handle = ht->handle;
 	}
 
-	err = tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSU32,
-				    &cls_u32, cb_priv, &n->flags,
-				    &n->in_hw_count);
-	if (err)
-		return err;
-
-	return 0;
+	return tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSU32,
+				     &cls_u32, cb_priv, &n->flags,
+				     &n->in_hw_count);
 }
 
 static int u32_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
--
2.22.0
[RFC PATCH net-next 00/16] LAG offload for Ocelot DSA switches
This patch series comes as a continuation of the discussion started with
Tobias Waldekranz in his patch series to offload bonding/team from DSA:
https://patchwork.kernel.org/project/netdevbpf/patch/20201202091356.24075-3-tob...@waldekranz.com/

On one hand, it shows the rework that needs to be done to ocelot such
that a pure switchdev and a DSA driver could share the same
implementation.

On the other hand, it tries to identify which data structures DSA
really needs to keep and pass along to drivers, and which structures
are best left for the drivers to handle privately.

Testing has been done in the following topology:

[Topology diagram: Board 1 and Board 2 each have ports eno0, swp0, swp1
and swp2. On each board, swp1 and swp2 are members of bond0, and br0
sits on top of bond0 and swp0. The ports are joined by cables.]

The same script can be run on both Board 1 and Board 2 to set this up:

#!/bin/bash

ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
ip link set swp0 master br0

Then traffic can be tested between eno0 of Board 1 and eno0 of Board 2.

Vladimir Oltean (16):
  net: mscc: ocelot: offload bridge port flags to device
  net: mscc: ocelot: allow offloading of bridge on top of LAG
  net: mscc: ocelot: rename ocelot_netdevice_port_event to
    ocelot_netdevice_changeupper
  net: mscc: ocelot: use a switch-case statement in
    ocelot_netdevice_event
  net: mscc: ocelot: don't refuse bonding interfaces we can't offload
  net: mscc: ocelot: use ipv6 in the aggregation code
  net: mscc: ocelot: set up the bonding mask in a way that avoids a
    net_device
  net: mscc: ocelot: avoid unneeded "lp" variable in LAG join
  net: mscc: ocelot: use "lag" variable name in
    ocelot_bridge_stp_state_set
  net: mscc: ocelot: reapply bridge forwarding mask on bonding
    join/leave
  net: mscc: ocelot: set up logical port IDs centrally
  net: mscc: ocelot: drop the use of the "lags" array
  net: mscc: ocelot: rename aggr_count to num_ports_in_lag
  net: mscc: ocelot: rebalance LAGs on link up/down events
  net: dsa: felix: propagate the LAG offload ops towards the ocelot lib
  net: dsa: ocelot: tell DSA that we can offload link aggregation

 drivers/net/dsa/ocelot/felix.c         |  28 +++
 drivers/net/ethernet/mscc/ocelot.c     | 276 +++--
 drivers/net/ethernet/mscc/ocelot.h     |   7 +-
 drivers/net/ethernet/mscc/ocelot_net.c | 139 -
 include/soc/mscc/ocelot.h              |  13 +-
 5 files changed, 298 insertions(+), 165 deletions(-)

--
2.25.1
[RFC PATCH net-next 02/16] net: mscc: ocelot: allow offloading of bridge on top of LAG
Commit 7afb3e575e5a ("net: mscc: ocelot: don't handle netdev events for other netdevs") was too aggressive: it made ocelot_netdevice_event react only to network interface events emitted for the ocelot switch ports. In fact, only the NETDEV_PRECHANGEUPPER event should have had that check. When we ignore all events that are not for us, we miss the case where the upper of the LAG changes, i.e. the bonding interface gets enslaved to a bridge. This is an operation we could offload under certain conditions. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot_net.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c index 93ecd5274156..6fb2a813e694 100644 --- a/drivers/net/ethernet/mscc/ocelot_net.c +++ b/drivers/net/ethernet/mscc/ocelot_net.c @@ -1047,10 +1047,8 @@ static int ocelot_netdevice_event(struct notifier_block *unused, struct net_device *dev = netdev_notifier_info_to_dev(ptr); int ret = 0; - if (!ocelot_netdevice_dev_check(dev)) - return 0; - if (event == NETDEV_PRECHANGEUPPER && + ocelot_netdevice_dev_check(dev) && netif_is_lag_master(info->upper_dev)) { struct netdev_lag_upper_info *lag_upper_info = info->upper_info; struct netlink_ext_ack *extack; -- 2.25.1
[RFC PATCH net-next 01/16] net: mscc: ocelot: offload bridge port flags to device
We should not be unconditionally enabling address learning, since doing that is actively detrimental when a port is standalone and not offloading a bridge. Namely, if a port in the switch is standalone and others are offloading the bridge, then we could enter a situation where we learn an address towards the standalone port, but the bridged ports could not forward the packet there, because the CPU is the only path between the standalone and the bridged ports. The solution of course is to not enable address learning unless the bridge asks for it. Currently this is the only bridge port flag we are looking at. The others (flooding etc.) are TBD. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 21 - drivers/net/ethernet/mscc/ocelot.h | 3 +++ drivers/net/ethernet/mscc/ocelot_net.c | 4 include/soc/mscc/ocelot.h | 2 ++ 4 files changed, 29 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index b9626eec8db6..7a5c534099d3 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -883,6 +883,7 @@ EXPORT_SYMBOL(ocelot_get_ts_info); void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; u32 port_cfg; int p, i; @@ -896,7 +897,8 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state) ocelot->bridge_fwd_mask |= BIT(port); fallthrough; case BR_STATE_LEARNING: - port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA; + if (ocelot_port->brport_flags & BR_LEARNING) + port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA; break; default: @@ -1178,6 +1180,7 @@ EXPORT_SYMBOL(ocelot_port_bridge_join); int ocelot_port_bridge_leave(struct ocelot *ocelot, int port, struct net_device *bridge) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; struct ocelot_vlan pvid = {0}, native_vlan = {0}; struct switchdev_trans trans; int ret; @@ -1200,6 +1203,10 @@ int ocelot_port_bridge_leave(struct
ocelot *ocelot, int port, ocelot_port_set_pvid(ocelot, port, pvid); ocelot_port_set_native_vlan(ocelot, port, native_vlan); + ocelot_port->brport_flags = 0; + ocelot_rmw_gix(ocelot, 0, ANA_PORT_PORT_CFG_LEARN_ENA, + ANA_PORT_PORT_CFG, port); + return 0; } EXPORT_SYMBOL(ocelot_port_bridge_leave); @@ -1391,6 +1398,18 @@ int ocelot_get_max_mtu(struct ocelot *ocelot, int port) } EXPORT_SYMBOL(ocelot_get_max_mtu); +void ocelot_port_bridge_flags(struct ocelot *ocelot, int port, + unsigned long flags, + struct switchdev_trans *trans) +{ + struct ocelot_port *ocelot_port = ocelot->ports[port]; + + if (switchdev_trans_ph_prepare(trans)) + return; + + ocelot_port->brport_flags = flags; +} + void ocelot_init_port(struct ocelot *ocelot, int port) { struct ocelot_port *ocelot_port = ocelot->ports[port]; diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h index 291d39d49c4e..739bd201d951 100644 --- a/drivers/net/ethernet/mscc/ocelot.h +++ b/drivers/net/ethernet/mscc/ocelot.h @@ -102,6 +102,9 @@ struct ocelot_multicast { struct ocelot_pgid *pgid; }; +void ocelot_port_bridge_flags(struct ocelot *ocelot, int port, + unsigned long flags, + struct switchdev_trans *trans); int ocelot_port_fdb_do_dump(const unsigned char *addr, u16 vid, bool is_static, void *data); int ocelot_mact_learn(struct ocelot *ocelot, int port, diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c index 9ba7e2b166e9..93ecd5274156 100644 --- a/drivers/net/ethernet/mscc/ocelot_net.c +++ b/drivers/net/ethernet/mscc/ocelot_net.c @@ -882,6 +882,10 @@ static int ocelot_port_attr_set(struct net_device *dev, case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED: ocelot_port_attr_mc_set(ocelot, port, !attr->u.mc_disabled); break; + case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS: + ocelot_port_bridge_flags(ocelot, port, attr->u.brport_flags, +trans); + break; default: err = -EOPNOTSUPP; break; diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 
2f4cd3288bcc..50514c087231 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -581,6 +581,8 @@ struct ocelot_port { struct regmap *target; + unsigned long brport_flags; + boolvlan_aware; /* VLAN that untag
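The learning rule described in this patch can be modelled in userspace as a small sketch. Everything prefixed with model_ or MODEL_ is hypothetical: the bit values are illustrative and are not taken from the ocelot register map or from the bridge headers. Unlike the driver, the model always clears the learning bit before deciding, which is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions; the values are
 * illustrative only and are not the real register or flag encodings.
 */
#define MODEL_LEARN_ENA   (UINT32_C(1) << 0) /* models ANA_PORT_PORT_CFG_LEARN_ENA */
#define MODEL_BR_LEARNING (1ul << 5)         /* models the bridge's BR_LEARNING */

enum model_stp_state {
	MODEL_STATE_DISABLED,
	MODEL_STATE_LISTENING,
	MODEL_STATE_LEARNING,
	MODEL_STATE_FORWARDING,
	MODEL_STATE_BLOCKING,
};

/* Return the port configuration word with the learning bit set only when
 * the STP state permits learning AND the bridge asked for it through the
 * per-port flags.
 */
static uint32_t model_port_cfg(uint32_t port_cfg, enum model_stp_state state,
			       unsigned long brport_flags)
{
	bool stp_allows = state == MODEL_STATE_LEARNING ||
			  state == MODEL_STATE_FORWARDING;

	/* Simplification: start from "learning disabled" every time. */
	port_cfg &= ~MODEL_LEARN_ENA;

	if (stp_allows && (brport_flags & MODEL_BR_LEARNING))
		port_cfg |= MODEL_LEARN_ENA;

	return port_cfg;
}
```

A standalone port has brport_flags == 0 in this model, so it never learns, regardless of the STP state.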
[RFC PATCH net-next 06/16] net: mscc: ocelot: use ipv6 in the aggregation code
IPv6 header information is not currently part of the entropy source for the 4-bit aggregation code used for LAG offload, even though it could be. The hardware reference manual says about these fields: ANA::AGGR_CFG.AC_IP6_TCPUDP_PORT_ENA Use IPv6 TCP/UDP port when calculating aggregation code. Configure identically for all ports. Recommended value is 1. ANA::AGGR_CFG.AC_IP6_FLOW_LBL_ENA Use IPv6 flow label when calculating AC. Configure identically for all ports. Recommended value is 1. Integration with the xmit_hash_policy of the bonding interface is TBD. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 7a5c534099d3..13e86dd71e5a 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1557,7 +1557,10 @@ int ocelot_init(struct ocelot *ocelot) ocelot_write(ocelot, ANA_AGGR_CFG_AC_SMAC_ENA | ANA_AGGR_CFG_AC_DMAC_ENA | ANA_AGGR_CFG_AC_IP4_SIPDIP_ENA | -ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA, ANA_AGGR_CFG); +ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA | +ANA_AGGR_CFG_AC_IP6_FLOW_LBL_ENA | +ANA_AGGR_CFG_AC_IP6_TCPUDP_ENA, +ANA_AGGR_CFG); /* Set MAC age time to default value. The entry is aged after * 2*AGE_PERIOD -- 2.25.1
[RFC PATCH net-next 05/16] net: mscc: ocelot: don't refuse bonding interfaces we can't offload
Since switchdev/DSA exposes network interfaces that fulfill many of the same user space expectations that dedicated NICs do, it makes sense to not deny bonding interfaces with a bonding policy that we cannot offload, but instead allow the bonding driver to select the egress interface in software. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot_net.c | 38 ++ 1 file changed, 15 insertions(+), 23 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c index 47b620967156..77957328722a 100644 --- a/drivers/net/ethernet/mscc/ocelot_net.c +++ b/drivers/net/ethernet/mscc/ocelot_net.c @@ -1022,6 +1022,15 @@ static int ocelot_netdevice_changeupper(struct net_device *dev, } } if (netif_is_lag_master(info->upper_dev)) { + struct netdev_lag_upper_info *lag_upper_info; + + lag_upper_info = info->upper_info; + + /* Only offload what we can */ + if (lag_upper_info && + lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) + return NOTIFY_DONE; + if (info->linking) err = ocelot_port_lag_join(ocelot, port, info->upper_dev); @@ -1037,10 +1046,16 @@ static int ocelot_netdevice_lag_changeupper(struct net_device *dev, struct netdev_notifier_changeupper_info *info) { + struct netdev_lag_upper_info *lag_upper_info = info->upper_info; struct net_device *lower; struct list_head *iter; int err = NOTIFY_DONE; + /* Can't offload LAG => also do bridging in software */ + if (lag_upper_info && + lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) + return NOTIFY_DONE; + netdev_for_each_lower_dev(dev, lower, iter) { err = ocelot_netdevice_changeupper(lower, info); if (err) @@ -1056,29 +1071,6 @@ static int ocelot_netdevice_event(struct notifier_block *unused, struct net_device *dev = netdev_notifier_info_to_dev(ptr); switch (event) { - case NETDEV_PRECHANGEUPPER: { - struct netdev_notifier_changeupper_info *info = ptr; - struct netdev_lag_upper_info *lag_upper_info; - struct netlink_ext_ack *extack; - - if 
(!ocelot_netdevice_dev_check(dev)) - break; - - if (!netif_is_lag_master(info->upper_dev)) - break; - - lag_upper_info = info->upper_info; - - if (lag_upper_info && - lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) { - extack = netdev_notifier_info_to_extack(&info->info); - NL_SET_ERR_MSG_MOD(extack, "LAG device using unsupported Tx type"); - - return NOTIFY_BAD; - } - - break; - } case NETDEV_CHANGEUPPER: { struct netdev_notifier_changeupper_info *info = ptr; -- 2.25.1
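The offload decision made by this patch can be sketched as a single predicate. This is a userspace model, not driver code: the enum below is a hypothetical, shortened mirror of the kernel's enum netdev_lag_tx_type, and model_lag_can_offload is a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, shortened mirror of enum netdev_lag_tx_type. */
enum model_lag_tx_type {
	MODEL_LAG_TX_TYPE_UNKNOWN,
	MODEL_LAG_TX_TYPE_ACTIVEBACKUP,
	MODEL_LAG_TX_TYPE_HASH,
};

struct model_lag_upper_info {
	enum model_lag_tx_type tx_type;
};

/* Return true when the LAG may be offloaded to hardware. As in the patch,
 * only a present info structure with a non-hash Tx type rules offload out;
 * in that case the bonding driver keeps selecting the egress port in
 * software and the join is not refused.
 */
static bool model_lag_can_offload(const struct model_lag_upper_info *info)
{
	if (info && info->tx_type != MODEL_LAG_TX_TYPE_HASH)
		return false;

	return true;
}
```

The difference from the removed NETDEV_PRECHANGEUPPER handler is what happens on false: the old code vetoed the operation with NOTIFY_BAD, while the new code returns NOTIFY_DONE and leaves the bond to software.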
[RFC PATCH net-next 16/16] net: dsa: ocelot: tell DSA that we can offload link aggregation
For preallocation purposes, we need to specify the maximum number of individual bonding/team devices that we can offload, which in our case is equal to the number of physical interfaces. Signed-off-by: Vladimir Oltean --- drivers/net/dsa/ocelot/felix.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index 53ed182fac12..ad73aaa4457c 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -653,6 +653,7 @@ static int felix_setup(struct dsa_switch *ds) ds->mtu_enforcement_ingress = true; ds->configure_vlan_while_not_filtering = true; + ds->num_lags = ds->num_ports; return 0; } -- 2.25.1
[RFC PATCH net-next 03/16] net: mscc: ocelot: rename ocelot_netdevice_port_event to ocelot_netdevice_changeupper
ocelot_netdevice_port_event treats a single event, NETDEV_CHANGEUPPER. So we can remove the check for the type of event, and rename the function to be more suggestive, since there already is a function with a very similar name of ocelot_netdevice_event. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot_net.c | 59 -- 1 file changed, 27 insertions(+), 32 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c index 6fb2a813e694..50765a3b1c44 100644 --- a/drivers/net/ethernet/mscc/ocelot_net.c +++ b/drivers/net/ethernet/mscc/ocelot_net.c @@ -1003,9 +1003,8 @@ static int ocelot_port_obj_del(struct net_device *dev, return ret; } -static int ocelot_netdevice_port_event(struct net_device *dev, - unsigned long event, - struct netdev_notifier_changeupper_info *info) +static int ocelot_netdevice_changeupper(struct net_device *dev, + struct netdev_notifier_changeupper_info *info) { struct ocelot_port_private *priv = netdev_priv(dev); struct ocelot_port *ocelot_port = &priv->port; @@ -1013,28 +1012,22 @@ static int ocelot_netdevice_port_event(struct net_device *dev, int port = priv->chip_port; int err = 0; - switch (event) { - case NETDEV_CHANGEUPPER: - if (netif_is_bridge_master(info->upper_dev)) { - if (info->linking) { - err = ocelot_port_bridge_join(ocelot, port, - info->upper_dev); - } else { - err = ocelot_port_bridge_leave(ocelot, port, - info->upper_dev); - } - } - if (netif_is_lag_master(info->upper_dev)) { - if (info->linking) - err = ocelot_port_lag_join(ocelot, port, - info->upper_dev); - else - ocelot_port_lag_leave(ocelot, port, + if (netif_is_bridge_master(info->upper_dev)) { + if (info->linking) { + err = ocelot_port_bridge_join(ocelot, port, info->upper_dev); + } else { + err = ocelot_port_bridge_leave(ocelot, port, + info->upper_dev); } - break; - default: - break; + } + if (netif_is_lag_master(info->upper_dev)) { + if (info->linking) + err = ocelot_port_lag_join(ocelot, port, + 
info->upper_dev); + else + ocelot_port_lag_leave(ocelot, port, + info->upper_dev); } return err; @@ -1063,17 +1056,19 @@ static int ocelot_netdevice_event(struct notifier_block *unused, } } - if (netif_is_lag_master(dev)) { - struct net_device *slave; - struct list_head *iter; + if (event == NETDEV_CHANGEUPPER) { + if (netif_is_lag_master(dev)) { + struct net_device *slave; + struct list_head *iter; - netdev_for_each_lower_dev(dev, slave, iter) { - ret = ocelot_netdevice_port_event(slave, event, info); - if (ret) - goto notify; + netdev_for_each_lower_dev(dev, slave, iter) { + ret = ocelot_netdevice_changeupper(slave, event, info); + if (ret) + goto notify; + } + } else { + ret = ocelot_netdevice_changeupper(dev, event, info); } - } else { - ret = ocelot_netdevice_port_event(dev, event, info); } notify: -- 2.25.1
[RFC PATCH net-next 07/16] net: mscc: ocelot: set up the bonding mask in a way that avoids a net_device
Since this code should be called from pure switchdev as well as from DSA, we must find a way to determine the bonding mask not by looking directly at the net_device lowers of the bonding interface, since those could have different private structures. We keep a pointer to the bonding upper interface, if present, in struct ocelot_port. Then the bonding mask becomes the bitwise OR of all ports that have the same bonding upper interface. This adds a duplication of functionality with the current "lags" array, but the duplication will be short-lived, since further patches will remove the latter completely. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 29 ++--- include/soc/mscc/ocelot.h | 2 ++ 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 13e86dd71e5a..30dee1f957d1 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -881,6 +881,24 @@ int ocelot_get_ts_info(struct ocelot *ocelot, int port, } EXPORT_SYMBOL(ocelot_get_ts_info); +static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond) +{ + u32 bond_mask = 0; + int port; + + for (port = 0; port < ocelot->num_phys_ports; port++) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; + + if (!ocelot_port) + continue; + + if (ocelot_port->bond == bond) + bond_mask |= BIT(port); + } + + return bond_mask; +} + void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state) { struct ocelot_port *ocelot_port = ocelot->ports[port]; @@ -1272,17 +1290,12 @@ static void ocelot_setup_lag(struct ocelot *ocelot, int lag) int ocelot_port_lag_join(struct ocelot *ocelot, int port, struct net_device *bond) { - struct net_device *ndev; u32 bond_mask = 0; int lag, lp; - rcu_read_lock(); - for_each_netdev_in_bond_rcu(bond, ndev) { - struct ocelot_port_private *priv = netdev_priv(ndev); + ocelot->ports[port]->bond = bond; - bond_mask |= 
BIT(priv->chip_port); - } - rcu_read_unlock(); + bond_mask = ocelot_get_bond_mask(ocelot, bond); lp = __ffs(bond_mask); @@ -1315,6 +1328,8 @@ void ocelot_port_lag_leave(struct ocelot *ocelot, int port, u32 port_cfg; int i; + ocelot->ports[port]->bond = NULL; + /* Remove port from any lag */ for (i = 0; i < ocelot->num_phys_ports; i++) ocelot->lags[i] &= ~BIT(port); diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 50514c087231..b812bdff1da1 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -597,6 +597,8 @@ struct ocelot_port { phy_interface_t phy_mode; u8 *xmit_template; + + struct net_device *bond; }; struct ocelot { -- 2.25.1
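For readers without the tree at hand, the bonding mask computation introduced above can be tried out in userspace. The sketch below assumes simplified stand-ins (model_netdev, model_port, model_switch) for struct net_device, struct ocelot_port and struct ocelot; only the loop itself follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PORTS 32

/* Hypothetical, userspace-only stand-ins for the kernel structures. */
struct model_netdev {
	const char *name;
};

struct model_port {
	const struct model_netdev *bond; /* bonding upper, NULL if none */
};

struct model_switch {
	int num_phys_ports;
	struct model_port *ports[MODEL_MAX_PORTS]; /* NULL for unused ports */
};

/* Bitwise OR of BIT(port) for every port whose bonding upper is @bond. */
static uint32_t model_get_bond_mask(const struct model_switch *sw,
				    const struct model_netdev *bond)
{
	uint32_t bond_mask = 0;
	int port;

	assert(sw->num_phys_ports <= MODEL_MAX_PORTS);

	for (port = 0; port < sw->num_phys_ports; port++) {
		const struct model_port *p = sw->ports[port];

		/* Unused ports have no private structure at all. */
		if (!p)
			continue;

		if (p->bond == bond)
			bond_mask |= UINT32_C(1) << port;
	}

	return bond_mask;
}
```

With swp1 and swp2 (ports 1 and 2) under the same bond, the mask is 0x6, and no net_device lower needs to be inspected to obtain it.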
[RFC PATCH net-next 04/16] net: mscc: ocelot: use a switch-case statement in ocelot_netdevice_event
Make ocelot's net device event handler more streamlined by structuring it in a similar way with others. The inspiration here was dsa_slave_netdevice_event. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot_net.c | 68 +- 1 file changed, 45 insertions(+), 23 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c index 50765a3b1c44..47b620967156 100644 --- a/drivers/net/ethernet/mscc/ocelot_net.c +++ b/drivers/net/ethernet/mscc/ocelot_net.c @@ -1030,49 +1030,71 @@ static int ocelot_netdevice_changeupper(struct net_device *dev, info->upper_dev); } - return err; + return notifier_from_errno(err); +} + +static int +ocelot_netdevice_lag_changeupper(struct net_device *dev, +struct netdev_notifier_changeupper_info *info) +{ + struct net_device *lower; + struct list_head *iter; + int err = NOTIFY_DONE; + + netdev_for_each_lower_dev(dev, lower, iter) { + err = ocelot_netdevice_changeupper(lower, info); + if (err) + return notifier_from_errno(err); + } + + return NOTIFY_DONE; } static int ocelot_netdevice_event(struct notifier_block *unused, unsigned long event, void *ptr) { - struct netdev_notifier_changeupper_info *info = ptr; struct net_device *dev = netdev_notifier_info_to_dev(ptr); - int ret = 0; - if (event == NETDEV_PRECHANGEUPPER && - ocelot_netdevice_dev_check(dev) && - netif_is_lag_master(info->upper_dev)) { - struct netdev_lag_upper_info *lag_upper_info = info->upper_info; + switch (event) { + case NETDEV_PRECHANGEUPPER: { + struct netdev_notifier_changeupper_info *info = ptr; + struct netdev_lag_upper_info *lag_upper_info; struct netlink_ext_ack *extack; + if (!ocelot_netdevice_dev_check(dev)) + break; + + if (!netif_is_lag_master(info->upper_dev)) + break; + + lag_upper_info = info->upper_info; + if (lag_upper_info && lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) { extack = netdev_notifier_info_to_extack(&info->info); NL_SET_ERR_MSG_MOD(extack, "LAG device using unsupported Tx type"); - ret 
= -EINVAL; - goto notify; + return NOTIFY_BAD; } + + break; } + case NETDEV_CHANGEUPPER: { + struct netdev_notifier_changeupper_info *info = ptr; - if (event == NETDEV_CHANGEUPPER) { - if (netif_is_lag_master(dev)) { - struct net_device *slave; - struct list_head *iter; + if (ocelot_netdevice_dev_check(dev)) + return ocelot_netdevice_changeupper(dev, info); - netdev_for_each_lower_dev(dev, slave, iter) { - ret = ocelot_netdevice_changeupper(slave, event, info); - if (ret) - goto notify; - } - } else { - ret = ocelot_netdevice_changeupper(dev, event, info); - } + if (netif_is_lag_master(dev)) + return ocelot_netdevice_lag_changeupper(dev, info); + + break; + } + default: + break; } -notify: - return notifier_from_errno(ret); + return NOTIFY_DONE; } struct notifier_block ocelot_netdevice_nb __read_mostly = { -- 2.25.1
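Since this patch switches the handlers from plain errno values to notifier return codes, a userspace model of the encoding may help when reading it. The constants and helpers below are copies, to the best of my reading, of what include/linux/notifier.h defines, renamed with a model_ prefix; please check them against the tree before relying on the exact values.

```c
#include <assert.h>
#include <errno.h>

/* Userspace copies of the notifier return codes (assumed values). */
#define MODEL_NOTIFY_DONE      0x0000 /* don't care */
#define MODEL_NOTIFY_OK        0x0001 /* suits me */
#define MODEL_NOTIFY_STOP_MASK 0x8000 /* don't call further notifiers */
#define MODEL_NOTIFY_BAD       (MODEL_NOTIFY_STOP_MASK | 0x0002)

/* Encode a zero-or-negative errno as a notifier return value. */
static int model_notifier_from_errno(int err)
{
	assert(err <= 0);

	if (err)
		return MODEL_NOTIFY_STOP_MASK | (MODEL_NOTIFY_OK - err);

	return MODEL_NOTIFY_OK;
}

/* Recover the (negative) errno from a notifier return value. */
static int model_notifier_to_errno(int ret)
{
	ret &= ~MODEL_NOTIFY_STOP_MASK;

	return ret > MODEL_NOTIFY_OK ? MODEL_NOTIFY_OK - ret : 0;
}
```

Note that success encodes as MODEL_NOTIFY_OK, which is nonzero, so a caller that wants to test for failure has to decode the value or look at the stop mask rather than compare against zero.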
[RFC PATCH net-next 15/16] net: dsa: felix: propagate the LAG offload ops towards the ocelot lib
The ocelot switch has supported LAG offload since its initial commit, but felix could not make use of it for lack of a LAG abstraction in DSA. Now that we have one, let's forward DSA's calls to the ocelot library, which will deal with setting up the bonding. Note that ocelot_port_lag_leave can return an error due to memory allocation, but we currently ignore it because the DSA method returns void. Signed-off-by: Vladimir Oltean --- drivers/net/dsa/ocelot/felix.c | 27 +++ drivers/net/ethernet/mscc/ocelot.c | 1 + drivers/net/ethernet/mscc/ocelot.h | 6 -- include/soc/mscc/ocelot.h | 6 ++ 4 files changed, 34 insertions(+), 6 deletions(-) diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index 7dc230677b78..53ed182fac12 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -112,6 +112,30 @@ static void felix_bridge_leave(struct dsa_switch *ds, int port, ocelot_port_bridge_leave(ocelot, port, br); } +static int felix_lag_join(struct dsa_switch *ds, int port, + struct net_device *lag_dev) +{ + struct ocelot *ocelot = ds->priv; + + return ocelot_port_lag_join(ocelot, port, lag_dev); +} + +static void felix_lag_leave(struct dsa_switch *ds, int port, + struct net_device *lag_dev) +{ + struct ocelot *ocelot = ds->priv; + + ocelot_port_lag_leave(ocelot, port, lag_dev); +} + +static int felix_lag_change(struct dsa_switch *ds, int port, + struct netdev_lag_lower_state_info *linfo) +{ + struct ocelot *ocelot = ds->priv; + + return ocelot_port_lag_change(ocelot, port, linfo); +} + static int felix_vlan_prepare(struct dsa_switch *ds, int port, const struct switchdev_obj_port_vlan *vlan) { @@ -803,6 +827,9 @@ const struct dsa_switch_ops felix_switch_ops = { .port_mdb_del = felix_mdb_del, .port_bridge_join = felix_bridge_join, .port_bridge_leave = felix_bridge_leave, + .port_lag_join = felix_lag_join, + .port_lag_leave = felix_lag_leave, + .port_lag_change= felix_lag_change, .port_stp_state_set =
felix_bridge_stp_state_set, .port_vlan_prepare = felix_vlan_prepare, .port_vlan_filtering= felix_vlan_filtering, diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 5c71d121048d..cd7a2e558301 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1381,6 +1381,7 @@ int ocelot_port_lag_change(struct ocelot *ocelot, int port, /* Rebalance the LAGs */ return ocelot_set_aggr_pgids(ocelot); } +EXPORT_SYMBOL(ocelot_port_lag_change); /* Configure the maximum SDU (L2 payload) on RX to the value specified in @sdu. * The length of VLAN tags is accounted for automatically via DEV_MAC_TAGS_CFG. diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h index 0860125b623c..3141ccde6a66 100644 --- a/drivers/net/ethernet/mscc/ocelot.h +++ b/drivers/net/ethernet/mscc/ocelot.h @@ -112,12 +112,6 @@ int ocelot_mact_learn(struct ocelot *ocelot, int port, unsigned int vid, enum macaccess_entry_type type); int ocelot_mact_forget(struct ocelot *ocelot, const unsigned char mac[ETH_ALEN], unsigned int vid); -int ocelot_port_lag_join(struct ocelot *ocelot, int port, -struct net_device *bond); -int ocelot_port_lag_leave(struct ocelot *ocelot, int port, - struct net_device *bond); -int ocelot_port_lag_change(struct ocelot *ocelot, int port, - struct netdev_lag_lower_state_info *info); struct net_device *ocelot_port_to_netdev(struct ocelot *ocelot, int port); int ocelot_netdev_to_port(struct net_device *dev); diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 8a44b9064789..7c104f08796d 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -780,5 +780,11 @@ int ocelot_port_mdb_add(struct ocelot *ocelot, int port, const struct switchdev_obj_port_mdb *mdb); int ocelot_port_mdb_del(struct ocelot *ocelot, int port, const struct switchdev_obj_port_mdb *mdb); +int ocelot_port_lag_join(struct ocelot *ocelot, int port, +struct net_device *bond); +int 
ocelot_port_lag_leave(struct ocelot *ocelot, int port, + struct net_device *bond); +int ocelot_port_lag_change(struct ocelot *ocelot, int port, + struct netdev_lag_lower_state_info *info); #endif -- 2.25.1
[RFC PATCH net-next 09/16] net: mscc: ocelot: use "lag" variable name in ocelot_bridge_stp_state_set
In anticipation of further simplification, make it more clear what we're iterating over. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 080fd4ce37ea..c3c6682e6e79 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -903,7 +903,7 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state) { struct ocelot_port *ocelot_port = ocelot->ports[port]; u32 port_cfg; - int p, i; + int p; if (!(BIT(port) & ocelot->bridge_mask)) return; @@ -928,14 +928,17 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state) ocelot_write_gix(ocelot, port_cfg, ANA_PORT_PORT_CFG, port); /* Apply FWD mask. The loop is needed to add/remove the current port as -* a source for the other ports. +* a source for the other ports. If the source port is in a bond, then +* all the other ports from that bond need to be removed from this +* source port's forwarding mask. */ for (p = 0; p < ocelot->num_phys_ports; p++) { if (ocelot->bridge_fwd_mask & BIT(p)) { unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(p); + int lag; - for (i = 0; i < ocelot->num_phys_ports; i++) { - unsigned long bond_mask = ocelot->lags[i]; + for (lag = 0; lag < ocelot->num_phys_ports; lag++) { + unsigned long bond_mask = ocelot->lags[lag]; if (!bond_mask) continue; -- 2.25.1
[PATCH net-next] net: ipv6: rpl_iptunnel: simplify the return expression of rpl_do_srh()
Simplify the return expression. Signed-off-by: Zheng Yongjun --- net/ipv6/rpl_iptunnel.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c index 5fdf3ebb953f..f16cf45a2421 100644 --- a/net/ipv6/rpl_iptunnel.c +++ b/net/ipv6/rpl_iptunnel.c @@ -190,18 +190,13 @@ static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt) { struct dst_entry *dst = skb_dst(skb); struct rpl_iptunnel_encap *tinfo; - int err = 0; if (skb->protocol != htons(ETH_P_IPV6)) return -EINVAL; tinfo = rpl_encap_lwtunnel(dst->lwtstate); - err = rpl_do_srh_inline(skb, rlwt, tinfo->srh); - if (err) - return err; - - return 0; + return rpl_do_srh_inline(skb, rlwt, tinfo->srh); } static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb) -- 2.22.0
[RFC PATCH net-next 10/16] net: mscc: ocelot: reapply bridge forwarding mask on bonding join/leave
The bridge forwarding mask is currently applied only when the STP state of a port changes. But the mask depends on both the STP state and the bonding state of the ports. Factor out the code that recalculates the forwarding mask so that it can be reused, and call it when a port starts and stops offloading a bonding interface. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 68 +- 1 file changed, 38 insertions(+), 30 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index c3c6682e6e79..ee0fcee8e09a 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -899,11 +899,45 @@ static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond) return bond_mask; } +static void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot) +{ + int port; + + /* Apply FWD mask. The loop is needed to add/remove the current port as +* a source for the other ports. If the source port is in a bond, then +* all the other ports from that bond need to be removed from this +* source port's forwarding mask.
+*/ + for (port = 0; port < ocelot->num_phys_ports; port++) { + if (ocelot->bridge_fwd_mask & BIT(port)) { + unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(port); + int lag; + + for (lag = 0; lag < ocelot->num_phys_ports; lag++) { + unsigned long bond_mask = ocelot->lags[lag]; + + if (!bond_mask) + continue; + + if (bond_mask & BIT(port)) { + mask &= ~bond_mask; + break; + } + } + + ocelot_write_rix(ocelot, mask, +ANA_PGID_PGID, PGID_SRC + port); + } else { + ocelot_write_rix(ocelot, 0, +ANA_PGID_PGID, PGID_SRC + port); + } + } +} + void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state) { struct ocelot_port *ocelot_port = ocelot->ports[port]; u32 port_cfg; - int p; if (!(BIT(port) & ocelot->bridge_mask)) return; @@ -927,35 +961,7 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state) ocelot_write_gix(ocelot, port_cfg, ANA_PORT_PORT_CFG, port); - /* Apply FWD mask. The loop is needed to add/remove the current port as -* a source for the other ports. If the source port is in a bond, then -* all the other ports from that bond need to be removed from this -* source port's forwarding mask. 
-*/ - for (p = 0; p < ocelot->num_phys_ports; p++) { - if (ocelot->bridge_fwd_mask & BIT(p)) { - unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(p); - int lag; - - for (lag = 0; lag < ocelot->num_phys_ports; lag++) { - unsigned long bond_mask = ocelot->lags[lag]; - - if (!bond_mask) - continue; - - if (bond_mask & BIT(p)) { - mask &= ~bond_mask; - break; - } - } - - ocelot_write_rix(ocelot, mask, -ANA_PGID_PGID, PGID_SRC + p); - } else { - ocelot_write_rix(ocelot, 0, -ANA_PGID_PGID, PGID_SRC + p); - } - } + ocelot_apply_bridge_fwd_mask(ocelot); } EXPORT_SYMBOL(ocelot_bridge_stp_state_set); @@ -1315,6 +1321,7 @@ int ocelot_port_lag_join(struct ocelot *ocelot, int port, } ocelot_setup_lag(ocelot, lag); + ocelot_apply_bridge_fwd_mask(ocelot); ocelot_set_aggr_pgids(ocelot); return 0; @@ -1350,6 +1357,7 @@ void ocelot_port_lag_leave(struct ocelot *ocelot, int port, ocelot_write_gix(ocelot, port_cfg | ANA_PORT_PORT_CFG_PORTID_VAL(port), ANA_PORT_PORT_CFG, port); + ocelot_apply_bridge_fwd_mask(ocelot); ocelot_set_aggr_pgids(ocelot); } EXPORT_SYMBOL(ocelot_port_lag_leave); -- 2.25.1
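The per-source-port mask that ocelot_apply_bridge_fwd_mask writes can be reproduced with plain integers. The sketch below is a userspace model with hypothetical names (model_src_fwd_mask, MODEL_BIT); it takes the bonding mask of the source port as an argument instead of searching the "lags" array.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BIT(n) (UINT32_C(1) << (n))

/* Compute the set of ports that frames received on @port may be forwarded
 * to. @bridge_fwd_mask holds the ports in forwarding state, @bond_mask the
 * ports sharing a bonding upper with @port (0 if @port is not in a bond).
 */
static uint32_t model_src_fwd_mask(uint32_t bridge_fwd_mask,
				   uint32_t bond_mask, int port)
{
	uint32_t mask;

	assert(port >= 0 && port < 32);

	/* A port that is not forwarding may not send to anyone. */
	if (!(bridge_fwd_mask & MODEL_BIT(port)))
		return 0;

	/* Never reflect a frame back to its source port... */
	mask = bridge_fwd_mask & ~MODEL_BIT(port);
	/* ...nor to the other members of the same bond. */
	mask &= ~bond_mask;

	return mask;
}
```

For ports 0, 1 and 2 forwarding and ports 1 and 2 bonded, port 1 may forward only to port 0, while port 0 may forward to both bond members. Because the result depends on bond_mask, it has to be recomputed whenever a port joins or leaves a bond, which is what the two new call sites do.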
[RFC PATCH net-next 08/16] net: mscc: ocelot: avoid unneeded "lp" variable in LAG join
The index of the LAG is equal to the logical port ID that all the physical port members have, which is further equal to the index of the first physical port that is a member of the LAG. The code gets a bit carried away with logic like this: if (a == b) c = a; else c = b; which can be simplified, of course, into: c = b; (with a being port, b being lp, c being lag) This further makes the "lp" variable redundant, since we can use "lag" everywhere where "lp" (logical port) was used. So instead of a "c = b" assignment, we can do a complete deletion of b. Only one comment here: if (bond_mask) { lp = __ffs(bond_mask); ocelot->lags[lp] = 0; } lp was clobbered before, because it was used as a temporary variable to hold the new smallest port ID from the bond. Now that we don't have "lp" any longer, we'll just avoid the temporary variable and zeroize the bonding mask directly. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 30dee1f957d1..080fd4ce37ea 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1291,28 +1291,24 @@ int ocelot_port_lag_join(struct ocelot *ocelot, int port, struct net_device *bond) { u32 bond_mask = 0; - int lag, lp; + int lag; ocelot->ports[port]->bond = bond; bond_mask = ocelot_get_bond_mask(ocelot, bond); - lp = __ffs(bond_mask); + lag = __ffs(bond_mask); /* If the new port is the lowest one, use it as the logical port from * now on */ - if (port == lp) { - lag = port; + if (port == lag) { ocelot->lags[port] = bond_mask; bond_mask &= ~BIT(port); - if (bond_mask) { - lp = __ffs(bond_mask); - ocelot->lags[lp] = 0; - } + if (bond_mask) + ocelot->lags[__ffs(bond_mask)] = 0; } else { - lag = lp; - ocelot->lags[lp] |= BIT(port); + ocelot->lags[lag] |= BIT(port); } ocelot_setup_lag(ocelot, lag); -- 2.25.1
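The equivalence argued in this commit message is easy to check outside the kernel. The following is a sketch with hypothetical names: model_lowest_port is a portable stand-in for the kernel's __ffs(), and the two model_lag_* functions contrast the old and new way of picking the LAG index.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit; like __ffs(), undefined for a zero mask. */
static int model_lowest_port(uint32_t bond_mask)
{
	int port = 0;

	assert(bond_mask != 0);

	while (!(bond_mask & 1u)) {
		bond_mask >>= 1;
		port++;
	}

	return port;
}

/* Old shape: "if (a == b) c = a; else c = b;". */
static int model_lag_before(int port, uint32_t bond_mask)
{
	int lp = model_lowest_port(bond_mask);
	int lag;

	if (port == lp)
		lag = port;
	else
		lag = lp;

	return lag;
}

/* New shape: the LAG index is simply the lowest member port. */
static int model_lag_after(uint32_t bond_mask)
{
	return model_lowest_port(bond_mask);
}
```

Both functions return the same value for every member port of a bond, which is why the "lp" variable can be dropped.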
[RFC PATCH net-next 12/16] net: mscc: ocelot: drop the use of the "lags" array
We can now simplify the implementation by always using ocelot_get_bond_mask to look up the other ports that are offloading the same bonding interface as us. In ocelot_set_aggr_pgids, the code had a way to uniquely iterate through LAGs. We need to achieve the same behavior by marking each LAG as visited, which we do now by temporarily allocating an array of pointers to bonding uppers of each port, and marking each bonding upper as NULL once it has been treated by the first port that is a member. And because we now do some dynamic allocation, we need to propagate errors from ocelot_set_aggr_pgid all the way to ocelot_port_lag_leave. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 104 ++--- drivers/net/ethernet/mscc/ocelot.h | 4 +- drivers/net/ethernet/mscc/ocelot_net.c | 4 +- include/soc/mscc/ocelot.h | 2 - 4 files changed, 47 insertions(+), 67 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 1a98c24af056..d4dbba66aa65 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -909,21 +909,17 @@ static void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot) * source port's forwarding mask. 
*/ for (port = 0; port < ocelot->num_phys_ports; port++) { - if (ocelot->bridge_fwd_mask & BIT(port)) { - unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(port); - int lag; + struct ocelot_port *ocelot_port = ocelot->ports[port]; - for (lag = 0; lag < ocelot->num_phys_ports; lag++) { - unsigned long bond_mask = ocelot->lags[lag]; + if (!ocelot_port) + continue; - if (!bond_mask) - continue; + if (ocelot->bridge_fwd_mask & BIT(port)) { + unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(port); + struct net_device *bond = ocelot_port->bond; - if (bond_mask & BIT(port)) { - mask &= ~bond_mask; - break; - } - } + if (bond) + mask &= ~ocelot_get_bond_mask(ocelot, bond); ocelot_write_rix(ocelot, mask, ANA_PGID_PGID, PGID_SRC + port); @@ -1238,10 +1234,16 @@ int ocelot_port_bridge_leave(struct ocelot *ocelot, int port, } EXPORT_SYMBOL(ocelot_port_bridge_leave); -static void ocelot_set_aggr_pgids(struct ocelot *ocelot) +static int ocelot_set_aggr_pgids(struct ocelot *ocelot) { + struct net_device **bonds; int i, port, lag; + bonds = kcalloc(ocelot->num_phys_ports, sizeof(struct net_device *), + GFP_KERNEL); + if (!bonds) + return -ENOMEM; + /* Reset destination and aggregation PGIDS */ for_each_unicast_dest_pgid(ocelot, port) ocelot_write_rix(ocelot, BIT(port), ANA_PGID_PGID, port); @@ -1250,16 +1252,26 @@ static void ocelot_set_aggr_pgids(struct ocelot *ocelot) ocelot_write_rix(ocelot, GENMASK(ocelot->num_phys_ports - 1, 0), ANA_PGID_PGID, i); + for (port = 0; port < ocelot->num_phys_ports; port++) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; + + if (!ocelot_port) + continue; + + bonds[port] = ocelot_port->bond; + } + /* Now, set PGIDs for each LAG */ for (lag = 0; lag < ocelot->num_phys_ports; lag++) { unsigned long bond_mask; int aggr_count = 0; u8 aggr_idx[16]; - bond_mask = ocelot->lags[lag]; - if (!bond_mask) + if (!bonds[lag]) continue; + bond_mask = ocelot_get_bond_mask(ocelot, bonds[lag]); + for_each_set_bit(port, &bond_mask, 
ocelot->num_phys_ports) { // Destination mask ocelot_write_rix(ocelot, bond_mask, @@ -1276,7 +1288,19 @@ static void ocelot_set_aggr_pgids(struct ocelot *ocelot) ac |= BIT(aggr_idx[i % aggr_count]); ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i); } + + /* Mark the bonding interface as visited to avoid applying +* the same config again +*/ + for (i = lag + 1; i < ocelot->num_phys_ports; i++) + if (bonds[i] == bonds[lag]) + bonds[i] = NULL; + + bonds[lag] = NULL; } + + kfree(bonds); + return
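The "mark each bonding upper as visited" idea from this patch can be sketched in plain userspace C. This is an illustration under assumptions, not the driver code: `struct bond` is a hypothetical stand-in for `struct net_device`, `calloc()`/`free()` replace `kcalloc()`/`kfree()`, and `visit_lags()` is an invented name that records one port mask per distinct bond instead of writing PGID registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_PORTS 8

/* Hypothetical stand-in for struct net_device; only pointer identity matters. */
struct bond {
	int id;
};

/* Visit every distinct bond exactly once. port_bond[p] is the bond that port
 * p belongs to, or NULL. One member-port mask per bond is stored in masks_out.
 * Returns the number of bonds visited, or -1 if the scratch array cannot be
 * allocated (the driver would return -ENOMEM here).
 */
static int visit_lags(struct bond *const port_bond[NUM_PORTS],
		      uint32_t masks_out[NUM_PORTS])
{
	struct bond **bonds;
	int lag, i, count = 0;

	bonds = calloc(NUM_PORTS, sizeof(*bonds));
	if (!bonds)
		return -1;

	/* Scratch copy, so entries can be cleared without touching the ports */
	for (i = 0; i < NUM_PORTS; i++)
		bonds[i] = port_bond[i];

	for (lag = 0; lag < NUM_PORTS; lag++) {
		uint32_t mask = 0;

		if (!bonds[lag])
			continue;

		/* Rough equivalent of ocelot_get_bond_mask() */
		for (i = 0; i < NUM_PORTS; i++)
			if (port_bond[i] == bonds[lag])
				mask |= UINT32_C(1) << i;

		masks_out[count++] = mask;

		/* Mark the bond as visited for all later member ports */
		for (i = lag + 1; i < NUM_PORTS; i++)
			if (bonds[i] == bonds[lag])
				bonds[i] = NULL;

		bonds[lag] = NULL;
	}

	free(bonds);
	return count;
}
```

The allocation is what forces the return type change from void to int described in the commit message: once the function can fail, its callers have to propagate the error.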
[RFC PATCH net-next 14/16] net: mscc: ocelot: rebalance LAGs on link up/down events
At present there is an issue when ocelot is offloading a bonding interface, but one of the links of the physical ports goes down. Traffic keeps being hashed towards that destination, and of course gets dropped on egress. Monitor the netdev notifier events emitted by the bonding driver for changes in the physical state of lower interfaces, to determine which ports are active and which ones are no longer. Then extend ocelot_get_bond_mask to return either the configured bonding interfaces, or the active ones, depending on a boolean argument. The code that does rebalancing only needs to do so among the active ports, whereas the bridge forwarding mask and the logical port IDs still need to look at the permanently bonded ports. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 43 -- drivers/net/ethernet/mscc/ocelot.h | 2 ++ drivers/net/ethernet/mscc/ocelot_net.c | 26 include/soc/mscc/ocelot.h | 1 + 4 files changed, 63 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index d87e80a15ca5..5c71d121048d 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -881,7 +881,8 @@ int ocelot_get_ts_info(struct ocelot *ocelot, int port, } EXPORT_SYMBOL(ocelot_get_ts_info); -static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond) +static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond, + bool just_active_ports) { u32 bond_mask = 0; int port; @@ -892,8 +893,12 @@ static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond) if (!ocelot_port) continue; - if (ocelot_port->bond == bond) + if (ocelot_port->bond == bond) { + if (just_active_ports && !ocelot_port->lag_tx_active) + continue; + bond_mask |= BIT(port); + } } return bond_mask; @@ -919,7 +924,7 @@ static void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot) struct net_device *bond = ocelot_port->bond; if (bond) - mask &= 
~ocelot_get_bond_mask(ocelot, bond); + mask &= ~ocelot_get_bond_mask(ocelot, bond, false); ocelot_write_rix(ocelot, mask, ANA_PGID_PGID, PGID_SRC + port); @@ -1261,22 +1266,22 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot) bonds[port] = ocelot_port->bond; } - /* Now, set PGIDs for each LAG */ + /* Now, set PGIDs for each active LAG */ for (lag = 0; lag < ocelot->num_phys_ports; lag++) { - int num_ports_in_lag = 0; + int num_active_ports = 0; unsigned long bond_mask; u8 aggr_idx[16]; if (!bonds[lag]) continue; - bond_mask = ocelot_get_bond_mask(ocelot, bonds[lag]); + bond_mask = ocelot_get_bond_mask(ocelot, bonds[lag], true); for_each_set_bit(port, &bond_mask, ocelot->num_phys_ports) { // Destination mask ocelot_write_rix(ocelot, bond_mask, ANA_PGID_PGID, port); - aggr_idx[num_ports_in_lag++] = port; + aggr_idx[num_active_ports++] = port; } for_each_aggr_pgid(ocelot, i) { @@ -1284,7 +1289,11 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot) ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i); ac &= ~bond_mask; - ac |= BIT(aggr_idx[i % num_ports_in_lag]); + /* Don't do division by zero if there was no active +* port. Just make all aggregation codes zero. +*/ + if (num_active_ports) + ac |= BIT(aggr_idx[i % num_active_ports]); ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i); } @@ -1320,7 +1329,8 @@ static void ocelot_setup_logical_port_ids(struct ocelot *ocelot) bond = ocelot_port->bond; if (bond) { - int lag = __ffs(ocelot_get_bond_mask(ocelot, bond)); + int lag = __ffs(ocelot_get_bond_mask(ocelot, bond, +false)); ocelot_rmw_gix(ocelot, ANA_PORT_PORT_CFG_PORTID_VAL(lag), @@ -1357,6 +1367,21 @@ int ocelot_port_lag_leave(struct ocelot *ocelot, int port, } EXPORT_SYMBOL(ocelot_port_lag_lea
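A minimal userspace sketch of the rebalancing loop may help when reviewing the modulo and the zero guard. It assumes the 16 aggregation codes are held in a plain array instead of hardware registers; `rebalance()` and its parameters are hypothetical names, and the single `active_mask` argument mirrors how the patch passes the active-ports mask to both the clear and the set step.

```c
#include <stdint.h>

#define NUM_AGGR_CODES 16

/* Spread the aggregation codes over the ports set in active_mask.
 * ac[i] holds the current port mask for aggregation code i. The bits in
 * active_mask are cleared from every code, then exactly one active port is
 * set per code, round-robin. With no active port nothing is set, and the
 * modulo by zero is never evaluated.
 */
static void rebalance(uint32_t ac[NUM_AGGR_CODES], uint32_t active_mask,
		      int num_ports)
{
	uint8_t aggr_idx[NUM_AGGR_CODES];
	int num_active = 0;
	int port, i;

	for (port = 0; port < num_ports && num_active < NUM_AGGR_CODES; port++)
		if (active_mask & (UINT32_C(1) << port))
			aggr_idx[num_active++] = (uint8_t)port;

	for (i = 0; i < NUM_AGGR_CODES; i++) {
		ac[i] &= ~active_mask;

		if (num_active)
			ac[i] |= UINT32_C(1) << aggr_idx[i % num_active];
	}
}
```

With two active ports, even-numbered codes select the first port and odd-numbered codes the second; when a link goes down and the mask shrinks to one port, all 16 codes select the remaining port.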
[RFC PATCH net-next 13/16] net: mscc: ocelot: rename aggr_count to num_ports_in_lag
It makes it a bit easier to read and understand the code that deals with balancing the 16 aggregation codes among the ports in a certain LAG. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index d4dbba66aa65..d87e80a15ca5 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1263,8 +1263,8 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot) /* Now, set PGIDs for each LAG */ for (lag = 0; lag < ocelot->num_phys_ports; lag++) { + int num_ports_in_lag = 0; unsigned long bond_mask; - int aggr_count = 0; u8 aggr_idx[16]; if (!bonds[lag]) @@ -1276,8 +1276,7 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot) // Destination mask ocelot_write_rix(ocelot, bond_mask, ANA_PGID_PGID, port); - aggr_idx[aggr_count] = port; - aggr_count++; + aggr_idx[num_ports_in_lag++] = port; } for_each_aggr_pgid(ocelot, i) { @@ -1285,7 +1284,7 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot) ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i); ac &= ~bond_mask; - ac |= BIT(aggr_idx[i % aggr_count]); + ac |= BIT(aggr_idx[i % num_ports_in_lag]); ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i); } -- 2.25.1
[PATCH net-next] net: core: devlink: simplify the return expression of devlink_nl_cmd_trap_set_doit()
Simplify the return expression. Signed-off-by: Zheng Yongjun --- net/core/devlink.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/net/core/devlink.c b/net/core/devlink.c index 8c5ddffd707d..3f0a65ee0474 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -6981,7 +6981,6 @@ static int devlink_nl_cmd_trap_set_doit(struct sk_buff *skb, struct netlink_ext_ack *extack = info->extack; struct devlink *devlink = info->user_ptr[0]; struct devlink_trap_item *trap_item; - int err; if (list_empty(&devlink->trap_list)) return -EOPNOTSUPP; @@ -6992,11 +6991,7 @@ static int devlink_nl_cmd_trap_set_doit(struct sk_buff *skb, return -ENOENT; } - err = devlink_trap_action_set(devlink, trap_item, info); - if (err) - return err; - - return 0; + return devlink_trap_action_set(devlink, trap_item, info); } static struct devlink_trap_group_item * -- 2.22.0
[RFC PATCH net-next 11/16] net: mscc: ocelot: set up logical port IDs centrally
The setup of logical port IDs is done in two places: from the inconclusively named ocelot_setup_lag and from ocelot_port_lag_leave, a function that also calls ocelot_setup_lag (which apparently does an incomplete setup of the LAG). To improve this situation, we can rename ocelot_setup_lag into ocelot_setup_logical_port_ids, and drop the "lag" argument. It will now set up the logical port IDs of all switch ports, which may be just slightly more inefficient but more maintainable. Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot.c | 47 ++ 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index ee0fcee8e09a..1a98c24af056 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1279,20 +1279,36 @@ static void ocelot_set_aggr_pgids(struct ocelot *ocelot) } } -static void ocelot_setup_lag(struct ocelot *ocelot, int lag) +/* When offloading a bonding interface, the switch ports configured under the + * same bond must have the same logical port ID, equal to the physical port ID + * of the lowest numbered physical port in that bond. Otherwise, in standalone/ + * bridged mode, each port has a logical port ID equal to its physical port ID. 
+ */ +static void ocelot_setup_logical_port_ids(struct ocelot *ocelot) { - unsigned long bond_mask = ocelot->lags[lag]; - unsigned int p; + int port; - for_each_set_bit(p, &bond_mask, ocelot->num_phys_ports) { - u32 port_cfg = ocelot_read_gix(ocelot, ANA_PORT_PORT_CFG, p); + for (port = 0; port < ocelot->num_phys_ports; port++) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; + struct net_device *bond; + + if (!ocelot_port) + continue; - port_cfg &= ~ANA_PORT_PORT_CFG_PORTID_VAL_M; + bond = ocelot_port->bond; + if (bond) { + int lag = __ffs(ocelot_get_bond_mask(ocelot, bond)); - /* Use lag port as logical port for port i */ - ocelot_write_gix(ocelot, port_cfg | -ANA_PORT_PORT_CFG_PORTID_VAL(lag), -ANA_PORT_PORT_CFG, p); + ocelot_rmw_gix(ocelot, + ANA_PORT_PORT_CFG_PORTID_VAL(lag), + ANA_PORT_PORT_CFG_PORTID_VAL_M, + ANA_PORT_PORT_CFG, port); + } else { + ocelot_rmw_gix(ocelot, + ANA_PORT_PORT_CFG_PORTID_VAL(port), + ANA_PORT_PORT_CFG_PORTID_VAL_M, + ANA_PORT_PORT_CFG, port); + } } } @@ -1320,7 +1336,7 @@ int ocelot_port_lag_join(struct ocelot *ocelot, int port, ocelot->lags[lag] |= BIT(port); } - ocelot_setup_lag(ocelot, lag); + ocelot_setup_logical_port_ids(ocelot); ocelot_apply_bridge_fwd_mask(ocelot); ocelot_set_aggr_pgids(ocelot); @@ -1331,7 +1347,6 @@ EXPORT_SYMBOL(ocelot_port_lag_join); void ocelot_port_lag_leave(struct ocelot *ocelot, int port, struct net_device *bond) { - u32 port_cfg; int i; ocelot->ports[port]->bond = NULL; @@ -1348,15 +1363,9 @@ void ocelot_port_lag_leave(struct ocelot *ocelot, int port, ocelot->lags[n] = ocelot->lags[port]; ocelot->lags[port] = 0; - - ocelot_setup_lag(ocelot, n); } - port_cfg = ocelot_read_gix(ocelot, ANA_PORT_PORT_CFG, port); - port_cfg &= ~ANA_PORT_PORT_CFG_PORTID_VAL_M; - ocelot_write_gix(ocelot, port_cfg | ANA_PORT_PORT_CFG_PORTID_VAL(port), -ANA_PORT_PORT_CFG, port); - + ocelot_setup_logical_port_ids(ocelot); ocelot_apply_bridge_fwd_mask(ocelot); ocelot_set_aggr_pgids(ocelot); } -- 2.25.1
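The rule stated in the new comment (ports in the same bond share the ID of the lowest-numbered member, all other ports use their own physical ID) can be expressed as a small userspace sketch. The names are hypothetical, and bonds are represented here by small integers (0 meaning "no bond") rather than by `struct net_device` pointers.

```c
#define NUM_PORTS 8

/* Compute the logical port ID of every port.
 * bond_of[p] is a non-zero bond identifier, or 0 for a standalone/bridged
 * port. A bonded port gets the physical ID of the lowest-numbered port in
 * the same bond; any other port keeps its own physical ID.
 */
static void logical_port_ids(const int bond_of[NUM_PORTS],
			     int logical[NUM_PORTS])
{
	int port, q;

	for (port = 0; port < NUM_PORTS; port++) {
		logical[port] = port;

		if (!bond_of[port])
			continue;

		/* The first earlier port in the same bond is the lowest member */
		for (q = 0; q < port; q++) {
			if (bond_of[q] == bond_of[port]) {
				logical[port] = q;
				break;
			}
		}
	}
}
```

Recomputing the IDs for all ports on every join or leave, as the patch does, costs a full pass over the ports but removes the need to reason about which subset changed.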
KMSAN: uninit-value in smsc95xx_wait_eeprom (2)
Hello, syzbot found the following issue on: HEAD commit:73d62e81 kmsan: random: prevent boot-time reports in _mix_.. git tree: https://github.com/google/kmsan.git master console output: https://syzkaller.appspot.com/x/log.txt?x=178d246b50 kernel config: https://syzkaller.appspot.com/x/.config?x=eef728deea880383 dashboard link: https://syzkaller.appspot.com/bug?extid=94b1393490c2c70b781b compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project.git ca2dcbd030eadbf0aa9b660efe864ff08af6e18b) Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+94b1393490c2c70b7...@syzkaller.appspotmail.com = BUG: KMSAN: uninit-value in smsc95xx_wait_eeprom+0x223/0x3e0 drivers/net/usb/smsc95xx.c:303 CPU: 1 PID: 28836 Comm: kworker/1:1 Not tainted 5.10.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197 smsc95xx_wait_eeprom+0x223/0x3e0 drivers/net/usb/smsc95xx.c:303 smsc95xx_read_eeprom+0x46d/0xa10 drivers/net/usb/smsc95xx.c:360 smsc95xx_init_mac_address drivers/net/usb/smsc95xx.c:769 [inline] smsc95xx_bind+0x811/0x1d30 drivers/net/usb/smsc95xx.c:1090 usbnet_probe+0x1169/0x3e90 drivers/net/usb/usbnet.c:1712 usb_probe_interface+0xfcc/0x1520 drivers/usb/core/driver.c:396 really_probe+0xebd/0x2420 drivers/base/dd.c:558 driver_probe_device+0x293/0x390 drivers/base/dd.c:738 __device_attach_driver+0x63f/0x830 drivers/base/dd.c:844 bus_for_each_drv+0x2ca/0x3f0 drivers/base/bus.c:431 __device_attach+0x538/0x860 drivers/base/dd.c:912 device_initial_probe+0x4a/0x60 drivers/base/dd.c:959 bus_probe_device+0x177/0x3d0 drivers/base/bus.c:491 device_add+0x399e/0x3f20 drivers/base/core.c:2936 
usb_set_configuration+0x39cf/0x4010 drivers/usb/core/message.c:2159 usb_generic_driver_probe+0x138/0x300 drivers/usb/core/generic.c:238 usb_probe_device+0x317/0x570 drivers/usb/core/driver.c:293 really_probe+0xebd/0x2420 drivers/base/dd.c:558 driver_probe_device+0x293/0x390 drivers/base/dd.c:738 __device_attach_driver+0x63f/0x830 drivers/base/dd.c:844 bus_for_each_drv+0x2ca/0x3f0 drivers/base/bus.c:431 __device_attach+0x538/0x860 drivers/base/dd.c:912 device_initial_probe+0x4a/0x60 drivers/base/dd.c:959 bus_probe_device+0x177/0x3d0 drivers/base/bus.c:491 device_add+0x399e/0x3f20 drivers/base/core.c:2936 usb_new_device+0x1bd6/0x2a30 drivers/usb/core/hub.c:2554 hub_port_connect drivers/usb/core/hub.c:5222 [inline] hub_port_connect_change drivers/usb/core/hub.c:5362 [inline] port_event drivers/usb/core/hub.c:5508 [inline] hub_event+0x5bc9/0x8890 drivers/usb/core/hub.c:5590 process_one_work+0x121c/0x1fc0 kernel/workqueue.c:2272 worker_thread+0x10cc/0x2740 kernel/workqueue.c:2418 kthread+0x51c/0x560 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Local variable buf.i.i@smsc95xx_wait_eeprom created at: __smsc95xx_read_reg drivers/net/usb/smsc95xx.c:77 [inline] smsc95xx_read_reg drivers/net/usb/smsc95xx.c:141 [inline] smsc95xx_wait_eeprom+0x9d/0x3e0 drivers/net/usb/smsc95xx.c:297 __smsc95xx_read_reg drivers/net/usb/smsc95xx.c:77 [inline] smsc95xx_read_reg drivers/net/usb/smsc95xx.c:141 [inline] smsc95xx_wait_eeprom+0x9d/0x3e0 drivers/net/usb/smsc95xx.c:297 = --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
[PATCH net-next] net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
Simplify the return expression. Signed-off-by: Zheng Yongjun --- net/openvswitch/conntrack.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4beb96139d77..96a49aa3a128 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -2025,15 +2025,11 @@ static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info, struct sk_buff *reply) { struct ovs_zone_limit zone_limit; - int err; zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE; zone_limit.limit = info->default_limit; - err = nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit); - if (err) - return err; - return 0; + return nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit); } static int __ovs_ct_limit_get_zone_limit(struct net *net, -- 2.22.0
RE: [PATCH v4 1/6] igb: XDP xmit back fix error code
> From: sven.auha...@voleatech.de > Sent: Wednesday, November 11, 2020 10:35 PM > To: Nguyen, Anthony L ; Fijalkowski, Maciej > ; k...@kernel.org > Cc: da...@davemloft.net; intel-wired-...@lists.osuosl.org; > netdev@vger.kernel.org; nhor...@redhat.com; sassm...@redhat.com; > Penigalapati, Sandeep ; > bro...@redhat.com; pmen...@molgen.mpg.de > Subject: [PATCH v4 1/6] igb: XDP xmit back fix error code > > From: Sven Auhagen > > The igb XDP xmit back function should only return defined error codes. > > Reported-by: Dan Carpenter > Acked-by: Maciej Fijalkowski > Signed-off-by: Sven Auhagen > --- > drivers/net/ethernet/intel/igb/igb_main.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Tested-by: Sandeep Penigalapati
RE: [PATCH v4 4/6] igb: skb add metasize for xdp
> From: sven.auha...@voleatech.de > Sent: Wednesday, November 11, 2020 10:35 PM > To: Nguyen, Anthony L ; Fijalkowski, Maciej > ; k...@kernel.org > Cc: da...@davemloft.net; intel-wired-...@lists.osuosl.org; > netdev@vger.kernel.org; nhor...@redhat.com; sassm...@redhat.com; > Penigalapati, Sandeep ; > bro...@redhat.com; pmen...@molgen.mpg.de > Subject: [PATCH v4 4/6] igb: skb add metasize for xdp > > From: Sven Auhagen > > add metasize if it is set in xdp > > Suggested-by: Maciej Fijalkowski > Reviewed-by: Maciej Fijalkowski > Acked-by: Maciej Fijalkowski > Signed-off-by: Sven Auhagen > --- > drivers/net/ethernet/intel/igb/igb_main.c | 4 > 1 file changed, 4 insertions(+) > Tested-by: Sandeep Penigalapati
Re: [PATCH] net: 8021q: vlan: reduce noise in driver initialization
Hi "Enrico, I love your patch! Yet something to improve: [auto build test ERROR on linux/master] [also build test ERROR on net-next/master net/master linus/master v5.10-rc7 next-20201207] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Enrico-Weigelt-metux-IT-consult/net-8021q-vlan-reduce-noise-in-driver-initialization/20201208-165821 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 09162bc32c880a791c6c0668ce0745cf7958f576 config: i386-randconfig-s001-20201208 (attached as .config) compiler: gcc-9 (Debian 9.3.0-15) 9.3.0 reproduce: # apt-get install sparse # sparse version: v0.6.3-179-ga00755aa-dirty # https://github.com/0day-ci/linux/commit/7c73ca17c3872132d7bd1b9407a26dd5ed916e2c git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Enrico-Weigelt-metux-IT-consult/net-8021q-vlan-reduce-noise-in-driver-initialization/20201208-165821 git checkout 7c73ca17c3872132d7bd1b9407a26dd5ed916e2c # save the attached .config to linux build tree make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): ld: net/8021q/vlan_dev.o: in function `strlcpy': >> include/linux/string.h:346: undefined reference to `vlan_fullname' >> ld: include/linux/string.h:346: undefined reference to `vlan_version' vim +346 include/linux/string.h 6974f0c4555e28 Daniel Micay 2017-07-12 337 6974f0c4555e28 Daniel Micay 2017-07-12 338 /* defined after fortified strlen to reuse it */ 6974f0c4555e28 Daniel Micay 2017-07-12 339 extern size_t __real_strlcpy(char *, const char *, size_t) __RENAME(strlcpy); 6974f0c4555e28 Daniel Micay 2017-07-12 340 __FORTIFY_INLINE size_t strlcpy(char *p, const char *q, size_t size) 
6974f0c4555e28 Daniel Micay 2017-07-12 341 { 6974f0c4555e28 Daniel Micay 2017-07-12 342size_t ret; 6974f0c4555e28 Daniel Micay 2017-07-12 343size_t p_size = __builtin_object_size(p, 0); 6974f0c4555e28 Daniel Micay 2017-07-12 344size_t q_size = __builtin_object_size(q, 0); 6974f0c4555e28 Daniel Micay 2017-07-12 345if (p_size == (size_t)-1 && q_size == (size_t)-1) 6974f0c4555e28 Daniel Micay 2017-07-12 @346return __real_strlcpy(p, q, size); 6974f0c4555e28 Daniel Micay 2017-07-12 347ret = strlen(q); 6974f0c4555e28 Daniel Micay 2017-07-12 348if (size) { 6974f0c4555e28 Daniel Micay 2017-07-12 349size_t len = (ret >= size) ? size - 1 : ret; 6974f0c4555e28 Daniel Micay 2017-07-12 350if (__builtin_constant_p(len) && len >= p_size) 6974f0c4555e28 Daniel Micay 2017-07-12 351 __write_overflow(); 6974f0c4555e28 Daniel Micay 2017-07-12 352if (len >= p_size) 6974f0c4555e28 Daniel Micay 2017-07-12 353 fortify_panic(__func__); 47227d27e2fcb0 Daniel Axtens 2020-06-03 354__underlying_memcpy(p, q, len); 6974f0c4555e28 Daniel Micay 2017-07-12 355p[len] = '\0'; 6974f0c4555e28 Daniel Micay 2017-07-12 356} 6974f0c4555e28 Daniel Micay 2017-07-12 357return ret; 6974f0c4555e28 Daniel Micay 2017-07-12 358 } 6974f0c4555e28 Daniel Micay 2017-07-12 359 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [net 3/3] can: isotp: add SF_BROADCAST support for functional addressing
On 05.12.20 22:09, Jakub Kicinski wrote: On Sat, 5 Dec 2020 21:56:33 +0100 Marc Kleine-Budde wrote: On 12/5/20 9:33 PM, Jakub Kicinski wrote: What about the (incremental?) change that Thomas Wagner posted? https://lore.kernel.org/r/20201204135557.55599-1-th...@web.de That settles it :) This change needs to go into -next and 5.11. Ok. Can you take patch 1, which is a real fix: https://lore.kernel.org/linux-can/20201204133508.742120-2-...@pengutronix.de/ Sure! Applied that one from the ML (I assumed that's what you meant). I just double-checked this mail and in fact the second patch from Marc's pull request was a real fix too: https://lore.kernel.org/linux-can/20201204133508.742120-3-...@pengutronix.de/ Btw. the missing feature which was added for completeness of the ISOTP implementation has now also integrated the improvement suggested by Thomas Wagner: https://lore.kernel.org/linux-can/20201206144731.4609-1-socket...@hartkopp.net/T/#u Would be cool if it could go into the initial iso-tp contribution as 5.10 becomes a long-term kernel. But I don't want to be pushy - treat it as you like. Many thanks, Oliver
[PATCH 1/1] mwifiex: Fix possible buffer overflows in mwifiex_config_scan
From: Zhang Xiaohui mwifiex_config_scan() calls memcpy() without checking the destination size, which may trigger a buffer overflow that a local user could use to cause denial of service or the execution of arbitrary code. Fix it by putting the length check before calling memcpy(). Signed-off-by: Zhang Xiaohui --- drivers/net/wireless/marvell/mwifiex/scan.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c index c2a685f63..b1d90678f 100644 --- a/drivers/net/wireless/marvell/mwifiex/scan.c +++ b/drivers/net/wireless/marvell/mwifiex/scan.c @@ -930,6 +930,8 @@ mwifiex_config_scan(struct mwifiex_private *priv, "DIRECT-", 7)) wildcard_ssid_tlv->max_ssid_length = 0xfe; + if (ssid_len > 1) + ssid_len = 1; memcpy(wildcard_ssid_tlv->ssid, user_scan_in->ssid_list[i].ssid, ssid_len); -- 2.17.1
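The general pattern behind this fix, clamping a caller-supplied length to the destination size before memcpy(), can be sketched in userspace as follows. Everything here is hypothetical: `struct ssid_tlv`, `copy_ssid()` and the 32-byte `SSID_BUF_LEN` are illustration only and are not taken from the mwifiex driver; the bound used in the sketch is the size of the destination array.

```c
#include <stddef.h>
#include <string.h>

#define SSID_BUF_LEN 32 /* assumed destination size, for illustration only */

/* Hypothetical TLV with a fixed-size SSID buffer. */
struct ssid_tlv {
	unsigned char ssid[SSID_BUF_LEN];
};

/* Copy at most sizeof(tlv->ssid) bytes and return the number copied.
 * The length is clamped before memcpy(), so an oversized caller-supplied
 * length cannot write past the end of the destination buffer.
 */
static size_t copy_ssid(struct ssid_tlv *tlv, const unsigned char *src,
			size_t len)
{
	if (len > sizeof(tlv->ssid))
		len = sizeof(tlv->ssid);

	memcpy(tlv->ssid, src, len);
	return len;
}
```

Using sizeof() on the destination member keeps the bound tied to the buffer definition, so the check stays correct if the buffer size ever changes.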
Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
On Mon, Dec 07, 2020 at 12:42:33PM -0800, Jakub Kicinski wrote: > The behavior is not entirely dissimilar to the time stamps on > multi-layered devices (e.g. DSA switches). The time stamp can either > be generated when the packet enters the device (current mlx5 behavior) > or when it actually egresses thru the MAC (what this set adds). To be useful, the time stamps must be taken on the external ports. Generating the time stamp at the DMA reception in the device doesn't even make sense, unless the delay through the device is constant. > My main concern is the user friendliness. I think there is no question > that user running ptp4l would want this mlx5 knob to be enabled. Right. > Would > we rather see a patch to ptp4l that turns per driver knob or should we > shoot for some form of an API that tells the kernel that we're > expecting ns level time accuracy? This is a hardware-specific "feature". One of the guiding principles of the linuxptp user space stack is not to become a catalog of workarounds for random hardware. IMO the kernel's API should not encourage "special" hardware either. After all, we have lots and lots of PTP hardware supported, all of them already working with the existing API just fine. My preference is for a global knob for users of this hardware, either - a compile time Kconfig option on the driver, or - some kind of sysctl/debugfs knob Thanks, Richard
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
I wonder - would it make sense to reserve two arrays of scatterlist structs and a mutex per CPU sufficient to map up to 1MiB of pages with each array while the krb5 service is in use? That way sunrpc could, say, grab the mutex, map the input and output buffers, do the entire crypto op in one go and then release the mutex - at least for big ops, small ops needn't use this service. For rxrpc/afs's use case this would probably be overkill - it's doing crypto on each packet, not on whole operations - but I could still make use of it there. However, that then limits the maximum size of an op to 1MiB, plus dangly bits on either side (which can be managed with chained scatterlist structs) and also limits the number of large simultaneous krb5 crypto ops we can do. David
Re: [PATCH v5 bpf-next 02/14] xdp: initialize xdp_buff mb bit to 0 in all XDP drivers
On Tue, 8 Dec 2020 11:31:03 +0100 Lorenzo Bianconi wrote: > > On Mon, 2020-12-07 at 22:37 +0100, Maciej Fijalkowski wrote: > > > On Mon, Dec 07, 2020 at 01:15:00PM -0800, Alexander Duyck wrote: > > > > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi > > > > wrote: > > > > > Initialize multi-buffer bit (mb) to 0 in all XDP-capable drivers. > > > > > This is a preliminary patch to enable xdp multi-buffer support. > > > > > > > > > > Signed-off-by: Lorenzo Bianconi > > > > > > > > I'm really not a fan of this design. Having to update every driver in > > > > order to initialize a field that was fragmented is a pain. At a > > > > minimum it seems like it might be time to consider introducing some > > > > sort of initializer function for this so that you can update things in > > > > one central place the next time you have to add a new field instead of > > > > having to update every individual driver that supports XDP. Otherwise > > > > this isn't going to scale going forward. +1 > > > Also, a good example of why this might be bothering for us is a fact that > > > in the meantime the dpaa driver got XDP support and this patch hasn't been > > > updated to include mb setting in that driver. > > > > > something like > > init_xdp_buff(hard_start, headroom, len, frame_sz, rxq); > > > > would work for most of the drivers. > > > > ack, agree. I will add init_xdp_buff() in v6. I do like the idea of an initialize helper function. Remember this is fast-path code and likely needs to be inlined. Furthermore, remember that drivers can and do optimize the number of writes they do to xdp_buff. There are a number of fields in xdp_buff that only need to be initialized once per NAPI. E.g. rxq and frame_sz (some drivers do change frame_sz per packet). Thus, you likely need two inlined helpers for init. 
Again, remember that the C compiler will generate an expensive operation (rep stos) for clearing a struct if it is initialized like this, where not all members are initialized (do NOT do this): struct xdp_buff xdp = { .rxq = rxq, .frame_sz = PAGE_SIZE, }; -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
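The "two inlined helpers" suggestion could look roughly like the userspace sketch below. It is not the kernel API: `struct xdp_buff_stub`, `struct xdp_rxq_info_stub` and both helper names are hypothetical, and the field list is a simplified assumption about what a driver fills in. The point is only the split between per-NAPI and per-packet writes, with explicit member assignments instead of a partial designated initializer.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct xdp_rxq_info_stub {
	int queue_index;
};

struct xdp_buff_stub {
	void *data;
	void *data_end;
	void *data_meta;
	void *data_hard_start;
	struct xdp_rxq_info_stub *rxq;
	uint32_t frame_sz;
	uint32_t mb;
};

/* Fields that normally stay constant for a whole NAPI poll: write once. */
static inline void xdp_stub_init_buff(struct xdp_buff_stub *xdp,
				      uint32_t frame_sz,
				      struct xdp_rxq_info_stub *rxq)
{
	xdp->frame_sz = frame_sz;
	xdp->rxq = rxq;
}

/* Fields that change with every received frame: write per packet.
 * Only the listed members are stored, so the struct is never cleared
 * as a whole.
 */
static inline void xdp_stub_prepare_buff(struct xdp_buff_stub *xdp,
					 unsigned char *hard_start,
					 int headroom, int len)
{
	unsigned char *data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + len;
	xdp->data_meta = data;
	xdp->mb = 0; /* single-buffer frame in this sketch */
}
```

A driver following this split would call the first helper before its receive loop and the second one inside it; a driver that changes frame_sz per packet would simply call the first helper inside the loop as well.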
Re: [PATCH net-next 2/4] net: mvpp2: add mvpp2_phylink_to_port() helper
On Tue, Dec 08, 2020 at 01:03:38PM +0100, Marcin Wojtas wrote: Hi Greg, Apologies for the delayed response. On Mon, 2 Nov 2020 at 19:02, Greg Kroah-Hartman wrote: On Mon, Nov 02, 2020 at 06:38:54PM +0100, Marcin Wojtas wrote: > Hi Greg and Sasha, > > On Fri, 9 Oct 2020 at 05:43, Marcin Wojtas wrote: > > > > Hi, > > > > On Sat, 20 Jun 2020 at 11:21, Russell King wrote: > > > > > > Add a helper to convert the struct phylink_config pointer passed in > > > from phylink to the driver's internal struct mvpp2_port. > > > > > > Signed-off-by: Russell King > > > --- > > > .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 29 +-- > > > 1 file changed, 14 insertions(+), 15 deletions(-) > > > > > > diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > index 7653277d03b7..313f5a60a605 100644 > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > @@ -4767,12 +4767,16 @@ static void mvpp2_port_copy_mac_addr(struct net_device *dev, struct mvpp2 *priv, > > > eth_hw_addr_random(dev); > > > } > > > > > > +static struct mvpp2_port *mvpp2_phylink_to_port(struct phylink_config *config) > > > +{ > > > + return container_of(config, struct mvpp2_port, phylink_config); > > > +} > > > + > > > static void mvpp2_phylink_validate(struct phylink_config *config, > > >unsigned long *supported, > > >struct phylink_link_state *state) > > > { > > > - struct mvpp2_port *port = container_of(config, struct mvpp2_port, > > > - phylink_config); > > > + struct mvpp2_port *port = mvpp2_phylink_to_port(config); > > > __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, }; > > > > > > /* Invalid combinations */ > > > @@ -4913,8 +4917,7 @@ static void mvpp2_gmac_pcs_get_state(struct mvpp2_port *port, > > > static void mvpp2_phylink_mac_pcs_get_state(struct phylink_config *config, > > > struct phylink_link_state *state) > > > { > > > - struct mvpp2_port *port = container_of(config, struct 
mvpp2_port,
> > > -						 phylink_config);
> > > +	struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > >
> > >  	if (port->priv->hw_version == MVPP22 && port->gop_id == 0) {
> > >  		u32 mode = readl(port->base + MVPP22_XLG_CTRL3_REG);
> > > @@ -4931,8 +4934,7 @@ static void mvpp2_phylink_mac_pcs_get_state(struct phylink_config *config,
> > >
> > >  static void mvpp2_mac_an_restart(struct phylink_config *config)
> > >  {
> > > -	struct mvpp2_port *port = container_of(config, struct mvpp2_port,
> > > -					       phylink_config);
> > > +	struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > >  	u32 val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> > >
> > >  	writel(val | MVPP2_GMAC_IN_BAND_RESTART_AN,
> > > @@ -5105,13 +5107,12 @@ static void mvpp2_gmac_config(struct mvpp2_port *port, unsigned int mode,
> > >  static void mvpp2_mac_config(struct phylink_config *config, unsigned int mode,
> > >  			     const struct phylink_link_state *state)
> > >  {
> > > -	struct net_device *dev = to_net_dev(config->dev);
> > > -	struct mvpp2_port *port = netdev_priv(dev);
> > > +	struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > >  	bool change_interface = port->phy_interface != state->interface;
> > >
> > >  	/* Check for invalid configuration */
> > >  	if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) {
> > > -		netdev_err(dev, "Invalid mode on %s\n", dev->name);
> > > +		netdev_err(port->dev, "Invalid mode on %s\n", port->dev->name);
> > >  		return;
> > >  	}
> > >
> > > @@ -5151,8 +5152,7 @@ static void mvpp2_mac_link_up(struct phylink_config *config,
> > >  			      int speed, int duplex,
> > >  			      bool tx_pause, bool rx_pause)
> > >  {
> > > -	struct net_device *dev = to_net_dev(config->dev);
> > > -	struct mvpp2_port *port = netdev_priv(dev);
> > > +	struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > >  	u32 val;
> > >
> > >  	if (mvpp2_is_xlg(interface)) {
> > > @@ -5199,14 +5199,13 @@ static void mvpp2_mac_link_up(struct phylink_config *config,
> > >
> > >  	mvpp2_egress_enable(port);
> > >  	mvpp2_ingress_enable(port);
> > > -	netif_tx_wake_all_queues(dev);
> > > +	netif_tx_wake_all_queues(port->dev);
> > >  }
> > >
> > >  static void mvpp2_mac_link_down(struct phylink_config *config,
> > >  				unsigned int mode, phy_interface_t interface)
> > >  {
> > > -	struct net_device *dev = to_net_dev(config->
[PATCH net-next] drivers: net: ionic: simplify the return expression of ionic_set_rxfh()
Simplify the return expression.

Signed-off-by: Zheng Yongjun
---
 drivers/net/ethernet/pensando/ionic/ionic_ethtool.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
index 35c72d4a78b3..0832bedcb3b4 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
@@ -738,16 +738,11 @@ static int ionic_set_rxfh(struct net_device *netdev, const u32 *indir,
 			  const u8 *key, const u8 hfunc)
 {
 	struct ionic_lif *lif = netdev_priv(netdev);
-	int err;

 	if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
 		return -EOPNOTSUPP;

-	err = ionic_lif_rss_config(lif, lif->rss_types, key, indir);
-	if (err)
-		return err;
-
-	return 0;
+	return ionic_lif_rss_config(lif, lif->rss_types, key, indir);
 }

 static int ionic_set_tunable(struct net_device *dev,
-- 
2.22.0
[PATCH net-next] drivers: net: qlcnic: simplify the return expression of qlcnic_sriov_vf_shutdown()
Simplify the return expression.

Signed-off-by: Zheng Yongjun
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
index 30e52f969759..dd03be3fc82a 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
@@ -2112,7 +2112,6 @@ static int qlcnic_sriov_vf_shutdown(struct pci_dev *pdev)
 {
 	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
 	struct net_device *netdev = adapter->netdev;
-	int retval;

 	netif_device_detach(netdev);
 	qlcnic_cancel_idc_work(adapter);
@@ -2125,11 +2124,7 @@ static int qlcnic_sriov_vf_shutdown(struct pci_dev *pdev)
 		qlcnic_83xx_disable_mbx_intr(adapter);
 	cancel_delayed_work_sync(&adapter->idc_aen_work);

-	retval = pci_save_state(pdev);
-	if (retval)
-		return retval;
-
-	return 0;
+	return pci_save_state(pdev);
 }

 static int qlcnic_sriov_vf_resume(struct qlcnic_adapter *adapter)
-- 
2.22.0
[PATCH net-next] net/mlx4: simplify the return expression of mlx4_init_cq_table()
Simplify the return expression.

Signed-off-by: Zheng Yongjun
---
 drivers/net/ethernet/mellanox/mlx4/cq.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 3b8576b9c2f9..68bd18ee6ee3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -462,19 +462,14 @@ EXPORT_SYMBOL_GPL(mlx4_cq_free);
 int mlx4_init_cq_table(struct mlx4_dev *dev)
 {
 	struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
-	int err;

 	spin_lock_init(&cq_table->lock);
 	INIT_RADIX_TREE(&cq_table->tree, GFP_ATOMIC);
 	if (mlx4_is_slave(dev))
 		return 0;

-	err = mlx4_bitmap_init(&cq_table->bitmap, dev->caps.num_cqs,
-			       dev->caps.num_cqs - 1, dev->caps.reserved_cqs, 0);
-	if (err)
-		return err;
-
-	return 0;
+	return mlx4_bitmap_init(&cq_table->bitmap, dev->caps.num_cqs,
+			       dev->caps.num_cqs - 1, dev->caps.reserved_cqs, 0);
 }

 void mlx4_cleanup_cq_table(struct mlx4_dev *dev)
-- 
2.22.0
Re: [PATCH net-next] net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
On 8 Dec 2020, at 13:13, Zheng Yongjun wrote:

> Simplify the return expression.
>
> Signed-off-by: Zheng Yongjun

Change looks good to me.

Reviewed-by: Eelco Chaudron
[PATCH net-next] net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()
Simplify the return expression.

Signed-off-by: Zheng Yongjun
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index c9c2962ad49f..786d2fc4b403 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1893,13 +1893,8 @@ void esw_offloads_unload_rep(struct mlx5_eswitch *esw, u16 vport_num)
 static int mlx5_esw_offloads_pair(struct mlx5_eswitch *esw,
 				  struct mlx5_eswitch *peer_esw)
 {
-	int err;

-	err = esw_add_fdb_peer_miss_rules(esw, peer_esw->dev);
-	if (err)
-		return err;
-
-	return 0;
+	return esw_add_fdb_peer_miss_rules(esw, peer_esw->dev);
 }

 static void mlx5_esw_offloads_unpair(struct mlx5_eswitch *esw)
-- 
2.22.0
[PATCH net-next] net: atheros: simplify the return expression of atl2_phy_setup_autoneg_adv()
Simplify the return expression.

Signed-off-by: Zheng Yongjun
---
 drivers/net/ethernet/atheros/atlx/atl2.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atlx/atl2.c b/drivers/net/ethernet/atheros/atlx/atl2.c
index 7b80d924632a..f016f2e12ee7 100644
--- a/drivers/net/ethernet/atheros/atlx/atl2.c
+++ b/drivers/net/ethernet/atheros/atlx/atl2.c
@@ -2549,7 +2549,6 @@ static s32 atl2_write_phy_reg(struct atl2_hw *hw, u32 reg_addr, u16 phy_data)
  */
 static s32 atl2_phy_setup_autoneg_adv(struct atl2_hw *hw)
 {
-	s32 ret_val;
 	s16 mii_autoneg_adv_reg;

 	/* Read the MII Auto-Neg Advertisement Register (Address 4). */
@@ -2605,12 +2604,7 @@ static s32 atl2_phy_setup_autoneg_adv(struct atl2_hw *hw)

 	hw->mii_autoneg_adv_reg = mii_autoneg_adv_reg;

-	ret_val = atl2_write_phy_reg(hw, MII_ADVERTISE, mii_autoneg_adv_reg);
-
-	if (ret_val)
-		return ret_val;
-
-	return 0;
+	return atl2_write_phy_reg(hw, MII_ADVERTISE, mii_autoneg_adv_reg);
 }

 /*
-- 
2.22.0
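The "simplify the return expression" patches above all apply the same transformation. As a minimal userspace sketch of why it preserves behaviour, the following compares the two forms side by side. The names helper(), before(), after() and check_forms_agree() are hypothetical stand-ins for illustration, not kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a kernel helper such as pci_save_state():
 * returns 0 on success or a negative errno on failure. */
static int helper(int fail)
{
	return fail ? -EIO : 0;
}

/* Before: store the result, test it, then return 0 explicitly. */
static int before(int fail)
{
	int err;

	err = helper(fail);
	if (err)
		return err;

	return 0;
}

/* After: return the helper's result directly. */
static int after(int fail)
{
	return helper(fail);
}

/* Both forms must agree on the success path and on the failure path. */
static void check_forms_agree(void)
{
	assert(before(0) == after(0));
	assert(before(1) == after(1));
}
```

The two forms return the same value for every result of the helper, provided the caller's return type matches the helper's: a non-zero result is passed through unchanged, and a zero result is returned as 0 either way.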
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
David Howells wrote:

> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?

Actually, simply reserving a set per CPU is probably unnecessary. We could,
say, set a minimum and a maximum on the reservations (say 2 -> 2*nr_cpus) and
then allocate new ones when we run out. Then let the memory shrinker clean
them up off an lru list.

David
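The reservation scheme described above (a minimum, a maximum, growth on demand, and a shrinker that trims idle sets) could be sketched roughly as below. This is a userspace illustration of the bookkeeping only, under the assumption that a "set" is an opaque unit. struct sg_pool and the sg_pool_* functions are hypothetical names, and a real implementation would need locking and an actual LRU list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a pool of reserved scatterlist sets.
 * Only counts are tracked here, not the arrays themselves. */
struct sg_pool {
	size_t min;    /* sets kept reserved even when idle */
	size_t max;    /* hard cap, e.g. 2 * nr_cpus */
	size_t total;  /* sets currently allocated */
	size_t idle;   /* allocated sets not in use */
};

static void sg_pool_init(struct sg_pool *p, size_t min, size_t max)
{
	assert(min <= max);
	p->min = min;
	p->max = max;
	p->total = min;
	p->idle = min;
}

/* Take a set: reuse an idle one, otherwise grow the pool up to max.
 * Returns 0 on success, -1 if the pool is exhausted (a caller would
 * then have to wait or fall back to another path). */
static int sg_pool_get(struct sg_pool *p)
{
	if (p->idle > 0) {
		p->idle--;
		return 0;
	}
	if (p->total < p->max) {
		p->total++;
		return 0;
	}
	return -1;
}

/* Return a set to the pool; it becomes idle and eligible for shrinking. */
static void sg_pool_put(struct sg_pool *p)
{
	assert(p->idle < p->total);
	p->idle++;
}

/* Shrinker pass: free idle sets above the minimum; returns how many. */
static size_t sg_pool_shrink(struct sg_pool *p)
{
	size_t freed = 0;

	while (p->idle > 0 && p->total > p->min) {
		p->idle--;
		p->total--;
		freed++;
	}
	return freed;
}
```

With a minimum of 2 and a maximum of 4, four concurrent users succeed, a fifth is refused, and once all four sets are returned a shrinker pass brings the pool back down to 2.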
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
On Tue, 8 Dec 2020 at 14:25, David Howells wrote:
>
> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?
>
> That way sunrpc could, say, grab the mutex, map the input and output buffers,
> do the entire crypto op in one go and then release the mutex - at least for
> big ops, small ops needn't use this service.
>
> For rxrpc/afs's use case this would probably be overkill - it's doing crypto
> on each packet, not on whole operations - but I could still make use of it
> there.
>
> However, that then limits the maximum size of an op to 1MiB, plus dangly bits
> on either side (which can be managed with chained scatterlist structs) and
> also limits the number of large simultaneous krb5 crypto ops we can do.
>

Apparently, it is permitted for gss_krb5_cts_crypt() to do a
kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
being invoked, and so I don't see why it wouldn't be possible to
simply kmalloc() a scatterlist[] of the appropriate size, populate it
with all the pages, bufs and whatever else gets passed into the
skcipher, and pass it into the skcipher in one go.
[PATCH v2 net-next 0/2] nfc: s3fwrn5: Change I2C interrupt trigger to EDGE_RISING
From: Bongsu Jeon

To make Samsung's NFC I2C interrupt handling stable, this series changes the
interrupt trigger from IRQ_TYPE_LEVEL_HIGH to IRQ_TYPE_EDGE_RISING and removes
the hard-coded interrupt trigger type from the i2c module so that the trigger
type can be controlled flexibly.

1/2 changes the dt binding to the edge rising trigger.
2/2 removes the hard-coded interrupt trigger type from the i2c module.

ChangeLog:
 v2:
  2/2
   - remove the hard coded interrupt trigger type.

Bongsu Jeon (2):
  dt-bindings: net: nfc: s3fwrn5: Change I2C interrupt trigger to
    EDGE_RISING
  nfc: s3fwrn5: Remove hard coded interrupt trigger type from the i2c
    module

 .../devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml | 2 +-
 drivers/nfc/s3fwrn5/i2c.c                            | 8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.17.1
[PATCH v2 net-next 1/2] dt-bindings: net: nfc: s3fwrn5: Change I2C interrupt trigger type
From: Bongsu Jeon

Change the interrupt trigger from IRQ_TYPE_LEVEL_HIGH to IRQ_TYPE_EDGE_RISING
for stable NFC I2C interrupt handling.

Samsung's NFC firmware sends an i2c frame as follows:

1. The NFC firmware sets the GPIO (interrupt pin) high when there is an i2c
   frame to send.
2. Once the CPU's I2C master has received the i2c frame, the NFC firmware
   sets the GPIO low.

If step 2 is delayed because the NFC firmware is busy with other
high-priority tasks, the NFC driver's i2c interrupt handler can be called
again while the GPIO is still high. The driver then tries to receive an i2c
frame, but the NFC chip has no frame to send, which causes an I2C
communication problem. This case should rarely happen, but the trigger is
changed as a defensive measure. If the driver uses TRIGGER_RISING instead of
the LEVEL trigger, there is no problem even if the NFC firmware task is
delayed.

Signed-off-by: Bongsu Jeon
---
 Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml b/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
index ca3904bf90e0..477066e2b821 100644
--- a/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
+++ b/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
@@ -76,7 +76,7 @@ examples:
             reg = <0x27>;

             interrupt-parent = <&gpa1>;
-            interrupts = <3 IRQ_TYPE_LEVEL_HIGH>;
+            interrupts = <3 IRQ_TYPE_EDGE_RISING>;

             en-gpios = <&gpf1 4 GPIO_ACTIVE_HIGH>;
             wake-gpios = <&gpj0 2 GPIO_ACTIVE_HIGH>;
-- 
2.17.1
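The difference between the two trigger types in the delayed-deassert case can be illustrated with a small userspace simulation. This is a deliberately simplified model, not driver code: it assumes the GPIO is sampled once per tick, that one handler run fits in one tick, and that a level-high trigger re-invokes the handler on every tick where the line is still high. The names count_handler_calls() and enum trigger are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum trigger { TRIGGER_LEVEL_HIGH, TRIGGER_EDGE_RISING };

/* Count handler invocations for a sampled interrupt line.
 * line[i] is the GPIO level at tick i (0 or 1); the line is assumed to
 * be low before the first sample.
 *  - level-high: the handler runs on every tick the line is high
 *  - edge-rising: the handler runs only on a 0 -> 1 transition */
static size_t count_handler_calls(const int *line, size_t n, enum trigger t)
{
	size_t calls = 0;
	int prev = 0;

	for (size_t i = 0; i < n; i++) {
		assert(line[i] == 0 || line[i] == 1);
		if (t == TRIGGER_LEVEL_HIGH ? line[i] : (line[i] && !prev))
			calls++;
		prev = line[i];
	}
	return calls;
}
```

For a single frame where the firmware lowers the GPIO two ticks late, e.g. the samples {0, 1, 1, 1, 0}, this model runs the handler three times with the level trigger (one real frame plus two runs with nothing to read) but only once with the edge trigger.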
[PATCH v2 net-next 2/2] nfc: s3fwrn5: Remove hard coded interrupt trigger type from the i2c module
From: Bongsu Jeon

For the flexible control of interrupt trigger type, remove the hard coded
interrupt trigger type in the i2c module. The trigger type will be loaded
from a dts.

Signed-off-by: Bongsu Jeon
---
 drivers/nfc/s3fwrn5/i2c.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c
index e1bdde105f24..42f1f610ac2c 100644
--- a/drivers/nfc/s3fwrn5/i2c.c
+++ b/drivers/nfc/s3fwrn5/i2c.c
@@ -179,6 +179,8 @@ static int s3fwrn5_i2c_probe(struct i2c_client *client,
 				  const struct i2c_device_id *id)
 {
 	struct s3fwrn5_i2c_phy *phy;
+	struct irq_data *irq_data;
+	unsigned long irqflags;
 	int ret;

 	phy = devm_kzalloc(&client->dev, sizeof(*phy), GFP_KERNEL);
@@ -212,8 +214,11 @@ static int s3fwrn5_i2c_probe(struct i2c_client *client,
 	if (ret < 0)
 		return ret;

+	irq_data = irq_get_irq_data(client->irq);
+	irqflags = irqd_get_trigger_type(irq_data) | IRQF_ONESHOT;
+
 	ret = devm_request_threaded_irq(&client->dev, phy->i2c_dev->irq, NULL,
-		s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
+		s3fwrn5_i2c_irq_thread_fn, irqflags,
 		S3FWRN5_I2C_DRIVER_NAME, phy);
 	if (ret)
 		s3fwrn5_remove(phy->common.ndev);
-- 
2.17.1
Re: Why the auxiliary cipher in gss_krb5_crypto.c?
Ard Biesheuvel wrote:

> Apparently, it is permitted for gss_krb5_cts_crypt() to do a
> kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
> being invoked, and so I don't see why it wouldn't be possible to
> simply kmalloc() a scatterlist[] of the appropriate size, populate it
> with all the pages, bufs and whatever else gets passed into the
> skcipher, and pass it into the skcipher in one go.

I never said it wasn't possible. But doing a pair of order-1 allocations from
there might have a significant detrimental effect on performance - in which
case Trond and co. will say "no".

Remember: to crypt 1MiB of data on a 64-bit machine requires 2 x minimum 8KiB
scatterlist arrays. That's assuming the pages in the middle are contiguous,
which might not be the case for a direct I/O read/write. So for the DIO case,
it could involve an order-2 allocation (or chaining of single pages).

David
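The sizes quoted above can be checked with a short calculation. The sketch below assumes 4 KiB pages and a 32-byte scatterlist entry, which is what the 8KiB figure for 1MiB implies (8192 / 256) and is not taken from a kernel header. The function names are hypothetical. The unaligned case is only one illustration of how an array can outgrow 8 KiB, not necessarily the scenario meant in the mail.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages touched by a buffer of len bytes that starts at byte
 * offset off within its first page. */
static size_t pages_spanned(size_t off, size_t len, size_t page_size)
{
	assert(page_size > 0 && off < page_size);
	if (len == 0)
		return 0;
	return (off + len + page_size - 1) / page_size;
}

/* Smallest order such that (page_size << order) >= bytes. */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < bytes)
		order++;
	return order;
}

/* Allocation order needed for an array with one entry per page spanned. */
static unsigned int sg_array_order(size_t off, size_t len,
				   size_t page_size, size_t entry_size)
{
	size_t bytes = pages_spanned(off, len, page_size) * entry_size;

	return alloc_order(bytes, page_size);
}
```

Under these assumptions a page-aligned 1 MiB buffer spans 256 pages, so each array needs 256 * 32 = 8192 bytes, an order-1 allocation. A 1 MiB buffer that starts part-way into a page spans 257 pages, needs 8224 bytes, and no longer fits in order 1.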
Re: [PATCH v2] net: dsa: ksz8795: adjust CPU link to host interface
> > Hi Jean
> >
> > I never said it was too specific to your board. There are other boards
> > using different switches like this. This is where the commit message
> > is so important. Without understanding Why? it is hard to point you in
> > the right direction.
> >
> > So your setup is:
> >
> > SoC - MAC - PHY - PHY - MAC - Switch.
> >
> > The SoC MAC driver is looking after the first PHY?
>
> No, the connection is at the MAC level, via RGMII but it is missing the
> MDC/MDIO signals. That means we have to fix the auto-neg parameters from
> the DT.

So the PHY is there, but you cannot talk to it? It has strapping
resistors to make it auto-neg to the other PHY?

Some switches default their CPU port to the maximum speed the port can
do. But not all do. It is worth checking that.

> On the 4.14 LTS kernel we are working with, the setup of the parameters is
> done via adjust_link. Since the phylink conversion adjust_link is not
> required anymore, is this correct?

4.14 is dead in terms of development. Anything you contribute needs to
be for net-next, and then you need to figure out how to backport
it. Using v5.4 will help with that, since it is much closer, and v5.10
will be LTS. Given the change to phylink, you probably want as new a
kernel as possible.

If you put a fixed-link property in the "CPU" node, phylink should do
the right thing for you.

Andrew