date:20201208

Re: [PATCH 065/141] airo: Fix fall-through warnings for Clang

2020-12-08 Thread Kalle Valo

"Gustavo A. R. Silva"  wrote:

> In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
> by explicitly adding a break statement instead of letting the code fall
> through to the next case.
> 
> Link: https://github.com/KSPP/linux/issues/115
> Signed-off-by: Gustavo A. R. Silva 

4 patches applied to wireless-drivers-next.git, thanks.

48264b23fade airo: Fix fall-through warnings for Clang
f48d7dccb3e4 rt2x00: Fix fall-through warnings for Clang
0662fbebf4fb rtw88: Fix fall-through warnings for Clang
18572b0b5493 zd1201: Fix fall-through warnings for Clang

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/b3c0f74f5b6e6bff9f1609b310319b6fdd9ee205.1605896059.git.gustavo...@kernel.org/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Re: [PATCH v1 bpf-next 03/11] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.

2020-12-08 Thread Martin KaFai Lau

On Tue, Dec 08, 2020 at 03:27:14PM +0900, Kuniyuki Iwashima wrote:
> From:   Martin KaFai Lau 
> Date:   Mon, 7 Dec 2020 12:14:38 -0800
> > On Sun, Dec 06, 2020 at 01:03:07AM +0900, Kuniyuki Iwashima wrote:
> > > From:   Martin KaFai Lau 
> > > Date:   Fri, 4 Dec 2020 17:42:41 -0800
> > > > On Tue, Dec 01, 2020 at 11:44:10PM +0900, Kuniyuki Iwashima wrote:
> > > > [ ... ]
> > > > > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
> > > > > index fd133516ac0e..60d7c1f28809 100644
> > > > > --- a/net/core/sock_reuseport.c
> > > > > +++ b/net/core/sock_reuseport.c
> > > > > @@ -216,9 +216,11 @@ int reuseport_add_sock(struct sock *sk, struct 
> > > > > sock *sk2, bool bind_inany)
> > > > >  }
> > > > >  EXPORT_SYMBOL(reuseport_add_sock);
> > > > >  
> > > > > -void reuseport_detach_sock(struct sock *sk)
> > > > > +struct sock *reuseport_detach_sock(struct sock *sk)
> > > > >  {
> > > > >   struct sock_reuseport *reuse;
> > > > > + struct bpf_prog *prog;
> > > > > + struct sock *nsk = NULL;
> > > > >   int i;
> > > > >  
> > > > >   spin_lock_bh(&reuseport_lock);
> > > > > @@ -242,8 +244,12 @@ void reuseport_detach_sock(struct sock *sk)
> > > > >  
> > > > >   reuse->num_socks--;
> > > > >   reuse->socks[i] = reuse->socks[reuse->num_socks];
> > > > > + prog = rcu_dereference(reuse->prog);
> > > > Is it under rcu_read_lock() here?
> > > 
> > > reuseport_lock is locked in this function, and we do not modify the prog,
> > > but is rcu_dereference_protected() preferable?
> > > 
> > > ---8<---
> > > prog = rcu_dereference_protected(reuse->prog,
> > >lockdep_is_held(&reuseport_lock));
> > > ---8<---
> > It is not only reuse->prog.  Other things also require rcu_read_lock(),
> > e.g. please take a look at __htab_map_lookup_elem().
> > 
> > The TCP_LISTEN sk (selected by bpf to be the target of the migration)
> > is also protected by rcu.
> 
> Thank you, I will use rcu_read_lock() and rcu_dereference() in v3 patchset.
> 
> 
> > I am surprised there is no WARNING in the test.
> > Do you have the needed DEBUG_LOCK* config enabled?
> 
> Yes, DEBUG_LOCK* was 'y', but rcu_dereference() without rcu_read_lock()
> does not show warnings...
I would at least expect the "WARN_ON_ONCE(!rcu_read_lock_held() ...)"
from __htab_map_lookup_elem() to fire in your test
example in the last patch.

It is better to check the config before sending v3.

[ ... ]

> > > > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > > > index 1451aa9712b0..b27241ea96bd 100644
> > > > > --- a/net/ipv4/inet_connection_sock.c
> > > > > +++ b/net/ipv4/inet_connection_sock.c
> > > > > @@ -992,6 +992,36 @@ struct sock *inet_csk_reqsk_queue_add(struct 
> > > > > sock *sk,
> > > > >  }
> > > > >  EXPORT_SYMBOL(inet_csk_reqsk_queue_add);
> > > > >  
> > > > > +void inet_csk_reqsk_queue_migrate(struct sock *sk, struct sock *nsk)
> > > > > +{
> > > > > + struct request_sock_queue *old_accept_queue, *new_accept_queue;
> > > > > +
> > > > > + old_accept_queue = &inet_csk(sk)->icsk_accept_queue;
> > > > > + new_accept_queue = &inet_csk(nsk)->icsk_accept_queue;
> > > > > +
> > > > > + spin_lock(&old_accept_queue->rskq_lock);
> > > > > + spin_lock(&new_accept_queue->rskq_lock);
> > > > I am also not very thrilled on this double spin_lock.
> > > > Can this be done in (or like) inet_csk_listen_stop() instead?
> > > 
> > > It will be possible to migrate sockets in inet_csk_listen_stop(), but I
> > > think it is better to do it just after reuseport_detach_sock() because we
> > > can select a different listener (almost) every time at a lower cost by
> > > selecting the moved socket and passing it to inet_csk_reqsk_queue_migrate()
> > > easily.
> > I don't see the "lower cost" point.  Please elaborate.
> 
> In reuseport_select_sock(), we pass sk_hash of the request socket to
> reciprocal_scale() and generate a random index for socks[] to select
> a different listener every time.
> On the other hand, we do not have request sockets in unhash path and
> sk_hash of the listener is always 0, so we have to generate a random number
> in another way. In reuseport_detach_sock(), we can use the index of the
> moved socket, but we do not have it in inet_csk_listen_stop(), so we have
> to generate a random number in inet_csk_listen_stop().
> I think it is at lower cost to use the index of the moved socket.
Generating a random number is not a big deal for the migration code path.
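
For illustration, the index selection described in the quoted paragraph can
be sketched in user-space C. reciprocal_scale_sketch() uses the same
arithmetic as the kernel's reciprocal_scale() helper; pick_listener_index()
is a hypothetical wrapper for this sketch, not a kernel function.

```c
#include <stdint.h>

/* Same arithmetic as the kernel's reciprocal_scale(): maps a 32-bit value
 * into the range [0, ep_ro) with a multiply and a shift, no division.
 */
static uint32_t reciprocal_scale_sketch(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Hypothetical helper: derive an index into socks[] from a socket hash. */
static uint32_t pick_listener_index(uint32_t sk_hash, uint32_t num_socks)
{
	return reciprocal_scale_sketch(sk_hash, num_socks);
}
```

With a hash of 0, as for a listener in the unhash path, the result is always
index 0, which is why another source of randomness (or the index of the moved
socket) is needed there.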

Also, I still fail to see any particular way in which the kernel
pick will help in the migration case.  The kernel has no clue
on how to select the right process to migrate to without
a proper policy signal from the user.  They are all as bad as
a random pick.  I am not sure this migration feature is
even useful if there is no bpf prog attached to define the policy.
That said, if it is still desired to do a random pick by kernel when
there is no

Re: [PATCH 00/20] ethernet: ucc_geth: assorted fixes and simplifications

2020-12-08 Thread Rasmus Villemoes

On 08/12/2020 04.07, Qiang Zhao wrote:
> On 06/12/2020 05:12, Rasmus Villemoes  wrote:
> 

>> I think patch 2 is a bug fix as well, but I'd like someone from NXP to 
>> comment.
> 
> It's ok for me.

I was hoping for something a bit more than that. Can you please go check
with the people who made the hardware and those who wrote the manual
(probably not the same ones) what is actually up and down, and then
report on what they said.

It's fairly obvious that allocating 192 bytes instead of 128 should
never hurt (unless we run out of muram), but it would be nice to have an
official "Yes, table 8-111 is wrong, it should say 192", or
alternatively, "No, table 8-53 is wrong, those MTU etc. fields don't
really exist". Extra points for providing details such as "first
revision of the IP had $foo, but that was never shipped in real
products, then $bar was changed", etc.

Thanks,
Rasmus

Re: [PATCH v1 bpf-next 03/11] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.

2020-12-08 Thread Kuniyuki Iwashima

From:   Martin KaFai Lau 
Date:   Mon, 7 Dec 2020 23:34:41 -0800
> On Tue, Dec 08, 2020 at 03:31:34PM +0900, Kuniyuki Iwashima wrote:
> > From:   Martin KaFai Lau 
> > Date:   Mon, 7 Dec 2020 12:33:15 -0800
> > > On Thu, Dec 03, 2020 at 11:14:24PM +0900, Kuniyuki Iwashima wrote:
> > > > From:   Eric Dumazet 
> > > > Date:   Tue, 1 Dec 2020 16:25:51 +0100
> > > > > On 12/1/20 3:44 PM, Kuniyuki Iwashima wrote:
> > > > > > > This patch lets reuseport_detach_sock() return a pointer of struct sock,
> > > > > > > which is used only by inet_unhash(). If it is not NULL,
> > > > > > > inet_csk_reqsk_queue_migrate() migrates TCP_ESTABLISHED/TCP_SYN_RECV
> > > > > > > sockets from the closing listener to the selected one.
> > > > > > > 
> > > > > > > Listening sockets hold incoming connections as a linked list of struct
> > > > > > > request_sock in the accept queue, and each request has reference to a full
> > > > > > > socket and its listener. In inet_csk_reqsk_queue_migrate(), we only unlink
> > > > > > > the requests from the closing listener's queue and relink them to the head
> > > > > > > of the new listener's queue. We do not process each request and its
> > > > > > > reference to the listener, so the migration completes in O(1) time
> > > > > > > complexity. However, in the case of TCP_SYN_RECV sockets, we take special
> > > > > > > care in the next commit.
> > > > > > > 
> > > > > > > By default, the kernel selects a new listener randomly. In order to pick
> > > > > > > out a different socket every time, we select the last element of socks[] as
> > > > > > > the new listener. This behaviour is based on how the kernel moves sockets
> > > > > > > in socks[]. (See also [1])
> > > > > > > 
> > > > > > > Basically, in order to redistribute sockets evenly, we have to use an eBPF
> > > > > > > program called in the later commit, but as the side effect of such default
> > > > > > > selection, the kernel can redistribute old requests evenly to new listeners
> > > > > > > for a specific case where the application replaces listeners by
> > > > > > > generations.
> > > > > > > 
> > > > > > > For example, we call listen() for four sockets (A, B, C, D), and close the
> > > > > > > first two by turns. The sockets move in socks[] like below.
> > > > > > 
> > > > > > >   socks[0] : A <-.      socks[0] : D          socks[0] : D
> > > > > > >   socks[1] : B   |  =>  socks[1] : B <-.  =>  socks[1] : C
> > > > > > >   socks[2] : C   |      socks[2] : C --'
> > > > > > >   socks[3] : D --'
> > > > > > 
> > > > > > > Then, if C and D have newer settings than A and B, and each socket has a
> > > > > > > request (a, b, c, d) in their accept queue, we can redistribute old
> > > > > > > requests evenly to new listeners.
> > > > > > 
> > > > > > >   socks[0] : A (a) <-.      socks[0] : D (a + d)      socks[0] : D (a + d)
> > > > > > >   socks[1] : B (b)   |  =>  socks[1] : B (b) <-.  =>  socks[1] : C (b + c)
> > > > > > >   socks[2] : C (c)   |      socks[2] : C (c) --'
> > > > > > >   socks[3] : D (d) --'
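
The movement shown in the diagrams can be reproduced with a small user-space
sketch. It mirrors only the "move the last element into the freed slot" step
of the quoted reuseport_detach_sock() hunk; struct reuse_sketch and
detach_sketch() are hypothetical names, listeners are represented by single
characters, and the fallback used when the closed listener was already the
last element is an assumption of this sketch.

```c
#include <stddef.h>

#define MAX_SOCKS 8

struct reuse_sketch {
	char socks[MAX_SOCKS];	/* listener labels, e.g. 'A'..'D' */
	size_t num_socks;
};

/* Remove sk from socks[] by moving the last element into its slot.
 * Returns the label of the listener that would receive the old requests
 * under the default selection, or 0 if sk was not found or no other
 * listener remains.
 */
static char detach_sketch(struct reuse_sketch *r, char sk)
{
	for (size_t i = 0; i < r->num_socks; i++) {
		if (r->socks[i] != sk)
			continue;

		r->num_socks--;
		r->socks[i] = r->socks[r->num_socks];

		if (r->num_socks == 0)
			return 0;

		/* Assumption: if sk was already last, use the new last element. */
		return i == r->num_socks ? r->socks[i - 1] : r->socks[i];
	}

	return 0;
}
```

Closing A and then B on a group of A, B, C, D selects D and then C, which
matches the (a + d) and (b + c) distribution in the second diagram.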
> > > > > > 
> > > > > > > Here, (A, D) or (B, C) can have different application settings, but they
> > > > > > > MUST have the same settings at the socket API level; otherwise, unexpected
> > > > > > > error may happen. For instance, if only the new listeners have
> > > > > > > TCP_SAVE_SYN, old requests do not have SYN data, so the application will
> > > > > > > face inconsistency and cause an error.
> > > > > > > 
> > > > > > > Therefore, if there are different kinds of sockets, we must attach an eBPF
> > > > > > > program described in later commits.
> > > > > > 
> > > > > > > Link: https://lore.kernel.org/netdev/CAEfhGiyG8Y_amDZ2C8dQoQqjZJMHjTY76b=KBkTKcBtA=dh...@mail.gmail.com/
> > > > > > Reviewed-by: Benjamin Herrenschmidt 
> > > > > > Signed-off-by: Kuniyuki Iwashima 
> > > > > > ---
> > > > > >  include/net/inet_connection_sock.h |  1 +
> > > > > >  include/net/sock_reuseport.h   |  2 +-
> > > > > >  net/core/sock_reuseport.c  | 10 +-
> > > > > >  net/ipv4/inet_connection_sock.c| 30 
> > > > > > ++
> > > > > >  net/ipv4/inet_hashtables.c |  9 +++--
> > > > > >  5 files changed, 48 insertions(+), 4 deletions(-)
> > > > > > 
> > > > > > > diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> > > > > > index 7338b3865a2a..2ea2d743f8fc 100644
> > > > > > --- a/include/net/inet_connection_sock.h
> > > > > > +++ b/include/net/inet_connection_sock.h
> > > > > > @@ -260,6 +260,7 @@ struct dst_entry 
> > > > > > *inet_csk_route_child_sock(const struct sock *sk,
> > > > > >  struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
> > > > > >   struct request_sock *req,
> > > > > >   struct

[PATCHv3 bpf-next] samples/bpf: add xdp program on egress for xdp_redirect_map

2020-12-08 Thread Hangbin Liu

This patch adds an XDP program on egress to show that we can modify
the packet on egress. In this sample we set the packet's source
MAC to the egress interface's MAC address. The egress program is attached
when the -X option is supplied.

Signed-off-by: Hangbin Liu 
---
v3:
a) modify the src mac address based on egress mac

v2:
a) use pkt counter instead of IP ttl modification on egress program
b) make the egress program selectable by option -X
---
 samples/bpf/xdp_redirect_map_kern.c |  60 ++-
 samples/bpf/xdp_redirect_map_user.c | 153 
 2 files changed, 168 insertions(+), 45 deletions(-)

diff --git a/samples/bpf/xdp_redirect_map_kern.c b/samples/bpf/xdp_redirect_map_kern.c
index 6489352ab7a4..6b2164722649 100644
--- a/samples/bpf/xdp_redirect_map_kern.c
+++ b/samples/bpf/xdp_redirect_map_kern.c
@@ -19,12 +19,22 @@
 #include 
 #include 
 
+/* The 2nd xdp prog on egress does not support skb mode, so we define two
+ * maps, tx_port_general and tx_port_native.
+ */
 struct {
__uint(type, BPF_MAP_TYPE_DEVMAP);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
__uint(max_entries, 100);
-} tx_port SEC(".maps");
+} tx_port_general SEC(".maps");
+
+struct {
+   __uint(type, BPF_MAP_TYPE_DEVMAP);
+   __uint(key_size, sizeof(int));
+   __uint(value_size, sizeof(struct bpf_devmap_val));
+   __uint(max_entries, 100);
+} tx_port_native SEC(".maps");
 
 /* Count RX packets, as XDP bpf_prog doesn't get direct TX-success
  * feedback.  Redirect TX errors can be caught via a tracepoint.
@@ -36,6 +46,14 @@ struct {
__uint(max_entries, 1);
 } rxcnt SEC(".maps");
 
+/* map to store egress interface mac address */
+struct {
+   __uint(type, BPF_MAP_TYPE_ARRAY);
+   __type(key, u32);
+   __type(value, __be64);
+   __uint(max_entries, 1);
+} tx_mac SEC(".maps");
+
 static void swap_src_dst_mac(void *data)
 {
unsigned short *p = data;
@@ -52,17 +70,16 @@ static void swap_src_dst_mac(void *data)
p[5] = dst[2];
 }
 
-SEC("xdp_redirect_map")
-int xdp_redirect_map_prog(struct xdp_md *ctx)
+static int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map)
 {
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
struct ethhdr *eth = data;
int rc = XDP_DROP;
-   int vport, port = 0, m = 0;
long *value;
u32 key = 0;
u64 nh_off;
+   int vport;
 
nh_off = sizeof(*eth);
if (data + nh_off > data_end)
@@ -79,7 +96,40 @@ int xdp_redirect_map_prog(struct xdp_md *ctx)
swap_src_dst_mac(data);
 
/* send packet out physical port */
-   return bpf_redirect_map(&tx_port, vport, 0);
+   return bpf_redirect_map(redirect_map, vport, 0);
+}
+
+SEC("xdp_redirect_general")
+int xdp_redirect_map_general(struct xdp_md *ctx)
+{
+   return xdp_redirect_map(ctx, &tx_port_general);
+}
+
+SEC("xdp_redirect_native")
+int xdp_redirect_map_native(struct xdp_md *ctx)
+{
+   return xdp_redirect_map(ctx, &tx_port_native);
+}
+
+SEC("xdp_devmap/map_prog")
+int xdp_redirect_map_egress(struct xdp_md *ctx)
+{
+   void *data_end = (void *)(long)ctx->data_end;
+   void *data = (void *)(long)ctx->data;
+   struct ethhdr *eth = data;
+   __be64 *mac;
+   u32 key = 0;
+   u64 nh_off;
+
+   nh_off = sizeof(*eth);
+   if (data + nh_off > data_end)
+   return XDP_DROP;
+
+   mac = bpf_map_lookup_elem(&tx_mac, &key);
+   if (mac)
+   __builtin_memcpy(eth->h_source, mac, ETH_ALEN);
+
+   return XDP_PASS;
 }
 
 /* Redirect require an XDP bpf_prog loaded on the TX device */
diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c
index 31131b6e7782..19636045c8dc 100644
--- a/samples/bpf/xdp_redirect_map_user.c
+++ b/samples/bpf/xdp_redirect_map_user.c
@@ -14,6 +14,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include "bpf_util.h"
 #include 
@@ -21,7 +25,8 @@
 
 static int ifindex_in;
 static int ifindex_out;
-static bool ifindex_out_xdp_dummy_attached = true;
+static bool ifindex_out_xdp_dummy_attached = false;
+static bool xdp_devmap_attached = false;
 static __u32 prog_id;
 static __u32 dummy_prog_id;
 
@@ -83,6 +88,29 @@ static void poll_stats(int interval, int ifindex)
}
 }
 
+static int get_mac_addr(unsigned int ifindex_out, void *mac_addr)
+{
+   struct ifreq ifr;
+   char ifname[IF_NAMESIZE];
+   int fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+   if (fd < 0)
+   return -1;
+
+   if (!if_indextoname(ifindex_out, ifname))
+   return -1;
+
+   strcpy(ifr.ifr_name, ifname);
+
+   if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
+   return -1;
+
+   memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char));
+   close(fd);
+
+   return 0;
+}
+
 static void usage(const char *prog)
 {
fprintf(stderr,
@@ -

Re: [PATCH] net: rmnet: Adjust virtual device MTU on real device capability

2020-12-08 Thread subashab


What about just returning an error on NETDEV_PRECHANGEMTU notification
to prevent real device MTU change while virtual rmnet devices are
linked? Not sure there is a more proper and thread safe way to manage
that otherwise.


Can't you copy what vlan devices do?  That'd seem like a reasonable and
well tested precedent, no?


Could you try this patch? I've tried addressing most of the conditions here.
I haven't seen any issues with updating the MTU when rmnet devices are linked.


diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index fcdecdd..8d51b0c 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -26,7 +26,7 @@ static int rmnet_is_real_dev_registered(const struct net_device *real_dev)
 }

 /* Needs rtnl lock */
-static struct rmnet_port*
+struct rmnet_port*
 rmnet_get_port_rtnl(const struct net_device *real_dev)
 {
return rtnl_dereference(real_dev->rx_handler_data);
@@ -253,7 +253,10 @@ static int rmnet_config_notify_cb(struct notifier_block *nb,
netdev_dbg(real_dev, "Kernel unregister\n");
rmnet_force_unassociate_device(real_dev);
break;
-
+   case NETDEV_CHANGEMTU:
+   if (rmnet_vnd_validate_real_dev_mtu(real_dev))
+   return NOTIFY_BAD;
+   break;
default:
break;
}
@@ -329,9 +332,17 @@ static int rmnet_changelink(struct net_device *dev, struct nlattr *tb[],

if (data[IFLA_RMNET_FLAGS]) {
struct ifla_rmnet_flags *flags;
+   u32 old_data_format;

+   old_data_format = port->data_format;
flags = nla_data(data[IFLA_RMNET_FLAGS]);
port->data_format = flags->flags & flags->mask;
+
+   if (rmnet_vnd_update_dev_mtu(port, real_dev)) {
+   port->data_format = old_data_format;
+   NL_SET_ERR_MSG_MOD(extack, "Invalid MTU on real dev");
+   return -EINVAL;
+   }
}

return 0;
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index be51598..8d8d469 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -73,4 +73,6 @@ int rmnet_add_bridge(struct net_device *rmnet_dev,
 struct netlink_ext_ack *extack);
 int rmnet_del_bridge(struct net_device *rmnet_dev,
 struct net_device *slave_dev);
+struct rmnet_port*
+rmnet_get_port_rtnl(const struct net_device *real_dev);
 #endif /* _RMNET_CONFIG_H_ */
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index d58b51d..df87883 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -58,9 +58,30 @@ static netdev_tx_t rmnet_vnd_start_xmit(struct sk_buff *skb,
return NETDEV_TX_OK;
 }

+static int rmnet_vnd_headroom(struct net_device *real_dev)
+{
+   struct rmnet_port *port;
+   u32 headroom;
+
+   port = rmnet_get_port_rtnl(real_dev);
+
+   headroom = sizeof(struct rmnet_map_header);
+
+   if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV4)
+   headroom += sizeof(struct rmnet_map_dl_csum_trailer);
+
+   return headroom;
+}
+
 static int rmnet_vnd_change_mtu(struct net_device *rmnet_dev, int new_mtu)
 {
-   if (new_mtu < 0 || new_mtu > RMNET_MAX_PACKET_SIZE)
+   struct rmnet_priv *priv = netdev_priv(rmnet_dev);
+   u32 headroom;
+
+   headroom = rmnet_vnd_headroom(priv->real_dev);
+
+   if (new_mtu < 0 || new_mtu > RMNET_MAX_PACKET_SIZE ||
+   new_mtu > (priv->real_dev->mtu - headroom))
return -EINVAL;

rmnet_dev->mtu = new_mtu;
@@ -229,6 +250,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,

 {
struct rmnet_priv *priv = netdev_priv(rmnet_dev);
+   u32 headroom;
int rc;

if (rmnet_get_endpoint(port, id)) {
@@ -242,6 +264,13 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,

priv->real_dev = real_dev;

+   headroom = rmnet_vnd_headroom(real_dev);
+
+   if (rmnet_vnd_change_mtu(rmnet_dev, real_dev->mtu - headroom)) {
+   NL_SET_ERR_MSG_MOD(extack, "Invalid MTU on real dev");
+   return -EINVAL;
+   }
+
rc = register_netdevice(rmnet_dev);
if (!rc) {
ep->egress_dev = rmnet_dev;
@@ -283,3 +312,51 @@ int rmnet_vnd_do_flow_control(struct net_device *rmnet_dev, int enable)

return 0;
 }
+
+int rmnet_vnd_validate_real_dev_mtu(struct net_device *real_dev)
+{
+   struct hlist_node *tmp_ep;
+   struct rmnet_endpoint *ep;
+   struct rmnet_port *port;
+   unsigned long bkt_ep;
+

Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread Ard Biesheuvel

On Mon, 7 Dec 2020 at 15:15, David Howells  wrote:
>
> Ard Biesheuvel  wrote:
>
> > > I wonder if it would help if the input buffer and output buffer didn't
> > > have to correspond exactly in usage - ie. the output buffer could be used
> > > at a slower rate than the input to allow for buffering inside the crypto
> > > algorithm.
> > >
> >
> > I don't follow - how could one be used at a slower rate?
>
> I mean that the crypto algorithm might need to buffer the last part of the
> input until it has a block's worth before it can write to the output.
>

This is what is typically handled transparently by the driver. When
you populate a scatterlist, it doesn't matter how misaligned the
individual elements are, the scatterlist walker will always present
the data in chunks that the crypto algorithm can manage. This is why
using a single scatterlist for the entire input is preferable in
general.

> > > The hashes corresponding to the kerberos enctypes I'm supporting are:
> > >
> > > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
> > >
> > > HMAC-SHA256 for aes128-cts-hmac-sha256-128
> > >
> > > HMAC-SHA384 for aes256-cts-hmac-sha384-192
> > >
> > > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
> > >
> > > I'm not sure you can support all of those with the instructions available.
> >
> > It depends on whether the caller can make use of the authenc()
> > pattern, which is a type of AEAD we support.
>
> Interesting.  I didn't realise AEAD was an API.
>
> > There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)),
> > including h/w accelerated ones, but none that implement ciphertext
> > stealing. So that means that, even if you manage to use the AEAD layer to
> > perform both at the same time, the generic authenc() template will perform
> > the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
> > respectively, which won't give you any benefit until accelerated
> > implementations turn up that perform the whole operation in one pass over
> > the input. And even then, I don't think the performance benefit will be
> > worth it.
>
> Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext,
> not the plaintext.
>
> For the moment, it's probably not worth worrying about, then.  If I can manage
> to abstract the sunrpc bits out into a krb5 library, we can improve the
> library later.
>

Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set

2020-12-08 Thread Jesper Dangaard Brouer

On Mon, 7 Dec 2020 18:01:00 -0700
David Ahern  wrote:

> On 12/7/20 1:52 PM, John Fastabend wrote:
> >>
> >> I think we need to keep XDP_TX action separate, because I think that
> >> there are use-cases where we want to disable XDP_TX due to end-user
> >> policy or hardware limitations.  
> > 
> > How about we discover this at load time though. 

Nitpick: at XDP "attach" time. The general disconnect between BPF and
XDP is that BPF can verify at "load" time (as the kernel knows what it
supports), while XDP can have different support/features per driver and
cannot do this until attachment time. (See the later issue with tail calls.)
(All other BPF hooks don't have this issue.)

> > Meaning if the program
> > doesn't use XDP_TX then the hardware can skip resource allocations for
> > it. I think we could have verifier or extra pass discover the use of
> > XDP_TX and then pass a bit down to driver to enable/disable TX caps.
> >   
> 
> This was discussed in the context of virtio_net some months back - it is
> hard to impossible to know a program will not return XDP_TX (e.g., value
> comes from a map).

It is hard, and sometimes not possible.  For maps, the workaround is
that the BPF programmer adds a bounds check on values from the map. If that
is not done, the verifier has to assume all possible return codes are
used by the BPF-prog.
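
A minimal sketch of such a check, written as plain user-space C so it can be
compiled on its own. The SKETCH_XDP_* values mirror the numeric values of the
kernel's enum xdp_action; bounded_verdict() is a hypothetical name. Whether
the verifier would derive and export the resulting range to the driver is the
open question in this thread, not existing behaviour.

```c
#include <stdint.h>

/* Numeric values mirror the kernel's enum xdp_action. */
enum sketch_xdp_action {
	SKETCH_XDP_ABORTED = 0,
	SKETCH_XDP_DROP,
	SKETCH_XDP_PASS,
	SKETCH_XDP_TX,
	SKETCH_XDP_REDIRECT,
};

/* Clamp a verdict read from a map so that the function can only return
 * ABORTED, DROP or PASS. Anything above PASS (TX, REDIRECT or garbage)
 * is turned into DROP.
 */
static int bounded_verdict(uint32_t map_value)
{
	if (map_value > SKETCH_XDP_PASS)
		return SKETCH_XDP_DROP;

	return (int)map_value;
}
```

With this check in place, the set of possible return codes no longer depends
on the map contents, so XDP_TX can be ruled out by inspecting the program
alone.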

The real nemesis is program tail calls, which can be added dynamically
after the XDP program is attached.  It is at attachment time that
changing the NIC resources is possible.  So, for program tail calls the
verifier has to assume all possible return codes are used by the BPF-prog.

BPF now has function calls and function replacement, right(?)  How does
this affect this detection of possible return codes?

> Flipping that around, what if the program attach indicates whether
> XDP_TX could be returned. If so, driver manages the resource needs. If
> not, no resource needed and if the program violates that and returns
> XDP_TX the packet is dropped.

I do like this idea, as IMHO we do need something that is connected
with the BPF-prog, that describes what resources the program requests
(either like above via detecting this in the verifier, or simply manually
configuring this in the BPF-prog ELF file).

The main idea is that we all (I assume) want to provide a better
end-user interface/experience, by giving direct feedback to the end-user
that "loading+attaching" this XDP BPF-prog will not work, as e.g. the driver
doesn't support a specific return code.  Thus, we need to reject
"loading+attaching" if features cannot be satisfied.

We need a solution, as today it is causing frustration for end-users
that packets can be (silently) dropped by XDP, e.g. if the driver doesn't
support XDP_REDIRECT.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH net] net: ll_temac: Fix potential NULL dereference in temac_probe()

2020-12-08 Thread Esben Haabendal

Zhang Changzhong  writes:

> platform_get_resource() may fail and in this case a NULL dereference
> will occur.
>
> Fix it to use devm_platform_ioremap_resource() instead of calling
> platform_get_resource() and devm_ioremap().
>
> This is detected by Coccinelle semantic patch.
>
> @@
> expression pdev, res, n, t, e, e1, e2;
> @@
>
> res = \(platform_get_resource\|platform_get_resource_byname\)(pdev, t, n);
> + if (!res)
> +   return -EINVAL;
> ... when != res == NULL
> e = devm_ioremap(e1, res->start, e2);
>
> Fixes: 8425c41d1ef7 ("net: ll_temac: Extend support to non-device-tree platforms")
> Signed-off-by: Zhang Changzhong 
> ---
>  drivers/net/ethernet/xilinx/ll_temac_main.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c
> index 60c199f..0301853 100644
> --- a/drivers/net/ethernet/xilinx/ll_temac_main.c
> +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c
> @@ -1351,7 +1351,6 @@ static int temac_probe(struct platform_device *pdev)
>   struct device_node *temac_np = dev_of_node(&pdev->dev), *dma_np;
>   struct temac_local *lp;
>   struct net_device *ndev;
> - struct resource *res;
>   const void *addr;
>   __be32 *p;
>   bool little_endian;
> @@ -1500,13 +1499,11 @@ static int temac_probe(struct platform_device *pdev)
>   of_node_put(dma_np);
>   } else if (pdata) {
>   /* 2nd memory resource specifies DMA registers */
> - res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> - lp->sdma_regs = devm_ioremap(&pdev->dev, res->start,
> -  resource_size(res));
> - if (!lp->sdma_regs) {
> + lp->sdma_regs = devm_platform_ioremap_resource(pdev, 1);
> + if (IS_ERR(lp->sdma_regs)) {
>   dev_err(&pdev->dev,
>   "could not map DMA registers\n");
> - return -ENOMEM;
> + return PTR_ERR(lp->sdma_regs);
>   }
>   if (pdata->dma_little_endian) {
>   lp->dma_in = temac_dma_in32_le;

Acked-by: Esben Haabendal

[PATCH] net: 8021q: vlan: reduce noise in driver initialization

2020-12-08 Thread Enrico Weigelt, metux IT consult

If drivers work properly, they should be silent. Thus remove the
unnecessary noise on initialization.

Signed-off-by: Enrico Weigelt, metux IT consult 
---
 net/8021q/vlan.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index f292e0267bb9..9f4b1b9a37e4 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -42,9 +42,6 @@
 
 unsigned int vlan_net_id __read_mostly;
 
-const char vlan_fullname[] = "802.1Q VLAN Support";
-const char vlan_version[] = DRV_VERSION;
-
 /* End of global variables definitions. */
 
 static int vlan_group_prealloc_vid(struct vlan_group *vg,
@@ -687,8 +684,6 @@ static int __init vlan_proto_init(void)
 {
int err;
 
-   pr_info("%s v%s\n", vlan_fullname, vlan_version);
-
err = register_pernet_subsys(&vlan_net_ops);
if (err < 0)
goto err0;
-- 
2.11.0

Re: [PATCH 1/7] net: 8021q: remove unneeded MODULE_VERSION() usage

2020-12-08 Thread Enrico Weigelt, metux IT consult

On 05.12.20 16:53, Greg KH wrote:
>> How do we feel about deleting this not really informative message
>> altogether in a future patch?
> 
> It too should be removed.  If drivers are working properly, they are
> quiet.

Just sent a separate patch for removing this message. I'll rebase my
patch queue once this patch has gone through.


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your
GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287

[PATCH net-next] net: dsa: mv88e6xxx: don't set non-existing learn2all bit for 6220/6250

2020-12-08 Thread Rasmus Villemoes

The 6220 and 6250 switches do not have a learn2all bit in global1, ATU
control register; bit 3 is reserved.

On the switches that do have that bit, it is used to control whether
learning frames are sent out the ports that have the message_port bit
set. So rather than adding yet another chip method, use the existence
of the ->port_setup_message_port method as a proxy for determining
whether the learn2all bit exists (and should be set).

Signed-off-by: Rasmus Villemoes 
---

This doesn't fix anything from what I can tell, in particular not the
VLAN problems I'm having, so just tagging for net-next. But I do think
it's worth it on the general principle of not poking around in
undocumented/reserved bits.

 drivers/net/dsa/mv88e6xxx/chip.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 25449f634889..0245f3dfc1cd 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1347,9 +1347,11 @@ static int mv88e6xxx_atu_setup(struct mv88e6xxx_chip *chip)
if (err)
return err;
 
-   err = mv88e6xxx_g1_atu_set_learn2all(chip, true);
-   if (err)
-   return err;
+   if (chip->info->ops->port_setup_message_port) {
+   err = mv88e6xxx_g1_atu_set_learn2all(chip, true);
+   if (err)
+   return err;
+   }
 
return mv88e6xxx_g1_atu_set_age_time(chip, 30);
 }
-- 
2.23.0

Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set

2020-12-08 Thread Jesper Dangaard Brouer

On Mon, 07 Dec 2020 12:52:22 -0800
John Fastabend  wrote:

> > Use-case(1): Cloud-provider want to give customers (running VMs) ability
> > to load XDP program for DDoS protection (only), but don't want to allow
> > customer to use XDP_TX (that can implement LB or cheat their VM
> > isolation policy).  
> 
> Not following. What interface do they want to allow loading on? If its
> the VM interface then I don't see how it matters. From outside the
> VM there should be no way to discover if its done in VM or in tc or
> some other stack.
> 
> If its doing some onloading/offloading I would assume they need to
> ensure the isolation, etc. is still maintained because you can't
> let one VMs program work on other VMs packets safely.
> 
> So what did I miss, above doesn't make sense to me.

The Cloud-provider wants to load customer-provided BPF code on the
physical Host-OS NIC (that supports XDP).  The customer can get access
to a web-interface where they can write or upload their BPF-prog.

As multiple customers can upload BPF-progs, the Cloud-provider has to
write a BPF-prog dispatcher that runs these multiple programs.  This
could be done via BPF tail-calls, or via Toke's libxdp[1], or via
devmap XDP-progs per egress port.

The Cloud-provider doesn't fully trust the customers' BPF-progs.  They
have already pre-filtered traffic to the given VM, so they can allow
customers the freedom to see traffic and do XDP_PASS and XDP_DROP.  They
want to administratively (via ethtool) disable the XDP_REDIRECT and
XDP_TX driver features, as these can be used to violate the VM
isolation policy between customers.

Is the use-case more clear now?

[1] https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
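
The policy described above (customers may only pass or drop packets) can
be modelled in plain user-space C. Note that this is not the mechanism
proposed in the thread, which is a driver-level switch set via ethtool;
it is only a sketch of the same policy enforced in a dispatcher. The
verdict values follow enum xdp_action from the kernel UAPI; everything
else (run_customer_prog(), prog_pass_all(), prog_try_tx()) is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror enum xdp_action in the kernel UAPI. */
enum verdict { V_ABORTED = 0, V_DROP, V_PASS, V_TX, V_REDIRECT };

typedef enum verdict (*customer_prog_t)(const unsigned char *pkt, size_t len);

/* Bitmask of the verdicts the provider accepts from customer programs. */
#define ALLOWED_VERDICTS ((1u << V_PASS) | (1u << V_DROP))

/* Hypothetical customer program that passes everything. */
enum verdict prog_pass_all(const unsigned char *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return V_PASS;
}

/* Hypothetical customer program that tries to bounce packets back out. */
enum verdict prog_try_tx(const unsigned char *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return V_TX;
}

/* Run one customer program and clamp its verdict to the allowed set. */
enum verdict run_customer_prog(customer_prog_t prog,
			       const unsigned char *pkt, size_t len)
{
	unsigned int v;

	assert(prog);
	v = (unsigned int)prog(pkt, len);

	/* Anything outside the allowed set is treated as a drop. */
	if (v > V_REDIRECT || !(ALLOWED_VERDICTS & (1u << v)))
		return V_DROP;
	return (enum verdict)v;
}
```

Dropping on a disallowed verdict is a design choice of this sketch; a
real deployment might prefer to reject such a program at load time.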

Re: [PATCH v2] xfrm: interface: Don't hide plain packets from netfilter

2020-12-08 Thread Nicolas Dichtel

On 07/12/2020 at 14:43, Phil Sutter wrote:
> With an IPsec tunnel without dedicated interface, netfilter sees locally
> generated packets twice as they exit the physical interface: Once as "the
> inner packet" with IPsec context attached and once as the encrypted
> (ESP) packet.
> 
> With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
> hook anymore, making it impossible to match on both inner header values
> and associated IPsec data from that hook.
> 
> Fix this by looping packets transmitted from xfrm_interface through
> NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
> behaviour consistent again from netfilter's point of view.
> 
> Fixes: f203b76d78092 ("xfrm: Add virtual xfrm interfaces")
> Signed-off-by: Phil Sutter 
> ---
> Changes since v1:
> - Extend recipients list, no code changes.
> ---
>  net/xfrm/xfrm_interface.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
> index aa4cdcf69d471..24af61c95b4d4 100644
> --- a/net/xfrm/xfrm_interface.c
> +++ b/net/xfrm/xfrm_interface.c
> @@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
>   skb_dst_set(skb, dst);
>   skb->dev = tdev;
>  
> - err = dst_output(xi->net, skb->sk, skb);
> + err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
skb->protocol must be correctly set, maybe better to use it instead of
skb_dst(skb)->ops->family?

> +   skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
And here, tdev instead of skb_dst(skb)->dev ?

>   if (net_xmit_eval(err) == 0) {
>   struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
>  
>

Re: [PATCH v1 bpf-next 03/11] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.

2020-12-08 Thread Kuniyuki Iwashima

From:   Martin KaFai Lau 
Date:   Tue, 8 Dec 2020 00:13:28 -0800
> On Tue, Dec 08, 2020 at 03:27:14PM +0900, Kuniyuki Iwashima wrote:
> > From:   Martin KaFai Lau 
> > Date:   Mon, 7 Dec 2020 12:14:38 -0800
> > > On Sun, Dec 06, 2020 at 01:03:07AM +0900, Kuniyuki Iwashima wrote:
> > > > From:   Martin KaFai Lau 
> > > > Date:   Fri, 4 Dec 2020 17:42:41 -0800
> > > > > On Tue, Dec 01, 2020 at 11:44:10PM +0900, Kuniyuki Iwashima wrote:
> > > > > [ ... ]
> > > > > > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
> > > > > > index fd133516ac0e..60d7c1f28809 100644
> > > > > > --- a/net/core/sock_reuseport.c
> > > > > > +++ b/net/core/sock_reuseport.c
> > > > > > @@ -216,9 +216,11 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(reuseport_add_sock);
> > > > > >  
> > > > > > -void reuseport_detach_sock(struct sock *sk)
> > > > > > +struct sock *reuseport_detach_sock(struct sock *sk)
> > > > > >  {
> > > > > > struct sock_reuseport *reuse;
> > > > > > +   struct bpf_prog *prog;
> > > > > > +   struct sock *nsk = NULL;
> > > > > > int i;
> > > > > >  
> > > > > > spin_lock_bh(&reuseport_lock);
> > > > > > @@ -242,8 +244,12 @@ void reuseport_detach_sock(struct sock *sk)
> > > > > >  
> > > > > > reuse->num_socks--;
> > > > > > reuse->socks[i] = reuse->socks[reuse->num_socks];
> > > > > > +   prog = rcu_dereference(reuse->prog);
> > > > > Is it under rcu_read_lock() here?
> > > > 
> > > > reuseport_lock is locked in this function, and we do not modify the prog,
> > > > but is rcu_dereference_protected() preferable?
> > > > 
> > > > ---8<---
> > > > prog = rcu_dereference_protected(reuse->prog,
> > > >  lockdep_is_held(&reuseport_lock));
> > > > ---8<---
> > > It is not only reuse->prog.  Other things also require rcu_read_lock(),
> > > e.g. please take a look at __htab_map_lookup_elem().
> > > 
> > > The TCP_LISTEN sk (selected by bpf to be the target of the migration)
> > > is also protected by rcu.
> > 
> > Thank you, I will use rcu_read_lock() and rcu_dereference() in v3 patchset.
> > 
> > 
> > > I am surprised there is no WARNING in the test.
> > > Do you have the needed DEBUG_LOCK* config enabled?
> > 
> > Yes, DEBUG_LOCK* was 'y', but rcu_dereference() without rcu_read_lock()
> > does not show warnings...
> I would at least expect the "WARN_ON_ONCE(!rcu_read_lock_held() ...)"
> from __htab_map_lookup_elem() should fire in your test
> example in the last patch.
> 
> It is better to check the config before sending v3.

It seems ok, but I will check it again.

---8<---
[ec2-user@ip-10-0-0-124 bpf-next]$ cat .config | grep DEBUG_LOCK
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
---8<---


> > > > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > > > > index 1451aa9712b0..b27241ea96bd 100644
> > > > > > --- a/net/ipv4/inet_connection_sock.c
> > > > > > +++ b/net/ipv4/inet_connection_sock.c
> > > > > > @@ -992,6 +992,36 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(inet_csk_reqsk_queue_add);
> > > > > >  
> > > > > > +void inet_csk_reqsk_queue_migrate(struct sock *sk, struct sock *nsk)
> > > > > > +{
> > > > > > +   struct request_sock_queue *old_accept_queue, *new_accept_queue;
> > > > > > +
> > > > > > +   old_accept_queue = &inet_csk(sk)->icsk_accept_queue;
> > > > > > +   new_accept_queue = &inet_csk(nsk)->icsk_accept_queue;
> > > > > > +
> > > > > > +   spin_lock(&old_accept_queue->rskq_lock);
> > > > > > +   spin_lock(&new_accept_queue->rskq_lock);
> > > > > I am also not very thrilled on this double spin_lock.
> > > > > Can this be done in (or like) inet_csk_listen_stop() instead?
> > > > 
> > > > It will be possible to migrate sockets in inet_csk_listen_stop(), but I
> > > > think it is better to do it just after reuseport_detach_sock() because we
> > > > can select a different listener (almost) every time at a lower cost by
> > > > selecting the moved socket and passing it to inet_csk_reqsk_queue_migrate()
> > > > easily.
> > > I don't see the "lower cost" point.  Please elaborate.
> > 
> > In reuseport_select_sock(), we pass sk_hash of the request socket to
> > reciprocal_scale() and generate a random index for socks[] to select
> > a different listener every time.
> > On the other hand, we do not have request sockets in unhash path and
> > sk_hash of the listener is always 0, so we have to generate a random number
> > in another way. In reuseport_detach_sock(), we can use the index of the
> > moved socket, but we do not have it in inet_csk_listen_stop(), so we have
> > to generate a random number in inet_csk_listen_stop().
> > I think it is at lower cost to use the index of the moved socket.
> Generate a random number is not a big deal for the

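The listener selection discussed in the quoted text above relies on
scaling a 32-bit hash into an index for socks[]. A small stand-alone
sketch of that arithmetic follows; reciprocal_scale() reproduces the
kernel helper's multiply-and-shift, while pick_listener() is a
hypothetical wrapper added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel's reciprocal_scale(): maps a 32-bit value
 * into the range [0, ep_ro) without a division. */
uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Hypothetical helper: pick an index into socks[] from a hash value. */
uint32_t pick_listener(uint32_t hash, uint32_t num_socks)
{
	assert(num_socks > 0);
	return reciprocal_scale(hash, num_socks);
}
```

A hash of 0 always maps to index 0, which illustrates why a listener
whose sk_hash is always 0 cannot be used as the source of variation.
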
Re: [PATCH v3 09/11] dt-bindings: usb: convert mediatek,mtk-xhci.txt to YAML schema

2020-12-08 Thread Chunfeng Yun

On Mon, 2020-12-07 at 15:24 -0600, Rob Herring wrote:
> On Wed, Nov 18, 2020 at 04:21:24PM +0800, Chunfeng Yun wrote:
> > Convert mediatek,mtk-xhci.txt to YAML schema mediatek,mtk-xhci.yaml
> > 
> > Signed-off-by: Chunfeng Yun 
> > ---
> > v3:
> >   1. fix yamllint warning
> >   2. remove pinctrl* properties supported by default suggested by Rob
> >   3. drop unused labels
> >   4. modify description of mediatek,syscon-wakeup
> >   5. remove type of imod-interval-ns
> > 
> > v2: new patch
> > ---
> >  .../bindings/usb/mediatek,mtk-xhci.txt| 121 -
> >  .../bindings/usb/mediatek,mtk-xhci.yaml   | 171 ++
> >  2 files changed, 171 insertions(+), 121 deletions(-)
> >  delete mode 100644 Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.txt
> >  create mode 100644 Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.yaml
[...]
> > diff --git a/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.yaml b/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.yaml
> > new file mode 100644
> > index ..4a36ad5c4d25
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.yaml
> > @@ -0,0 +1,171 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +# Copyright (c) 2020 MediaTek
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/usb/mediatek,mtk-xhci.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: MediaTek USB3 xHCI Device Tree Bindings
> > +
> > +maintainers:
> > +  - Chunfeng Yun 
> > +
> > +allOf:
> > +  - $ref: "usb-hcd.yaml"
> > +
> > +description: |
> > +  There are two scenarios:
> > +  case 1: only supports xHCI driver;
> > +  case 2: supports dual-role mode, and the host is based on xHCI driver.
> > +
> > +properties:
> > +  # common properties for both case 1 and case 2
> > +  compatible:
> > +items:
> > +  - enum:
> > +  - mediatek,mt2712-xhci
> > +  - mediatek,mt7622-xhci
> > +  - mediatek,mt7629-xhci
> > +  - mediatek,mt8173-xhci
> > +  - mediatek,mt8183-xhci
> > +  - const: mediatek,mtk-xhci
> > +
> > +  reg:
> > +minItems: 1
> > +maxItems: 2
> > +items:
> > +  - description: the registers of xHCI MAC
> > +  - description: the registers of IP Port Control
> > +
> > +  reg-names:
> > +minItems: 1
> > +maxItems: 2
> > +items:
> > +  - const: mac
> > +  - const: ippc  # optional, only needed for case 1.
> > +
> > +  interrupts:
> > +maxItems: 1
> > +
> > +  power-domains:
> > +description: A phandle to USB power domain node to control USB's MTCMOS
> > +maxItems: 1
> > +
> > +  clocks:
> > +minItems: 1
> > +maxItems: 5
> > +items:
> > +  - description: Controller clock used by normal mode
> > +  - description: Reference clock used by low power mode etc
> > +  - description: Mcu bus clock for register access
> > +  - description: DMA bus clock for data transfer
> > +  - description: controller clock
> > +
> > +  clock-names:
> > +minItems: 1
> > +maxItems: 5
> > +items:
> > +  - const: sys_ck  # required, the following ones are optional
> > +  - const: ref_ck
> > +  - const: mcu_ck
> > +  - const: dma_ck
> > +  - const: xhci_ck
> > +
> > +  phys:
> > +$ref: /usb/usb-hcd.yaml#
> 
> That's not right.
> 
> You need 'items' and list each entry.
Will add minItems/maxItems instead, since the number of PHYs is variable
and phy-names is not used

> 
> > +description: List of all the USB PHYs on this HCD
> > +
> > +  vusb33-supply:
> > +description: Regulator of USB AVDD3.3v
> > +
> > +  vbus-supply:
> > +description: Regulator of USB VBUS5v
> > +
> > +  usb3-lpm-capable:
> > +description: supports USB3.0 LPM
> > +type: boolean
> > +
> > +  imod-interval-ns:
> > +description:
> > +  Interrupt moderation interval value, it is 8 times as much as that
> > +  defined in the xHCI spec on MTK's controller.
> > +default: 5000
> > +
> > +  # the following properties are only used for case 1
> > +  wakeup-source:
> > +description: enable USB remote wakeup, see power/wakeup-source.txt
> > +type: boolean
> > +
> > +  mediatek,syscon-wakeup:
> > +$ref: /schemas/types.yaml#/definitions/phandle-array
> > +maxItems: 1
> > +description: |
> > +  A phandle to syscon used to access the register of the USB wakeup glue
> > +  layer between xHCI and SPM, the field should always be 3 cells long.
> > +
> > +  items:
> 
> Indentation is wrong here. Should be 2 fewer spaces.
Will fix it
> 
> > +- description:
> > +The first cell represents a phandle to syscon
> > +- description:
> > +The second cell represents the register base address of the glue
> > +layer in syscon
> > +- description:
> > +The third cell represents the hardware version of the glue layer,
> > +

Re: [PATCH v3 10/11] dt-bindings: usb: convert mediatek,mtu3.txt to YAML schema

2020-12-08 Thread Chunfeng Yun

On Mon, 2020-12-07 at 15:30 -0600, Rob Herring wrote:
> On Wed, Nov 18, 2020 at 04:21:25PM +0800, Chunfeng Yun wrote:
> > Convert mediatek,mtu3.txt to YAML schema mediatek,mtu3.yaml
> > 
> > Signed-off-by: Chunfeng Yun 
> > ---
> > v3:
> >   1. fix yamllint warning
> >   2. remove pinctrl* properties
> >   3. remove unnecessary '|'
> >   4. drop unused labels in example
> > 
> > v2: new patch
> > ---
> >  .../devicetree/bindings/usb/mediatek,mtu3.txt | 108 -
> >  .../bindings/usb/mediatek,mtu3.yaml   | 218 ++
> >  2 files changed, 218 insertions(+), 108 deletions(-)
> >  delete mode 100644 Documentation/devicetree/bindings/usb/mediatek,mtu3.txt
> >  create mode 100644 Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml
> > 
[...]
> > diff --git a/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml b/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml
> > new file mode 100644
> > index ..290e97a06f2a
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml
> > @@ -0,0 +1,218 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +# Copyright (c) 2020 MediaTek
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/usb/mediatek,mtu3.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: MediaTek USB3 DRD Controller Device Tree Bindings
> > +
> > +maintainers:
> > +  - Chunfeng Yun 
> > +
> > +description: |
> > +  The DRD controller has a glue layer IPPC (IP Port Control), and its host is
> > +  based on xHCI.
> > +
> > +properties:
> > +  compatible:
> > +items:
> > +  - enum:
> > +  - mediatek,mt2712-mtu3
> > +  - mediatek,mt8173-mtu3
> > +  - mediatek,mt8183-mtu3
> > +  - const: mediatek,mtu3
> > +
> > +  reg:
> > +items:
> > +  - description: the registers of device MAC
> > +  - description: the registers of IP Port Control
> > +
> > +  reg-names:
> > +items:
> > +  - const: mac
> > +  - const: ippc
> > +
> > +  interrupts:
> > +maxItems: 1
> > +
> > +  power-domains:
> > +description: A phandle to USB power domain node to control USB's MTCMOS
> > +maxItems: 1
> > +
> > +  clocks:
> > +minItems: 1
> > +maxItems: 4
> > +items:
> > +  - description: Controller clock used by normal mode
> > +  - description: Reference clock used by low power mode etc
> > +  - description: Mcu bus clock for register access
> > +  - description: DMA bus clock for data transfer
> > +
> > +  clock-names:
> > +minItems: 1
> > +maxItems: 4
> > +items:
> > +  - const: sys_ck  # required, the following ones are optional
> > +  - const: ref_ck
> > +  - const: mcu_ck
> > +  - const: dma_ck
> > +
> > +  phys:
> > +$ref: /schemas/types.yaml#/definitions/phandle-array
> 
> Drop. Need to say how many entries and what each one is if more than 1.
Ok

> 
> > +description: List of all the USB PHYs used
> > +
> > +  vusb33-supply:
> > +description: Regulator of USB AVDD3.3v
> > +
> > +  vbus-supply:
> > +$ref: /connector/usb-connector.yaml#
> 
> Nope.
Will remove it
> 
> > +deprecated: true
> > +description: |
> > +  Regulator of USB VBUS5v, needed when supports dual-role mode.
> > +  Particularly, if use an output GPIO to control a VBUS regulator, should
> > +  model it as a regulator. See bindings/regulator/fixed-regulator.yaml
> > +  It's considered valid for compatibility reasons, not allowed for
> > +  new bindings, and put into a usb-connector node.
> > +
> > +  dr_mode:
> > +description: See usb/generic.txt
> > +enum: [host, peripheral, otg]
> > +default: otg
> > +
> > +  maximum-speed:
> > +description: See usb/generic.txt
> > +enum: [super-speed-plus, super-speed, high-speed, full-speed]
> > +
> > +  "#address-cells":
> > +enum: [1, 2]
> > +
> > +  "#size-cells":
> > +enum: [1, 2]
> > +
> > +  ranges: true
> > +
> > +  extcon:
> > +deprecated: true
> > +description: |
> > +  Phandle to the extcon device detecting the IDDIG/VBUS state, needed
> > +  when supports dual-role mode.
> > +  It's considered valid for compatibility reasons, not allowed for
> > +  new bindings, and use "usb-role-switch" property instead.
> > +
> > +  usb-role-switch:
> > +$ref: /schemas/types.yaml#/definitions/flag
> > +description: Support role switch. See usb/generic.txt
> > +type: boolean
> > +
> > +  connector:
> > +$ref: /connector/usb-connector.yaml#
> > +description:
> > +  Connector for dual role switch, especially for "gpio-usb-b-connector"
> > +type: object
> > +
> > +  port:
> > +description:
> > +  Any connector to the data bus of this controller should be modelled
> > +  using the OF graph bindings specified, if the "usb-role-switch"
> > +  property is used. See graph.txt
> > +type: object
> 
> Please include port and connector i

Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread David Howells

Ard Biesheuvel  wrote:

> > > > I wonder if it would help if the input buffer and output buffer didn't
> > > > have to correspond exactly in usage - ie. the output buffer could be
> > > > used at a slower rate than the input to allow for buffering inside the
> > > > crypto algorithm.
> > >
> > > I don't follow - how could one be used at a slower rate?
> >
> > I mean that the crypto algorithm might need to buffer the last part of the
> > input until it has a block's worth before it can write to the output.
> 
> This is what is typically handled transparently by the driver. When
> you populate a scatterlist, it doesn't matter how misaligned the
> individual elements are, the scatterlist walker will always present
> the data in chunks that the crypto algorithm can manage. This is why
> using a single scatterlist for the entire input is preferable in
> general.

Yep - but the assumption currently on the part of the callers is that they
provide the input buffer and corresponding output buffer - and that the
algorithm will transfer data from one to the other, such that the same amount
of input and output bufferage will be used.

However, if we start pushing data in progressively, this would no longer hold
true unless we also require the caller to only present in block-size chunks.

For example, if I gave the encryption function 120 bytes of data and a 120
byte output buffer, but the algorithm has a 16-byte blocksize, it will,
presumably, consume 120 bytes of input, but it can only write 112 bytes of
output at this time.  So the current interface would need to evolve to
indicate separately how much input has been consumed and how much output has
been produced - in which case it can't be handled transparently.

For krb5, it's actually worse than that, since we want to be able to
insert/remove a header and a trailer (and might need to go back and update the
header after) - but I think in the krb5 case, we need to treat the header and
trailer specially and update them after the fact in the wrapping case
(unwrapping is not a problem, since we can just cache the header).

David
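
The 120-byte example above can be checked with a small model that
reports consumed and produced byte counts separately. This is only an
illustration of the bookkeeping; struct progress and feed() are
hypothetical and are not part of the kernel crypto API.

```c
#include <assert.h>
#include <stddef.h>

struct progress {
	size_t consumed; /* input bytes taken from the caller */
	size_t produced; /* output bytes written in this call */
	size_t buffered; /* input bytes held back inside the algorithm */
};

/* Hypothetical model: consume all input, emit only whole blocks. */
struct progress feed(size_t already_buffered, size_t in_len, size_t blocksize)
{
	struct progress p;
	size_t total;

	assert(blocksize > 0 && already_buffered < blocksize);

	total = already_buffered + in_len;
	p.consumed = in_len;
	p.buffered = total % blocksize;
	p.produced = total - p.buffered;
	return p;
}
```

For 120 bytes of input and a 16-byte block size this gives 120 consumed,
112 produced and 8 held back, which is the mismatch described above.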

[PATCH net-next] net: Limit logical shift left of TCP probe0 timeout

2020-12-08 Thread Cambda Zhu

For each TCP zero window probe, icsk_backoff is increased by one, and
its maximum value is tcp_retries2. If tcp_retries2 is greater than 63,
the probe0 timeout shift may exceed the width of the type. On
x86_64/ARMv8/MIPS, the shift count is masked to the range 0 to 63, while
on ARMv7 the result is zero. If the shift count is masked, only a few
probes are sent with a timeout shorter than TCP_RTO_MAX. But if the
timeout is zero, tcp_retries2 probes are needed to end this false
timeout. Besides, a bitwise shift by a count greater than or equal to
the width of the type is undefined behavior.

This patch adds a limit to the backoff. The max value of max_when is
TCP_RTO_MAX and the min value of timeout base is TCP_RTO_MIN. The limit
is the backoff from TCP_RTO_MIN to TCP_RTO_MAX.

Signed-off-by: Cambda Zhu 
---
 include/net/tcp.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d4ef5bf94168..82044179c345 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1321,7 +1321,9 @@ static inline unsigned long tcp_probe0_base(const struct sock *sk)
 static inline unsigned long tcp_probe0_when(const struct sock *sk,
 					    unsigned long max_when)
 {
-	u64 when = (u64)tcp_probe0_base(sk) << inet_csk(sk)->icsk_backoff;
+	u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
+			   inet_csk(sk)->icsk_backoff);
+	u64 when = (u64)tcp_probe0_base(sk) << backoff;
 
 	return (unsigned long)min_t(u64, when, max_when);
 }
-- 
2.16.6
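
The limit used in the patch above can be worked through in user space.
The sketch below assumes HZ = 1000 for illustration, with TCP_RTO_MAX =
120 * HZ and TCP_RTO_MIN = HZ / 5; ilog2_u32(), probe0_max_backoff() and
probe0_when() are hypothetical stand-ins for the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: HZ = 1000, so one jiffy is one millisecond. */
#define HZ          1000u
#define TCP_RTO_MAX (120u * HZ)
#define TCP_RTO_MIN (HZ / 5u)

/* Integer log2, rounding down; v must be non-zero. */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Smallest shift for which base << shift reaches TCP_RTO_MAX for every
 * base of at least TCP_RTO_MIN, so larger shifts cannot change the result. */
unsigned int probe0_max_backoff(void)
{
	return ilog2_u32(TCP_RTO_MAX / TCP_RTO_MIN) + 1;
}

/* Clamp the shift count before shifting, then cap at max_when. */
uint64_t probe0_when(uint32_t base, uint8_t backoff, uint64_t max_when)
{
	unsigned int limit = probe0_max_backoff();
	unsigned int shift = backoff < limit ? backoff : limit;
	uint64_t when = (uint64_t)base << shift;

	return when < max_when ? when : max_when;
}
```

With these values TCP_RTO_MAX / TCP_RTO_MIN is 600, ilog2 of that is 9,
and the shift is capped at 10, so a backoff of 64 or more never reaches
the shift operator.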

RE: [EXT] Re: [PATCH v2] MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver

2020-12-08 Thread Mickey Rachamim

Hi Jakub, thanks for the guidelines.

> On Sat, 5 Dec 2020 18:43:00 +0200 Mickey Rachamim wrote:
> > Add maintainers info for new Marvell Prestera Ethernet switch driver.
> > 
> > Signed-off-by: Mickey Rachamim 
> > ---
> > v2:
> >  Update the maintainers list according to community recommendation.
> > 
> >  MAINTAINERS | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 061e64b2423a..c92b44754436 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -10550,6 +10550,14 @@ S: Supported
> >  F: Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
> >  F: drivers/net/ethernet/marvell/octeontx2/af/
> >  
> > +MARVELL PRESTERA ETHERNET SWITCH DRIVER
> > +M: Vadym Kochan 
> > +M: Taras Chornyi 
> 
> Just a heads up, again, we'll start removing maintainers who aren't 
> participating, so Taras needs to be active. We haven't seen a single email 
> from him so far AFAICT.
> 
Understood. Taras is a Linux kernel expert working at PLVision under
contract with Marvell.
He will become active in contributions and reviews very soon.

> > +L: netdev@vger.kernel.org
> 
> nit: I don't think you need to list netdev, it'll get inherited from the 
> general entry for networking drivers (you can test running get_maintainer.pl 
> on a patch to the driver and see if it reports it).

Right, will remove.

> > +S: Supported
> > +W: http://www.marvell.com
> 
> The website entry is for a project-specific website. If you have a link to a 
> site with open resources about the chips/driver that'd be great, otherwise 
> please drop it. Also https is expected these days ;)

Can I place the GitHub project link here?
https://github.com/Marvell-switching/switchdev-prestera

[PATCH net-next 00/13] mlxsw: Add support for Q-in-VNI

2020-12-08 Thread Ido Schimmel

From: Ido Schimmel 

This patch set adds support for Q-in-VNI over Spectrum-{2,3} ASICs.
Q-in-VNI is like regular VxLAN encapsulation with the sole difference
that overlay packets can contain a VLAN tag. In Linux, this is achieved
by adding the VxLAN device to a 802.1ad bridge instead of a 802.1q
bridge.

From mlxsw perspective, Q-in-VNI support entails two main changes:

1. An outer VLAN tag should always be pushed to the overlay packet
during decapsulation

2. The EtherType used during decapsulation should be 802.1ad (0x88a8)
instead of the default 802.1q (0x8100)

Patch set overview:

Patches #1-#3 add required device registers and fields

Patch #4 performs small refactoring to allow code re-use

Patches #5-#7 make the EtherType used during decapsulation a property of
the tunnel port (i.e., VxLAN). This leads to the driver vetoing
configurations in which VxLAN devices are members of both 802.1ad and
802.1q/802.1d bridges. This will be handled in the future by determining
the overlay EtherType on the egress port instead

Patch #8 adds support for Q-in-VNI for Spectrum-2 and newer ASICs

Patches #9-#10 veto Q-in-VNI for Spectrum-1 ASICs due to some hardware
limitations. Can be worked around, but decided not to support it for now

Patch #11 adjusts mlxsw to stop vetoing addition of VXLAN devices to
802.1ad bridges

Patch #12 adds a generic forwarding test that can be used with both veth
pairs and physical ports with a loopback

Patch #13 adds a test to make sure mlxsw vetoes unsupported Q-in-VNI
configurations

Amit Cohen (12):
  mlxsw: Use one enum for all registers that contain tunnel_port field
  mlxsw: reg: Add Switch Port VLAN Stacking Register
  mlxsw: reg: Add support for tunnel port in SPVID register
  mlxsw: spectrum_switchdev: Create common function for joining VxLAN to
VLAN-aware bridge
  mlxsw: Save EtherType as part of mlxsw_sp_nve_params
  mlxsw: Save EtherType as part of mlxsw_sp_nve_config
  mlxsw: spectrum: Publish mlxsw_sp_ethtype_to_sver_type()
  mlxsw: spectrum_nve_vxlan: Add support for Q-in-VNI for Spectrum-2
ASIC
  mlxsw: spectrum_switchdev: Use ops->vxlan_join() when adding VLAN to
VxLAN device
  mlxsw: Veto Q-in-VNI for Spectrum-1 ASIC
  mlxsw: spectrum_switchdev: Allow joining VxLAN to 802.1ad bridge
  selftests: mlxsw: Add Q-in-VNI veto tests

Petr Machata (1):
  selftests: forwarding: Add Q-in-VNI test

 drivers/net/ethernet/mellanox/mlxsw/reg.h | 146 ++--
 .../net/ethernet/mellanox/mlxsw/spectrum.c|   2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.h|   2 +
 .../ethernet/mellanox/mlxsw/spectrum_nve.c|   6 +-
 .../ethernet/mellanox/mlxsw/spectrum_nve.h|   5 +-
 .../mellanox/mlxsw/spectrum_nve_vxlan.c   |  67 +++-
 .../mellanox/mlxsw/spectrum_switchdev.c   |  32 +-
 .../net/mlxsw/spectrum-2/q_in_vni_veto.sh |  77 
 .../net/mlxsw/spectrum/q_in_vni_veto.sh   |  66 
 .../selftests/net/forwarding/q_in_vni.sh  | 347 ++
 10 files changed, 703 insertions(+), 47 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh
 create mode 100755 tools/testing/selftests/net/forwarding/q_in_vni.sh

-- 
2.28.0
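
The two changes listed in the cover letter above (always push an outer
VLAN tag, and use EtherType 0x88a8 for it) can be illustrated on a raw
Ethernet frame in user space. This is only a sketch of the frame layout,
not of what the ASIC or mlxsw does internally; push_vlan_tag() is a
hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_P_8021Q  0x8100 /* C-VLAN tag EtherType */
#define ETH_P_8021AD 0x88a8 /* S-VLAN tag EtherType */
#define ETH_ALEN     6
#define VLAN_HLEN    4

/* Insert a VLAN tag right after the destination and source MAC addresses.
 * buf must have room for len + VLAN_HLEN bytes. Returns the new length. */
size_t push_vlan_tag(uint8_t *buf, size_t len, uint16_t tpid, uint16_t vid)
{
	size_t off = 2 * ETH_ALEN;

	assert(buf && len >= off + 2 && vid < 4096);

	/* Make room for the tag, keeping any existing inner tag intact. */
	memmove(buf + off + VLAN_HLEN, buf + off, len - off);
	buf[off]     = (uint8_t)(tpid >> 8);
	buf[off + 1] = (uint8_t)(tpid & 0xff);
	buf[off + 2] = (uint8_t)(vid >> 8); /* PCP and DEI left at zero */
	buf[off + 3] = (uint8_t)(vid & 0xff);
	return len + VLAN_HLEN;
}
```

Pushing with ETH_P_8021AD onto a frame that already carries an 802.1q
tag yields the stacked layout that an 802.1ad bridge expects.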

[PATCH net-next 04/13] mlxsw: spectrum_switchdev: Create common function for joining VxLAN to VLAN-aware bridge

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

The code in mlxsw_sp_bridge_8021q_vxlan_join() can also be used for an
802.1ad bridge.

Move the code to a function called mlxsw_sp_bridge_vlan_aware_vxlan_join()
and call it from mlxsw_sp_bridge_8021q_vxlan_join() to enable code
reuse.

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c  | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 9c4e17607e6a..c53e0ab9f971 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2053,9 +2053,9 @@ mlxsw_sp_bridge_8021q_port_leave(struct mlxsw_sp_bridge_device *bridge_device,
 }
 
 static int
-mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
-				 const struct net_device *vxlan_dev, u16 vid,
-				 struct netlink_ext_ack *extack)
+mlxsw_sp_bridge_vlan_aware_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
+				      const struct net_device *vxlan_dev,
+				      u16 vid, struct netlink_ext_ack *extack)
 {
 	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_device->dev);
 	struct vxlan_dev *vxlan = netdev_priv(vxlan_dev);
@@ -2101,6 +2101,15 @@ mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
 	return err;
 }
 
+static int
+mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
+				 const struct net_device *vxlan_dev, u16 vid,
+				 struct netlink_ext_ack *extack)
+{
+	return mlxsw_sp_bridge_vlan_aware_vxlan_join(bridge_device, vxlan_dev,
+						     vid, extack);
+}
+
 static struct net_device *
 mlxsw_sp_bridge_8021q_vxlan_dev_find(struct net_device *br_dev, u16 vid)
 {
-- 
2.28.0

[PATCH net-next 02/13] mlxsw: reg: Add Switch Port VLAN Stacking Register

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

SPVTR register configures the VLAN mode of the port to enable VLAN
stacking.

It will be used to configure VxLAN to push VLAN to the decapsulated packet.
Without this setting, Spectrum-2 overtakes the VLAN tag of decapsulated
packet for bridging.

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 104 ++
 1 file changed, 104 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 0a3c5f89268c..ad6798c2169d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -1693,6 +1693,109 @@ static inline void mlxsw_reg_svfa_pack(char *payload, u8 local_port,
 	mlxsw_reg_svfa_vid_set(payload, vid);
 }
 
+/*  SPVTR - Switch Port VLAN Stacking Register
+ *  --
+ *  The Switch Port VLAN Stacking register configures the VLAN mode of the port
+ *  to enable VLAN stacking.
+ */
+#define MLXSW_REG_SPVTR_ID 0x201D
+#define MLXSW_REG_SPVTR_LEN 0x10
+
+MLXSW_REG_DEFINE(spvtr, MLXSW_REG_SPVTR_ID, MLXSW_REG_SPVTR_LEN);
+
+/* reg_spvtr_tport
+ * Port is tunnel port.
+ * Access: Index
+ *
+ * Note: Reserved when SwitchX/-2 or Spectrum-1.
+ */
+MLXSW_ITEM32(reg, spvtr, tport, 0x00, 24, 1);
+
+/* reg_spvtr_local_port
+ * When tport = 0: local port number (Not supported from/to CPU).
+ * When tport = 1: tunnel port.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, spvtr, local_port, 0x00, 16, 8);
+
+/* reg_spvtr_ippe
+ * Ingress Port Prio Mode Update Enable.
+ * When set, the Port Prio Mode is updated with the provided ipprio_mode field.
+ * Reserved on Get operations.
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, spvtr, ippe, 0x04, 31, 1);
+
+/* reg_spvtr_ipve
+ * Ingress Port VID Mode Update Enable.
+ * When set, the Ingress Port VID Mode is updated with the provided ipvid_mode
+ * field.
+ * Reserved on Get operations.
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, spvtr, ipve, 0x04, 30, 1);
+
+/* reg_spvtr_epve
+ * Egress Port VID Mode Update Enable.
+ * When set, the Egress Port VID Mode is updated with the provided epvid_mode
+ * field.
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, spvtr, epve, 0x04, 29, 1);
+
+/* reg_spvtr_ipprio_mode
+ * Ingress Port Priority Mode.
+ * This controls the PCP and DEI of the new outer VLAN
+ * Note: for SwitchX/-2 the DEI is not affected.
+ * 0: use port default PCP and DEI (configured by QPDPC).
+ * 1: use C-VLAN PCP and DEI.
+ * Has no effect when ipvid_mode = 0.
+ * Reserved when tport = 1.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, spvtr, ipprio_mode, 0x04, 20, 4);
+
+enum mlxsw_reg_spvtr_ipvid_mode {
+   /* IEEE Compliant PVID (default) */
+   MLXSW_REG_SPVTR_IPVID_MODE_IEEE_COMPLIANT_PVID,
+   /* Push VLAN (for VLAN stacking, except prio tagged packets) */
+   MLXSW_REG_SPVTR_IPVID_MODE_PUSH_VLAN_FOR_UNTAGGED_PACKET,
+   /* Always push VLAN (also for prio tagged packets) */
+   MLXSW_REG_SPVTR_IPVID_MODE_ALWAYS_PUSH_VLAN,
+};
+
+/* reg_spvtr_ipvid_mode
+ * Ingress Port VLAN-ID Mode.
+ * For Spectrum family, this affects the values of SPVM.i
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, spvtr, ipvid_mode, 0x04, 16, 4);
+
+enum mlxsw_reg_spvtr_epvid_mode {
+   /* IEEE Compliant VLAN membership */
+   MLXSW_REG_SPVTR_EPVID_MODE_IEEE_COMPLIANT_VLAN_MEMBERSHIP,
+   /* Pop VLAN (for VLAN stacking) */
+   MLXSW_REG_SPVTR_EPVID_MODE_POP_VLAN,
+};
+
+/* reg_spvtr_epvid_mode
+ * Egress Port VLAN-ID Mode.
+ * For Spectrum family, this affects the values of SPVM.e,u,pt.
+ * Access: WO
+ */
+MLXSW_ITEM32(reg, spvtr, epvid_mode, 0x04, 0, 4);
+
+static inline void mlxsw_reg_spvtr_pack(char *payload, bool tport,
+   u8 local_port,
+   enum mlxsw_reg_spvtr_ipvid_mode ipvid_mode)
+{
+   MLXSW_REG_ZERO(spvtr, payload);
+   mlxsw_reg_spvtr_tport_set(payload, tport);
+   mlxsw_reg_spvtr_local_port_set(payload, local_port);
+   mlxsw_reg_spvtr_ipvid_mode_set(payload, ipvid_mode);
+   mlxsw_reg_spvtr_ipve_set(payload, true);
+}
+
 /* SVPE - Switch Virtual-Port Enabling Register
  * 
  * Enables port virtualization.
@@ -11306,6 +11409,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
MLXSW_REG(slcor),
MLXSW_REG(spmlr),
MLXSW_REG(svfa),
+   MLXSW_REG(spvtr),
MLXSW_REG(svpe),
MLXSW_REG(sfmr),
MLXSW_REG(spvmlr),
-- 
2.28.0

[PATCH net-next 06/13] mlxsw: Save EtherType as part of mlxsw_sp_nve_config

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

Add EtherType field to mlxsw_sp_nve_config struct.
Set EtherType according to mlxsw_sp_nve_params.ethertype.

Pass 'mlxsw_sp_nve_params' instead of 'mlxsw_sp_nve_params->dev' to the
function that initializes the mlxsw_sp_nve_config struct, so that it knows
which EtherType to use.

This field is needed to configure which EtherType will be used when
VLAN is pushed at ingress of the tunnel port.

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c   | 2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h   | 3 ++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c | 5 +++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index ed0d334b5fd1..adf499665f87 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -802,7 +802,7 @@ int mlxsw_sp_nve_fid_enable(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fid *fid,
return -EINVAL;
 
memset(&config, 0, sizeof(config));
-   ops->nve_config(nve, params->dev, &config);
+   ops->nve_config(nve, params, &config);
if (nve->num_nve_tunnels &&
memcmp(&config, &nve->config, sizeof(config))) {
NL_SET_ERR_MSG_MOD(extack, "Conflicting NVE tunnels configuration");
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
index 12f664f42f21..68bd9422be2a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
@@ -18,6 +18,7 @@ struct mlxsw_sp_nve_config {
u32 ul_tb_id;
enum mlxsw_sp_l3proto ul_proto;
union mlxsw_sp_l3addr ul_sip;
+   u16 ethertype;
 };
 
 struct mlxsw_sp_nve {
@@ -38,7 +39,7 @@ struct mlxsw_sp_nve_ops {
const struct net_device *dev,
struct netlink_ext_ack *extack);
void (*nve_config)(const struct mlxsw_sp_nve *nve,
-  const struct net_device *dev,
+  const struct mlxsw_sp_nve_params *params,
   struct mlxsw_sp_nve_config *config);
int (*init)(struct mlxsw_sp_nve *nve,
const struct mlxsw_sp_nve_config *config);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index e9bff13ec264..f9a48a0109ff 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -87,10 +87,10 @@ static bool mlxsw_sp_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
 }
 
 static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve,
- const struct net_device *dev,
+ const struct mlxsw_sp_nve_params *params,
  struct mlxsw_sp_nve_config *config)
 {
-   struct vxlan_dev *vxlan = netdev_priv(dev);
+   struct vxlan_dev *vxlan = netdev_priv(params->dev);
struct vxlan_config *cfg = &vxlan->cfg;
 
config->type = MLXSW_SP_NVE_TYPE_VXLAN;
@@ -101,6 +101,7 @@ static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve,
config->ul_proto = MLXSW_SP_L3_PROTO_IPV4;
config->ul_sip.addr4 = cfg->saddr.sin.sin_addr.s_addr;
config->udp_dport = cfg->dst_port;
+   config->ethertype = params->ethertype;
 }
 
 static int __mlxsw_sp_nve_parsing_set(struct mlxsw_sp *mlxsw_sp,
-- 
2.28.0

[PATCH net-next 03/13] mlxsw: reg: Add support for tunnel port in SPVID register

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

Add the spvid_tport field, which indicates whether the port is a tunnel
port. When spvid_tport is true, the local_port field is expected to hold
the tunnel port type.

It will be used to configure which EtherType is used when a VLAN is
pushed at ingress for the tunnel port.
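
As a rough user-space illustration of the layout described above (not
driver code), the sketch below packs a 1-bit tport flag at bit 24 and an
8-bit local_port at bits 16..23 of the big-endian 32-bit word at offset
0x00, matching the offsets in the MLXSW_ITEM32() definitions of this
patch. The helpers item32_set(), item32_get() and
spvid_tunnel_index_set() are hypothetical names invented for the
example, and the big-endian storage is an assumption about how the
payload is laid out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low 'size' bits of 'val' at bit position
 * 'shift' of the big-endian 32-bit word that starts 'offset' bytes into
 * the payload. All other bits of that word are preserved.
 */
void item32_set(unsigned char *payload, size_t offset, unsigned int shift,
		unsigned int size, uint32_t val)
{
	unsigned char *p = payload + offset;
	uint32_t mask = (size >= 32 ? 0xffffffffu : (1u << size) - 1) << shift;
	uint32_t word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
			((uint32_t)p[2] << 8) | (uint32_t)p[3];

	word = (word & ~mask) | ((val << shift) & mask);
	p[0] = (unsigned char)(word >> 24);
	p[1] = (unsigned char)(word >> 16);
	p[2] = (unsigned char)(word >> 8);
	p[3] = (unsigned char)word;
}

/* Hypothetical helper: read a field back from the payload. */
uint32_t item32_get(const unsigned char *payload, size_t offset,
		    unsigned int shift, unsigned int size)
{
	const unsigned char *p = payload + offset;
	uint32_t mask = size >= 32 ? 0xffffffffu : (1u << size) - 1;
	uint32_t word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
			((uint32_t)p[2] << 8) | (uint32_t)p[3];

	return (word >> shift) & mask;
}

/* Index a tunnel port: tport = 1 (offset 0x00, bit 24, 1 bit) and
 * local_port = tunnel port type (offset 0x00, bit 16, 8 bits).
 */
void spvid_tunnel_index_set(unsigned char *payload, uint8_t tunnel_port)
{
	item32_set(payload, 0x00, 24, 1, 1);
	item32_set(payload, 0x00, 16, 8, tunnel_port);
}
```

With tunnel_port set to 0 on a zeroed payload, the first four payload
bytes in this sketch come out as 01 00 00 00.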

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index ad6798c2169d..2a89b3261f00 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -821,8 +821,16 @@ static inline void mlxsw_reg_spms_vid_pack(char *payload, u16 vid,
 
 MLXSW_REG_DEFINE(spvid, MLXSW_REG_SPVID_ID, MLXSW_REG_SPVID_LEN);
 
+/* reg_spvid_tport
+ * Port is tunnel port.
+ * Reserved when SwitchX/-2 or Spectrum-1.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, spvid, tport, 0x00, 24, 1);
+
 /* reg_spvid_local_port
- * Local port number.
+ * When tport = 0: Local port number. Not supported for CPU port.
+ * When tport = 1: Tunnel port.
  * Access: Index
  */
 MLXSW_ITEM32(reg, spvid, local_port, 0x00, 16, 8);
-- 
2.28.0

[PATCH net-next 07/13] mlxsw: spectrum: Publish mlxsw_sp_ethtype_to_sver_type()

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

Declare mlxsw_sp_ethtype_to_sver_type() in spectrum.h to enable using it
in other files.

It will be used in the next patch to map between EtherType and the
relevant value configured by the SVER register.
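
A minimal user-space sketch of such a mapping is shown below, assuming
the function simply translates a VLAN EtherType into a small selector
value. The function name ethtype_to_sver_type() and the selector values
0 and 1 are assumptions made for illustration; only the ETH_P_8021Q
case label is visible in the diff of this patch.

```c
#include <errno.h>
#include <stdint.h>

#define ETH_P_8021Q	0x8100	/* 802.1Q VLAN tag */
#define ETH_P_8021AD	0x88A8	/* 802.1ad service VLAN tag */

/* Translate a VLAN EtherType into a selector index.
 * The returned indexes (0 and 1) are assumed values for this sketch.
 */
int ethtype_to_sver_type(uint16_t ethtype, uint8_t *p_sver_type)
{
	switch (ethtype) {
	case ETH_P_8021Q:
		*p_sver_type = 0;
		break;
	case ETH_P_8021AD:
		*p_sver_type = 1;
		break;
	default:
		/* Any other EtherType cannot be mapped. */
		return -EINVAL;
	}

	return 0;
}
```

Returning an error code and passing the result through a pointer lets a
caller in another file reject unsupported EtherTypes before touching any
register.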

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 963eb0b1d9dd..df8175cd44ab 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -384,7 +384,7 @@ int mlxsw_sp_port_vid_learning_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
return err;
 }
 
-static int mlxsw_sp_ethtype_to_sver_type(u16 ethtype, u8 *p_sver_type)
+int mlxsw_sp_ethtype_to_sver_type(u16 ethtype, u8 *p_sver_type)
 {
switch (ethtype) {
case ETH_P_8021Q:
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 7e728a8a9fb3..a6956cfc9cb1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -584,6 +584,7 @@ int mlxsw_sp_port_vid_stp_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
 int mlxsw_sp_port_vp_mode_set(struct mlxsw_sp_port *mlxsw_sp_port, bool enable);
 int mlxsw_sp_port_vid_learning_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
   bool learn_enable);
+int mlxsw_sp_ethtype_to_sver_type(u16 ethtype, u8 *p_sver_type);
 int mlxsw_sp_port_pvid_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
   u16 ethtype);
 struct mlxsw_sp_port_vlan *
-- 
2.28.0

[PATCH net-next 05/13] mlxsw: Save EtherType as part of mlxsw_sp_nve_params

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

Add an EtherType field to the mlxsw_sp_nve_params struct.
Set it when a VxLAN device is added to a bridge device.

This field is needed to configure which EtherType will be used when a
VLAN is pushed at ingress of the tunnel port.

Use ETH_P_8021Q for tunnel ports enslaved to 802.1d and 802.1q bridges.

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   | 1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 7 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 6092243a69cb..7e728a8a9fb3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -1202,6 +1202,7 @@ struct mlxsw_sp_nve_params {
enum mlxsw_sp_nve_type type;
__be32 vni;
const struct net_device *dev;
+   u16 ethertype;
 };
 
 extern const struct mlxsw_sp_nve_ops *mlxsw_sp1_nve_ops_arr[];
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index c53e0ab9f971..051a77440afe 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2055,7 +2055,8 @@ mlxsw_sp_bridge_8021q_port_leave(struct mlxsw_sp_bridge_device *bridge_device,
 static int
 mlxsw_sp_bridge_vlan_aware_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
  const struct net_device *vxlan_dev,
- u16 vid, struct netlink_ext_ack *extack)
+ u16 vid, u16 ethertype,
+ struct netlink_ext_ack *extack)
 {
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_device->dev);
struct vxlan_dev *vxlan = netdev_priv(vxlan_dev);
@@ -2063,6 +2064,7 @@ mlxsw_sp_bridge_vlan_aware_vxlan_join(struct mlxsw_sp_bridge_device *bridge_devi
.type = MLXSW_SP_NVE_TYPE_VXLAN,
.vni = vxlan->cfg.vni,
.dev = vxlan_dev,
+   .ethertype = ethertype,
};
struct mlxsw_sp_fid *fid;
int err;
@@ -2107,7 +2109,7 @@ mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
 struct netlink_ext_ack *extack)
 {
return mlxsw_sp_bridge_vlan_aware_vxlan_join(bridge_device, vxlan_dev,
-vid, extack);
+vid, ETH_P_8021Q, extack);
 }
 
 static struct net_device *
@@ -2240,6 +2242,7 @@ mlxsw_sp_bridge_8021d_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
.type = MLXSW_SP_NVE_TYPE_VXLAN,
.vni = vxlan->cfg.vni,
.dev = vxlan_dev,
+   .ethertype = ETH_P_8021Q,
};
struct mlxsw_sp_fid *fid;
int err;
-- 
2.28.0

[PATCH net-next 09/13] mlxsw: spectrum_switchdev: Use ops->vxlan_join() when adding VLAN to VxLAN device

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

Currently mlxsw_sp_switchdev_vxlan_vlan_add() always calls
mlxsw_sp_bridge_8021q_vxlan_join() because VLANs were only ever added to
a VLAN-filtering bridge, which could only be an 802.1q bridge.

This set adds support for VxLAN with an 802.1ad bridge, so a
VLAN-filtering bridge is no longer necessarily 802.1q.

Call ops->vxlan_join(), so that mlxsw_sp_bridge_8021{q,ad}_vxlan_join()
is called according to the bridge type.

This is needed to ensure that VxLAN with an 802.1ad bridge will be vetoed
in Spectrum-1 with the next patch.
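
The dispatch described above can be sketched in plain C as a table of
function pointers selected per bridge type. This is only an illustration
under simplified assumptions: the struct layouts, the function names and
the joined_ethertype field are hypothetical and do not mirror the
driver's real definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct bridge_device;

/* Per-bridge-type operations; only vxlan_join is modelled here. */
struct bridge_ops {
	int (*vxlan_join)(struct bridge_device *bridge, uint16_t vid);
};

struct bridge_device {
	const struct bridge_ops *ops;
	uint16_t joined_ethertype;	/* hypothetical field, records the result */
};

int bridge_8021q_vxlan_join(struct bridge_device *bridge, uint16_t vid)
{
	(void)vid;
	bridge->joined_ethertype = 0x8100;	/* ETH_P_8021Q */
	return 0;
}

int bridge_8021ad_vxlan_join(struct bridge_device *bridge, uint16_t vid)
{
	(void)vid;
	bridge->joined_ethertype = 0x88A8;	/* ETH_P_8021AD */
	return 0;
}

const struct bridge_ops bridge_8021q_ops = {
	.vxlan_join = bridge_8021q_vxlan_join,
};

const struct bridge_ops bridge_8021ad_ops = {
	.vxlan_join = bridge_8021ad_vxlan_join,
};

/* The caller no longer hard-codes the 802.1q variant; it goes through
 * the ops of whatever bridge the VxLAN device is enslaved to.
 */
int vxlan_vlan_add(struct bridge_device *bridge, uint16_t vid)
{
	return bridge->ops->vxlan_join(bridge, vid);
}
```

Going through the ops table means the caller stays unchanged when a
bridge type later gains or loses VxLAN support.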

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 051a77440afe..73290f71eb9c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -3320,8 +3320,8 @@ mlxsw_sp_switchdev_vxlan_vlan_add(struct mlxsw_sp *mlxsw_sp,
if (!fid) {
if (!flag_untagged || !flag_pvid)
return 0;
-   return mlxsw_sp_bridge_8021q_vxlan_join(bridge_device,
-   vxlan_dev, vid, extack);
+   return bridge_device->ops->vxlan_join(bridge_device, vxlan_dev,
+ vid, extack);
}
 
/* Second case: FID is associated with the VNI and the VLAN associated
@@ -3360,16 +3360,14 @@ mlxsw_sp_switchdev_vxlan_vlan_add(struct mlxsw_sp *mlxsw_sp,
if (!flag_untagged)
return 0;
 
-   err = mlxsw_sp_bridge_8021q_vxlan_join(bridge_device, vxlan_dev, vid,
-  extack);
+   err = bridge_device->ops->vxlan_join(bridge_device, vxlan_dev, vid, extack);
if (err)
goto err_vxlan_join;
 
return 0;
 
 err_vxlan_join:
-   mlxsw_sp_bridge_8021q_vxlan_join(bridge_device, vxlan_dev, old_vid,
-NULL);
+   bridge_device->ops->vxlan_join(bridge_device, vxlan_dev, old_vid, NULL);
return err;
 }
 
-- 
2.28.0

[PATCH net-next 01/13] mlxsw: Use one enum for all registers that contain tunnel_port field

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

Currently SFN, TNUMT and TNPC registers use separate enums for
tunnel_port.

Create one enum with a neutral name and use it.
Remove the enums that are not currently required.

The next patches add two more registers that contain a tunnel_port field;
the new enum can be used for them as well.

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 32 ++-
 .../ethernet/mellanox/mlxsw/spectrum_nve.c|  2 +-
 .../mellanox/mlxsw/spectrum_nve_vxlan.c   |  2 +-
 3 files changed, 11 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 1077ed2046fe..0a3c5f89268c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -581,6 +581,13 @@ mlxsw_reg_sfd_uc_tunnel_pack(char *payload, int rec_index,
mlxsw_reg_sfd_uc_tunnel_protocol_set(payload, rec_index, proto);
 }
 
+enum mlxsw_reg_tunnel_port {
+   MLXSW_REG_TUNNEL_PORT_NVE,
+   MLXSW_REG_TUNNEL_PORT_VPLS,
+   MLXSW_REG_TUNNEL_PORT_FLEX_TUNNEL0,
+   MLXSW_REG_TUNNEL_PORT_FLEX_TUNNEL1,
+};
+
 /* SFN - Switch FDB Notification Register
  * ---
  * The switch provides notifications on newly learned FDB entries and
@@ -738,13 +745,6 @@ MLXSW_ITEM32_INDEXED(reg, sfn, uc_tunnel_protocol, MLXSW_REG_SFN_BASE_LEN, 27,
 MLXSW_ITEM32_INDEXED(reg, sfn, uc_tunnel_uip_lsb, MLXSW_REG_SFN_BASE_LEN, 0,
 24, MLXSW_REG_SFN_REC_LEN, 0x0C, false);
 
-enum mlxsw_reg_sfn_tunnel_port {
-   MLXSW_REG_SFN_TUNNEL_PORT_NVE,
-   MLXSW_REG_SFN_TUNNEL_PORT_VPLS,
-   MLXSW_REG_SFN_TUNNEL_FLEX_TUNNEL0,
-   MLXSW_REG_SFN_TUNNEL_FLEX_TUNNEL1,
-};
-
 /* reg_sfn_uc_tunnel_port
  * Tunnel port.
  * Reserved on Spectrum.
@@ -10507,13 +10507,6 @@ enum mlxsw_reg_tnumt_record_type {
  */
 MLXSW_ITEM32(reg, tnumt, record_type, 0x00, 28, 4);
 
-enum mlxsw_reg_tnumt_tunnel_port {
-   MLXSW_REG_TNUMT_TUNNEL_PORT_NVE,
-   MLXSW_REG_TNUMT_TUNNEL_PORT_VPLS,
-   MLXSW_REG_TNUMT_TUNNEL_FLEX_TUNNEL0,
-   MLXSW_REG_TNUMT_TUNNEL_FLEX_TUNNEL1,
-};
-
 /* reg_tnumt_tunnel_port
  * Tunnel port.
  * Access: RW
@@ -10561,7 +10554,7 @@ MLXSW_ITEM32_INDEXED(reg, tnumt, udip_ptr, 0x0C, 0, 24, 0x04, 0x00, false);
 
 static inline void mlxsw_reg_tnumt_pack(char *payload,
enum mlxsw_reg_tnumt_record_type type,
-   enum mlxsw_reg_tnumt_tunnel_port tport,
+   enum mlxsw_reg_tunnel_port tport,
u32 underlay_mc_ptr, bool vnext,
u32 next_underlay_mc_ptr,
u8 record_size)
@@ -10725,13 +10718,6 @@ static inline void mlxsw_reg_tndem_pack(char *payload, u8 underlay_ecn,
 
 MLXSW_REG_DEFINE(tnpc, MLXSW_REG_TNPC_ID, MLXSW_REG_TNPC_LEN);
 
-enum mlxsw_reg_tnpc_tunnel_port {
-   MLXSW_REG_TNPC_TUNNEL_PORT_NVE,
-   MLXSW_REG_TNPC_TUNNEL_PORT_VPLS,
-   MLXSW_REG_TNPC_TUNNEL_FLEX_TUNNEL0,
-   MLXSW_REG_TNPC_TUNNEL_FLEX_TUNNEL1,
-};
-
 /* reg_tnpc_tunnel_port
  * Tunnel port.
  * Access: Index
@@ -10751,7 +10737,7 @@ MLXSW_ITEM32(reg, tnpc, learn_enable_v6, 0x04, 1, 1);
 MLXSW_ITEM32(reg, tnpc, learn_enable_v4, 0x04, 0, 1);
 
 static inline void mlxsw_reg_tnpc_pack(char *payload,
-  enum mlxsw_reg_tnpc_tunnel_port tport,
+  enum mlxsw_reg_tunnel_port tport,
   bool learn_enable)
 {
MLXSW_REG_ZERO(tnpc, payload);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index 54d3e7dcd303..ed0d334b5fd1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -368,7 +368,7 @@ mlxsw_sp_nve_mc_record_refresh(struct mlxsw_sp_nve_mc_record *mc_record)
next_valid = true;
}
 
-   mlxsw_reg_tnumt_pack(tnumt_pl, type, MLXSW_REG_TNUMT_TUNNEL_PORT_NVE,
+   mlxsw_reg_tnumt_pack(tnumt_pl, type, MLXSW_REG_TUNNEL_PORT_NVE,
 mc_record->kvdl_index, next_valid,
 next_kvdl_index, mc_record->num_entries);
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index 05517c7feaa5..e9bff13ec264 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -299,7 +299,7 @@ static bool mlxsw_sp2_nve_vxlan_learning_set(struct mlxsw_sp *mlxsw_sp,
 {
char tnpc_pl[MLXSW_REG_TNPC_LEN];
 
-   mlxsw_reg_tnpc_pack(tnpc_pl, MLXSW_REG_TNPC_TUNNEL_PORT_NVE,
+   mlxsw_reg_tnpc_pack(tnpc_pl, M

[PATCH net-next 11/13] mlxsw: spectrum_switchdev: Allow joining VxLAN to 802.1ad bridge

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

The previous patches added support for a VxLAN device enslaved to an
802.1ad bridge in the Spectrum-2 ASIC and vetoed it in Spectrum-1.

Remove the unconditional veto in mlxsw_sp_bridge_8021ad_vxlan_join() and
join with ETH_P_8021AD instead.

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 73290f71eb9c..cea42f6ed89b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2347,8 +2347,8 @@ mlxsw_sp_bridge_8021ad_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
  const struct net_device *vxlan_dev, u16 vid,
  struct netlink_ext_ack *extack)
 {
-   NL_SET_ERR_MSG_MOD(extack, "VXLAN is not supported with 802.1ad");
-   return -EOPNOTSUPP;
+   return mlxsw_sp_bridge_vlan_aware_vxlan_join(bridge_device, vxlan_dev,
+vid, ETH_P_8021AD, extack);
 }
 
 static const struct mlxsw_sp_bridge_ops mlxsw_sp_bridge_8021ad_ops = {
-- 
2.28.0

[PATCH net-next 08/13] mlxsw: spectrum_nve_vxlan: Add support for Q-in-VNI for Spectrum-2 ASIC

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

On Spectrum-2, the default setting is not to push a VLAN to the
decapsulated packet. This is controlled by SPVTR.ipvid_mode.
Set SPVTR.ipvid_mode to always push a VLAN.
Without this setting, Spectrum-2 uses the VLAN tag of the decapsulated
packet for bridging.

In addition, set the SPVID register to use the EtherType saved in
mlxsw_sp_nve_config when a VLAN is pushed for the NVE tunnel.
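
The two configuration steps, and the need to undo the first one if the
second fails, can be sketched as follows. This is a simplified model
with a fake in-memory "hardware" struct instead of register writes;
struct fake_hw, its fail_ethertype_write test hook and the hw_*()
helpers are hypothetical names introduced for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum ipvid_mode {
	IPVID_MODE_IEEE_COMPLIANT_PVID,		/* default */
	IPVID_MODE_PUSH_VLAN_FOR_UNTAGGED_PACKET,
	IPVID_MODE_ALWAYS_PUSH_VLAN,
};

/* Fake state standing in for the tunnel port's SPVTR/SPVID settings. */
struct fake_hw {
	enum ipvid_mode ipvid_mode;
	uint16_t ethertype;
	bool fail_ethertype_write;	/* test hook: simulate a write error */
};

int hw_set_ipvid_mode(struct fake_hw *hw, enum ipvid_mode mode)
{
	hw->ipvid_mode = mode;
	return 0;
}

int hw_set_decap_ethertype(struct fake_hw *hw, uint16_t ethertype)
{
	if (hw->fail_ethertype_write)
		return -EIO;
	hw->ethertype = ethertype;
	return 0;
}

/* Step 1: always push a VLAN on decapsulated packets.
 * Step 2: choose the EtherType of the pushed VLAN.
 * If step 2 fails, step 1 is rolled back to the default mode.
 */
int nve_config_set(struct fake_hw *hw, uint16_t ethertype)
{
	int err;

	err = hw_set_ipvid_mode(hw, IPVID_MODE_ALWAYS_PUSH_VLAN);
	if (err)
		return err;

	err = hw_set_decap_ethertype(hw, ethertype);
	if (err)
		goto err_decap_ethertype_set;

	return 0;

err_decap_ethertype_set:
	hw_set_ipvid_mode(hw, IPVID_MODE_IEEE_COMPLIANT_PVID);
	return err;
}
```

Unwinding in reverse order of setup keeps the device in its default
state whenever the configuration as a whole does not succeed.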

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../mellanox/mlxsw/spectrum_nve_vxlan.c   | 42 +++
 1 file changed, 42 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index f9a48a0109ff..b586c8f34d49 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -305,11 +305,30 @@ static bool mlxsw_sp2_nve_vxlan_learning_set(struct mlxsw_sp *mlxsw_sp,
return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tnpc), tnpc_pl);
 }
 
+static int
+mlxsw_sp2_nve_decap_ethertype_set(struct mlxsw_sp *mlxsw_sp, u16 ethertype)
+{
+   char spvid_pl[MLXSW_REG_SPVID_LEN] = {};
+   u8 sver_type;
+   int err;
+
+   mlxsw_reg_spvid_tport_set(spvid_pl, true);
+   mlxsw_reg_spvid_local_port_set(spvid_pl,
+  MLXSW_REG_TUNNEL_PORT_NVE);
+   err = mlxsw_sp_ethtype_to_sver_type(ethertype, &sver_type);
+   if (err)
+   return err;
+
+   mlxsw_reg_spvid_et_vlan_set(spvid_pl, sver_type);
+   return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(spvid), spvid_pl);
+}
+
 static int
 mlxsw_sp2_nve_vxlan_config_set(struct mlxsw_sp *mlxsw_sp,
   const struct mlxsw_sp_nve_config *config)
 {
char tngcr_pl[MLXSW_REG_TNGCR_LEN];
+   char spvtr_pl[MLXSW_REG_SPVTR_LEN];
u16 ul_rif_index;
int err;
 
@@ -330,8 +349,25 @@ mlxsw_sp2_nve_vxlan_config_set(struct mlxsw_sp *mlxsw_sp,
if (err)
goto err_tngcr_write;
 
+   mlxsw_reg_spvtr_pack(spvtr_pl, true, MLXSW_REG_TUNNEL_PORT_NVE,
+MLXSW_REG_SPVTR_IPVID_MODE_ALWAYS_PUSH_VLAN);
+   err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(spvtr), spvtr_pl);
+   if (err)
+   goto err_spvtr_write;
+
+   err = mlxsw_sp2_nve_decap_ethertype_set(mlxsw_sp, config->ethertype);
+   if (err)
+   goto err_decap_ethertype_set;
+
return 0;
 
+err_decap_ethertype_set:
+   mlxsw_reg_spvtr_pack(spvtr_pl, true, MLXSW_REG_TUNNEL_PORT_NVE,
+MLXSW_REG_SPVTR_IPVID_MODE_IEEE_COMPLIANT_PVID);
+   mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(spvtr), spvtr_pl);
+err_spvtr_write:
+   mlxsw_reg_tngcr_pack(tngcr_pl, MLXSW_REG_TNGCR_TYPE_VXLAN, false, 0);
+   mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tngcr), tngcr_pl);
 err_tngcr_write:
mlxsw_sp2_nve_vxlan_learning_set(mlxsw_sp, false);
 err_vxlan_learning_set:
@@ -341,8 +377,14 @@ mlxsw_sp2_nve_vxlan_config_set(struct mlxsw_sp *mlxsw_sp,
 
 static void mlxsw_sp2_nve_vxlan_config_clear(struct mlxsw_sp *mlxsw_sp)
 {
+   char spvtr_pl[MLXSW_REG_SPVTR_LEN];
char tngcr_pl[MLXSW_REG_TNGCR_LEN];
 
+   /* Set default EtherType */
+   mlxsw_sp2_nve_decap_ethertype_set(mlxsw_sp, ETH_P_8021Q);
+   mlxsw_reg_spvtr_pack(spvtr_pl, true, MLXSW_REG_TUNNEL_PORT_NVE,
+MLXSW_REG_SPVTR_IPVID_MODE_IEEE_COMPLIANT_PVID);
+   mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(spvtr), spvtr_pl);
mlxsw_reg_tngcr_pack(tngcr_pl, MLXSW_REG_TNGCR_TYPE_VXLAN, false, 0);
mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tngcr), tngcr_pl);
mlxsw_sp2_nve_vxlan_learning_set(mlxsw_sp, false);
-- 
2.28.0

[PATCH net-next 10/13] mlxsw: Veto Q-in-VNI for Spectrum-1 ASIC

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

The implementation of Q-in-VNI differs between ASIC types, and this set
adds support only for Spectrum-2.

Return an error when trying to create a VxLAN device and enslave it to an
802.1ad bridge in Spectrum-1.
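
The veto can be pictured as a thin ASIC-specific wrapper around the
common offload check, as in the sketch below. The types and function
names are simplified stand-ins (hypothetical, not the driver's
definitions), the common check is stubbed to always succeed, and the
error string is passed back through a plain pointer instead of netlink
extack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q	0x8100
#define ETH_P_8021AD	0x88A8

struct nve_params {
	uint16_t ethertype;
};

/* Common checks shared by all ASICs; stubbed out for the sketch. */
bool nve_vxlan_can_offload(const struct nve_params *params,
			   const char **errmsg)
{
	(void)params;
	(void)errmsg;
	return true;
}

/* Spectrum-1 variant: reject 802.1ad first, then fall back to the
 * common checks.
 */
bool sp1_nve_vxlan_can_offload(const struct nve_params *params,
			       const char **errmsg)
{
	if (params->ethertype == ETH_P_8021AD) {
		*errmsg = "VxLAN: 802.1ad bridge is not supported with VxLAN";
		return false;
	}

	return nve_vxlan_can_offload(params, errmsg);
}
```

Only the Spectrum-1 ops table would point at the wrapper, so other ASICs
keep using the common check unchanged.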

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../net/ethernet/mellanox/mlxsw/spectrum_nve.c |  2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_nve.h |  2 +-
 .../mellanox/mlxsw/spectrum_nve_vxlan.c| 18 +++---
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index adf499665f87..e5ec595593f4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -798,7 +798,7 @@ int mlxsw_sp_nve_fid_enable(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fid *fid,
 
ops = nve->nve_ops_arr[params->type];
 
-   if (!ops->can_offload(nve, params->dev, extack))
+   if (!ops->can_offload(nve, params, extack))
return -EINVAL;
 
memset(&config, 0, sizeof(config));
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
index 68bd9422be2a..2796d3659979 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
@@ -36,7 +36,7 @@ struct mlxsw_sp_nve {
 struct mlxsw_sp_nve_ops {
enum mlxsw_sp_nve_type type;
bool (*can_offload)(const struct mlxsw_sp_nve *nve,
-   const struct net_device *dev,
+   const struct mlxsw_sp_nve_params *params,
struct netlink_ext_ack *extack);
void (*nve_config)(const struct mlxsw_sp_nve *nve,
   const struct mlxsw_sp_nve_params *params,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index b586c8f34d49..3e2bb22e9ca6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -22,10 +22,10 @@
 VXLAN_F_LEARN)
 
 static bool mlxsw_sp_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
-  const struct net_device *dev,
+  const struct mlxsw_sp_nve_params *params,
   struct netlink_ext_ack *extack)
 {
-   struct vxlan_dev *vxlan = netdev_priv(dev);
+   struct vxlan_dev *vxlan = netdev_priv(params->dev);
struct vxlan_config *cfg = &vxlan->cfg;
 
if (cfg->saddr.sa.sa_family != AF_INET) {
@@ -86,6 +86,18 @@ static bool mlxsw_sp_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
return true;
 }
 
+static bool mlxsw_sp1_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
+   const struct mlxsw_sp_nve_params *params,
+   struct netlink_ext_ack *extack)
+{
+   if (params->ethertype == ETH_P_8021AD) {
+   NL_SET_ERR_MSG_MOD(extack, "VxLAN: 802.1ad bridge is not supported with VxLAN");
+   return false;
+   }
+
+   return mlxsw_sp_nve_vxlan_can_offload(nve, params, extack);
+}
+
 static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve,
  const struct mlxsw_sp_nve_params *params,
  struct mlxsw_sp_nve_config *config)
@@ -287,7 +299,7 @@ mlxsw_sp_nve_vxlan_clear_offload(const struct net_device *nve_dev, __be32 vni)
 
 const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops = {
.type   = MLXSW_SP_NVE_TYPE_VXLAN,
-   .can_offload= mlxsw_sp_nve_vxlan_can_offload,
+   .can_offload= mlxsw_sp1_nve_vxlan_can_offload,
.nve_config = mlxsw_sp_nve_vxlan_config,
.init   = mlxsw_sp1_nve_vxlan_init,
.fini   = mlxsw_sp1_nve_vxlan_fini,
-- 
2.28.0

[PATCH net-next 13/13] selftests: mlxsw: Add Q-in-VNI veto tests

2020-12-08 Thread Ido Schimmel

From: Amit Cohen 

Add tests to ensure that the forbidden and unsupported cases are indeed
vetoed by the mlxsw driver.

Signed-off-by: Amit Cohen 
Reviewed-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../net/mlxsw/spectrum-2/q_in_vni_veto.sh | 77 +++
 .../net/mlxsw/spectrum/q_in_vni_veto.sh   | 66 
 2 files changed, 143 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh

diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh
new file mode 100755
index ..0231205a7147
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/q_in_vni_veto.sh
@@ -0,0 +1,77 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+lib_dir=$(dirname $0)/../../../../net/forwarding
+
+VXPORT=4789
+
+ALL_TESTS="
+   create_dot1d_and_dot1ad_vxlans
+"
+NUM_NETIFS=2
+source $lib_dir/lib.sh
+
+setup_prepare()
+{
+   swp1=${NETIFS[p1]}
+   swp2=${NETIFS[p2]}
+
+   ip link set dev $swp1 up
+   ip link set dev $swp2 up
+}
+
+cleanup()
+{
+   pre_cleanup
+
+   ip link set dev $swp2 down
+   ip link set dev $swp1 down
+}
+
+create_dot1d_and_dot1ad_vxlans()
+{
+   RET=0
+
+   ip link add dev br0 type bridge vlan_filtering 1 vlan_protocol 802.1ad \
+   vlan_default_pvid 0 mcast_snooping 0
+   ip link set dev br0 up
+
+   ip link add name vx100 type vxlan id 1000 local 192.0.2.17 dstport \
+   "$VXPORT" nolearning noudpcsum tos inherit ttl 100
+   ip link set dev vx100 up
+
+   ip link set dev $swp1 master br0
+   ip link set dev vx100 master br0
+   bridge vlan add vid 100 dev vx100 pvid untagged
+
+   ip link add dev br1 type bridge vlan_filtering 0 mcast_snooping 0
+   ip link set dev br1 up
+
+   ip link add name vx200 type vxlan id 2000 local 192.0.2.17 dstport \
+   "$VXPORT" nolearning noudpcsum tos inherit ttl 100
+   ip link set dev vx200 up
+
+   ip link set dev $swp2 master br1
+   ip link set dev vx200 master br1 2>/dev/null
+   check_fail $? "802.1d and 802.1ad VxLANs at the same time not rejected"
+
+   ip link set dev vx200 master br1 2>&1 >/dev/null \
+   | grep -q mlxsw_spectrum
+   check_err $? "802.1d and 802.1ad VxLANs at the same time rejected without extack"
+
+   log_test "create 802.1d and 802.1ad VxLANs"
+
+   ip link del dev vx200
+   ip link del dev br1
+   ip link del dev vx100
+   ip link del dev br0
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh
new file mode 100755
index ..f0443b1b05b9
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum/q_in_vni_veto.sh
@@ -0,0 +1,66 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+lib_dir=$(dirname $0)/../../../../net/forwarding
+
+VXPORT=4789
+
+ALL_TESTS="
+   create_vxlan_on_top_of_8021ad_bridge
+"
+NUM_NETIFS=2
+source $lib_dir/lib.sh
+
+setup_prepare()
+{
+   swp1=${NETIFS[p1]}
+   swp2=${NETIFS[p2]}
+
+   ip link set dev $swp1 up
+   ip link set dev $swp2 up
+}
+
+cleanup()
+{
+   pre_cleanup
+
+   ip link set dev $swp2 down
+   ip link set dev $swp1 down
+}
+
+create_vxlan_on_top_of_8021ad_bridge()
+{
+   RET=0
+
+   ip link add dev br0 type bridge vlan_filtering 1 vlan_protocol 802.1ad \
+   vlan_default_pvid 0 mcast_snooping 0
+   ip link set dev br0 up
+
+   ip link add name vx100 type vxlan id 1000 local 192.0.2.17 dstport \
+   "$VXPORT" nolearning noudpcsum tos inherit ttl 100
+   ip link set dev vx100 up
+
+   ip link set dev $swp1 master br0
+   ip link set dev vx100 master br0
+
+   bridge vlan add vid 100 dev vx100 pvid untagged 2>/dev/null
+   check_fail $? "802.1ad bridge with VxLAN in Spectrum-1 not rejected"
+
+   bridge vlan add vid 100 dev vx100 pvid untagged 2>&1 >/dev/null \
+   | grep -q mlxsw_spectrum
+   check_err $? "802.1ad bridge with VxLAN in Spectrum-1 rejected without extack"
+
+   log_test "create VxLAN on top of 802.1ad bridge"
+
+   ip link del dev vx100
+   ip link del dev br0
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.28.0

[PATCH net-next 12/13] selftests: forwarding: Add Q-in-VNI test

2020-12-08 Thread Ido Schimmel

From: Petr Machata 

Add test to check Q-in-VNI traffic.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
 .../selftests/net/forwarding/q_in_vni.sh  | 347 ++
 1 file changed, 347 insertions(+)
 create mode 100755 tools/testing/selftests/net/forwarding/q_in_vni.sh

diff --git a/tools/testing/selftests/net/forwarding/q_in_vni.sh b/tools/testing/selftests/net/forwarding/q_in_vni.sh
new file mode 100755
index ..4c50c0234bce
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/q_in_vni.sh
@@ -0,0 +1,347 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +-----------------------+                          +------------------------+
+# | H1 (vrf)              |                          | H2 (vrf)               |
+# |  + $h1.10             |                          |  + $h2.10              |
+# |  | 192.0.2.1/28       |                          |  | 192.0.2.2/28        |
+# |  |                    |                          |  |                     |
+# |  | + $h1.20           |                          |  | + $h2.20            |
+# |  \ | 198.51.100.1/24  |                          |  \ | 198.51.100.2/24   |
+# |   \|                  |                          |   \|                   |
+# |    + $h1              |                          |    + $h2               |
+# +----|------------------+                          +----|-------------------+
+#      |                                                  |
+# +----|--------------------------------------------------|-------------------+
+# | SW |                                                  |                   |
+# | +--|--------------------------------------------------|-----------------+ |
+# | |  + $swp1                   BR1 (802.1ad)            + $swp2           | |
+# | |    vid 100 pvid untagged                              vid 100 pvid    | |
+# | |                                                           untagged    | |
+# | |  + vx100 (vxlan)                                                      | |
+# | |    local 192.0.2.17                                                   | |
+# | |    remote 192.0.2.34 192.0.2.50                                       | |
+# | |    id 1000 dstport $VXPORT                                            | |
+# | |    vid 100 pvid untagged                                              | |
+# | +-----------------------------------------------------------------------+ |
+# |                                                                           |
+# |  192.0.2.32/28 via 192.0.2.18                                             |
+# |  192.0.2.48/28 via 192.0.2.18                                             |
+# |                                                                           |
+# |    + $rp1                                                                 |
+# |    | 192.0.2.17/28                                                        |
+# +----|----------------------------------------------------------------------+
+#      |
+# +----|--------------------------------------------------------+
+# |    |                                             VRP2 (vrf) |
+# |    + $rp2                                                   |
+# |      192.0.2.18/28                                          |
+# |                                                             |   (maybe) HW
+# =============================================================================
+# |                                                             |  (likely) SW
+# |    + v1 (veth)                             + v3 (veth)      |
+# |    | 192.0.2.33/28                         | 192.0.2.49/28  |
+# +----|---------------------------------------|----------------+
+#      |                                       |
+# +----|------------------------------+   +----|------------------------------+
+# |    + v2 (veth)        NS1 (netns) |   |    + v4 (veth)        NS2 (netns) |
+# |      192.0.2.34/28                |   |      192.0.2.50/28                |
+# |                                   |   |                                   |
+# |   192.0.2.16/28 via 192.0.2.33    |   |   192.0.2.16/28 via 192.0.2.49    |
+# |   192.0.2.50/32 via 192.0.2.33    |   |   192.0.2.34/32 via 192.0.2.49    |
+# |                                   |   |                                   |
+# | +-------------------------------+ |   | +-------------------------------+ |
+# | | BR2 (802.1ad)                 | |   | | BR2 (802.1ad)                 | |
+# | |  + vx100 (vxlan)              | |   | |  + vx100 (vxlan)              | |
+# | |    local 192.0.2.34           | |   | |    local 192.0.2.50           | |
+# | |    remote 192.0.2.17          | |   | |    remote 192.0.2.17          | |
+# | |    remote 192.0.2.50          | |   | |    remote 192.0.2.34          | |
+# | |    id 1000 dstport $VXPORT    | |   | |    id 1000 dstport $VXPORT    | |
+# | |    vid 100 pvid untagged      | |   | |    vid 100 pvid untagged

Re: [PATCH v3 0/7] Improve s0ix flows for systems i219LM

2020-12-08 Thread Hans de Goede

Hi,

On 12/8/20 6:08 AM, Neftin, Sasha wrote:
> On 12/7/2020 17:41, Limonciello, Mario wrote:
>>> First of all thank you for working on this.
>>>
>>> I must say though that I don't like the approach taken here very
>>> much.
>>>
>>> This is not so much a criticism of this series as it is a criticism
>>> of the earlier decision to simply disable s0ix on all devices
>>> with the i219-LM + an active ME.
>>
>> I was not happy with that decision either as it did cause regressions
>> on all of the "named" Comet Lake laptops that were in the market at
>> the time.  The "unnamed" ones are not yet released, and I don't feel
>> it's fair to call it a regression on "unreleased" hardware.
>>
>>>
>>> AFAIK there was a perfectly acceptable patch to workaround those
>>> broken devices, which increased a timeout:
>>> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20200323191639.48826-1-aaron...@canonical.com/
>>>
>>> That patch was nacked because it increased the resume time
>>> *on broken devices*.
>>>
> Officially, CSME/ME is not POR for Linux and we have no interface to the ME.
> Nobody can tell how long (and why) the ME will hold the PHY access semaphore, and
> just increasing the resume time (ULP configure) won't solve the problem. This
> is not a reliable approach.
> I would agree users can add ME systems on their own responsibility.

It is not clear to me what you are trying to say here.

Are you saying that you insist on keeping the e1000e_check_me check and
thus needlessly penalizing 100s of laptop models with higher
power-consumption unless these 100s of laptops are added manually
to an allow list for this?

I'm sorry but that is simply unacceptable, the maintenance burden
of that is just way too high.

Testing on the models where the timeout issue was first hit has
shown that increasing the timeout does actually fix it on those
models. Sure in theory the ME on some buggy model could hold the
semaphore even longer, but then the right thing would be to
have a deny-list for s0ix where we can add those buggy models
(none of which we have encountered so far). Just like we have
deny-lists for buggy hw in other places in the kernel.

Maintaining an ever-growing allow list for the *theoretical*
case of encountering a model where things do not work with
the increased timeout is not workable and thus not an
acceptable solution.

The initial addition of the e1000e_check_me check instead
of just going with the confirmed fix of bumping the timeout
was already highly controversial and should IMHO never have
been done.

Combining this with an ever-growing allow-list to which every
new laptop model needs to be added separately + a new
"s0ix-enabled" ethtool flag, whose existence is basically
an admission that the allow-list approach is flawed, goes
from controversial to just plain not acceptable.

Regards,

Hans



>>> So it seems to me that we have a simple choice here:
>>>
>>> 1. Longer resume time on devices with an improperly configured ME
>>> 2. Higher power-consumption on all non-buggy devices
>>>
>>> Your patches 4-7 try to workaround 2. but IMHO those are just
>>> bandaids for getting the initial priorities *very* wrong.
>>
>> They were done based upon the discussion in that thread you linked and others.
>> If the owners of this driver feel it's possible/scalable to follow your proposal
>> I'm happy to resubmit a new v4 series with these sets of patches:
>>
>> 1) Fixup for the exception corner case referenced in this thread
>> 2) Patch 1 from this series that fixes cable connected case
>> 3) Increase the timeout (from your referenced link)
>> 4) Revert the ME disallow list
>>
>>>
>>> Instead of penalizing non-buggy devices with a higher power-consumption,
>>> we should default to penalizing the buggy devices with a higher
>>> resume time. And if it is decided that the higher resume time is
>>> a worse problem than the higher power-consumption, then there
>>> should be a list of broken devices and s0ix can be disabled on those.
>>
>> I'm perfectly happy either way, my primary goal is that Dell's notebooks and
>> desktops that meet the architectural and firmware guidelines for appropriate
>> low power consumption over s0ix are not penalized.
>>
>>>
>>> The current allow-list approach is simply never going to work well
>>> leading to too high power-consumption on countless devices.
>>> This is going to be an endless game of whack-a-mole and as
>>> such really is a bad idea.
>>
>> I envisioned that it would evolve over time.  For example if by the time Dell
>> finished shipping new CML models it was deemed that all the CML hardware was done
>> properly it could instead be an allow list of Dell + Comet Point.
>> If all of Tiger Lake are done properly 'maybe' by the time the ML ships maybe it
>> could be an allow list of Dell + CML or newer.
>>
>> But even if the heuristic changed - this particular configuration needs to be tested
>> on every single new model.  All of the notebooks that have a T

Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set

2020-12-08 Thread Daniel Borkmann


On 12/8/20 10:00 AM, Jesper Dangaard Brouer wrote:

On Mon, 07 Dec 2020 12:52:22 -0800
John Fastabend  wrote:


Use-case(1): Cloud-provider want to give customers (running VMs) ability
to load XDP program for DDoS protection (only), but don't want to allow
customer to use XDP_TX (that can implement LB or cheat their VM
isolation policy).


Not following. What interface do they want to allow loading on? If its
the VM interface then I don't see how it matters. From outside the
VM there should be no way to discover if its done in VM or in tc or
some other stack.

If its doing some onloading/offloading I would assume they need to
ensure the isolation, etc. is still maintained because you can't
let one VMs program work on other VMs packets safely.

So what did I miss, above doesn't make sense to me.


The Cloud-provider want to load customer provided BPF-code on the
physical Host-OS NIC (that support XDP).  The customer can get access
to a web-interface where they can write or upload their BPF-prog.

As multiple customers can upload BPF-progs, the Cloud-provider have to
write a BPF-prog dispatcher that runs these multiple program.  This
could be done via BPF tail-calls, or via Toke's libxdp[1], or via
devmap XDP-progs per egress port.

The Cloud-provider don't fully trust customers BPF-prog.   They already
pre-filtered traffic to the given VM, so they can allow customers
freedom to see traffic and do XDP_PASS and XDP_DROP.  They
administratively (via ethtool) want to disable the XDP_REDIRECT and
XDP_TX driver feature, as it can be used for violation their VM
isolation policy between customers.

Is the use-case more clear now?


I think we're talking about two different things. The use case as I understood
it in (1) mentioned to be able to disable XDP_TX for NICs that are deployed in
the VM. This would be a no-go as-is since that would mean my basic assumption
for attaching XDP progs is gone in that today return codes pass/drop/tx is
pretty much available everywhere on native XDP supported NICs. And if you've
tried it on major cloud providers like AWS or Azure that offer SRIOV-based
networking that works okay and further restricting this would mean breakage of
existing programs.

What you mean here is "offload" from guest to host which is a different use
case than what likely John and I read from your description in (1). Such program
should then be loaded via BPF offload API. Meaning, if offload is used and the
host is then configured to disallow XDP_TX for such requests from guests, then
these get rejected through such facility, but if the /same/ program was loaded as
regular native XDP where it's still running in the guest, then it must succeed.
These are two entirely different things.

It's not clear to me whether some ethtool XDP properties flag is the right place
to describe this (plus this needs to differ between offloaded / non-offloaded progs)
or whether this should be an implementation detail for things like virtio_net, e.g.
via virtio_has_feature(). Feels more like the latter to me which already has such
a facility in place.

[PATCH 0/1] net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)'

2020-12-08 Thread SeongJae Park

From: SeongJae Park 

On a few of our systems, I found frequent 'unshare(CLONE_NEWNET)' calls
make the number of active slab objects including 'sock_inode_cache' type
rapidly and continuously increase.  As a result, memory pressure occurs.

'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the
relevant memory objects.  They are asynchronously invoked by the work
queues and internally use 'rcu_barrier()' to ensure safe destructions.
'cleanup_net()' works in a batched manner in a single thread worker,
while 'fqdir_work_fn()' works for each 'fqdir_exit()' call in the
'system_wq'.

Therefore, 'fqdir_work_fn()' was called frequently under the workload and
made the contention for 'rcu_barrier()' high.  In more detail, the
global mutex, 'rcu_state.barrier_mutex' became the bottleneck.

I tried making 'fqdir_work_fn()' batched and confirmed it works.  The
following patch is for the change.  I think this is the right solution
for point fix of this issue, but someone might blame different parts.

1. User: Frequent 'unshare()' calls
From some point of view, such frequent 'unshare()' calls might seem only
insane.

2. Global mutex in 'rcu_barrier()'
Because of the global mutex, 'rcu_barrier()' callers could wait long
even after the callbacks started before the call finished.  Therefore,
similar issues could happen in other 'rcu_barrier()' usages.  Maybe we
can use some wait-queue-like mechanism to notify the waiters when the
desired time comes.

I personally believe applying the point fix for now and improving
'rcu_barrier()' in the long term makes sense.  If I'm missing
something or you have different opinions, please feel free to let me
know.

SeongJae Park (1):
  net/ipv4/inet_fragment: Batch fqdir destroy works

 include/net/inet_frag.h  |  2 +-
 net/ipv4/inet_fragment.c | 28 
 2 files changed, 21 insertions(+), 9 deletions(-)

-- 
2.17.1

[PATCH 1/1] net/ipv4/inet_fragment: Batch fqdir destroy works

2020-12-08 Thread SeongJae Park

From: SeongJae Park 

In 'fqdir_exit()', a work for destruction of the 'fqdir' is enqueued.
The work function, 'fqdir_work_fn()', calls 'rcu_barrier()'.  In case of
intensive 'fqdir_exit()' (e.g., frequent 'unshare(CLONE_NEWNET)'
system calls), this increased contention could result in unacceptably
high latency of 'rcu_barrier()'.  This commit avoids such contention by
doing the destruction in a batched manner, similar to that of
'cleanup_net()'.

Signed-off-by: SeongJae Park 
---
 include/net/inet_frag.h  |  2 +-
 net/ipv4/inet_fragment.c | 28 
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index bac79e817776..558893d8810c 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -20,7 +20,7 @@ struct fqdir {
 
/* Keep atomic mem on separate cachelines in structs that include it */
atomic_long_t   mem cacheline_aligned_in_smp;
-   struct work_struct  destroy_work;
+   struct llist_node   destroy_list;
 };
 
 /**
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 10d31733297d..796b559137c5 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -145,12 +145,19 @@ static void inet_frags_free_cb(void *ptr, void *arg)
inet_frag_destroy(fq);
 }
 
+static LLIST_HEAD(destroy_list);
+
 static void fqdir_work_fn(struct work_struct *work)
 {
-   struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work);
-   struct inet_frags *f = fqdir->f;
+   struct llist_node *kill_list;
+   struct fqdir *fqdir;
+   struct inet_frags *f;
+
+   /* Atomically snapshot the list of fqdirs to destroy */
+   kill_list = llist_del_all(&destroy_list);
 
-   rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
+   llist_for_each_entry(fqdir, kill_list, destroy_list)
+   rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
 
/* We need to make sure all ongoing call_rcu(..., inet_frag_destroy_rcu)
 * have completed, since they need to dereference fqdir.
@@ -158,10 +165,13 @@ static void fqdir_work_fn(struct work_struct *work)
 */
rcu_barrier();
 
-   if (refcount_dec_and_test(&f->refcnt))
-   complete(&f->completion);
+   llist_for_each_entry(fqdir, kill_list, destroy_list) {
+   f = fqdir->f;
+   if (refcount_dec_and_test(&f->refcnt))
+   complete(&f->completion);
 
-   kfree(fqdir);
+   kfree(fqdir);
+   }
 }
 
 int fqdir_init(struct fqdir **fqdirp, struct inet_frags *f, struct net *net)
@@ -184,10 +194,12 @@ int fqdir_init(struct fqdir **fqdirp, struct inet_frags *f, struct net *net)
 }
 EXPORT_SYMBOL(fqdir_init);
 
+static DECLARE_WORK(fqdir_destroy_work, fqdir_work_fn);
+
 void fqdir_exit(struct fqdir *fqdir)
 {
-   INIT_WORK(&fqdir->destroy_work, fqdir_work_fn);
-   queue_work(system_wq, &fqdir->destroy_work);
+   if (llist_add(&fqdir->destroy_list, &destroy_list))
+   queue_work(system_wq, &fqdir_destroy_work);
 }
 EXPORT_SYMBOL(fqdir_exit);
 
-- 
2.17.1
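
The batching idea in the patch above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct node`, `struct item`, `list_add()`, `list_del_all()`, `item_exit()` and `destroy_worker()` are hypothetical stand-ins for the kernel's llist helpers and the fqdir code, C11 atomics replace the kernel primitives, and a counter stands in for 'rcu_barrier()'.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel's llist and fqdir. */
struct node {
	struct node *next;
};

struct item {
	int id;
	struct node destroy_list;
};

static _Atomic(struct node *) destroy_head;
static int works_queued;	/* how many times the worker was scheduled */
static int barriers_run;	/* how many expensive barriers were paid */

/* Push a node; return true if the list was empty before the push. */
static bool list_add(struct node *n)
{
	struct node *first = atomic_load(&destroy_head);

	assert(n != NULL);
	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak(&destroy_head, &first, n));

	return first == NULL;
}

/* Detach the whole list in one atomic step. */
static struct node *list_del_all(void)
{
	return atomic_exchange(&destroy_head, (struct node *)NULL);
}

static void item_exit(struct item *it)
{
	/* Only the push that makes the list non-empty schedules the worker. */
	if (list_add(&it->destroy_list))
		works_queued++;
}

/* Worker: one barrier for the whole batch; returns the number of items freed. */
static int destroy_worker(void)
{
	struct node *n = list_del_all();
	int freed = 0;

	barriers_run++;	/* stands in for rcu_barrier() */

	while (n) {
		/* Read the successor before freeing the current entry. */
		struct node *next = n->next;
		struct item *it = (struct item *)((char *)n -
				offsetof(struct item, destroy_list));

		free(it);
		freed++;
		n = next;
	}
	return freed;
}
```

Three `item_exit()` calls followed by one `destroy_worker()` run pay for a single barrier instead of three. The sketch reads the next pointer before freeing each entry; in kernel code, `llist_for_each_entry_safe()` exists for loops that free the entry they are visiting.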

Re: [PATCH v5 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame

2020-12-08 Thread Jesper Dangaard Brouer

On Mon, 07 Dec 2020 22:49:55 -0800
Saeed Mahameed  wrote:

> On Mon, 2020-12-07 at 19:16 -0800, Alexander Duyck wrote:
> > On Mon, Dec 7, 2020 at 3:03 PM Saeed Mahameed 
> > wrote:  
> > > On Mon, 2020-12-07 at 13:16 -0800, Alexander Duyck wrote:  
> > > > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi <  
> > > > lore...@kernel.org>  
> > > > wrote:  
> > > > > Introduce multi-buffer bit (mb) in xdp_frame/xdp_buffer data
> > > > > structure
> > > > > in order to specify if this is a linear buffer (mb = 0) or a
> > > > > multi-
> > > > > buffer
> > > > > frame (mb = 1). In the latter case the shared_info area at the
> > > > > end
> > > > > of the
> > > > > first buffer is been properly initialized to link together
> > > > > subsequent
> > > > > buffers.
> > > > > 
> > > > > Signed-off-by: Lorenzo Bianconi 
> > > > > ---
> > > > >  include/net/xdp.h | 8 ++--
> > > > >  net/core/xdp.c| 1 +
> > > > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > > > index 700ad5db7f5d..70559720ff44 100644
> > > > > --- a/include/net/xdp.h
> > > > > +++ b/include/net/xdp.h
> > > > > @@ -73,7 +73,8 @@ struct xdp_buff {
> > > > > void *data_hard_start;
> > > > > struct xdp_rxq_info *rxq;
> > > > > struct xdp_txq_info *txq;
> > > > > -   u32 frame_sz; /* frame size to deduce
> > > > > data_hard_end/reserved tailroom*/
> > > > > +   u32 frame_sz:31; /* frame size to deduce
> > > > > data_hard_end/reserved tailroom*/
> > > > > +   u32 mb:1; /* xdp non-linear buffer */
> > > > >  };
> > > > >   
> > > > 
> > > > If we are really going to do something like this I say we should
> > > > just
> > > > rip a swath of bits out instead of just grabbing one. We are
> > > > already
> > > > cutting the size down then we should just decide on the minimum
> > > > size
> > > > that is acceptable and just jump to that instead of just stealing
> > > > one
> > > > bit at a time. It looks like we already have differences between
> > > > the
> > > > size here and frame_size in xdp_frame.
> > > >   
> > > 
> > > +1
> > >   
> > > > If we have to steal a bit why not look at something like one of
> > > > the
> > > > lower 2/3 bits in rxq? You could then do the same thing using
> > > > dev_rx
> > > > in a similar fashion instead of stealing from a bit that is
> > > > likely to
> > > > be used in multiple spots and modifying like this adds extra
> > > > overhead
> > > > to?
> > > >   
> > > 
> > > What do you mean in rxq ? from the pointer ?  
> > 
> > Yeah, the pointers have a few bits that are guaranteed 0 and in my
> > mind reusing the lower bits from a 4 or 8 byte aligned pointer would
> > make more sense then stealing the upper bits from the size of the
> > frame.  
> 
> Ha, i can't imagine how accessing that pointer would look like ..
> is possible to define the pointer as a bit-field and just access it
> normally ? or do we need to fix it up every time we need to access it ?
> will gcc/static checkers complain about wrong pointer type ?

This is a pattern that is used all over the kernel.  Yes, it needs to
be fixed up every time we access it.  In this case, we don't want to
deploy this trick, for two reasons: (1) rxq is accessed by BPF
byte-code rewrite (which would also need to handle masking out the
bit), (2) this optimization is trading CPU cycles for saving space.
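
For readers unfamiliar with the trick being discussed, here is a minimal sketch of storing a flag in the low bit of an aligned pointer. The names (`struct rxq_info`, `tag_ptr()`, `untag_ptr()`, `ptr_flag()`) are made up for illustration and are not kernel API; the point is only that every access has to mask the bit out again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a pointed-to object with >= 4-byte alignment. */
struct rxq_info {
	uint32_t queue_index;
};

#define TAG_MASK ((uintptr_t)1)

/* Pack a flag into bit 0 of an aligned pointer. */
static uintptr_t tag_ptr(struct rxq_info *p, bool flag)
{
	/* Alignment guarantees that bit 0 of the address is zero. */
	assert(((uintptr_t)p & TAG_MASK) == 0);
	return (uintptr_t)p | (flag ? TAG_MASK : 0);
}

/* Every dereference must strip the tag first: this is the "fix-up" cost. */
static struct rxq_info *untag_ptr(uintptr_t v)
{
	return (struct rxq_info *)(v & ~TAG_MASK);
}

static bool ptr_flag(uintptr_t v)
{
	return (v & TAG_MASK) != 0;
}
```

The extra mask on each access is the CPU-cycles-for-space trade named in point (2), and any generated code that loads the pointer directly would need the same mask, as in point (1).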

IIRC Alexei has already pointed out that the change to struct xdp_buff
looks suboptimal.  Why don't you simply add a u8 with the info?

The general point is that struct xdp_buff layout is for fast access,
and struct xdp_frame is a state compressed version of xdp_buff.  (Still
room in xdp_buff is limited to 64 bytes - one cacheline, which is
rather close according to pahole)
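
A rough way to compare the two layouts, assuming a 64-bit (LP64) build: the structs below only mirror the member order of the pahole output for xdp_buff, with illustrative names, and `same_size_on_lp64()` is a hypothetical helper. Under that assumption a separate u8 lands in the existing 4 bytes of tail padding, so it costs no more space than the 31+1 bit-field split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the member order in the pahole output; names are illustrative. */
struct buff_plain {
	void *data, *data_end, *data_meta, *data_hard_start;
	void *rxq, *txq;
	uint32_t frame_sz;
};

/* Variant from the patch: steal one bit from frame_sz. */
struct buff_bitfield {
	void *data, *data_end, *data_meta, *data_hard_start;
	void *rxq, *txq;
	uint32_t frame_sz:31;
	uint32_t mb:1;
};

/* Variant suggested in the reply: a separate byte after frame_sz. */
struct buff_u8 {
	void *data, *data_end, *data_meta, *data_hard_start;
	void *rxq, *txq;
	uint32_t frame_sz;
	uint8_t mb;
};

/* Returns 1 if all three variants have the same size, 0 if not,
 * and -1 when the build is not 64-bit and the comparison does not apply. */
static int same_size_on_lp64(void)
{
	if (sizeof(void *) != 8)
		return -1;
	return sizeof(struct buff_u8) == sizeof(struct buff_plain) &&
	       sizeof(struct buff_bitfield) == sizeof(struct buff_plain);
}
```

The u8 variant also keeps `frame_sz` as a plain 32-bit load and store, with no masking.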

Thus, it is more okay to do these bit tricks in struct xdp_frame.  For
xdp_frame, it might be better to take some room/space from the member
'mem' (struct xdp_mem_info).  (Would it help later that multi-buffer
bit is officially part of struct xdp_mem_info, when later freeing the
memory backing the frame?)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

$ pahole -C xdp_buff
struct xdp_buff {
void * data; /* 0 8 */
void * data_end; /* 8 8 */
void * data_meta;/*16 8 */
void * data_hard_start;  /*24 8 */
struct xdp_rxq_info *  rxq;  /*32 8 */
struct xdp_txq_info *  txq;  /*40 8 */
u32frame_sz; /*48 4 */

/* size: 56, cachelines: 1, members: 7 */
/* padding: 4 */
/* last cacheline: 56 bytes */
};

$ pahole -C xdp_frame
struct xdp_frame {
void *

Re: [PATCH 2/7] net: batman-adv: remove unneeded MODULE_VERSION() usage

2020-12-08 Thread Sven Eckelmann

On Tuesday, 8 December 2020 08:48:56 CET Enrico Weigelt, metux IT consult wrote:
> > Is there some explanation besides an opinion? Some kind of goal which you want to
> > achieve with it maybe?
> 
> Just a cleanup. I've been under the impression that this version is just
> a relic from oot times.

There are various entities that like to use the distro kernel and
replace the batman-adv module with a backport from a newer kernel version.
Similar to what is done in OpenWrt for the wifi drivers.

> > At least for us it was an easy way to query the release cycle information via
> > batctl. Which made it easier for us to roughly figure out what a reporter/inquirer
> > was using - independent of whether he is using the in-kernel version
> > or a backported version.
> 
> Is the OOT scenario still valid ?

Since the backport is OOT - yes, it is still valid.

Kind regards,
Sven


Re: [PATCH RFC] ethernet: stmmac: clean up the code for release/suspend/resume function

2020-12-08 Thread Jisheng Zhang

On Mon,  7 Dec 2020 19:38:49 +0800 Joakim Zhang wrote:


> 
> commit 1c35cc9cf6a0 ("net: stmmac: remove redundant null check before clk_disable_unprepare()"),
> have not clean up check NULL clock parameter completely, this patch did it.
> 
> commit e8377e7a29efb ("net: stmmac: only call pmt() during suspend/resume if HW enables PMT"),
> after this patch, we use
> if (device_may_wakeup(priv->device) && priv->plat->pmt) check MAC wakeup
> if (device_may_wakeup(priv->device)) check PHY wakeup
> Add oneline comment for readability.
> 
> commit 77b2898394e3b ("net: stmmac: Speed down the PHY if WoL to save energy"),
> slow down phy speed when release net device under any condition.
> 
> Slightly adjust the order of the codes so that suspend/resume look more
> symmetrical, generally speaking they should appear symmetrically.
> 
> Signed-off-by: Joakim Zhang 
> ---
>  .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 +--
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index c33db79cdd0a..a46e865c4acc 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2908,8 +2908,7 @@ static int stmmac_release(struct net_device *dev)
> struct stmmac_priv *priv = netdev_priv(dev);
> u32 chan;
> 
> -   if (device_may_wakeup(priv->device))

This check is to prevent link speed down if the stmmac isn't a wakeup device.

> -   phylink_speed_down(priv->phylink, false);
> +   phylink_speed_down(priv->phylink, false);
> /* Stop and disconnect the PHY */
> phylink_stop(priv->phylink);
> phylink_disconnect_phy(priv->phylink);
> @@ -5183,6 +5182,7 @@ int stmmac_suspend(struct device *dev)
> } else {
> mutex_unlock(&priv->lock);
> rtnl_lock();
> +   /* For PHY wakeup case */
> if (device_may_wakeup(priv->device))
> phylink_speed_down(priv->phylink, false);
> phylink_stop(priv->phylink);
> @@ -5260,11 +5260,17 @@ int stmmac_resume(struct device *dev)
> /* enable the clk previously disabled */
> clk_prepare_enable(priv->plat->stmmac_clk);
> clk_prepare_enable(priv->plat->pclk);
> -   if (priv->plat->clk_ptp_ref)
> -   clk_prepare_enable(priv->plat->clk_ptp_ref);
> +   clk_prepare_enable(priv->plat->clk_ptp_ref);

I think these 3 lines of modifications can be a separate patch.

> /* reset the phy so that it's ready */
> if (priv->mii)
> stmmac_mdio_reset(priv->mii);
> +
> +   rtnl_lock();
> +   phylink_start(priv->phylink);
> +   /* We may have called phylink_speed_down before */
> +   if (device_may_wakeup(priv->device))
> +   phylink_speed_up(priv->phylink);
> +   rtnl_unlock();

This moves the phylink ops before the mac setup; I'm not sure whether this is safe.

> }
> 
> if (priv->plat->serdes_powerup) {
> @@ -5275,14 +5281,6 @@ int stmmac_resume(struct device *dev)
> return ret;
> }
> 
> -   if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> -   rtnl_lock();
> -   phylink_start(priv->phylink);
> -   /* We may have called phylink_speed_down before */
> -   phylink_speed_up(priv->phylink);
> -   rtnl_unlock();
> -   }
> -
> rtnl_lock();
> mutex_lock(&priv->lock);
> 
> --
> 2.17.1
>

Re: [PATCH v5 bpf-next 02/14] xdp: initialize xdp_buff mb bit to 0 in all XDP drivers

2020-12-08 Thread Lorenzo Bianconi

> On Mon, 2020-12-07 at 22:37 +0100, Maciej Fijalkowski wrote:
> > On Mon, Dec 07, 2020 at 01:15:00PM -0800, Alexander Duyck wrote:
> > > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi wrote:
> > > > Initialize multi-buffer bit (mb) to 0 in all XDP-capable drivers.
> > > > This is a preliminary patch to enable xdp multi-buffer support.
> > > > 
> > > > Signed-off-by: Lorenzo Bianconi 
> > > 
> > > I'm really not a fan of this design. Having to update every driver in
> > > order to initialize a field that was fragmented is a pain. At a
> > > minimum it seems like it might be time to consider introducing some
> > > sort of initializer function for this so that you can update things in
> > > one central place the next time you have to add a new field instead of
> > > having to update every individual driver that supports XDP. Otherwise
> > > this isn't going to scale going forward.
> >
> > Also, a good example of why this might be bothering for us is a fact that
> > in the meantime the dpaa driver got XDP support and this patch hasn't been
> > updated to include mb setting in that driver.
> > 
> something like
> init_xdp_buff(hard_start, headroom, len, frame_sz, rxq);
> 
> would work for most of the drivers.
> 

ack, agree. I will add init_xdp_buff() in v6.

Regards,
Lorenzo
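
A minimal sketch of what such a central initializer could look like, using the argument list proposed in the quoted text. Everything here is an assumption for illustration: `struct xdp_buff_sketch`, `struct rxq_sketch` and `init_xdp_buff_sketch()` are simplified userspace stand-ins, not the helper that will be posted in v6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct rxq_sketch {
	uint32_t queue_index;
};

struct xdp_buff_sketch {
	void *data;
	void *data_end;
	void *data_meta;
	void *data_hard_start;
	struct rxq_sketch *rxq;
	uint32_t frame_sz;
	uint8_t mb;
};

/* One place that sets every field, so a new member such as 'mb' is
 * initialized here once instead of in each driver. */
static void init_xdp_buff_sketch(struct xdp_buff_sketch *xdp, void *hard_start,
				 size_t headroom, size_t len,
				 uint32_t frame_sz, struct rxq_sketch *rxq)
{
	unsigned char *data = (unsigned char *)hard_start + headroom;

	assert(headroom + len <= frame_sz);

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + len;
	xdp->data_meta = data;	/* no metadata in this sketch */
	xdp->rxq = rxq;
	xdp->frame_sz = frame_sz;
	xdp->mb = 0;		/* linear buffer by default */
}
```

With a helper of this shape, a driver's receive path shrinks to a single call and a later field addition touches only the helper.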



Re: [PATCHv3 bpf-next] samples/bpf: add xdp program on egress for xdp_redirect_map

2020-12-08 Thread Jesper Dangaard Brouer

On Tue,  8 Dec 2020 16:18:56 +0800
Hangbin Liu  wrote:

> This patch add a xdp program on egress to show that we can modify
> the packet on egress. In this sample we will set the pkt's src
> mac to egress's mac address. The xdp_prog will be attached when
> -X option supplied.
> 
> Signed-off-by: Hangbin Liu 
> ---
> v3:
> a) modify the src mac address based on egress mac
> 
> v2:
> a) use pkt counter instead of IP ttl modification on egress program
> b) make the egress program selectable by option -X
> ---
>  samples/bpf/xdp_redirect_map_kern.c |  60 ++-
>  samples/bpf/xdp_redirect_map_user.c | 153 
>  2 files changed, 168 insertions(+), 45 deletions(-)
> 

[...]
> diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c
> index 31131b6e7782..19636045c8dc 100644
> --- a/samples/bpf/xdp_redirect_map_user.c
> +++ b/samples/bpf/xdp_redirect_map_user.c
> @@ -14,6 +14,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  
>  #include "bpf_util.h"
>  #include 
> @@ -21,7 +25,8 @@
>  
>  static int ifindex_in;
>  static int ifindex_out;
> -static bool ifindex_out_xdp_dummy_attached = true;
> +static bool ifindex_out_xdp_dummy_attached = false;
> +static bool xdp_devmap_attached = false;
>  static __u32 prog_id;
>  static __u32 dummy_prog_id;
>  
> @@ -83,6 +88,29 @@ static void poll_stats(int interval, int ifindex)
>   }
>  }
>  
> +static int get_mac_addr(unsigned int ifindex_out, void *mac_addr)
> +{
> + struct ifreq ifr;
> + char ifname[IF_NAMESIZE];
> + int fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);

I would have expected (like ethtool):
 fd = socket(AF_INET, SOCK_DGRAM, 0);

> + if (fd < 0)
> + return -1;
> +
> + if (!if_indextoname(ifindex_out, ifname))
> + return -1;
> +
> + strcpy(ifr.ifr_name, ifname);
> +
> + if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
> + return -1;
> +
> + memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char));
> + close(fd);
> +
> + return 0;
> +}
[...]
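
For comparison, a Linux-only sketch of the same helper that uses the socket() form suggested above and closes the descriptor on every path. It is written here as an illustration, not taken from the patch or from ethtool; the interface name is resolved before the socket is opened so that an invalid index fails without creating a descriptor.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Linux-only sketch; returns 0 on success, -1 on failure. */
static int get_mac_addr(unsigned int ifindex, unsigned char mac[6])
{
	char ifname[IF_NAMESIZE];
	struct ifreq ifr;
	int fd, ret = -1;

	/* Resolve the name first: nothing to clean up if this fails. */
	if (!if_indextoname(ifindex, ifname))
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

	if (ioctl(fd, SIOCGIFHWADDR, &ifr) == 0) {
		memcpy(mac, ifr.ifr_hwaddr.sa_data, 6);
		ret = 0;
	}

	/* Single exit path, so the socket is closed on success and failure. */
	close(fd);
	return ret;
}
```

Interface index 0 is never valid, so `get_mac_addr(0, mac)` returns -1 without opening a socket.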

> - /* Loading dummy XDP prog on out-device */
> - if (bpf_set_link_xdp_fd(ifindex_out, dummy_prog_fd,
> - (xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST)) < 0) {
> - printf("WARN: link set xdp fd failed on %d\n", ifindex_out);
> - ifindex_out_xdp_dummy_attached = false;
> - }
> + /* If -X supplied, load 2nd xdp prog on egress.
> +  * If not, just load dummy prog on egress.
> +  */

The dummy prog needs to be loaded, regardless of 2nd xdp prog on egress.


> + if (xdp_devmap_attached) {
> + unsigned char mac_addr[6];
>  
> - memset(&info, 0, sizeof(info));
> - ret = bpf_obj_get_info_by_fd(dummy_prog_fd, &info, &info_len);
> - if (ret) {
> - printf("can't get prog info - %s\n", strerror(errno));
> - return ret;
> + devmap_prog = bpf_object__find_program_by_title(obj, "xdp_devmap/map_prog");
> + if (!devmap_prog) {
> + printf("finding devmap_prog in obj file failed\n");
> + goto out;
> + }
> + devmap_prog_fd = bpf_program__fd(devmap_prog);
> + if (devmap_prog_fd < 0) {
> + printf("finding devmap_prog fd failed\n");
> + goto out;
> + }
> +
> + if (get_mac_addr(ifindex_out, mac_addr) < 0) {
> + printf("get interface %d mac failed\n", ifindex_out);
> + goto out;
> + }
> +
> + ret = bpf_map_update_elem(tx_mac_map_fd, &key, mac_addr, 0);
> + if (ret) {
> + perror("bpf_update_elem tx_mac_map_fd");
> + goto out;
> + }
> + } else if (ifindex_in != ifindex_out) {
> + dummy_prog = bpf_object__find_program_by_title(obj, "xdp_redirect_dummy");
> + if (!dummy_prog) {
> + printf("finding dummy_prog in obj file failed\n");
> + goto out;
> + }
> +
> + dummy_prog_fd = bpf_program__fd(dummy_prog);
> + if (dummy_prog_fd < 0) {
> + printf("find dummy_prog fd failed\n");
> + goto out;
> + }
> +
> + if (bpf_set_link_xdp_fd(ifindex_out, dummy_prog_fd,
> + (xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST)) == 0) {
> + ifindex_out_xdp_dummy_attached = true;
> + } else {
> + printf("WARN: link set xdp fd failed on %d\n", ifindex_out);
> + }
> +
> + memset(&info, 0, sizeof(info));
> + ret = bpf_obj_get_info_by_fd(dummy_prog_fd, &info, &info_len);
> + if (ret) {
> + printf("can't get prog info - %s\n", strerror(errno));
> + }
> +

RE: [PATCH RFC] ethernet: stmmac: clean up the code for release/suspend/resume function

2020-12-08 Thread Joakim Zhang


> -----Original Message-----
> From: Jisheng Zhang 
> Sent: 2020年12月8日 18:24
> To: Joakim Zhang 
> Cc: peppe.cavall...@st.com; alexandre.tor...@st.com;
> joab...@synopsys.com; da...@davemloft.net; k...@kernel.org;
> netdev@vger.kernel.org; dl-linux-imx 
> Subject: Re: [PATCH RFC] ethernet: stmmac: clean up the code for
> release/suspend/resume function
> 
> On Mon,  7 Dec 2020 19:38:49 +0800 Joakim Zhang wrote:
> 
> 
> >
> > commit 1c35cc9cf6a0 ("net: stmmac: remove redundant null check before
> > clk_disable_unprepare()"), have not clean up check NULL clock parameter
> > completely, this patch did it.
> >
> > commit e8377e7a29efb ("net: stmmac: only call pmt() during
> > suspend/resume if HW enables PMT"), after this patch, we use
> > if (device_may_wakeup(priv->device) && priv->plat->pmt) check MAC wakeup
> > if (device_may_wakeup(priv->device)) check PHY wakeup
> > Add oneline comment for readability.
> >
> > commit 77b2898394e3b ("net: stmmac: Speed down the PHY if WoL to save
> > energy"), slow down phy speed when release net device under any condition.
> >
> > Slightly adjust the order of the codes so that suspend/resume look
> > more symmetrical, generally speaking they should appear symmetrically.
> >
> > Signed-off-by: Joakim Zhang 
> > ---
> >  .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 +--
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index c33db79cdd0a..a46e865c4acc 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -2908,8 +2908,7 @@ static int stmmac_release(struct net_device *dev)
> > struct stmmac_priv *priv = netdev_priv(dev);
> > u32 chan;
> >
> > -   if (device_may_wakeup(priv->device))
> 
> This check is to prevent link speed down if the stmmac isn't a wakeup device.

When we invoke .ndo_stop, we bring the net device down. As I understand it, we can
speed down the PHY whether or not it is a wakeup device, since
invoking .ndo_open to bring the net device up re-configures the MAC and PHY.
Please point out if I have misunderstood something. Thanks.

> > -   phylink_speed_down(priv->phylink, false);
> > +   phylink_speed_down(priv->phylink, false);
> > /* Stop and disconnect the PHY */
> > phylink_stop(priv->phylink);
> > phylink_disconnect_phy(priv->phylink);
> > @@ -5183,6 +5182,7 @@ int stmmac_suspend(struct device *dev)
> > } else {
> > mutex_unlock(&priv->lock);
> > rtnl_lock();
> > +   /* For PHY wakeup case */
> > if (device_may_wakeup(priv->device))
> > phylink_speed_down(priv->phylink, false);
> > phylink_stop(priv->phylink);
> > @@ -5260,11 +5260,17 @@ int stmmac_resume(struct device *dev)
> > /* enable the clk previously disabled */
> > clk_prepare_enable(priv->plat->stmmac_clk);
> > clk_prepare_enable(priv->plat->pclk);
> > -   if (priv->plat->clk_ptp_ref)
> > -   clk_prepare_enable(priv->plat->clk_ptp_ref);
> > +   clk_prepare_enable(priv->plat->clk_ptp_ref);
> 
> I think these 3 line modifications can be a separate patch.

Yes, this is just an RFC to expose the issue.

> > /* reset the phy so that it's ready */
> > if (priv->mii)
> > stmmac_mdio_reset(priv->mii);
> > +
> > +   rtnl_lock();
> > +   phylink_start(priv->phylink);
> > +   /* We may have called phylink_speed_down before */
> > +   if (device_may_wakeup(priv->device))
> > +   phylink_speed_up(priv->phylink);
> > +   rtnl_unlock();
> 
> This is moving phylink op before mac setup, I'm not sure whether this is safe.

We encountered an issue that requires moving the phylink calls before the MAC
setup; please see the patch below.
https://www.spinics.net/lists/netdev/msg706458.html

We have not found any problems in testing. Is there any risk?

Best Regards,
Joakim Zhang
> > }
> >
> > if (priv->plat->serdes_powerup) {
> > @@ -5275,14 +5281,6 @@ int stmmac_resume(struct device *dev)
> > return ret;
> > }
> >
> > -   if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> > -   rtnl_lock();
> > -   phylink_start(priv->phylink);
> > -   /* We may have called phylink_speed_down before */
> > -   phylink_speed_up(priv->phylink);
> > -   rtnl_unlock();
> > -   }
> > -
> > rtnl_lock();
> > mutex_lock(&priv->lock);
> >
> > --
> > 2.17.1
> >

RE: [PATCH v4 2/6] igb: take vlan double header into account

2020-12-08 Thread Penigalapati, Sandeep

On Tue, Dec 01, 2020 at 09:58:52AM +0100, Jesper Dangaard Brouer wrote:
> > On Tue, 1 Dec 2020 08:23:23 +
> > "Penigalapati, Sandeep"  wrote:
> >
> > > Tested-by: Sandeep Penigalapati 
> >
> > Very happy that you are testing this.
> >
> > Have you also tested that samples/bpf/ xdp_redirect_cpu program works?
> 
> Hi Jesper,
> 
> I have tested the xdp routing example but it would be good if someone can
> double check this.
> 
> Best
> Sven
> 
Hi Jesper, Sven

I have tested xdp_redirect_cpu and it is working.

Thanks,
Sandeep
> >
> > --
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   LinkedIn: http://www.linkedin.com/in/brouer
> >

Re: [PATCH v5 bpf-next 03/14] xdp: add xdp_shared_info data structure

2020-12-08 Thread Lorenzo Bianconi

> On Mon, 2020-12-07 at 17:32 +0100, Lorenzo Bianconi wrote:
> > Introduce xdp_shared_info data structure to contain info about
> > "non-linear" xdp frame. xdp_shared_info will alias skb_shared_info
> > allowing to keep most of the frags in the same cache-line.
> > Introduce some xdp_shared_info helpers aligned to skb_frag* ones
> > 
> 
> is there or will be a more general purpose use to this xdp_shared_info
> ? other than hosting frags ?

I do not have other use-cases at the moment other than multi-buff but in
theory it is possible I guess.
The reason we introduced it is to have most of the frags in the first
shared_info cache-line to avoid cache-misses.

> 
> > Signed-off-by: Lorenzo Bianconi 
> > ---
> >  drivers/net/ethernet/marvell/mvneta.c | 62 +++
> > 
> >  include/net/xdp.h | 52 --
> >  2 files changed, 82 insertions(+), 32 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c
> > b/drivers/net/ethernet/marvell/mvneta.c
> > index 1e5b5c69685a..d635463609ad 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -2033,14 +2033,17 @@ int mvneta_rx_refill_queue(struct mvneta_port
> > *pp, struct mvneta_rx_queue *rxq)
> >  
> 
> [...]
> 
> >  static void
> > @@ -2278,7 +2281,7 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port
> > *pp,
> > struct mvneta_rx_desc *rx_desc,
> > struct mvneta_rx_queue *rxq,
> > struct xdp_buff *xdp, int *size,
> > -   struct skb_shared_info *xdp_sinfo,
> > +   struct xdp_shared_info *xdp_sinfo,
> > struct page *page)
> >  {
> > struct net_device *dev = pp->dev;
> > @@ -2301,13 +2304,13 @@ mvneta_swbm_add_rx_fragment(struct
> > mvneta_port *pp,
> > if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
> > skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo-
> > >nr_frags++];
> >  
> > -   skb_frag_off_set(frag, pp->rx_offset_correction);
> > -   skb_frag_size_set(frag, data_len);
> > -   __skb_frag_set_page(frag, page);
> > +   xdp_set_frag_offset(frag, pp->rx_offset_correction);
> > +   xdp_set_frag_size(frag, data_len);
> > +   xdp_set_frag_page(frag, page);
> >  
> 
> why three separate setters ? why not just one 
> xdp_set_frag(page, offset, size) ?

to be aligned with skb_frags helpers, but I guess we can have a single helper,
I do not have a strong opinion on it

> 
> > /* last fragment */
> > if (len == *size) {
> > -   struct skb_shared_info *sinfo;
> > +   struct xdp_shared_info *sinfo;
> >  
> > sinfo = xdp_get_shared_info_from_buff(xdp);
> > sinfo->nr_frags = xdp_sinfo->nr_frags;
> > @@ -2324,10 +2327,13 @@ static struct sk_buff *
> >  mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue
> > *rxq,
> >   struct xdp_buff *xdp, u32 desc_status)
> >  {

[...]

> >  
> > -static inline struct skb_shared_info *
> > +struct xdp_shared_info {
> 
> xdp_shared_info is a bad name, we need this to have a specific purpose 
> > xdp_frags should be the proper name, so people will think twice before
> adding weird bits to this so called shared_info.

I named the struct xdp_shared_info to recall skb_shared_info but I guess
xdp_frags is fine too. Agree?

> 
> > +   u16 nr_frags;
> > +   u16 data_length; /* paged area length */
> > +   skb_frag_t frags[MAX_SKB_FRAGS];
> 
> why MAX_SKB_FRAGS ? just use a flexible array member 
> skb_frag_t frags[]; 
> 
> and enforce size via the n_frags and on the construction of the
> tailroom preserved buffer, which is already being done.
> 
> this is a waste of space, at least by definition of the
> struct, in your use case you do:
> memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * num_frags);
> And the tailroom space was already preserved for a full skb_shinfo.
> so i don't see why you need this array to be of a fixed MAX_SKB_FRAGS
> size.

In order to avoid cache-misses, xdp_shared_info is built as a variable
on the mvneta_rx_swbm() stack and it is written to the "shared_info" area only
on the last fragment in mvneta_swbm_add_rx_fragment(). I used MAX_SKB_FRAGS to
be aligned with the skb_shared_info struct, but we can probably use an even
smaller value.
Another approach would be to define two different structs, e.g.

struct xdp_frag_metadata {
	u16 nr_frags;
	u16 data_length; /* paged area length */
};

struct xdp_frags {
	skb_frag_t frags[MAX_SKB_FRAGS];
};

and then define xdp_shared_info as

struct xdp_shared_info {
	struct xdp_frag_metadata meta;
	skb_frag_t frags[];
};

In this way we can probably optimize the space. What do you think?
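
A minimal userspace sketch of the layout proposed above, under stated
assumptions: frag_t is a stand-in for skb_frag_t (the real kernel type
differs), and xdp_shared_info_size(), xdp_shared_info_alloc() and
xdp_shared_info_add_frag() are hypothetical helpers, not kernel APIs. It only
illustrates how a flexible array member lets the storage follow the number of
frags actually reserved instead of MAX_SKB_FRAGS:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for skb_frag_t; illustrative only. */
typedef struct {
	void *page;
	uint32_t offset;
	uint32_t size;
} frag_t;

struct xdp_frag_metadata {
	uint16_t nr_frags;
	uint16_t data_length; /* paged area length */
};

struct xdp_shared_info {
	struct xdp_frag_metadata meta;
	frag_t frags[]; /* flexible array member, sized at allocation time */
};

/* Bytes needed for a shared info that can hold max_frags fragments. */
static size_t xdp_shared_info_size(size_t max_frags)
{
	return sizeof(struct xdp_shared_info) + max_frags * sizeof(frag_t);
}

/* Hypothetical helper: zeroed allocation with room for max_frags. */
static struct xdp_shared_info *xdp_shared_info_alloc(size_t max_frags)
{
	return calloc(1, xdp_shared_info_size(max_frags));
}

/* Append one fragment; returns 0 on success, -1 when capacity is used up. */
static int xdp_shared_info_add_frag(struct xdp_shared_info *si,
				    size_t max_frags, void *page,
				    uint32_t offset, uint32_t size)
{
	frag_t *frag;

	assert(si != NULL);
	if (si->meta.nr_frags >= max_frags)
		return -1;

	frag = &si->frags[si->meta.nr_frags++];
	frag->page = page;
	frag->offset = offset;
	frag->size = size;
	si->meta.data_length = (uint16_t)(si->meta.data_length + size);
	return 0;
}
```

The capacity is passed explicitly here because a flexible array member carries
no length of its own; in the driver that bound would come from however the
tailroom is reserved.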

> 
> > +};
> > +
> > +static inline struct xdp_shared_info *
> >  xdp_get_shared_info_from_buff(struct xdp_buff *x

Re: [PATCH 01/17] wil6210: wmi: Correct misnamed function parameter 'ptr_'

2020-12-08 Thread Lee Jones

On Wed, 02 Dec 2020, Kalle Valo wrote:

> Lee Jones  wrote:
> 
> > Fixes the following W=1 kernel build warning(s):
> > 
> >  drivers/net/wireless/ath/wil6210/wmi.c:279: warning: Function parameter or 
> > member 'ptr_' not described in 'wmi_buffer_block'
> >  drivers/net/wireless/ath/wil6210/wmi.c:279: warning: Excess function 
> > parameter 'ptr' description in 'wmi_buffer_block'
> > 
> > Cc: Maya Erez 
> > Cc: Kalle Valo 
> > Cc: "David S. Miller" 
> > Cc: Jakub Kicinski 
> > Cc: linux-wirel...@vger.kernel.org
> > Cc: wil6...@qti.qualcomm.com
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Lee Jones 
> 
> Failed to apply:
> 
> error: patch failed: drivers/net/wireless/ath/wil6210/wmi.c:262
> error: drivers/net/wireless/ath/wil6210/wmi.c: patch does not apply
> stg import: Diff does not apply cleanly
> 
> Patch set to Changes Requested.

That's so strange.

I just rebased my branch onto the latest -next with no issue.

I will re-submit after the merge-window closes.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCHv3 bpf-next] samples/bpf: add xdp program on egress for xdp_redirect_map

2020-12-08 Thread Hangbin Liu

On Tue, Dec 08, 2020 at 11:39:14AM +0100, Jesper Dangaard Brouer wrote:
> > +   /* If -X supplied, load 2nd xdp prog on egress.
> > +* If not, just load dummy prog on egress.
> > +*/
> 
> The dummy prog need to be loaded, regardless of 2nd xdp prog on egress.

Thanks for the reminder. Now I know why the packets are dropped when I run
perf tests on physical NICs.

Regards
Hangbin

Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support

2020-12-08 Thread Vladimir Oltean

Hi Tobias,

On Wed, Dec 02, 2020 at 10:13:54AM +0100, Tobias Waldekranz wrote:
> Monitor the following events and notify the driver when:
>
> - A DSA port joins/leaves a LAG.
> - A LAG, made up of DSA ports, joins/leaves a bridge.
> - A DSA port in a LAG is enabled/disabled (enabled meaning
>   "distributing" in 802.3ad LACP terms).
>
> Each LAG interface to which a DSA port is attached is represented by a
> `struct dsa_lag` which is globally reachable from the switch tree and
> from each associated port.
>
> When a LAG joins a bridge, the DSA subsystem will treat that as each
> individual port joining the bridge. The driver may look at the port's
> LAG pointer to see if it is associated with any LAG, if that is
> required. This is analogous to how switchdev events are replicated out
> to all lower devices when reaching e.g. a LAG.
>
> Signed-off-by: Tobias Waldekranz 
> ---
>
> +struct dsa_lag {
> + struct net_device *dev;
> + int id;
> +
> + struct list_head ports;
> +
> + /* For multichip systems, we must ensure that each hash bucket
> +  * is only enabled on a single egress port throughout the
> +  * whole tree, lest we send duplicates. Therefore we must
> +  * maintain a global list of active tx ports, so that each
> +  * switch can figure out which buckets to enable on which
> +  * ports.
> +  */
> + struct list_head tx_ports;
> + int num_tx;
> +
> + refcount_t refcount;
> +};

Sorry it took so long. I wanted to understand:
(a) where the challenges are for drivers to uniformly support software
bridging when they already have code for bridge offloading. I found
the following issues:
- We have taggers that unconditionally set skb->offload_fwd_mark = 1,
  which kind of prevents software bridging. I'm not sure what the
  fix for these should be.
- Source address is a big problem, but this time not in the sense
  that it traditionally has been. Specifically, due to address
  learning being enabled, the hardware FDB will set destinations to
  take the autonomous fast path. But surprise, the autonomous fast
  path is blocked, because as far as the switch is concerned, the
  ports are standalone and not offloading the bridge. We have drivers
  that don't disable address learning when they operate in standalone
  mode, which is something they definitely should do.
There is nothing actionable for you in this patch set to resolve this.
I just wanted to get an idea.
(b) Whether struct dsa_lag really brings us any significant benefit. I
found that it doesn't. It's a lot of code added to the DSA core, that
should not really belong in the middle layer. I need to go back and
quote your motivation in the RFC:

| All LAG configuration is cached in `struct dsa_lag`s. I realize that
| the standard M.O. of DSA is to read back information from hardware
| when required. With LAGs this becomes very tricky though. For example,
| the change of a link state on one switch will require re-balancing of
| LAG hash buckets on another one, which in turn depends on the total
| number of active links in the LAG. Do you agree that this is
| motivated?

After reimplementing bonding offload in ocelot, I have found
struct dsa_lag to not provide any benefit. All the information a
driver needs is already provided through the
struct net_device *lag_dev argument given to lag_join and lag_leave,
and through the struct netdev_lag_lower_state_info *info given to
lag_change. I will send an RFC to you and the list shortly to prove
that this information is absolutely sufficient for the driver to do
decent internal bookkeeping, and that DSA should not really care
beyond that.

There are two points to be made:
- Recently we have seen people with non-DSA (pure switchdev) hardware
  being compelled to write DSA drivers, because they noticed that a
  large part of the middle layer had already been written, and it
  presents an API with a lot of syntactic sugar. Maybe there is a
  larger issue here in that the switchdev offloading APIs are fairly
  bulky and repetitive, but that does not mean that we should be
  encouraging the attitude "come to DSA, we have cookies".
  https://lwn.net/ml/linux-kernel/20201125232459.378-1-lu...@denx.de/
- Remember that the only reason why the DSA framework and the
  syntactic sugar exists is that we are presenting the hardware a
  unified view for the ports which have a struct net_device registered,
  and the ports which don't (DSA links and CPU ports). The argument
  really needs to be broken down into two:
  - For cross-chip DSA links, I can see why it was convenient for
    you to have the dsa_lag_by_dev(ds->dst, lag_dev) helper. But
    just as we currently have a struct net_device *bridge_dev in
    struct dsa_port, so we could have a struct net_device *bond,
    without the extra fat

[PATCH 1/1] mwifiex: Fix possible buffer overflows in mwifiex_uap_bss_param_prepare

2020-12-08 Thread Xiaohui Zhang

From: Zhang Xiaohui 

mwifiex_uap_bss_param_prepare() calls memcpy() without checking
the destination size, which may trigger a buffer overflow
that a local user could use to cause denial of service or the
execution of arbitrary code.
Fix it by putting the length check before calling memcpy().

Signed-off-by: Zhang Xiaohui 
---
 drivers/net/wireless/marvell/mwifiex/uap_cmd.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/uap_cmd.c 
b/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
index b48a85d79..fb937c7ee 100644
--- a/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
+++ b/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
@@ -496,13 +496,16 @@ mwifiex_uap_bss_param_prepare(u8 *tlv, void *cmd_buf, u16 
*param_size)
struct mwifiex_ie_types_wmmcap *wmm_cap;
struct mwifiex_uap_bss_param *bss_cfg = cmd_buf;
int i;
+   int ssid_size;
u16 cmd_size = *param_size;
 
if (bss_cfg->ssid.ssid_len) {
ssid = (struct host_cmd_tlv_ssid *)tlv;
ssid->header.type = cpu_to_le16(TLV_TYPE_UAP_SSID);
ssid->header.len = cpu_to_le16((u16)bss_cfg->ssid.ssid_len);
-   memcpy(ssid->ssid, bss_cfg->ssid.ssid, bss_cfg->ssid.ssid_len);
+   ssid_size = bss_cfg->ssid.ssid_len > strlen(ssid->ssid) ?
+   strlen(ssid->ssid) : bss_cfg->ssid.ssid_len;
+   memcpy(ssid->ssid, bss_cfg->ssid.ssid, ssid_size);
cmd_size += sizeof(struct mwifiex_ie_types_header) +
bss_cfg->ssid.ssid_len;
tlv += sizeof(struct mwifiex_ie_types_header) +
-- 
2.17.1
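
A note on the hunk above: strlen() applied to the destination measures whatever
string currently sits in that buffer, not how many bytes the buffer can hold. A
minimal sketch of clamping against capacity instead, assuming for illustration
only a destination with a known fixed size of 32 bytes (the 802.11 SSID length
limit). MAX_SSID_LEN, struct ssid_buf and copy_ssid() are hypothetical names
and do not describe the driver's own structures:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SSID_LEN 32 /* 802.11 SSID length limit; illustrative name */

/* Illustrative destination with a fixed, known capacity. */
struct ssid_buf {
	unsigned char ssid[MAX_SSID_LEN];
	size_t len;
};

/* Copy at most sizeof(dst->ssid) bytes; returns the number copied. */
static size_t copy_ssid(struct ssid_buf *dst, const unsigned char *src,
			size_t src_len)
{
	size_t n;

	assert(dst != NULL && src != NULL);

	/* Clamp to the capacity of the destination, not to its contents. */
	n = src_len < sizeof(dst->ssid) ? src_len : sizeof(dst->ssid);
	memcpy(dst->ssid, src, n);
	dst->len = n;
	return n;
}
```

Whether to clamp silently or to reject an over-long length with an error is a
separate design choice; the sketch only shows the bound itself.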

Re: [PATCH v4 2/6] igb: take vlan double header into account

2020-12-08 Thread Jesper Dangaard Brouer

On Tue, 8 Dec 2020 10:52:28 +
"Penigalapati, Sandeep"  wrote:

> On Tue, Dec 01, 2020 at 09:58:52AM +0100, Jesper Dangaard Brouer wrote:
> > > On Tue, 1 Dec 2020 08:23:23 +
> > > "Penigalapati, Sandeep"  wrote:
> > >  
> > > > Tested-by: Sandeep Penigalapati   
> > >
> > > Very happy that you are testing this.
> > >
> > > Have you also tested that samples/bpf/ xdp_redirect_cpu program works?  
> > 
> > Hi Jesper,
> > 
> > I have tested the xdp routing example but it would be good if someone can
> > double check this.
> > 
> Hi Jesper, Sven
> 
> I have tested xdp_redirect_cpu and it is working.

Thanks this is great to hear.

You have tested with large frames, right?  As cpumap just creates SKBs
based on the xdp_frame and sends them to the normal network stack (on a
remote CPU), you can just do a standard TCP-stream throughput test with
iperf or netperf.  That should hopefully blow up if we screwed up the
boundaries of the two packets sharing the same page.  (In principle we
should verify the content of the TCP transfer, so maybe an scp + md5sum
is a better test).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[PATCH] [v11] wireless: Initial driver submission for pureLiFi STA devices

2020-12-08 Thread Srinivasan Raju

This introduces the pureLiFi LiFi driver for LiFi-X, LiFi-XC
and LiFi-XL USB devices.

This driver implementation has been based on the zd1211rw driver.

Driver is based on 802.11 softMAC Architecture and uses
native 802.11 for configuration and management.

The driver is compiled and tested on ARM and x86 architectures, and
compile-tested on the powerpc architecture.

Signed-off-by: Srinivasan Raju 

---
v11, v10:
- Addressed review comment on readability
- Changed firmware names to match products and latest firmware
v9:
- Addressed review comments on style and content defects
- Used kmemdup instead of alloc and memcpy
v7 , v8:
- Magic numbers removed and used IEEE80211 macros
- usb.c is split into two files firmware.c and dbgfs.c
- Other code style and timer function fixes (mod_timer)
v6:
- Code style fix patch from Joe Perches
v5:
- Code refactoring for clarity and redundancy removal
- Fix warnings from kernel test robot
v4:
- Code refactoring based on kernel code guidelines
- Remove multi level macros and use kernel debug macros
v3:
- Code style fixes kconfig fix
v2:
- Driver submitted to wireless-next
- Code style fixes and copyright statement fix
v1:
- Driver submitted to staging
---
 MAINTAINERS  |5 +
 drivers/net/wireless/Kconfig |1 +
 drivers/net/wireless/Makefile|1 +
 drivers/net/wireless/purelifi/Kconfig|   27 +
 drivers/net/wireless/purelifi/Makefile   |3 +
 drivers/net/wireless/purelifi/chip.c |   93 ++
 drivers/net/wireless/purelifi/chip.h |   81 ++
 drivers/net/wireless/purelifi/dbgfs.c|  150 +++
 drivers/net/wireless/purelifi/firmware.c |  384 
 drivers/net/wireless/purelifi/intf.h |   38 +
 drivers/net/wireless/purelifi/mac.c  |  873 ++
 drivers/net/wireless/purelifi/mac.h  |  189 
 drivers/net/wireless/purelifi/usb.c  | 1075 ++
 drivers/net/wireless/purelifi/usb.h  |  199 
 14 files changed, 3119 insertions(+)
 create mode 100644 drivers/net/wireless/purelifi/Kconfig
 create mode 100644 drivers/net/wireless/purelifi/Makefile
 create mode 100644 drivers/net/wireless/purelifi/chip.c
 create mode 100644 drivers/net/wireless/purelifi/chip.h
 create mode 100644 drivers/net/wireless/purelifi/dbgfs.c
 create mode 100644 drivers/net/wireless/purelifi/firmware.c
 create mode 100644 drivers/net/wireless/purelifi/intf.h
 create mode 100644 drivers/net/wireless/purelifi/mac.c
 create mode 100644 drivers/net/wireless/purelifi/mac.h
 create mode 100644 drivers/net/wireless/purelifi/usb.c
 create mode 100644 drivers/net/wireless/purelifi/usb.h

diff --git a/MAINTAINERS b/MAINTAINERS
index c80f87d7258c..17955b8497df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14108,6 +14108,11 @@ T: git git://linuxtv.org/media_tree.git
 F: Documentation/admin-guide/media/pulse8-cec.rst
 F: drivers/media/cec/usb/pulse8/
 
+PUREILIFI USB DRIVER
+M: Srinivasan Raju 
+S: Supported
+F: drivers/net/wireless/purelifi
+
 PVRUSB2 VIDEO4LINUX DRIVER
 M: Mike Isely 
 L: pvru...@isely.net   (subscribers-only)
diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig
index 170a64e67709..b87da3139f94 100644
--- a/drivers/net/wireless/Kconfig
+++ b/drivers/net/wireless/Kconfig
@@ -48,6 +48,7 @@ source "drivers/net/wireless/st/Kconfig"
 source "drivers/net/wireless/ti/Kconfig"
 source "drivers/net/wireless/zydas/Kconfig"
 source "drivers/net/wireless/quantenna/Kconfig"
+source "drivers/net/wireless/purelifi/Kconfig"
 
 config PCMCIA_RAYCS
tristate "Aviator/Raytheon 2.4GHz wireless support"
diff --git a/drivers/net/wireless/Makefile b/drivers/net/wireless/Makefile
index 80b324499786..e9fc770026f0 100644
--- a/drivers/net/wireless/Makefile
+++ b/drivers/net/wireless/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_WLAN_VENDOR_ST) += st/
 obj-$(CONFIG_WLAN_VENDOR_TI) += ti/
 obj-$(CONFIG_WLAN_VENDOR_ZYDAS) += zydas/
 obj-$(CONFIG_WLAN_VENDOR_QUANTENNA) += quantenna/
+obj-$(CONFIG_WLAN_VENDOR_PURELIFI) += purelifi/
 
 # 16-bit wireless PCMCIA client drivers
 obj-$(CONFIG_PCMCIA_RAYCS) += ray_cs.o
diff --git a/drivers/net/wireless/purelifi/Kconfig 
b/drivers/net/wireless/purelifi/Kconfig
new file mode 100644
index ..f6630791df9d
--- /dev/null
+++ b/drivers/net/wireless/purelifi/Kconfig
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: GPL-2.0
+config WLAN_VENDOR_PURELIFI
+   bool "pureLiFi devices"
+   default y
+   help
+ If you have a pureLiFi device, say Y.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all the
+ questions about these cards. If you say Y, you will be asked for
+ your specific card in the following questions.
+
+if WLAN_VENDOR_PURELIFI
+
+config PURELIFI
+
+   tristate "pureLiFi device support"
+   depends on CFG80211 && MAC80211 && USB
+   help
+  This driver makes th

Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set

2020-12-08 Thread Toke Høiland-Jørgensen

Jesper Dangaard Brouer  writes:

> On Mon, 7 Dec 2020 18:01:00 -0700
> David Ahern  wrote:
>
>> On 12/7/20 1:52 PM, John Fastabend wrote:
>> >>
>> >> I think we need to keep XDP_TX action separate, because I think that
>> >> there are use-cases where we want to disable XDP_TX due to end-user
>> >> policy or hardware limitations.  
>> > 
>> > How about we discover this at load time though. 
>
> Nitpick: at XDP "attach" time. The general disconnect between BPF and
> XDP is that BPF can verify at "load" time (as the kernel knows what it
> supports) while XDP can have different support/features per driver, and
> cannot do this until attachment time. (See later issue with tail calls).
> (All other BPF-hooks don't have this issue)
>
>> > Meaning if the program
>> > doesn't use XDP_TX then the hardware can skip resource allocations for
>> > it. I think we could have verifier or extra pass discover the use of
>> > XDP_TX and then pass a bit down to driver to enable/disable TX caps.
>> >   
>> 
>> This was discussed in the context of virtio_net some months back - it is
>> hard to impossible to know a program will not return XDP_TX (e.g., value
>> comes from a map).
>
> It is hard, and sometimes not possible.  For maps the workaround is
> that the BPF programmer adds a bound check on values from the map. If
> that is not done, the verifier has to assume all possible return codes
> are used by the BPF prog.
>
> The real nemesis is program tail calls, that can be added dynamically
> after the XDP program is attached.  It is at attachment time that
> changing the NIC resources is possible.  So, for program tail calls the
> verifier has to assume all possible return codes are used by the BPF prog.

We actually had someone working on a scheme for how to express this for
programs some months ago, but unfortunately that stalled out (Jesper
already knows this, but FYI to the rest of you). In any case, I view
this as a "next step". Just exposing the feature bits to userspace will
help users today, and as a side effect, this also makes drivers declare
what they support, which we can then incorporate into the core code to,
e.g., reject attachment of programs that won't work anyway. But let's
do this in increments and not make the perfect the enemy of the good
here.

> BPF now has function calls and function replacement, right(?)  How does
> this affect this detection of possible return codes?

It does have the same issue as tail calls, in that the return code of
the function being replaced can obviously change. However, the verifier
knows the target of a replace, so it can propagate any constraints put
upon the caller if we implement it that way.

-Toke
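
The bound check on map values mentioned in the quoted text can be sketched in
plain C as follows. This is a userspace illustration under stated assumptions:
the enum is a local mirror of the XDP return codes from the Linux UAPI header
<linux/bpf.h>, and bounded_action() is a hypothetical helper. It makes no claim
about what the in-kernel verifier tracks for return values today; it only shows
the shape of a check that leaves XDP_TX unreachable from map data:

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the XDP return codes, so the sketch builds in userspace. */
enum xdp_action_sketch {
	SK_XDP_ABORTED = 0,
	SK_XDP_DROP,
	SK_XDP_PASS,
	SK_XDP_TX,
	SK_XDP_REDIRECT,
};

/*
 * Hypothetical helper: restrict a verdict read from a map to DROP or PASS.
 * Any other value, including XDP_TX, is turned into a drop.
 */
static int bounded_action(uint32_t map_value)
{
	if (map_value == SK_XDP_DROP || map_value == SK_XDP_PASS)
		return (int)map_value;

	return SK_XDP_DROP;
}
```

With such a check in place, the set of codes the program can return from map
data is explicit in the program text rather than depending on map contents.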

[PATCHv4 bpf-next] samples/bpf: add xdp program on egress for xdp_redirect_map

2020-12-08 Thread Hangbin Liu

This patch adds an XDP program on egress to show that we can modify
the packet on egress. In this sample we set the packet's source
MAC to the egress interface's MAC address. The xdp_prog is attached when
the -X option is supplied.

Signed-off-by: Hangbin Liu 
---
v4:
a) Update get_mac_addr socket creation
b) Load dummy prog regardless of 2nd xdp prog on egress

v3:
a) modify the src mac address based on egress mac

v2:
a) use pkt counter instead of IP ttl modification on egress program
b) make the egress program selectable by option -X
---
 samples/bpf/xdp_redirect_map_kern.c |  60 ++--
 samples/bpf/xdp_redirect_map_user.c | 104 +++-
 2 files changed, 140 insertions(+), 24 deletions(-)

diff --git a/samples/bpf/xdp_redirect_map_kern.c 
b/samples/bpf/xdp_redirect_map_kern.c
index 6489352ab7a4..6b2164722649 100644
--- a/samples/bpf/xdp_redirect_map_kern.c
+++ b/samples/bpf/xdp_redirect_map_kern.c
@@ -19,12 +19,22 @@
 #include 
 #include 
 
+/* The 2nd xdp prog on egress does not support skb mode, so we define two
+ * maps, tx_port_general and tx_port_native.
+ */
 struct {
__uint(type, BPF_MAP_TYPE_DEVMAP);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
__uint(max_entries, 100);
-} tx_port SEC(".maps");
+} tx_port_general SEC(".maps");
+
+struct {
+   __uint(type, BPF_MAP_TYPE_DEVMAP);
+   __uint(key_size, sizeof(int));
+   __uint(value_size, sizeof(struct bpf_devmap_val));
+   __uint(max_entries, 100);
+} tx_port_native SEC(".maps");
 
 /* Count RX packets, as XDP bpf_prog doesn't get direct TX-success
  * feedback.  Redirect TX errors can be caught via a tracepoint.
@@ -36,6 +46,14 @@ struct {
__uint(max_entries, 1);
 } rxcnt SEC(".maps");
 
+/* map to store egress interface mac address */
+struct {
+   __uint(type, BPF_MAP_TYPE_ARRAY);
+   __type(key, u32);
+   __type(value, __be64);
+   __uint(max_entries, 1);
+} tx_mac SEC(".maps");
+
 static void swap_src_dst_mac(void *data)
 {
unsigned short *p = data;
@@ -52,17 +70,16 @@ static void swap_src_dst_mac(void *data)
p[5] = dst[2];
 }
 
-SEC("xdp_redirect_map")
-int xdp_redirect_map_prog(struct xdp_md *ctx)
+static int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map)
 {
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
struct ethhdr *eth = data;
int rc = XDP_DROP;
-   int vport, port = 0, m = 0;
long *value;
u32 key = 0;
u64 nh_off;
+   int vport;
 
nh_off = sizeof(*eth);
if (data + nh_off > data_end)
@@ -79,7 +96,40 @@ int xdp_redirect_map_prog(struct xdp_md *ctx)
swap_src_dst_mac(data);
 
/* send packet out physical port */
-   return bpf_redirect_map(&tx_port, vport, 0);
+   return bpf_redirect_map(redirect_map, vport, 0);
+}
+
+SEC("xdp_redirect_general")
+int xdp_redirect_map_general(struct xdp_md *ctx)
+{
+   return xdp_redirect_map(ctx, &tx_port_general);
+}
+
+SEC("xdp_redirect_native")
+int xdp_redirect_map_native(struct xdp_md *ctx)
+{
+   return xdp_redirect_map(ctx, &tx_port_native);
+}
+
+SEC("xdp_devmap/map_prog")
+int xdp_redirect_map_egress(struct xdp_md *ctx)
+{
+   void *data_end = (void *)(long)ctx->data_end;
+   void *data = (void *)(long)ctx->data;
+   struct ethhdr *eth = data;
+   __be64 *mac;
+   u32 key = 0;
+   u64 nh_off;
+
+   nh_off = sizeof(*eth);
+   if (data + nh_off > data_end)
+   return XDP_DROP;
+
+   mac = bpf_map_lookup_elem(&tx_mac, &key);
+   if (mac)
+   __builtin_memcpy(eth->h_source, mac, ETH_ALEN);
+
+   return XDP_PASS;
 }
 
 /* Redirect require an XDP bpf_prog loaded on the TX device */
diff --git a/samples/bpf/xdp_redirect_map_user.c 
b/samples/bpf/xdp_redirect_map_user.c
index 31131b6e7782..9866d759bd11 100644
--- a/samples/bpf/xdp_redirect_map_user.c
+++ b/samples/bpf/xdp_redirect_map_user.c
@@ -14,6 +14,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include "bpf_util.h"
 #include 
@@ -22,6 +26,7 @@
 static int ifindex_in;
 static int ifindex_out;
 static bool ifindex_out_xdp_dummy_attached = true;
+static bool xdp_devmap_attached = false;
 static __u32 prog_id;
 static __u32 dummy_prog_id;
 
@@ -83,6 +88,29 @@ static void poll_stats(int interval, int ifindex)
}
 }
 
+static int get_mac_addr(unsigned int ifindex_out, void *mac_addr)
+{
+   struct ifreq ifr;
+   char ifname[IF_NAMESIZE];
+   int fd = socket(AF_INET, SOCK_DGRAM, 0);
+
+   if (fd < 0)
+   return -1;
+
+   if (!if_indextoname(ifindex_out, ifname))
+   return -1;
+
+   strcpy(ifr.ifr_name, ifname);
+
+   if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
+   return -1;
+
+   memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char));
+   close(fd);
+
+   return 0;
+}
+
 static void usage(const ch
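
A minimal standalone sketch of the same MAC lookup, assuming Linux with glibc.
get_mac_addr_sketch() is an illustrative name, not the function from the patch.
Compared with the helper in the hunk above, it resolves the interface name
before opening the socket, closes the socket on every path, and uses a bounded
copy for the interface name:

```c
#define _DEFAULT_SOURCE /* for struct ifreq with glibc in strict modes */
#include <assert.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill mac[6] with the hardware address of ifindex; 0 on success, -1 on error. */
static int get_mac_addr_sketch(unsigned int ifindex, unsigned char mac[6])
{
	char ifname[IF_NAMESIZE];
	struct ifreq ifr;
	int fd, ret = -1;

	assert(mac != NULL);

	/* Resolve the name first so no descriptor is open if this fails. */
	if (!if_indextoname(ifindex, ifname))
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	/* Bounded copy; ifr_name is IFNAMSIZ bytes. */
	snprintf(ifr.ifr_name, sizeof(ifr.ifr_name), "%s", ifname);

	if (ioctl(fd, SIOCGIFHWADDR, &ifr) == 0) {
		memcpy(mac, ifr.ifr_hwaddr.sa_data, 6);
		ret = 0;
	}

	/* Single exit point after the socket is open, so fd is always closed. */
	close(fd);
	return ret;
}
```

A caller could pass the result of if_nametoindex() for the egress interface
and store the six bytes in whatever map the sample uses.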

BUG: unable to handle kernel paging request in smc_nl_handle_smcr_dev

2020-12-08 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:b1f7b098 Merge branch 's390-qeth-next'
git tree:   net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=164d246b50
kernel config:  https://syzkaller.appspot.com/x/.config?x=2ac2dabe250b3a58
dashboard link: https://syzkaller.appspot.com/bug?extid=600fef7c414ee7e2d71b
compiler:   gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+600fef7c414ee7e2d...@syzkaller.appspotmail.com

BUG: unable to handle page fault for address: ff84
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD b08f067 P4D b08f067 PUD b091067 PMD 0 
Oops:  [#1] PREEMPT SMP KASAN
CPU: 0 PID: 21334 Comm: syz-executor.1 Not tainted 5.10.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:smc_set_pci_values net/smc/smc_core.h:396 [inline]
RIP: 0010:smc_nl_handle_smcr_dev.isra.0+0x4bd/0x11b0 net/smc/smc_ib.c:422
Code: 00 00 00 fc ff df 48 8d 7b 84 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 
83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 59 0c 00 00 <0f> b7 43 84 48 8d 7b 86 
48 89 fa 48 c1 ea 03 66 89 84 24 ee 00 00
RSP: 0018:c900018b7228 EFLAGS: 00010246
RAX: 0005 RBX:  RCX: 
RDX:  RSI:  RDI: ff84
RBP: 8ccc6120 R08: 0001 R09: c900018b7310
R10: f52000316e65 R11:  R12: 
R13: 88802f52d540 R14: dc00 R15: 888062412014
FS:  7f9ce0405700() GS:8880b9e0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ff84 CR3: 13c46000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 smc_nl_prep_smcr_dev net/smc/smc_ib.c:469 [inline]
 smcr_nl_get_device+0xdf/0x1f0 net/smc/smc_ib.c:481
 genl_lock_dumpit+0x60/0x90 net/netlink/genetlink.c:623
 netlink_dump+0x4b9/0xb70 net/netlink/af_netlink.c:2268
 __netlink_dump_start+0x642/0x900 net/netlink/af_netlink.c:2373
 genl_family_rcv_msg_dumpit+0x2af/0x310 net/netlink/genetlink.c:686
 genl_family_rcv_msg net/netlink/genetlink.c:780 [inline]
 genl_rcv_msg+0x434/0x580 net/netlink/genetlink.c:800
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:671
 sys_sendmsg+0x6e8/0x810 net/socket.c:2331
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2385
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2418
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45e0f9
Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 
db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7f9ce0404c68 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 0003 RCX: 0045e0f9
RDX:  RSI: 2040 RDI: 0003
RBP: 0119bfc0 R08:  R09: 
R10:  R11: 0246 R12: 0119bf8c
R13: 7ffda3a6b65f R14: 7f9ce04059c0 R15: 0119bf8c
Modules linked in:
CR2: ff84
---[ end trace 7323b30ca37a03b9 ]---
RIP: 0010:smc_set_pci_values net/smc/smc_core.h:396 [inline]
RIP: 0010:smc_nl_handle_smcr_dev.isra.0+0x4bd/0x11b0 net/smc/smc_ib.c:422
Code: 00 00 00 fc ff df 48 8d 7b 84 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 
83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 59 0c 00 00 <0f> b7 43 84 48 8d 7b 86 
48 89 fa 48 c1 ea 03 66 89 84 24 ee 00 00
RSP: 0018:c900018b7228 EFLAGS: 00010246
RAX: 0005 RBX:  RCX: 
RDX:  RSI:  RDI: ff84
RBP: 8ccc6120 R08: 0001 R09: c900018b7310
R10: f52000316e65 R11:  R12: 
R13: 88802f52d540 R14: dc00 R15: 888062412014
FS:  7f9ce0405700() GS:8880b9e0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ff84 CR3: 13c46000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be r

Re: [PATCH net-next 2/4] net: mvpp2: add mvpp2_phylink_to_port() helper

2020-12-08 Thread Marcin Wojtas

Hi Greg,

Apologies for the delayed response.


On Mon, 2 Nov 2020 at 19:02, Greg Kroah-Hartman wrote:
>
> On Mon, Nov 02, 2020 at 06:38:54PM +0100, Marcin Wojtas wrote:
> > Hi Greg and Sasha,
> >
> > On Fri, 9 Oct 2020 at 05:43, Marcin Wojtas wrote:
> > >
> > > Hi,
> > >
> > > On Sat, 20 Jun 2020 at 11:21, Russell King wrote:
> > > >
> > > > Add a helper to convert the struct phylink_config pointer passed in
> > > > from phylink to the drivers internal struct mvpp2_port.
> > > >
> > > > Signed-off-by: Russell King 
> > > > ---
> > > >  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 29 +--
> > > >  1 file changed, 14 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c 
> > > > b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > index 7653277d03b7..313f5a60a605 100644
> > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > @@ -4767,12 +4767,16 @@ static void mvpp2_port_copy_mac_addr(struct 
> > > > net_device *dev, struct mvpp2 *priv,
> > > > eth_hw_addr_random(dev);
> > > >  }
> > > >
> > > > +static struct mvpp2_port *mvpp2_phylink_to_port(struct phylink_config 
> > > > *config)
> > > > +{
> > > > +   return container_of(config, struct mvpp2_port, phylink_config);
> > > > +}
> > > > +
> > > >  static void mvpp2_phylink_validate(struct phylink_config *config,
> > > >unsigned long *supported,
> > > >struct phylink_link_state *state)
> > > >  {
> > > > -   struct mvpp2_port *port = container_of(config, struct 
> > > > mvpp2_port,
> > > > -  phylink_config);
> > > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > > __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
> > > >
> > > > /* Invalid combinations */
> > > > @@ -4913,8 +4917,7 @@ static void mvpp2_gmac_pcs_get_state(struct 
> > > > mvpp2_port *port,
> > > >  static void mvpp2_phylink_mac_pcs_get_state(struct phylink_config 
> > > > *config,
> > > > struct phylink_link_state 
> > > > *state)
> > > >  {
> > > > -   struct mvpp2_port *port = container_of(config, struct 
> > > > mvpp2_port,
> > > > -  phylink_config);
> > > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > >
> > > > if (port->priv->hw_version == MVPP22 && port->gop_id == 0) {
> > > > u32 mode = readl(port->base + MVPP22_XLG_CTRL3_REG);
> > > > @@ -4931,8 +4934,7 @@ static void 
> > > > mvpp2_phylink_mac_pcs_get_state(struct phylink_config *config,
> > > >
> > > >  static void mvpp2_mac_an_restart(struct phylink_config *config)
> > > >  {
> > > > -   struct mvpp2_port *port = container_of(config, struct 
> > > > mvpp2_port,
> > > > -  phylink_config);
> > > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > > u32 val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> > > >
> > > > writel(val | MVPP2_GMAC_IN_BAND_RESTART_AN,
> > > > @@ -5105,13 +5107,12 @@ static void mvpp2_gmac_config(struct mvpp2_port 
> > > > *port, unsigned int mode,
> > > >  static void mvpp2_mac_config(struct phylink_config *config, unsigned 
> > > > int mode,
> > > >  const struct phylink_link_state *state)
> > > >  {
> > > > -   struct net_device *dev = to_net_dev(config->dev);
> > > > -   struct mvpp2_port *port = netdev_priv(dev);
> > > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > > bool change_interface = port->phy_interface != state->interface;
> > > >
> > > > /* Check for invalid configuration */
> > > > if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) {
> > > > -   netdev_err(dev, "Invalid mode on %s\n", dev->name);
> > > > +   netdev_err(port->dev, "Invalid mode on %s\n", 
> > > > port->dev->name);
> > > > return;
> > > > }
> > > >
> > > > @@ -5151,8 +5152,7 @@ static void mvpp2_mac_link_up(struct 
> > > > phylink_config *config,
> > > >   int speed, int duplex,
> > > >   bool tx_pause, bool rx_pause)
> > > >  {
> > > > -   struct net_device *dev = to_net_dev(config->dev);
> > > > -   struct mvpp2_port *port = netdev_priv(dev);
> > > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > > u32 val;
> > > >
> > > > if (mvpp2_is_xlg(interface)) {
> > > > @@ -5199,14 +5199,13 @@ static void mvpp2_mac_link_up(struct 
> > > > phylink_config *config,
> > > >
> > > > mvpp2_egress_enable(port);
> > > > mvpp2_ingress_enable(port);
> > > > -   netif_tx_wake_all_queues(dev);
> > > > +   netif_tx_wake_all_queues(p
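
For readers unfamiliar with the pattern in the quoted patch: the new helper
wraps container_of(), which recovers a pointer to the enclosing structure from
a pointer to one of its members. Below is a minimal userspace sketch of that
idea. The container_of macro here is a simplified stand-in for the kernel's
(which adds type checking), and struct demo_port, struct demo_phylink_config
and demo_phylink_to_port() are hypothetical names, not the driver's types.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct phylink_config. */
struct demo_phylink_config {
	int dummy;
};

/* Hypothetical stand-in for struct mvpp2_port: the config is embedded. */
struct demo_port {
	int id;
	struct demo_phylink_config phylink_config;
};

/* Same shape as the quoted helper: embedded member -> enclosing port. */
struct demo_port *demo_phylink_to_port(struct demo_phylink_config *config)
{
	return container_of(config, struct demo_port, phylink_config);
}
```

Centralizing the conversion in one helper means each phylink callback no
longer repeats the container_of() arguments, which is what the quoted diff does.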

[PATCH net-next] net/sched: cls_u32: simplify the return expression of u32_reoffload_knode()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 net/sched/cls_u32.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 54209a18d7fe..6e1abe805448 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -1171,7 +1171,6 @@ static int u32_reoffload_knode(struct tcf_proto *tp, 
struct tc_u_knode *n,
struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
struct tcf_block *block = tp->chain->block;
struct tc_cls_u32_offload cls_u32 = {};
-   int err;
 
tc_cls_common_offload_init(&cls_u32.common, tp, n->flags, extack);
cls_u32.command = add ?
@@ -1194,13 +1193,9 @@ static int u32_reoffload_knode(struct tcf_proto *tp, 
struct tc_u_knode *n,
cls_u32.knode.link_handle = ht->handle;
}
 
-   err = tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSU32,
-   &cls_u32, cb_priv, &n->flags,
-   &n->in_hw_count);
-   if (err)
-   return err;
-
-   return 0;
+   return tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSU32,
+&cls_u32, cb_priv, &n->flags,
+&n->in_hw_count);
 }
 
 static int u32_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
-- 
2.22.0
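
The simplification is behavior-preserving because the removed code returned
err when it was non-zero and 0 otherwise, which is the same value the callee
returned in both cases. A minimal userspace sketch of the two shapes follows;
demo_reoffload() is a hypothetical stand-in for tc_setup_cb_reoffload(), and
-22 is used only as an example negative error value.

```c
/* Hypothetical stand-in for the callee: 0 on success, negative on error. */
int demo_reoffload(int fail)
{
	return fail ? -22 : 0;
}

/* Before: store the result, test it, then return 0 explicitly. */
int reoffload_before(int fail)
{
	int err;

	err = demo_reoffload(fail);
	if (err)
		return err;

	return 0;
}

/* After: return the callee's result directly. */
int reoffload_after(int fail)
{
	return demo_reoffload(fail);
}
```

Both variants return the same value for every input, so the local variable
and the branch carry no information.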

[RFC PATCH net-next 00/16] LAG offload for Ocelot DSA switches

2020-12-08 Thread Vladimir Oltean

This patch series comes as a continuation of the discussion started with
Tobias Waldekranz in his patch series to offload bonding/team from DSA:
https://patchwork.kernel.org/project/netdevbpf/patch/20201202091356.24075-3-tob...@waldekranz.com/

On one hand, it shows the rework that needs to be done to ocelot such
that a pure switchdev and a DSA driver could share the same implementation.

On the other hand, it tries to identify which data structures DSA really
needs to keep and pass along to drivers, and which structures are best
left for the drivers to deal with privately.

Testing has been done in the following topology:

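 +----------------------------------+
 | Board 1         br0              |
 |             +---------+          |
 |            /           \         |
 |            |           |         |
 |            |         bond0       |
 |            |        +-----+      |
 |            |       /       \     |
 |  eno0     swp0    swp1    swp2   |
 +---|--------|-------|-------|-----+
     |        |       |       |
     +--------+       |       |
       Cable          |       |
                 Cable|       |Cable
       Cable          |       |
     +--------+       |       |
     |        |       |       |
 +---|--------|-------|-------|-----+
 |  eno0     swp0    swp1    swp2   |
 |            |       \       /     |
 |            |        +-----+      |
 |            |         bond0       |
 |            |           |         |
 |            \           /         |
 |             +---------+          |
 | Board 2         br0              |
 +----------------------------------+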

The same script can be run on both Board 1 and Board 2 to set this up:

#!/bin/bash

ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
ip link set swp0 master br0

Then traffic can be tested between eno0 of Board 1 and eno0 of Board 2.

Vladimir Oltean (16):
  net: mscc: ocelot: offload bridge port flags to device
  net: mscc: ocelot: allow offloading of bridge on top of LAG
  net: mscc: ocelot: rename ocelot_netdevice_port_event to
ocelot_netdevice_changeupper
  net: mscc: ocelot: use a switch-case statement in
ocelot_netdevice_event
  net: mscc: ocelot: don't refuse bonding interfaces we can't offload
  net: mscc: ocelot: use ipv6 in the aggregation code
  net: mscc: ocelot: set up the bonding mask in a way that avoids a
net_device
  net: mscc: ocelot: avoid unneeded "lp" variable in LAG join
  net: mscc: ocelot: use "lag" variable name in
ocelot_bridge_stp_state_set
  net: mscc: ocelot: reapply bridge forwarding mask on bonding
join/leave
  net: mscc: ocelot: set up logical port IDs centrally
  net: mscc: ocelot: drop the use of the "lags" array
  net: mscc: ocelot: rename aggr_count to num_ports_in_lag
  net: mscc: ocelot: rebalance LAGs on link up/down events
  net: dsa: felix: propagate the LAG offload ops towards the ocelot lib
  net: dsa: ocelot: tell DSA that we can offload link aggregation

 drivers/net/dsa/ocelot/felix.c |  28 +++
 drivers/net/ethernet/mscc/ocelot.c | 276 +++--
 drivers/net/ethernet/mscc/ocelot.h |   7 +-
 drivers/net/ethernet/mscc/ocelot_net.c | 139 -
 include/soc/mscc/ocelot.h  |  13 +-
 5 files changed, 298 insertions(+), 165 deletions(-)

-- 
2.25.1

[RFC PATCH net-next 02/16] net: mscc: ocelot: allow offloading of bridge on top of LAG

2020-12-08 Thread Vladimir Oltean

Commit 7afb3e575e5a ("net: mscc: ocelot: don't handle netdev events for
other netdevs") was too aggressive, and it made ocelot_netdevice_event
react only to network interface events emitted for the ocelot switch
ports.

In fact, only the PRECHANGEUPPER should have had that check.

When we ignore all events that are not for us, we miss the fact that the
upper of the LAG changes, and the bonding interface gets enslaved to a
bridge. This is an operation we could offload under certain conditions.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot_net.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
b/drivers/net/ethernet/mscc/ocelot_net.c
index 93ecd5274156..6fb2a813e694 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1047,10 +1047,8 @@ static int ocelot_netdevice_event(struct notifier_block 
*unused,
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
int ret = 0;
 
-   if (!ocelot_netdevice_dev_check(dev))
-   return 0;
-
if (event == NETDEV_PRECHANGEUPPER &&
+   ocelot_netdevice_dev_check(dev) &&
netif_is_lag_master(info->upper_dev)) {
struct netdev_lag_upper_info *lag_upper_info = info->upper_info;
struct netlink_ext_ack *extack;
-- 
2.25.1
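
The effect of moving the check can be modelled in userspace. This is only a
sketch of the control flow described above, not the driver code: the enum
names and the two handle_*() functions are hypothetical, and the boolean
parameters stand in for ocelot_netdevice_dev_check(dev) and
netif_is_lag_master(info->upper_dev).

```c
#include <stdbool.h>

enum demo_event { DEMO_PRECHANGEUPPER, DEMO_CHANGEUPPER };
enum demo_action { DEMO_IGNORED, DEMO_VALIDATE_LAG, DEMO_CHANGEUPPER_PATH };

/* Before: events for any netdev that is not an ocelot port are dropped. */
enum demo_action handle_before(enum demo_event ev, bool dev_is_ocelot_port,
			       bool upper_is_lag)
{
	if (!dev_is_ocelot_port)
		return DEMO_IGNORED;
	if (ev == DEMO_PRECHANGEUPPER && upper_is_lag)
		return DEMO_VALIDATE_LAG;
	if (ev == DEMO_CHANGEUPPER)
		return DEMO_CHANGEUPPER_PATH;
	return DEMO_IGNORED;
}

/* After: the ownership check only gates the PRECHANGEUPPER validation. */
enum demo_action handle_after(enum demo_event ev, bool dev_is_ocelot_port,
			      bool upper_is_lag)
{
	if (ev == DEMO_PRECHANGEUPPER && dev_is_ocelot_port && upper_is_lag)
		return DEMO_VALIDATE_LAG;
	if (ev == DEMO_CHANGEUPPER)
		return DEMO_CHANGEUPPER_PATH;
	return DEMO_IGNORED;
}
```

In this model a CHANGEUPPER event emitted for the bonding interface itself
(not an ocelot port) is ignored by handle_before() but reaches the
CHANGEUPPER path in handle_after(), which is the case the commit message
says was being missed.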

[RFC PATCH net-next 01/16] net: mscc: ocelot: offload bridge port flags to device

2020-12-08 Thread Vladimir Oltean

We should not be unconditionally enabling address learning, since doing
that is actively detrimental when a port is standalone and not offloading
a bridge. Namely, if a port in the switch is standalone and others are
offloading the bridge, then we could enter a situation where we learn an
address towards the standalone port, but the bridged ports could not
forward the packet there, because the CPU is the only path between the
standalone and the bridged ports. The solution of course is to not
enable address learning unless the bridge asks for it. Currently this is
the only bridge port flag we are looking at. The others (flooding etc)
are TBD.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 21 -
 drivers/net/ethernet/mscc/ocelot.h |  3 +++
 drivers/net/ethernet/mscc/ocelot_net.c |  4 
 include/soc/mscc/ocelot.h  |  2 ++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c 
b/drivers/net/ethernet/mscc/ocelot.c
index b9626eec8db6..7a5c534099d3 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -883,6 +883,7 @@ EXPORT_SYMBOL(ocelot_get_ts_info);
 
 void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
 {
+   struct ocelot_port *ocelot_port = ocelot->ports[port];
u32 port_cfg;
int p, i;
 
@@ -896,7 +897,8 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int 
port, u8 state)
ocelot->bridge_fwd_mask |= BIT(port);
fallthrough;
case BR_STATE_LEARNING:
-   port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
+   if (ocelot_port->brport_flags & BR_LEARNING)
+   port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
break;
 
default:
@@ -1178,6 +1180,7 @@ EXPORT_SYMBOL(ocelot_port_bridge_join);
 int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
 struct net_device *bridge)
 {
+   struct ocelot_port *ocelot_port = ocelot->ports[port];
struct ocelot_vlan pvid = {0}, native_vlan = {0};
struct switchdev_trans trans;
int ret;
@@ -1200,6 +1203,10 @@ int ocelot_port_bridge_leave(struct ocelot *ocelot, int 
port,
ocelot_port_set_pvid(ocelot, port, pvid);
ocelot_port_set_native_vlan(ocelot, port, native_vlan);
 
+   ocelot_port->brport_flags = 0;
+   ocelot_rmw_gix(ocelot, 0, ANA_PORT_PORT_CFG_LEARN_ENA,
+  ANA_PORT_PORT_CFG, port);
+
return 0;
 }
 EXPORT_SYMBOL(ocelot_port_bridge_leave);
@@ -1391,6 +1398,18 @@ int ocelot_get_max_mtu(struct ocelot *ocelot, int port)
 }
 EXPORT_SYMBOL(ocelot_get_max_mtu);
 
+void ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
+ unsigned long flags,
+ struct switchdev_trans *trans)
+{
+   struct ocelot_port *ocelot_port = ocelot->ports[port];
+
+   if (switchdev_trans_ph_prepare(trans))
+   return;
+
+   ocelot_port->brport_flags = flags;
+}
+
 void ocelot_init_port(struct ocelot *ocelot, int port)
 {
struct ocelot_port *ocelot_port = ocelot->ports[port];
diff --git a/drivers/net/ethernet/mscc/ocelot.h 
b/drivers/net/ethernet/mscc/ocelot.h
index 291d39d49c4e..739bd201d951 100644
--- a/drivers/net/ethernet/mscc/ocelot.h
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -102,6 +102,9 @@ struct ocelot_multicast {
struct ocelot_pgid *pgid;
 };
 
+void ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
+ unsigned long flags,
+ struct switchdev_trans *trans);
 int ocelot_port_fdb_do_dump(const unsigned char *addr, u16 vid,
bool is_static, void *data);
 int ocelot_mact_learn(struct ocelot *ocelot, int port,
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
b/drivers/net/ethernet/mscc/ocelot_net.c
index 9ba7e2b166e9..93ecd5274156 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -882,6 +882,10 @@ static int ocelot_port_attr_set(struct net_device *dev,
case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED:
ocelot_port_attr_mc_set(ocelot, port, !attr->u.mc_disabled);
break;
+   case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
+   ocelot_port_bridge_flags(ocelot, port, attr->u.brport_flags,
+trans);
+   break;
default:
err = -EOPNOTSUPP;
break;
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 2f4cd3288bcc..50514c087231 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -581,6 +581,8 @@ struct ocelot_port {
 
struct regmap   *target;
 
+   unsigned long   brport_flags;
+
boolvlan_aware;
/* VLAN that untag
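
The decision this patch introduces can be sketched outside the kernel as
follows. The bit values and names (DEMO_BR_LEARNING, DEMO_LEARN_ENA,
demo_port_cfg) are hypothetical placeholders; the real flag and register
definitions live in the kernel headers. The sketch only shows that learning
is enabled when the STP state allows it and the bridge has asked for it.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_BR_LEARNING	(1UL << 5)
#define DEMO_LEARN_ENA		(UINT32_C(1) << 0)

enum demo_stp_state { DEMO_DISABLED, DEMO_LEARNING, DEMO_FORWARDING };

/* Compute the learning bit from the STP state and the bridge port flags. */
uint32_t demo_port_cfg(enum demo_stp_state state, unsigned long brport_flags)
{
	uint32_t port_cfg = 0;

	switch (state) {
	case DEMO_FORWARDING:
	case DEMO_LEARNING:
		/* Learn only if the bridge requested it for this port. */
		if (brport_flags & DEMO_BR_LEARNING)
			port_cfg |= DEMO_LEARN_ENA;
		break;
	default:
		break;
	}

	return port_cfg;
}
```

A standalone port has brport_flags == 0 in this model, so it never gets the
learning bit, whatever its STP state.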

[RFC PATCH net-next 06/16] net: mscc: ocelot: use ipv6 in the aggregation code

2020-12-08 Thread Vladimir Oltean

IPv6 header information is not currently part of the entropy source for
the 4-bit aggregation code used for LAG offload, even though it could be.
The hardware reference manual says about these fields:

ANA::AGGR_CFG.AC_IP6_TCPUDP_PORT_ENA
Use IPv6 TCP/UDP port when calculating aggregation code. Configure
identically for all ports. Recommended value is 1.

ANA::AGGR_CFG.AC_IP6_FLOW_LBL_ENA
Use IPv6 flow label when calculating AC. Configure identically for all
ports. Recommended value is 1.

Integration with the xmit_hash_policy of the bonding interface is TBD.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c 
b/drivers/net/ethernet/mscc/ocelot.c
index 7a5c534099d3..13e86dd71e5a 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1557,7 +1557,10 @@ int ocelot_init(struct ocelot *ocelot)
ocelot_write(ocelot, ANA_AGGR_CFG_AC_SMAC_ENA |
 ANA_AGGR_CFG_AC_DMAC_ENA |
 ANA_AGGR_CFG_AC_IP4_SIPDIP_ENA |
-ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA, ANA_AGGR_CFG);
+ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA |
+ANA_AGGR_CFG_AC_IP6_FLOW_LBL_ENA |
+ANA_AGGR_CFG_AC_IP6_TCPUDP_ENA,
+ANA_AGGR_CFG);
 
/* Set MAC age time to default value. The entry is aged after
 * 2*AGE_PERIOD
-- 
2.25.1

[RFC PATCH net-next 05/16] net: mscc: ocelot: don't refuse bonding interfaces we can't offload

2020-12-08 Thread Vladimir Oltean

Since switchdev/DSA exposes network interfaces that fulfill many of the
same user space expectations that dedicated NICs do, it makes sense to
not deny bonding interfaces with a bonding policy that we cannot offload,
but instead allow the bonding driver to select the egress interface in
software.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot_net.c | 38 ++
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
b/drivers/net/ethernet/mscc/ocelot_net.c
index 47b620967156..77957328722a 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1022,6 +1022,15 @@ static int ocelot_netdevice_changeupper(struct 
net_device *dev,
}
}
if (netif_is_lag_master(info->upper_dev)) {
+   struct netdev_lag_upper_info *lag_upper_info;
+
+   lag_upper_info = info->upper_info;
+
+   /* Only offload what we can */
+   if (lag_upper_info &&
+   lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH)
+   return NOTIFY_DONE;
+
if (info->linking)
err = ocelot_port_lag_join(ocelot, port,
   info->upper_dev);
@@ -1037,10 +1046,16 @@ static int
 ocelot_netdevice_lag_changeupper(struct net_device *dev,
 struct netdev_notifier_changeupper_info *info)
 {
+   struct netdev_lag_upper_info *lag_upper_info = info->upper_info;
struct net_device *lower;
struct list_head *iter;
int err = NOTIFY_DONE;
 
+   /* Can't offload LAG => also do bridging in software */
+   if (lag_upper_info &&
+   lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH)
+   return NOTIFY_DONE;
+
netdev_for_each_lower_dev(dev, lower, iter) {
err = ocelot_netdevice_changeupper(lower, info);
if (err)
@@ -1056,29 +1071,6 @@ static int ocelot_netdevice_event(struct notifier_block 
*unused,
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 
switch (event) {
-   case NETDEV_PRECHANGEUPPER: {
-   struct netdev_notifier_changeupper_info *info = ptr;
-   struct netdev_lag_upper_info *lag_upper_info;
-   struct netlink_ext_ack *extack;
-
-   if (!ocelot_netdevice_dev_check(dev))
-   break;
-
-   if (!netif_is_lag_master(info->upper_dev))
-   break;
-
-   lag_upper_info = info->upper_info;
-
-   if (lag_upper_info &&
-   lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) {
-   extack = netdev_notifier_info_to_extack(&info->info);
-   NL_SET_ERR_MSG_MOD(extack, "LAG device using unsupported Tx type");
-
-   return NOTIFY_BAD;
-   }
-
-   break;
-   }
case NETDEV_CHANGEUPPER: {
struct netdev_notifier_changeupper_info *info = ptr;
 
-- 
2.25.1
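
The check added in both changeupper paths boils down to one predicate. A
minimal sketch, with hypothetical names (enum demo_tx_type, struct
demo_lag_upper_info, demo_can_offload_lag) standing in for the kernel's
netdev_lag_upper_info and NETDEV_LAG_TX_TYPE_* values:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's LAG Tx types. */
enum demo_tx_type { DEMO_TX_UNKNOWN, DEMO_TX_ACTIVEBACKUP, DEMO_TX_HASH };

struct demo_lag_upper_info {
	enum demo_tx_type tx_type;
};

/*
 * Returns 1 when the driver would try to offload the LAG, 0 when it
 * leaves egress selection to the bonding driver in software.
 */
int demo_can_offload_lag(const struct demo_lag_upper_info *info)
{
	if (info && info->tx_type != DEMO_TX_HASH)
		return 0;

	return 1;
}
```

Note that, as in the diff, a missing upper info pointer does not block the
offload; only an explicit non-hash Tx type does. The difference from the
removed PRECHANGEUPPER code is that the unsupported case is now skipped
quietly instead of being rejected with NOTIFY_BAD.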

[RFC PATCH net-next 16/16] net: dsa: ocelot: tell DSA that we can offload link aggregation

2020-12-08 Thread Vladimir Oltean

For preallocation purposes, we need to specify the maximum number of
individual bonding/team devices that we can offload, which in our case
is equal to the number of physical interfaces.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/ocelot/felix.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 53ed182fac12..ad73aaa4457c 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -653,6 +653,7 @@ static int felix_setup(struct dsa_switch *ds)
 
ds->mtu_enforcement_ingress = true;
ds->configure_vlan_while_not_filtering = true;
+   ds->num_lags = ds->num_ports;
 
return 0;
 }
-- 
2.25.1

[RFC PATCH net-next 03/16] net: mscc: ocelot: rename ocelot_netdevice_port_event to ocelot_netdevice_changeupper

2020-12-08 Thread Vladimir Oltean

ocelot_netdevice_port_event handles a single event, NETDEV_CHANGEUPPER.
So we can remove the check for the type of event, and rename the
function to be more suggestive, since there already is a function with a
very similar name of ocelot_netdevice_event.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot_net.c | 59 --
 1 file changed, 27 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
b/drivers/net/ethernet/mscc/ocelot_net.c
index 6fb2a813e694..50765a3b1c44 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1003,9 +1003,8 @@ static int ocelot_port_obj_del(struct net_device *dev,
return ret;
 }
 
-static int ocelot_netdevice_port_event(struct net_device *dev,
-  unsigned long event,
-  struct netdev_notifier_changeupper_info 
*info)
+static int ocelot_netdevice_changeupper(struct net_device *dev,
+   struct netdev_notifier_changeupper_info 
*info)
 {
struct ocelot_port_private *priv = netdev_priv(dev);
struct ocelot_port *ocelot_port = &priv->port;
@@ -1013,28 +1012,22 @@ static int ocelot_netdevice_port_event(struct 
net_device *dev,
int port = priv->chip_port;
int err = 0;
 
-   switch (event) {
-   case NETDEV_CHANGEUPPER:
-   if (netif_is_bridge_master(info->upper_dev)) {
-   if (info->linking) {
-   err = ocelot_port_bridge_join(ocelot, port,
- info->upper_dev);
-   } else {
-   err = ocelot_port_bridge_leave(ocelot, port,
-  info->upper_dev);
-   }
-   }
-   if (netif_is_lag_master(info->upper_dev)) {
-   if (info->linking)
-   err = ocelot_port_lag_join(ocelot, port,
-  info->upper_dev);
-   else
-   ocelot_port_lag_leave(ocelot, port,
+   if (netif_is_bridge_master(info->upper_dev)) {
+   if (info->linking) {
+   err = ocelot_port_bridge_join(ocelot, port,
  info->upper_dev);
+   } else {
+   err = ocelot_port_bridge_leave(ocelot, port,
+  info->upper_dev);
}
-   break;
-   default:
-   break;
+   }
+   if (netif_is_lag_master(info->upper_dev)) {
+   if (info->linking)
+   err = ocelot_port_lag_join(ocelot, port,
+  info->upper_dev);
+   else
+   ocelot_port_lag_leave(ocelot, port,
+ info->upper_dev);
}
 
return err;
@@ -1063,17 +1056,19 @@ static int ocelot_netdevice_event(struct notifier_block 
*unused,
}
}
 
-   if (netif_is_lag_master(dev)) {
-   struct net_device *slave;
-   struct list_head *iter;
+   if (event == NETDEV_CHANGEUPPER) {
+   if (netif_is_lag_master(dev)) {
+   struct net_device *slave;
+   struct list_head *iter;
 
-   netdev_for_each_lower_dev(dev, slave, iter) {
-   ret = ocelot_netdevice_port_event(slave, event, info);
-   if (ret)
-   goto notify;
+   netdev_for_each_lower_dev(dev, slave, iter) {
+   ret = ocelot_netdevice_changeupper(slave, event, info);
+   if (ret)
+   goto notify;
+   }
+   } else {
+   ret = ocelot_netdevice_changeupper(dev, event, info);
}
-   } else {
-   ret = ocelot_netdevice_port_event(dev, event, info);
}
 
 notify:
-- 
2.25.1

[RFC PATCH net-next 07/16] net: mscc: ocelot: set up the bonding mask in a way that avoids a net_device

2020-12-08 Thread Vladimir Oltean

Since this code should be called from pure switchdev as well as from
DSA, we must find a way to determine the bonding mask not by looking
directly at the net_device lowers of the bonding interface, since those
could have different private structures.

We keep a pointer to the bonding upper interface, if present, in struct
ocelot_port. Then the bonding mask becomes the bitwise OR of all ports
that have the same bonding upper interface. This adds a duplication of
functionality with the current "lags" array, but the duplication will be
short-lived, since further patches will remove the latter completely.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 29 ++---
 include/soc/mscc/ocelot.h  |  2 ++
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c 
b/drivers/net/ethernet/mscc/ocelot.c
index 13e86dd71e5a..30dee1f957d1 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -881,6 +881,24 @@ int ocelot_get_ts_info(struct ocelot *ocelot, int port,
 }
 EXPORT_SYMBOL(ocelot_get_ts_info);
 
+static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond)
+{
+   u32 bond_mask = 0;
+   int port;
+
+   for (port = 0; port < ocelot->num_phys_ports; port++) {
+   struct ocelot_port *ocelot_port = ocelot->ports[port];
+
+   if (!ocelot_port)
+   continue;
+
+   if (ocelot_port->bond == bond)
+   bond_mask |= BIT(port);
+   }
+
+   return bond_mask;
+}
+
 void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
 {
struct ocelot_port *ocelot_port = ocelot->ports[port];
@@ -1272,17 +1290,12 @@ static void ocelot_setup_lag(struct ocelot *ocelot, int 
lag)
 int ocelot_port_lag_join(struct ocelot *ocelot, int port,
 struct net_device *bond)
 {
-   struct net_device *ndev;
u32 bond_mask = 0;
int lag, lp;
 
-   rcu_read_lock();
-   for_each_netdev_in_bond_rcu(bond, ndev) {
-   struct ocelot_port_private *priv = netdev_priv(ndev);
+   ocelot->ports[port]->bond = bond;
 
-   bond_mask |= BIT(priv->chip_port);
-   }
-   rcu_read_unlock();
+   bond_mask = ocelot_get_bond_mask(ocelot, bond);
 
lp = __ffs(bond_mask);
 
@@ -1315,6 +1328,8 @@ void ocelot_port_lag_leave(struct ocelot *ocelot, int 
port,
u32 port_cfg;
int i;
 
+   ocelot->ports[port]->bond = NULL;
+
/* Remove port from any lag */
for (i = 0; i < ocelot->num_phys_ports; i++)
ocelot->lags[i] &= ~BIT(port);
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 50514c087231..b812bdff1da1 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -597,6 +597,8 @@ struct ocelot_port {
phy_interface_t phy_mode;
 
u8  *xmit_template;
+
+   struct net_device   *bond;
 };
 
 struct ocelot {
-- 
2.25.1
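
The mask computation described above can be tried in userspace. This is a
sketch under simplified assumptions: struct demo_port and
demo_get_bond_mask() are hypothetical stand-ins for struct ocelot_port and
ocelot_get_bond_mask(), the bond is an opaque pointer, and the port count is
assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 32

/* Hypothetical stand-in for struct ocelot_port. */
struct demo_port {
	const void *bond;	/* bonding upper, or NULL if none */
};

/* Bitwise OR of BIT(port) for every port that sits under @bond. */
uint32_t demo_get_bond_mask(struct demo_port *const ports[], int num_ports,
			    const void *bond)
{
	uint32_t bond_mask = 0;
	int port;

	assert(num_ports <= DEMO_MAX_PORTS);

	for (port = 0; port < num_ports; port++) {
		/* Unused ports have no structure allocated. */
		if (!ports[port])
			continue;

		if (ports[port]->bond == bond)
			bond_mask |= UINT32_C(1) << port;
	}

	return bond_mask;
}
```

With four ports where ports 1 and 2 point at the same bond, the result is
0x6. No net_device lowers are walked, so the same code works whether the
caller is the switchdev driver or DSA.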

[RFC PATCH net-next 04/16] net: mscc: ocelot: use a switch-case statement in ocelot_netdevice_event

2020-12-08 Thread Vladimir Oltean

Streamline ocelot's net device event handler by structuring it the same
way as other handlers. The inspiration here was
dsa_slave_netdevice_event.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot_net.c | 68 +-
 1 file changed, 45 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
b/drivers/net/ethernet/mscc/ocelot_net.c
index 50765a3b1c44..47b620967156 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1030,49 +1030,71 @@ static int ocelot_netdevice_changeupper(struct 
net_device *dev,
  info->upper_dev);
}
 
-   return err;
+   return notifier_from_errno(err);
+}
+
+static int
+ocelot_netdevice_lag_changeupper(struct net_device *dev,
+struct netdev_notifier_changeupper_info *info)
+{
+   struct net_device *lower;
+   struct list_head *iter;
+   int err = NOTIFY_DONE;
+
+   netdev_for_each_lower_dev(dev, lower, iter) {
+   err = ocelot_netdevice_changeupper(lower, info);
+   if (err)
+   return notifier_from_errno(err);
+   }
+
+   return NOTIFY_DONE;
 }
 
 static int ocelot_netdevice_event(struct notifier_block *unused,
  unsigned long event, void *ptr)
 {
-   struct netdev_notifier_changeupper_info *info = ptr;
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-   int ret = 0;
 
-   if (event == NETDEV_PRECHANGEUPPER &&
-   ocelot_netdevice_dev_check(dev) &&
-   netif_is_lag_master(info->upper_dev)) {
-   struct netdev_lag_upper_info *lag_upper_info = info->upper_info;
+   switch (event) {
+   case NETDEV_PRECHANGEUPPER: {
+   struct netdev_notifier_changeupper_info *info = ptr;
+   struct netdev_lag_upper_info *lag_upper_info;
struct netlink_ext_ack *extack;
 
+   if (!ocelot_netdevice_dev_check(dev))
+   break;
+
+   if (!netif_is_lag_master(info->upper_dev))
+   break;
+
+   lag_upper_info = info->upper_info;
+
if (lag_upper_info &&
lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) {
extack = netdev_notifier_info_to_extack(&info->info);
NL_SET_ERR_MSG_MOD(extack, "LAG device using unsupported Tx type");
 
-   ret = -EINVAL;
-   goto notify;
+   return NOTIFY_BAD;
}
+
+   break;
}
+   case NETDEV_CHANGEUPPER: {
+   struct netdev_notifier_changeupper_info *info = ptr;
 
-   if (event == NETDEV_CHANGEUPPER) {
-   if (netif_is_lag_master(dev)) {
-   struct net_device *slave;
-   struct list_head *iter;
+   if (ocelot_netdevice_dev_check(dev))
+   return ocelot_netdevice_changeupper(dev, info);
 
-   netdev_for_each_lower_dev(dev, slave, iter) {
-   ret = ocelot_netdevice_changeupper(slave, event, info);
-   if (ret)
-   goto notify;
-   }
-   } else {
-   ret = ocelot_netdevice_changeupper(dev, event, info);
-   }
+   if (netif_is_lag_master(dev))
+   return ocelot_netdevice_lag_changeupper(dev, info);
+
+   break;
+   }
+   default:
+   break;
}
 
-notify:
-   return notifier_from_errno(ret);
+   return NOTIFY_DONE;
 }
 
 struct notifier_block ocelot_netdevice_nb __read_mostly = {
-- 
2.25.1
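
Since the handler now returns notifier codes rather than errnos, it may help
to see how the two encodings relate. The sketch below re-declares the
constants and the two conversion helpers in userspace, following my
understanding of include/linux/notifier.h; the DEMO_ prefix marks them as
copies, and the exact values should be treated as an assumption to check
against the kernel tree.

```c
/* Userspace copies of the notifier return codes (assumed values). */
#define DEMO_NOTIFY_DONE	0x0000
#define DEMO_NOTIFY_OK		0x0001
#define DEMO_NOTIFY_STOP_MASK	0x8000
#define DEMO_NOTIFY_BAD		(DEMO_NOTIFY_STOP_MASK | 0x0002)

/* Encode 0 or a negative errno as a notifier return value. */
int demo_notifier_from_errno(int err)
{
	if (err)
		return DEMO_NOTIFY_STOP_MASK | (DEMO_NOTIFY_OK - err);

	return DEMO_NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
int demo_notifier_to_errno(int ret)
{
	ret &= ~DEMO_NOTIFY_STOP_MASK;
	return ret > DEMO_NOTIFY_OK ? DEMO_NOTIFY_OK - ret : 0;
}
```

The encoding is meant to be applied once to a plain errno: a successful call
becomes NOTIFY_OK, an error sets the stop bit so the chain is not walked
further, and decoding recovers the original errno.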

[RFC PATCH net-next 15/16] net: dsa: felix: propagate the LAG offload ops towards the ocelot lib

2020-12-08 Thread Vladimir Oltean

The ocelot switch has been supporting LAG offload since its initial
commit, however felix could not make use of that, due to lack of a LAG
abstraction in DSA. Now that we have that, let's forward DSA's calls
towards the ocelot library, which will deal with setting up the bonding.

Note that ocelot_port_lag_leave can return an error due to memory
allocation but we are currently ignoring that, because the DSA method
returns void.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/ocelot/felix.c | 27 +++
 drivers/net/ethernet/mscc/ocelot.c |  1 +
 drivers/net/ethernet/mscc/ocelot.h |  6 --
 include/soc/mscc/ocelot.h  |  6 ++
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 7dc230677b78..53ed182fac12 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -112,6 +112,30 @@ static void felix_bridge_leave(struct dsa_switch *ds, int 
port,
ocelot_port_bridge_leave(ocelot, port, br);
 }
 
+static int felix_lag_join(struct dsa_switch *ds, int port,
+ struct net_device *lag_dev)
+{
+   struct ocelot *ocelot = ds->priv;
+
+   return ocelot_port_lag_join(ocelot, port, lag_dev);
+}
+
+static void felix_lag_leave(struct dsa_switch *ds, int port,
+   struct net_device *lag_dev)
+{
+   struct ocelot *ocelot = ds->priv;
+
+   ocelot_port_lag_leave(ocelot, port, lag_dev);
+}
+
+static int felix_lag_change(struct dsa_switch *ds, int port,
+   struct netdev_lag_lower_state_info *linfo)
+{
+   struct ocelot *ocelot = ds->priv;
+
+   return ocelot_port_lag_change(ocelot, port, linfo);
+}
+
 static int felix_vlan_prepare(struct dsa_switch *ds, int port,
  const struct switchdev_obj_port_vlan *vlan)
 {
@@ -803,6 +827,9 @@ const struct dsa_switch_ops felix_switch_ops = {
.port_mdb_del   = felix_mdb_del,
.port_bridge_join   = felix_bridge_join,
.port_bridge_leave  = felix_bridge_leave,
+   .port_lag_join  = felix_lag_join,
+   .port_lag_leave = felix_lag_leave,
+   .port_lag_change= felix_lag_change,
.port_stp_state_set = felix_bridge_stp_state_set,
.port_vlan_prepare  = felix_vlan_prepare,
.port_vlan_filtering= felix_vlan_filtering,
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 5c71d121048d..cd7a2e558301 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1381,6 +1381,7 @@ int ocelot_port_lag_change(struct ocelot *ocelot, int port,
/* Rebalance the LAGs */
return ocelot_set_aggr_pgids(ocelot);
 }
+EXPORT_SYMBOL(ocelot_port_lag_change);
 
 /* Configure the maximum SDU (L2 payload) on RX to the value specified in @sdu.
  * The length of VLAN tags is accounted for automatically via DEV_MAC_TAGS_CFG.
diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
index 0860125b623c..3141ccde6a66 100644
--- a/drivers/net/ethernet/mscc/ocelot.h
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -112,12 +112,6 @@ int ocelot_mact_learn(struct ocelot *ocelot, int port,
  unsigned int vid, enum macaccess_entry_type type);
 int ocelot_mact_forget(struct ocelot *ocelot,
   const unsigned char mac[ETH_ALEN], unsigned int vid);
-int ocelot_port_lag_join(struct ocelot *ocelot, int port,
-struct net_device *bond);
-int ocelot_port_lag_leave(struct ocelot *ocelot, int port,
- struct net_device *bond);
-int ocelot_port_lag_change(struct ocelot *ocelot, int port,
-  struct netdev_lag_lower_state_info *info);
 struct net_device *ocelot_port_to_netdev(struct ocelot *ocelot, int port);
 int ocelot_netdev_to_port(struct net_device *dev);
 
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 8a44b9064789..7c104f08796d 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -780,5 +780,11 @@ int ocelot_port_mdb_add(struct ocelot *ocelot, int port,
const struct switchdev_obj_port_mdb *mdb);
 int ocelot_port_mdb_del(struct ocelot *ocelot, int port,
const struct switchdev_obj_port_mdb *mdb);
+int ocelot_port_lag_join(struct ocelot *ocelot, int port,
+struct net_device *bond);
+int ocelot_port_lag_leave(struct ocelot *ocelot, int port,
+ struct net_device *bond);
+int ocelot_port_lag_change(struct ocelot *ocelot, int port,
+  struct netdev_lag_lower_state_info *info);
 
 #endif
-- 
2.25.1

[RFC PATCH net-next 09/16] net: mscc: ocelot: use "lag" variable name in ocelot_bridge_stp_state_set

2020-12-08 Thread Vladimir Oltean

In anticipation of further simplification, make it more clear what we're
iterating over.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 080fd4ce37ea..c3c6682e6e79 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -903,7 +903,7 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
 {
struct ocelot_port *ocelot_port = ocelot->ports[port];
u32 port_cfg;
-   int p, i;
+   int p;
 
if (!(BIT(port) & ocelot->bridge_mask))
return;
@@ -928,14 +928,17 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
ocelot_write_gix(ocelot, port_cfg, ANA_PORT_PORT_CFG, port);
 
/* Apply FWD mask. The loop is needed to add/remove the current port as
-* a source for the other ports.
+* a source for the other ports. If the source port is in a bond, then
+* all the other ports from that bond need to be removed from this
+* source port's forwarding mask.
 */
for (p = 0; p < ocelot->num_phys_ports; p++) {
if (ocelot->bridge_fwd_mask & BIT(p)) {
unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(p);
+   int lag;
 
-   for (i = 0; i < ocelot->num_phys_ports; i++) {
-   unsigned long bond_mask = ocelot->lags[i];
+   for (lag = 0; lag < ocelot->num_phys_ports; lag++) {
+   unsigned long bond_mask = ocelot->lags[lag];
 
if (!bond_mask)
continue;
-- 
2.25.1

[PATCH net-next] net: ipv6: rpl_iptunnel: simplify the return expression of rpl_do_srh()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 net/ipv6/rpl_iptunnel.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
index 5fdf3ebb953f..f16cf45a2421 100644
--- a/net/ipv6/rpl_iptunnel.c
+++ b/net/ipv6/rpl_iptunnel.c
@@ -190,18 +190,13 @@ static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt)
 {
struct dst_entry *dst = skb_dst(skb);
struct rpl_iptunnel_encap *tinfo;
-   int err = 0;
 
if (skb->protocol != htons(ETH_P_IPV6))
return -EINVAL;
 
tinfo = rpl_encap_lwtunnel(dst->lwtstate);
 
-   err = rpl_do_srh_inline(skb, rlwt, tinfo->srh);
-   if (err)
-   return err;
-
-   return 0;
+   return rpl_do_srh_inline(skb, rlwt, tinfo->srh);
 }
 
 static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
-- 
2.22.0

[RFC PATCH net-next 10/16] net: mscc: ocelot: reapply bridge forwarding mask on bonding join/leave

2020-12-08 Thread Vladimir Oltean

Currently the bridge forwarding mask is applied only when the STP state
of a port changes. But the mask depends on both STP state changes and
bonding interface state changes. Factor out the code that recalculates
the forwarding mask so that it can be reused, and call it when a port
starts and stops offloading a bonding interface.
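
The recalculation can be sketched in standalone C, under some
assumptions: hardware register writes are replaced by an output array,
and fwd_masks_compute, NUM_PORTS and the array layout are hypothetical
names chosen for illustration, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PORTS 6u		/* hypothetical port count, illustration only */
#define BIT(n)    (1u << (n))

/*
 * Compute the forwarding mask of every source port.
 * bridge_fwd_mask: ports that are bridged and in forwarding state.
 * lags[i]: bitmask of the ports in the LAG with index i, 0 if unused.
 * out[p]: receives the set of ports that traffic from port p may reach.
 */
static void fwd_masks_compute(uint32_t bridge_fwd_mask,
			      const uint32_t lags[NUM_PORTS],
			      uint32_t out[NUM_PORTS])
{
	unsigned int port, lag;

	assert(lags != NULL && out != NULL);

	for (port = 0; port < NUM_PORTS; port++) {
		uint32_t mask;

		if (!(bridge_fwd_mask & BIT(port))) {
			/* Not forwarding: this source reaches no port. */
			out[port] = 0;
			continue;
		}

		/* Never forward back to the source port itself. */
		mask = bridge_fwd_mask & ~BIT(port);

		/* If the source is in a bond, drop all members of it. */
		for (lag = 0; lag < NUM_PORTS; lag++) {
			if (lags[lag] & BIT(port)) {
				mask &= ~lags[lag];
				break;
			}
		}

		out[port] = mask;
	}
}
```

With ports 0-3 forwarding and ports 1 and 2 bonded, traffic from port 1
may only reach ports 0 and 3, which is why the mask has to be refreshed
whenever bond membership changes and not only on STP transitions.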

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 68 +-
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index c3c6682e6e79..ee0fcee8e09a 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -899,11 +899,45 @@ static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond)
return bond_mask;
 }
 
+static void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot)
+{
+   int port;
+
+   /* Apply FWD mask. The loop is needed to add/remove the current port as
+* a source for the other ports. If the source port is in a bond, then
+* all the other ports from that bond need to be removed from this
+* source port's forwarding mask.
+*/
+   for (port = 0; port < ocelot->num_phys_ports; port++) {
+   if (ocelot->bridge_fwd_mask & BIT(port)) {
+   unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(port);
+   int lag;
+
+   for (lag = 0; lag < ocelot->num_phys_ports; lag++) {
+   unsigned long bond_mask = ocelot->lags[lag];
+
+   if (!bond_mask)
+   continue;
+
+   if (bond_mask & BIT(port)) {
+   mask &= ~bond_mask;
+   break;
+   }
+   }
+
+   ocelot_write_rix(ocelot, mask,
+ANA_PGID_PGID, PGID_SRC + port);
+   } else {
+   ocelot_write_rix(ocelot, 0,
+ANA_PGID_PGID, PGID_SRC + port);
+   }
+   }
+}
+
 void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
 {
struct ocelot_port *ocelot_port = ocelot->ports[port];
u32 port_cfg;
-   int p;
 
if (!(BIT(port) & ocelot->bridge_mask))
return;
@@ -927,35 +961,7 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
 
ocelot_write_gix(ocelot, port_cfg, ANA_PORT_PORT_CFG, port);
 
-   /* Apply FWD mask. The loop is needed to add/remove the current port as
-* a source for the other ports. If the source port is in a bond, then
-* all the other ports from that bond need to be removed from this
-* source port's forwarding mask.
-*/
-   for (p = 0; p < ocelot->num_phys_ports; p++) {
-   if (ocelot->bridge_fwd_mask & BIT(p)) {
-   unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(p);
-   int lag;
-
-   for (lag = 0; lag < ocelot->num_phys_ports; lag++) {
-   unsigned long bond_mask = ocelot->lags[lag];
-
-   if (!bond_mask)
-   continue;
-
-   if (bond_mask & BIT(p)) {
-   mask &= ~bond_mask;
-   break;
-   }
-   }
-
-   ocelot_write_rix(ocelot, mask,
-ANA_PGID_PGID, PGID_SRC + p);
-   } else {
-   ocelot_write_rix(ocelot, 0,
-ANA_PGID_PGID, PGID_SRC + p);
-   }
-   }
+   ocelot_apply_bridge_fwd_mask(ocelot);
 }
 EXPORT_SYMBOL(ocelot_bridge_stp_state_set);
 
@@ -1315,6 +1321,7 @@ int ocelot_port_lag_join(struct ocelot *ocelot, int port,
}
 
ocelot_setup_lag(ocelot, lag);
+   ocelot_apply_bridge_fwd_mask(ocelot);
ocelot_set_aggr_pgids(ocelot);
 
return 0;
@@ -1350,6 +1357,7 @@ void ocelot_port_lag_leave(struct ocelot *ocelot, int port,
ocelot_write_gix(ocelot, port_cfg | ANA_PORT_PORT_CFG_PORTID_VAL(port),
 ANA_PORT_PORT_CFG, port);
 
+   ocelot_apply_bridge_fwd_mask(ocelot);
ocelot_set_aggr_pgids(ocelot);
 }
 EXPORT_SYMBOL(ocelot_port_lag_leave);
-- 
2.25.1

[RFC PATCH net-next 08/16] net: mscc: ocelot: avoid unneeded "lp" variable in LAG join

2020-12-08 Thread Vladimir Oltean

The index of the LAG is equal to the logical port ID that all the
physical port members have, which is further equal to the index of the
first physical port that is a member of the LAG.

The code gets a bit carried away with logic like this:

if (a == b)
c = a;
else
c = b;

which can be simplified, of course, into:

c = b;

(with a being port, b being lp, c being lag)

This further makes the "lp" variable redundant, since we can use "lag"
everywhere where "lp" (logical port) was used. So instead of a "c = b"
assignment, we can do a complete deletion of b. Only one comment here:

if (bond_mask) {
lp = __ffs(bond_mask);
ocelot->lags[lp] = 0;
}

lp was clobbered before, because it was used as a temporary variable to
hold the new smallest port ID from the bond. Now that we don't have "lp"
any longer, we'll just avoid the temporary variable and zeroize the
bonding mask directly.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 30dee1f957d1..080fd4ce37ea 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1291,28 +1291,24 @@ int ocelot_port_lag_join(struct ocelot *ocelot, int port,
 struct net_device *bond)
 {
u32 bond_mask = 0;
-   int lag, lp;
+   int lag;
 
ocelot->ports[port]->bond = bond;
 
bond_mask = ocelot_get_bond_mask(ocelot, bond);
 
-   lp = __ffs(bond_mask);
+   lag = __ffs(bond_mask);
 
/* If the new port is the lowest one, use it as the logical port from
 * now on
 */
-   if (port == lp) {
-   lag = port;
+   if (port == lag) {
ocelot->lags[port] = bond_mask;
bond_mask &= ~BIT(port);
-   if (bond_mask) {
-   lp = __ffs(bond_mask);
-   ocelot->lags[lp] = 0;
-   }
+   if (bond_mask)
+   ocelot->lags[__ffs(bond_mask)] = 0;
} else {
-   lag = lp;
-   ocelot->lags[lp] |= BIT(port);
+   ocelot->lags[lag] |= BIT(port);
}
 
ocelot_setup_lag(ocelot, lag);
-- 
2.25.1

[RFC PATCH net-next 12/16] net: mscc: ocelot: drop the use of the "lags" array

2020-12-08 Thread Vladimir Oltean

We can now simplify the implementation by always using ocelot_get_bond_mask
to look up the other ports that are offloading the same bonding interface
as us.

In ocelot_set_aggr_pgids, the code had a way to uniquely iterate through
LAGs. We need to achieve the same behavior by marking each LAG as visited,
which we do now by temporarily allocating an array of pointers to bonding
uppers of each port, and marking each bonding upper as NULL once it has
been treated by the first port that is a member. And because we now do
some dynamic allocation, we need to propagate errors from
ocelot_set_aggr_pgids all the way to ocelot_port_lag_leave.
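
The "mark as visited" idea can be shown as a minimal userspace sketch.
Assumptions: struct bond and visit_each_bond_once are hypothetical
stand-ins, calloc/free replace kcalloc/kfree, and the per-LAG work is
reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct bond {			/* hypothetical stand-in for a bonding upper */
	int id;
};

/*
 * Visit each distinct bond exactly once. port_bond[p] is the bond that
 * port p belongs to, or NULL. Returns the number of distinct bonds, or
 * -1 on allocation failure (the kernel code would return -ENOMEM).
 */
static int visit_each_bond_once(struct bond *const *port_bond,
				size_t num_ports)
{
	struct bond **bonds;
	size_t lag, i;
	int visited = 0;

	assert(port_bond != NULL);

	if (num_ports == 0)
		return 0;

	bonds = calloc(num_ports, sizeof(*bonds));
	if (!bonds)
		return -1;

	/* Temporary copy, so entries can be cleared as they are treated. */
	for (i = 0; i < num_ports; i++)
		bonds[i] = port_bond[i];

	for (lag = 0; lag < num_ports; lag++) {
		if (!bonds[lag])
			continue;

		visited++;	/* per-LAG configuration would go here */

		/* Mark this bond as visited for all later member ports. */
		for (i = lag + 1; i < num_ports; i++)
			if (bonds[i] == bonds[lag])
				bonds[i] = NULL;

		bonds[lag] = NULL;
	}

	free(bonds);
	return visited;
}
```

The scratch array is what introduces the failure path: once the helper
allocates memory, its callers need a way to report that it failed.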

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 104 ++---
 drivers/net/ethernet/mscc/ocelot.h |   4 +-
 drivers/net/ethernet/mscc/ocelot_net.c |   4 +-
 include/soc/mscc/ocelot.h  |   2 -
 4 files changed, 47 insertions(+), 67 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 1a98c24af056..d4dbba66aa65 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -909,21 +909,17 @@ static void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot)
 * source port's forwarding mask.
 */
for (port = 0; port < ocelot->num_phys_ports; port++) {
-   if (ocelot->bridge_fwd_mask & BIT(port)) {
-   unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(port);
-   int lag;
+   struct ocelot_port *ocelot_port = ocelot->ports[port];
 
-   for (lag = 0; lag < ocelot->num_phys_ports; lag++) {
-   unsigned long bond_mask = ocelot->lags[lag];
+   if (!ocelot_port)
+   continue;
 
-   if (!bond_mask)
-   continue;
+   if (ocelot->bridge_fwd_mask & BIT(port)) {
+   unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(port);
+   struct net_device *bond = ocelot_port->bond;
 
-   if (bond_mask & BIT(port)) {
-   mask &= ~bond_mask;
-   break;
-   }
-   }
+   if (bond)
+   mask &= ~ocelot_get_bond_mask(ocelot, bond);
 
ocelot_write_rix(ocelot, mask,
 ANA_PGID_PGID, PGID_SRC + port);
@@ -1238,10 +1234,16 @@ int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
 }
 EXPORT_SYMBOL(ocelot_port_bridge_leave);
 
-static void ocelot_set_aggr_pgids(struct ocelot *ocelot)
+static int ocelot_set_aggr_pgids(struct ocelot *ocelot)
 {
+   struct net_device **bonds;
int i, port, lag;
 
+   bonds = kcalloc(ocelot->num_phys_ports, sizeof(struct net_device *),
+   GFP_KERNEL);
+   if (!bonds)
+   return -ENOMEM;
+
/* Reset destination and aggregation PGIDS */
for_each_unicast_dest_pgid(ocelot, port)
ocelot_write_rix(ocelot, BIT(port), ANA_PGID_PGID, port);
@@ -1250,16 +1252,26 @@ static void ocelot_set_aggr_pgids(struct ocelot *ocelot)
ocelot_write_rix(ocelot, GENMASK(ocelot->num_phys_ports - 1, 0),
 ANA_PGID_PGID, i);
 
+   for (port = 0; port < ocelot->num_phys_ports; port++) {
+   struct ocelot_port *ocelot_port = ocelot->ports[port];
+
+   if (!ocelot_port)
+   continue;
+
+   bonds[port] = ocelot_port->bond;
+   }
+
/* Now, set PGIDs for each LAG */
for (lag = 0; lag < ocelot->num_phys_ports; lag++) {
unsigned long bond_mask;
int aggr_count = 0;
u8 aggr_idx[16];
 
-   bond_mask = ocelot->lags[lag];
-   if (!bond_mask)
+   if (!bonds[lag])
continue;
 
+   bond_mask = ocelot_get_bond_mask(ocelot, bonds[lag]);
+
for_each_set_bit(port, &bond_mask, ocelot->num_phys_ports) {
// Destination mask
ocelot_write_rix(ocelot, bond_mask,
@@ -1276,7 +1288,19 @@ static void ocelot_set_aggr_pgids(struct ocelot *ocelot)
ac |= BIT(aggr_idx[i % aggr_count]);
ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
}
+
+   /* Mark the bonding interface as visited to avoid applying
+* the same config again
+*/
+   for (i = lag + 1; i < ocelot->num_phys_ports; i++)
+   if (bonds[i] == bonds[lag])
+   bonds[i] = NULL;
+
+   bonds[lag] = NULL;
}
+
+   kfree(bonds);
+   return

[RFC PATCH net-next 14/16] net: mscc: ocelot: rebalance LAGs on link up/down events

2020-12-08 Thread Vladimir Oltean

At present there is an issue when ocelot is offloading a bonding
interface, but one of the links of the physical ports goes down. Traffic
keeps being hashed towards that destination, and of course gets dropped
on egress.

Monitor the netdev notifier events emitted by the bonding driver for
changes in the physical state of lower interfaces, to determine which
ports are active and which ones are no longer.

Then extend ocelot_get_bond_mask to return either the configured bonding
interfaces, or the active ones, depending on a boolean argument. The
code that does rebalancing only needs to do so among the active ports,
whereas the bridge forwarding mask and the logical port IDs still need
to look at the permanently bonded ports.
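
As a rough standalone illustration of the two views of a bond
(configured members versus members with an active link), assuming a
hypothetical struct port_state and helper name bond_mask_get that only
mirror the fields used in the diff below:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct port_state {		/* hypothetical, illustration only */
	const void *bond;	/* bonding upper of this port, or NULL */
	bool lag_tx_active;	/* may this port currently transmit? */
};

/*
 * Return the bitmask of ports that belong to @bond. When
 * @just_active_ports is true, members whose link is not active are
 * left out, so rebalancing never hashes traffic towards them.
 */
static uint32_t bond_mask_get(const struct port_state *ports,
			      size_t num_ports, const void *bond,
			      bool just_active_ports)
{
	uint32_t mask = 0;
	size_t port;

	assert(ports != NULL && num_ports <= 32);

	for (port = 0; port < num_ports; port++) {
		if (ports[port].bond != bond)
			continue;

		if (just_active_ports && !ports[port].lag_tx_active)
			continue;

		mask |= UINT32_C(1) << port;
	}

	return mask;
}
```

Callers that compute forwarding masks or logical port IDs would pass
false, while the aggregation code rebalancing would pass true.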

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 43 --
 drivers/net/ethernet/mscc/ocelot.h |  2 ++
 drivers/net/ethernet/mscc/ocelot_net.c | 26 
 include/soc/mscc/ocelot.h  |  1 +
 4 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index d87e80a15ca5..5c71d121048d 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -881,7 +881,8 @@ int ocelot_get_ts_info(struct ocelot *ocelot, int port,
 }
 EXPORT_SYMBOL(ocelot_get_ts_info);
 
-static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond)
+static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond,
+   bool just_active_ports)
 {
u32 bond_mask = 0;
int port;
@@ -892,8 +893,12 @@ static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond)
if (!ocelot_port)
continue;
 
-   if (ocelot_port->bond == bond)
+   if (ocelot_port->bond == bond) {
+   if (just_active_ports && !ocelot_port->lag_tx_active)
+   continue;
+
bond_mask |= BIT(port);
+   }
}
 
return bond_mask;
@@ -919,7 +924,7 @@ static void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot)
struct net_device *bond = ocelot_port->bond;
 
if (bond)
-   mask &= ~ocelot_get_bond_mask(ocelot, bond);
+   mask &= ~ocelot_get_bond_mask(ocelot, bond, false);
 
ocelot_write_rix(ocelot, mask,
 ANA_PGID_PGID, PGID_SRC + port);
@@ -1261,22 +1266,22 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot)
bonds[port] = ocelot_port->bond;
}
 
-   /* Now, set PGIDs for each LAG */
+   /* Now, set PGIDs for each active LAG */
for (lag = 0; lag < ocelot->num_phys_ports; lag++) {
-   int num_ports_in_lag = 0;
+   int num_active_ports = 0;
unsigned long bond_mask;
u8 aggr_idx[16];
 
if (!bonds[lag])
continue;
 
-   bond_mask = ocelot_get_bond_mask(ocelot, bonds[lag]);
+   bond_mask = ocelot_get_bond_mask(ocelot, bonds[lag], true);
 
for_each_set_bit(port, &bond_mask, ocelot->num_phys_ports) {
// Destination mask
ocelot_write_rix(ocelot, bond_mask,
 ANA_PGID_PGID, port);
-   aggr_idx[num_ports_in_lag++] = port;
+   aggr_idx[num_active_ports++] = port;
}
 
for_each_aggr_pgid(ocelot, i) {
@@ -1284,7 +1289,11 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot)
 
ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
ac &= ~bond_mask;
-   ac |= BIT(aggr_idx[i % num_ports_in_lag]);
+   /* Don't do division by zero if there was no active
+* port. Just make all aggregation codes zero.
+*/
+   if (num_active_ports)
+   ac |= BIT(aggr_idx[i % num_active_ports]);
ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
}
 
@@ -1320,7 +1329,8 @@ static void ocelot_setup_logical_port_ids(struct ocelot *ocelot)
 
bond = ocelot_port->bond;
if (bond) {
-   int lag = __ffs(ocelot_get_bond_mask(ocelot, bond));
+   int lag = __ffs(ocelot_get_bond_mask(ocelot, bond,
+false));
 
ocelot_rmw_gix(ocelot,
   ANA_PORT_PORT_CFG_PORTID_VAL(lag),
@@ -1357,6 +1367,21 @@ int ocelot_port_lag_leave(struct ocelot *ocelot, int port,
 }
 EXPORT_SYMBOL(ocelot_port_lag_lea

[RFC PATCH net-next 13/16] net: mscc: ocelot: rename aggr_count to num_ports_in_lag

2020-12-08 Thread Vladimir Oltean

It makes it a bit easier to read and understand the code that deals with
balancing the 16 aggregation codes among the ports in a certain LAG.
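
For readers unfamiliar with the balancing, here is a small standalone
sketch of how the 16 aggregation codes get spread round-robin over the
ports of one LAG. The array ac[] stands in for the aggregation PGID
registers, aggr_codes_balance is a hypothetical name, and the check for
an empty LAG is a defensive addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_AGGR_CODES 16u

/*
 * For each aggregation code, keep exactly one port of the LAG described
 * by @bond_mask as the allowed destination, cycling through the member
 * ports so that the codes are shared as evenly as possible.
 */
static void aggr_codes_balance(uint32_t bond_mask,
			       uint32_t ac[NUM_AGGR_CODES])
{
	uint8_t aggr_idx[32];
	unsigned int num_ports_in_lag = 0;
	unsigned int port, i;

	assert(ac != NULL);

	/* Collect the member ports in ascending order. */
	for (port = 0; port < 32; port++)
		if (bond_mask & (UINT32_C(1) << port))
			aggr_idx[num_ports_in_lag++] = (uint8_t)port;

	for (i = 0; i < NUM_AGGR_CODES; i++) {
		/* Remove every LAG member from this code... */
		ac[i] &= ~bond_mask;
		/* ...then add back one member, chosen round-robin. */
		if (num_ports_in_lag)
			ac[i] |= UINT32_C(1) << aggr_idx[i % num_ports_in_lag];
	}
}
```

With two ports in the LAG, even-numbered codes select the lower port
and odd-numbered codes the higher one, so each port gets 8 of the 16
codes.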

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index d4dbba66aa65..d87e80a15ca5 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1263,8 +1263,8 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot)
 
/* Now, set PGIDs for each LAG */
for (lag = 0; lag < ocelot->num_phys_ports; lag++) {
+   int num_ports_in_lag = 0;
unsigned long bond_mask;
-   int aggr_count = 0;
u8 aggr_idx[16];
 
if (!bonds[lag])
@@ -1276,8 +1276,7 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot)
// Destination mask
ocelot_write_rix(ocelot, bond_mask,
 ANA_PGID_PGID, port);
-   aggr_idx[aggr_count] = port;
-   aggr_count++;
+   aggr_idx[num_ports_in_lag++] = port;
}
 
for_each_aggr_pgid(ocelot, i) {
@@ -1285,7 +1284,7 @@ static int ocelot_set_aggr_pgids(struct ocelot *ocelot)
 
ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
ac &= ~bond_mask;
-   ac |= BIT(aggr_idx[i % aggr_count]);
+   ac |= BIT(aggr_idx[i % num_ports_in_lag]);
ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
}
 
-- 
2.25.1

[PATCH net-next] net: core: devlink: simplify the return expression of devlink_nl_cmd_trap_set_doit()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 net/core/devlink.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 8c5ddffd707d..3f0a65ee0474 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -6981,7 +6981,6 @@ static int devlink_nl_cmd_trap_set_doit(struct sk_buff *skb,
struct netlink_ext_ack *extack = info->extack;
struct devlink *devlink = info->user_ptr[0];
struct devlink_trap_item *trap_item;
-   int err;
 
if (list_empty(&devlink->trap_list))
return -EOPNOTSUPP;
@@ -6992,11 +6991,7 @@ static int devlink_nl_cmd_trap_set_doit(struct sk_buff *skb,
return -ENOENT;
}
 
-   err = devlink_trap_action_set(devlink, trap_item, info);
-   if (err)
-   return err;
-
-   return 0;
+   return devlink_trap_action_set(devlink, trap_item, info);
 }
 
 static struct devlink_trap_group_item *
-- 
2.22.0

[RFC PATCH net-next 11/16] net: mscc: ocelot: set up logical port IDs centrally

2020-12-08 Thread Vladimir Oltean

The setup of logical port IDs is done in two places: from the inconclusively
named ocelot_setup_lag and from ocelot_port_lag_leave, a function that
also calls ocelot_setup_lag (which apparently does an incomplete setup
of the LAG).

To improve this situation, we can rename ocelot_setup_lag into
ocelot_setup_logical_port_ids, and drop the "lag" argument. It will now
set up the logical port IDs of all switch ports, which may be slightly
less efficient but is more maintainable.
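
The rule being centralized (a bonded port takes the ID of the lowest
numbered port in its bond, any other port keeps its own ID) can be
sketched without hardware access. The bond_id[] encoding, with a
negative value meaning "no bond", and the helper name
logical_port_ids_compute are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * bond_id[p]: hypothetical identifier of the bond of port p, or a
 * negative value if the port is standalone or bridged.
 * logical_id[p]: receives the logical port ID of port p.
 */
static void logical_port_ids_compute(const int *bond_id, size_t num_ports,
				     size_t *logical_id)
{
	size_t port, lowest;

	assert(bond_id != NULL && logical_id != NULL);

	for (port = 0; port < num_ports; port++) {
		if (bond_id[port] < 0) {
			/* No bond: logical ID equals physical ID. */
			logical_id[port] = port;
			continue;
		}

		/*
		 * Find the lowest numbered port in the same bond. If no
		 * lower port matches, the loop ends with lowest == port.
		 */
		for (lowest = 0; lowest < port; lowest++)
			if (bond_id[lowest] == bond_id[port])
				break;

		logical_id[port] = lowest;
	}
}
```

Recomputing every port on each join or leave costs a full pass over the
ports, but it removes the need to track which IDs a given event might
have invalidated.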

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot.c | 47 ++
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index ee0fcee8e09a..1a98c24af056 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1279,20 +1279,36 @@ static void ocelot_set_aggr_pgids(struct ocelot *ocelot)
}
 }
 
-static void ocelot_setup_lag(struct ocelot *ocelot, int lag)
+/* When offloading a bonding interface, the switch ports configured under the
+ * same bond must have the same logical port ID, equal to the physical port ID
+ * of the lowest numbered physical port in that bond. Otherwise, in standalone/
+ * bridged mode, each port has a logical port ID equal to its physical port ID.
+ */
+static void ocelot_setup_logical_port_ids(struct ocelot *ocelot)
 {
-   unsigned long bond_mask = ocelot->lags[lag];
-   unsigned int p;
+   int port;
 
-   for_each_set_bit(p, &bond_mask, ocelot->num_phys_ports) {
-   u32 port_cfg = ocelot_read_gix(ocelot, ANA_PORT_PORT_CFG, p);
+   for (port = 0; port < ocelot->num_phys_ports; port++) {
+   struct ocelot_port *ocelot_port = ocelot->ports[port];
+   struct net_device *bond;
+
+   if (!ocelot_port)
+   continue;
 
-   port_cfg &= ~ANA_PORT_PORT_CFG_PORTID_VAL_M;
+   bond = ocelot_port->bond;
+   if (bond) {
+   int lag = __ffs(ocelot_get_bond_mask(ocelot, bond));
 
-   /* Use lag port as logical port for port i */
-   ocelot_write_gix(ocelot, port_cfg |
-ANA_PORT_PORT_CFG_PORTID_VAL(lag),
-ANA_PORT_PORT_CFG, p);
+   ocelot_rmw_gix(ocelot,
+  ANA_PORT_PORT_CFG_PORTID_VAL(lag),
+  ANA_PORT_PORT_CFG_PORTID_VAL_M,
+  ANA_PORT_PORT_CFG, port);
+   } else {
+   ocelot_rmw_gix(ocelot,
+  ANA_PORT_PORT_CFG_PORTID_VAL(port),
+  ANA_PORT_PORT_CFG_PORTID_VAL_M,
+  ANA_PORT_PORT_CFG, port);
+   }
}
 }
 
@@ -1320,7 +1336,7 @@ int ocelot_port_lag_join(struct ocelot *ocelot, int port,
ocelot->lags[lag] |= BIT(port);
}
 
-   ocelot_setup_lag(ocelot, lag);
+   ocelot_setup_logical_port_ids(ocelot);
ocelot_apply_bridge_fwd_mask(ocelot);
ocelot_set_aggr_pgids(ocelot);
 
@@ -1331,7 +1347,6 @@ EXPORT_SYMBOL(ocelot_port_lag_join);
 void ocelot_port_lag_leave(struct ocelot *ocelot, int port,
   struct net_device *bond)
 {
-   u32 port_cfg;
int i;
 
ocelot->ports[port]->bond = NULL;
@@ -1348,15 +1363,9 @@ void ocelot_port_lag_leave(struct ocelot *ocelot, int port,
 
ocelot->lags[n] = ocelot->lags[port];
ocelot->lags[port] = 0;
-
-   ocelot_setup_lag(ocelot, n);
}
 
-   port_cfg = ocelot_read_gix(ocelot, ANA_PORT_PORT_CFG, port);
-   port_cfg &= ~ANA_PORT_PORT_CFG_PORTID_VAL_M;
-   ocelot_write_gix(ocelot, port_cfg | ANA_PORT_PORT_CFG_PORTID_VAL(port),
-ANA_PORT_PORT_CFG, port);
-
+   ocelot_setup_logical_port_ids(ocelot);
ocelot_apply_bridge_fwd_mask(ocelot);
ocelot_set_aggr_pgids(ocelot);
 }
-- 
2.25.1

KMSAN: uninit-value in smsc95xx_wait_eeprom (2)

2020-12-08 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:73d62e81 kmsan: random: prevent boot-time reports in _mix_..
git tree:   https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=178d246b50
kernel config:  https://syzkaller.appspot.com/x/.config?x=eef728deea880383
dashboard link: https://syzkaller.appspot.com/bug?extid=94b1393490c2c70b781b
compiler:   clang version 11.0.0 (https://github.com/llvm/llvm-project.git ca2dcbd030eadbf0aa9b660efe864ff08af6e18b)

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+94b1393490c2c70b7...@syzkaller.appspotmail.com

=
BUG: KMSAN: uninit-value in smsc95xx_wait_eeprom+0x223/0x3e0 drivers/net/usb/smsc95xx.c:303
CPU: 1 PID: 28836 Comm: kworker/1:1 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
 smsc95xx_wait_eeprom+0x223/0x3e0 drivers/net/usb/smsc95xx.c:303
 smsc95xx_read_eeprom+0x46d/0xa10 drivers/net/usb/smsc95xx.c:360
 smsc95xx_init_mac_address drivers/net/usb/smsc95xx.c:769 [inline]
 smsc95xx_bind+0x811/0x1d30 drivers/net/usb/smsc95xx.c:1090
 usbnet_probe+0x1169/0x3e90 drivers/net/usb/usbnet.c:1712
 usb_probe_interface+0xfcc/0x1520 drivers/usb/core/driver.c:396
 really_probe+0xebd/0x2420 drivers/base/dd.c:558
 driver_probe_device+0x293/0x390 drivers/base/dd.c:738
 __device_attach_driver+0x63f/0x830 drivers/base/dd.c:844
 bus_for_each_drv+0x2ca/0x3f0 drivers/base/bus.c:431
 __device_attach+0x538/0x860 drivers/base/dd.c:912
 device_initial_probe+0x4a/0x60 drivers/base/dd.c:959
 bus_probe_device+0x177/0x3d0 drivers/base/bus.c:491
 device_add+0x399e/0x3f20 drivers/base/core.c:2936
 usb_set_configuration+0x39cf/0x4010 drivers/usb/core/message.c:2159
 usb_generic_driver_probe+0x138/0x300 drivers/usb/core/generic.c:238
 usb_probe_device+0x317/0x570 drivers/usb/core/driver.c:293
 really_probe+0xebd/0x2420 drivers/base/dd.c:558
 driver_probe_device+0x293/0x390 drivers/base/dd.c:738
 __device_attach_driver+0x63f/0x830 drivers/base/dd.c:844
 bus_for_each_drv+0x2ca/0x3f0 drivers/base/bus.c:431
 __device_attach+0x538/0x860 drivers/base/dd.c:912
 device_initial_probe+0x4a/0x60 drivers/base/dd.c:959
 bus_probe_device+0x177/0x3d0 drivers/base/bus.c:491
 device_add+0x399e/0x3f20 drivers/base/core.c:2936
 usb_new_device+0x1bd6/0x2a30 drivers/usb/core/hub.c:2554
 hub_port_connect drivers/usb/core/hub.c:5222 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5362 [inline]
 port_event drivers/usb/core/hub.c:5508 [inline]
 hub_event+0x5bc9/0x8890 drivers/usb/core/hub.c:5590
 process_one_work+0x121c/0x1fc0 kernel/workqueue.c:2272
 worker_thread+0x10cc/0x2740 kernel/workqueue.c:2418
 kthread+0x51c/0x560 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296

Local variable buf.i.i@smsc95xx_wait_eeprom created at:
 __smsc95xx_read_reg drivers/net/usb/smsc95xx.c:77 [inline]
 smsc95xx_read_reg drivers/net/usb/smsc95xx.c:141 [inline]
 smsc95xx_wait_eeprom+0x9d/0x3e0 drivers/net/usb/smsc95xx.c:297
 __smsc95xx_read_reg drivers/net/usb/smsc95xx.c:77 [inline]
 smsc95xx_read_reg drivers/net/usb/smsc95xx.c:141 [inline]
 smsc95xx_wait_eeprom+0x9d/0x3e0 drivers/net/usb/smsc95xx.c:297
=


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

[PATCH net-next] net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 net/openvswitch/conntrack.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4beb96139d77..96a49aa3a128 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -2025,15 +2025,11 @@ static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info,
  struct sk_buff *reply)
 {
struct ovs_zone_limit zone_limit;
-   int err;
 
zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
zone_limit.limit = info->default_limit;
-   err = nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit);
-   if (err)
-   return err;
 
-   return 0;
+   return nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit);
 }
 
 static int __ovs_ct_limit_get_zone_limit(struct net *net,
-- 
2.22.0

RE: [PATCH v4 1/6] igb: XDP xmit back fix error code

2020-12-08 Thread Penigalapati, Sandeep

> From: sven.auha...@voleatech.de 
> Sent: Wednesday, November 11, 2020 10:35 PM
> To: Nguyen, Anthony L ; Fijalkowski, Maciej
> ; k...@kernel.org
> Cc: da...@davemloft.net; intel-wired-...@lists.osuosl.org;
> netdev@vger.kernel.org; nhor...@redhat.com; sassm...@redhat.com;
> Penigalapati, Sandeep ;
> bro...@redhat.com; pmen...@molgen.mpg.de
> Subject: [PATCH v4 1/6] igb: XDP xmit back fix error code
> 
> From: Sven Auhagen 
> 
> The igb XDP xmit back function should only return defined error codes.
> 
> Reported-by: Dan Carpenter 
> Acked-by: Maciej Fijalkowski 
> Signed-off-by: Sven Auhagen 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
Tested-by: Sandeep Penigalapati

RE: [PATCH v4 4/6] igb: skb add metasize for xdp

2020-12-08 Thread Penigalapati, Sandeep

> From: sven.auha...@voleatech.de 
> Sent: Wednesday, November 11, 2020 10:35 PM
> To: Nguyen, Anthony L ; Fijalkowski, Maciej
> ; k...@kernel.org
> Cc: da...@davemloft.net; intel-wired-...@lists.osuosl.org;
> netdev@vger.kernel.org; nhor...@redhat.com; sassm...@redhat.com;
> Penigalapati, Sandeep ;
> bro...@redhat.com; pmen...@molgen.mpg.de
> Subject: [PATCH v4 4/6] igb: skb add metasize for xdp
> 
> From: Sven Auhagen 
> 
> add metasize if it is set in xdp
> 
> Suggested-by: Maciej Fijalkowski 
> Reviewed-by: Maciej Fijalkowski 
> Acked-by: Maciej Fijalkowski 
> Signed-off-by: Sven Auhagen 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 4 
>  1 file changed, 4 insertions(+)
> 
Tested-by: Sandeep Penigalapati

Re: [PATCH] net: 8021q: vlan: reduce noise in driver initialization

2020-12-08 Thread kernel test robot

Hi Enrico,

I love your patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on net-next/master net/master linus/master v5.10-rc7 
next-20201207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Enrico-Weigelt-metux-IT-consult/net-8021q-vlan-reduce-noise-in-driver-initialization/20201208-165821
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
09162bc32c880a791c6c0668ce0745cf7958f576
config: i386-randconfig-s001-20201208 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-179-ga00755aa-dirty
# 
https://github.com/0day-ci/linux/commit/7c73ca17c3872132d7bd1b9407a26dd5ed916e2c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Enrico-Weigelt-metux-IT-consult/net-8021q-vlan-reduce-noise-in-driver-initialization/20201208-165821
git checkout 7c73ca17c3872132d7bd1b9407a26dd5ed916e2c
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   ld: net/8021q/vlan_dev.o: in function `strlcpy':
>> include/linux/string.h:346: undefined reference to `vlan_fullname'
>> ld: include/linux/string.h:346: undefined reference to `vlan_version'

vim +346 include/linux/string.h

6974f0c4555e28 Daniel Micay  2017-07-12  337  
6974f0c4555e28 Daniel Micay  2017-07-12  338  /* defined after fortified strlen to reuse it */
6974f0c4555e28 Daniel Micay  2017-07-12  339  extern size_t __real_strlcpy(char *, const char *, size_t) __RENAME(strlcpy);
6974f0c4555e28 Daniel Micay  2017-07-12  340  __FORTIFY_INLINE size_t strlcpy(char *p, const char *q, size_t size)
6974f0c4555e28 Daniel Micay  2017-07-12  341  {
6974f0c4555e28 Daniel Micay  2017-07-12  342      size_t ret;
6974f0c4555e28 Daniel Micay  2017-07-12  343      size_t p_size = __builtin_object_size(p, 0);
6974f0c4555e28 Daniel Micay  2017-07-12  344      size_t q_size = __builtin_object_size(q, 0);
6974f0c4555e28 Daniel Micay  2017-07-12  345      if (p_size == (size_t)-1 && q_size == (size_t)-1)
6974f0c4555e28 Daniel Micay  2017-07-12 @346          return __real_strlcpy(p, q, size);
6974f0c4555e28 Daniel Micay  2017-07-12  347      ret = strlen(q);
6974f0c4555e28 Daniel Micay  2017-07-12  348      if (size) {
6974f0c4555e28 Daniel Micay  2017-07-12  349          size_t len = (ret >= size) ? size - 1 : ret;
6974f0c4555e28 Daniel Micay  2017-07-12  350          if (__builtin_constant_p(len) && len >= p_size)
6974f0c4555e28 Daniel Micay  2017-07-12  351              __write_overflow();
6974f0c4555e28 Daniel Micay  2017-07-12  352          if (len >= p_size)
6974f0c4555e28 Daniel Micay  2017-07-12  353              fortify_panic(__func__);
47227d27e2fcb0 Daniel Axtens 2020-06-03  354          __underlying_memcpy(p, q, len);
6974f0c4555e28 Daniel Micay  2017-07-12  355          p[len] = '\0';
6974f0c4555e28 Daniel Micay  2017-07-12  356      }
6974f0c4555e28 Daniel Micay  2017-07-12  357      return ret;
6974f0c4555e28 Daniel Micay  2017-07-12  358  }
6974f0c4555e28 Daniel Micay  2017-07-12  359  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

Re: [net 3/3] can: isotp: add SF_BROADCAST support for functional addressing

2020-12-08 Thread Oliver Hartkopp





On 05.12.20 22:09, Jakub Kicinski wrote:

On Sat, 5 Dec 2020 21:56:33 +0100 Marc Kleine-Budde wrote:

On 12/5/20 9:33 PM, Jakub Kicinski wrote:

What about the (incremental?) change that Thomas Wagner posted?

https://lore.kernel.org/r/20201204135557.55599-1-th...@web.de


That settles it :) This change needs to go into -next and 5.11.


Ok. Can you take patch 1, which is a real fix:

https://lore.kernel.org/linux-can/20201204133508.742120-2-...@pengutronix.de/


Sure! Applied that one from the ML (I assumed that's what you meant).



I just double-checked this mail and in fact the second patch from Marc's 
pull request was a real fix too:


https://lore.kernel.org/linux-can/20201204133508.742120-3-...@pengutronix.de/

Btw. the missing feature which was added for completeness of the ISOTP 
implementation has now also integrated the improvement suggested by 
Thomas Wagner:


https://lore.kernel.org/linux-can/20201206144731.4609-1-socket...@hartkopp.net/T/#u

Would be cool if it could go into the initial iso-tp contribution as 
5.10 becomes a long-term kernel.


But I don't want to be pushy - treat it as you like.

Many thanks,
Oliver

[PATCH 1/1] mwifiex: Fix possible buffer overflows in mwifiex_config_scan

2020-12-08 Thread Xiaohui Zhang

From: Zhang Xiaohui 

mwifiex_config_scan() calls memcpy() without checking the
destination size, which may trigger a buffer overflow that
a local user could use to cause denial of service or the
execution of arbitrary code.
Fix it by putting the length check before calling memcpy().

Signed-off-by: Zhang Xiaohui 
---
 drivers/net/wireless/marvell/mwifiex/scan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c 
b/drivers/net/wireless/marvell/mwifiex/scan.c
index c2a685f63..b1d90678f 100644
--- a/drivers/net/wireless/marvell/mwifiex/scan.c
+++ b/drivers/net/wireless/marvell/mwifiex/scan.c
@@ -930,6 +930,8 @@ mwifiex_config_scan(struct mwifiex_private *priv,
"DIRECT-", 7))
wildcard_ssid_tlv->max_ssid_length = 0xfe;
 
+   if (ssid_len > 1)
+   ssid_len = 1;
memcpy(wildcard_ssid_tlv->ssid,
   user_scan_in->ssid_list[i].ssid, ssid_len);
 
-- 
2.17.1
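
The general pattern the commit message describes, clamping the length to the destination capacity before calling memcpy(), can be sketched in user space as follows. This is only an illustration under stated assumptions: struct demo_ssid_tlv, DEMO_SSID_MAX and demo_copy_ssid() are hypothetical names that do not exist in mwifiex, and the 32-byte capacity is chosen for the example rather than taken from the driver.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SSID_MAX 32	/* hypothetical destination capacity */

/* Hypothetical stand-in for a TLV with a fixed-size SSID buffer. */
struct demo_ssid_tlv {
	unsigned char ssid[DEMO_SSID_MAX];
	size_t ssid_len;
};

/* Copy at most sizeof(dst->ssid) bytes; returns the number copied. */
static size_t demo_copy_ssid(struct demo_ssid_tlv *dst,
			     const unsigned char *src, size_t src_len)
{
	size_t len = src_len;

	if (len > sizeof(dst->ssid))
		len = sizeof(dst->ssid);	/* clamp before memcpy() */

	memcpy(dst->ssid, src, len);
	dst->ssid_len = len;
	return len;
}
```

With a 64-byte source the helper copies only the first 32 bytes, so the write can never run past the end of the destination buffer.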

Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support

2020-12-08 Thread Richard Cochran

On Mon, Dec 07, 2020 at 12:42:33PM -0800, Jakub Kicinski wrote:

> The behavior is not entirely dissimilar to the time stamps on
> multi-layered devices (e.g. DSA switches). The time stamp can either 
> be generated when the packet enters the device (current mlx5 behavior)
> or when it actually egresses thru the MAC (what this set adds).

To be useful, the time stamps must be taken on the external ports.
Generating the time stamp at the DMA reception in the device doesn't
even make sense, unless the delay through the device is constant.

> My main concern is the user friendliness. I think there is no question
> that user running ptp4l would want this mlx5 knob to be enabled.

Right.

> Would
> we rather see a patch to ptp4l that turns per driver knob or should we
> shoot for some form of an API that tells the kernel that we're
> expecting ns level time accuracy? 

This is a hardware-specific "feature".  One of the guiding principles
of the linuxptp user space stack is not to become a catalog of
workarounds for random hardware.  IMO the kernel's API should not
encourage "special" hardware either.  After all, we have lots and lots
of PTP hardware supported, all of them already working with the
existing API just fine.

My preference is for a global knob for users of this hardware, either

- a compile time Kconfig option on the driver, or
- some kind of sysctl/debugfs knob

Thanks,
Richard

Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread David Howells

I wonder - would it make sense to reserve two arrays of scatterlist structs
and a mutex per CPU sufficient to map up to 1MiB of pages with each array
while the krb5 service is in use?

That way sunrpc could, say, grab the mutex, map the input and output buffers,
do the entire crypto op in one go and then release the mutex - at least for
big ops, small ops needn't use this service.

For rxrpc/afs's use case this would probably be overkill - it's doing crypto
on each packet, not on whole operations - but I could still make use of it
there.

However, that then limits the maximum size of an op to 1MiB, plus dangly bits
on either side (which can be managed with chained scatterlist structs) and
also limits the number of large simultaneous krb5 crypto ops we can do.

David

Re: [PATCH v5 bpf-next 02/14] xdp: initialize xdp_buff mb bit to 0 in all XDP drivers

2020-12-08 Thread Jesper Dangaard Brouer

On Tue, 8 Dec 2020 11:31:03 +0100
Lorenzo Bianconi  wrote:

> > On Mon, 2020-12-07 at 22:37 +0100, Maciej Fijalkowski wrote:  
> > > On Mon, Dec 07, 2020 at 01:15:00PM -0800, Alexander Duyck wrote:  
> > > > On Mon, Dec 7, 2020 at 8:36 AM Lorenzo Bianconi  > > > > wrote:
> > > > > Initialize multi-buffer bit (mb) to 0 in all XDP-capable drivers.
> > > > > This is a preliminary patch to enable xdp multi-buffer support.
> > > > > 
> > > > > Signed-off-by: Lorenzo Bianconi   
> > > > 
> > > > I'm really not a fan of this design. Having to update every driver in
> > > > order to initialize a field that was fragmented is a pain. At a
> > > > minimum it seems like it might be time to consider introducing some
> > > > sort of initializer function for this so that you can update things in
> > > > one central place the next time you have to add a new field instead of
> > > > having to update every individual driver that supports XDP. Otherwise
> > > > this isn't going to scale going forward.  

+1

> > > Also, a good example of why this might be bothering for us is a fact that
> > > in the meantime the dpaa driver got XDP support and this patch hasn't been
> > > updated to include mb setting in that driver.
> > >   
> > something like
> > init_xdp_buff(hard_start, headroom, len, frame_sz, rxq);
> >
> > would work for most of the drivers.
> >   
> 
> ack, agree. I will add init_xdp_buff() in v6.

I do like the idea of an initialization helper function.
Remember this is fast-path code and it likely needs to be inlined.

Furthermore, remember that drivers can and do optimize the number of
writes they do to xdp_buff.  There are a number of fields in xdp_buff
that only need to be initialized once per NAPI, e.g. rxq and frame_sz
(some drivers do change frame_sz per packet).  Thus, you likely need two
inlined helpers for init.

Again, remember that the C compiler will generate an expensive operation
(rep stos) for clearing a struct if it is initialized like this, where
not all members are initialized (do NOT do this):

 struct xdp_buff xdp = {
   .rxq = rxq,
   .frame_sz = PAGE_SIZE,
 };

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
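
A minimal user-space sketch of the "two inlined helpers" idea from the mail above: one helper for the fields that can be written once per NAPI poll, one for the per-packet pointers. Everything here is an assumption made for illustration: struct demo_xdp_buff is a simplified stand-in for the real struct xdp_buff, the helper names are hypothetical, and clearing mb in the per-packet helper is a choice of this sketch, not something the thread decided.

```c
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have more fields. */
struct demo_rxq_info {
	int queue_index;
};

struct demo_xdp_buff {
	void *data;
	void *data_end;
	void *data_meta;
	void *data_hard_start;
	struct demo_rxq_info *rxq;
	unsigned int frame_sz;
	unsigned int mb;	/* multi-buffer bit discussed in the thread */
};

/* Once per NAPI poll: fields that do not change between packets. */
static inline void demo_xdp_init_buff(struct demo_xdp_buff *xdp,
				      unsigned int frame_sz,
				      struct demo_rxq_info *rxq)
{
	xdp->frame_sz = frame_sz;
	xdp->rxq = rxq;
}

/* Once per packet: only the members that describe this frame. */
static inline void demo_xdp_prepare_buff(struct demo_xdp_buff *xdp,
					 unsigned char *hard_start,
					 unsigned int headroom,
					 unsigned int len)
{
	unsigned char *data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + len;
	xdp->data_meta = data;
	xdp->mb = 0;	/* assumption: reset per packet */
}
```

Because each helper assigns members one by one, the compiler has no reason to zero the whole structure first, which is the cost the partially initialized declaration above would incur.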

Re: [PATCH net-next 2/4] net: mvpp2: add mvpp2_phylink_to_port() helper

2020-12-08 Thread Sasha Levin


On Tue, Dec 08, 2020 at 01:03:38PM +0100, Marcin Wojtas wrote:

Hi Greg,

Apologies for the delayed response.


On Mon, 2 Nov 2020 at 19:02, Greg Kroah-Hartman wrote:


On Mon, Nov 02, 2020 at 06:38:54PM +0100, Marcin Wojtas wrote:
> Hi Greg and Sasha,
>
> On Fri, 9 Oct 2020 at 05:43, Marcin Wojtas wrote:
> >
> > Hi,
> >
> > On Sat, 20 Jun 2020 at 11:21, Russell King wrote:
> > >
> > > Add a helper to convert the struct phylink_config pointer passed in
> > > from phylink to the drivers internal struct mvpp2_port.
> > >
> > > Signed-off-by: Russell King 
> > > ---
> > >  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 29 +--
> > >  1 file changed, 14 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c 
b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > index 7653277d03b7..313f5a60a605 100644
> > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > @@ -4767,12 +4767,16 @@ static void mvpp2_port_copy_mac_addr(struct 
net_device *dev, struct mvpp2 *priv,
> > > eth_hw_addr_random(dev);
> > >  }
> > >
> > > +static struct mvpp2_port *mvpp2_phylink_to_port(struct phylink_config 
*config)
> > > +{
> > > +   return container_of(config, struct mvpp2_port, phylink_config);
> > > +}
> > > +
> > >  static void mvpp2_phylink_validate(struct phylink_config *config,
> > >unsigned long *supported,
> > >struct phylink_link_state *state)
> > >  {
> > > -   struct mvpp2_port *port = container_of(config, struct mvpp2_port,
> > > -  phylink_config);
> > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
> > >
> > > /* Invalid combinations */
> > > @@ -4913,8 +4917,7 @@ static void mvpp2_gmac_pcs_get_state(struct 
mvpp2_port *port,
> > >  static void mvpp2_phylink_mac_pcs_get_state(struct phylink_config 
*config,
> > > struct phylink_link_state 
*state)
> > >  {
> > > -   struct mvpp2_port *port = container_of(config, struct mvpp2_port,
> > > -  phylink_config);
> > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > >
> > > if (port->priv->hw_version == MVPP22 && port->gop_id == 0) {
> > > u32 mode = readl(port->base + MVPP22_XLG_CTRL3_REG);
> > > @@ -4931,8 +4934,7 @@ static void mvpp2_phylink_mac_pcs_get_state(struct 
phylink_config *config,
> > >
> > >  static void mvpp2_mac_an_restart(struct phylink_config *config)
> > >  {
> > > -   struct mvpp2_port *port = container_of(config, struct mvpp2_port,
> > > -  phylink_config);
> > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > u32 val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> > >
> > > writel(val | MVPP2_GMAC_IN_BAND_RESTART_AN,
> > > @@ -5105,13 +5107,12 @@ static void mvpp2_gmac_config(struct mvpp2_port 
*port, unsigned int mode,
> > >  static void mvpp2_mac_config(struct phylink_config *config, unsigned int 
mode,
> > >  const struct phylink_link_state *state)
> > >  {
> > > -   struct net_device *dev = to_net_dev(config->dev);
> > > -   struct mvpp2_port *port = netdev_priv(dev);
> > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > bool change_interface = port->phy_interface != state->interface;
> > >
> > > /* Check for invalid configuration */
> > > if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) {
> > > -   netdev_err(dev, "Invalid mode on %s\n", dev->name);
> > > +   netdev_err(port->dev, "Invalid mode on %s\n", 
port->dev->name);
> > > return;
> > > }
> > >
> > > @@ -5151,8 +5152,7 @@ static void mvpp2_mac_link_up(struct phylink_config 
*config,
> > >   int speed, int duplex,
> > >   bool tx_pause, bool rx_pause)
> > >  {
> > > -   struct net_device *dev = to_net_dev(config->dev);
> > > -   struct mvpp2_port *port = netdev_priv(dev);
> > > +   struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > > u32 val;
> > >
> > > if (mvpp2_is_xlg(interface)) {
> > > @@ -5199,14 +5199,13 @@ static void mvpp2_mac_link_up(struct 
phylink_config *config,
> > >
> > > mvpp2_egress_enable(port);
> > > mvpp2_ingress_enable(port);
> > > -   netif_tx_wake_all_queues(dev);
> > > +   netif_tx_wake_all_queues(port->dev);
> > >  }
> > >
> > >  static void mvpp2_mac_link_down(struct phylink_config *config,
> > > unsigned int mode, phy_interface_t 
interface)
> > >  {
> > > -   struct net_device *dev = to_net_dev(config->
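
The helper in the quoted patch relies on the config structure being embedded in the port structure, so the enclosing object can be recovered from a pointer to the member. A user-space sketch of that technique follows; the demo_ names are hypothetical, and demo_container_of() is a simplified equivalent of the kernel macro without its type checking.

```c
#include <stddef.h>

/* Simplified user-space equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct phylink_config and struct mvpp2_port. */
struct demo_phylink_config {
	int dummy;
};

struct demo_port {
	int id;
	struct demo_phylink_config phylink_config;	/* embedded, not a pointer */
};

/* Map a pointer to the embedded member back to its enclosing port. */
static struct demo_port *demo_phylink_to_port(struct demo_phylink_config *config)
{
	return demo_container_of(config, struct demo_port, phylink_config);
}
```

Wrapping the conversion in one helper means every callback names the member only once, which is what the patch does for the mvpp2 driver.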

[PATCH net-next] drivers: net: ionic: simplify the return expression of ionic_set_rxfh()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 drivers/net/ethernet/pensando/ionic/ionic_ethtool.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c 
b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
index 35c72d4a78b3..0832bedcb3b4 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
@@ -738,16 +738,11 @@ static int ionic_set_rxfh(struct net_device *netdev, 
const u32 *indir,
  const u8 *key, const u8 hfunc)
 {
struct ionic_lif *lif = netdev_priv(netdev);
-   int err;
 
if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
 
-   err = ionic_lif_rss_config(lif, lif->rss_types, key, indir);
-   if (err)
-   return err;
-
-   return 0;
+   return ionic_lif_rss_config(lif, lif->rss_types, key, indir);
 }
 
 static int ionic_set_tunable(struct net_device *dev,
-- 
2.22.0

[PATCH net-next] drivers: net: qlcnic: simplify the return expression of qlcnic_sriov_vf_shutdown()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
index 30e52f969759..dd03be3fc82a 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
@@ -2112,7 +2112,6 @@ static int qlcnic_sriov_vf_shutdown(struct pci_dev *pdev)
 {
struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
struct net_device *netdev = adapter->netdev;
-   int retval;
 
netif_device_detach(netdev);
qlcnic_cancel_idc_work(adapter);
@@ -2125,11 +2124,7 @@ static int qlcnic_sriov_vf_shutdown(struct pci_dev *pdev)
qlcnic_83xx_disable_mbx_intr(adapter);
cancel_delayed_work_sync(&adapter->idc_aen_work);
 
-   retval = pci_save_state(pdev);
-   if (retval)
-   return retval;
-
-   return 0;
+   return pci_save_state(pdev);
 }
 
 static int qlcnic_sriov_vf_resume(struct qlcnic_adapter *adapter)
-- 
2.22.0

[PATCH net-next] net/mlx4: simplify the return expression of mlx4_init_cq_table()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 drivers/net/ethernet/mellanox/mlx4/cq.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c 
b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 3b8576b9c2f9..68bd18ee6ee3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -462,19 +462,14 @@ EXPORT_SYMBOL_GPL(mlx4_cq_free);
 int mlx4_init_cq_table(struct mlx4_dev *dev)
 {
struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
-   int err;
 
spin_lock_init(&cq_table->lock);
INIT_RADIX_TREE(&cq_table->tree, GFP_ATOMIC);
if (mlx4_is_slave(dev))
return 0;
 
-   err = mlx4_bitmap_init(&cq_table->bitmap, dev->caps.num_cqs,
-  dev->caps.num_cqs - 1, dev->caps.reserved_cqs, 
0);
-   if (err)
-   return err;
-
-   return 0;
+   return mlx4_bitmap_init(&cq_table->bitmap, dev->caps.num_cqs,
+   dev->caps.num_cqs - 1, dev->caps.reserved_cqs, 
0);
 }
 
 void mlx4_cleanup_cq_table(struct mlx4_dev *dev)
-- 
2.22.0

Re: [PATCH net-next] net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()

2020-12-08 Thread Eelco Chaudron




On 8 Dec 2020, at 13:13, Zheng Yongjun wrote:

> Simplify the return expression.
>
> Signed-off-by: Zheng Yongjun 

Change looks good to me.

Reviewed-by: Eelco Chaudron

[PATCH net-next] net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index c9c2962ad49f..786d2fc4b403 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1893,13 +1893,8 @@ void esw_offloads_unload_rep(struct mlx5_eswitch *esw, 
u16 vport_num)
 static int mlx5_esw_offloads_pair(struct mlx5_eswitch *esw,
  struct mlx5_eswitch *peer_esw)
 {
-   int err;
 
-   err = esw_add_fdb_peer_miss_rules(esw, peer_esw->dev);
-   if (err)
-   return err;
-
-   return 0;
+   return esw_add_fdb_peer_miss_rules(esw, peer_esw->dev);
 }
 
 static void mlx5_esw_offloads_unpair(struct mlx5_eswitch *esw)
-- 
2.22.0

[PATCH net-next] net: atheros: simplify the return expression of atl2_phy_setup_autoneg_adv()

2020-12-08 Thread Zheng Yongjun

Simplify the return expression.

Signed-off-by: Zheng Yongjun 
---
 drivers/net/ethernet/atheros/atlx/atl2.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atlx/atl2.c 
b/drivers/net/ethernet/atheros/atlx/atl2.c
index 7b80d924632a..f016f2e12ee7 100644
--- a/drivers/net/ethernet/atheros/atlx/atl2.c
+++ b/drivers/net/ethernet/atheros/atlx/atl2.c
@@ -2549,7 +2549,6 @@ static s32 atl2_write_phy_reg(struct atl2_hw *hw, u32 
reg_addr, u16 phy_data)
  */
 static s32 atl2_phy_setup_autoneg_adv(struct atl2_hw *hw)
 {
-   s32 ret_val;
s16 mii_autoneg_adv_reg;
 
/* Read the MII Auto-Neg Advertisement Register (Address 4). */
@@ -2605,12 +2604,7 @@ static s32 atl2_phy_setup_autoneg_adv(struct atl2_hw *hw)
 
hw->mii_autoneg_adv_reg = mii_autoneg_adv_reg;
 
-   ret_val = atl2_write_phy_reg(hw, MII_ADVERTISE, mii_autoneg_adv_reg);
-
-   if (ret_val)
-   return ret_val;
-
-   return 0;
+   return atl2_write_phy_reg(hw, MII_ADVERTISE, mii_autoneg_adv_reg);
 }
 
 /*
-- 
2.22.0

Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread David Howells

David Howells  wrote:

> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?

Actually, simply reserving a set per CPU is probably unnecessary.  We could,
say, set a minimum and a maximum on the reservations (say 2 -> 2*nr_cpus) and
then allocate new ones when we run out.  Then let the memory shrinker clean
them up off an lru list.

David

Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread Ard Biesheuvel

On Tue, 8 Dec 2020 at 14:25, David Howells  wrote:
>
> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?
>
> That way sunrpc could, say, grab the mutex, map the input and output buffers,
> do the entire crypto op in one go and then release the mutex - at least for
> big ops, small ops needn't use this service.
>
> For rxrpc/afs's use case this would probably be overkill - it's doing crypto
> on each packet, not on whole operations - but I could still make use of it
> there.
>
> However, that then limits the maximum size of an op to 1MiB, plus dangly bits
> on either side (which can be managed with chained scatterlist structs) and
> also limits the number of large simultaneous krb5 crypto ops we can do.
>

Apparently, it is permitted for gss_krb5_cts_crypt() to do a
kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
being invoked, and so I don't see why it wouldn't be possible to
simply kmalloc() a scatterlist[] of the appropriate size, populate it
with all the pages, bufs and whatever else gets passed into the
skcipher, and pass it into the skcipher in one go.

[PATCH v2 net-next 0/2] nfc: s3fwrn5: Change I2C interrupt trigger to EDGE_RISING

2020-12-08 Thread Bongsu Jeon

From: Bongsu Jeon 

To make Samsung's NFC I2C interrupt handling stable, I changed the interrupt
trigger from IRQ_TYPE_LEVEL_HIGH to IRQ_TYPE_EDGE_RISING and removed
the hard-coded interrupt trigger type from the i2c module so that the
trigger can be controlled flexibly.

1/2 changes the dt binding to the rising-edge trigger.
2/2 removes the hard-coded interrupt trigger type from the i2c module.

ChangeLog:
 v2:
  2/2
   - remove the hard coded interrupt trigger type.

Bongsu Jeon (2):
  dt-bindings: net: nfc: s3fwrn5: Change I2C interrupt trigger to
EDGE_RISING
  nfc: s3fwrn5: Remove hard coded interrupt trigger type from the i2c
module

 .../devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml  | 2 +-
 drivers/nfc/s3fwrn5/i2c.c | 8 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.17.1

[PATCH v2 net-next 1/2] dt-bindings: net: nfc: s3fwrn5: Change I2C interrupt trigger type

2020-12-08 Thread Bongsu Jeon

From: Bongsu Jeon 

Change the interrupt trigger from IRQ_TYPE_LEVEL_HIGH to
IRQ_TYPE_EDGE_RISING for stable NFC I2C interrupt handling.
Samsung's NFC firmware sends an i2c frame as follows:
1. The NFC firmware sets the GPIO (interrupt pin) high when there is an
   i2c frame to send.
2. Once the CPU's I2C master has received the i2c frame, the NFC firmware
   sets the GPIO low.
If step 2 is delayed in the NFC firmware because of other high-priority
tasks, the NFC driver's i2c interrupt handler is called again although
no frame is pending. The driver then tries to receive an i2c frame that
the NFC chip does not have, which causes an I2C communication problem.
This case rarely happens, but I changed the trigger as a defensive
measure. If the driver uses a rising-edge trigger instead of a level
trigger, there is no problem even if the NFC firmware task is delayed.

Signed-off-by: Bongsu Jeon 
---
 Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml 
b/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
index ca3904bf90e0..477066e2b821 100644
--- a/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
+++ b/Documentation/devicetree/bindings/net/nfc/samsung,s3fwrn5.yaml
@@ -76,7 +76,7 @@ examples:
 reg = <0x27>;
 
 interrupt-parent = <&gpa1>;
-interrupts = <3 IRQ_TYPE_LEVEL_HIGH>;
+interrupts = <3 IRQ_TYPE_EDGE_RISING>;
 
 en-gpios = <&gpf1 4 GPIO_ACTIVE_HIGH>;
 wake-gpios = <&gpj0 2 GPIO_ACTIVE_HIGH>;
-- 
2.17.1
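
The difference between the two trigger types in the scenario described above can be illustrated with a toy model. This is not the kernel's IRQ core: each array element is assumed to be the pin level at a moment when the handler has finished and the interrupt is unmasked again, and both function names are hypothetical.

```c
#include <stddef.h>

/* Level-high trigger: the handler fires at every sample where the pin is high. */
static int demo_count_level_high(const int *line, size_t n)
{
	int fired = 0;

	for (size_t i = 0; i < n; i++)
		if (line[i])
			fired++;
	return fired;
}

/* Rising-edge trigger: the handler fires only on a low-to-high transition. */
static int demo_count_edge_rising(const int *line, size_t n)
{
	int fired = 0;
	int prev = 0;	/* pin assumed low before the first sample */

	for (size_t i = 0; i < n; i++) {
		if (line[i] && !prev)
			fired++;
		prev = line[i];
	}
	return fired;
}
```

For a trace such as {0, 1, 1, 1, 0}, where the firmware lowers the pin late, the level model fires three times for a single frame while the edge model fires once. When the pin is lowered promptly ({0, 1, 0, 0, 0}), both fire once.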

[PATCH v2 net-next 2/2] nfc: s3fwrn5: Remove hard coded interrupt trigger type from the i2c module

2020-12-08 Thread Bongsu Jeon

From: Bongsu Jeon 

To allow flexible control of the interrupt trigger type, remove the
hard-coded interrupt trigger type from the i2c module. The trigger type
will be loaded from the devicetree.

Signed-off-by: Bongsu Jeon 
---
 drivers/nfc/s3fwrn5/i2c.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c
index e1bdde105f24..42f1f610ac2c 100644
--- a/drivers/nfc/s3fwrn5/i2c.c
+++ b/drivers/nfc/s3fwrn5/i2c.c
@@ -179,6 +179,8 @@ static int s3fwrn5_i2c_probe(struct i2c_client *client,
  const struct i2c_device_id *id)
 {
struct s3fwrn5_i2c_phy *phy;
+   struct irq_data *irq_data;
+   unsigned long irqflags;
int ret;
 
phy = devm_kzalloc(&client->dev, sizeof(*phy), GFP_KERNEL);
@@ -212,8 +214,11 @@ static int s3fwrn5_i2c_probe(struct i2c_client *client,
if (ret < 0)
return ret;
 
+   irq_data = irq_get_irq_data(client->irq);
+   irqflags = irqd_get_trigger_type(irq_data) | IRQF_ONESHOT;
+
ret = devm_request_threaded_irq(&client->dev, phy->i2c_dev->irq, NULL,
-   s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
+   s3fwrn5_i2c_irq_thread_fn, irqflags,
S3FWRN5_I2C_DRIVER_NAME, phy);
if (ret)
s3fwrn5_remove(phy->common.ndev);
-- 
2.17.1

Re: Why the auxiliary cipher in gss_krb5_crypto.c?

2020-12-08 Thread David Howells

Ard Biesheuvel  wrote:

> Apparently, it is permitted for gss_krb5_cts_crypt() to do a
> kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
> being invoked, and so I don't see why it wouldn't be possible to
> simply kmalloc() a scatterlist[] of the appropriate size, populate it
> with all the pages, bufs and whatever else gets passed into the
> skcipher, and pass it into the skcipher in one go.

I never said it wasn't possible.  But doing a pair of order-1 allocations from
there might have a significant detrimental effect on performance - in which
case Trond and co. will say "no".

Remember: to crypt 1MiB of data on a 64-bit machine requires 2 x minimum 8KiB
scatterlist arrays.  That's assuming the pages in the middle are contiguous,
which might not be the case for a direct I/O read/write.  So for the DIO case,
it could involve an order-2 allocation (or chaining of single pages).

David
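
The 8KiB figure in the mail above can be reproduced with a short calculation. The sketch assumes 4KiB pages, one scatterlist entry per page, and a 32-byte scatterlist entry on a 64-bit machine; the entry size is an assumption of this example and depends on the kernel configuration. The function and macro names are hypothetical.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE     4096u	/* assumed page size */
#define DEMO_SG_ENTRY_SIZE 32u		/* assumed size of one scatterlist entry, 64-bit */

/* Bytes needed for a scatterlist array with one entry per page of data. */
static size_t demo_sg_array_bytes(size_t data_bytes)
{
	size_t pages = (data_bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;

	return pages * DEMO_SG_ENTRY_SIZE;
}

/* Smallest power-of-two page order whose span covers the given byte count. */
static unsigned int demo_alloc_order(size_t bytes)
{
	unsigned int order = 0;
	size_t span = DEMO_PAGE_SIZE;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Under these assumptions, 1MiB of data needs 256 entries, which is 8192 bytes, an order-1 allocation per array. A single extra entry pushes the array past 8KiB and into an order-2 allocation.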

Re: [PATCH v2] net: dsa: ksz8795: adjust CPU link to host interface

2020-12-08 Thread Andrew Lunn

> > Hi Jean
> >
> > I never said it was too specific to your board. There are other boards
> > using different switches like this. This is where the commit message
> > is so important. Without understanding Why? it is hard to point you in
> > the right direction.
> >
> > So your setup is:
> >
> > SoC - MAC - PHY - PHY - MAC - Switch.
> >
> > The SoC MAC driver is looking after the first PHY?
> 
> No, the connection is at the MAC level, via RGMII but it is missing the MDC/
> MDIO signals. That means we have to fix the auto-neg parameters from the DT.

So the PHY is there, but you cannot talk to it? It has strapping
resistors to make it auto-neg to the other PHY?

Some switches default their CPU port to the maximum speed the port can
do. But not all do. It is worth checking that.

> On the 4.14 LTS kernel we are working with, the setup of the parameters is
> done via adjust_link. Since the phylink conversion adjust_link is not required
> anymore, is this correct?

4.14 is dead in terms of development. Anything you contribute needs to
be for net-next, and then you need to figure out how to backport
it. Using v5.4 will help with that, since it is much closer, and v5.10
will be LTS. Given the change to phylink, you probably want as new a
kernel as possible. If you put a fixed-link property in the "CPU"
node, phylink should do the right thing for you.

  Andrew
