Re: [PATCH net-next v2 01/12] gtp: set initial MTU

2020-12-12 Thread Harald Welte
On Fri, Dec 11, 2020 at 01:26:01PM +0100, Jonas Bonn wrote:
> The GTP link is brought up with a default MTU of zero.  This can lead to
> some rather unexpected behaviour for users who are more accustomed to
> interfaces coming online with reasonable defaults.
> 
> This patch sets an initial MTU for the GTP link of 1500 less worst-case
> tunnel overhead.

Thanks, LGTM.  I would probably have gone to a #define or a 'const' variable,
but I guess compilers should be smart enough to figure out that this is
static at compile time even the way you wrote it.

Acked-by: Harald Welte 

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH] net: check skb partial checksum offset after trim

2020-12-12 Thread Vasily Averin
On 12/11/20 6:37 PM, Vasily Averin wrote:
> It seems for me the similar problem can happen in __skb_trim_rcsum().
> Also I doubt that that skb_checksum_start_offset(skb) checks in 
> __skb_postpull_rcsum() and skb_csum_unnecessary() are correct,
> becasue they do not guarantee that skb have correct CHECKSUM_PARTIAL.
> Could somebody confirm it?

I've rechecked the code and I think now that other places are not affected,
i.e. skb_push_rcsum() only should be patched.

Thank you,
Vasily Averin


Re: [kbuild-all] Re: [PATCH 1/3] Add TX sending hardware timestamp.

2020-12-12 Thread Philip Li
On Thu, Dec 10, 2020 at 12:41:32PM +, Geva, Erez wrote:
> 
> On 10/12/2020 04:11, kernel test robot wrote:
> > Hi Erez,
> > 
> > Thank you for the patch! Yet something to improve:
> > 
> Thanks for the robot,
> as we rarely use clang for kernel. It is very helpful.
> 
> > [auto build test ERROR on b65054597872ce3aefbc6a666385eabdf9e288da]
> > 
> > url:
> > https://github.com/0day-ci/linux/commits/Erez-Geva/Add-sending-TX-hardware-timestamp-for-TC-ETF-Qdisc/20201210-000521
> I can not find this commit
> 
> > base:b65054597872ce3aefbc6a666385eabdf9e288da
> > config: mips-randconfig-r026-20201209 (attached as .config)
> > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
> > 1968804ac726e7674d5de22bc2204b45857da344)
> However the clang in 
> https://download.01.org/0day-ci/cross-package/clang-latest/clang.tar.xz  is 
> version 11
Sorry that these are issues at our side, including the branch/commit missing.
The push to download.01.org failed and did not really work, we will look for
recovering them.

> 
> > reproduce (this is a W=1 build):
> >  wget 
> > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> > ~/bin/make.cross
> Your make cross script tries to download the clang every time.
> Please separate the download (which is ~400 MB and 2 GB after open) from the 
> compilation.
Hi Erez, thanks for your feedback, we will improve the reproduction
side per these suggestions.

> 
> Please use "wget" follow your own instructions in this email.
> 
> >  chmod +x ~/bin/make.cross
> >  # install mips cross compiling tool for clang build
> >  # apt-get install binutils-mips-linux-gnu
> >  # 
> > https://github.com/0day-ci/linux/commit/8a8f634bc74db16dc551cfcf3b63c1183f98eaac
> >  git remote add linux-review https://github.com/0day-ci/linux
> >  git fetch --no-tags linux-review 
> > Erez-Geva/Add-sending-TX-hardware-timestamp-for-TC-ETF-Qdisc/20201210-000521
> This branch is absent
> 
> >  git checkout 8a8f634bc74db16dc551cfcf3b63c1183f98eaac
> This commit as well
> 
> >  # save the attached .config to linux build tree
> >  COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
> > ARCH=mips
> > 
> I use Debian 10.7.
> I usually compile with GCC. I have not see any errors.
> 
> When I use clang 11 from download.01.org I get a crash right away.
> Please add a proper instructions how to use clang on Debian or provide 
> a Docker container with updated clang for testing.
> 
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot 
> > 
> > All errors (new ones prefixed by >>):
> > 
> >>> net/core/sock.c:2383:7: error: use of undeclared identifier 
> >>> 'SCM_HW_TXTIME'; did you mean 'SOCK_HW_TXTIME'?
> > case SCM_HW_TXTIME:
> >  ^
> >  SOCK_HW_TXTIME
> > include/net/sock.h:862:2: note: 'SOCK_HW_TXTIME' declared here
> > SOCK_HW_TXTIME,
> > ^
> > 1 error generated.
> > 
> > vim +2383 net/core/sock.c
> > 
> >2351 
> >2352 int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, 
> > struct cmsghdr *cmsg,
> >2353  struct sockcm_cookie *sockc)
> >2354 {
> >2355 u32 tsflags;
> >2356 
> >2357 switch (cmsg->cmsg_type) {
> >2358 case SO_MARK:
> >2359 if (!ns_capable(sock_net(sk)->user_ns, 
> > CAP_NET_ADMIN))
> >2360 return -EPERM;
> >2361 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
> >2362 return -EINVAL;
> >2363 sockc->mark = *(u32 *)CMSG_DATA(cmsg);
> >2364 break;
> >2365 case SO_TIMESTAMPING_OLD:
> >2366 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
> >2367 return -EINVAL;
> >2368 
> >2369 tsflags = *(u32 *)CMSG_DATA(cmsg);
> >2370 if (tsflags & ~SOF_TIMESTAMPING_TX_RECORD_MASK)
> >2371 return -EINVAL;
> >2372 
> >2373 sockc->tsflags &= 
> > ~SOF_TIMESTAMPING_TX_RECORD_MASK;
> >2374 sockc->tsflags |= tsflags;
> >2375 break;
> >2376 case SCM_TXTIME:
> >2377 if (!sock_flag(sk, SOCK_TXTIME))
> >2378 return -EINVAL;
> >2379 if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64)))
> >2380 return -EINVAL;
> >2381 sockc->transmit_time = get_unaligned((u64 
> > *)CMSG_DATA(cmsg));
> >2382 break;
> >> 2383   case SCM_HW_TXTIME:
> >2384 if (!sock_flag(sk, SOCK_HW_TXTIM

Re: [PATCH net-next v2 04/12] gtp: drop unnecessary call to skb_dst_drop

2020-12-12 Thread Harald Welte
On Fri, Dec 11, 2020 at 01:26:04PM +0100, Jonas Bonn wrote:
> The call to skb_dst_drop() is already done as part of udp_tunnel_xmit().

I must be blind, can you please point out where exactly this happens?

I don't see any skb_dst_drop in udp_tunnel_xmit_skb, and 
in iptunnel_xmit() there's only a skb_dst_set (which doesn't call skb_dst_drop 
internally)

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH net-next v2 05/12] gtp: set device type

2020-12-12 Thread Harald Welte
On Fri, Dec 11, 2020 at 01:26:05PM +0100, Jonas Bonn wrote:
> Set the devtype to 'gtp' when setting up the link.
> 
> Signed-off-by: Jonas Bonn 

Acked-by: Harald Welte 

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


INFO: task can't die in connmark_exit_net

2020-12-12 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:15ac8fdb Add linux-next specific files for 20201207
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15fbf86b50
kernel config:  https://syzkaller.appspot.com/x/.config?x=3696b8138207d24d
dashboard link: https://syzkaller.appspot.com/bug?extid=b3b63b6bff456bd95294
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1312128750

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b3b63b6bff456bd95...@syzkaller.appspotmail.com

INFO: task syz-executor.4:13889 can't die for more than 143 seconds.
task:syz-executor.4  state:D stack:26200 pid:13889 ppid: 12369 flags:0x4006
Call Trace:
 context_switch kernel/sched/core.c:4325 [inline]
 __schedule+0x8eb/0x21b0 kernel/sched/core.c:5076
 schedule+0xcf/0x270 kernel/sched/core.c:5155
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:5214
 __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
 __mutex_lock+0x81a/0x1110 kernel/locking/mutex.c:1103
 tc_action_net_exit include/net/act_api.h:147 [inline]
 connmark_exit_net+0x20/0x130 net/sched/act_connmark.c:241
 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
 setup_net+0x508/0x850 net/core/net_namespace.c:365
 copy_net_ns+0x376/0x7b0 net/core/net_namespace.c:483
 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
 copy_namespaces+0x3e5/0x4d0 kernel/nsproxy.c:179
 copy_process+0x2aa7/0x6fe0 kernel/fork.c:2103
 kernel_clone+0xe7/0xad0 kernel/fork.c:2465
 __do_sys_clone+0xc8/0x110 kernel/fork.c:2582
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45e0f9
RSP: 002b:7fd04901bc68 EFLAGS: 0246 ORIG_RAX: 0038
RAX: ffda RBX: 0005 RCX: 0045e0f9
RDX:  RSI:  RDI: e900e57c
RBP: 0119c078 R08:  R09: 
R10:  R11: 0246 R12: 0119c034
R13: 7fff629a5d7f R14: 7fd04901c9c0 R15: 0119c034
INFO: task syz-executor.1:13932 can't die for more than 143 seconds.
task:syz-executor.1  state:D stack:26320 pid:13932 ppid: 12371 flags:0x4006
Call Trace:
 context_switch kernel/sched/core.c:4325 [inline]
 __schedule+0x8eb/0x21b0 kernel/sched/core.c:5076
 schedule+0xcf/0x270 kernel/sched/core.c:5155
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:5214
 __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
 __mutex_lock+0x81a/0x1110 kernel/locking/mutex.c:1103
 tc_action_net_exit include/net/act_api.h:147 [inline]
 gate_exit_net+0x20/0x130 net/sched/act_gate.c:624
 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
 setup_net+0x508/0x850 net/core/net_namespace.c:365
 copy_net_ns+0x376/0x7b0 net/core/net_namespace.c:483
 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
 copy_namespaces+0x3e5/0x4d0 kernel/nsproxy.c:179
 copy_process+0x2aa7/0x6fe0 kernel/fork.c:2103
 kernel_clone+0xe7/0xad0 kernel/fork.c:2465
 __do_sys_clone+0xc8/0x110 kernel/fork.c:2582
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45e0f9
RSP: 002b:7fd373ed4c68 EFLAGS: 0246 ORIG_RAX: 0038
RAX: ffda RBX: 0005 RCX: 0045e0f9
RDX:  RSI:  RDI: e900e57c
RBP: 0119c120 R08:  R09: 
R10:  R11: 0246 R12: 0119c0dc
R13: 7ffc4464352f R14: 7fd373ed59c0 R15: 0119c0dc

Showing all locks held in the system:
3 locks held by kworker/0:2/8:
3 locks held by kworker/1:1/35:
 #0: 8881473fb538 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 8881473fb538 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
 #0: 8881473fb538 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: 8881473fb538 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
set_work_data kernel/workqueue.c:616 [inline]
 #0: 8881473fb538 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
 #0: 8881473fb538 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
process_one_work+0x871/0x1630 kernel/workqueue.c:2246
 #1: c9e6fda8 ((work_completion)(&(&ifa->dad_work)->work)){+.+.}-{0:0}, 
at: process_one_work+0x8a5/0x1630 kernel/workqueue.c:2250
 #2: 8d0d70c8 (rtnl_mutex){+.+.}-{3:3}, at: 
addrconf_dad_work+0xa3/0x1280 net/ipv6/addrconf.c:4028
1 lock held by khungtaskd/1655:
 #0: 8b78db60 (rcu_read_lock){}-{1:2}, at: 
debug_show_all_locks+0x53/0x28c kernel/locking/lockdep.c:6254
1 lock held by in:imklog/8192:
 #0: 888012b58370 (&f->f_po

Re: [PATCH net-next v2 03/12] gtp: really check namespaces before xmit

2020-12-12 Thread Harald Welte
On Fri, Dec 11, 2020 at 01:26:03PM +0100, Jonas Bonn wrote:
> Blindly assuming that packet transmission crosses namespaces results in
> skb marks being lost in the single namespace case.
> 
> Signed-off-by: Jonas Bonn 

Acked-by: Harald Welte 

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH net-next v2 10/12] gtp: add IPv6 support

2020-12-12 Thread Jonas Bonn

Hi Pravin,

On 12/12/2020 06:51, Pravin Shelar wrote:

On Fri, Dec 11, 2020 at 4:29 AM Jonas Bonn  wrote:


This patch adds support for handling IPv6.  Both the GTP tunnel and the
tunneled packets may be IPv6; as they constitute independent streams,
both v4-over-v6 and v6-over-v4 are supported, as well.

This patch includes only the driver functionality for IPv6 support.  A
follow-on patch will add support for configuring the tunnels with IPv6
addresses.

Signed-off-by: Jonas Bonn 
---
  drivers/net/gtp.c | 330 +-
  1 file changed, 269 insertions(+), 61 deletions(-)





+   /* Get IP version of _inner_ packet */
+   ipver = inner_ip_hdr(skb)->version;
+
+   switch (ipver) {
+   case 4:
+   skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IP));
+   r = gtp_check_ms_ipv4(skb, pctx, hdrlen, role);

I don't see a need to set inner header on receive path, we are any
ways removing outer header from this packet in same function.


+   break;
+   case 6:
+   skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
+   r = gtp_check_ms_ipv6(skb, pctx, hdrlen, role);
+   break;
+   }
+
+   if (!r) {
 netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
 return 1;
 }
@@ -193,6 +256,8 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
  !net_eq(sock_net(pctx->sk), 
dev_net(pctx->dev
 return -1;

+   skb->protocol = skb->inner_protocol;
+

iptunnel_pull_header() can set the protocol, so it would be better to
pass the correct inner protocol.



Yes, your comments above are correct.  I'll fix that.




 netdev_dbg(pctx->dev, "forwarding packet from GGSN to uplink\n");

 /* Now that the UDP and the GTP header have been removed, set up the
@@ -201,7 +266,7 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
  */
 skb_reset_network_header(skb);

-   skb->dev = pctx->dev;
+   __skb_tunnel_rx(skb, pctx->dev, sock_net(pctx->sk));


No need to call skb_tunnel_rx() given iptunnel_pull_header() function
is already called and it does take care of clearing the context.


Right.  The only difference seems to be that __skb_tunnel_rx also does:

skb->dev = dev;

iptunnel_pull_header excludes that.  I can't see that setting the 
skb->dev will actually change anything for this driver, but it was there 
previously.  Thoughts?




+static struct dst_entry *gtp_get_v6_dst(struct sk_buff *skb,
+   struct net_device *dev,
+   struct pdp_ctx *pctx,
+   struct in6_addr *saddr)
+{
+   const struct sock *sk = pctx->sk;
+   struct dst_entry *dst = NULL;
+   struct flowi6 fl6;
+
+   memset(&fl6, 0, sizeof(fl6));
+   fl6.flowi6_mark = skb->mark;
+   fl6.flowi6_proto = IPPROTO_UDP;
+   fl6.daddr = pctx->peer_addr;
+
+   dst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(sk), sk, &fl6, NULL);
+   if (IS_ERR(dst)) {
+   netdev_dbg(pctx->dev, "no route to %pI6\n", &fl6.daddr);
+   return ERR_PTR(-ENETUNREACH);
+   }
+   if (dst->dev == pctx->dev) {
+   netdev_dbg(pctx->dev, "circular route to %pI6\n", &fl6.daddr);
+   dst_release(dst);
+   return ERR_PTR(-ELOOP);
+   }
+
+   *saddr = fl6.saddr;
+
+   return dst;
+}
+

IPv6 related functionality needs to be protected by IS_ENABLED(CONFIG_IPV6).


Yes, you're probably right.  Given that IPv6 isn't really optional in 
contexts where this driver is relevant, however, I'm almost inclined to 
switch this around and make the driver depend on the availability of IPv6...


/Jonas


Re: [PATCH net-next v2 02/12] gtp: include role in link info

2020-12-12 Thread Harald Welte
Hi Jonas,

On Fri, Dec 11, 2020 at 01:26:02PM +0100, Jonas Bonn wrote:
> Querying link info for the GTP interface doesn't reveal in which "role" the
> device is set to operate.  Include this information in the info query
> result.
> 
> Signed-off-by: Jonas Bonn 

Acked-by: Harald Welte 

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH net-next v2 08/12] gtp: set dev features to enable GSO

2020-12-12 Thread Jonas Bonn




On 12/12/2020 06:31, Pravin Shelar wrote:

On Fri, Dec 11, 2020 at 4:28 AM Jonas Bonn  wrote:


Signed-off-by: Jonas Bonn 
---
  drivers/net/gtp.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 236ebbcb37bf..7bbeec173113 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -536,7 +536,11 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct 
net_device *dev)
 if (unlikely(r))
 goto err_rt;

-   skb_reset_inner_headers(skb);
+   r = udp_tunnel_handle_offloads(skb, true);
+   if (unlikely(r))
+   goto err_rt;
+
+   skb_set_inner_protocol(skb, skb->protocol);


This should be skb_set_inner_ipproto(), since GTP is L3 protocol.


I think you're right, but I barely see the point of this.  It all ends 
up in the same place, as far as I can tell.  Is this supposed to be some 
sort of safeguard against trying to offload tunnels-in-tunnels?


/Jonas


Re: [PATCH net-next v2 07/12] gtp: use ephemeral source port

2020-12-12 Thread Harald Welte
Hi Jonas,

On Fri, Dec 11, 2020 at 01:26:07PM +0100, Jonas Bonn wrote:
> From 3GPP TS 29.281:
> "...the UDP Source Port or the Flow Label field... should be set dynamically
> by the sending GTP-U entity to help balancing the load in the transport
> network."

You unfortuantely didn't specifiy which 3GPP release you are referring to.

At least in V15.7.0 (2020-01)  Release 15 I can only find:

"For the messages described below, the UDP Source Port (except as
specified for the Echo Response message) may be allocated either
statically or dynamically by the sending GTP-U entity.  NOTE: Dynamic
allocation of the UDP source port can help balancing the load in the
network, depending on network deployments and network node
implementations."

For GTPv0, TS 29.060 states:

"The UDP Source Port is a locally allocated port number at the sending
GSN/RNC."

unfortuantely it doesn't say if it's a locally allocated number globally
for that entire GSN/RNC, or it's dynamic per flow or per packet.

As I'm aware of a lot of very tight packet filtering between GSNs,
I would probably not go for fully dynamic source port allocation
without some kind of way how the user (GTP-control instance) being
able to decide on that policy.

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH 1/1] net: Fix use of proc_fs

2020-12-12 Thread Yonatan Linik
On Fri, Dec 11, 2020 at 11:00 PM Arnd Bergmann  wrote:
>
> Another option would be to just ignore the return code here
> and continue without a procfs file, regardless of whether procfs
> is enabled or not.
>
>Arnd

Yes I thought about that, but I didn't want to make changes to the way
it behaved when procfs was enabled.
If you decide that's a better solution I will happily change it.

-- 
Yonatan Linik


Re: [PATCH net-next v2 07/12] gtp: use ephemeral source port

2020-12-12 Thread Jonas Bonn




On 12/12/2020 06:35, Pravin Shelar wrote:

On Fri, Dec 11, 2020 at 4:29 AM Jonas Bonn  wrote:



 /* Read the IP destination address and resolve the PDP context.
@@ -527,6 +527,10 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct 
net_device *dev)
 return -EMSGSIZE;
 }

+   sport = udp_flow_src_port(sock_net(pctx->sk), skb,
+   0, USHRT_MAX,
+   true);
+

why use_eth is true for this is L3 GTP devices, Am missing something?


No, you are right.  Will fix.

/Jonas


[PATCH] xfrm: redact SA secret with lockdown confidentiality

2020-12-12 Thread Steffen Klassert
From: Antony Antony 

redact XFRM SA secret in the netlink response to xfrm_get_sa()
or dumpall sa.
Enable lockdown, confidentiality mode, at boot or at run time.

e.g. when enabled:
cat /sys/kernel/security/lockdown
none integrity [confidentiality]

ip xfrm state
src 172.16.1.200 dst 172.16.1.100
proto esp spi 0x0002 reqid 2 mode tunnel
replay-window 0
aead rfc4106(gcm(aes)) 0x 96

note: the aead secret is redacted.
Redacting secret is also a FIPS 140-2 requirement.

v1->v2
 - add size checks before memset calls
v2->v3
 - replace spaces with tabs for consistency
v3->v4
 - use kernel lockdown instead of a /proc setting
v4->v5
 - remove kconfig option

Reviewed-by: Stephan Mueller 
Signed-off-by: Antony Antony 
Signed-off-by: Steffen Klassert 
---
 include/linux/security.h |  1 +
 net/xfrm/xfrm_user.c | 74 
 security/security.c  |  1 +
 3 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index bc2725491560..1112a79a7dba 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -127,6 +127,7 @@ enum lockdown_reason {
LOCKDOWN_PERF,
LOCKDOWN_TRACEFS,
LOCKDOWN_XMON_RW,
+   LOCKDOWN_XFRM_SECRET,
LOCKDOWN_CONFIDENTIALITY_MAX,
 };
 
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d0c32a8fcc4a..0727ac853b55 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -848,21 +848,84 @@ static int copy_user_offload(struct xfrm_state_offload 
*xso, struct sk_buff *skb
return 0;
 }
 
+static bool xfrm_redact(void)
+{
+   return IS_ENABLED(CONFIG_SECURITY) &&
+   security_locked_down(LOCKDOWN_XFRM_SECRET);
+}
+
 static int copy_to_user_auth(struct xfrm_algo_auth *auth, struct sk_buff *skb)
 {
struct xfrm_algo *algo;
+   struct xfrm_algo_auth *ap;
struct nlattr *nla;
+   bool redact_secret = xfrm_redact();
 
nla = nla_reserve(skb, XFRMA_ALG_AUTH,
  sizeof(*algo) + (auth->alg_key_len + 7) / 8);
if (!nla)
return -EMSGSIZE;
-
algo = nla_data(nla);
strncpy(algo->alg_name, auth->alg_name, sizeof(algo->alg_name));
-   memcpy(algo->alg_key, auth->alg_key, (auth->alg_key_len + 7) / 8);
+
+   if (redact_secret && auth->alg_key_len)
+   memset(algo->alg_key, 0, (auth->alg_key_len + 7) / 8);
+   else
+   memcpy(algo->alg_key, auth->alg_key,
+  (auth->alg_key_len + 7) / 8);
algo->alg_key_len = auth->alg_key_len;
 
+   nla = nla_reserve(skb, XFRMA_ALG_AUTH_TRUNC, xfrm_alg_auth_len(auth));
+   if (!nla)
+   return -EMSGSIZE;
+   ap = nla_data(nla);
+   memcpy(ap, auth, sizeof(struct xfrm_algo_auth));
+   if (redact_secret && auth->alg_key_len)
+   memset(ap->alg_key, 0, (auth->alg_key_len + 7) / 8);
+   else
+   memcpy(ap->alg_key, auth->alg_key,
+  (auth->alg_key_len + 7) / 8);
+   return 0;
+}
+
+static int copy_to_user_aead(struct xfrm_algo_aead *aead, struct sk_buff *skb)
+{
+   struct nlattr *nla = nla_reserve(skb, XFRMA_ALG_AEAD, aead_len(aead));
+   struct xfrm_algo_aead *ap;
+   bool redact_secret = xfrm_redact();
+
+   if (!nla)
+   return -EMSGSIZE;
+
+   ap = nla_data(nla);
+   memcpy(ap, aead, sizeof(*aead));
+
+   if (redact_secret && aead->alg_key_len)
+   memset(ap->alg_key, 0, (aead->alg_key_len + 7) / 8);
+   else
+   memcpy(ap->alg_key, aead->alg_key,
+  (aead->alg_key_len + 7) / 8);
+   return 0;
+}
+
+static int copy_to_user_ealg(struct xfrm_algo *ealg, struct sk_buff *skb)
+{
+   struct xfrm_algo *ap;
+   bool redact_secret = xfrm_redact();
+   struct nlattr *nla = nla_reserve(skb, XFRMA_ALG_CRYPT,
+xfrm_alg_len(ealg));
+   if (!nla)
+   return -EMSGSIZE;
+
+   ap = nla_data(nla);
+   memcpy(ap, ealg, sizeof(*ealg));
+
+   if (redact_secret && ealg->alg_key_len)
+   memset(ap->alg_key, 0, (ealg->alg_key_len + 7) / 8);
+   else
+   memcpy(ap->alg_key, ealg->alg_key,
+  (ealg->alg_key_len + 7) / 8);
+
return 0;
 }
 
@@ -906,20 +969,17 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
goto out;
}
if (x->aead) {
-   ret = nla_put(skb, XFRMA_ALG_AEAD, aead_len(x->aead), x->aead);
+   ret = copy_to_user_aead(x->aead, skb);
if (ret)
goto out;
}
if (x->aalg) {
ret = copy_to_user_auth(x->aalg, skb);
-   if (!ret)
-   ret = nla_put(skb, XFRMA_ALG_AUTH_TRUNC,
- xfrm_alg_auth_len(x->

pull request (net-next): ipsec-next 2020-12-12

2020-12-12 Thread Steffen Klassert
Just one patch this time:

1) Redact the SA keys with kernel lockdown confidentiality.
   If enabled, no secret keys are sent to uuserspace.
   From Antony Antony.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 8be33ecfc1ffd2da20cc29e957e4cb6eb99310cb:

  net: skb_vlan_untag(): don't reset transport offset if set by GRO layer 
(2020-11-09 20:03:55 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to c7a5899eb26e2a4d516d53f65b6dd67be2228041:

  xfrm: redact SA secret with lockdown confidentiality (2020-11-27 11:03:06 
+0100)


Antony Antony (1):
  xfrm: redact SA secret with lockdown confidentiality

 include/linux/security.h |  1 +
 net/xfrm/xfrm_user.c | 74 +++-
 security/security.c  |  1 +
 3 files changed, 69 insertions(+), 7 deletions(-)


Re: [PATCH net-next v2 10/12] gtp: add IPv6 support

2020-12-12 Thread Harald Welte
Hi Jonas,

On Sat, Dec 12, 2020 at 08:05:40AM +0100, Jonas Bonn wrote:
> Yes, you're probably right.  Given that IPv6 isn't really optional in
> contexts where this driver is relevant, [...]

I strongly contest this statement. GTP is used in a lot of legacy contexts
without any IPv6 requirements whatsoever.  _particularly_ so on the outer/
transport level, where even GSMA IR.34 in still states:

> The IPX Provider's and Service Provider's networks must support IPv4
> addressing and routing.  IPv6 addressing and routing is recommended.

So there is (still) no requirement for IPv6 on the transport level
between cellular operators.
 
The fact that this gtp module has existed until today with pure IPv4
support has something to say about that.

I'm of course all in support of finally getting IPv6 support merged (via
your patches!) - but I see absolutely no reason why a GTP kernel module
would have a mandatory dependency on IPv6.

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH net-next v2 11/12] gtp: netlink update for ipv6

2020-12-12 Thread Harald Welte
On Fri, Dec 11, 2020 at 01:26:11PM +0100, Jonas Bonn wrote:
> This patch adds the netlink changes required to support IPv6.

See my related comment to the other IPv6 patch in this series.

It is not legal to assume that v4/v6 are an either-or decision,
but it can be either v4-only, v6-only or v4 and v6 in the same PDP context.

For the "peer" (outer) address, I think it is correct to assume only either v4 
or v6.

But for the inner "ms" address, it is not.

Regards,
Harald

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH net-next v2 04/12] gtp: drop unnecessary call to skb_dst_drop

2020-12-12 Thread Jonas Bonn




On 12/12/2020 10:50, Harald Welte wrote:

On Fri, Dec 11, 2020 at 01:26:04PM +0100, Jonas Bonn wrote:

The call to skb_dst_drop() is already done as part of udp_tunnel_xmit().


I must be blind, can you please point out where exactly this happens?


It's in skb_scrub_packet() which is called by iptunnel_xmit().

/Jonas



I don't see any skb_dst_drop in udp_tunnel_xmit_skb, and
in iptunnel_xmit() there's only a skb_dst_set (which doesn't call skb_dst_drop 
internally)





Re: [PATCH net-next v2 10/12] gtp: add IPv6 support

2020-12-12 Thread Harald Welte
Hi Jonas,

thanks again for your patches, they are very much appreciated.

However, I don't think that it is "that easy".

PDP contexts (at least) in GPRS/EDGE and UMTS come in three flavors:
* IPv4 only
* IPv6 only
* IPv4v6 (i.e. both an IPv4 and an IPv6 address within the same tunnel)

See for example osmo-ggsn at https://git.osmocom.org/osmo-ggsn
for an userspace implementation that covers all three cases,
as well as a related automatic test suite containing cases
for all three flavors at
https://git.osmocom.org/osmo-ttcn3-hacks/tree/ggsn_tests

If I read your patch correctly

On Fri, Dec 11, 2020 at 01:26:10PM +0100, Jonas Bonn wrote:
> - struct in_addr  ms_addr_ip4;
> - struct in_addr  peer_addr_ip4;
> + struct in6_addr ms_addr;
> + struct in6_addr peer_addr;

this simply replaces the (inner) IPv4 "MS addr" with an IPv6 "MS addr".

Sure, it is an improvement over v4-only.  But IMHO any follow-up
change to introduce v4v6 PDP contexts would require significant changes,
basically re-introducing the ms_add_ip4 member which you are removing here.

Therefore, I argue very much in favor of intrducing proper IPv6 support
(including v4v6) in one go.

Regards,
Harald

-- 
- Harald Weltehttp://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [patch 03/30] genirq: Move irq_set_lockdep_class() to core

2020-12-12 Thread Andy Shevchenko
On Sat, Dec 12, 2020 at 12:07 AM Thomas Gleixner  wrote:
>
> On Fri, Dec 11 2020 at 22:08, Thomas Gleixner wrote:
>
> > On Fri, Dec 11 2020 at 19:53, Andy Shevchenko wrote:
> >
> >> On Thu, Dec 10, 2020 at 10:14 PM Thomas Gleixner  
> >> wrote:
> >>>
> >>> irq_set_lockdep_class() is used from modules and requires irq_to_desc() to
> >>> be exported. Move it into the core code which lifts another requirement 
> >>> for
> >>> the export.
> >>
> >> ...
> >>
> >>> +   if (IS_ENABLED(CONFIG_LOCKDEP))
> >>> +   __irq_set_lockdep_class(irq, lock_class, request_class);
> >
> > You are right. Let me fix that.
>
> No. I have to correct myself. You're wrong.
>
> The inline is evaluated in the compilation units which include that
> header and because the function declaration is unconditional it is
> happy.
>
> Now the optimizer stage makes the whole thing a NOOP if CONFIG_LOCKDEP=n
> and thereby drops the reference to the function which makes it not
> required for linking.
>
> So in the file where the function is implemented:
>
> #ifdef CONFIG_LOCKDEP
> void __irq_set_lockdep_class()
> {
> }
> #endif
>
> The whole block is either discarded because CONFIG_LOCKDEP is not
> defined or compile if it is defined which makes it available for the
> linker.
>
> And in the latter case the optimizer keeps the call in the inline (it
> optimizes the condition away because it's always true).
>
> So in both cases the compiler and the linker are happy and everything
> works as expected.
>
> It would fail if the header file had the following:
>
> #ifdef CONFIG_LOCKDEP
> void __irq_set_lockdep_class();
> #endif
>
> Because then it would complain about the missing function prototype when
> it evaluates the inline.

I understand that (that's why I put "if even no warning") and what I'm
talking about is the purpose of IS_ENABLED(). It's usually good for
compile testing !CONFIG_FOO cases. But here it seems inconsistent.

The pattern I usually see in the cases like this is

 #ifdef CONFIG_LOCKDEP
 void __irq_set_lockdep_class();
 #else
 static inline void ... {}
 #endif

and call it directly in the caller.

It's not a big deal, so up to you.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v2] net: dsa: qca: ar9331: fix sleeping function called from invalid context bug

2020-12-12 Thread Vladimir Oltean
On Fri, Dec 11, 2020 at 12:03:17PM +0100, Oleksij Rempel wrote:
> With lockdep enabled, we will get following warning:
> 
>  ar9331_switch ethernet.1:10 lan0 (uninitialized): PHY 
> [!ahb!ethernet@1a00!mdio!switch@10:00] driver [Qualcomm Atheros AR9331 
> built-in PHY] (irq=13)
>  BUG: sleeping function called from invalid context at 
> kernel/locking/mutex.c:935
>  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 18, name: kworker/0:1
>  INFO: lockdep is turned off.
>  irq event stamp: 602
>  hardirqs last  enabled at (601): [<8073fde0>] _raw_spin_unlock_irq+0x3c/0x80
>  hardirqs last disabled at (602): [<8073a4f4>] __schedule+0x184/0x800
>  softirqs last  enabled at (0): [<80080f60>] copy_process+0x578/0x14c8
>  softirqs last disabled at (0): [<>] 0x0
>  CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 
> 5.10.0-rc3-ar9331-00734-g7d644991df0c #31
>  Workqueue: events deferred_probe_work_func
>  Stack : 8098 8098 8089ef70 8089 804b5414 8098 0002 
> 80b53728
>   800d1268 804b5414 ffde 0017 800afe08 81943860 
> 0f5bfc32
>    8089ef70 819436c0 ffea   
> 
>  8194390c 808e353c 000f 66657272 8098   
> 8089
>  804b5414 8098 0002 80b53728    
> 80d4
>  ...
>  Call Trace:
>  [<80069ce0>] show_stack+0x9c/0x140
>  [<800afe08>] ___might_sleep+0x220/0x244
>  [<8073bfb0>] __mutex_lock+0x70/0x374
>  [<8073c2e0>] mutex_lock_nested+0x2c/0x38
>  [<804b5414>] regmap_update_bits_base+0x38/0x8c
>  [<804ee584>] regmap_update_bits+0x1c/0x28
>  [<804ee714>] ar9331_sw_unmask_irq+0x34/0x60
>  [<800d91f0>] unmask_irq+0x48/0x70
>  [<800d93d4>] irq_startup+0x114/0x11c
>  [<800d65b4>] __setup_irq+0x4f4/0x6d0
>  [<800d68a0>] request_threaded_irq+0x110/0x190
>  [<804e3ef0>] phy_request_interrupt+0x4c/0xe4
>  [<804df508>] phylink_bringup_phy+0x2c0/0x37c
>  [<804df7bc>] phylink_of_phy_connect+0x118/0x130
>  [<806c1a64>] dsa_slave_create+0x3d0/0x578
>  [<806bc4ec>] dsa_register_switch+0x934/0xa20
>  [<804eef98>] ar9331_sw_probe+0x34c/0x364
>  [<804eb48c>] mdio_probe+0x44/0x70
>  [<8049e3b4>] really_probe+0x30c/0x4f4
>  [<8049ea10>] driver_probe_device+0x264/0x26c
>  [<8049bc10>] bus_for_each_drv+0xb4/0xd8
>  [<8049e684>] __device_attach+0xe8/0x18c
>  [<8049ce58>] bus_probe_device+0x48/0xc4
>  [<8049db70>] deferred_probe_work_func+0xdc/0xf8
>  [<8009ff64>] process_one_work+0x2e4/0x4a0
>  [<800a0770>] worker_thread+0x2a8/0x354
>  [<800a774c>] kthread+0x16c/0x174
>  [<8006306c>] ret_from_kernel_thread+0x14/0x1c
> 
>  ar9331_switch ethernet.1:10 lan1 (uninitialized): PHY 
> [!ahb!ethernet@1a00!mdio!switch@10:02] driver [Qualcomm Atheros AR9331 
> built-in PHY] (irq=13)
>  DSA: tree 0 setup
> 
> To fix it, it is better to move access to MDIO register to the 
> .irq_bus_sync_unlock
> call back.
> 
> Fixes: ec6698c272de ("net: dsa: add support for Atheros AR9331 built-in 
> switch")
> Signed-off-by: Oleksij Rempel 
> ---

Just from looking at other irqchip drivers, it seems probably ok to do
your I/O from .irq_bus_sync_unlock.

Reviewed-by: Vladimir Oltean 

But I'm a bit concerned about your ar9331_sw_remove method. Is it safe
to call these in the following order?

irq_domain_remove(priv->irqdomain);
mdiobus_unregister(priv->mbus);
dsa_unregister_switch(&priv->ds);

What if a PHY interrupt occurs after the irqdomain was removed and/or
the master MDIO bus was removed, but before dsa_unregister_switch
happened, which performed all the phylink teardown?

> changes v2:
> - fix comment on error
> 
>  drivers/net/dsa/qca/ar9331.c | 33 -
>  1 file changed, 24 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c
> index e24a99031b80..4d49c5f2b790 100644
> --- a/drivers/net/dsa/qca/ar9331.c
> +++ b/drivers/net/dsa/qca/ar9331.c
> @@ -159,6 +159,8 @@ struct ar9331_sw_priv {
>   struct dsa_switch ds;
>   struct dsa_switch_ops ops;
>   struct irq_domain *irqdomain;
> + u32 irq_mask;
> + struct mutex lock_irq;
>   struct mii_bus *mbus; /* mdio master */
>   struct mii_bus *sbus; /* mdio slave */
>   struct regmap *regmap;
> @@ -520,32 +522,44 @@ static irqreturn_t ar9331_sw_irq(int irq, void *data)
>  static void ar9331_sw_mask_irq(struct irq_data *d)
>  {
>   struct ar9331_sw_priv *priv = irq_data_get_irq_chip_data(d);
> - struct regmap *regmap = priv->regmap;
> - int ret;
>  
> - ret = regmap_update_bits(regmap, AR9331_SW_REG_GINT_MASK,
> -  AR9331_SW_GINT_PHY_INT, 0);
> - if (ret)
> - dev_err(priv->dev, "could not mask IRQ\n");
> + priv->irq_mask = 0;
>  }
>  
>  static void ar9331_sw_unmask_irq(struct irq_data *d)
> +{
> + struct ar9331_sw_priv *priv = irq_data_get_irq_chip_data(d);
> +
> + priv->irq_mask = AR9331_SW_GINT_PHY_INT;
> +}
> +
> +sta

Re: [PATCH net-next v2 07/12] gtp: use ephemeral source port

2020-12-12 Thread Jonas Bonn

Hi Harald,

On 12/12/2020 11:07, Harald Welte wrote:

Hi Jonas,

On Fri, Dec 11, 2020 at 01:26:07PM +0100, Jonas Bonn wrote:

 From 3GPP TS 29.281:
"...the UDP Source Port or the Flow Label field... should be set dynamically
by the sending GTP-U entity to help balancing the load in the transport
network."


You unfortuantely didn't specifiy which 3GPP release you are referring to.



Sorry about that.  I'm looking at v16.1.0.


At least in V15.7.0 (2020-01)  Release 15 I can only find:

"For the messages described below, the UDP Source Port (except as
specified for the Echo Response message) may be allocated either
statically or dynamically by the sending GTP-U entity.  NOTE: Dynamic
allocation of the UDP source port can help balancing the load in the
network, depending on network deployments and network node
implementations."

For GTPv0, TS 29.060 states:

"The UDP Source Port is a locally allocated port number at the sending
GSN/RNC."

unfortuantely it doesn't say if it's a locally allocated number globally
for that entire GSN/RNC, or it's dynamic per flow or per packet.


I could interpret that to mean that the receiving end shouldn't be too 
bothered about what port a packet happens to come from...  That said, 
dynamic per flow is almost certainly what we want here as this is mostly 
about being able to trigger receive side scaling for network cards that 
can't look at inner headers.




As I'm aware of a lot of very tight packet filtering between GSNs,
I would probably not go for fully dynamic source port allocation
without some kind of way how the user (GTP-control instance) being
able to decide on that policy.



Sure... sounds reasonable.  A flag on the link setup indicating dynamic 
source port yes/no should be sufficient, right?


/Jonas



Re: [PATCH v5 2/2] net: dsa: qca: ar9331: export stats64

2020-12-12 Thread Vladimir Oltean
On Fri, Dec 11, 2020 at 11:53:22AM +0100, Oleksij Rempel wrote:
> Add stats support for the ar9331 switch.
> 
> Signed-off-by: Oleksij Rempel 
> ---
>  drivers/net/dsa/qca/ar9331.c | 256 ++-
>  1 file changed, 255 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c
> index 4d49c5f2b790..5baef0ec6410 100644
> --- a/drivers/net/dsa/qca/ar9331.c
> +++ b/drivers/net/dsa/qca/ar9331.c
> @@ -101,6 +101,9 @@
>AR9331_SW_PORT_STATUS_RX_FLOW_EN | AR9331_SW_PORT_STATUS_TX_FLOW_EN | \
>AR9331_SW_PORT_STATUS_SPEED_M)
>  
> +/* MIB registers */
> +#define AR9331_MIB_COUNTER(x)(0x2 + ((x) * 
> 0x100))
> +
>  /* Phy bypass mode
>   * 
>   * Bit:   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
> @@ -154,6 +157,111 @@
>  #define AR9331_SW_MDIO_POLL_SLEEP_US 1
>  #define AR9331_SW_MDIO_POLL_TIMEOUT_US   20
>  
> +/* The interval should be small enough to avoid overflow of 32bit MIBs */
> +/*
> + * FIXME: as long as we can't read MIBs from stats64 call directly, we should
> + * poll stats more frequently then it is actually needed. In normal case
> + * 100 sec interval should be OK.
> + */
> +#define STATS_INTERVAL_JIFFIES   (3 * HZ)
> +
> +struct ar9331_sw_stats_raw {
> + u32 rxbroad;/* 0x00 */

You should probably use all tabs, or all spaces.

> + u32 rxpause;/* 0x04 */
> + u32 rxmulti;/* 0x08 */
> + u32 rxfcserr;   /* 0x0c */
> + u32 rxalignerr; /* 0x10 */
> + u32 rxrunt; /* 0x14 */
> + u32 rxfragment; /* 0x18 */
> + u32 rx64byte;   /* 0x1c */
> + u32 rx128byte;  /* 0x20 */
> + u32 rx256byte;  /* 0x24 */
> + u32 rx512byte;  /* 0x28 */
> + u32 rx1024byte; /* 0x2c */
> + u32 rx1518byte; /* 0x30 */
> + u32 rxmaxbyte;  /* 0x34 */
> + u32 rxtoolong;  /* 0x38 */
> + u32 rxgoodbyte; /* 0x3c */
> + u32 rxgoodbyte_hi;
> + u32 rxbadbyte;  /* 0x44 */
> + u32 rxbadbyte_hi;
> + u32 rxoverflow; /* 0x4c */
> + u32 filtered;   /* 0x50 */
> + u32 txbroad;/* 0x54 */
> + u32 txpause;/* 0x58 */
> + u32 txmulti;/* 0x5c */
> + u32 txunderrun; /* 0x60 */
> + u32 tx64byte;   /* 0x64 */
> + u32 tx128byte;  /* 0x68 */
> + u32 tx256byte;  /* 0x6c */
> + u32 tx512byte;  /* 0x70 */
> + u32 tx1024byte; /* 0x74 */
> + u32 tx1518byte; /* 0x78 */
> + u32 txmaxbyte;  /* 0x7c */
> + u32 txoversize; /* 0x80 */
> + u32 txbyte; /* 0x84 */
> + u32 txbyte_hi;
> + u32 txcollision;/* 0x8c */
> + u32 txabortcol; /* 0x90 */
> + u32 txmulticol; /* 0x94 */
> + u32 txsinglecol;/* 0x98 */
> + u32 txexcdefer; /* 0x9c */
> + u32 txdefer;/* 0xa0 */
> + u32 txlatecol;  /* 0xa4 */
> +};
> +
> +struct ar9331_sw_stats {
> + u64_stats_t rxbroad;
> + u64_stats_t rxpause;
> + u64_stats_t rxmulti;
> + u64_stats_t rxfcserr;
> + u64_stats_t rxalignerr;
> + u64_stats_t rxrunt;
> + u64_stats_t rxfragment;
> + u64_stats_t rx64byte;
> + u64_stats_t rx128byte;
> + u64_stats_t rx256byte;
> + u64_stats_t rx512byte;
> + u64_stats_t rx1024byte;
> + u64_stats_t rx1518byte;
> + u64_stats_t rxmaxbyte;
> + u64_stats_t rxtoolong;
> + u64_stats_t rxgoodbyte;
> + u64_stats_t rxbadbyte;
> + u64_stats_t rxoverflow;
> + u64_stats_t filtered;
> + u64_stats_t txbroad;
> + u64_stats_t txpause;
> + u64_stats_t txmulti;
> + u64_stats_t txunderrun;
> + u64_stats_t tx64byte;
> + u64_stats_t tx128byte;
> + u64_stats_t tx256byte;
> + u64_stats_t tx512byte;
> + u64_stats_t tx1024byte;
> + u64_stats_t tx1518byte;
> + u64_stats_t txmaxbyte;
> + u64_stats_t txoversize;
> + u64_stats_t txbyte;
> + u64_stats_t txcollision;
> + u64_stats_t txabortcol;
> + u64_stats_t txmulticol;
> + u64_stats_t txsinglecol;
> + u64_stats_t txexcdefer;
> + u64_stats_t txdefer;
> + u64_stats_t txlatecol;
> +
> + struct u64_stats_sync syncp;
> +};
> +
> +struct ar9331_sw_priv;
> +struct ar9331_sw_port {
> + int idx;
> + struct ar9331_sw_priv *priv;
> + struct delayed_work mib_read;
> + struct ar

Re: [PATCH v5 1/2] net: dsa: add optional stats64 support

2020-12-12 Thread Vladimir Oltean
On Fri, Dec 11, 2020 at 11:53:21AM +0100, Oleksij Rempel wrote:
> Allow DSA drivers to export stats64
> 
> Signed-off-by: Oleksij Rempel 
> Reviewed-by: Vladimir Oltean 
> ---
>  include/net/dsa.h |  3 +++
>  net/dsa/slave.c   | 14 +-
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 4e60d2610f20..457b89143875 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -655,6 +655,9 @@ struct dsa_switch_ops {
>   int (*port_change_mtu)(struct dsa_switch *ds, int port,
>  int new_mtu);
>   int (*port_max_mtu)(struct dsa_switch *ds, int port);
> +
> + void(*get_stats64)(struct dsa_switch *ds, int port,
> +struct rtnl_link_stats64 *s);

Nitpick: I would have probably put it under the "ethtool hardware
statistics." category, at the same time renaming it into "Port
statistics counters."

>  };
>  
>  #define DSA_DEVLINK_PARAM_DRIVER(_id, _name, _type, _cmodes) \
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index ff2266d2b998..6e1a4dc18a97 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -1602,6 +1602,18 @@ static struct devlink_port 
> *dsa_slave_get_devlink_port(struct net_device *dev)
>   return dp->ds->devlink ? &dp->devlink_port : NULL;
>  }
>  
> +static void dsa_slave_get_stats64(struct net_device *dev,
> +   struct rtnl_link_stats64 *s)
> +{
> + struct dsa_port *dp = dsa_slave_to_port(dev);
> + struct dsa_switch *ds = dp->ds;
> +
> + if (!ds->ops->get_stats64)
> + return dev_get_tstats64(dev, s);
> +
> + return ds->ops->get_stats64(ds, dp->index, s);
> +}
> +
>  static const struct net_device_ops dsa_slave_netdev_ops = {
>   .ndo_open   = dsa_slave_open,
>   .ndo_stop   = dsa_slave_close,
> @@ -1621,7 +1633,7 @@ static const struct net_device_ops dsa_slave_netdev_ops 
> = {
>  #endif
>   .ndo_get_phys_port_name = dsa_slave_get_phys_port_name,
>   .ndo_setup_tc   = dsa_slave_setup_tc,
> - .ndo_get_stats64= dev_get_tstats64,
> + .ndo_get_stats64= dsa_slave_get_stats64,
>   .ndo_get_port_parent_id = dsa_slave_get_port_parent_id,
>   .ndo_vlan_rx_add_vid= dsa_slave_vlan_rx_add_vid,
>   .ndo_vlan_rx_kill_vid   = dsa_slave_vlan_rx_kill_vid,
> -- 
> 2.29.2
> 


Re: [PATCH net-next v2 10/12] gtp: add IPv6 support

2020-12-12 Thread Jonas Bonn

Hi Harald,

On 12/12/2020 12:22, Harald Welte wrote:

Hi Jonas,

thanks again for your patches, they are very much appreciated.

However, I don't think that it is "that easy".

PDP contexts (at least) in GPRS/EDGE and UMTS come in three flavors:
* IPv4 only
* IPv6 only
* IPv4v6 (i.e. both an IPv4 and an IPv6 address within the same tunnel)

See for example osmo-ggsn at https://git.osmocom.org/osmo-ggsn
for an userspace implementation that covers all three cases,
as well as a related automatic test suite containing cases
for all three flavors at
https://git.osmocom.org/osmo-ttcn3-hacks/tree/ggsn_tests

If I read your patch correctly

On Fri, Dec 11, 2020 at 01:26:10PM +0100, Jonas Bonn wrote:

-   struct in_addr  ms_addr_ip4;
-   struct in_addr  peer_addr_ip4;
+   struct in6_addr ms_addr;
+   struct in6_addr peer_addr;


this simply replaces the (inner) IPv4 "MS addr" with an IPv6 "MS addr".


Yes, that's correct.  It's currently an either-or proposition for the UE 
address.


I actually wanted to proceed a bit carefully here because this patch 
series is already quite large.  I do plan on adding support for multiple 
MS addresses, but already just juggling mixed inner and outer tunnels 
seemed challenging enough to review.




Sure, it is an improvement over v4-only.  But IMHO any follow-up
change to introduce v4v6 PDP contexts would require significant changes,
basically re-introducing the ms_add_ip4 member which you are removing here.


I think we ultimately need to take the MS (UE address) member out of the 
PDP context struct altogether.  A single PDP context can apply to even 
more than two addresses as the UE can/should be provisioned with 
multiple IPv6 addresses (as I understand it, superficially).




Therefore, I argue very much in favor of intrducing proper IPv6 support
(including v4v6) in one go.


For provisioning contexts via Netlink, I agree that we should try to 
avoid pathological intermediate steps.  For the actual packet handling 
of outer and inner IPv6 tunnels, I think it's moot whether or not the 
PDP contexts support multiple addresses or not: the subsequent step to 
extend to multiple IP's is a bit of churn, but doesn't immediately seem 
overly intrusive.


Anyway, I'll look into making this change...

Thanks for the review!

/Jonas





Regards,
Harald



Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support

2020-12-12 Thread Vladimir Oltean
On Fri, Dec 11, 2020 at 09:50:24PM +0100, Tobias Waldekranz wrote:
> On Tue, Dec 08, 2020 at 13:23, Vladimir Oltean  wrote:
> > Sorry it took so long. I wanted to understand:
> > (a) where are the challenged for drivers to uniformly support software
> > bridging when they already have code for bridge offloading. I found
> > the following issues:
> > - We have taggers that unconditionally set skb->offload_fwd_mark = 1,
> >   which kind of prevents software bridging. I'm not sure what the
> >   fix for these should be.
>
> I took a closer look at the software fallback mode for LAGs and I've
> found three issues that prevent this from working in a bridged setup,
> two of which are easy to fix. This is the setup (team0 is _not_
> offloaded):
>
> (A)  br0
>  /
>   team0
>/ \
> swp0 swp1
>
>
> 1. DSA tries to offload port attributes for standalone ports. So in this
>setup, if vlan filtering is enabled on br0, we will enable it in
>hardware which on mv88e6xxx causes swp0/1 to drop all packets on
>ingress due to a VTU violation. This is a very easy fix, I will
>include it in v4.

Makes sense, I did not enable VLAN filtering when I tried it.

> 2. The issue Vladimir mentioned above. This is also a straight forward
>fix, I have patch for tag_dsa, making sure that offload_fwd_mark is
>never set for ports in standalone mode.
>
>I am not sure if I should solve it like that or if we should just
>clear the mark in dsa_switch_rcv if the dp does not have a
>bridge_dev. I know both Vladimir and I were leaning towards each
>tagger solving it internally. But looking at the code, I get the
>feeling that all taggers will end up copying the same block of code
>anyway. What do you think?

I am not sure what constitutes a good separation between DSA and taggers
here. We have many taggers that just set skb->offload_fwd_mark = 1. We
could have this as an opportunity to even let DSA take the decision
altogether. What do you say if we stop setting skb->offload_fwd_mark
from taggers, just add this:

+#define DSA_SKB_TRAPPEDBIT(0)
+
 struct dsa_skb_cb {
struct sk_buff *clone;
+   unsigned long flags;
 };

and basically just reverse the logic. Make taggers just assign this flag
for packets which are known to have reached software via data or control
traps. Don't make the taggers set skb->offload_fwd_mark = 1 if they
don't need to. Let DSA take that decision upon a more complex thought
process, which looks at DSA_SKB_CB(skb)->flags & DSA_SKB_TRAPPED too,
among other things.

> With these two patches in place, setup (A) works as expected. But if you
> extend it to (team0 still not offloaded):
>
> (B)   br0
>  /   \
>   team0   \
>/ \ \
> swp0 swp1  swp2
>
> You instantly run into:
>
> 3. Only traffic which does _not_ have offload_fwd_mark set is allowed to
>pass from swp2 to team0. This is because the bridge uses
>dev_get_port_parent_id to figure out which ports belong to the same
>switch. This will recurse down through all lowers and find swp0/1
>which will answer with the same ID as swp2.
>
>In the case where team0 is offloaded, this is exactly what we want,
>but in a setup like (B) they do not have the same "logical" parent in
>the sense that br0 is led to believe. I.e. the hardware will never
>forward packets between swp0/1 and swp2.
>
>I do not see an obvious solution to this. Refusing to divulge the
>parent just because you are a part of a software LAG seems fraught
>with danger as there are other users of those APIs. Adding yet
>another ndo would theoretically be possible, but not
>desirable. Ideas?
>
> As for this series, my intention is to make sure that (A) works as
> intended, leaving (B) for another day. Does that seem reasonable?
>
> NOTE: In the offloaded case, (B) will of course also be supported.

Yeah, ok, one can already tell that the way I've tested this setup was
by commenting out skb->offload_fwd_mark = 1 altogether. It seems ok to
postpone this a bit.

For what it's worth, in the giant "RX filtering for DSA switches" fiasco
https://patchwork.ozlabs.org/project/netdev/patch/20200521211036.668624-11-olte...@gmail.com/
we seemed to reach the conclusion that it would be ok to add a new NDO
answering the question "can this interface do forwarding in hardware
towards this other interface". We can probably start with the question
being asked for L2 forwarding only.

Maybe you are just the kick I need to come back to that series and
simplify it.


Re: [PATCH] net/netconsole: Support VLAN for netconsole

2020-12-12 Thread Vladimir Oltean
On Thu, Dec 10, 2020 at 09:55:16AM -0800, Florian Fainelli wrote:
> 
> 
> On 12/10/2020 2:07 AM, Libing Zhou wrote:
> > During kernel startup phase, current netconsole doesn’t support VLAN
> > since there is no VLAN interface setup already.
> > 
> > This patch provides VLAN ID and PCP as optional boot/module parameters
> > to support VLAN environment, thus kernel startup log can be retrieved
> > via VLAN.
> > 
> > Signed-off-by: Libing Zhou 
> > ---
> >  Documentation/networking/netconsole.rst | 10 -
> >  drivers/net/netconsole.c|  3 +-
> >  include/linux/netpoll.h |  3 ++
> >  net/core/netpoll.c  | 58 -
> >  4 files changed, 70 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/networking/netconsole.rst 
> > b/Documentation/networking/netconsole.rst
> > index 1f5c4a04027c..a08387fcc3f0 100644
> > --- a/Documentation/networking/netconsole.rst
> > +++ b/Documentation/networking/netconsole.rst
> > @@ -13,6 +13,8 @@ IPv6 support by Cong Wang , Jan 
> > 1 2013
> >  
> >  Extended console support by Tejun Heo , May 1 2015
> >  
> > +VLAN support by Libing Zhou , Dec 8 2020
> > +
> >  Please send bug reports to Matt Mackall 
> >  Satyam Sharma , and Cong Wang 
> > 
> >  
> > @@ -34,7 +36,7 @@ Sender and receiver configuration:
> >  It takes a string configuration parameter "netconsole" in the
> >  following format::
> >  
> > - 
> > netconsole=[+][src-port]@[src-ip]/[],[tgt-port]@/[tgt-macaddr]
> > + 
> > netconsole=[+][src-port]@[src-ip]/[],[tgt-port]@/[tgt-macaddr][-V]
> >  
> > where
> > + if present, enable extended console support
> > @@ -44,11 +46,17 @@ following format::
> > tgt-port  port for logging agent ()
> > tgt-ipIP address for logging agent
> > tgt-macaddr   ethernet MAC address for logging agent (broadcast)
> > +   -Vif present, enable VLAN support
> > +   vid:pcp   VLAN identifier and priority code point
> >  
> >  Examples::
> >  
> >   linux netconsole=@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc
> >  
> > +or using VLAN::
> > +
> > + linux netconsole=@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc-V100:1
> > +
> >  or::
> >  
> >   insmod netconsole netconsole=@/,@10.0.0.2/
> > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> > index 92001f7af380..f0690cd6a744 100644
> > --- a/drivers/net/netconsole.c
> > +++ b/drivers/net/netconsole.c
> > @@ -36,7 +36,6 @@
> >  #include 
> >  #include 
> >  #include 
> > -
> >  MODULE_AUTHOR("Maintainer: Matt Mackall ");
> >  MODULE_DESCRIPTION("Console driver for network interfaces");
> >  MODULE_LICENSE("GPL");
> > @@ -46,7 +45,7 @@ MODULE_LICENSE("GPL");
> >  
> >  static char config[MAX_PARAM_LENGTH];
> >  module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
> > -MODULE_PARM_DESC(netconsole, " 
> > netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@/[tgt-macaddr]");
> > +MODULE_PARM_DESC(netconsole, " 
> > netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@/[tgt-macaddr][-V]");
> >  
> >  static bool oops_only = false;
> >  module_param(oops_only, bool, 0600);
> > diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
> > index e6a2d72e0dc7..8ab3f25cadae 100644
> > --- a/include/linux/netpoll.h
> > +++ b/include/linux/netpoll.h
> > @@ -31,6 +31,9 @@ struct netpoll {
> > bool ipv6;
> > u16 local_port, remote_port;
> > u8 remote_mac[ETH_ALEN];
> > +   bool vlan_present;
> > +   u16 vlan_id;
> > +   u8 pcp;
> >  };
> >  
> >  struct netpoll_info {
> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > index 2338753e936b..077a7aec51ae 100644
> > --- a/net/core/netpoll.c
> > +++ b/net/core/netpoll.c
> > @@ -478,6 +478,14 @@ void netpoll_send_udp(struct netpoll *np, const char 
> > *msg, int len)
> >  
> > skb->dev = np->dev;
> >  
> > +   if (np->vlan_present) {
> > +   skb->vlan_proto = htons(ETH_P_8021Q);
> > +
> > +   /* htons for tci is done in __vlan_insert_inner_tag, not here */
> > +   skb->vlan_tci = (np->pcp << VLAN_PRIO_SHIFT) + (np->vlan_id & 
> > VLAN_VID_MASK);
> > +   skb->vlan_present = 1;
> > +   }
> 
> This does not seem to be the way to go around this, I would rather
> specifying eth0. on the netconsole parameters and automatically
> create a VLAN interface from that which would ensure that everything
> works properly and that the VLAN interface is linked to its lower device
> properly.
> 
> If you prefer your current syntax, that is probably fine, too but you
> should consider registering a VLAN device when you parse appropriate
> options.
> -- 
> Florian

I don't have a strong opinion, considering that I have not actually
tried this. I do tend to agree with Florian about registering an 8021q
interface, as long as there aren't any other complications associated
with it. As for the syntax, I am not sure if "eth0." is universal
enough to make an ABI out of it.


Re: [PATCH v2 bpf-next 2/2] net: xdp: introduce xdp_prepare_buff utility routine

2020-12-12 Thread Lorenzo Bianconi
On Dec 11, Saeed Mahameed wrote:
> On Fri, 2020-12-11 at 20:28 +0100, Lorenzo Bianconi wrote:
> > Introduce xdp_prepare_buff utility routine to initialize per-
> > descriptor
> > xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff()
> > in
> > all XDP capable drivers.
> > 
> > Signed-off-by: Lorenzo Bianconi 
> > 
> ...
> > +static inline void
> > +xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
> > +int headroom, int data_len)
> > +{
> > +   xdp->data_hard_start = hard_start;
> > +   xdp->data = hard_start + headroom;
> > +   xdp->data_end = xdp->data + data_len;
> > +   xdp->data_meta = xdp->data;
> 
> You might want to compute data = hard_start + headroom; on a local var,
> and hopefully gcc will put it into a register, then reuse it three
> times instead of the 2 xdp->data de-references you got at the end of
> the function.
> 
> unsigned char *data = hard_start + headroom;
> 
> xdp->data_hard_start = hard_start;
> xdp->data = data;
> xdp->data_end = data + data_len;
> xdp->data_meta = data;

ack, I will fix it in v3.

Regards,
Lorenzo

> 
> 


signature.asc
Description: PGP signature


[PATCH 0/3] PROTO_CMSG_DATA_ONLY for Datagram (UDP)

2020-12-12 Thread Victor Stewart
RE our conversation on the "[RFC 0/1] whitelisting UDP GSO and GRO
cmsgs" thread...

https://lore.kernel.org/io-uring/CAM1kxwi5m6i8hrtkw7nZYoziPTD-Wp03+fcsUwh3CuSc=81...@mail.gmail.com/

here are the patches we discussed.

Victor Stewart (3):
   net/socket.c: add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock
   net/ipv4/af_inet.c: add PROTO_CMSG_DATA_ONLY to inet_dgram_ops
   net/ipv6/af_inet6.c: add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops

   net/ipv4/af_inet.c
 |   1 +
   net/ipv6/af_inet6.c
|   1 +
   net/socket.c
   |   8 +-
   3 files changed, 7 insertions(+), 3 deletions(-)


[PATCH 2/3] add PROTO_CMSG_DATA_ONLY to inet_dgram_ops

2020-12-12 Thread Victor Stewart
add PROTO_CMSG_DATA_ONLY to inet_dgram_ops

Signed-off by: Victor Stewart 
---
net/ipv4/af_inet.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index b7260c8cef2e..c9fd5e7cfd6e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1052,6 +1052,7 @@ EXPORT_SYMBOL(inet_stream_ops);

 const struct proto_ops inet_dgram_ops = {
.family= PF_INET,
+   .flags = PROTO_CMSG_DATA_ONLY,
.owner = THIS_MODULE,
.release   = inet_release,
.bind  = inet_bind,


[PATCH 3/3] add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops

2020-12-12 Thread Victor Stewart
add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops

Signed-off by: Victor Stewart 
---
net/ipv6/af_inet6.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index e648fbebb167..560f45009d06 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -695,6 +695,7 @@ const struct proto_ops inet6_stream_ops = {

 const struct proto_ops inet6_dgram_ops = {
.family= PF_INET6,
+   .flags = PROTO_CMSG_DATA_ONLY,
.owner = THIS_MODULE,
.release   = inet6_release,
.bind  = inet6_bind,


[PATCH 1/3] add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock

2020-12-12 Thread Victor Stewart
add PROTO_CMSG_DATA_ONLY whitelisting to __sys_sendmsg_sock

Signed-off by: Victor Stewart 
---
 net/socket.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 6e62104f..6995835d6355 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2416,9 +2416,11 @@ static int ___sys_sendmsg(struct socket *sock,
struct user_msghdr __user *msg,
 long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg,
unsigned int flags)
 {
-   /* disallow ancillary data requests from this path */
-   if (msg->msg_control || msg->msg_controllen)
-   return -EINVAL;
+   if (msg->msg_control || msg->msg_controllen) {
+   /* disallow ancillary data reqs unless cmsg is plain data */
+   if (!(sock->ops->flags & PROTO_CMSG_DATA_ONLY))
+   return -EINVAL;
+   }

return sys_sendmsg(sock, msg, flags, NULL, 0);
 }


Re: [PATCH RESEND net-next 1/2] dpaa2-eth: send a scatter-gather FD instead of realloc-ing

2020-12-12 Thread Jon Nettleton
On Fri, Dec 11, 2020 at 5:56 PM Ioana Ciornei  wrote:
>
> On Fri, Dec 11, 2020 at 04:29:14PM +, Daniel Thompson wrote:
> > On Fri, Dec 11, 2020 at 02:01:28PM +, Ioana Ciornei wrote:
> > > On Thu, Dec 10, 2020 at 08:06:36PM +0200, Ioana Ciornei wrote:
> > > > [Added also the netdev mailing list, I haven't heard of linux-netdev
> > > > before but kept it]
> > > >
> > > > On Thu, Dec 10, 2020 at 05:31:56PM +, Daniel Thompson wrote:
> > > > > Hi Ioana
> > > >
> > > > Hi Daniel,
> > > >
> > > > >
> > > > > On Mon, Jun 29, 2020 at 06:47:11PM +, Ioana Ciornei wrote:
> > > > > > Instead of realloc-ing the skb on the Tx path when the provided 
> > > > > > headroom
> > > > > > is smaller than the HW requirements, create a Scatter/Gather frame
> > > > > > descriptor with only one entry.
> > > > > >
> > > > > > Remove the '[drv] tx realloc frames' counter exposed previously 
> > > > > > through
> > > > > > ethtool since it is no longer used.
> > > > > >
> > > > > > Signed-off-by: Ioana Ciornei 
> > > > > > ---
> > > > >
> > > > > I've been chasing down a networking regression on my LX2160A board
> > > > > (Honeycomb LX2K based on CEx7 LX2160A COM) that first appeared in 
> > > > > v5.9.
> > > > >
> > > > > It makes the board unreliable opening outbound connections meaning
> > > > > things like `apt update` or `git fetch` often can't open the 
> > > > > connection.
> > > > > It does not happen all the time but is sufficient to make the boards
> > > > > built-in networking useless for workstation use.
> > > > >
> > > > > The problem is strongly linked to warnings in the logs so I used the
> > > > > warnings to bisect down to locate the cause of the regression and it
> > > > > pinpointed this patch. I have confirmed that in both v5.9 and 
> > > > > v5.10-rc7
> > > > > that reverting this patch (and fixing up the merge issues) fixes the
> > > > > regression and the warnings stop appearing.
> > > > >
> > > > > A typical example of the warning is below (io-pgtable-arm.c:281 is an
> > > > > error path that I guess would cause dma_map_page_attrs() to return
> > > > > an error):
> > > > >
> > > > > [  714.464927] WARNING: CPU: 13 PID: 0 at
> > > > > drivers/iommu/io-pgtable-arm.c:281 __arm_lpae_map+0x2d4/0x30c
> > > > > [  714.464930] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E)
> > > > > snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) 
> > > > > bridge(E)
> > > > > stp(E) llc(E) rfkill(E) caam_jr(E) crypto_engine(E) rng_core(E)
> > > > > joydev(E) evdev(E) dpaa2_caam(E) caamhash_desc(E) caamalg_desc(E)
> > > > > authenc(E) libdes(E) dpaa2_console(E) ofpart(E) caam(E) sg(E) error(E)
> > > > > lm90(E) at24(E) spi_nor(E) mtd(E) sbsa_gwdt(E) qoriq_thermal(E)
> > > > > layerscape_edac_mod(E) qoriq_cpufreq(E) drm(E) fuse(E) configfs(E)
> > > > > ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E)
> > > > > mbcache(E) jbd2(E) hid_generic(E) usbhid(E) hid(E) dm_crypt(E) 
> > > > > dm_mod(E)
> > > > > sd_mod(E) fsl_dpaa2_ptp(E) ptp_qoriq(E) fsl_dpaa2_eth(E)
> > > > > xhci_plat_hcd(E) xhci_hcd(E) usbcore(E) aes_ce_blk(E) crypto_simd(E)
> > > > > cryptd(E) aes_ce_cipher(E) ghash_ce(E) gf128mul(E) at803x(E) libaes(E)
> > > > > fsl_mc_dpio(E) pcs_lynx(E) rtc_pcf2127(E) sha2_ce(E) phylink(E)
> > > > > xgmac_mdio(E) regmap_spi(E) of_mdio(E) sha256_arm64(E)
> > > > > i2c_mux_pca954x(E) fixed_phy(E) i2c_mux(E) sha1_ce(E) ptp(E) libphy(E)
> > > > > [  714.465131]  pps_core(E) ahci_qoriq(E) libahci_platform(E) nvme(E)
> > > > > libahci(E) nvme_core(E) t10_pi(E) libata(E) crc_t10dif(E)
> > > > > crct10dif_generic(E) crct10dif_common(E) dwc3(E) scsi_mod(E) 
> > > > > udc_core(E)
> > > > > roles(E) ulpi(E) sdhci_of_esdhc(E) sdhci_pltfm(E) sdhci(E)
> > > > > spi_nxp_fspi(E) i2c_imx(E) fixed(E)
> > > > > [  714.465192] CPU: 13 PID: 0 Comm: swapper/13 Tainted: GW   E
> > > > > 5.10.0-rc7-1-gba98d13279ca #52
> > > > > [  714.465196] Hardware name: SolidRun LX2160A Honeycomb (DT)
> > > > > [  714.465202] pstate: 6005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > > > > [  714.465207] pc : __arm_lpae_map+0x2d4/0x30c
> > > > > [  714.465211] lr : __arm_lpae_map+0x114/0x30c
> > > > > [  714.465215] sp : 80001006b340
> > > > > [  714.465219] x29: 80001006b340 x28: 002086538003
> > > > > [  714.465227] x27: 0a20 x26: 1000
> > > > > [  714.465236] x25: 0f44 x24: 0020adf8d000
> > > > > [  714.465245] x23: 0001 x22: faeca000
> > > > > [  714.465253] x21: 0003 x20: 19b60d64d200
> > > > > [  714.465261] x19: 00ca x18: 
> > > > > [  714.465270] x17:  x16: cccb7cf3ca20
> > > > > [  714.465278] x15:  x14: 
> > > > > [  714.465286] x13: 0003 x12: 0010
> > > > > [  714.465294] x11:  x10: 0002
> > > > > [  714.465302] x9 : cccb7d5b6e78 x8 : 01ff
> > > > > [  714.465311] x7 

[PATCH AUTOSEL 4.14 4/8] vxlan: Add needed_headroom for lower device

2020-12-12 Thread Sasha Levin
From: Sven Eckelmann 

[ Upstream commit 0a35dc41fea67ac4495ce7584406bf9557a6e7d0 ]

It was observed that sending data via batadv over vxlan (on top of
wireguard) reduced the performance massively compared to raw ethernet or
batadv on raw ethernet. A check of perf data showed that the
vxlan_build_skb was calling all the time pskb_expand_head to allocate
enough headroom for:

  min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
+ VXLAN_HLEN + iphdr_len;

But the vxlan_config_apply only requested needed headroom for:

  lowerdev->hard_header_len + VXLAN6_HEADROOM or VXLAN_HEADROOM

So it completely ignored the needed_headroom of the lower device. The first
caller of net_dev_xmit could therefore never make sure that enough headroom
was allocated for the rest of the transmit path.

Cc: Annika Wickert 
Signed-off-by: Sven Eckelmann 
Tested-by: Annika Wickert 
Link: https://lore.kernel.org/r/20201126125247.1047977-1-s...@narfation.org
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/vxlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 82efa5bbf568b..c21f28840f05b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3184,6 +3184,7 @@ static void vxlan_config_apply(struct net_device *dev,
dev->gso_max_segs = lowerdev->gso_max_segs;
 
needed_headroom = lowerdev->hard_header_len;
+   needed_headroom += lowerdev->needed_headroom;
 
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
   VXLAN_HEADROOM);
-- 
2.27.0



[PATCH AUTOSEL 4.14 5/8] vxlan: Copy needed_tailroom from lowerdev

2020-12-12 Thread Sasha Levin
From: Sven Eckelmann 

[ Upstream commit a5e74021e84bb5eadf760aaf2c583304f02269be ]

While vxlan doesn't need any extra tailroom, the lowerdev might need it. In
that case, copy it over to reduce the chance for additional (re)allocations
in the transmit path.

Signed-off-by: Sven Eckelmann 
Link: https://lore.kernel.org/r/20201126125247.1047977-2-s...@narfation.org
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/vxlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c21f28840f05b..94a9add2fc878 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3186,6 +3186,8 @@ static void vxlan_config_apply(struct net_device *dev,
needed_headroom = lowerdev->hard_header_len;
needed_headroom += lowerdev->needed_headroom;
 
+   dev->needed_tailroom = lowerdev->needed_tailroom;
+
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
   VXLAN_HEADROOM);
if (max_mtu < ETH_MIN_MTU)
-- 
2.27.0



Re: [PATCH v3] Compiler Attributes: remove CONFIG_ENABLE_MUST_CHECK

2020-12-12 Thread Guenter Roeck
On Sun, Nov 29, 2020 at 04:33:35AM +0900, Masahiro Yamada wrote:
> Revert commit cebc04ba9aeb ("add CONFIG_ENABLE_MUST_CHECK").
> 
> A lot of warn_unused_result warnings existed in 2006, but until now
> they have been fixed thanks to people doing allmodconfig tests.
> 
> Our goal is to always enable __must_check where appropriate, so this
> CONFIG option is no longer needed.
> 
> I see a lot of defconfig (arch/*/configs/*_defconfig) files having:
> 
> # CONFIG_ENABLE_MUST_CHECK is not set
> 
> I did not touch them for now since it would be a big churn. If arch
> maintainers want to clean them up, please go ahead.
> 
> While I was here, I also moved __must_check to compiler_attributes.h
> from compiler_types.h
> 
> Signed-off-by: Masahiro Yamada 
> Acked-by: Jason A. Donenfeld 
> Acked-by: Nathan Chancellor 
> Reviewed-by: Nick Desaulniers 

This patch results in:

arch/sh/kernel/cpu/sh4a/smp-shx3.c: In function 'shx3_prepare_cpus':
arch/sh/kernel/cpu/sh4a/smp-shx3.c:76:3: error: ignoring return value of 
'request_irq' declared with attribute 'warn_unused_result'

when building sh:defconfig. Checking for calls to request_irq()
suggests that there will be other similar errors in various builds.
Reverting the patch fixes the problem.

Guenter


[PATCH AUTOSEL 4.19 6/9] vxlan: Copy needed_tailroom from lowerdev

2020-12-12 Thread Sasha Levin
From: Sven Eckelmann 

[ Upstream commit a5e74021e84bb5eadf760aaf2c583304f02269be ]

While vxlan doesn't need any extra tailroom, the lowerdev might need it. In
that case, copy it over to reduce the chance for additional (re)allocations
in the transmit path.

Signed-off-by: Sven Eckelmann 
Link: https://lore.kernel.org/r/20201126125247.1047977-2-s...@narfation.org
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/vxlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8481a21fe7afb..66fffbd64a33f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3182,6 +3182,8 @@ static void vxlan_config_apply(struct net_device *dev,
needed_headroom = lowerdev->hard_header_len;
needed_headroom += lowerdev->needed_headroom;
 
+   dev->needed_tailroom = lowerdev->needed_tailroom;
+
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
   VXLAN_HEADROOM);
if (max_mtu < ETH_MIN_MTU)
-- 
2.27.0



[PATCH AUTOSEL 4.19 5/9] vxlan: Add needed_headroom for lower device

2020-12-12 Thread Sasha Levin
From: Sven Eckelmann 

[ Upstream commit 0a35dc41fea67ac4495ce7584406bf9557a6e7d0 ]

It was observed that sending data via batadv over vxlan (on top of
wireguard) reduced the performance massively compared to raw ethernet or
batadv on raw ethernet. A check of perf data showed that the
vxlan_build_skb was calling all the time pskb_expand_head to allocate
enough headroom for:

  min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
+ VXLAN_HLEN + iphdr_len;

But the vxlan_config_apply only requested needed headroom for:

  lowerdev->hard_header_len + VXLAN6_HEADROOM or VXLAN_HEADROOM

So it completely ignored the needed_headroom of the lower device. The first
caller of net_dev_xmit could therefore never make sure that enough headroom
was allocated for the rest of the transmit path.

Cc: Annika Wickert 
Signed-off-by: Sven Eckelmann 
Tested-by: Annika Wickert 
Link: https://lore.kernel.org/r/20201126125247.1047977-1-s...@narfation.org
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/vxlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index abf85f0ab72fc..8481a21fe7afb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3180,6 +3180,7 @@ static void vxlan_config_apply(struct net_device *dev,
dev->gso_max_segs = lowerdev->gso_max_segs;
 
needed_headroom = lowerdev->hard_header_len;
+   needed_headroom += lowerdev->needed_headroom;
 
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
   VXLAN_HEADROOM);
-- 
2.27.0



[PATCH AUTOSEL 5.9 18/23] iwlwifi: pcie: add one missing entry for AX210

2020-12-12 Thread Sasha Levin
From: Luca Coelho 

[ Upstream commit 5febcdef30902fa870128b9789b873199f13aff1 ]

The 0x0024 subsytem device ID was missing from the list, so some AX210
devices were not recognized.  Add it.

Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201202143859.308eab4db42c.I3763196cd3f7bb36f3dcabf02ec4e7c4fe859c0f@changeid
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
index e02bafb8921f6..0a4c7d1b37f0e 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
@@ -536,6 +536,7 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
 
{IWL_PCI_DEVICE(0x2725, 0x0090, iwlax211_2ax_cfg_so_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0020, iwlax210_2ax_cfg_ty_gf_a0)},
+   {IWL_PCI_DEVICE(0x2725, 0x0024, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0310, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0510, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0A10, iwlax210_2ax_cfg_ty_gf_a0)},
-- 
2.27.0



[PATCH AUTOSEL 5.4 11/14] iwlwifi: pcie: add one missing entry for AX210

2020-12-12 Thread Sasha Levin
From: Luca Coelho 

[ Upstream commit 5febcdef30902fa870128b9789b873199f13aff1 ]

The 0x0024 subsytem device ID was missing from the list, so some AX210
devices were not recognized.  Add it.

Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201202143859.308eab4db42c.I3763196cd3f7bb36f3dcabf02ec4e7c4fe859c0f@changeid
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
index b0b7eca1754ed..f34297fd453c0 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
@@ -968,6 +968,7 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
 
{IWL_PCI_DEVICE(0x2725, 0x0090, iwlax211_2ax_cfg_so_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0020, iwlax210_2ax_cfg_ty_gf_a0)},
+   {IWL_PCI_DEVICE(0x2725, 0x0024, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0310, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0510, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0A10, iwlax210_2ax_cfg_ty_gf_a0)},
-- 
2.27.0



[PATCH AUTOSEL 5.4 07/14] vxlan: Copy needed_tailroom from lowerdev

2020-12-12 Thread Sasha Levin
From: Sven Eckelmann 

[ Upstream commit a5e74021e84bb5eadf760aaf2c583304f02269be ]

While vxlan doesn't need any extra tailroom, the lowerdev might need it. In
that case, copy it over to reduce the chance for additional (re)allocations
in the transmit path.

Signed-off-by: Sven Eckelmann 
Link: https://lore.kernel.org/r/20201126125247.1047977-2-s...@narfation.org
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/vxlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3753cf0942865..5502e145aa17b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3540,6 +3540,8 @@ static void vxlan_config_apply(struct net_device *dev,
needed_headroom = lowerdev->hard_header_len;
needed_headroom += lowerdev->needed_headroom;
 
+   dev->needed_tailroom = lowerdev->needed_tailroom;
+
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
   VXLAN_HEADROOM);
if (max_mtu < ETH_MIN_MTU)
-- 
2.27.0



[PATCH AUTOSEL 5.4 06/14] vxlan: Add needed_headroom for lower device

2020-12-12 Thread Sasha Levin
From: Sven Eckelmann 

[ Upstream commit 0a35dc41fea67ac4495ce7584406bf9557a6e7d0 ]

It was observed that sending data via batadv over vxlan (on top of
wireguard) reduced the performance massively compared to raw ethernet or
batadv on raw ethernet. A check of perf data showed that the
vxlan_build_skb was calling all the time pskb_expand_head to allocate
enough headroom for:

  min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
+ VXLAN_HLEN + iphdr_len;

But the vxlan_config_apply only requested needed headroom for:

  lowerdev->hard_header_len + VXLAN6_HEADROOM or VXLAN_HEADROOM

So it completely ignored the needed_headroom of the lower device. The first
caller of net_dev_xmit could therefore never make sure that enough headroom
was allocated for the rest of the transmit path.

Cc: Annika Wickert 
Signed-off-by: Sven Eckelmann 
Tested-by: Annika Wickert 
Link: https://lore.kernel.org/r/20201126125247.1047977-1-s...@narfation.org
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/vxlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 630ac00a34ede..3753cf0942865 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3538,6 +3538,7 @@ static void vxlan_config_apply(struct net_device *dev,
dev->gso_max_segs = lowerdev->gso_max_segs;
 
needed_headroom = lowerdev->hard_header_len;
+   needed_headroom += lowerdev->needed_headroom;
 
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
   VXLAN_HEADROOM);
-- 
2.27.0



[PATCH AUTOSEL 5.9 19/23] iwlwifi: pcie: add some missing entries for AX210

2020-12-12 Thread Sasha Levin
From: Golan Ben Ami 

[ Upstream commit 9b15596c5006d82b2f82810e8cbf80d8c6e7e7b4 ]

Some subsytem device IDs were missing from the list, so some AX210
devices were not recognized.  Add them.

Signed-off-by: Golan Ben Ami 
Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201202143859.a06ba7540449.I7390305d088a49c1043c9b489154fe057989c18f@changeid
Link: https://lore.kernel.org/r/20201121003411.9450-1-ikegam...@gmail.com
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
index 0a4c7d1b37f0e..2eeb644981417 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
@@ -540,6 +540,11 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
{IWL_PCI_DEVICE(0x2725, 0x0310, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0510, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x0A10, iwlax210_2ax_cfg_ty_gf_a0)},
+   {IWL_PCI_DEVICE(0x2725, 0xE020, iwlax210_2ax_cfg_ty_gf_a0)},
+   {IWL_PCI_DEVICE(0x2725, 0xE024, iwlax210_2ax_cfg_ty_gf_a0)},
+   {IWL_PCI_DEVICE(0x2725, 0x4020, iwlax210_2ax_cfg_ty_gf_a0)},
+   {IWL_PCI_DEVICE(0x2725, 0x6020, iwlax210_2ax_cfg_ty_gf_a0)},
+   {IWL_PCI_DEVICE(0x2725, 0x6024, iwlax210_2ax_cfg_ty_gf_a0)},
{IWL_PCI_DEVICE(0x2725, 0x00B0, iwlax411_2ax_cfg_sosnj_gf4_a0)},
{IWL_PCI_DEVICE(0x2726, 0x0090, iwlax211_cfg_snj_gf_a0)},
{IWL_PCI_DEVICE(0x2726, 0x00B0, iwlax411_2ax_cfg_sosnj_gf4_a0)},
-- 
2.27.0



[PATCH AUTOSEL 5.9 13/23] vxlan: Add needed_headroom for lower device

2020-12-12 Thread Sasha Levin
From: Sven Eckelmann 

[ Upstream commit 0a35dc41fea67ac4495ce7584406bf9557a6e7d0 ]

It was observed that sending data via batadv over vxlan (on top of
wireguard) reduced the performance massively compared to raw ethernet or
batadv on raw ethernet. A check of perf data showed that the
vxlan_build_skb was calling all the time pskb_expand_head to allocate
enough headroom for:

  min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
+ VXLAN_HLEN + iphdr_len;

But the vxlan_config_apply only requested needed headroom for:

  lowerdev->hard_header_len + VXLAN6_HEADROOM or VXLAN_HEADROOM

So it completely ignored the needed_headroom of the lower device. The first
caller of net_dev_xmit could therefore never make sure that enough headroom
was allocated for the rest of the transmit path.

Cc: Annika Wickert 
Signed-off-by: Sven Eckelmann 
Tested-by: Annika Wickert 
Link: https://lore.kernel.org/r/20201126125247.1047977-1-s...@narfation.org
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/vxlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index b248d9e694254..85c4a6bfc7c06 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3802,6 +3802,7 @@ static void vxlan_config_apply(struct net_device *dev,
dev->gso_max_segs = lowerdev->gso_max_segs;
 
needed_headroom = lowerdev->hard_header_len;
+   needed_headroom += lowerdev->needed_headroom;
 
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
   VXLAN_HEADROOM);
-- 
2.27.0



[PATCH AUTOSEL 5.9 14/23] vxlan: Copy needed_tailroom from lowerdev

2020-12-12 Thread Sasha Levin
From: Sven Eckelmann 

[ Upstream commit a5e74021e84bb5eadf760aaf2c583304f02269be ]

While vxlan doesn't need any extra tailroom, the lowerdev might need it. In
that case, copy it over to reduce the chance for additional (re)allocations
in the transmit path.

Signed-off-by: Sven Eckelmann 
Link: https://lore.kernel.org/r/20201126125247.1047977-2-s...@narfation.org
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/vxlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 85c4a6bfc7c06..94e14238fb8a1 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3804,6 +3804,8 @@ static void vxlan_config_apply(struct net_device *dev,
needed_headroom = lowerdev->hard_header_len;
needed_headroom += lowerdev->needed_headroom;
 
+   dev->needed_tailroom = lowerdev->needed_tailroom;
+
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
   VXLAN_HEADROOM);
if (max_mtu < ETH_MIN_MTU)
-- 
2.27.0



[RFC] ravb: Add support for optional txc_refclk

2020-12-12 Thread Adam Ford
The SoC expects the txv_refclk is provided, but if it is provided
by a programmable clock, there needs to be a way to get and enable
this clock to operate.  It needs to be optional since it's only
necessary for those with programmable clocks.

Signed-off-by: Adam Ford 

diff --git a/drivers/net/ethernet/renesas/ravb.h 
b/drivers/net/ethernet/renesas/ravb.h
index 7453b17a37a2..ddf3bc5164d2 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -994,6 +994,7 @@ struct ravb_private {
struct platform_device *pdev;
void __iomem *addr;
struct clk *clk;
+   struct clk *ref_clk;
struct mdiobb_ctrl mdiobb;
u32 num_rx_ring[NUM_RX_QUEUE];
u32 num_tx_ring[NUM_TX_QUEUE];
diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index bd30505fbc57..4c3f95923ef2 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2148,6 +2148,18 @@ static int ravb_probe(struct platform_device *pdev)
goto out_release;
}
 
+   priv->ref_clk = devm_clk_get(&pdev->dev, "txc_refclk");
+   if (IS_ERR(priv->ref_clk)) {
+   if (PTR_ERR(priv->ref_clk) == -EPROBE_DEFER) {
+   /* for Probe defer return error */
+   error = PTR_ERR(priv->ref_clk);
+   goto out_release;
+   }
+   /* Ignore other errors since it's optional */
+   } else {
+   (void)clk_prepare_enable(priv->ref_clk);
+   }
+
ndev->max_mtu = 2048 - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN);
ndev->min_mtu = ETH_MIN_MTU;
 
-- 
2.25.1



Re: [PATCH] net/mlx4: Use true,false for bool variable

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 11:05:18 +0100 Vasyl Gomonovych wrote:
> Fix en_rx.c:687:1-17: WARNING: Assignment of 0/1 to bool variable
> Fix main.c:4465:5-13: WARNING: Comparison of 0/1 to bool variable

Apart from addressing Joe's comment please name the tool which produced
those.


Re: [PATCH 0/3] PROTO_CMSG_DATA_ONLY for Datagram (UDP)

2020-12-12 Thread Jens Axboe
On 12/12/20 8:31 AM, Victor Stewart wrote:
> RE our conversation on the "[RFC 0/1] whitelisting UDP GSO and GRO
> cmsgs" thread...
> 
> https://lore.kernel.org/io-uring/CAM1kxwi5m6i8hrtkw7nZYoziPTD-Wp03+fcsUwh3CuSc=81...@mail.gmail.com/
> 
> here are the patches we discussed.
> 
> Victor Stewart (3):
>net/socket.c: add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock
>net/ipv4/af_inet.c: add PROTO_CMSG_DATA_ONLY to inet_dgram_ops
>net/ipv6/af_inet6.c: add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops
> 
>net/ipv4/af_inet.c
>  |   1 +
>net/ipv6/af_inet6.c
> |   1 +
>net/socket.c
>|   8 +-
>3 files changed, 7 insertions(+), 3 deletions(-)

Changes look fine to me, but a few comments:

- I'd order 1/3 as 3/3, that ordering makes more sense as at that point it
  could actually be used.

- For adding it to af_inet/af_inet6, you should write a better commit message
  on the reasoning for the change. Right now it just describes what the
  patch does (which is obvious from the change), not WHY it's done. Really
  goes for current 1/3 as well, commit messages need to be better in
  general.

I'd also CC Jann Horn on the series, he's the one that found an issue there
in the past and also acked the previous change on doing PROTO_CMSG_DATA_ONLY.

-- 
Jens Axboe



Re: [PATCH net-next v3 0/4] vsock: Add flags field in the vsock address

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 16:24:13 +0100 Stefano Garzarella wrote:
> On Fri, Dec 11, 2020 at 12:32:37PM +0200, Andra Paraschiv wrote:
> >vsock enables communication between virtual machines and the host they are
> >running on. Nested VMs can be setup to use vsock channels, as the multi
> >transport support has been available in the mainline since the v5.5 Linux 
> >kernel
> >has been released.
> >
> >Implicitly, if no host->guest vsock transport is loaded, all the vsock 
> >packets
> >are forwarded to the host. This behavior can be used to setup communication
> >channels between sibling VMs that are running on the same host. One example 
> >can
> >be the vsock channels that can be established within AWS Nitro Enclaves
> >(see Documentation/virt/ne_overview.rst).
> >
> >To be able to explicitly mark a connection as being used for a certain use 
> >case,
> >add a flags field in the vsock address data structure. The value of the flags
> >field is taken into consideration when the vsock transport is assigned. This 
> >way
> >can distinguish between different use cases, such as nested VMs / local
> >communication and sibling VMs.
> >
> >The flags field can be set in the user space application connect logic. On 
> >the
> >listen path, the field can be set in the kernel space logic.
> >  
> 
> I reviewed all the patches and they are in a good shape!
> 
> Maybe the last thing to add is a flags check in the 
> vsock_addr_validate(), to avoid that flags that we don't know how to 
> handle are specified.
> For example if in the future we add new flags that this version of the 
> kernel is not able to satisfy, we should return an error to the 
> application.
> 
> I mean something like this:
> 
>  diff --git a/net/vmw_vsock/vsock_addr.c b/net/vmw_vsock/vsock_addr.c
>  index 909de26cb0e7..73bb1d2fa526 100644
>  --- a/net/vmw_vsock/vsock_addr.c
>  +++ b/net/vmw_vsock/vsock_addr.c
>  @@ -22,6 +22,8 @@ EXPORT_SYMBOL_GPL(vsock_addr_init);
>   
>   int vsock_addr_validate(const struct sockaddr_vm *addr)
>   {
>  +   unsigned short svm_valid_flags = VMADDR_FLAG_TO_HOST;
>  +
>  if (!addr)
>  return -EFAULT;
>   
>  @@ -31,6 +33,9 @@ int vsock_addr_validate(const struct sockaddr_vm *addr)
>  if (addr->svm_zero[0] != 0)
>  return -EINVAL;

Strictly speaking this check should be superseded by the check below
(AKA removed). We used to check svm_zero[0], with the new field added
this now checks svm_zero[2]. Old applications may have not initialized
svm_zero[2] (we're talking about binary compatibility here, apps built
with old headers).

>  +   if (addr->svm_flags & ~svm_valid_flags)
>  +   return -EINVAL;

The flags should also probably be one byte (we can define a "more
flags" flag to unlock further bytes) - otherwise on big endian the 
new flag will fall into svm_zero[1] so the v3 improvements are moot 
for big endian, right?

>  return 0;
>   }
>   EXPORT_SYMBOL_GPL(vsock_addr_validate);


Re: [PATCH v5 1/2] net: dsa: add optional stats64 support

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 11:53:21 +0100 Oleksij Rempel wrote:
> +static void dsa_slave_get_stats64(struct net_device *dev,
> +   struct rtnl_link_stats64 *s)
> +{
> + struct dsa_port *dp = dsa_slave_to_port(dev);
> + struct dsa_switch *ds = dp->ds;
> +
> + if (!ds->ops->get_stats64)
> + return dev_get_tstats64(dev, s);
> +
> + return ds->ops->get_stats64(ds, dp->index, s);

nit: please don't return void, "else" will do just fine here


Re: [PATCH net-next 13/15] mlxsw: spectrum_router_xm: Introduce basic XM cache flushing

2020-12-12 Thread Ido Schimmel
On Fri, Dec 11, 2020 at 08:24:27PM -0800, Jakub Kicinski wrote:
> On Fri, 11 Dec 2020 19:04:11 +0200 Ido Schimmel wrote:
> > From: Jiri Pirko 
> > 
> > Upon route insertion and removal, it is needed to flush possibly cached
> > entries from the XM cache. Extend XM op context to carry information
> > needed for the flush. Implement the flush in delayed work since for HW
> > design reasons there is a need to wait 50usec before the flush can be
> > done. If during this time comes the same flush request, consolidate it
> > to the first one. Implement this queued flushes by a hashtable.
> > 
> > Signed-off-by: Jiri Pirko 
> > Signed-off-by: Ido Schimmel 
> 
> 32 bit does not like this patch:

Thanks

Jiri, looks like this fix is needed:

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c
index b680c22eff7d..d213af723a2a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c
@@ -358,7 +358,7 @@ mlxsw_sp_router_xm_cache_flush_node_destroy(struct mlxsw_sp 
*mlxsw_sp,
 
 static u32 mlxsw_sp_router_xm_flush_mask4(u8 prefix_len)
 {
-   return GENMASK(32, 32 - prefix_len);
+   return GENMASK(31, 32 - prefix_len);
 }
 
 static unsigned char *mlxsw_sp_router_xm_flush_mask6(u8 prefix_len)

> 
> In file included from ../include/linux/bitops.h:5,
>  from ../include/linux/kernel.h:12,
>  from 
> ../drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c:4:
> ../drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c: In function 
> ‘mlxsw_sp_router_xm_flush_mask4’:
> ../include/linux/bits.h:36:11: warning: right shift count is negative 
> [-Wshift-count-negative]
>36 |   (~UL(0) >> (BITS_PER_LONG - 1 - (h
>   |   ^~
> ../include/linux/bits.h:38:31: note: in expansion of macro ‘__GENMASK’
>38 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>   |   ^
> ../drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c:361:9: note: in 
> expansion of macro ‘GENMASK’
>   361 |  return GENMASK(32, 32 - prefix_len);
>   | ^~~
> ../drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c:361:16: warning: 
> shift count is negative (-1)
> In file included from ../include/linux/bitops.h:5,
>  from ../include/linux/kernel.h:12,
>  from 
> ../drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c:4:
> ../drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c: In function 
> ‘mlxsw_sp_router_xm_flush_mask4’:
> ../include/linux/bits.h:36:11: warning: right shift count is negative 
> [-Wshift-count-negative]
>36 |   (~UL(0) >> (BITS_PER_LONG - 1 - (h
>   |   ^~
> ../include/linux/bits.h:38:31: note: in expansion of macro ‘__GENMASK’
>38 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>   |   ^
> ../drivers/net/ethernet/mellanox/mlxsw/spectrum_router_xm.c:361:9: note: in 
> expansion of macro ‘GENMASK’
>   361 |  return GENMASK(32, 32 - prefix_len);
>   | ^~~


Re: [PATCH 0/3] PROTO_CMSG_DATA_ONLY for Datagram (UDP)

2020-12-12 Thread Victor Stewart
On Sat, Dec 12, 2020 at 5:07 PM Jens Axboe  wrote:
>
> On 12/12/20 8:31 AM, Victor Stewart wrote:
> > RE our conversation on the "[RFC 0/1] whitelisting UDP GSO and GRO
> > cmsgs" thread...
> >
> > https://lore.kernel.org/io-uring/CAM1kxwi5m6i8hrtkw7nZYoziPTD-Wp03+fcsUwh3CuSc=81...@mail.gmail.com/
> >
> > here are the patches we discussed.
> >
> > Victor Stewart (3):
> >net/socket.c: add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock
> >net/ipv4/af_inet.c: add PROTO_CMSG_DATA_ONLY to inet_dgram_ops
> >net/ipv6/af_inet6.c: add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops
> >
> >net/ipv4/af_inet.c
> >  |   1 +
> >net/ipv6/af_inet6.c
> > |   1 +
> >net/socket.c
> >|   8 +-
> >3 files changed, 7 insertions(+), 3 deletions(-)
>
> Changes look fine to me, but a few comments:
>
> - I'd order 1/3 as 3/3, that ordering makes more sense as at that point it
>   could actually be used.

right that makes sense.

>
> - For adding it to af_inet/af_inet6, you should write a better commit message
>   on the reasoning for the change. Right now it just describes what the
>   patch does (which is obvious from the change), not WHY it's done. Really
>   goes for current 1/3 as well, commit messages need to be better in
>   general.
>

okay thanks Jens. i would have reiterated the intention but assumed it
were implicit given I linked the initial conversation about enabling
UDP_SEGMENT (GSO) and UDP_GRO through io_uring.

> I'd also CC Jann Horn on the series, he's the one that found an issue there
> in the past and also acked the previous change on doing PROTO_CMSG_DATA_ONLY.

I CCed him on this reply. Soheil at the end of the first exchange
thread said he audited the UDP paths and believed this to be safe.

how/should I resubmit the patch with a proper intention explanation in
the meta and reorder the patches? my first patch and all lol.

>
> --
> Jens Axboe
>


Re: [PATCH v5 2/2] net: dsa: qca: ar9331: export stats64

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 11:53:22 +0100 Oleksij Rempel wrote:
> Add stats support for the ar9331 switch.
> 
> Signed-off-by: Oleksij Rempel 
> ---
>  drivers/net/dsa/qca/ar9331.c | 256 ++-
>  1 file changed, 255 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c
> index 4d49c5f2b790..5baef0ec6410 100644
> --- a/drivers/net/dsa/qca/ar9331.c
> +++ b/drivers/net/dsa/qca/ar9331.c
> @@ -101,6 +101,9 @@
>AR9331_SW_PORT_STATUS_RX_FLOW_EN | AR9331_SW_PORT_STATUS_TX_FLOW_EN | \
>AR9331_SW_PORT_STATUS_SPEED_M)
>  
> +/* MIB registers */
> +#define AR9331_MIB_COUNTER(x)(0x2 + ((x) * 
> 0x100))
> +
>  /* Phy bypass mode
>   * 
>   * Bit:   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
> @@ -154,6 +157,111 @@
>  #define AR9331_SW_MDIO_POLL_SLEEP_US 1
>  #define AR9331_SW_MDIO_POLL_TIMEOUT_US   20
>  
> +/* The interval should be small enough to avoid overflow of 32bit MIBs */
> +/*
> + * FIXME: as long as we can't read MIBs from stats64 call directly, we should
> + * poll stats more frequently then it is actually needed. In normal case
> + * 100 sec interval should be OK.

This comment is a little confusing, if you don't mind.

Should it says something like:

 FIXME: until we can read MIBs from stats64 call directly (i.e. sleep
 there), we have to poll stats more frequently then it is actually needed.
 For overflow protection, normally, 100 sec interval should have been OK.



Re: [PATCH 0/3] PROTO_CMSG_DATA_ONLY for Datagram (UDP)

2020-12-12 Thread Jens Axboe
On 12/12/20 10:25 AM, Victor Stewart wrote:
> On Sat, Dec 12, 2020 at 5:07 PM Jens Axboe  wrote:
>>
>> On 12/12/20 8:31 AM, Victor Stewart wrote:
>>> RE our conversation on the "[RFC 0/1] whitelisting UDP GSO and GRO
>>> cmsgs" thread...
>>>
>>> https://lore.kernel.org/io-uring/CAM1kxwi5m6i8hrtkw7nZYoziPTD-Wp03+fcsUwh3CuSc=81...@mail.gmail.com/
>>>
>>> here are the patches we discussed.
>>>
>>> Victor Stewart (3):
>>>net/socket.c: add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock
>>>net/ipv4/af_inet.c: add PROTO_CMSG_DATA_ONLY to inet_dgram_ops
>>>net/ipv6/af_inet6.c: add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops
>>>
>>>net/ipv4/af_inet.c
>>>  |   1 +
>>>net/ipv6/af_inet6.c
>>> |   1 +
>>>net/socket.c
>>>|   8 +-
>>>3 files changed, 7 insertions(+), 3 deletions(-)
>>
>> Changes look fine to me, but a few comments:
>>
>> - I'd order 1/3 as 3/3, that ordering makes more sense as at that point it
>>   could actually be used.
> 
> right that makes sense.
> 
>>
>> - For adding it to af_inet/af_inet6, you should write a better commit message
>>   on the reasoning for the change. Right now it just describes what the
>>   patch does (which is obvious from the change), not WHY it's done. Really
>>   goes for current 1/3 as well, commit messages need to be better in
>>   general.
>>
> 
> okay thanks Jens. i would have reiterated the intention but assumed it
> were implicit given I linked the initial conversation about enabling
> UDP_SEGMENT (GSO) and UDP_GRO through io_uring.
> 
>> I'd also CC Jann Horn on the series, he's the one that found an issue there
>> in the past and also acked the previous change on doing PROTO_CMSG_DATA_ONLY.
> 
> I CCed him on this reply. Soheil at the end of the first exchange
> thread said he audited the UDP paths and believed this to be safe.
> 
> how/should I resubmit the patch with a proper intention explanation in
> the meta and reorder the patches? my first patch and all lol.

Just post is as a v2 with the change noted in the cover letter. I'd also
ensure that it threads properly, right now it's just coming through as 4
separate emails at my end. If you're using git send-email, make sure you
add --thread to the arguments.

-- 
Jens Axboe



[PATCH v3 bpf-next 0/2] introduce xdp_init_buff/xdp_prepare_buff

2020-12-12 Thread Lorenzo Bianconi
Introduce xdp_init_buff and xdp_prepare_buff utility routines to initialize
xdp_buff data structure and remove duplicated code in all XDP capable
drivers.

Changes since v2:
- precompute xdp->data as hard_start + headroom and save it in a local
  variable to reuse it for xdp->data_end and xdp->data_meta in
  xdp_prepare_buff()

Changes since v1:
- introduce xdp_prepare_buff utility routine

Lorenzo Bianconi (2):
  net: xdp: introduce xdp_init_buff utility routine
  net: xdp: introduce xdp_prepare_buff utility routine

 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  8 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  7 ++-
 .../net/ethernet/cavium/thunder/nicvf_main.c  | 11 ++-
 .../net/ethernet/freescale/dpaa/dpaa_eth.c| 10 --
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  | 13 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 18 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c | 17 +
 drivers/net/ethernet/intel/igb/igb_main.c | 18 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 19 +--
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 19 +--
 drivers/net/ethernet/marvell/mvneta.c |  9 +++--
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 13 +++--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c|  8 +++-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  7 ++-
 .../ethernet/netronome/nfp/nfp_net_common.c   | 12 ++--
 drivers/net/ethernet/qlogic/qede/qede_fp.c|  7 ++-
 drivers/net/ethernet/sfc/rx.c |  9 +++--
 drivers/net/ethernet/socionext/netsec.c   |  8 +++-
 drivers/net/ethernet/ti/cpsw.c| 17 ++---
 drivers/net/ethernet/ti/cpsw_new.c| 17 ++---
 drivers/net/hyperv/netvsc_bpf.c   |  7 ++-
 drivers/net/tun.c | 11 ---
 drivers/net/veth.c| 14 +-
 drivers/net/virtio_net.c  | 18 ++
 drivers/net/xen-netfront.c|  8 +++-
 include/net/xdp.h | 19 +++
 net/bpf/test_run.c|  9 +++--
 net/core/dev.c| 18 --
 28 files changed, 156 insertions(+), 195 deletions(-)

-- 
2.29.2



[PATCH v3 bpf-next 1/2] net: xdp: introduce xdp_init_buff utility routine

2020-12-12 Thread Lorenzo Bianconi
Introduce xdp_init_buff utility routine to initialize xdp_buff fields
const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on
xdp_init_buff in all XDP capable drivers.

Signed-off-by: Lorenzo Bianconi 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c| 3 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c   | 3 +--
 drivers/net/ethernet/cavium/thunder/nicvf_main.c| 4 ++--
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c  | 4 ++--
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c| 8 
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 6 +++---
 drivers/net/ethernet/intel/ice/ice_txrx.c   | 6 +++---
 drivers/net/ethernet/intel/igb/igb_main.c   | 6 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 7 +++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c   | 7 +++
 drivers/net/ethernet/marvell/mvneta.c   | 3 +--
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 8 +---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c  | 3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +--
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 4 ++--
 drivers/net/ethernet/qlogic/qede/qede_fp.c  | 3 +--
 drivers/net/ethernet/sfc/rx.c   | 3 +--
 drivers/net/ethernet/socionext/netsec.c | 3 +--
 drivers/net/ethernet/ti/cpsw.c  | 4 ++--
 drivers/net/ethernet/ti/cpsw_new.c  | 4 ++--
 drivers/net/hyperv/netvsc_bpf.c | 3 +--
 drivers/net/tun.c   | 7 +++
 drivers/net/veth.c  | 8 
 drivers/net/virtio_net.c| 6 ++
 drivers/net/xen-netfront.c  | 4 ++--
 include/net/xdp.h   | 7 +++
 net/bpf/test_run.c  | 4 ++--
 net/core/dev.c  | 8 
 28 files changed, 67 insertions(+), 72 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 0e98f45c2b22..338dce73927e 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1567,8 +1567,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, 
struct napi_struct *napi,
netif_dbg(rx_ring->adapter, rx_status, rx_ring->netdev,
  "%s qid %d\n", __func__, rx_ring->qid);
res_budget = budget;
-   xdp.rxq = &rx_ring->xdp_rxq;
-   xdp.frame_sz = ENA_PAGE_SIZE;
+   xdp_init_buff(&xdp, ENA_PAGE_SIZE, &rx_ring->xdp_rxq);
 
do {
xdp_verdict = XDP_PASS;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index fcc262064766..b7942c3440c0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -133,12 +133,11 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct 
bnxt_rx_ring_info *rxr, u16 cons,
dma_sync_single_for_cpu(&pdev->dev, mapping + offset, *len, bp->rx_dir);
 
txr = rxr->bnapi->tx_ring;
+   xdp_init_buff(&xdp, PAGE_SIZE, &rxr->xdp_rxq);
xdp.data_hard_start = *data_ptr - offset;
xdp.data = *data_ptr;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = *data_ptr + *len;
-   xdp.rxq = &rxr->xdp_rxq;
-   xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
orig_data = xdp.data;
 
rcu_read_lock();
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c 
b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index f3b7b443f964..9fc672f075f2 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -547,12 +547,12 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct 
bpf_prog *prog,
cpu_addr = (u64)phys_to_virt(cpu_addr);
page = virt_to_page((void *)cpu_addr);
 
+   xdp_init_buff(&xdp, RCV_FRAG_LEN + XDP_PACKET_HEADROOM,
+ &rq->xdp_rxq);
xdp.data_hard_start = page_address(page);
xdp.data = (void *)cpu_addr;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + len;
-   xdp.rxq = &rq->xdp_rxq;
-   xdp.frame_sz = RCV_FRAG_LEN + XDP_PACKET_HEADROOM;
orig_data = xdp.data;
 
rcu_read_lock();
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index e28510c282e5..9303e0aa 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2536,12 +2536,12 @@ static u32 dpaa_run_xdp(struct dpaa_priv *priv, struct 
qm_fd *fd, void *vaddr,
return XDP_PASS;
}
 
+   xdp_init_buff(&xdp, DPAA_BP_RAW_SIZE - DPAA_TX_PRIV_DATA_SIZE,
+ &dpaa_fq->xdp_rxq);
xdp.data = vaddr

[PATCH v3 bpf-next 2/2] net: xdp: introduce xdp_prepare_buff utility routine

2020-12-12 Thread Lorenzo Bianconi
Introduce xdp_prepare_buff utility routine to initialize per-descriptor
xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in
all XDP capable drivers.

Signed-off-by: Lorenzo Bianconi 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  5 ++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  4 +---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c  |  7 ---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c|  6 ++
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c  | 13 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 12 ++--
 drivers/net/ethernet/intel/ice/ice_txrx.c | 11 ++-
 drivers/net/ethernet/intel/igb/igb_main.c | 12 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 12 ++--
 drivers/net/ethernet/marvell/mvneta.c |  6 ++
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c   |  7 +++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c|  5 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |  4 +---
 .../net/ethernet/netronome/nfp/nfp_net_common.c   |  8 
 drivers/net/ethernet/qlogic/qede/qede_fp.c|  4 +---
 drivers/net/ethernet/sfc/rx.c |  6 ++
 drivers/net/ethernet/socionext/netsec.c   |  5 ++---
 drivers/net/ethernet/ti/cpsw.c| 15 +--
 drivers/net/ethernet/ti/cpsw_new.c| 15 +--
 drivers/net/hyperv/netvsc_bpf.c   |  4 +---
 drivers/net/tun.c |  4 +---
 drivers/net/veth.c|  6 +-
 drivers/net/virtio_net.c  | 12 
 drivers/net/xen-netfront.c|  4 +---
 include/net/xdp.h | 12 
 net/bpf/test_run.c|  5 +
 net/core/dev.c| 10 --
 28 files changed, 96 insertions(+), 130 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 338dce73927e..1cfd0c98677e 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1519,10 +1519,9 @@ static int ena_xdp_handle_buff(struct ena_ring *rx_ring, 
struct xdp_buff *xdp)
int ret;
 
rx_info = &rx_ring->rx_buffer_info[rx_ring->ena_bufs[0].req_id];
-   xdp->data = page_address(rx_info->page) + rx_info->page_offset;
+   xdp_prepare_buff(xdp, page_address(rx_info->page),
+rx_info->page_offset, rx_ring->ena_bufs[0].len);
xdp_set_data_meta_invalid(xdp);
-   xdp->data_hard_start = page_address(rx_info->page);
-   xdp->data_end = xdp->data + rx_ring->ena_bufs[0].len;
/* If for some reason we received a bigger packet than
 * we expect, then we simply drop it
 */
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index b7942c3440c0..e1664b86a7b8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -134,10 +134,8 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info 
*rxr, u16 cons,
 
txr = rxr->bnapi->tx_ring;
xdp_init_buff(&xdp, PAGE_SIZE, &rxr->xdp_rxq);
-   xdp.data_hard_start = *data_ptr - offset;
-   xdp.data = *data_ptr;
+   xdp_prepare_buff(&xdp, *data_ptr - offset, offset, *len);
xdp_set_data_meta_invalid(&xdp);
-   xdp.data_end = *data_ptr + *len;
orig_data = xdp.data;
 
rcu_read_lock();
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c 
b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 9fc672f075f2..9bdac04359c6 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -530,6 +530,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct 
bpf_prog *prog,
struct cqe_rx_t *cqe_rx, struct snd_queue *sq,
struct rcv_queue *rq, struct sk_buff **skb)
 {
+   unsigned char *hard_start, *data;
struct xdp_buff xdp;
struct page *page;
u32 action;
@@ -549,10 +550,10 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct 
bpf_prog *prog,
 
xdp_init_buff(&xdp, RCV_FRAG_LEN + XDP_PACKET_HEADROOM,
  &rq->xdp_rxq);
-   xdp.data_hard_start = page_address(page);
-   xdp.data = (void *)cpu_addr;
+   hard_start = page_address(page);
+   data = (unsigned char *)cpu_addr;
+   xdp_prepare_buff(&xdp, hard_start, data - hard_start, len);
xdp_set_data_meta_invalid(&xdp);
-   xdp.data_end = xdp.data + len;
orig_data = xdp.data;
 
rcu_read_lock();
diff --git a/drivers/net/ethernet/freescale/dpaa/

Re: [RFC] ravb: Add support for optional txc_refclk

2020-12-12 Thread Sergei Shtylyov

Hello!

On 12.12.2020 19:56, Adam Ford wrote:


The SoC expects the txv_refclk is provided, but if it is provided
by a programmable clock, there needs to be a way to get and enable
this clock to operate.  It needs to be optional since it's only
necessary for those with programmable clocks.

Signed-off-by: Adam Ford 

[...]

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index bd30505fbc57..4c3f95923ef2 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2148,6 +2148,18 @@ static int ravb_probe(struct platform_device *pdev)
goto out_release;
}
  
+	priv->ref_clk = devm_clk_get(&pdev->dev, "txc_refclk");


   Why not devm_clk_get_optional()?


+   if (IS_ERR(priv->ref_clk)) {
+   if (PTR_ERR(priv->ref_clk) == -EPROBE_DEFER) {
+   /* for Probe defer return error */
+   error = PTR_ERR(priv->ref_clk);
+   goto out_release;
+   }
+   /* Ignore other errors since it's optional */
+   } else {
+   (void)clk_prepare_enable(priv->ref_clk);
+   }
+
ndev->max_mtu = 2048 - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN);
ndev->min_mtu = ETH_MIN_MTU;
  


MBR, Sergei


Re: [PATCH 0/3] PROTO_CMSG_DATA_ONLY for Datagram (UDP)

2020-12-12 Thread Victor Stewart
On Sat, Dec 12, 2020 at 5:40 PM Jens Axboe  wrote:
>
> On 12/12/20 10:25 AM, Victor Stewart wrote:
> > On Sat, Dec 12, 2020 at 5:07 PM Jens Axboe  wrote:
> >>
> >> On 12/12/20 8:31 AM, Victor Stewart wrote:
> >>> RE our conversation on the "[RFC 0/1] whitelisting UDP GSO and GRO
> >>> cmsgs" thread...
> >>>
> >>> https://lore.kernel.org/io-uring/CAM1kxwi5m6i8hrtkw7nZYoziPTD-Wp03+fcsUwh3CuSc=81...@mail.gmail.com/
> >>>
> >>> here are the patches we discussed.
> >>>
> >>> Victor Stewart (3):
> >>>net/socket.c: add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock
> >>>net/ipv4/af_inet.c: add PROTO_CMSG_DATA_ONLY to inet_dgram_ops
> >>>net/ipv6/af_inet6.c: add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops
> >>>
> >>>net/ipv4/af_inet.c
> >>>  |   1 +
> >>>net/ipv6/af_inet6.c
> >>> |   1 +
> >>>net/socket.c
> >>>|   8 +-
> >>>3 files changed, 7 insertions(+), 3 deletions(-)
> >>
> >> Changes look fine to me, but a few comments:
> >>
> >> - I'd order 1/3 as 3/3, that ordering makes more sense as at that point it
> >>   could actually be used.
> >
> > right that makes sense.
> >
> >>
> >> - For adding it to af_inet/af_inet6, you should write a better commit 
> >> message
> >>   on the reasoning for the change. Right now it just describes what the
> >>   patch does (which is obvious from the change), not WHY it's done. Really
> >>   goes for current 1/3 as well, commit messages need to be better in
> >>   general.
> >>
> >
> > okay thanks Jens. i would have reiterated the intention but assumed it
> > were implicit given I linked the initial conversation about enabling
> > UDP_SEGMENT (GSO) and UDP_GRO through io_uring.
> >
> >> I'd also CC Jann Horn on the series, he's the one that found an issue there
> >> in the past and also acked the previous change on doing 
> >> PROTO_CMSG_DATA_ONLY.
> >
> > I CCed him on this reply. Soheil at the end of the first exchange
> > thread said he audited the UDP paths and believed this to be safe.
> >
> > how/should I resubmit the patch with a proper intention explanation in
> > the meta and reorder the patches? my first patch and all lol.
>
> Just post is as a v2 with the change noted in the cover letter. I'd also
> ensure that it threads properly, right now it's just coming through as 4
> separate emails at my end. If you're using git send-email, make sure you
> add --thread to the arguments.

oh i didn't know about git send-email. i was manually constructing /
sending them lol. thanks!

>
> --
> Jens Axboe
>


Re: [PATCH v5 2/2] net: dsa: qca: ar9331: export stats64

2020-12-12 Thread Jakub Kicinski
On Sat, 12 Dec 2020 15:48:52 +0200 Vladimir Oltean wrote:
> > +   stats->rx_packets = u64_stats_read(&s->rx64byte) +
> > +   u64_stats_read(&s->rx128byte) + u64_stats_read(&s->rx256byte) +
> > +   u64_stats_read(&s->rx512byte) + u64_stats_read(&s->rx1024byte) +
> > +   u64_stats_read(&s->rx1518byte) + u64_stats_read(&s->rxmaxbyte);
> > +   stats->tx_packets = u64_stats_read(&s->tx64byte) +
> > +   u64_stats_read(&s->tx128byte) + u64_stats_read(&s->tx256byte) +
> > +   u64_stats_read(&s->tx512byte) + u64_stats_read(&s->tx1024byte) +
> > +   u64_stats_read(&s->tx1518byte) + u64_stats_read(&s->txmaxbyte);
> > +   stats->rx_bytes = u64_stats_read(&s->rxgoodbyte);
> > +   stats->tx_bytes = u64_stats_read(&s->txbyte);
> > +   stats->rx_errors = u64_stats_read(&s->rxfcserr) +
> > +   u64_stats_read(&s->rxalignerr) + u64_stats_read(&s->rxrunt) +
> > +   u64_stats_read(&s->rxfragment) + u64_stats_read(&s->rxoverflow);
> > +   stats->tx_errors = u64_stats_read(&s->txoversize);  
> 
> Should tx_errors not also include tx_aborted_errors, tx_fifo_errors,
> tx_window_errors?

Yes.

> > +   stats->multicast = u64_stats_read(&s->rxmulti);
> > +   stats->collisions = u64_stats_read(&s->txcollision);
> > +   stats->rx_length_errors = u64_stats_read(&s->rxrunt) +
> > +   u64_stats_read(&s->rxfragment) + u64_stats_read(&s->rxtoolong);
> > +   stats->rx_crc_errors = u64_stats_read(&s->rxfcserr) +
> > +   u64_stats_read(&s->rxalignerr) + u64_stats_read(&s->rxfragment);

Why would CRC errors include alignment errors and rxfragments?

Besides looks like rxfragment is already counted to length errors.

> > +   stats->rx_frame_errors = u64_stats_read(&s->rxalignerr);
> > +   stats->rx_missed_errors = u64_stats_read(&s->rxoverflow);
> > +   stats->tx_aborted_errors = u64_stats_read(&s->txabortcol);
> > +   stats->tx_fifo_errors = u64_stats_read(&s->txunderrun);
> > +   stats->tx_window_errors = u64_stats_read(&s->txlatecol);
> > +   stats->rx_nohandler = u64_stats_read(&s->filtered);  
> 
> Should rx_nohandler not be also included in rx_errors?

I don't think drivers should ever touch rx_nohandler, it's a pretty
specific SW stat. But you made me realize that we never specified where
to count frames discarded due to L2 address filtering. It appears that
high speed adapters I was looking at don't have such statistic?

I would go with rx_dropped, if that's what ->filtered is.

We should probably update the doc like this:

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 874cc12a34d9..82708c6db432 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -75,8 +75,9 @@ struct rtnl_link_stats {
  *
  * @rx_dropped: Number of packets received but not processed,
  *   e.g. due to lack of resources or unsupported protocol.
- *   For hardware interfaces this counter should not include packets
- *   dropped by the device which are counted separately in
+ *   For hardware interfaces this counter may include packets discarded
+ *   due to L2 address filtering but should not include packets dropped
+ *   by the device due to buffer exhaustion which are counted separately in
  *   @rx_missed_errors (since procfs folds those two counters together).
  *
  * @tx_dropped: Number of packets dropped on their way to transmission,


> You can probably avoid reading a few of these twice by assigning the
> more specific ones first, then the rx_errors and tx_errors at the end.


Re: [PATCH 0/3] PROTO_CMSG_DATA_ONLY for Datagram (UDP)

2020-12-12 Thread Jens Axboe
On 12/12/20 10:58 AM, Victor Stewart wrote:
> On Sat, Dec 12, 2020 at 5:40 PM Jens Axboe  wrote:
>>
>> On 12/12/20 10:25 AM, Victor Stewart wrote:
>>> On Sat, Dec 12, 2020 at 5:07 PM Jens Axboe  wrote:

 On 12/12/20 8:31 AM, Victor Stewart wrote:
> RE our conversation on the "[RFC 0/1] whitelisting UDP GSO and GRO
> cmsgs" thread...
>
> https://lore.kernel.org/io-uring/CAM1kxwi5m6i8hrtkw7nZYoziPTD-Wp03+fcsUwh3CuSc=81...@mail.gmail.com/
>
> here are the patches we discussed.
>
> Victor Stewart (3):
>net/socket.c: add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock
>net/ipv4/af_inet.c: add PROTO_CMSG_DATA_ONLY to inet_dgram_ops
>net/ipv6/af_inet6.c: add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops
>
>net/ipv4/af_inet.c
>  |   1 +
>net/ipv6/af_inet6.c
> |   1 +
>net/socket.c
>|   8 +-
>3 files changed, 7 insertions(+), 3 deletions(-)

 Changes look fine to me, but a few comments:

 - I'd order 1/3 as 3/3, that ordering makes more sense as at that point it
   could actually be used.
>>>
>>> right that makes sense.
>>>

 - For adding it to af_inet/af_inet6, you should write a better commit 
 message
   on the reasoning for the change. Right now it just describes what the
   patch does (which is obvious from the change), not WHY it's done. Really
   goes for current 1/3 as well, commit messages need to be better in
   general.

>>>
>>> okay thanks Jens. i would have reiterated the intention but assumed it
>>> were implicit given I linked the initial conversation about enabling
>>> UDP_SEGMENT (GSO) and UDP_GRO through io_uring.
>>>
 I'd also CC Jann Horn on the series, he's the one that found an issue there
 in the past and also acked the previous change on doing 
 PROTO_CMSG_DATA_ONLY.
>>>
>>> I CCed him on this reply. Soheil at the end of the first exchange
>>> thread said he audited the UDP paths and believed this to be safe.
>>>
>>> how/should I resubmit the patch with a proper intention explanation in
>>> the meta and reorder the patches? my first patch and all lol.
>>
>> Just post is as a v2 with the change noted in the cover letter. I'd also
>> ensure that it threads properly, right now it's just coming through as 4
>> separate emails at my end. If you're using git send-email, make sure you
>> add --thread to the arguments.
> 
> oh i didn't know about git send-email. i was manually constructing /
> sending them lol. thanks!

I'd recommend it, makes sure your mailer doesn't mangle anything either. FWIW,
this is what I do:

git format-patch sha1..sha2
mv 00*.patch /tmp/x

git send-email --no-signed-off-by-cc --thread --compose  --to 
linux-fsde...@vger.kernel.org --cc torva...@linux-foundation.org --cc 
v...@zeniv.linux.org.uk /tmp/x

(from a series I just sent out). And then I have the following section in
~/.gitconfig:

[sendemail]
from = Jens Axboe 
smtpserver = smtp.gmail.com
smtpuser = ax...@kernel.dk
smtpencryption = tls
smtppass = hunter2
smtpserverport = 587

for using gmail to send them out.

--compose will fire up your editor to construct the cover letter, and
when you're happy with it, save+exit and git send-email will ask whether
to proceed or abort.

That's about all there is to it, and provides a consistent way to send out
patch series.

-- 
Jens Axboe



[PATCH] net: bcmgenet: Fix a resource leak in an error handling path in the probe functin

2020-12-12 Thread Christophe JAILLET
If the 'register_netdev()' call fails, we must undo a previous
'bcmgenet_mii_init()' call.

Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Christophe JAILLET 
---
The missing 'bcmgenet_mii_exit()' call is added here, instead of in the
error handling path in order to avoid some goto spaghetti code.
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index be85dad2e3bc..fcca023f22e5 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -4069,8 +4069,10 @@ static int bcmgenet_probe(struct platform_device *pdev)
clk_disable_unprepare(priv->clk);
 
err = register_netdev(dev);
-   if (err)
+   if (err) {
+   bcmgenet_mii_exit(dev);
goto err;
+   }
 
return err;
 
-- 
2.27.0



Re: [net-next PATCH] tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit

2020-12-12 Thread Alexander Duyck
On Sat, Dec 12, 2020 at 10:34 AM Yuchung Cheng  wrote:
>
> On Fri, Dec 11, 2020 at 5:28 PM Alexander Duyck
>  wrote:
> >
> > From: Alexander Duyck 
> >
> > There are cases where a fastopen SYN may trigger either a ICMP_TOOBIG
> > message in the case of IPv6 or a fragmentation request in the case of
> > IPv4. This results in the socket stalling for a second or more as it does
> > not respond to the message by retransmitting the SYN frame.
> >
> > Normally a SYN frame should not be able to trigger a ICMP_TOOBIG or
> > ICMP_FRAG_NEEDED however in the case of fastopen we can have a frame that
> > makes use of the entire MSS. In the case of fastopen it does, and an
> > additional complication is that the retransmit queue doesn't contain the
> > original frames. As a result when tcp_simple_retransmit is called and
> > walks the list of frames in the queue it may not mark the frames as lost
> > because both the SYN and the data packet each individually are smaller than
> > the MSS size after the adjustment. This results in the socket being stalled
> > until the retransmit timer kicks in and forces the SYN frame out again
> > without the data attached.
> >
> > In order to resolve this we can generate our best estimate for the original
> > packet size by detecting the fastopen SYN frame and then adding the
> > overhead for MAX_TCP_OPTION_SPACE and verifying if the SYN w/ data would
> > have exceeded the MSS. If so we can mark the frame as lost and retransmit
> > it.
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  net/ipv4/tcp_input.c |   30 +++---
> >  1 file changed, 27 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 9e8a6c1aa019..79375b58de84 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -2686,11 +2686,35 @@ static void tcp_mtup_probe_success(struct sock *sk)
> >  void tcp_simple_retransmit(struct sock *sk)
> >  {
> > const struct inet_connection_sock *icsk = inet_csk(sk);
> > +   struct sk_buff *skb = tcp_rtx_queue_head(sk);
> > struct tcp_sock *tp = tcp_sk(sk);
> > -   struct sk_buff *skb;
> > -   unsigned int mss = tcp_current_mss(sk);
> > +   unsigned int mss;
> > +
> > +   /* A fastopen SYN request is stored as two separate packets within
> > +* the retransmit queue, this is done by tcp_send_syn_data().
> > +* As a result simply checking the MSS of the frames in the queue
> > +* will not work for the SYN packet. So instead we must make a best
> > +* effort attempt by validating the data frame with the mss size
> > +* that would be computed now by tcp_send_syn_data and comparing
> > +* that against the data frame that would have been included with
> > +* the SYN.
> > +*/
> > +   if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN && tp->syn_data) {
> > +   struct sk_buff *syn_data = skb_rb_next(skb);
> > +
> > +   mss = tcp_mtu_to_mss(sk, icsk->icsk_pmtu_cookie) +
> > + tp->tcp_header_len - sizeof(struct tcphdr) -
> > + MAX_TCP_OPTION_SPACE;
> nice comment! The original syn_data mss needs to be inferred which is
> a hassle to get right. my sense is path-mtu issue is enough to warrant
> they are lost.
> I suggest simply mark syn & its data lost if tcp_simple_retransmit is
> called during TFO handshake, i.e.
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 62f7aabc7920..7f0c4f2947eb 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2864,7 +2864,8 @@ void tcp_simple_retransmit(struct sock *sk)
> unsigned int mss = tcp_current_mss(sk);
>
> skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
> -   if (tcp_skb_seglen(skb) > mss)
> +   if (tcp_skb_seglen(skb) > mss ||
> +   (tp->syn_data && sk->sk_state == TCP_SYN_SENT))
> tcp_mark_skb_lost(sk, skb);
> }
>
> We have a TFO packetdrill test that verifies my suggested fix should
> trigger an immediate retransmit vs 1s wait.

Okay, I will go that route, although I will still probably make one
minor cleanup. Instead of testing for syn_data and state per packet I
will probably keep the bit where I overwrite mss since it is only used
in the loop. What I can do is switch it from unsigned int to int since
technically tcp_current_mss and tcp_skb_seglen are both a signed int
anyway. Then I can just set mss to -1 in the syn_data && TCP_SYN_SENT
case. That way all of the frames in the ring should fail the check
while only having to add one initial check outside the loop.


[PATCH v2 net-next] net: dsa: reference count the host mdb addresses

2020-12-12 Thread Vladimir Oltean
Currently any DSA switch that is strict when implementing the mdb
operations prints these benign errors after the addresses expire, with
at least 2 ports bridged:

[  286.013814] mscc_felix :00:00.5 swp3: failed (err=-2) to del object 
(id=3)

The reason has to do with this piece of code:

netdev_for_each_lower_dev(dev, lower_dev, iter)
br_mdb_switchdev_host_port(dev, lower_dev, mp, type);

called from:

br_multicast_group_expired
-> br_multicast_host_leave
   -> br_mdb_notify
  -> br_mdb_switchdev_host

Basically, that code is correct. It tells each switchdev port that the
host can leave that multicast group. But in the case of DSA, all user
ports are connected to the host through the same pipe. So, because DSA
translates a host MDB to a normal MDB on the CPU port, this means that
when all user ports leave a multicast group, DSA tries to remove it N
times from the CPU port.

We should be reference-counting these addresses. Otherwise, the first
port on which the MDB expires will cause an entry removal from the CPU
port, which will break the host MDB for the remaining ports.

Signed-off-by: Vladimir Oltean 
---
Changes in v2:
- Re-targeted against net-next, since this is not breaking any use case
  that I know of.
- Re-did the refcounting logic. The problem is that the MDB addition is
  two-phase, but the deletion is one-phase. So refcounting on addition
  needs to be done only on one phase - the commit one. Before, we had a
  problem there, and for host MDB additions where an entry already existed
  on the CPU port, we would call the prepare phase but never commit.
  This would break drivers that allocate memory on prepare, and then
  expect the commit phase to actually apply. So we're not doing this any
  longer. Both prepare and commit phases are now stubbed out for additions
  of host MDB entries that are already present on the CPU port.
- Renamed dsa_host_mdb_find into dsa_host_addr_find, and we're now
  passing it the host_mdb list rather than struct dsa_switch *ds. This
  is a generic function and we might be able to reuse it in the future
  for host FDB entries (such as slave net_device MAC addresses).
- Left the allocation as GFP_KERNEL, since that is fine - the switchdev
  notifier runs as deferred, therefore in process context.

 include/net/dsa.h |  9 +
 net/dsa/dsa2.c|  2 ++
 net/dsa/slave.c   | 91 ++-
 3 files changed, 93 insertions(+), 9 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 4e60d2610f20..e639db28e238 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -253,6 +253,13 @@ struct dsa_link {
struct list_head list;
 };
 
+struct dsa_host_addr {
+   unsigned char addr[ETH_ALEN];
+   u16 vid;
+   refcount_t refcount;
+   struct list_head list;
+};
+
 struct dsa_switch {
bool setup;
 
@@ -335,6 +342,8 @@ struct dsa_switch {
 */
boolmtu_enforcement_ingress;
 
+   struct list_headhost_mdb;
+
size_t num_ports;
 };
 
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 183003e45762..52b3ef34a2cb 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -413,6 +413,8 @@ static int dsa_switch_setup(struct dsa_switch *ds)
if (ds->setup)
return 0;
 
+   INIT_LIST_HEAD(&ds->host_mdb);
+
/* Initialize ds->phys_mii_mask before registering the slave MDIO bus
 * driver and before ops->setup() has run, since the switch drivers and
 * the slave MDIO bus driver rely on these values for probing PHY
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 4a0498bf6c65..ccf397f02129 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -376,6 +376,86 @@ static int dsa_slave_vlan_add(struct net_device *dev,
return 0;
 }
 
+static struct dsa_host_addr *
+dsa_host_addr_find(struct list_head *addr_list,
+  const struct switchdev_obj_port_mdb *mdb)
+{
+   struct dsa_host_addr *a;
+
+   list_for_each_entry(a, addr_list, list)
+   if (ether_addr_equal(a->addr, mdb->addr) && a->vid == mdb->vid)
+   return a;
+
+   return NULL;
+}
+
+/* DSA can directly translate this to a normal MDB add, but on the CPU port.
+ * But because multiple user ports can join the same multicast group and the
+ * bridge will emit a notification for each port, we need to add/delete the
+ * entry towards the host only once, so we reference count it.
+ */
+static int dsa_host_mdb_add(struct dsa_port *dp,
+   const struct switchdev_obj_port_mdb *mdb,
+   struct switchdev_trans *trans)
+{
+   struct dsa_port *cpu_dp = dp->cpu_dp;
+   struct dsa_switch *ds = dp->ds;
+   struct dsa_host_addr *a;
+   int err;
+
+   a = dsa_host_addr_find(&ds->host_mdb, mdb);
+   if (a) {
+   /* Only the commit phase is refcounted */
+   if (switchdev_trans_ph_commit(tr

[PATCH v4 net-next] net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context

2020-12-12 Thread Vladimir Oltean
Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has
a very nice ocelot_mact_wait_for_completion at the end. Introduced in
commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be
wall time not attempts"), this function uses readx_poll_timeout which
triggers a lot of lockdep warnings and is also dangerous to use from
atomic context, potentially leading to lockups and panics.

Steen Hegelund added a poll timeout of 100 ms for checking the MAC
table, a duration which is clearly absurd to poll in atomic context.
So we need to defer the MAC table access to process context, which we do
via a dynamically allocated workqueue which contains all there is to
know about the MAC table operation it has to do.

Signed-off-by: Vladimir Oltean 
Reviewed-by: Florian Fainelli 
---
Changes in v4:
- Dropped WQ_MEM_RECLAIM
- Properly cleaning up on memory allocation error
- Using kmemdup
- Reordered members of struct ocelot_mact_work_ctx::learn

Changes in v3:
- Dropped Fixes tag and retargeted to net-next
- Dropped get_device/put_device since they don't offer real protection
- Now allocating a private ordered workqueue which is drained on unbind
  to avoid accessing freed memory

Changes in v2:
- Added Fixes tag (it won't backport that far, but anyway)
- Using get_device and put_device to avoid racing with unbind

 drivers/net/ethernet/mscc/ocelot.c |  7 +++
 drivers/net/ethernet/mscc/ocelot_net.c | 80 +-
 include/soc/mscc/ocelot.h  |  2 +
 3 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c 
b/drivers/net/ethernet/mscc/ocelot.c
index abea8dd2b0cb..0b9992bd6626 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1513,6 +1513,12 @@ int ocelot_init(struct ocelot *ocelot)
if (!ocelot->stats_queue)
return -ENOMEM;
 
+   ocelot->owq = alloc_ordered_workqueue("ocelot-owq", 0);
+   if (!ocelot->owq) {
+   destroy_workqueue(ocelot->stats_queue);
+   return -ENOMEM;
+   }
+
INIT_LIST_HEAD(&ocelot->multicast);
INIT_LIST_HEAD(&ocelot->pgids);
ocelot_mact_init(ocelot);
@@ -1619,6 +1625,7 @@ void ocelot_deinit(struct ocelot *ocelot)
 {
cancel_delayed_work(&ocelot->stats_work);
destroy_workqueue(ocelot->stats_queue);
+   destroy_workqueue(ocelot->owq);
mutex_destroy(&ocelot->stats_lock);
 }
 EXPORT_SYMBOL(ocelot_deinit);
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
b/drivers/net/ethernet/mscc/ocelot_net.c
index c65ae6f75a16..2bd2840d88bd 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -414,13 +414,81 @@ static int ocelot_port_xmit(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
+enum ocelot_action_type {
+   OCELOT_MACT_LEARN,
+   OCELOT_MACT_FORGET,
+};
+
+struct ocelot_mact_work_ctx {
+   struct work_struct work;
+   struct ocelot *ocelot;
+   enum ocelot_action_type type;
+   union {
+   /* OCELOT_MACT_LEARN */
+   struct {
+   unsigned char addr[ETH_ALEN];
+   u16 vid;
+   enum macaccess_entry_type entry_type;
+   int pgid;
+   } learn;
+   /* OCELOT_MACT_FORGET */
+   struct {
+   unsigned char addr[ETH_ALEN];
+   u16 vid;
+   } forget;
+   };
+};
+
+#define ocelot_work_to_ctx(x) \
+   container_of((x), struct ocelot_mact_work_ctx, work)
+
+static void ocelot_mact_work(struct work_struct *work)
+{
+   struct ocelot_mact_work_ctx *w = ocelot_work_to_ctx(work);
+   struct ocelot *ocelot = w->ocelot;
+
+   switch (w->type) {
+   case OCELOT_MACT_LEARN:
+   ocelot_mact_learn(ocelot, w->learn.pgid, w->learn.addr,
+ w->learn.vid, w->learn.entry_type);
+   break;
+   case OCELOT_MACT_FORGET:
+   ocelot_mact_forget(ocelot, w->forget.addr, w->forget.vid);
+   break;
+   default:
+   break;
+   };
+
+   kfree(w);
+}
+
+static int ocelot_enqueue_mact_action(struct ocelot *ocelot,
+ const struct ocelot_mact_work_ctx *ctx)
+{
+   struct ocelot_mact_work_ctx *w = kmemdup(ctx, sizeof(*w), GFP_ATOMIC);
+
+   if (!w)
+   return -ENOMEM;
+
+   w->ocelot = ocelot;
+   INIT_WORK(&w->work, ocelot_mact_work);
+   queue_work(ocelot->owq, &w->work);
+
+   return 0;
+}
+
 static int ocelot_mc_unsync(struct net_device *dev, const unsigned char *addr)
 {
struct ocelot_port_private *priv = netdev_priv(dev);
struct ocelot_port *ocelot_port = &priv->port;
struct ocelot *ocelot = ocelot_port->ocelot;
+   struct ocelot_mact_work_ctx w;
+
+   ether_addr_copy(w.fo

Re: [PATCH v2 net-next] net: dsa: reference count the host mdb addresses

2020-12-12 Thread Andrew Lunn
> +/* DSA can directly translate this to a normal MDB add, but on the CPU port.
> + * But because multiple user ports can join the same multicast group and the
> + * bridge will emit a notification for each port, we need to add/delete the
> + * entry towards the host only once, so we reference count it.
> + */
> +static int dsa_host_mdb_add(struct dsa_port *dp,
> + const struct switchdev_obj_port_mdb *mdb,
> + struct switchdev_trans *trans)
> +{
> + struct dsa_port *cpu_dp = dp->cpu_dp;
> + struct dsa_switch *ds = dp->ds;
> + struct dsa_host_addr *a;
> + int err;
> +
> + a = dsa_host_addr_find(&ds->host_mdb, mdb);
> + if (a) {
> + /* Only the commit phase is refcounted */
> + if (switchdev_trans_ph_commit(trans))
> + refcount_inc(&a->refcount);
> + return 0;
> + }
> +
> + err = dsa_port_mdb_add(cpu_dp, mdb, trans);
> + if (err)
> + return err;
> +
> + /* Only the commit phase is refcounted, so don't save this just yet */
> + if (switchdev_trans_ph_prepare(trans))
> + return 0;
> +
> + a = kzalloc(sizeof(*a), GFP_KERNEL);
> + if (!a)
> + return -ENOMEM;

Hi Vladimir

This allocation should be done in the prepare phase, since it can
fail. You should only return catastrophic errors during the commit
phase.

So i think you can allocate it in the prepare phase, but leave the
reference count as 0. Then increment it in the commit phase.

And then make the dsa_host_mdb_del() decrement the specific refcount,
and then do a generic garbage collect for all entries with refcount of
0, to cleanup any left over failures.

   Andrew



Re: [PATCH v10 1/4] dt-bindings: phy: Add sparx5-serdes bindings

2020-12-12 Thread Andrew Lunn
On Fri, Dec 11, 2020 at 10:05:38AM +0100, Steen Hegelund wrote:
> Document the Sparx5 ethernet serdes phy driver bindings.
> 
> Signed-off-by: Lars Povlsen 
> Signed-off-by: Steen Hegelund 
> Reviewed-by: Rob Herring 

Reviewed-by: Andrew Lunn 

Andrew


Re: pull-request: mac80211-next 2020-12-11

2020-12-12 Thread patchwork-bot+netdevbpf
Hello:

This pull request was applied to netdev/net-next.git (refs/heads/master):

On Fri, 11 Dec 2020 15:25:51 +0100 you wrote:
> Hi Dave,
> 
> Welcome back!
> 
> I'm a bit late with this, I guess, but I hope you can still
> pull it into net-next for 5.11. Nothing really stands out,
> we have some 6 GHz fixes, and various small things all over.
> 
> [...]

Here is the summary with links:
  - pull-request: mac80211-next 2020-12-11
https://git.kernel.org/netdev/net-next/c/00f7763a26cb

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH v10 2/4] phy: Add ethernet serdes configuration option

2020-12-12 Thread Andrew Lunn
On Fri, Dec 11, 2020 at 10:05:39AM +0100, Steen Hegelund wrote:
> Provide a new ethernet phy configuration structure, that
> allow PHYs used for ethernet to be configured with
> speed, media type and clock information.
> 
> Signed-off-by: Lars Povlsen 
> Signed-off-by: Steen Hegelund 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH RESEND net-next 1/2] dpaa2-eth: send a scatter-gather FD instead of realloc-ing

2020-12-12 Thread Ioana Ciornei
On Sat, Dec 12, 2020 at 04:58:56PM +0100, Jon Nettleton wrote:
> On Fri, Dec 11, 2020 at 5:56 PM Ioana Ciornei  wrote:
> >
> > On Fri, Dec 11, 2020 at 04:29:14PM +, Daniel Thompson wrote:
> > > On Fri, Dec 11, 2020 at 02:01:28PM +, Ioana Ciornei wrote:
> > > > On Thu, Dec 10, 2020 at 08:06:36PM +0200, Ioana Ciornei wrote:
> > > > > [Added also the netdev mailing list, I haven't heard of linux-netdev
> > > > > before but kept it]
> > > > >
> > > > > On Thu, Dec 10, 2020 at 05:31:56PM +, Daniel Thompson wrote:
> > > > > > Hi Ioana
> > > > >
> > > > > Hi Daniel,
> > > > >
> > > > > >
> > > > > > On Mon, Jun 29, 2020 at 06:47:11PM +, Ioana Ciornei wrote:
> > > > > > > Instead of realloc-ing the skb on the Tx path when the provided 
> > > > > > > headroom
> > > > > > > is smaller than the HW requirements, create a Scatter/Gather frame
> > > > > > > descriptor with only one entry.
> > > > > > >
> > > > > > > Remove the '[drv] tx realloc frames' counter exposed previously 
> > > > > > > through
> > > > > > > ethtool since it is no longer used.
> > > > > > >
> > > > > > > Signed-off-by: Ioana Ciornei 
> > > > > > > ---
> > > > > >
> > > > > > I've been chasing down a networking regression on my LX2160A board
> > > > > > (Honeycomb LX2K based on CEx7 LX2160A COM) that first appeared in 
> > > > > > v5.9.
> > > > > >
> > > > > > It makes the board unreliable opening outbound connections meaning
> > > > > > things like `apt update` or `git fetch` often can't open the 
> > > > > > connection.
> > > > > > It does not happen all the time but is sufficient to make the boards
> > > > > > built-in networking useless for workstation use.
> > > > > >
> > > > > > The problem is strongly linked to warnings in the logs so I used the
> > > > > > warnings to bisect down to locate the cause of the regression and it
> > > > > > pinpointed this patch. I have confirmed that in both v5.9 and 
> > > > > > v5.10-rc7
> > > > > > that reverting this patch (and fixing up the merge issues) fixes the
> > > > > > regression and the warnings stop appearing.
> > > > > >
> > > > > > A typical example of the warning is below (io-pgtable-arm.c:281 is 
> > > > > > an
> > > > > > error path that I guess would cause dma_map_page_attrs() to return
> > > > > > an error):
> > > > > >
> > > > > > [  714.464927] WARNING: CPU: 13 PID: 0 at
> > > > > > drivers/iommu/io-pgtable-arm.c:281 __arm_lpae_map+0x2d4/0x30c
> > > > > > [  714.464930] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E)
> > > > > > snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) 
> > > > > > bridge(E)
> > > > > > stp(E) llc(E) rfkill(E) caam_jr(E) crypto_engine(E) rng_core(E)
> > > > > > joydev(E) evdev(E) dpaa2_caam(E) caamhash_desc(E) caamalg_desc(E)
> > > > > > authenc(E) libdes(E) dpaa2_console(E) ofpart(E) caam(E) sg(E) 
> > > > > > error(E)
> > > > > > lm90(E) at24(E) spi_nor(E) mtd(E) sbsa_gwdt(E) qoriq_thermal(E)
> > > > > > layerscape_edac_mod(E) qoriq_cpufreq(E) drm(E) fuse(E) configfs(E)
> > > > > > ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) 
> > > > > > crc16(E)
> > > > > > mbcache(E) jbd2(E) hid_generic(E) usbhid(E) hid(E) dm_crypt(E) 
> > > > > > dm_mod(E)
> > > > > > sd_mod(E) fsl_dpaa2_ptp(E) ptp_qoriq(E) fsl_dpaa2_eth(E)
> > > > > > xhci_plat_hcd(E) xhci_hcd(E) usbcore(E) aes_ce_blk(E) crypto_simd(E)
> > > > > > cryptd(E) aes_ce_cipher(E) ghash_ce(E) gf128mul(E) at803x(E) 
> > > > > > libaes(E)
> > > > > > fsl_mc_dpio(E) pcs_lynx(E) rtc_pcf2127(E) sha2_ce(E) phylink(E)
> > > > > > xgmac_mdio(E) regmap_spi(E) of_mdio(E) sha256_arm64(E)
> > > > > > i2c_mux_pca954x(E) fixed_phy(E) i2c_mux(E) sha1_ce(E) ptp(E) 
> > > > > > libphy(E)
> > > > > > [  714.465131]  pps_core(E) ahci_qoriq(E) libahci_platform(E) 
> > > > > > nvme(E)
> > > > > > libahci(E) nvme_core(E) t10_pi(E) libata(E) crc_t10dif(E)
> > > > > > crct10dif_generic(E) crct10dif_common(E) dwc3(E) scsi_mod(E) 
> > > > > > udc_core(E)
> > > > > > roles(E) ulpi(E) sdhci_of_esdhc(E) sdhci_pltfm(E) sdhci(E)
> > > > > > spi_nxp_fspi(E) i2c_imx(E) fixed(E)
> > > > > > [  714.465192] CPU: 13 PID: 0 Comm: swapper/13 Tainted: GW  
> > > > > >  E
> > > > > > 5.10.0-rc7-1-gba98d13279ca #52
> > > > > > [  714.465196] Hardware name: SolidRun LX2160A Honeycomb (DT)
> > > > > > [  714.465202] pstate: 6005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > > > > > [  714.465207] pc : __arm_lpae_map+0x2d4/0x30c
> > > > > > [  714.465211] lr : __arm_lpae_map+0x114/0x30c
> > > > > > [  714.465215] sp : 80001006b340
> > > > > > [  714.465219] x29: 80001006b340 x28: 002086538003
> > > > > > [  714.465227] x27: 0a20 x26: 1000
> > > > > > [  714.465236] x25: 0f44 x24: 0020adf8d000
> > > > > > [  714.465245] x23: 0001 x22: faeca000
> > > > > > [  714.465253] x21: 0003 x20: 19b60d64d200
> > > > > > [  714.465261] x19: 00ca x18: 
> > > > > > [  714.465270] x17: 000

Re: [PATCH v10 3/4] phy: Add Sparx5 ethernet serdes PHY driver

2020-12-12 Thread Andrew Lunn
On Fri, Dec 11, 2020 at 10:05:40AM +0100, Steen Hegelund wrote:
> Add the Microchip Sparx5 ethernet serdes PHY driver for the 6G, 10G and 25G
> interfaces available in the Sparx5 SoC.
> 
> Signed-off-by: Bjarni Jonasson 
> Signed-off-by: Steen Hegelund 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH v10 4/4] arm64: dts: sparx5: Add Sparx5 serdes driver node

2020-12-12 Thread Andrew Lunn
On Fri, Dec 11, 2020 at 10:05:41AM +0100, Steen Hegelund wrote:
> Add Sparx5 serdes driver node, and enable it generally for all
> reference boards.
> 
> Signed-off-by: Lars Povlsen 
> Signed-off-by: Steen Hegelund 

Reviewed-by: Andrew Lunn 

Andrew


Re: [RFC] ravb: Add support for optional txc_refclk

2020-12-12 Thread Adam Ford
On Sat, Dec 12, 2020 at 11:55 AM Sergei Shtylyov
 wrote:
>
> Hello!
>
> On 12.12.2020 19:56, Adam Ford wrote:
>
> > The SoC expects the txv_refclk is provided, but if it is provided
> > by a programmable clock, there needs to be a way to get and enable
> > this clock to operate.  It needs to be optional since it's only
> > necessary for those with programmable clocks.
> >
> > Signed-off-by: Adam Ford 
> [...]
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
> > b/drivers/net/ethernet/renesas/ravb_main.c
> > index bd30505fbc57..4c3f95923ef2 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -2148,6 +2148,18 @@ static int ravb_probe(struct platform_device *pdev)
> >   goto out_release;
> >   }
> >
> > + priv->ref_clk = devm_clk_get(&pdev->dev, "txc_refclk");
>
> Why not devm_clk_get_optional()?

I am not that familiar with the clock API.  I'll look into that
function. It looks like it makes more sense.  I'll send a V2.

adam
>
> > + if (IS_ERR(priv->ref_clk)) {
> > + if (PTR_ERR(priv->ref_clk) == -EPROBE_DEFER) {
> > + /* for Probe defer return error */
> > + error = PTR_ERR(priv->ref_clk);
> > + goto out_release;
> > + }
> > + /* Ignore other errors since it's optional */
> > + } else {
> > + (void)clk_prepare_enable(priv->ref_clk);
> > + }
> > +
> >   ndev->max_mtu = 2048 - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN);
> >   ndev->min_mtu = ETH_MIN_MTU;
> >
>
> MBR, Sergei


Re: [PATCH 1/1] net: Fix use of proc_fs

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 18:37:49 +0200 Yonatan Linik wrote:
> proc_fs was used, in af_packet, without a surrounding #ifdef,
> although there is no hard dependency on proc_fs.
> That caused the initialization of the af_packet module to fail
> when CONFIG_PROC_FS=n.
> 
> Specifically, proc_create_net() was used in af_packet.c,
> and when it fails, packet_net_init() returns -ENOMEM.
> It will always fail when the kernel is compiled without proc_fs,
> because, proc_create_net() for example always returns NULL.
> 
> The calling order that starts in af_packet.c is as follows:
> packet_init()
> register_pernet_subsys()
> register_pernet_operations()
> __register_pernet_operations()
> ops_init()
> ops->init() (packet_net_ops.init=packet_net_init())
> proc_create_net()
> 
> It worked in the past because register_pernet_subsys()'s return value
> wasn't checked before this Commit 36096f2f4fa0 ("packet: Fix error path in
> packet_init.").
> It always returned an error, but was not checked before, so everything
> was working even when CONFIG_PROC_FS=n.
> 
> The fix here is simply to add the necessary #ifdef.
> 
> Signed-off-by: Yonatan Linik 

Hm, I'm guessing you hit this on a kernel upgrade of a real system?
It seems like all callers to proc_create_net (and friends) interpret
NULL as an error, but only handful is protected by an ifdef.

I checked a few and none of them cares about the proc_dir_entry pointer
that gets returned. Should we perhaps rework the return values of the
function so that we can return success if !CONFIG_PROC_FS without
having to yield a pointer?

Obviously we can apply this fix so we can backport to 5.4 if you need
it. I think the ifdef is fine, since it's what other callers have.

> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 2b33e977a905..031f2b593720 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -4612,9 +4612,11 @@ static int __net_init packet_net_init(struct net *net)
>   mutex_init(&net->packet.sklist_lock);
>   INIT_HLIST_HEAD(&net->packet.sklist);
>  
> +#ifdef CONFIG_PROC_FS
>   if (!proc_create_net("packet", 0, net->proc_net, &packet_seq_ops,
>   sizeof(struct seq_net_private)))
>   return -ENOMEM;
> +#endif /* CONFIG_PROC_FS */
>  
>   return 0;
>  }



Re: [net-next PATCH] tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit

2020-12-12 Thread Alexander Duyck
On Sat, Dec 12, 2020 at 11:07 AM Yuchung Cheng  wrote:
>
> On Sat, Dec 12, 2020 at 11:01 AM Alexander Duyck
>  wrote:
> >
> > On Sat, Dec 12, 2020 at 10:34 AM Yuchung Cheng  wrote:
> > >
> > > On Fri, Dec 11, 2020 at 5:28 PM Alexander Duyck
> > >  wrote:
> > > >
> > > > From: Alexander Duyck 
> > > >
> > > > There are cases where a fastopen SYN may trigger either a ICMP_TOOBIG
> > > > message in the case of IPv6 or a fragmentation request in the case of
> > > > IPv4. This results in the socket stalling for a second or more as it 
> > > > does
> > > > not respond to the message by retransmitting the SYN frame.
> > > >
> > > > Normally a SYN frame should not be able to trigger a ICMP_TOOBIG or
> > > > ICMP_FRAG_NEEDED however in the case of fastopen we can have a frame 
> > > > that
> > > > makes use of the entire MSS. In the case of fastopen it does, and an
> > > > additional complication is that the retransmit queue doesn't contain the
> > > > original frames. As a result when tcp_simple_retransmit is called and
> > > > walks the list of frames in the queue it may not mark the frames as lost
> > > > because both the SYN and the data packet each individually are smaller 
> > > > than
> > > > the MSS size after the adjustment. This results in the socket being 
> > > > stalled
> > > > until the retransmit timer kicks in and forces the SYN frame out again
> > > > without the data attached.
> > > >
> > > > In order to resolve this we can generate our best estimate for the 
> > > > original
> > > > packet size by detecting the fastopen SYN frame and then adding the
> > > > overhead for MAX_TCP_OPTION_SPACE and verifying if the SYN w/ data would
> > > > have exceeded the MSS. If so we can mark the frame as lost and 
> > > > retransmit
> > > > it.
> > > >
> > > > Signed-off-by: Alexander Duyck 
> > > > ---
> > > >  net/ipv4/tcp_input.c |   30 +++---
> > > >  1 file changed, 27 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > index 9e8a6c1aa019..79375b58de84 100644
> > > > --- a/net/ipv4/tcp_input.c
> > > > +++ b/net/ipv4/tcp_input.c
> > > > @@ -2686,11 +2686,35 @@ static void tcp_mtup_probe_success(struct sock 
> > > > *sk)
> > > >  void tcp_simple_retransmit(struct sock *sk)
> > > >  {
> > > > const struct inet_connection_sock *icsk = inet_csk(sk);
> > > > +   struct sk_buff *skb = tcp_rtx_queue_head(sk);
> > > > struct tcp_sock *tp = tcp_sk(sk);
> > > > -   struct sk_buff *skb;
> > > > -   unsigned int mss = tcp_current_mss(sk);
> > > > +   unsigned int mss;
> > > > +
> > > > +   /* A fastopen SYN request is stored as two separate packets 
> > > > within
> > > > +* the retransmit queue, this is done by tcp_send_syn_data().
> > > > +* As a result simply checking the MSS of the frames in the 
> > > > queue
> > > > +* will not work for the SYN packet. So instead we must make a 
> > > > best
> > > > +* effort attempt by validating the data frame with the mss size
> > > > +* that would be computed now by tcp_send_syn_data and comparing
> > > > +* that against the data frame that would have been included 
> > > > with
> > > > +* the SYN.
> > > > +*/
> > > > +   if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN && tp->syn_data) {
> > > > +   struct sk_buff *syn_data = skb_rb_next(skb);
> > > > +
> > > > +   mss = tcp_mtu_to_mss(sk, icsk->icsk_pmtu_cookie) +
> > > > + tp->tcp_header_len - sizeof(struct tcphdr) -
> > > > + MAX_TCP_OPTION_SPACE;
> > > nice comment! The original syn_data mss needs to be inferred which is
> > > a hassle to get right. my sense is path-mtu issue is enough to warrant
> > > they are lost.
> > > I suggest simply mark syn & its data lost if tcp_simple_retransmit is
> > > called during TFO handshake, i.e.
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 62f7aabc7920..7f0c4f2947eb 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -2864,7 +2864,8 @@ void tcp_simple_retransmit(struct sock *sk)
> > > unsigned int mss = tcp_current_mss(sk);
> > >
> > > skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
> > > -   if (tcp_skb_seglen(skb) > mss)
> > > +   if (tcp_skb_seglen(skb) > mss ||
> > > +   (tp->syn_data && sk->sk_state == TCP_SYN_SENT))
> > > tcp_mark_skb_lost(sk, skb);
> > > }
> > >
> > > We have a TFO packetdrill test that verifies my suggested fix should
> > > trigger an immediate retransmit vs 1s wait.
> >
> > Okay, I will go that route, although I will still probably make one
> > minor cleanup. Instead of testing for syn_data and state per packet I
> > will probably keep the bit where I overwrite mss since it is only used
> > in the loop. What I can do is switch it from unsigned int to int since
> > technica

Incorrect --help / manpage for -color for ip, tc, bridge

2020-12-12 Thread Witold Baryluk
iproute 5.9.0

Apparently ip -c is a shortcut to ip -color

but in tc, tc -c doesn't work, one needs to say tc -col or tc -color

I understand there is tc -conf, which has tc -c.

But:

Help says:

root@debian:~# tc
Usage:tc [ OPTIONS ] OBJECT { COMMAND | help }
tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | chain |
action | monitor | exec }
   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[aw] |
-o[neline] | -j[son] | -p[retty] | -c[olor]
-b[atch] [filename] | -n[etns] name | -N[umeric] |
 -nm | -nam[es] | { -cf | -conf } path }

this should be:

root@debian:~# tc
Usage:tc [ OPTIONS ] OBJECT { COMMAND | help }
tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | chain |
action | monitor | exec }
   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[aw] |
-o[neline] | -j[son] | -p[retty] | -col[or]
-b[atch] [filename] | -n[etns] name | -N[umeric] |
 -nm | -nam[es] | { -cf | -c[onf] } path }


( -c[olor] -> -col[or] )  # also in --help for ip and bridge

If only -c meaning -conf could be removed, it would be even nicer. -cf
is already short.

Additionally in manpage for tc, ip and bridge:
   -c[color][={always|auto|never}
  Configure color output. If parameter is omitted or
always, color output is enabled regardless of stdout state. If
parameter is auto, stdout is checked to be a terminal be‐
  fore  enabling  color output. If parameter is never,
color output is disabled. If specified multiple times, the last one
takes precedence. This flag is ignored if -json is
  also given.



I don't think this is correct either.

Should be -col[or], not -c[color] (sic!).

Similar mistakes are in man pages and --help messages also for ip,
bridge, not just tc.


Regards,
Witold


[net-next PATCH v2] tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit

2020-12-12 Thread Alexander Duyck
From: Alexander Duyck 

There are cases where a fastopen SYN may trigger either a ICMP_TOOBIG
message in the case of IPv6 or a fragmentation request in the case of
IPv4. This results in the socket stalling for a second or more as it does
not respond to the message by retransmitting the SYN frame.

Normally a SYN frame should not be able to trigger a ICMP_TOOBIG or
ICMP_FRAG_NEEDED however in the case of fastopen we can have a frame that
makes use of the entire MSS. In the case of fastopen it does, and an
additional complication is that the retransmit queue doesn't contain the
original frames. As a result when tcp_simple_retransmit is called and
walks the list of frames in the queue it may not mark the frames as lost
because both the SYN and the data packet each individually are smaller than
the MSS size after the adjustment. This results in the socket being stalled
until the retransmit timer kicks in and forces the SYN frame out again
without the data attached.

In order to resolve this we can generate our best estimate for the original
packet size by detecting the fastopen SYN frame and then adding the
overhead for MAX_TCP_OPTION_SPACE and verifying if the SYN w/ data would
have exceeded the MSS. If so we can mark the frame as lost and retransmit
it.

Signed-off-by: Alexander Duyck 
---
 net/ipv4/tcp_input.c |   17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9e8a6c1aa019..e44327a39a1f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2688,7 +2688,22 @@ void tcp_simple_retransmit(struct sock *sk)
const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
-   unsigned int mss = tcp_current_mss(sk);
+   int mss;
+
+   /* A fastopen SYN request is stored as two separate packets within
+* the retransmit queue, this is done by tcp_send_syn_data().
+* As a result simply checking the MSS of the frames in the queue
+* will not work for the SYN packet.
+*
+* Us being here is an indication of a path MTU issue so we can
+* assume that the fastopen SYN was lost and just mark all the
+* frames in the retransmit queue as lost. We will use an MSS of
+* -1 to mark all frames as lost, otherwise compute the current MSS.
+*/
+   if (tp->syn_data && sk->sk_state == TCP_SYN_SENT)
+   mss = -1;
+   else
+   mss = tcp_current_mss(sk);
 
skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
if (tcp_skb_seglen(skb) > mss)




Re: [net-next v3 00/14] Add mlx5 subfunction support

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 22:12:11 -0800 Saeed Mahameed wrote:
> Hi Dave, Jakub, Jason,
> 
> This series form Parav was the theme of this mlx5 release cycle,
> we've been waiting anxiously for the auxbus infrastructure to make it into
> the kernel, and now as the auxbus is in and all the stars are aligned, I
> can finally submit this V2 of the devlink and mlx5 subfunction support.
> 
> Subfunctions came to solve the scaling issue of virtualization
> and switchdev environments, where SRIOV failed to deliver and users ran
> out of VFs very quickly as SRIOV demands huge amount of physical resources
> in both of the servers and the NIC.
> 
> Subfunction provide the same functionality as SRIOV but in a very
> lightweight manner, please see the thorough and detailed
> documentation from Parav below, in the commit messages and the
> Networking documentation patches at the end of this series.
> 
> Sending V2/V3 as a continuation to V1 that was sent Last month [0],
> [0] https://lore.kernel.org/linux-rdma/20201112192424.2742-1-pa...@nvidia.com/

This adds more and more instances of the 32 bit build warning.

The warning was also reported separately on netdev after the recent
mlx5-next pull.

Please address that first (or did you already do and I missed it
somehow?)


Re: pull-request: wireless-drivers-next-2020-12-12

2020-12-12 Thread patchwork-bot+netdevbpf
Hello:

This pull request was applied to netdev/net-next.git (refs/heads/master):

On Sat, 12 Dec 2020 05:08:39 + (UTC) you wrote:
> Hi,
> 
> here's a pull request to net-next tree, more info below. Please let me know if
> there are any problems.
> 
> Kalle
> 
> [...]

Here is the summary with links:
  - pull-request: wireless-drivers-next-2020-12-12
https://git.kernel.org/netdev/net-next/c/e5795aacd71b

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




[net-next PATCH v3] tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit

2020-12-12 Thread Alexander Duyck
From: Alexander Duyck 

There are cases where a fastopen SYN may trigger either a ICMP_TOOBIG
message in the case of IPv6 or a fragmentation request in the case of
IPv4. This results in the socket stalling for a second or more as it does
not respond to the message by retransmitting the SYN frame.

Normally a SYN frame should not be able to trigger a ICMP_TOOBIG or
ICMP_FRAG_NEEDED however in the case of fastopen we can have a frame that
makes use of the entire MSS. In the case of fastopen it does, and an
additional complication is that the retransmit queue doesn't contain the
original frames. As a result when tcp_simple_retransmit is called and
walks the list of frames in the queue it may not mark the frames as lost
because both the SYN and the data packet each individually are smaller than
the MSS size after the adjustment. This results in the socket being stalled
until the retransmit timer kicks in and forces the SYN frame out again
without the data attached.

In order to resolve this we can reduce the MSS the packets are compared
to in tcp_simple_retransmit to -1 for cases where we are still in the
TCP_SYN_SENT state for a fastopen socket. Doing this we will mark all of
the packets related to the fastopen SYN as lost.

Signed-off-by: Alexander Duyck 
---

v2: Changed logic to invalidate all retransmit queue frames if fastopen SYN
v3: Updated commit message to reflect actual solution in 3rd paragraph

 net/ipv4/tcp_input.c |   17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9e8a6c1aa019..e44327a39a1f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2688,7 +2688,22 @@ void tcp_simple_retransmit(struct sock *sk)
const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
-   unsigned int mss = tcp_current_mss(sk);
+   int mss;
+
+   /* A fastopen SYN request is stored as two separate packets within
+* the retransmit queue, this is done by tcp_send_syn_data().
+* As a result simply checking the MSS of the frames in the queue
+* will not work for the SYN packet.
+*
+* Us being here is an indication of a path MTU issue so we can
+* assume that the fastopen SYN was lost and just mark all the
+* frames in the retransmit queue as lost. We will use an MSS of
+* -1 to mark all frames as lost, otherwise compute the current MSS.
+*/
+   if (tp->syn_data && sk->sk_state == TCP_SYN_SENT)
+   mss = -1;
+   else
+   mss = tcp_current_mss(sk);
 
skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
if (tcp_skb_seglen(skb) > mss)




[PATCH v3 net-next] net: dsa: reference count the host mdb addresses

2020-12-12 Thread Vladimir Oltean
Currently any DSA switch that is strict when implementing the mdb
operations prints these benign errors after the addresses expire, with
at least 2 ports bridged:

[  286.013814] mscc_felix :00:00.5 swp3: failed (err=-2) to del object 
(id=3)

The reason has to do with this piece of code:

netdev_for_each_lower_dev(dev, lower_dev, iter)
br_mdb_switchdev_host_port(dev, lower_dev, mp, type);

called from:

br_multicast_group_expired
-> br_multicast_host_leave
   -> br_mdb_notify
  -> br_mdb_switchdev_host

Basically, that code is correct. It tells each switchdev port that the
host can leave that multicast group. But in the case of DSA, all user
ports are connected to the host through the same pipe. So, because DSA
translates a host MDB to a normal MDB on the CPU port, this means that
when all user ports leave a multicast group, DSA tries to remove it N
times from the CPU port.

We should be reference-counting these addresses. Otherwise, the first
port on which the MDB expires will cause an entry removal from the CPU
port, which will break the host MDB for the remaining ports.

Signed-off-by: Vladimir Oltean 
---
Changes in v3:
- Allocating memory for host mdb in prepare phase, but setting refcount
  to 1 in commit phase. This complicates the implementation of the state
  machine a little bit.

Changes in v2:
- Re-targeted against net-next, since this is not breaking any use case
  that I know of.
- Re-did the refcounting logic. The problem is that the MDB addition is
  two-phase, but the deletion is one-phase. So refcounting on addition
  needs to be done only on one phase - the commit one. Before, we had a
  problem there, and for host MDB additions where an entry already existed
  on the CPU port, we would call the prepare phase but never commit.
  This would break drivers that allocate memory on prepare, and then
  expect the commit phase to actually apply. So we're not doing this any
  longer. Both prepare and commit phases are now stubbed out for additions
  of host MDB entries that are already present on the CPU port.
- Renamed dsa_host_mdb_find into dsa_host_addr_find, and we're now
  passing it the host_mdb list rather than struct dsa_switch *ds. This
  is a generic function and we might be able to reuse it in the future
  for host FDB entries (such as slave net_device MAC addresses).
- Left the allocation as GFP_KERNEL, since that is fine - the switchdev
  notifier runs as deferred, therefore in process context.

 include/net/dsa.h |   9 
 net/dsa/dsa2.c|   2 +
 net/dsa/slave.c   | 128 ++
 3 files changed, 130 insertions(+), 9 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 4e60d2610f20..e639db28e238 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -253,6 +253,13 @@ struct dsa_link {
struct list_head list;
 };
 
+struct dsa_host_addr {
+   unsigned char addr[ETH_ALEN];
+   u16 vid;
+   refcount_t refcount;
+   struct list_head list;
+};
+
 struct dsa_switch {
bool setup;
 
@@ -335,6 +342,8 @@ struct dsa_switch {
 */
boolmtu_enforcement_ingress;
 
+   struct list_headhost_mdb;
+
size_t num_ports;
 };
 
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 183003e45762..52b3ef34a2cb 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -413,6 +413,8 @@ static int dsa_switch_setup(struct dsa_switch *ds)
if (ds->setup)
return 0;
 
+   INIT_LIST_HEAD(&ds->host_mdb);
+
/* Initialize ds->phys_mii_mask before registering the slave MDIO bus
 * driver and before ops->setup() has run, since the switch drivers and
 * the slave MDIO bus driver rely on these values for probing PHY
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 4a0498bf6c65..e0667be7d5ed 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -376,6 +376,123 @@ static int dsa_slave_vlan_add(struct net_device *dev,
return 0;
 }
 
+static struct dsa_host_addr *
+dsa_host_addr_find(struct list_head *addr_list,
+  const struct switchdev_obj_port_mdb *mdb)
+{
+   struct dsa_host_addr *a;
+
+   list_for_each_entry(a, addr_list, list)
+   if (ether_addr_equal(a->addr, mdb->addr) && a->vid == mdb->vid)
+   return a;
+
+   return NULL;
+}
+
+/* DSA can directly translate this to a normal MDB add, but on the CPU port.
+ * But because multiple user ports can join the same multicast group and the
+ * bridge will emit a notification for each port, we need to add/delete the
+ * entry towards the host only once, so we reference count it.
+ */
+static int dsa_host_mdb_add(struct dsa_port *dp,
+   const struct switchdev_obj_port_mdb *mdb,
+   struct switchdev_trans *trans)
+{
+   struct dsa_port *cpu_dp = dp->cpu_dp;
+   struct dsa_switch *ds = dp->ds;
+   struct dsa_host_addr *a;
+  

Re: [PATCH v2 3/3] net: mhi: Add dedicated alloc thread

2020-12-12 Thread Jakub Kicinski
On Thu, 10 Dec 2020 12:15:51 +0100 Loic Poulain wrote:
> The buffer allocation for RX path is currently done by a work executed
> in the system workqueue. The work to do is quite simple and consists
> mostly in allocating and queueing as much as possible buffers to the MHI
> RX channel.
> 
> It appears that using a dedicated kthread would be more appropriate to
> prevent
> 1. RX allocation latency introduced by the system queue

System work queue should not add much latency, you can also create your
own workqueue. Did you intend to modify the priority of the thread you
create?

> 2. Unbounded work execution, the work only returning when queue is
> full, it can possibly monopolise the workqueue thread on slower systems.

Is this something you observed in practice?

> This patch replaces the system work with a simple kthread that loops on
> buffer allocation and sleeps when queue is full. Moreover it gets rid
> of the local rx_queued variable (to track buffer count), and instead,
> relies on the new mhi_get_free_desc_count helper.

Seems unrelated, should probably be a separate patch.

> After pratical testing on a x86_64 machine, this change improves
> - Peek throughput (slightly, by few mbps)
> - Throughput stability when concurrent loads are running (stress)
> - CPU usage, less CPU cycles dedicated to the task

Do you have an explanation why the CPU cycles are lower?

> Below is the powertop output for RX allocation task before and after
> this change, when performing UDP download at 6Gbps. Mostly to highlight
> the improvement in term of CPU usage.
> 
> older (system workqueue):
> Usage   Events/sCategory   Description
> 63,2 ms/s 134,0kWork  mhi_net_rx_refill_work
> 62,8 ms/s 134,3kWork  mhi_net_rx_refill_work
> 60,8 ms/s 141,4kWork  mhi_net_rx_refill_work
> 
> newer (dedicated kthread):
> Usage   Events/sCategory   Description
> 20,7 ms/s 155,6Process[PID 3360] [mhi-net-rx]
> 22,2 ms/s 169,6Process[PID 3360] [mhi-net-rx]
> 22,3 ms/s 150,2Process[PID 3360] [mhi-net-rx]
> 
> Signed-off-by: Loic Poulain 
> ---
>  v2: add module parameter for changing RX refill level

> @@ -16,6 +17,11 @@
>  #define MHI_NET_MAX_MTU  0x
>  #define MHI_NET_DEFAULT_MTU  0x4000
>  
> +static unsigned int rx_refill_level = 70;
> +module_param(rx_refill_level, uint, 0600);
> +MODULE_PARM_DESC(rx_refill_level,
> +  "The minimal RX queue level percentage (0 to 100) under which 
> the RX queue must be refilled");

Sorry you got bad advice in v1 and I didn't catch it. Please avoid
adding module parameters. Many drivers do bulk refill, and don't need
and extra parametrization, I don't see why this one would be special -
if it is please explain.

>  struct mhi_net_stats {
>   u64_stats_t rx_packets;
>   u64_stats_t rx_bytes;
> @@ -25,7 +31,6 @@ struct mhi_net_stats {
>   u64_stats_t tx_bytes;
>   u64_stats_t tx_errors;
>   u64_stats_t tx_dropped;
> - atomic_t rx_queued;
>   struct u64_stats_sync tx_syncp;
>   struct u64_stats_sync rx_syncp;
>  };
> @@ -33,17 +38,66 @@ struct mhi_net_stats {
>  struct mhi_net_dev {
>   struct mhi_device *mdev;
>   struct net_device *ndev;
> - struct delayed_work rx_refill;
> + struct task_struct *refill_task;
> + wait_queue_head_t refill_wq;
>   struct mhi_net_stats stats;
>   u32 rx_queue_sz;
> + u32 rx_refill_level;
>  };
>  
> +static int mhi_net_refill_thread(void *data)
> +{
> + struct mhi_net_dev *mhi_netdev = data;
> + struct net_device *ndev = mhi_netdev->ndev;
> + struct mhi_device *mdev = mhi_netdev->mdev;
> + int size = READ_ONCE(ndev->mtu);
> + struct sk_buff *skb;
> + int err;
> +
> + while (1) {
> + err = wait_event_interruptible(mhi_netdev->refill_wq,
> +!mhi_queue_is_full(mdev, 
> DMA_FROM_DEVICE)
> +|| kthread_should_stop());
> + if (err || kthread_should_stop())
> + break;
> +
> + skb = netdev_alloc_skb(ndev, size);
> + if (unlikely(!skb)) {
> + /* No memory, retry later */
> + schedule_timeout_interruptible(msecs_to_jiffies(250));

You should have a counter for this, at least for your testing. If this
condition is hit it'll probably have a large impact on the performance.

> + continue;
> + }
> +
> + err = mhi_queue_skb(mdev, DMA_FROM_DEVICE, skb, size, MHI_EOT);
> + if (unlikely(err)) {
> + net_err_ratelimited("%s: Failed to queue RX buf (%d)\n",
> + ndev->name, err);
> + kfree_skb(skb);
> + break;
> + }
> +
> + /* Do not hog the CPU */
> + co

Re: [PATCH 1/1] net: Fix use of proc_fs

2020-12-12 Thread Yonatan Linik
On Sat, Dec 12, 2020 at 9:48 PM Jakub Kicinski  wrote:
>
> On Fri, 11 Dec 2020 18:37:49 +0200 Yonatan Linik wrote:
> > proc_fs was used, in af_packet, without a surrounding #ifdef,
> > although there is no hard dependency on proc_fs.
> > That caused the initialization of the af_packet module to fail
> > when CONFIG_PROC_FS=n.
> >
> > Specifically, proc_create_net() was used in af_packet.c,
> > and when it fails, packet_net_init() returns -ENOMEM.
> > It will always fail when the kernel is compiled without proc_fs,
> > because, proc_create_net() for example always returns NULL.
> >
> > The calling order that starts in af_packet.c is as follows:
> > packet_init()
> > register_pernet_subsys()
> > register_pernet_operations()
> > __register_pernet_operations()
> > ops_init()
> > ops->init() (packet_net_ops.init=packet_net_init())
> > proc_create_net()
> >
> > It worked in the past because register_pernet_subsys()'s return value
> > wasn't checked before this Commit 36096f2f4fa0 ("packet: Fix error path in
> > packet_init.").
> > It always returned an error, but was not checked before, so everything
> > was working even when CONFIG_PROC_FS=n.
> >
> > The fix here is simply to add the necessary #ifdef.
> >
> > Signed-off-by: Yonatan Linik 
>
> Hm, I'm guessing you hit this on a kernel upgrade of a real system?

Yeah, suddenly using socket with AF_PACKET didn't work,
so I checked what happened.

> It seems like all callers to proc_create_net (and friends) interpret
> NULL as an error, but only handful is protected by an ifdef.

I guess where there is no ifdef,
there should be a hard dependency on procfs,
using depends on in the Kconfig.
Maybe that's not the case everywhere it should be.

>
> I checked a few and none of them cares about the proc_dir_entry pointer
> that gets returned. Should we perhaps rework the return values of the
> function so that we can return success if !CONFIG_PROC_FS without
> having to yield a pointer?

Sometimes the pointer returned is used,
for example in drivers/acpi/button.c.
Are you suggesting returning a bool while
having the pointer as an out parameter?
Because that would still be problematic where the pointer is used.

>
> Obviously we can apply this fix so we can backport to 5.4 if you need
> it. I think the ifdef is fine, since it's what other callers have.
>

It would be great to apply this where the problem exists,
I believe this applies to other versions as well.

-- 
Yonatan Linik


Re: [PATCH] net: qlcnic: remove casting dma_alloc_coherent

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 08:59:20 + Xu Wang wrote:
> Remove casting the values returned by dma_alloc_coherent.
> 
> Signed-off-by: Xu Wang 

This patch does not apply to net-next, please rebase against:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/

if you want it to be applied to the networking tree.


Re: [PATCH] xfrm: redact SA secret with lockdown confidentiality

2020-12-12 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Sat, 12 Dec 2020 09:57:37 +0100 you wrote:
> From: Antony Antony 
> 
> redact XFRM SA secret in the netlink response to xfrm_get_sa()
> or dumpall sa.
> Enable lockdown, confidentiality mode, at boot or at run time.
> 
> e.g. when enabled:
> cat /sys/kernel/security/lockdown
> none integrity [confidentiality]
> 
> [...]

Here is the summary with links:
  - xfrm: redact SA secret with lockdown confidentiality
https://git.kernel.org/netdev/net-next/c/c7a5899eb26e

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: pull request (net-next): ipsec-next 2020-12-12

2020-12-12 Thread patchwork-bot+netdevbpf
Hello:

This pull request was applied to netdev/net-next.git (refs/heads/master):

On Sat, 12 Dec 2020 09:57:36 +0100 you wrote:
> Just one patch this time:
> 
> 1) Redact the SA keys with kernel lockdown confidentiality.
>If enabled, no secret keys are sent to uuserspace.
>From Antony Antony.
> 
> Please pull or let me know if there are problems.
> 
> [...]

Here is the summary with links:
  - pull request (net-next): ipsec-next 2020-12-12
https://git.kernel.org/netdev/net-next/c/e2437ac2f59d

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH 0/3] PROTO_CMSG_DATA_ONLY for Datagram (UDP)

2020-12-12 Thread Victor Stewart
On Sat, Dec 12, 2020 at 6:02 PM Jens Axboe  wrote:
>
> On 12/12/20 10:58 AM, Victor Stewart wrote:
> > On Sat, Dec 12, 2020 at 5:40 PM Jens Axboe  wrote:
> >>
> >> On 12/12/20 10:25 AM, Victor Stewart wrote:
> >>> On Sat, Dec 12, 2020 at 5:07 PM Jens Axboe  wrote:
> 
>  On 12/12/20 8:31 AM, Victor Stewart wrote:
> > RE our conversation on the "[RFC 0/1] whitelisting UDP GSO and GRO
> > cmsgs" thread...
> >
> > https://lore.kernel.org/io-uring/CAM1kxwi5m6i8hrtkw7nZYoziPTD-Wp03+fcsUwh3CuSc=81...@mail.gmail.com/
> >
> > here are the patches we discussed.
> >
> > Victor Stewart (3):
> >net/socket.c: add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock
> >net/ipv4/af_inet.c: add PROTO_CMSG_DATA_ONLY to inet_dgram_ops
> >net/ipv6/af_inet6.c: add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops
> >
> >net/ipv4/af_inet.c
> >  |   1 +
> >net/ipv6/af_inet6.c
> > |   1 +
> >net/socket.c
> >|   8 +-
> >3 files changed, 7 insertions(+), 3 deletions(-)
> 
>  Changes look fine to me, but a few comments:
> 
>  - I'd order 1/3 as 3/3, that ordering makes more sense as at that point 
>  it
>    could actually be used.
> >>>
> >>> right that makes sense.
> >>>
> 
>  - For adding it to af_inet/af_inet6, you should write a better commit 
>  message
>    on the reasoning for the change. Right now it just describes what the
>    patch does (which is obvious from the change), not WHY it's done. 
>  Really
>    goes for current 1/3 as well, commit messages need to be better in
>    general.
> 
> >>>
> >>> okay thanks Jens. i would have reiterated the intention but assumed it
> >>> were implicit given I linked the initial conversation about enabling
> >>> UDP_SEGMENT (GSO) and UDP_GRO through io_uring.
> >>>
>  I'd also CC Jann Horn on the series, he's the one that found an issue 
>  there
>  in the past and also acked the previous change on doing 
>  PROTO_CMSG_DATA_ONLY.
> >>>
> >>> I CCed him on this reply. Soheil at the end of the first exchange
> >>> thread said he audited the UDP paths and believed this to be safe.
> >>>
> >>> how/should I resubmit the patch with a proper intention explanation in
> >>> the meta and reorder the patches? my first patch and all lol.
> >>
> >> Just post is as a v2 with the change noted in the cover letter. I'd also
> >> ensure that it threads properly, right now it's just coming through as 4
> >> separate emails at my end. If you're using git send-email, make sure you
> >> add --thread to the arguments.
> >
> > oh i didn't know about git send-email. i was manually constructing /
> > sending them lol. thanks!
>
> I'd recommend it, makes sure your mailer doesn't mangle anything either. FWIW,
> this is what I do:
>
> git format-patch sha1..sha2
> mv 00*.patch /tmp/x
>
> git send-email --no-signed-off-by-cc --thread --compose  --to 
> linux-fsde...@vger.kernel.org --cc torva...@linux-foundation.org --cc 
> v...@zeniv.linux.org.uk /tmp/x
>
> (from a series I just sent out). And then I have the following section in
> ~/.gitconfig:
>
> [sendemail]
> from = Jens Axboe 
> smtpserver = smtp.gmail.com
> smtpuser = ax...@kernel.dk
> smtpencryption = tls
> smtppass = hunter2
> smtpserverport = 587
>
> for using gmail to send them out.
>
> --compose will fire up your editor to construct the cover letter, and
> when you're happy with it, save+exit and git send-email will ask whether
> to proceed or abort.
>
> That's about all there is to it, and provides a consistent way to send out
> patch series.

awesome thanks! i'll be using this workflow from now on.

P.S. hope thats not your real password LOL

>
> --
> Jens Axboe
>


Re: [PATCH 0/3] PROTO_CMSG_DATA_ONLY for Datagram (UDP)

2020-12-12 Thread Jens Axboe
On 12/12/20 2:42 PM, Victor Stewart wrote:
> On Sat, Dec 12, 2020 at 6:02 PM Jens Axboe  wrote:
>>
>> On 12/12/20 10:58 AM, Victor Stewart wrote:
>>> On Sat, Dec 12, 2020 at 5:40 PM Jens Axboe  wrote:

 On 12/12/20 10:25 AM, Victor Stewart wrote:
> On Sat, Dec 12, 2020 at 5:07 PM Jens Axboe  wrote:
>>
>> On 12/12/20 8:31 AM, Victor Stewart wrote:
>>> RE our conversation on the "[RFC 0/1] whitelisting UDP GSO and GRO
>>> cmsgs" thread...
>>>
>>> https://lore.kernel.org/io-uring/CAM1kxwi5m6i8hrtkw7nZYoziPTD-Wp03+fcsUwh3CuSc=81...@mail.gmail.com/
>>>
>>> here are the patches we discussed.
>>>
>>> Victor Stewart (3):
>>>net/socket.c: add PROTO_CMSG_DATA_ONLY to __sys_sendmsg_sock
>>>net/ipv4/af_inet.c: add PROTO_CMSG_DATA_ONLY to inet_dgram_ops
>>>net/ipv6/af_inet6.c: add PROTO_CMSG_DATA_ONLY to inet6_dgram_ops
>>>
>>>net/ipv4/af_inet.c
>>>  |   1 +
>>>net/ipv6/af_inet6.c
>>> |   1 +
>>>net/socket.c
>>>|   8 +-
>>>3 files changed, 7 insertions(+), 3 deletions(-)
>>
>> Changes look fine to me, but a few comments:
>>
>> - I'd order 1/3 as 3/3, that ordering makes more sense as at that point 
>> it
>>   could actually be used.
>
> right that makes sense.
>
>>
>> - For adding it to af_inet/af_inet6, you should write a better commit 
>> message
>>   on the reasoning for the change. Right now it just describes what the
>>   patch does (which is obvious from the change), not WHY it's done. 
>> Really
>>   goes for current 1/3 as well, commit messages need to be better in
>>   general.
>>
>
> okay thanks Jens. i would have reiterated the intention but assumed it
> were implicit given I linked the initial conversation about enabling
> UDP_SEGMENT (GSO) and UDP_GRO through io_uring.
>
>> I'd also CC Jann Horn on the series, he's the one that found an issue 
>> there
>> in the past and also acked the previous change on doing 
>> PROTO_CMSG_DATA_ONLY.
>
> I CCed him on this reply. Soheil at the end of the first exchange
> thread said he audited the UDP paths and believed this to be safe.
>
> how/should I resubmit the patch with a proper intention explanation in
> the meta and reorder the patches? my first patch and all lol.

 Just post is as a v2 with the change noted in the cover letter. I'd also
 ensure that it threads properly, right now it's just coming through as 4
 separate emails at my end. If you're using git send-email, make sure you
 add --thread to the arguments.
>>>
>>> oh i didn't know about git send-email. i was manually constructing /
>>> sending them lol. thanks!
>>
>> I'd recommend it, makes sure your mailer doesn't mangle anything either. 
>> FWIW,
>> this is what I do:
>>
>> git format-patch sha1..sha2
>> mv 00*.patch /tmp/x
>>
>> git send-email --no-signed-off-by-cc --thread --compose  --to 
>> linux-fsde...@vger.kernel.org --cc torva...@linux-foundation.org --cc 
>> v...@zeniv.linux.org.uk /tmp/x
>>
>> (from a series I just sent out). And then I have the following section in
>> ~/.gitconfig:
>>
>> [sendemail]
>> from = Jens Axboe 
>> smtpserver = smtp.gmail.com
>> smtpuser = ax...@kernel.dk
>> smtpencryption = tls
>> smtppass = hunter2
>> smtpserverport = 587
>>
>> for using gmail to send them out.
>>
>> --compose will fire up your editor to construct the cover letter, and
>> when you're happy with it, save+exit and git send-email will ask whether
>> to proceed or abort.
>>
>> That's about all there is to it, and provides a consistent way to send out
>> patch series.
> 
> awesome thanks! i'll be using this workflow from now on.
> 
> P.S. hope thats not your real password LOL

Haha it's not, google hunter2 and password and you'll see :-)

-- 
Jens Axboe



Re: [PATCH 1/1] net: Fix use of proc_fs

2020-12-12 Thread Jakub Kicinski
On Sat, 12 Dec 2020 23:39:20 +0200 Yonatan Linik wrote:
> On Sat, Dec 12, 2020 at 9:48 PM Jakub Kicinski  wrote:
> >
> > On Fri, 11 Dec 2020 18:37:49 +0200 Yonatan Linik wrote:  
> > > proc_fs was used, in af_packet, without a surrounding #ifdef,
> > > although there is no hard dependency on proc_fs.
> > > That caused the initialization of the af_packet module to fail
> > > when CONFIG_PROC_FS=n.
> > >
> > > Specifically, proc_create_net() was used in af_packet.c,
> > > and when it fails, packet_net_init() returns -ENOMEM.
> > > It will always fail when the kernel is compiled without proc_fs,
> > > because, proc_create_net() for example always returns NULL.
> > >
> > > The calling order that starts in af_packet.c is as follows:
> > > packet_init()
> > > register_pernet_subsys()
> > > register_pernet_operations()
> > > __register_pernet_operations()
> > > ops_init()
> > > ops->init() (packet_net_ops.init=packet_net_init())
> > > proc_create_net()
> > >
> > > It worked in the past because register_pernet_subsys()'s return value
> > > wasn't checked before this Commit 36096f2f4fa0 ("packet: Fix error path in
> > > packet_init.").
> > > It always returned an error, but was not checked before, so everything
> > > was working even when CONFIG_PROC_FS=n.
> > >
> > > The fix here is simply to add the necessary #ifdef.
> > >
> > > Signed-off-by: Yonatan Linik   
> >
> > Hm, I'm guessing you hit this on a kernel upgrade of a real system?  
> 
> Yeah, suddenly using socket with AF_PACKET didn't work,
> so I checked what happened.
> 
> > It seems like all callers to proc_create_net (and friends) interpret
> > NULL as an error, but only handful is protected by an ifdef.  
> 
> I guess where there is no ifdef,
> there should be a hard dependency on procfs,
> using depends on in the Kconfig.
> Maybe that's not the case everywhere it should be.

You're right, on a closer look most of the places have a larger #ifdef
block (which my grep didn't catch) or are under Kconfig. Of those I
checked only TLS looks wrong (good job me) - would you care to fix that
one as well, or should I?

> > I checked a few and none of them cares about the proc_dir_entry pointer
> > that gets returned. Should we perhaps rework the return values of the
> > function so that we can return success if !CONFIG_PROC_FS without
> > having to yield a pointer?  
> 
> Sometimes the pointer returned is used,
> for example in drivers/acpi/button.c.
> Are you suggesting returning a bool while
> having the pointer as an out parameter?
> Because that would still be problematic where the pointer is used.

Ack, I was only thinking of changing proc_create_net* but as you
rightly pointed out most callers already deal with the problem, 
so maybe it's not worth refactoring.

> > Obviously we can apply this fix so we can backport to 5.4 if you need
> > it. I think the ifdef is fine, since it's what other callers have.
> 
> It would be great to apply this where the problem exists,
> I believe this applies to other versions as well.

Will do. Linus is likely to cut the final 5.11 release on Sunday, so
it needs to wait until next week for process reasons but it won't get
lost. For the record:

Fixes: 36096f2f4fa0 ("packet: Fix error path in packet_init")


Re: [PATCH] igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 22:34:56 +0800 Kevin Lo wrote:
> This patch sets the default return value to -IGC_ERR_NVM in 
> igc_write_nvm_srwr.
> Without this change it wouldn't lead to a shadow RAM write EEWR timeout.
> 
> Signed-off-by: Kevin Lo 

This is a fix, please add a Fixes tag.

Please CC the maintainers:

M:  Jesse Brandeburg 
M:  Tony Nguyen 


Re: [PATCH v3 net-next] net: dsa: reference count the host mdb addresses

2020-12-12 Thread Andrew Lunn
> + /* Complication created by the fact that addition has two phases, but
> +  * deletion only has one phase, and we need reference counting.
> +  * The strategy is to do the memory allocation in the prepare phase,
> +  * but initialize the refcount in the commit phase.
> +  *
> +  * Have mdb | mdb has refcount > 0  | Commit phase  | Resolution
> +  * -+---+---+---
> +  * no   | - | no| Alloc & 
> proceed

This does not look right.

The point of the prepare phase is to allow all the different layers
involved to allocate whatever they need and to validate they can do
the requested action. Any layer can say, No, stop, i cannot do
this. The commit phase will then not happen. But that also means the
prepare phase should not do any state changes. So you should not be
proceeding here, just allocating.

And you need some way to cleanup the allocated memory when the commit
never happens because some other layer has said No!

 Andrew


Re: [PATCH net-next v2] GTP: add support for flow based tunneling API

2020-12-12 Thread Jakub Kicinski
On Fri, 11 Dec 2020 20:40:17 -0800 Pravin B Shelar wrote:
> Following patch add support for flow based tunneling API
> to send and recv GTP tunnel packet over tunnel metadata API.
> This would allow this device integration with OVS or eBPF using
> flow based tunneling APIs.
> 
> Signed-off-by: Pravin B Shelar 
> ---
> Fixed according to comments from Jonas Bonn

This adds a sparse warning:

drivers/net/gtp.c:218:39: warning: incorrect type in assignment (different base 
types)
drivers/net/gtp.c:218:39:expected restricted __be16 [usertype] protocol
drivers/net/gtp.c:218:39:got int

Coding nits below.

> @@ -179,33 +183,107 @@ static bool gtp_check_ms(struct sk_buff *skb, struct 
> pdp_ctx *pctx,
>   return false;
>  }
>  
> -static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
> - unsigned int hdrlen, unsigned int role)
> +static int gtp_rx(struct gtp_dev *gtp, struct sk_buff *skb,
> +   unsigned int hdrlen, u8 gtp_version, unsigned int role,
> +   __be64 tid, u8 flags, u8 type)
>  {
> - if (!gtp_check_ms(skb, pctx, hdrlen, role)) {
> - netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
> - return 1;
> - }
> + if (ip_tunnel_collect_metadata() || gtp->collect_md) {

nit: this is a static function which is almost entirely indented now.
Please refactor.

> + struct metadata_dst *tun_dst;
> + int opts_len = 0;
> +
> + if (unlikely(flags & GTP1_F_MASK))
> + opts_len = sizeof(struct gtpu_metadata);
> +
> + tun_dst =
> + udp_tun_rx_dst(skb, gtp->sk1u->sk_family, TUNNEL_KEY, 
> tid, opts_len);

Strange why to break a like after =, rather than just wrap function
args.

> + if (!tun_dst) {
> + netdev_dbg(gtp->dev, "Failed to allocate tun_dst");
> + goto err;
> + }
>  
> - /* Get rid of the GTP + UDP headers. */
> - if (iptunnel_pull_header(skb, hdrlen, skb->protocol,
> -  !net_eq(sock_net(pctx->sk), 
> dev_net(pctx->dev
> - return -1;
> + netdev_dbg(gtp->dev, "attaching metadata_dst to skb, gtp ver %d 
> hdrlen %d\n",
> +gtp_version, hdrlen);
> + if (unlikely(opts_len)) {
> + struct gtpu_metadata *opts = 
> ip_tunnel_info_opts(&tun_dst->u.tun_info);
> + struct gtp1_header *gtp1 = (struct gtp1_header 
> *)(skb->data +
> +   
> sizeof(struct udphdr));

Why bother initializing inline if it creates very long lines?
Please move both of those to separate statements.

> + opts->ver = GTP_METADATA_V1;
> + opts->flags = gtp1->flags;
> + opts->type = gtp1->type;
> + netdev_dbg(gtp->dev, "recved control pkt: flag %x type: 
> %d\n",
> +opts->flags, opts->type);
> + tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> + tun_dst->u.tun_info.options_len = opts_len;
> + skb->protocol = 0x; /* Unknown */
> + }
> + /* Get rid of the GTP + UDP headers. */
> + if (iptunnel_pull_header(skb, hdrlen, skb->protocol,
> +  !net_eq(sock_net(gtp->sk1u), 
> dev_net(gtp->dev {
> + gtp->dev->stats.rx_length_errors++;
> + goto err;
> + }
> +
> + skb_dst_set(skb, &tun_dst->dst);
> + } else {
> + struct pdp_ctx *pctx;
> +
> + if (flags & GTP1_F_MASK)
> + hdrlen += 4;
> +
> + if (type != GTP_TPDU)
> + return 1;
>  
> - netdev_dbg(pctx->dev, "forwarding packet from GGSN to uplink\n");
> + if (gtp_version == GTP_V0)
> + pctx = gtp0_pdp_find(gtp, be64_to_cpu(tid));
> + else
> + pctx = gtp1_pdp_find(gtp, be64_to_cpu(tid));
> + if (!pctx) {
> + netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", 
> skb);
> + return 1;
> + }
> +
> + if (!gtp_check_ms(skb, pctx, hdrlen, role)) {
> + netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
> + return 1;
> + }
> + /* Get rid of the GTP + UDP headers. */
> + if (iptunnel_pull_header(skb, hdrlen, skb->protocol,
> +  !net_eq(sock_net(pctx->sk), 
> dev_net(gtp->dev {
> + gtp->dev->stats.rx_length_errors++;
> + goto err;
> + }
> + }
> + netdev_dbg(gtp->dev, "forwarding packet from GGSN to uplink\n");
>  
>   /* Now that the UDP and the GTP header have 

Re: [PATCH v3 net-next] net: dsa: reference count the host mdb addresses

2020-12-12 Thread Vladimir Oltean
On Sat, Dec 12, 2020 at 11:06:41PM +0100, Andrew Lunn wrote:
> > +   /* Complication created by the fact that addition has two phases, but
> > +* deletion only has one phase, and we need reference counting.
> > +* The strategy is to do the memory allocation in the prepare phase,
> > +* but initialize the refcount in the commit phase.
> > +*
> > +* Have mdb | mdb has refcount > 0  | Commit phase  | Resolution
> > +* -+---+---+---
> > +* no   | - | no| Alloc & 
> > proceed
>
> This does not look right.
>
> The point of the prepare phase is to allow all the different layers
> involved to allocate whatever they need and to validate they can do
> the requested action. Any layer can say, No, stop, i cannot do
> this. The commit phase will then not happen. But that also means the
> prepare phase should not do any state changes. So you should not be
> proceeding here, just allocating.

Are you commenting based on the code, or just based on the comment?
If just based on the comment, then yeah, sorry. I was limited to 80
characters, and I couldn't specify "proceed to what". It's just "proceed
to call the prepare phase of the driver". Which is... normal and
expected, and does not contradict what you said above.

> And you need some way to cleanup the allocated memory when the commit
> never happens because some other layer has said No!

So this would be a fatal problem with the switchdev transactional model
if I am not misunderstanding it. On one hand there's this nice, bubbly
idea that you should preallocate memory in the prepare phase, so that
there's one reason less to fail at commit time. But on the other hand,
if "the commit phase might never happen" is even a remove possibility,
all of that goes to trash - how are you even supposed to free the
preallocated memory.

Sorry, I don't think that there's any possibility for the commit phase
to not happen as long as everybody is in agreement that the preparation
phase went ok. If you look at the code, I even allocated the memory at
preparation time _before_ calling into the driver, to ensure that we're
not giving the driver the false impression that it gave switchdev the
green light but the commit never came. If our allocation in the DSA core
fails during the prepare phase, the prepare phase of the driver is not
even called.

That being said, please let me know if you spot bugs in the actual code.
I tested it and it appeared to work ok (I also put debugging prints to
make sure that the refcounting works ok and the entries are removed
after all of them expire).

Re: [Patch bpf-next 0/3] bpf: introduce timeout map

2020-12-12 Thread Cong Wang
On Fri, Dec 11, 2020 at 11:55 AM Andrii Nakryiko
 wrote:
>
> On Fri, Dec 11, 2020 at 2:28 AM Cong Wang  wrote:
> >
> > From: Cong Wang 
> >
> > This patchset introduces a new bpf hash map which has timeout.
> > Patch 1 is a preparation, patch 2 is the implementation of timeout
> > map, patch 3 contains a test case for timeout map. Please check each
> > patch description for more details.
> >
> > ---
>
> This patch set seems to be breaking existing selftests. Please take a
> look ([0]).

Interesting, looks unrelated to my patches but let me double check.

> Also patch #3 should have a commit message, even if pretty trivial one.

Yeah, I thought its subject is sufficient for a trivial patch.

Thanks.


Re: [PATCH net-next] net: Limit logical shift left of TCP probe0 timeout

2020-12-12 Thread Jakub Kicinski
On Tue,  8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote:
> For each TCP zero window probe, the icsk_backoff is increased by one and
> its max value is tcp_retries2. If tcp_retries2 is greater than 63, the
> probe0 timeout shift may exceed its max bits. On x86_64/ARMv8/MIPS, the
> shift count would be masked to range 0 to 63. And on ARMv7 the result is
> zero. If the shift count is masked, only several probes will be sent
> with timeout shorter than TCP_RTO_MAX. But if the timeout is zero, it
> needs tcp_retries2 times probes to end this false timeout. Besides,
> bitwise shift greater than or equal to the width is an undefined
> behavior.

If icsk_backoff can reach 64, can it not also reach 256 and wrap?

Adding Eric's address from MAINTAINERS to CC.

> This patch adds a limit to the backoff. The max value of max_when is
> TCP_RTO_MAX and the min value of timeout base is TCP_RTO_MIN. The limit
> is the backoff from TCP_RTO_MIN to TCP_RTO_MAX.

> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index d4ef5bf94168..82044179c345 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1321,7 +1321,9 @@ static inline unsigned long tcp_probe0_base(const 
> struct sock *sk)
>  static inline unsigned long tcp_probe0_when(const struct sock *sk,
>   unsigned long max_when)
>  {
> - u64 when = (u64)tcp_probe0_base(sk) << inet_csk(sk)->icsk_backoff;
> + u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
> +inet_csk(sk)->icsk_backoff);
> + u64 when = (u64)tcp_probe0_base(sk) << backoff;
>  
>   return (unsigned long)min_t(u64, when, max_when);
>  }



Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support

2020-12-12 Thread Jakub Kicinski
On Tue,  8 Dec 2020 16:54:43 -0800 Wei Wang wrote:
> This patch allows running each napi poll loop inside its own
> kernel thread.
> The threaded mode could be enabled through napi_set_threaded()
> api, and does not require a device up/down. The kthread gets
> created on demand when napi_set_threaded() is called, and gets
> shut down eventually in napi_disable().
> 
> Once that threaded mode is enabled and the kthread is
> started, napi_schedule() will wake-up such thread instead
> of scheduling the softirq.
> 
> The threaded poll loop behaves quite likely the net_rx_action,
> but it does not have to manipulate local irqs and uses
> an explicit scheduling point based on netdev_budget.
> 
> Co-developed-by: Paolo Abeni 
> Signed-off-by: Paolo Abeni 
> Co-developed-by: Hannes Frederic Sowa 
> Signed-off-by: Hannes Frederic Sowa 
> Co-developed-by: Jakub Kicinski 
> Signed-off-by: Jakub Kicinski 
> Signed-off-by: Wei Wang 
> Reviewed-by: Eric Dumazet 

> @@ -4234,6 +4265,11 @@ int gro_normal_batch __read_mostly = 8;
>  static inline void napi_schedule(struct softnet_data *sd,
>struct napi_struct *napi)
>  {
> + if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
> + wake_up_process(napi->thread);

FTR your implementation depends on the fact that this is the only
place that can wake the worker and not set kthread_should_stop().
Which I trust you is the case :) maybe I already mentioned this..

> + return;
> + }
> +
>   list_add_tail(&napi->poll_list, &sd->poll_list);
>   __raise_softirq_irqoff(NET_RX_SOFTIRQ);
>  }

>  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>   int (*poll)(struct napi_struct *, int), int weight)
>  {
> @@ -6731,6 +6790,7 @@ void napi_disable(struct napi_struct *n)
>   msleep(1);
>  
>   hrtimer_cancel(&n->timer);
> + napi_kthread_stop(n);

I'm surprised that we stop the thread on napi_disable() but there is no
start/create in napi_enable(). NAPIs can (and do get) disabled and
enabled again. But that'd make your code crash with many popular
drivers if you tried to change rings with threaded napi enabled so I
feel like I must be missing something..

>   clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
>   clear_bit(NAPI_STATE_DISABLE, &n->state);



Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support

2020-12-12 Thread Jakub Kicinski
On Sat, 12 Dec 2020 14:50:22 -0800 Jakub Kicinski wrote:
> > @@ -6731,6 +6790,7 @@ void napi_disable(struct napi_struct *n)
> > msleep(1);
> >  
> > hrtimer_cancel(&n->timer);
> > +   napi_kthread_stop(n);  
> 
> I'm surprised that we stop the thread on napi_disable() but there is no
> start/create in napi_enable(). NAPIs can (and do get) disabled and
> enabled again. But that'd make your code crash with many popular
> drivers if you tried to change rings with threaded napi enabled so I
> feel like I must be missing something..

Ah, not crash, 'cause the flag gets cleared. Is it intentional that any
changes that disable NAPIs cause us to go back to non-threaded NAPI?
I think I had the "threaded" setting stored in struct netdevice in my
patches, is there a reason not to do that?

In fact your patches may _require_ the device to be up to enable
threaded NAPI if NAPIs are allocated in open.


Re: [PATCH net-next v4 3/3] net: add sysfs attribute to control napi threaded mode

2020-12-12 Thread Jakub Kicinski
On Tue,  8 Dec 2020 16:54:44 -0800 Wei Wang wrote:
> +static void dev_disable_threaded_all(struct net_device *dev)
> +{
> + struct napi_struct *napi;
> +
> + list_for_each_entry(napi, &dev->napi_list, dev_list)
> + napi_set_threaded(napi, false);
> +}

This is an implementation detail which should be hidden in dev.c IMHO.
Since the sysfs knob is just a device global on/off the iteration over
napis and all is best hidden away.

(sorry about the delayed review BTW, hope we can do a minor revision
and still hit 5.12)

> +static int modify_napi_threaded(struct net_device *dev, unsigned long val)
> +{
> + struct napi_struct *napi;
> + int ret;
> +
> + if (list_empty(&dev->napi_list))
> + return -EOPNOTSUPP;
> +
> + list_for_each_entry(napi, &dev->napi_list, dev_list) {
> + ret = napi_set_threaded(napi, !!val);
> + if (ret) {
> + /* Error occurred on one of the napi,
> +  * reset threaded mode on all napi.
> +  */
> + dev_disable_threaded_all(dev);
> + break;
> + }
> + }
> +
> + return ret;
> +}


[PATCH net-next 06/10] netfilter: ctnetlink: add timeout and protoinfo to destroy events

2020-12-12 Thread Pablo Neira Ayuso
From: Florian Westphal 

DESTROY events do not include the remaining timeout.

Add the timeout if the entry was removed explicitly. This can happen
when a conntrack gets deleted prematurely, e.g. due to a tcp reset,
module removal, netdev notifier (nat/masquerade device went down),
ctnetlink and so on.

Add the protocol state too for the destroy message to check for abnormal
state on connection termination.

Joint work with Pablo.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_conntrack_l4proto.h |  2 +-
 net/netfilter/nf_conntrack_netlink.c | 31 +---
 net/netfilter/nf_conntrack_proto_dccp.c  | 13 ++--
 net/netfilter/nf_conntrack_proto_sctp.c  | 13 +---
 net/netfilter/nf_conntrack_proto_tcp.c   | 13 +---
 5 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_l4proto.h 
b/include/net/netfilter/nf_conntrack_l4proto.h
index 9be7320b994f..96f9cf81f46b 100644
--- a/include/net/netfilter/nf_conntrack_l4proto.h
+++ b/include/net/netfilter/nf_conntrack_l4proto.h
@@ -32,7 +32,7 @@ struct nf_conntrack_l4proto {
 
/* convert protoinfo to nfnetink attributes */
int (*to_nlattr)(struct sk_buff *skb, struct nlattr *nla,
-struct nf_conn *ct);
+struct nf_conn *ct, bool destroy);
 
/* convert nfnetlink attributes to protoinfo */
int (*from_nlattr)(struct nlattr *tb[], struct nf_conn *ct);
diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index 3d0fd33be018..84caf3316946 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -167,10 +167,14 @@ static int ctnetlink_dump_status(struct sk_buff *skb, 
const struct nf_conn *ct)
return -1;
 }
 
-static int ctnetlink_dump_timeout(struct sk_buff *skb, const struct nf_conn 
*ct)
+static int ctnetlink_dump_timeout(struct sk_buff *skb, const struct nf_conn 
*ct,
+ bool skip_zero)
 {
long timeout = nf_ct_expires(ct) / HZ;
 
+   if (skip_zero && timeout == 0)
+   return 0;
+
if (nla_put_be32(skb, CTA_TIMEOUT, htonl(timeout)))
goto nla_put_failure;
return 0;
@@ -179,7 +183,8 @@ static int ctnetlink_dump_timeout(struct sk_buff *skb, 
const struct nf_conn *ct)
return -1;
 }
 
-static int ctnetlink_dump_protoinfo(struct sk_buff *skb, struct nf_conn *ct)
+static int ctnetlink_dump_protoinfo(struct sk_buff *skb, struct nf_conn *ct,
+   bool destroy)
 {
const struct nf_conntrack_l4proto *l4proto;
struct nlattr *nest_proto;
@@ -193,7 +198,7 @@ static int ctnetlink_dump_protoinfo(struct sk_buff *skb, 
struct nf_conn *ct)
if (!nest_proto)
goto nla_put_failure;
 
-   ret = l4proto->to_nlattr(skb, nest_proto, ct);
+   ret = l4proto->to_nlattr(skb, nest_proto, ct, destroy);
 
nla_nest_end(skb, nest_proto);
 
@@ -537,8 +542,8 @@ static int ctnetlink_dump_info(struct sk_buff *skb, struct 
nf_conn *ct)
return -1;
 
if (!test_bit(IPS_OFFLOAD_BIT, &ct->status) &&
-   (ctnetlink_dump_timeout(skb, ct) < 0 ||
-ctnetlink_dump_protoinfo(skb, ct) < 0))
+   (ctnetlink_dump_timeout(skb, ct, false) < 0 ||
+ctnetlink_dump_protoinfo(skb, ct, false) < 0))
return -1;
 
return 0;
@@ -780,15 +785,19 @@ ctnetlink_conntrack_event(unsigned int events, struct 
nf_ct_event *item)
goto nla_put_failure;
 
if (events & (1 << IPCT_DESTROY)) {
+   if (ctnetlink_dump_timeout(skb, ct, true) < 0)
+   goto nla_put_failure;
+
if (ctnetlink_dump_acct(skb, ct, type) < 0 ||
-   ctnetlink_dump_timestamp(skb, ct) < 0)
+   ctnetlink_dump_timestamp(skb, ct) < 0 ||
+   ctnetlink_dump_protoinfo(skb, ct, true) < 0)
goto nla_put_failure;
} else {
-   if (ctnetlink_dump_timeout(skb, ct) < 0)
+   if (ctnetlink_dump_timeout(skb, ct, false) < 0)
goto nla_put_failure;
 
-   if (events & (1 << IPCT_PROTOINFO)
-   && ctnetlink_dump_protoinfo(skb, ct) < 0)
+   if (events & (1 << IPCT_PROTOINFO) &&
+   ctnetlink_dump_protoinfo(skb, ct, false) < 0)
goto nla_put_failure;
 
if ((events & (1 << IPCT_HELPER) || nfct_help(ct))
@@ -2720,10 +2729,10 @@ static int __ctnetlink_glue_build(struct sk_buff *skb, 
struct nf_conn *ct)
if (ctnetlink_dump_status(skb, ct) < 0)
goto nla_put_failure;
 
-   if (ctnetlink_dump_timeout(skb, ct) < 0)
+   if (ctnetlink_dump_timeout(skb, ct, false) < 0)
goto nla_put_failure;
 
-   if (ctnetlink_dump_protoinfo(skb,

Re: [PATCH] nfc: s3fwrn5: let core configure the interrupt trigger

2020-12-12 Thread Jakub Kicinski
On Thu, 10 Dec 2020 22:18:24 +0100 Krzysztof Kozlowski wrote:
> If interrupt trigger is not set when requesting the interrupt, the core
> will take care of reading trigger type from Devicetree.  There is no
> point to do it in the driver.
> 
> Signed-off-by: Krzysztof Kozlowski 

Applied, thank you!


  1   2   >