Re: [V9fs-developer] [PATCH kernel] 9p/trans_fd: Check file mode at opening
On Wed, 29 Jul 2020 09:50:21 +1000 Alexey Kardashevskiy wrote: > > > On 29/07/2020 03:42, Greg Kurz wrote: > > Hi Alexey, > > > > Working on 9p now ?!? ;-) > > No, I am running syzkaller and seeing things :) > :) > > > Cc'ing Dominique Martinet who appears to be the person who takes care of 9p > > these days. > > > > On Tue, 28 Jul 2020 22:41:29 +1000 > > Alexey Kardashevskiy wrote: > > > >> The "fd" transport layer uses 2 file descriptors passed externally > >> and calls kernel_write()/kernel_read() on these. If files were opened > >> without FMODE_WRITE/FMODE_READ, WARN_ON_ONCE() will fire. > >> > >> This adds file mode checking in p9_fd_open; this returns -EBADF to > >> preserve the original behavior. > >> > > > > So this would cause open() to fail with EBADF, which might look a bit > > weird to userspace since it didn't pass an fd... Is this to have a > > different error than -EIO that is returned when either rfd or wfd > > doesn't point to an open file descriptor ? > > This is only to preserve the existing behavior. > > > If yes, why do we care ? > > > Without the patch, p9_fd_open() produces a kernel warning which is not > great by itself and becomes crash with panic_on_warn. > I don't question the patch, just the errno. Why not returning -EIO ? > > > > > >> Found by syzkaller. > >> > >> Signed-off-by: Alexey Kardashevskiy > >> --- > >> net/9p/trans_fd.c | 7 ++- > >> 1 file changed, 6 insertions(+), 1 deletion(-) > >> > >> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c > >> index 13cd683a658a..62cdfbd01f0a 100644 > >> --- a/net/9p/trans_fd.c > >> +++ b/net/9p/trans_fd.c > >> @@ -797,6 +797,7 @@ static int parse_opts(char *params, struct p9_fd_opts > >> *opts) > >> > >> static int p9_fd_open(struct p9_client *client, int rfd, int wfd) > >> { > >> + bool perm; > >>struct p9_trans_fd *ts = kzalloc(sizeof(struct p9_trans_fd), > >> GFP_KERNEL); > >>if (!ts) > >> @@ -804,12 +805,16 @@ static int p9_fd_open(struct p9_client *client, int > >> rfd, int wfd) > >> > >>ts->rd = fget(rfd); > >>ts->wr = fget(wfd); > >> - if (!ts->rd || !ts->wr) { > >> + perm = ts->rd && (ts->rd->f_mode & FMODE_READ) && > >> + ts->wr && (ts->wr->f_mode & FMODE_WRITE); > >> + if (!ts->rd || !ts->wr || !perm) { > >>if (ts->rd) > >>fput(ts->rd); > >>if (ts->wr) > >>fput(ts->wr); > >>kfree(ts); > >> + if (!perm) > >> + return -EBADF; > >>return -EIO; > >>} > >> > > >
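On the errno question above, one data point from userspace: read(2) on a descriptor that was not opened for reading already fails with EBADF, so -EBADF is arguably the conventional errno for "valid descriptor, wrong mode". A minimal userspace illustration (not part of the patch, shown only for reference):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        char buf[4];
        int fd = open("/dev/null", O_WRONLY);   /* write-only descriptor */

        if (fd < 0)
                return 1;
        if (read(fd, buf, sizeof(buf)) < 0)
                printf("read: %s\n", strerror(errno));  /* "Bad file descriptor" (EBADF) */
        close(fd);
        return 0;
}

Whether p9_fd_open() should fold this case into the existing -EIO instead is exactly the question Greg raises above.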
Re: INFO: rcu detected stall in tc_modify_qdisc
From: linux-kernel-ow...@vger.kernel.org on behalf of syzbot Sent: 29 July 2020 13:53 To: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; j...@resnulli.us; linux-ker...@vger.kernel.org; mi...@kernel.org; netdev@vger.kernel.org; syzkaller-b...@googlegroups.com; t...@linutronix.de; vinicius.go...@intel.com; xiyou.wangc...@gmail.com Subject: INFO: rcu detected stall in tc_modify_qdisc Hello, syzbot found the following issue on: HEAD commit:181964e6 fix a braino in cmsghdr_from_user_compat_to_kern() git tree: net console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890 kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267 dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15b2d79090 The issue was bisected to: commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22 Author: Vinicius Costa Gomes Date: Sat Sep 29 00:59:43 2018 + tc: Add support for configuring the taprio scheduler bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=160e1bac90 console output: https://syzkaller.appspot.com/x/log.txt?x=110e1bac90 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+9f78d5c664a8c33f4...@syzkaller.appspotmail.com Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") rcu: INFO: rcu_preempt self-detected stall on CPU rcu:1-...!: (1 GPs behind) idle=6f6/1/0x4000 softirq=10195/10196 fqs=1 (t=27930 jiffies g=9233 q=413) rcu: rcu_preempt kthread starved for 27901 jiffies! g9233 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu:Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump: rcu_preempt R running task2911210 2 0x4000 Call Trace: context_switch kernel/sched/core.c:3458 [inline] __schedule+0x8ea/0x2210 kernel/sched/core.c:4219 schedule+0xd0/0x2a0 kernel/sched/core.c:4294 schedule_timeout+0x148/0x250 kernel/time/timer.c:1908 rcu_gp_fqs_loop kernel/rcu/tree.c:1874 [inline] rcu_gp_kthread+0xae5/0x1b50 kernel/rcu/tree.c:2044 kthread+0x3b5/0x4a0 kernel/kthread.c:291 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 NMI backtrace for cpu 1 CPU: 1 PID: 6799 Comm: syz-executor494 Not tainted 5.8.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101 nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x194/0x1cf kernel/rcu/tree_stall.h:320 print_cpu_stall kernel/rcu/tree_stall.h:553 [inline] check_cpu_stall kernel/rcu/tree_stall.h:627 [inline] rcu_pending kernel/rcu/tree.c:3489 [inline] rcu_sched_clock_irq.cold+0x5b3/0xccc kernel/rcu/tree.c:2504 update_process_times+0x25/0x60 kernel/time/timer.c:1737 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176 tick_sched_timer+0x108/0x290 kernel/time/tick-sched.c:1320 __run_hrtimer kernel/time/hrtimer.c:1520 [inline] __hrtimer_run_queues+0x1d5/0xfc0 kernel/time/hrtimer.c:1584 hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191 Code: 48 c7 c0 88 e0 b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 37 48 83 3d e3 52 cc 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 35 e5 66 f9 65 8b 05 fe 70 19 78 RSP: 0018:c900016672c0 EFLAGS: 0282 RAX: 11369c11 RBX: 0282 RCX: 0002 RDX: dc00 RSI: RDI: 0282 RBP: 888093a052e8 R08: R09: R10: 0001 R11: R12: 0282 R13: 0078100c35c3 R14: 888093a05000 R15: spin_unlock_irqrestore include/linux/spinlock.h:408 [inline] taprio_change+0x1fdc/0x2960 net/sched/sch_taprio.c:1557 It looks like
Re: [PATCH] net: ethernet: mtk_eth_soc: Always call mtk_gmac0_rgmii_adjust() for mt7623
Hi, thank you, David, for finally getting this applied. Adding the stable tree as a recipient, since TRGMII on 5.4+ is also broken without this patch. regards Frank > Sent: Wednesday, 29 July 2020 at 02:05 > From: "David Miller" Applied.
Re: [PATCH v3] net: ethernet: mtk_eth_soc: fix mtu warning
Hi Frank, if you send the next version of the patch, please add the Signed-off-by line below. Thanks. Signed-off-by: Landen Chao On Tue, 2020-07-28 at 23:53 +0800, Jakub Kicinski wrote: > On Tue, 28 Jul 2020 14:27:43 +0200 Frank Wunderlich wrote: > > From: Landen Chao > > Hi gents, > > if the patch is from Landen we need his sign-off on it. > > > in recent Kernel-Versions there are warnings about incorrect MTU-Size > > like these: > > > > eth0: mtu greater than device maximum > > mtk_soc_eth 1b10.ethernet eth0: error -22 setting MTU to include DSA > > overhead > > > > Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") > > Fixes: 72579e14a1d3 ("net: dsa: don't fail to probe if we couldn't set the > > MTU") > > Fixes: 7a4c53bee332 ("net: report invalid mtu value via netlink extack") > > Signed-off-by: René van Dorst > > Signed-off-by: Frank Wunderlich >
Re: [PATCH] net: nixge: fix potential memory leak in nixge_probe()
> If some processes in nixge_probe() fail, free_netdev(dev) > needs to be called to aviod a memory leak. * Would you like to avoid a typo in this change description? * An imperative wording can be preferred here, can't it? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=6ba1b005ffc388c2aeaddae20da29e4810dea298#n151 Regards, Markus
RE: [PATCH 2/4] net: make sockptr_is_null strict aliasing safe
From: Christoph Hellwig > Sent: 28 July 2020 17:39 > > While the kernel in general is not strict aliasing safe we can trivially > do that in sockptr_is_null without affecting code generation, so always > check the actually assigned union member. Even with 'strict aliasing' gcc (at least) guarantees that the members of a union alias each other. It is about the only way to safely interpret a float as an int. So when sockptr_t is a union testing either member is enough. When it is a structure the changed form almost certainly adds code. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
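For readers without the patch in front of them, the helper under discussion has roughly the following shape when kernel and user pointers must be distinguished explicitly. This is a sketch based on the sockptr_t hunk quoted further down in this digest, not the exact upstream code:

#include <linux/compiler.h>
#include <linux/types.h>

typedef struct {
        union {
                void            *kernel;
                void __user     *user;
        };
        bool            is_kernel : 1;
} sockptr_t;

/* The change being discussed: test the union member that was actually
 * assigned instead of relying on the two members aliasing each other.
 */
static inline bool sockptr_is_null(sockptr_t sockptr)
{
        if (sockptr.is_kernel)
                return !sockptr.kernel;
        return !sockptr.user;
}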
[PATCH v4] net: ethernet: mtk_eth_soc: fix MTU warnings
From: Landen Chao in recent kernel versions there are warnings about incorrect MTU size like these: eth0: mtu greater than device maximum mtk_soc_eth 1b10.ethernet eth0: error -22 setting MTU to include DSA overhead Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Fixes: 72579e14a1d3 ("net: dsa: don't fail to probe if we couldn't set the MTU") Fixes: 7a4c53bee332 ("net: report invalid mtu value via netlink extack") Signed-off-by: Landen Chao Signed-off-by: Frank Wunderlich Reviewed-by: Andrew Lunn --- v3->v4 - fix commit-message (hyphernations,capitalisation) as suggested by Russell - add Signed-off-by Landen - dropped wrong signed-off from rene (because previous v1/2 was from him) --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 85735d32ecb0..a1c45b39a230 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -2891,6 +2891,8 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) eth->netdev[id]->irq = eth->irq[0]; eth->netdev[id]->dev.of_node = np; + eth->netdev[id]->max_mtu = MTK_MAX_RX_LENGTH - MTK_RX_ETH_HLEN; + return 0; free_netdev: -- 2.25.1
Re: fentry/fexit attach to EXT type XDP program does not work
On Wed, Jul 29, 2020 at 08:23:56AM +0200, Eelco Chaudron wrote: SNIP > > > > > a patch > > > > > that would nice. > > > > > You can also send it to me before bpf-next opens and I can verify > > > > > it, and > > > > > clean up the self-test so it can be included as well. > > > > > > > > > > > > > hi, > > > > it seems that you cannot exten fentry/fexit programs, > > > > but it's possible to attach fentry/fexit to ext program. > > > > > > > >/* Program extensions can extend all program types > > > > * except fentry/fexit. The reason is the following. > > > > * The fentry/fexit programs are used for performance > > > > * analysis, stats and can be attached to any program > > > > * type except themselves. When extension program is > > > > * replacing XDP function it is necessary to allow > > > > * performance analysis of all functions. Both original > > > > * XDP program and its program extension. Hence > > > > * attaching fentry/fexit to BPF_PROG_TYPE_EXT is > > > > * allowed. If extending of fentry/fexit was allowed it > > > > * would be possible to create long call chain > > > > * fentry->extension->fentry->extension beyond > > > > * reasonable stack size. Hence extending fentry is not > > > > * allowed. > > > > */ > > > > > > > > I changed fexit_bpf2bpf.c test just to do a quick check > > > > and it seems to work: > > > > > > Hi Jiri this is exactly what I’m trying, however when you do this > > > where the > > > first argument is a pointer to some context data which you are > > > accessing > > > it’s failing in the verifier. > > > This is a link to the original email, which has a test patch > > > attached that > > > will show the failure when trying to load/attach the fentry function > > > and > > > access the context: > > > > > > https://lore.kernel.org/bpf/159162546868.10791.12432342618156330247.stgit@ebuild/ > > > > ok, I tried to trace ext program with __sk_buff argument and I can see > > the issue as well.. can't acess the skb argument > > > > patch below fixes it for me, I can access the skb pointer and its data > > via probe read, like: > > > > SEC("fexit/new_get_skb_ifindex") > > int BPF_PROG(fexit_new_get_skb_ifindex, int val, struct __sk_buff *skb, > > int var, int ret) > > { > > __u32 data; > > int err; > > > > bpf_printk("EXIT skb %p", skb); > > bpf_probe_read_kernel(&data, sizeof(data), &skb->data); > > bpf_printk("EXIT ret %d, data %p", err, data); > > return 0; > > } > > > > I think it should fix the xdp_md acess as well > > Excellent patch ;) It works with xdp_md as well, and even better it does not > require the bpf_probe_read_kernel(), so the test_xdp_bpf2bpf.c code just > works. great ;-) will check on xdp_md > > Are you planning to send the patch upstream? yep, I'll add some test for that and send it thanks, jirka
KASAN: vmalloc-out-of-bounds Read in get_counters
Hello, syzbot found the following issue on: HEAD commit:68845a55 Merge branch 'akpm' into master (patches from And.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1366896490 kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267 dashboard link: https://syzkaller.appspot.com/bug?extid=a450cb4aa95912e62487 compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+a450cb4aa95912e62...@syzkaller.appspotmail.com == BUG: KASAN: vmalloc-out-of-bounds in get_counters+0x593/0x610 net/ipv6/netfilter/ip6_tables.c:780 Read of size 8 at addr c9000528b048 by task syz-executor.1/6968 CPU: 1 PID: 6968 Comm: syz-executor.1 Not tainted 5.8.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0x5/0x436 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 get_counters+0x593/0x610 net/ipv6/netfilter/ip6_tables.c:780 do_ip6t_get_ctl+0x516/0x910 net/ipv6/netfilter/ip6_tables.c:821 nf_sockopt net/netfilter/nf_sockopt.c:104 [inline] nf_getsockopt+0x72/0xd0 net/netfilter/nf_sockopt.c:122 ipv6_getsockopt+0x1bf/0x270 net/ipv6/ipv6_sockglue.c:1468 tcp_getsockopt+0x86/0xd0 net/ipv4/tcp.c:3893 __sys_getsockopt+0x14b/0x2e0 net/socket.c:2172 __do_sys_getsockopt net/socket.c:2187 [inline] __se_sys_getsockopt net/socket.c:2184 [inline] __x64_sys_getsockopt+0xba/0x150 net/socket.c:2184 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45ee7a Code: Bad RIP value. RSP: 002b:00c9f618 EFLAGS: 0212 ORIG_RAX: 0037 RAX: ffda RBX: 00c9f640 RCX: 0045ee7a RDX: 0041 RSI: 0029 RDI: 0003 RBP: 00744ca0 R08: 00c9f63c R09: 4000 R10: 00c9f740 R11: 0212 R12: 0003 R13: R14: 0029 R15: 007445e0 Memory state around the buggy address: c9000528af00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 c9000528af80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 >c9000528b000: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 f9 ^ c9000528b080: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 c9000528b100: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 == --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
Re: [PATCH net] devlink: ignore -EOPNOTSUPP errors on dumpit
Wed, Jul 29, 2020 at 01:15:07AM CEST, k...@kernel.org wrote: >Number of .dumpit functions try to ignore -EOPNOTSUPP errors. >Recent change missed that, and started reporting all errors >but -EMSGSIZE back from dumps. This leads to situation like >this: > >$ devlink dev info >devlink answers: Operation not supported > >Dump should not report an error just because the last device >to be queried could not provide an answer. > >To fix this and avoid similar confusion make sure we clear >err properly, and not leave it set to an error if we don't >terminate the iteration. > >Fixes: c62c2cfb801b ("net: devlink: don't ignore errors during dumpit") >Signed-off-by: Jakub Kicinski Yeah, that makes perfect sense. Thanks for the fix Kuba! Reviewed-by: Jiri Pirko
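The failure mode is easiest to see in a toy model of a .dumpit loop: an -EOPNOTSUPP from one device has to be cleared before moving on, otherwise a trailing device that cannot answer turns an otherwise successful dump into an error. A self-contained sketch (plain C, not the actual devlink code):

#include <errno.h>
#include <stdio.h>

/* Stand-in for devlink_nl_info_fill(): the last device iterated has
 * nothing to report and returns -EOPNOTSUPP.
 */
static int fill_one(int idx)
{
        return idx == 3 ? -EOPNOTSUPP : 0;
}

int main(void)
{
        int idx, err = 0;

        for (idx = 0; idx < 4; idx++) {
                err = fill_one(idx);
                if (err == -EOPNOTSUPP) {
                        err = 0;        /* ignore: keep iterating, do not report it */
                        continue;
                }
                if (err)                /* e.g. -EMSGSIZE: stop, resume in the next message */
                        break;
        }
        /* Without the "err = 0" reset above, the dump as a whole would report
         * "Operation not supported" even though three devices answered.
         */
        printf("dump result: %d\n", err);
        return 0;
}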
RE: [PATCH 4/4] net: improve the user pointer check in init_user_sockptr
From: Christoph Hellwig > Sent: 28 July 2020 17:39 > > Make sure not just the pointer itself but the whole range lies in > the user address space. For that pass the length and then use > the access_ok helper to do the check. Now that the address is never changed it is enough to check the base address (although this would be slightly safer if sockaddr_t were 'const'). This is especially true given some code paths ignore the length (so the length must be checked later - which it will be). Isn't TASK_SIZE the wrong 'constant'. Looks pretty 'heavy' on x86-64. You just need something between valid user and valid kernel addresses. So (1ull << 63) is fine on x86-64 (and probably all 64bit with non-overlapping user/kernel addresses). For i386 you need (3ul << 30) - probably also common. David > > Fixes: 6d04fe15f78a ("net: optimize the sockptr_t for unified kernel/user > address spaces") > Reported-by: David Laight > Signed-off-by: Christoph Hellwig > --- > include/linux/sockptr.h | 18 ++ > net/ipv4/bpfilter/sockopt.c | 2 +- > net/socket.c| 2 +- > 3 files changed, 8 insertions(+), 14 deletions(-) > > diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h > index 9e6c81d474cba8..96840def9d69cc 100644 > --- a/include/linux/sockptr.h > +++ b/include/linux/sockptr.h > @@ -27,14 +27,6 @@ static inline sockptr_t KERNEL_SOCKPTR(void *p) > { > return (sockptr_t) { .kernel = p }; > } > - > -static inline int __must_check init_user_sockptr(sockptr_t *sp, void __user > *p) > -{ > - if ((unsigned long)p >= TASK_SIZE) > - return -EFAULT; > - sp->user = p; > - return 0; > -} > #else /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */ > typedef struct { > union { > @@ -53,14 +45,16 @@ static inline sockptr_t KERNEL_SOCKPTR(void *p) > { > return (sockptr_t) { .kernel = p, .is_kernel = true }; > } > +#endif /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */ > > -static inline int __must_check init_user_sockptr(sockptr_t *sp, void __user > *p) > +static inline int __must_check init_user_sockptr(sockptr_t *sp, void __user > *p, > + size_t size) > { > - sp->user = p; > - sp->is_kernel = false; > + if (!access_ok(p, size)) > + return -EFAULT; > + *sp = (sockptr_t) { .user = p }; > return 0; > } > -#endif /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */ > > static inline bool sockptr_is_null(sockptr_t sockptr) > { > diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c > index 94f18d2352d007..545b2640f0194d 100644 > --- a/net/ipv4/bpfilter/sockopt.c > +++ b/net/ipv4/bpfilter/sockopt.c > @@ -65,7 +65,7 @@ int bpfilter_ip_get_sockopt(struct sock *sk, int optname, > > if (get_user(len, optlen)) > return -EFAULT; > - err = init_user_sockptr(&optval, user_optval); > + err = init_user_sockptr(&optval, user_optval, len); > if (err) > return err; > return bpfilter_mbox_request(sk, optname, optval, len, false); > diff --git a/net/socket.c b/net/socket.c > index 94ca4547cd7c53..aff52e81653ce3 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -2105,7 +2105,7 @@ int __sys_setsockopt(int fd, int level, int optname, > char __user *user_optval, > if (optlen < 0) > return -EINVAL; > > - err = init_user_sockptr(&optval, user_optval); > + err = init_user_sockptr(&optval, user_optval, optlen); > if (err) > return err; > > -- > 2.27.0 - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
[PATCH net] vxlan: Ensure FDB dump is performed under RCU
From: Ido Schimmel The commit cited below removed the RCU read-side critical section from rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked without RCU protection. This results in the following warning [1] in the VXLAN driver, which relied on the callback being invoked from an RCU read-side critical section. Fix this by calling rcu_read_lock() in the VXLAN driver, as already done in the bridge driver. [1] WARNING: suspicious RCU usage 5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted - drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by bridge/166: #0: 85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090 stack backtrace: CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x153/0x15d vxlan_fdb_dump+0x51e/0x6d0 rtnl_fdb_dump+0x4dc/0xad0 netlink_dump+0x540/0x1090 __netlink_dump_start+0x695/0x950 rtnetlink_rcv_msg+0x802/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 __sys_sendto+0x279/0x3b0 __x64_sys_sendto+0xe6/0x1a0 do_syscall_64+0x54/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fe14fa2ade0 Code: Bad RIP value. RSP: 002b:7fff75bb5b88 EFLAGS: 0246 ORIG_RAX: 002c RAX: ffda RBX: 5614b1ba0020 RCX: 7fe14fa2ade0 RDX: 011c RSI: 7fff75bb5b90 RDI: 0003 RBP: 7fff75bb5b90 R08: R09: R10: R11: 0246 R12: 5614b1b89160 R13: R14: R15: Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl") Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- drivers/net/vxlan.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 89d85dcb200e..5efe1e28f270 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1376,6 +1376,7 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, for (h = 0; h < FDB_HASH_SIZE; ++h) { struct vxlan_fdb *f; + rcu_read_lock(); hlist_for_each_entry_rcu(f, &vxlan->fdb_head[h], hlist) { struct vxlan_rdst *rd; @@ -1387,8 +1388,10 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, cb->nlh->nlmsg_seq, RTM_NEWNEIGH, NLM_F_MULTI, NULL); - if (err < 0) + if (err < 0) { + rcu_read_unlock(); goto out; + } skip_nh: *idx += 1; continue; @@ -1403,12 +1406,15 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, cb->nlh->nlmsg_seq, RTM_NEWNEIGH, NLM_F_MULTI, rd); - if (err < 0) + if (err < 0) { + rcu_read_unlock(); goto out; + } skip: *idx += 1; } } + rcu_read_unlock(); } out: return err; -- 2.26.2
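Stripped of the vxlan specifics, the rule the fix applies is that an hlist_for_each_entry_rcu() walk which is not covered by some other protection must run inside an RCU read-side critical section, and the unlock must be reached on every exit path. A generic kernel-side sketch (type and helper names are placeholders, not the vxlan code):

#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/types.h>

struct fdb_entry {                      /* stand-in for struct vxlan_fdb */
        struct hlist_node hlist;
        u32 ifindex;
};

static int dump_one(const struct fdb_entry *f)  /* stand-in for the fill helper */
{
        return 0;
}

static int walk_fdb_bucket(struct hlist_head *head)
{
        struct fdb_entry *f;
        int err = 0;

        rcu_read_lock();
        hlist_for_each_entry_rcu(f, head, hlist) {
                err = dump_one(f);
                if (err < 0)
                        break;          /* still falls through to rcu_read_unlock() */
        }
        rcu_read_unlock();
        return err;
}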
[PATCH net] ipv4: Silence suspicious RCU usage warning
From: Ido Schimmel fib_trie_unmerge() is called with RTNL held, but not from an RCU read-side critical section. This leads to the following warning [1] when the FIB alias list in a leaf is traversed with hlist_for_each_entry_rcu(). Since the function is always called with RTNL held and since modification of the list is protected by RTNL, simply use hlist_for_each_entry() and silence the warning. [1] WARNING: suspicious RCU usage 5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Not tainted - net/ipv4/fib_trie.c:1867 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/164: #0: 85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0 stack backtrace: CPU: 0 PID: 164 Comm: ip Not tainted 5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x153/0x15d fib_trie_unmerge+0x608/0xdb0 fib_unmerge+0x44/0x360 fib4_rule_configure+0xc8/0xad0 fib_nl_newrule+0x37a/0x1dd0 rtnetlink_rcv_msg+0x4f7/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 sys_sendmsg+0x879/0xa00 ___sys_sendmsg+0x122/0x190 __sys_sendmsg+0x103/0x1d0 __x64_sys_sendmsg+0x7d/0xb0 do_syscall_64+0x54/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc80a234e97 Code: Bad RIP value. RSP: 002b:7ffef8b66798 EFLAGS: 0246 ORIG_RAX: 002e RAX: ffda RBX: RCX: 7fc80a234e97 RDX: RSI: 7ffef8b66800 RDI: 0003 RBP: 5f141b1c R08: 0001 R09: R10: 7fc80a2a8ac0 R11: 0246 R12: 0001 R13: R14: 7ffef8b67008 R15: 556fccb10020 Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- net/ipv4/fib_trie.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 248f1c1959a6..3c65f71d0e82 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1864,7 +1864,7 @@ struct fib_table *fib_trie_unmerge(struct fib_table *oldtb) while ((l = leaf_walk_rcu(&tp, key)) != NULL) { struct key_vector *local_l = NULL, *local_tp; - hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) { + hlist_for_each_entry(fa, &l->leaf, fa_list) { struct fib_alias *new_fa; if (local_tb->tb_id != fa->tb_id) -- 2.26.2
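This fix is the complementary case to the previous one: when every writer of the list is serialized by a lock the walker already holds (RTNL here), the plain iterator is correct and does not trip the lockdep check. A generic sketch with placeholder names:

#include <linux/list.h>
#include <linux/rtnetlink.h>

struct alias_entry {                    /* stand-in for struct fib_alias */
        struct hlist_node fa_list;
};

/* Caller holds RTNL, which serializes all modifications of this list,
 * so neither rcu_read_lock() nor the _rcu iterator is needed.
 */
static unsigned int count_aliases_rtnl(struct hlist_head *head)
{
        struct alias_entry *fa;
        unsigned int n = 0;

        ASSERT_RTNL();
        hlist_for_each_entry(fa, head, fa_list)
                n++;
        return n;
}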
Re: [PATCH] ethernet: fix potential memory leak in gemini_ethernet_port_probe()
> If some processes in gemini_ethernet_port_probe() fail, > free_netdev(dev) needs to be called to avoid a memory leak. Would you like to use an imperative wording for this change description? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=6ba1b005ffc388c2aeaddae20da29e4810dea298#n151 … > +++ b/drivers/net/ethernet/cortina/gemini.c > @@ -2501,8 +2513,10 @@ static int gemini_ethernet_port_probe(struct > platform_device *pdev) > IRQF_SHARED, > port_names[port->id], > port); > - if (ret) > + if (ret) { > + free_netdev(netdev); > return ret; > + } > > ret = register_netdev(netdev); … I suggest to add a jump target for the desired exception handling in this function implementation. if (ret) - return ret; + goto free_netdev; Regards, Markus
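The jump-target style Markus points to looks roughly like this in schematic form; every name here (functions, struct, label) is illustrative only and not the gemini driver itself:

#include <linux/errno.h>
#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/platform_device.h>

struct example_port {
        int id;
};

static int example_setup_irq(struct net_device *netdev) /* hypothetical setup step */
{
        return 0;
}

static int example_port_probe(struct platform_device *pdev)
{
        struct net_device *netdev;
        int ret;

        netdev = alloc_etherdev(sizeof(struct example_port));
        if (!netdev)
                return -ENOMEM;

        ret = example_setup_irq(netdev);
        if (ret)
                goto err_free_netdev;   /* a single exit label owns the cleanup */

        ret = register_netdev(netdev);
        if (ret)
                goto err_free_netdev;

        return 0;

err_free_netdev:
        free_netdev(netdev);
        return ret;
}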
Re: [net-next PATCH v7 1/6] Documentation: ACPI: DSD: Document MDIO PHY
On Wed, Jul 29, 2020 at 4:53 AM Jeremy Linton wrote: > > Hi, > > On 7/28/20 7:39 PM, Florian Fainelli wrote: > > On 7/28/2020 3:30 PM, Jeremy Linton wrote: > >> Hi, > >> > >> On 7/28/20 3:06 AM, Dan Callaghan wrote: > >>> Excerpts from Andrew Lunn's message of 2020-07-24 21:14:36 +02:00: > Now i could be wrong, but are Ethernet switches something you expect > to see on ACPI/SBSA platforms? Or is this a legitimate use of the > escape hatch? > >>> > >>> As an extra data point: right now I am working on an x86 embedded > >>> appliance (ACPI not Device Tree) with 3x integrated Marvell switches. > >>> I have been watching this patch series with great interest, because > >>> right now there is no way for me to configure a complex switch topology > >>> in DSA without Device Tree. > >> > >> DSA though, the switch is hung off a normal MAC/MDIO, right? (ignoring > >> whether that NIC/MAC is actually hug off PCIe for the moment). > > > > There is no specific bus, we have memory mapped, MDIO, SPI, I2C swiches > > all supported within the driver framework right now. > > > >> > >> It just has a bunch of phy devices strung out on that single MAC/MDIO. > > > > It has a number of built-in PHYs that typically appear on a MDIO bus, > > whether that bus is the switch's internal MDIO bus, or another MDIO bus > > (which could be provided with just two GPIOs) depends on how the switch > > is connected to its management host. > > > > When the switch is interfaced via MDIO the switch also typically has a > > MDIO interface called the pseudo-PHY which is how you can actually tap > > into the control interface of the switch, as opposed to reading its > > internal PHYs from the MDIO bus. > > > >> So in ACPI land it would still have a relationship similar to the one > >> Andrew pointed out with his DT example where the eth0->mdio->phy are all > >> contained in their physical parent. The phy in that case associated with > >> the parent adapter would be the first direct decedent of the mdio, the > >> switch itself could then be represented with another device, with a > >> further string of device/phys representing the devices. (I dislike > >> drawing acsii art, but if this isn't clear I will, its also worthwhile > >> to look at the dpaa2 docs for how the mac/phys work on this device for > >> contrast.). > > > > The eth0->mdio->phy relationship you describe is the simple case that > > you are well aware of which is say what we have on the Raspberry Pi 4 > > with GENET and the external Broadcom PHY. > > > > For an Ethernet switch connected to an Ethernet MAC, we have 4 different > > types of objects: > > > > - the Ethernet MAC which sits on its specific bus > > > > - the Ethernet switch which also sits on its specific bus > > > > - the built-in PHYs of the Ethernet switch which sit on whatever > > bus/interface the switch provides to make them accessible > > > > - the specific bus controller that provides access to the Ethernet switch > > > > and this is a simplification that does not take into account Physical > > Coding Sublayer devices, pure MDIO devices (with no foot in the Ethernet > > land such as PCIe, USB3 or SATA PHYs), SFP, SFF and other pluggable modules. > > Which is why I've stayed away from much of the switch discussion. There > are a lot of edge cases to fall into, because for whatever reason > networking seems to be unique in all this non-enumerable customization > vs many other areas of the system. 
Storage, being an example i'm more > familiar with which has very similar problems yet, somehow has managed > to avoid much of this, despite having run on fabrics significantly more > complex than basic ethernet (including running on a wide range of hot > pluggable GBIC/SFP/QSFP/etc media layers). > > ACPI's "problem" here is that its strongly influenced by the "Plug and > Play" revolution of the 1990's where the industry went from having > humans describing hardware using machine readable languages, to hardware > which was enumerable using standard protocols. ACPI's device > descriptions are there as a crutch for the remaining non plug an play > hardware in the system. > > So at a basic level, if your describing hardware in ACPI rather than > abstracting it, that is a problem. > This is also my first run with ACPI and this discussion needs to be brought back to the powers that be regarding sorting this out. This is where I see it from an Armada and Layerscape perspective. This isn't a silver bullet fix but the small things I think that need to be done to move this forward. >From Microsoft's documentation. Device dependencies Typically, there are hardware dependencies between devices on a particular platform. Windows requires that all such dependencies be described so that it can ensure that all devices function correctly as things change dynamically in the system (device power is removed, drivers are stopped and started, and so on). In ACPI, dependencies between devices are described in the following ways:
Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
Hi Daniel, On Tue, Jul 28, 2020 at 10:47 PM CEST, Daniel Borkmann wrote: [...] > Jakub, I'm actually seeing a slightly different one on my test machine with > sk_lookup: > > # ./test_progs -t sk_lookup > #14 cgroup_skb_sk_lookup:OK > #73/1 query lookup prog:OK > #73/2 TCP IPv4 redir port:OK > #73/3 TCP IPv4 redir addr:OK > #73/4 TCP IPv4 redir with reuseport:OK > #73/5 TCP IPv4 redir skip reuseport:OK > #73/6 TCP IPv6 redir port:OK > #73/7 TCP IPv6 redir addr:OK > #73/8 TCP IPv4->IPv6 redir port:OK > #73/9 TCP IPv6 redir with reuseport:OK > #73/10 TCP IPv6 redir skip reuseport:OK > #73/11 UDP IPv4 redir port:OK > #73/12 UDP IPv4 redir addr:OK > #73/13 UDP IPv4 redir with reuseport:OK > attach_lookup_prog:PASS:open 0 nsec > attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > update_lookup_map:PASS:bpf_map__fd 0 nsec > update_lookup_map:PASS:bpf_map_update_elem 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > update_lookup_map:PASS:bpf_map__fd 0 nsec > update_lookup_map:PASS:bpf_map_update_elem 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > run_lookup_prog:PASS:getsockname 0 nsec > run_lookup_prog:PASS:connect 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_client:PASS:make_client 0 nsec > send_byte:PASS:send_byte 0 nsec > udp_recv_send:FAIL:recvmsg failed > (/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: > errno: Resource temporarily unavailable) failed to receive > #73/14 UDP IPv4 redir and reuseport with conns:FAIL > #73/15 UDP IPv4 redir skip reuseport:OK > #73/16 UDP IPv6 redir port:OK > #73/17 UDP IPv6 redir addr:OK > #73/18 UDP IPv4->IPv6 redir port:OK > #73/19 UDP IPv6 redir and reuseport:OK > attach_lookup_prog:PASS:open 0 nsec > attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > update_lookup_map:PASS:bpf_map__fd 0 nsec > update_lookup_map:PASS:bpf_map_update_elem 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 
nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > update_lookup_map:PASS:bpf_map__fd 0 nsec > update_lookup_map:PASS:bpf_map_update_elem 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > run_lookup_prog:PASS:getsockname 0 nsec > run_lookup_prog:PASS:connect 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_client:PASS:make_client 0 nsec > send_byte:PASS:send_byte 0 nsec > udp_recv_send:FAIL:recvmsg failed > (/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: > errno: Resource temporarily unavailable) failed to receive > #73/20 UDP IPv6 redir and reuseport with conns:FAIL > #73/21 UDP IPv6 redir skip reuseport:OK > #73/22 TCP IPv4 drop on lookup:OK > #73/23 TCP IPv6 drop on lookup:OK > #73/24 UDP IPv4 drop o
[PATCH net] selftests/bpf: add xdpdrv mode for test_xdp_redirect
This patch add xdpdrv mode for test_xdp_redirect.sh since veth has support native mode. After update here is the test result: ]# ./test_xdp_redirect.sh selftests: test_xdp_redirect xdpgeneric [PASS] selftests: test_xdp_redirect xdpdrv [PASS] Signed-off-by: Hangbin Liu --- .../selftests/bpf/test_xdp_redirect.sh| 84 --- 1 file changed, 52 insertions(+), 32 deletions(-) diff --git a/tools/testing/selftests/bpf/test_xdp_redirect.sh b/tools/testing/selftests/bpf/test_xdp_redirect.sh index c4b17e08d431..dd80f0c84afb 100755 --- a/tools/testing/selftests/bpf/test_xdp_redirect.sh +++ b/tools/testing/selftests/bpf/test_xdp_redirect.sh @@ -10,52 +10,72 @@ # | xdp forwarding | # -- -cleanup() +ret=0 + +setup() { - if [ "$?" = "0" ]; then - echo "selftests: test_xdp_redirect [PASS]"; - else - echo "selftests: test_xdp_redirect [FAILED]"; - fi - set +e + local xdpmode=$1 + + ip netns add ns1 + ip netns add ns2 + + ip link add veth1 index 111 type veth peer name veth11 netns ns1 + ip link add veth2 index 222 type veth peer name veth22 netns ns2 + + ip link set veth1 up + ip link set veth2 up + ip -n ns1 link set dev veth11 up + ip -n ns2 link set dev veth22 up + + ip -n ns1 addr add 10.1.1.11/24 dev veth11 + ip -n ns2 addr add 10.1.1.22/24 dev veth22 +} + +cleanup() +{ ip link del veth1 2> /dev/null ip link del veth2 2> /dev/null ip netns del ns1 2> /dev/null ip netns del ns2 2> /dev/null } -ip link set dev lo xdpgeneric off 2>/dev/null > /dev/null -if [ $? -ne 0 ];then - echo "selftests: [SKIP] Could not run test without the ip xdpgeneric support" - exit 0 -fi -set -e - -ip netns add ns1 -ip netns add ns2 +test_xdp_redirect() +{ + local xdpmode=$1 -trap cleanup 0 2 3 6 9 + setup -ip link add veth1 index 111 type veth peer name veth11 -ip link add veth2 index 222 type veth peer name veth22 + ip link set dev veth1 $xdpmode off &> /dev/null + if [ $? -ne 0 ];then + echo "selftests: test_xdp_redirect $xdpmode [SKIP]" + return 0 + fi -ip link set veth11 netns ns1 -ip link set veth22 netns ns2 + ip -n ns1 link set veth11 $xdpmode obj xdp_dummy.o sec xdp_dummy &> /dev/null + ip -n ns2 link set veth22 $xdpmode obj xdp_dummy.o sec xdp_dummy &> /dev/null + ip link set dev veth1 $xdpmode obj test_xdp_redirect.o sec redirect_to_222 &> /dev/null + ip link set dev veth2 $xdpmode obj test_xdp_redirect.o sec redirect_to_111 &> /dev/null -ip link set veth1 up -ip link set veth2 up + ip netns exec ns1 ping -c 1 10.1.1.22 &> /dev/null + local ret1=$? + ip netns exec ns2 ping -c 1 10.1.1.11 &> /dev/null + local ret2=$? -ip netns exec ns1 ip addr add 10.1.1.11/24 dev veth11 -ip netns exec ns2 ip addr add 10.1.1.22/24 dev veth22 + if [ $ret1 -eq 0 -a $ret2 -eq 0 ]; then + echo "selftests: test_xdp_redirect $xdpmode [PASS]"; + else + ret=1 + echo "selftests: test_xdp_redirect $xdpmode [FAILED]"; + fi -ip netns exec ns1 ip link set dev veth11 up -ip netns exec ns2 ip link set dev veth22 up + cleanup +} -ip link set dev veth1 xdpgeneric obj test_xdp_redirect.o sec redirect_to_222 -ip link set dev veth2 xdpgeneric obj test_xdp_redirect.o sec redirect_to_111 +set -e +trap cleanup 2 3 6 9 -ip netns exec ns1 ping -c 1 10.1.1.22 -ip netns exec ns2 ping -c 1 10.1.1.11 +test_xdp_redirect xdpgeneric +test_xdp_redirect xdpdrv -exit 0 +exit $ret -- 2.25.4
Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
Hi Andrii, On Tue, Jul 28, 2020 at 10:13 PM CEST, Andrii Nakryiko wrote: [...] > We are getting this failure in Travis CI when syncing libbpf [0]: > > ``` > ip: either "local" is duplicate, or "nodad" is garbage > > switch_netns:PASS:unshare 0 nsec > > switch_netns:FAIL:system failed > > (/home/travis/build/libbpf/libbpf/travis-ci/vmtest/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:1310: > errno: No such file or directory) system(ip -6 addr add dev lo > fd00::1/128 nodad) > > #73 sk_lookup:FAIL > ``` > > > Can you please help fix it so that it works in a Travis CI environment > as well? For now I disabled sk_lookup selftests altogether. You can > try to repro it locally by forking https://github.com/libbpf/libbpf > and enabling Travis CI for your account. See [1] for the PR that > disabled sk_lookup. > > > [0] https://travis-ci.com/github/libbpf/libbpf/jobs/365878309#L5408 > [1] > https://github.com/libbpf/libbpf/pull/182/commits/78368c2eaed8b0681381fc34d6016c9b5a443be8 > > > Thanks for your help! "nodad is garbage" message smells like old iproute2. I will take a look. Thanks for letting me know.
Re: general protection fault in vsock_poll
syzbot has bisected this issue to: commit 408624af4c89989117bb2c6517bd50b7708a2fcd Author: Stefano Garzarella Date: Tue Dec 10 10:43:06 2019 + vsock: use local transport when it is loaded bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17e6489b10 start commit: 92ed3019 Linux 5.8-rc7 git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=1416489b10 console output: https://syzkaller.appspot.com/x/log.txt?x=1016489b10 kernel config: https://syzkaller.appspot.com/x/.config?x=84f076779e989e69 dashboard link: https://syzkaller.appspot.com/bug?extid=a61bac2fcc1a7c6623fe syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15930b6490 Reported-by: syzbot+a61bac2fcc1a7c662...@syzkaller.appspotmail.com Fixes: 408624af4c89 ("vsock: use local transport when it is loaded") For information about bisection process see: https://goo.gl/tpsmEJ#bisection
RE: [PATCH 2/4] net: make sockptr_is_null strict aliasing safe
On Wednesday 2020-07-29 10:04, David Laight wrote: >From: Christoph Hellwig >> Sent: 28 July 2020 17:39 >> >> While the kernel in general is not strict aliasing safe we can trivially >> do that in sockptr_is_null without affecting code generation, so always >> check the actually assigned union member. > >Even with 'strict aliasing' gcc (at least) guarantees that >the members of a union alias each other. >It is about the only way so safely interpret a float as an int. The only? float given; int i; memcpy(&i, &given, sizeof(i)); BUILD_BUG_ON(sizeof(i) > sizeof(given));
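Both idioms read the same bit pattern, and with GCC both are defined behaviour: type punning through a union is explicitly documented as allowed even under -fstrict-aliasing, and memcpy() always is. A standalone comparison, for illustration only:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        float given = 1.5f;
        uint32_t via_union, via_memcpy;

        /* Union type punning, the form David refers to. */
        union { float f; uint32_t u; } pun = { .f = given };
        via_union = pun.u;

        /* The memcpy() form from the reply; optimizing compilers emit the
         * same single load for it, so there is no runtime cost.
         */
        memcpy(&via_memcpy, &given, sizeof(via_memcpy));

        printf("union:  0x%08x\nmemcpy: 0x%08x\n", via_union, via_memcpy);
        return via_union != via_memcpy; /* exits 0: identical bits */
}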
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On 2020/7/28 下午5:04, Eli Cohen wrote: On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) +{ + struct vhost_virtqueue *vq = &v->vqs[qid]; + const struct vdpa_config_ops *ops = v->vdpa->config; + struct vdpa_device *vdpa = v->vdpa; + int ret, irq; + + spin_lock(&vq->call_ctx.ctx_lock); + irq = ops->get_vq_irq(vdpa, qid); + if (!vq->call_ctx.ctx || irq == -EINVAL) { + spin_unlock(&vq->call_ctx.ctx_lock); + return; + } + If I understand correctly, this will cause these IRQs to be forwarded directly to the VCPU, e.g. will be handled by the guest/qemu. Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. Does this mean that the host will not handle this interrupt? How does it work in case on level triggered interrupts? There's no guarantee that the KVM arch code can make sure the irq bypass work for any type of irq. So if they the irq will still need to be handled by host first. This means we should keep the host interrupt handler as a slowpath (fallback). In the case of ConnectX, I need to execute some code to acknowledge the interrupt. This turns out to be hard for irq bypassing to work. Is it because the irq is shared or what kind of ack you need to do? Thanks Can you explain how this should be done?
[PATCH net 0/6] mlxsw fixes
From: Ido Schimmel This patch set contains various fixes for mlxsw. Patches #1-#2 fix two trap related issues introduced in previous cycle. Patches #3-#5 fix rare use-after-frees discovered by syzkaller. After over a week of fuzzing with the fixes, the bugs did not reproduce. Patch #6 from Amit fixes an issue in the ethtool selftest that was recently discovered after running the test on a new platform that supports only 1Gbps and 10Gbps speeds. Amit Cohen (1): selftests: ethtool: Fix test when only two speeds are supported Ido Schimmel (5): mlxsw: spectrum_router: Allow programming link-local host routes mlxsw: spectrum: Use different trap group for externally routed packets mlxsw: core: Increase scope of RCU read-side critical section mlxsw: core: Free EMAD transactions using kfree_rcu() mlxsw: spectrum_router: Fix use-after-free in router init / de-init .../networking/devlink/devlink-trap.rst | 4 ++ drivers/net/ethernet/mellanox/mlxsw/core.c| 8 ++- drivers/net/ethernet/mellanox/mlxsw/reg.h | 1 + .../ethernet/mellanox/mlxsw/spectrum_router.c | 59 --- .../ethernet/mellanox/mlxsw/spectrum_trap.c | 14 - include/net/devlink.h | 3 + net/core/devlink.c| 1 + .../selftests/net/forwarding/ethtool.sh | 2 - 8 files changed, 51 insertions(+), 41 deletions(-) -- 2.26.2
[PATCH net 1/6] mlxsw: spectrum_router: Allow programming link-local host routes
From: Ido Schimmel Cited commit added the ability to program link-local prefix routes to the ASIC so that relevant packets are routed and trapped correctly. However, host routes were not included in the change and thus not programmed to the ASIC. This can result in packets being trapped via an external route trap instead of a local route trap as in IPv4. Fix this by programming all the link-local routes to the ASIC. Fixes: 10d3757fcb07 ("mlxsw: spectrum_router: Allow programming link-local prefix routes") Reported-by: Alex Veber Tested-by: Alex Veber Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 9 - 1 file changed, 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 019ed503aadf..bd4803074776 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -5001,15 +5001,6 @@ static void mlxsw_sp_router_fib4_del(struct mlxsw_sp *mlxsw_sp, static bool mlxsw_sp_fib6_rt_should_ignore(const struct fib6_info *rt) { - /* Packets with link-local destination IP arriving to the router -* are trapped to the CPU, so no need to program specific routes -* for them. Only allow prefix routes (usually one fe80::/64) so -* that packets are trapped for the right reason. -*/ - if ((ipv6_addr_type(&rt->fib6_dst.addr) & IPV6_ADDR_LINKLOCAL) && - (rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST))) - return true; - /* Multicast routes aren't supported, so ignore them. Neighbour * Discovery packets are specifically trapped. */ -- 2.26.2
[PATCH net 5/6] mlxsw: spectrum_router: Fix use-after-free in router init / de-init
From: Ido Schimmel Several notifiers are registered as part of router initialization. Since some of these notifiers are registered before the end of the initialization, it is possible for them to access uninitialized or freed memory when processing notifications [1]. Additionally, some of these notifiers queue work items on a workqueue. If these work items are executed after the router was de-initialized, they will access freed memory. Fix both problems by moving the registration of the notifiers to the end of the router initialization and flush the work queue after they are unregistered. [1] BUG: KASAN: use-after-free in __mutex_lock_common kernel/locking/mutex.c:938 [inline] BUG: KASAN: use-after-free in __mutex_lock+0xeea/0x1340 kernel/locking/mutex.c:1103 Read of size 8 at addr 888038c3a6e0 by task kworker/u4:1/61 CPU: 1 PID: 61 Comm: kworker/u4:1 Not tainted 5.8.0-rc2+ #36 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: mlxsw_core_ordered mlxsw_sp_inet6addr_event_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 __mutex_lock_common kernel/locking/mutex.c:938 [inline] __mutex_lock+0xeea/0x1340 kernel/locking/mutex.c:1103 mlxsw_sp_inet6addr_event_work+0xb3/0x1b0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7123 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 Allocated by task 1298: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:494 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467 kmalloc include/linux/slab.h:555 [inline] kzalloc include/linux/slab.h:669 [inline] mlxsw_sp_router_init+0xb2/0x1d20 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:8074 mlxsw_sp_init+0xbd8/0x3ac0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2932 __mlxsw_core_bus_device_register+0x657/0x10d0 drivers/net/ethernet/mellanox/mlxsw/core.c:1375 mlxsw_core_bus_device_register drivers/net/ethernet/mellanox/mlxsw/core.c:1436 [inline] mlxsw_devlink_core_bus_device_reload_up+0xcd/0x150 drivers/net/ethernet/mellanox/mlxsw/core.c:1133 devlink_reload net/core/devlink.c:2959 [inline] devlink_reload+0x281/0x3b0 net/core/devlink.c:2944 devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987 genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline] genl_family_rcv_msg net/netlink/genetlink.c:736 [inline] genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469 genl_rcv+0x24/0x40 net/netlink/genetlink.c:764 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0x150/0x190 net/socket.c:672 sys_sendmsg+0x6d8/0x840 net/socket.c:2363 ___sys_sendmsg+0xff/0x170 net/socket.c:2417 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 1348: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 
[inline] __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455 slab_free_hook mm/slub.c:1474 [inline] slab_free_freelist_hook mm/slub.c:1507 [inline] slab_free mm/slub.c:3072 [inline] kfree+0xe6/0x320 mm/slub.c:4063 mlxsw_sp_fini+0x340/0x4e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3132 mlxsw_core_bus_device_unregister+0x16c/0x6d0 drivers/net/ethernet/mellanox/mlxsw/core.c:1474 mlxsw_devlink_core_bus_device_reload_down+0x8e/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:1123 devlink_reload+0xc6/0x3b0 net/core/devlink.c:2952 devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987 genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline] genl_family_rcv_msg net/netlink/genetlink.c:736 [inline] genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469 genl_rcv+0x24/0x40 net/netlink/genetlink.c:764 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0x150/0x190 net/socket.c:672 sys_sendmsg+0x6d8/0x840 net/socket.c:2363 ___sys_sendmsg+0xff/0x170 net/socket.c:2417 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 entry_SYSCALL_64_after_hwframe
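Abstracted away from mlxsw, the ordering the fix enforces is: notifiers only go live once everything their handlers may dereference is initialized, and teardown unregisters them first and then drains the workqueue they feed before freeing anything. A schematic sketch in which all names are placeholders:

#include <linux/inetdevice.h>
#include <linux/notifier.h>
#include <linux/workqueue.h>

struct example_router {
        struct notifier_block inetaddr_nb;
        struct workqueue_struct *owq;   /* workqueue used by the notifier handler */
};

static int example_init_tables(struct example_router *r) { return 0; }  /* hypothetical */
static void example_fini_tables(struct example_router *r) { }           /* hypothetical */

static int example_router_init(struct example_router *r)
{
        int err;

        err = example_init_tables(r);   /* set up everything the handler touches */
        if (err)
                return err;

        /* Register last: from this point on events may arrive. */
        err = register_inetaddr_notifier(&r->inetaddr_nb);
        if (err)
                goto err_fini_tables;
        return 0;

err_fini_tables:
        example_fini_tables(r);
        return err;
}

static void example_router_fini(struct example_router *r)
{
        /* Unregister first, then flush work the handler may have queued,
         * and only then tear down what that work dereferences.
         */
        unregister_inetaddr_notifier(&r->inetaddr_nb);
        flush_workqueue(r->owq);
        example_fini_tables(r);
}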
[PATCH net 4/6] mlxsw: core: Free EMAD transactions using kfree_rcu()
From: Ido Schimmel The lifetime of EMAD transactions (i.e., 'struct mlxsw_reg_trans') is managed using RCU. They are freed using kfree_rcu() once the transaction ends. However, in case the transaction failed it is freed immediately after being removed from the active transactions list. This is problematic because it is still possible for a different CPU to dereference the transaction from an RCU read-side critical section while traversing the active transaction list in mlxsw_emad_rx_listener_func(). In which case, a use-after-free is triggered [1]. Fix this by freeing the transaction after a grace period by calling kfree_rcu(). [1] BUG: KASAN: use-after-free in mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671 Read of size 8 at addr 88800b7964e8 by task syz-executor.2/2881 CPU: 0 PID: 2881 Comm: syz-executor.2 Not tainted 5.8.0-rc4+ #44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671 mlxsw_core_skb_receive+0x571/0x700 drivers/net/ethernet/mellanox/mlxsw/core.c:2061 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline] mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550 __do_softirq+0x223/0x964 kernel/softirq.c:292 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] do_softirq_own_stack+0x109/0x140 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:387 [inline] __irq_exit_rcu kernel/softirq.c:417 [inline] irq_exit_rcu+0x16f/0x1a0 kernel/softirq.c:429 sysvec_apic_timer_interrupt+0x4e/0xd0 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/irqflags.h:85 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0x3b/0x40 kernel/locking/spinlock.c:191 Code: e8 2a c3 f4 fc 48 89 ef e8 12 96 f5 fc f6 c7 02 75 11 53 9d e8 d6 db 11 fd 65 ff 0d 1f 21 b3 56 5b 5d c3 e8 a7 d7 11 fd 53 9d ed 0f 1f 00 55 48 89 fd 65 ff 05 05 21 b3 56 ff 74 24 08 48 8d RSP: 0018:8880446ffd80 EFLAGS: 0286 RAX: 0006 RBX: 0286 RCX: 0006 RDX: RSI: RDI: a94ecea9 RBP: 888012934408 R08: R09: R10: 0001 R11: fbfff57be301 R12: 1110088dffc1 R13: 888037b817c0 R14: 88802442415a R15: 888024424000 __do_sys_perf_event_open+0x1b5d/0x2bd0 kernel/events/core.c:11874 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x473dbd Code: Bad RIP value. 
RSP: 002b:7f21e5e9cc28 EFLAGS: 0246 ORIG_RAX: 012a RAX: ffda RBX: 0057bf00 RCX: 00473dbd RDX: RSI: RDI: 2040 RBP: 0057bf00 R08: R09: R10: 0003 R11: 0246 R12: 0057bf0c R13: 7ffd0493503f R14: 004d0f46 R15: 7f21e5e9cd80 Allocated by task 871: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:494 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467 kmalloc include/linux/slab.h:555 [inline] kzalloc include/linux/slab.h:669 [inline] mlxsw_core_reg_access_emad+0x70/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c:1812 mlxsw_core_reg_access+0xeb/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1991 mlxsw_sp_port_get_hw_xstats+0x335/0x7e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1130 update_stats_cache+0xf4/0x140 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1173 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 Freed by task 871: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455 slab_free_hook mm/slub.c:1474 [inline] slab_free_freelist_hook mm/slub.c:1507 [inline] slab_free mm/slub.c:3072 [inline] kfree+0xe6/0x320 mm/slub.c:4052 mlxsw_core_reg_access_emad+0xd45/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c
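Reduced to the generic pattern, the fix is: once readers can reach an object under rcu_read_lock(), even the error path must defer its release across a grace period. Sketch with placeholder names (writer-side locking of the list omitted):

#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct emad_trans {                     /* stand-in for struct mlxsw_reg_trans */
        struct list_head list;          /* linked into the RCU-traversed active list */
        struct rcu_head rcu;
};

static void emad_trans_destroy(struct emad_trans *trans)
{
        /* Unlink so new readers cannot find the transaction... */
        list_del_rcu(&trans->list);
        /* ...but a reader already inside rcu_read_lock() may still be
         * looking at it, so reclaim only after a grace period. An
         * immediate kfree() here is the use-after-free being fixed.
         */
        kfree_rcu(trans, rcu);
}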
[PATCH net 6/6] selftests: ethtool: Fix test when only two speeds are supported
From: Amit Cohen The test case check_highest_speed_is_chosen() configures $h1 to advertise a subset of its supported speeds and checks that $h2 chooses the highest speed from the subset. To find the common advertised speeds between $h1 and $h2, common_speeds_get() is called. Currently, the first speed returned from common_speeds_get() is removed claiming "h1 does not advertise this speed". The claim is wrong because the function is called after $h1 already advertised a subset of speeds. In case $h1 supports only two speeds, it will advertise a single speed which will be later removed because of previously mentioned bug. This results in the test needlessly failing. When more than two speeds are supported this is not an issue because the first advertised speed is the lowest one. Fix this by not removing any speed from the list of commonly advertised speeds. Fixes: 64916b57c0b1 ("selftests: forwarding: Add speed and auto-negotiation test") Reported-by: Danielle Ratson Signed-off-by: Amit Cohen Signed-off-by: Ido Schimmel --- tools/testing/selftests/net/forwarding/ethtool.sh | 2 -- 1 file changed, 2 deletions(-) diff --git a/tools/testing/selftests/net/forwarding/ethtool.sh b/tools/testing/selftests/net/forwarding/ethtool.sh index eb8e2a23bbb4..43a948feed26 100755 --- a/tools/testing/selftests/net/forwarding/ethtool.sh +++ b/tools/testing/selftests/net/forwarding/ethtool.sh @@ -252,8 +252,6 @@ check_highest_speed_is_chosen() fi local -a speeds_arr=($(common_speeds_get $h1 $h2 0 1)) - # Remove the first speed, h1 does not advertise this speed. - unset speeds_arr[0] max_speed=${speeds_arr[0]} for current in ${speeds_arr[@]}; do -- 2.26.2
[PATCH net 2/6] mlxsw: spectrum: Use different trap group for externally routed packets
From: Ido Schimmel Cited commit mistakenly removed the trap group for externally routed packets (e.g., via the management interface) and grouped locally routed and externally routed packet traps under the same group, thereby subjecting them to the same policer. This can result in problems, for example, when FRR is restarted and suddenly all transient traffic is trapped to the CPU because of a default route through the management interface. Locally routed packets required to re-establish a BGP connection will never reach the CPU and the routing tables will not be re-populated. Fix this by using a different trap group for externally routed packets. Fixes: 8110668ecd9a ("mlxsw: spectrum_trap: Register layer 3 control traps") Reported-by: Alex Veber Tested-by: Alex Veber Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- Documentation/networking/devlink/devlink-trap.rst | 4 drivers/net/ethernet/mellanox/mlxsw/reg.h | 1 + .../net/ethernet/mellanox/mlxsw/spectrum_trap.c| 14 +++--- include/net/devlink.h | 3 +++ net/core/devlink.c | 1 + 5 files changed, 20 insertions(+), 3 deletions(-) diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst index 1e3f3ffee248..2014307fbe63 100644 --- a/Documentation/networking/devlink/devlink-trap.rst +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -486,6 +486,10 @@ narrow. The description of these groups must be added to the following table: - Contains packet traps for packets that should be locally delivered after routing, but do not match more specific packet traps (e.g., ``ipv4_bgp``) + * - ``external_delivery`` + - Contains packet traps for packets that should be routed through an + external interface (e.g., management interface) that does not belong to + the same device (e.g., switch ASIC) as the ingress interface * - ``ipv6`` - Contains packet traps for various IPv6 control packets (e.g., Router Advertisements) diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index fcb88d4271bf..8ac987c8c8bc 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -5536,6 +5536,7 @@ enum mlxsw_reg_htgt_trap_group { MLXSW_REG_HTGT_TRAP_GROUP_SP_MULTICAST, MLXSW_REG_HTGT_TRAP_GROUP_SP_NEIGH_DISCOVERY, MLXSW_REG_HTGT_TRAP_GROUP_SP_ROUTER_EXP, + MLXSW_REG_HTGT_TRAP_GROUP_SP_EXTERNAL_ROUTE, MLXSW_REG_HTGT_TRAP_GROUP_SP_IP2ME, MLXSW_REG_HTGT_TRAP_GROUP_SP_DHCP, MLXSW_REG_HTGT_TRAP_GROUP_SP_EVENT, diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c index 157a42c63066..1e38dfe7cf64 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c @@ -328,6 +328,9 @@ mlxsw_sp_trap_policer_items_arr[] = { { .policer = MLXSW_SP_TRAP_POLICER(18, 1024, 128), }, + { + .policer = MLXSW_SP_TRAP_POLICER(19, 1024, 512), + }, }; static const struct mlxsw_sp_trap_group_item mlxsw_sp_trap_group_items_arr[] = { @@ -421,6 +424,11 @@ static const struct mlxsw_sp_trap_group_item mlxsw_sp_trap_group_items_arr[] = { .hw_group_id = MLXSW_REG_HTGT_TRAP_GROUP_SP_IP2ME, .priority = 2, }, + { + .group = DEVLINK_TRAP_GROUP_GENERIC(EXTERNAL_DELIVERY, 19), + .hw_group_id = MLXSW_REG_HTGT_TRAP_GROUP_SP_EXTERNAL_ROUTE, + .priority = 1, + }, { .group = DEVLINK_TRAP_GROUP_GENERIC(IPV6, 15), .hw_group_id = MLXSW_REG_HTGT_TRAP_GROUP_SP_IPV6, @@ -882,11 +890,11 @@ static const struct mlxsw_sp_trap_item 
mlxsw_sp_trap_items_arr[] = { }, }, { - .trap = MLXSW_SP_TRAP_CONTROL(EXTERNAL_ROUTE, LOCAL_DELIVERY, + .trap = MLXSW_SP_TRAP_CONTROL(EXTERNAL_ROUTE, EXTERNAL_DELIVERY, TRAP), .listeners_arr = { - MLXSW_SP_RXL_MARK(RTR_INGRESS0, IP2ME, TRAP_TO_CPU, - false), + MLXSW_SP_RXL_MARK(RTR_INGRESS0, EXTERNAL_ROUTE, + TRAP_TO_CPU, false), }, }, { diff --git a/include/net/devlink.h b/include/net/devlink.h index 1df6dfec26c2..95b0322a2a82 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -718,6 +718,7 @@ enum devlink_trap_group_generic_id { DEVLINK_TRAP_GROUP_GENERIC_ID_PIM, DEVLINK_TRAP_GROUP_GENERIC_ID_UC_LB, DEVLINK_TRAP_GROUP_GENERIC_ID_LOCAL_DELIVERY, + DEVLINK_TRAP_GROUP_GENERIC_ID_EXTERNAL_DELIVERY, DEVLI
[PATCH net 3/6] mlxsw: core: Increase scope of RCU read-side critical section
From: Ido Schimmel The lifetime of the Rx listener item ('rxl_item') is managed using RCU, but is dereferenced outside of RCU read-side critical section, which can lead to a use-after-free. Fix this by increasing the scope of the RCU read-side critical section. Fixes: 93c1edb27f9e ("mlxsw: Introduce Mellanox switch driver core") Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/core.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index d6d6fe64887b..5e76a96a118e 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -2051,11 +2051,13 @@ void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb, break; } } - rcu_read_unlock(); - if (!found) + if (!found) { + rcu_read_unlock(); goto drop; + } rxl->func(skb, local_port, rxl_item->priv); + rcu_read_unlock(); return; drop: -- 2.26.2
Re: [PATCH v2 0/9] ptp: Add generic header parsing function
On Tue Jul 28 2020, David Miller wrote: > It looks like some mlxsw et al. issues wrt. which header is expected at > skb->data when certain helper functions are invoked need to be resolved > still. Yes, the length check needs to be sorted out first. So, please don't merge this version. Thanks, Kurt
Re: [net-next PATCH v7 1/6] Documentation: ACPI: DSD: Document MDIO PHY
On Wed, Jul 29, 2020 at 10:43:34AM +0200, Jon Nettleton wrote: > On Wed, Jul 29, 2020 at 4:53 AM Jeremy Linton wrote: > > > > Hi, > > > > On 7/28/20 7:39 PM, Florian Fainelli wrote: > > > On 7/28/2020 3:30 PM, Jeremy Linton wrote: > > >> Hi, > > >> > > >> On 7/28/20 3:06 AM, Dan Callaghan wrote: > > >>> Excerpts from Andrew Lunn's message of 2020-07-24 21:14:36 +02:00: > > Now i could be wrong, but are Ethernet switches something you expect > > to see on ACPI/SBSA platforms? Or is this a legitimate use of the > > escape hatch? > > >>> > > >>> As an extra data point: right now I am working on an x86 embedded > > >>> appliance (ACPI not Device Tree) with 3x integrated Marvell switches. > > >>> I have been watching this patch series with great interest, because > > >>> right now there is no way for me to configure a complex switch topology > > >>> in DSA without Device Tree. > > >> > > >> DSA though, the switch is hung off a normal MAC/MDIO, right? (ignoring > > >> whether that NIC/MAC is actually hug off PCIe for the moment). > > > > > > There is no specific bus, we have memory mapped, MDIO, SPI, I2C swiches > > > all supported within the driver framework right now. > > > > > >> > > >> It just has a bunch of phy devices strung out on that single MAC/MDIO. > > > > > > It has a number of built-in PHYs that typically appear on a MDIO bus, > > > whether that bus is the switch's internal MDIO bus, or another MDIO bus > > > (which could be provided with just two GPIOs) depends on how the switch > > > is connected to its management host. > > > > > > When the switch is interfaced via MDIO the switch also typically has a > > > MDIO interface called the pseudo-PHY which is how you can actually tap > > > into the control interface of the switch, as opposed to reading its > > > internal PHYs from the MDIO bus. > > > > > >> So in ACPI land it would still have a relationship similar to the one > > >> Andrew pointed out with his DT example where the eth0->mdio->phy are all > > >> contained in their physical parent. The phy in that case associated with > > >> the parent adapter would be the first direct decedent of the mdio, the > > >> switch itself could then be represented with another device, with a > > >> further string of device/phys representing the devices. (I dislike > > >> drawing acsii art, but if this isn't clear I will, its also worthwhile > > >> to look at the dpaa2 docs for how the mac/phys work on this device for > > >> contrast.). > > > > > > The eth0->mdio->phy relationship you describe is the simple case that > > > you are well aware of which is say what we have on the Raspberry Pi 4 > > > with GENET and the external Broadcom PHY. > > > > > > For an Ethernet switch connected to an Ethernet MAC, we have 4 different > > > types of objects: > > > > > > - the Ethernet MAC which sits on its specific bus > > > > > > - the Ethernet switch which also sits on its specific bus > > > > > > - the built-in PHYs of the Ethernet switch which sit on whatever > > > bus/interface the switch provides to make them accessible > > > > > > - the specific bus controller that provides access to the Ethernet switch > > > > > > and this is a simplification that does not take into account Physical > > > Coding Sublayer devices, pure MDIO devices (with no foot in the Ethernet > > > land such as PCIe, USB3 or SATA PHYs), SFP, SFF and other pluggable > > > modules. > > > > Which is why I've stayed away from much of the switch discussion. 
There > > are a lot of edge cases to fall into, because for whatever reason > > networking seems to be unique in all this non-enumerable customization > > vs many other areas of the system. Storage, being an example i'm more > > familiar with which has very similar problems yet, somehow has managed > > to avoid much of this, despite having run on fabrics significantly more > > complex than basic ethernet (including running on a wide range of hot > > pluggable GBIC/SFP/QSFP/etc media layers). > > > > ACPI's "problem" here is that its strongly influenced by the "Plug and > > Play" revolution of the 1990's where the industry went from having > > humans describing hardware using machine readable languages, to hardware > > which was enumerable using standard protocols. ACPI's device > > descriptions are there as a crutch for the remaining non plug an play > > hardware in the system. > > > > So at a basic level, if your describing hardware in ACPI rather than > > abstracting it, that is a problem. > > > This is also my first run with ACPI and this discussion needs to be > brought back to the powers that be regarding sorting this out. This > is where I see it from an Armada and Layerscape perspective. This > isn't a silver bullet fix but the small things I think that need to be > done to move this forward. > > From Microsoft's documentation. > > Device dependencies > > Typically, there are hardware dependencies between devices on a > particular platform. Windows requires that all such dep
Re: general protection fault in vsock_poll
On Wed, Jul 29, 2020 at 01:59:05AM -0700, syzbot wrote: > syzbot has bisected this issue to: > > commit 408624af4c89989117bb2c6517bd50b7708a2fcd > Author: Stefano Garzarella > Date: Tue Dec 10 10:43:06 2019 + > > vsock: use local transport when it is loaded > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17e6489b10 > start commit: 92ed3019 Linux 5.8-rc7 > git tree: upstream > final oops: https://syzkaller.appspot.com/x/report.txt?x=1416489b10 > console output: https://syzkaller.appspot.com/x/log.txt?x=1016489b10 > kernel config: https://syzkaller.appspot.com/x/.config?x=84f076779e989e69 > dashboard link: https://syzkaller.appspot.com/bug?extid=a61bac2fcc1a7c6623fe > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15930b6490 > > Reported-by: syzbot+a61bac2fcc1a7c662...@syzkaller.appspotmail.com > Fixes: 408624af4c89 ("vsock: use local transport when it is loaded") > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > I'll take a look. At first glance it seems strange, because if sk_state is TCP_ESTABLISHED, the transport shouldn't be NULL, that's why we didn't put the check. Stefano
[PATCH net-next v2] net: mvneta: fix comment about phylink_speed_down
mvneta has switched to phylink, so the comment should look like "We may have called phylink_speed_down before". Signed-off-by: Jisheng Zhang --- Since v1: - drop patch2 which tries to avoid link flapping when changing mtu I need more time on the change mtu refactoring. drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 2c9277e73cef..c9b6b0f85bb0 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -3637,7 +3637,7 @@ static void mvneta_start_dev(struct mvneta_port *pp) phylink_start(pp->phylink); - /* We may have called phy_speed_down before */ + /* We may have called phylink_speed_down before */ phylink_speed_up(pp->phylink); netif_tx_start_all_queues(pp->dev); -- 2.28.0.rc0
Re: [PATCH net-nex 2/2] net: mvneta: Don't speed down the PHY when changing mtu
Hi David, On Tue, 28 Jul 2020 17:52:02 -0700 (PDT) David Miller wrote: > > > > @@ -3651,7 +3651,8 @@ static void mvneta_stop_dev(struct mvneta_port *pp) > > > > set_bit(__MVNETA_DOWN, &pp->state); > > > > - if (device_may_wakeup(&pp->dev->dev)) > > + if (device_may_wakeup(&pp->dev->dev) && > > + pp->pkt_size == MVNETA_RX_PKT_SIZE(pp->dev->mtu)) > > phylink_speed_down(pp->phylink, false); > > > > This is too much for me. > > You shouldn't have to shut down the entire device and take it back up > again just to change the MTU. > > Unfortunately, this is a common pattern in many drivers and it is very > dangerous to take this lazy path of just doing "stop/start" around > the MTU change. > > It means you can't recover from partial failures properly, > f.e. recovering from an inability to allocate queue resources for the > new MTU. > > To solve this properly, you must restructure the MTU change such that > is specifically stops the necessary and only the units of the chip > necessary to change the MTU. > > It should next try to allocate the necessary resources to satisfy the > MTU change, keeping the existing resources allocated in case of > failure. > > Then, only is all resources are successfully allocated, it should > commit the MTU change fully and without errors. > > Then none of these link flapping issues are even possible. Thanks a lot for pointing out the correct direction. Refactoring change mtu method needs more time(maybe for linux-5.10 is reasonable), so I just drop patch2 in v2.
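As a rough illustration of the structure David describes (allocate the new resources first, commit only on success), a sketch with a hypothetical foo driver and made-up ring helpers; this is not mvneta code:

#include <linux/errno.h>
#include <linux/netdevice.h>

struct foo_rx_ring;

struct foo_priv {
	struct foo_rx_ring *rx_rings;
};

/* Hypothetical helpers standing in for the driver's real ring management. */
struct foo_rx_ring *foo_alloc_rx_rings(struct foo_priv *priv, int new_mtu);
void foo_free_rx_rings(struct foo_rx_ring *rings);
void foo_stop_rx(struct foo_priv *priv);
void foo_start_rx(struct foo_priv *priv, struct foo_rx_ring *rings);

static int foo_change_mtu(struct net_device *dev, int new_mtu)
{
	struct foo_priv *priv = netdev_priv(dev);
	struct foo_rx_ring *old_rings = priv->rx_rings;
	struct foo_rx_ring *new_rings;

	/* Allocate resources for the new MTU while the old ones are
	 * still in place; on failure the device keeps running with its
	 * previous configuration and the link never flaps.
	 */
	new_rings = foo_alloc_rx_rings(priv, new_mtu);
	if (!new_rings)
		return -ENOMEM;

	foo_stop_rx(priv);		/* quiesce only what must change */
	priv->rx_rings = new_rings;
	dev->mtu = new_mtu;
	foo_start_rx(priv, new_rings);

	foo_free_rx_rings(old_rings);
	return 0;
}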
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote: > > On 2020/7/28 下午5:04, Eli Cohen wrote: > >On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: > >>+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) > >>+{ > >>+ struct vhost_virtqueue *vq = &v->vqs[qid]; > >>+ const struct vdpa_config_ops *ops = v->vdpa->config; > >>+ struct vdpa_device *vdpa = v->vdpa; > >>+ int ret, irq; > >>+ > >>+ spin_lock(&vq->call_ctx.ctx_lock); > >>+ irq = ops->get_vq_irq(vdpa, qid); > >>+ if (!vq->call_ctx.ctx || irq == -EINVAL) { > >>+ spin_unlock(&vq->call_ctx.ctx_lock); > >>+ return; > >>+ } > >>+ > >If I understand correctly, this will cause these IRQs to be forwarded > >directly to the VCPU, e.g. will be handled by the guest/qemu. > > > Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. > So, usually the network driver knows how to handle interrups for its devices. I assume the virtio_net driver at the guest has some default processing but what if the underlying hardware device (such as the case of vdpa) needs to take some actions? Is there an option to do bounce the interrupt back to the vendor specific driver in the host so it can take these actions? > > >Does this mean that the host will not handle this interrupt? How does it > >work in case on level triggered interrupts? > > > There's no guarantee that the KVM arch code can make sure the irq > bypass work for any type of irq. So if they the irq will still need > to be handled by host first. This means we should keep the host > interrupt handler as a slowpath (fallback). > > > > >In the case of ConnectX, I need to execute some code to acknowledge the > >interrupt. > > > This turns out to be hard for irq bypassing to work. Is it because > the irq is shared or what kind of ack you need to do? I have an EQ which is a queue for events comming from the hardware. This EQ can created so it reports only completion events but I still need to execute code that roughly tells the device that I saw these event records and then arm it again so it can report more interrupts (e.g if more packets are received or sent). This is device specific code. > > Thanks > > > > > >Can you explain how this should be done? > > >
[PATCH v1 0/3] net: ethernet: use generic power management
Linux Kernel Mentee: Remove Legacy Power Management. The purpose of this patch series is to upgrade power management in net ethernet drivers. This has been done by upgrading .suspend() and .resume() callbacks. The upgrade makes sure that the involvement of PCI Core does not change the order of operations executed in a driver. Thus, does not change its behavior. In general, drivers with legacy PM, .suspend() and .resume() make use of PCI helper functions like pci_enable/disable_device_mem(), pci_set_power_state(), pci_save/restore_state(), pci_enable/disable_device(), etc. to complete their job. The conversion requires the removal of those function calls, change the callbacks' definition accordingly and make use of dev_pm_ops structure. All patches are compile-tested only. Test tools: - Compiler: gcc (GCC) 10.1.0 - allmodconfig build: make -j$(nproc) W=1 all Vaibhav Gupta (3): sc92031: use generic power management sis900: use generic power management tlan: use generic power management drivers/net/ethernet/silan/sc92031.c | 26 --- drivers/net/ethernet/sis/sis900.c| 23 +++-- drivers/net/ethernet/ti/tlan.c | 31 ++-- 3 files changed, 22 insertions(+), 58 deletions(-) -- 2.27.0
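The conversion pattern the cover letter describes, sketched for a hypothetical foo driver (the real patches touch sc92031, sis900 and tlan): the legacy callbacks and their PCI helper calls go away, and a dev_pm_ops structure is wired up instead, leaving power-state handling to the PCI core.

#include <linux/netdevice.h>
#include <linux/pci.h>
#include <linux/pm.h>

static int __maybe_unused foo_suspend(struct device *dev)
{
	/* assumes probe did pci_set_drvdata(pdev, ndev) */
	struct net_device *ndev = dev_get_drvdata(dev);

	/* Device-specific quiescing only; no pci_save_state(),
	 * pci_set_power_state() or pci_disable_device() calls here.
	 */
	netif_device_detach(ndev);
	return 0;
}

static int __maybe_unused foo_resume(struct device *dev)
{
	struct net_device *ndev = dev_get_drvdata(dev);

	netif_device_attach(ndev);
	return 0;
}

static SIMPLE_DEV_PM_OPS(foo_pm_ops, foo_suspend, foo_resume);

static struct pci_driver foo_driver = {
	.name		= "foo",
	/* legacy .suspend/.resume hooks removed */
	.driver.pm	= &foo_pm_ops,
};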
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Tue, Jul 28, 2020 at 11:06:08PM +0200, Petr Machata wrote: > > Kurt Kanzenbach writes: > > > On Mon Jul 27 2020, Petr Machata wrote: > >> So this looks good, and works, but I'm wondering about one thing. > > > > Thanks for testing. > > > >> > >> Your code (and evidently most drivers as well) use a different check > >> than mlxsw, namely skb->len + ETH_HLEN < X. When I print_hex_dump() > >> skb_mac_header(skb), skb->len in mlxsw with some test packet, I get e.g. > >> this: > >> > >> 259a4db7: 01 00 5e 00 01 81 00 02 c9 a4 e4 e1 08 00 45 00 > >> ..^...E. > >> 5f29f0eb: 00 48 0d c9 40 00 01 11 c8 59 c0 00 02 01 e0 00 > >> .H..@Y.. > >> f3663e9e: 01 81 01 3f 01 3f 00 34 9f d3 00 02 00 2c 00 00 > >> ...?.?.4.,.. > >> ^sp^^ ^dp^^ ^len^ ^cks^ ^len^ > >> b3914606: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 > >> > >> 2e7828ea: c9 ff fe a4 e4 e1 00 01 09 fa 00 00 00 00 00 00 > >> > >> 0b98156e: 00 00 00 00 00 00 > >> .. > >> > >> Both UDP and PTP length fields indicate that the payload ends exactly at > >> the end of the dump. So apparently skb->len contains all the payload > >> bytes, including the Ethernet header. > >> > >> Is that the case for other drivers as well? Maybe mlxsw is just missing > >> some SKB magic in the driver. > > > > So I run some tests (on other hardware/drivers) and it seems like that > > the skb->len usually doesn't include the ETH_HLEN. Therefore, it is > > added to the check. > > > > Looking at the driver code: > > > > |static void mlxsw_sp_rx_sample_listener(struct sk_buff *skb, u8 local_port, > > | void *trap_ctx) > > |{ > > | [...] > > | /* The sample handler expects skb->data to point to the start of the > > |* Ethernet header. > > |*/ > > | skb_push(skb, ETH_HLEN); > > | mlxsw_sp_sample_receive(mlxsw_sp, skb, local_port); > > |} > > > > Maybe that's the issue here? > > Correct, mlxsw pushes the header very soon. Given that both > ptp_classify_raw() and eth_type_trans() that are invoked later assume > the header, it is reasonable. I have shuffled the pushes around and have > a patch that both works and I think is correct. Would it make more sense to do: u8 *data = skb_mac_header(skb); u8 *ptr = data; if (type & PTP_CLASS_VLAN) ptr += VLAN_HLEN; switch (type & PTP_CLASS_PMASK) { case PTP_CLASS_IPV4: ptr += IPV4_HLEN(ptr) + UDP_HLEN; break; case PTP_CLASS_IPV6: ptr += IP6_HLEN + UDP_HLEN; break; case PTP_CLASS_L2: break; default: return NULL; } ptr += ETH_HLEN; if (ptr + 34 > skb->data + skb->len) return NULL; return ptr; in other words, compare pointers, so that whether skb_push() etc has been used on the skb is irrelevant? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
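For clarity, Russell's suggestion written out as a self-contained helper; this is a sketch of the idea under discussion, not the final v3 patch:

#include <linux/if_ether.h>
#include <linux/if_vlan.h>
#include <linux/ptp_classify.h>
#include <linux/skbuff.h>

static u8 *foo_ptp_header(struct sk_buff *skb, unsigned int type)
{
	u8 *ptr = skb_mac_header(skb);

	if (type & PTP_CLASS_VLAN)
		ptr += VLAN_HLEN;

	switch (type & PTP_CLASS_PMASK) {
	case PTP_CLASS_IPV4:
		ptr += IPV4_HLEN(ptr) + UDP_HLEN;
		break;
	case PTP_CLASS_IPV6:
		ptr += IP6_HLEN + UDP_HLEN;
		break;
	case PTP_CLASS_L2:
		break;
	default:
		return NULL;
	}

	ptr += ETH_HLEN;

	/* Compare against the true end of the packet data, so the
	 * result does not depend on whether the driver already did
	 * skb_push(skb, ETH_HLEN) or not.  34 bytes covers the common
	 * PTP header, as in the snippet above.
	 */
	if (ptr + 34 > skb->data + skb->len)
		return NULL;

	return ptr;
}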
Re: [PATCH] net: ethernet: fix potential memory leak in gemini_ethernet_port_probe()
On Wed, 29 Jul 2020 11:46:06 +0800 Lu Wei wrote: > > > If some processes in gemini_ethernet_port_probe() fail, > free_netdev(dev) needs to be called to avoid a memory leak. Using devm_alloc_etherdev_mqs() would be much simpler > > Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit > ethernet") > Reported-by: Hulk Robot > Signed-off-by: Lu Wei > --- > drivers/net/ethernet/cortina/gemini.c | 24 +++- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/cortina/gemini.c > b/drivers/net/ethernet/cortina/gemini.c > index 8d13ea370db1..5e93a1a570b6 100644 > --- a/drivers/net/ethernet/cortina/gemini.c > +++ b/drivers/net/ethernet/cortina/gemini.c > @@ -2407,37 +2407,48 @@ static int gemini_ethernet_port_probe(struct > platform_device *pdev) > dmares = platform_get_resource(pdev, IORESOURCE_MEM, 0); > if (!dmares) { > dev_err(dev, "no DMA resource\n"); > + free_netdev(netdev); > return -ENODEV; > } > port->dma_base = devm_ioremap_resource(dev, dmares); > - if (IS_ERR(port->dma_base)) > + if (IS_ERR(port->dma_base)) { > + free_netdev(netdev); > return PTR_ERR(port->dma_base); > + } > > /* GMAC config memory */ > gmacres = platform_get_resource(pdev, IORESOURCE_MEM, 1); > if (!gmacres) { > dev_err(dev, "no GMAC resource\n"); > + free_netdev(netdev); > return -ENODEV; > } > port->gmac_base = devm_ioremap_resource(dev, gmacres); > - if (IS_ERR(port->gmac_base)) > + if (IS_ERR(port->gmac_base)) { > + free_netdev(netdev); > return PTR_ERR(port->gmac_base); > + } > > /* Interrupt */ > irq = platform_get_irq(pdev, 0); > - if (irq <= 0) > + if (irq <= 0) { > + free_netdev(netdev); > return irq ? irq : -ENODEV; > + } > port->irq = irq; > > /* Clock the port */ > port->pclk = devm_clk_get(dev, "PCLK"); > if (IS_ERR(port->pclk)) { > dev_err(dev, "no PCLK\n"); > + free_netdev(netdev); > return PTR_ERR(port->pclk); > } > ret = clk_prepare_enable(port->pclk); > - if (ret) > + if (ret) { > + free_netdev(netdev); > return ret; > + } > > /* Maybe there is a nice ethernet address we should use */ > gemini_port_save_mac_addr(port); > @@ -2446,6 +2457,7 @@ static int gemini_ethernet_port_probe(struct > platform_device *pdev) > port->reset = devm_reset_control_get_exclusive(dev, NULL); > if (IS_ERR(port->reset)) { > dev_err(dev, "no reset\n"); > + free_netdev(netdev); > return PTR_ERR(port->reset); > } > reset_control_reset(port->reset); > @@ -2501,8 +2513,10 @@ static int gemini_ethernet_port_probe(struct > platform_device *pdev) > IRQF_SHARED, > port_names[port->id], > port); > - if (ret) > + if (ret) { > + free_netdev(netdev); > return ret; > + } > > ret = register_netdev(netdev); > if (!ret) { > -- > 2.17.1 > > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Darm-2Dkernel&d=DwICAg&c=7dfBJ8cXbWjhc0BhImu8wQ&r=wlaKTGoVCDxOzHc2QUzpzGEf9oY3eidXlAe3OF1omvo&m=56ruRjtNY-BIzquRyoO0KSbrR7UBB81VfqotT_rfFus&s=Axiqv0SZYKFXgMc1zJLilZCk9wbRAt4LkKtW6VjKTgw&e=
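A sketch of Jakub's suggestion for contrast (hypothetical foo_port driver, not the gemini code): with a device-managed netdev allocation the probe error paths need no explicit free_netdev() at all.

#include <linux/err.h>
#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/platform_device.h>

struct foo_port {
	void __iomem *dma_base;
};

static int foo_port_probe(struct platform_device *pdev)
{
	struct net_device *netdev;
	struct foo_port *port;

	/* Freed automatically if probe fails or the device is unbound. */
	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(*port));
	if (!netdev)
		return -ENOMEM;

	port = netdev_priv(netdev);

	port->dma_base = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(port->dma_base))
		return PTR_ERR(port->dma_base);	/* no free_netdev() needed */

	/* remove() must still call unregister_netdev(); only the
	 * allocation itself is device-managed here.
	 */
	return register_netdev(netdev);
}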
Re: [PATCH net-next v2] net: mvneta: fix comment about phylink_speed_down
On Wed, Jul 29, 2020 at 05:49:09PM +0800, Jisheng Zhang wrote: > mvneta has switched to phylink, so the comment should look > like "We may have called phylink_speed_down before". > > Signed-off-by: Jisheng Zhang Reviewed-by: Russell King > --- > Since v1: > - drop patch2 which tries to avoid link flapping when changing mtu > I need more time on the change mtu refactoring. > > drivers/net/ethernet/marvell/mvneta.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index 2c9277e73cef..c9b6b0f85bb0 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -3637,7 +3637,7 @@ static void mvneta_start_dev(struct mvneta_port *pp) > > phylink_start(pp->phylink); > > - /* We may have called phy_speed_down before */ > + /* We may have called phylink_speed_down before */ > phylink_speed_up(pp->phylink); > > netif_tx_start_all_queues(pp->dev); > -- > 2.28.0.rc0 > > -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH v1] farsync: use generic power management
On Tue, Jul 28, 2020 at 03:04:13PM -0500, Bjorn Helgaas wrote: > On Tue, Jul 28, 2020 at 09:58:10AM +0530, Vaibhav Gupta wrote: > > The .suspend() and .resume() callbacks are not defined for this driver. > > Still, their power management structure follows the legacy framework. To > > bring it under the generic framework, simply remove the binding of > > callbacks from "struct pci_driver". > > FWIW, this commit log is slightly misleading because .suspend and > .resume are NULL by default, so this patch actually is a complete > no-op as far as code generation is concerned. > > This change is worthwhile because it simplifies the code a little, but > it doesn't convert the driver from legacy to generic power management. > This driver doesn't supply a .pm structure, so it doesn't seem to do > *any* power management. > Agreed. Actually, as their presence only causes the PCI core to call pci_legacy_suspend/resume() for them, I thought that after removing the binding from "struct pci_driver", this driver qualifies to be grouped under the generic framework, so I used "use generic power management" for the heading. I should have written "remove legacy binding". But David has applied the patch, so should I send a v2 or a fix to update the message? Thanks Vaibhav Gupta > > Change code indentation from space to tab in "struct pci_driver". > >
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On 2020/7/29 下午5:55, Eli Cohen wrote: On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote: On 2020/7/28 下午5:04, Eli Cohen wrote: On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) +{ + struct vhost_virtqueue *vq = &v->vqs[qid]; + const struct vdpa_config_ops *ops = v->vdpa->config; + struct vdpa_device *vdpa = v->vdpa; + int ret, irq; + + spin_lock(&vq->call_ctx.ctx_lock); + irq = ops->get_vq_irq(vdpa, qid); + if (!vq->call_ctx.ctx || irq == -EINVAL) { + spin_unlock(&vq->call_ctx.ctx_lock); + return; + } + If I understand correctly, this will cause these IRQs to be forwarded directly to the VCPU, e.g. will be handled by the guest/qemu. Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. So, usually the network driver knows how to handle interrups for its devices. I assume the virtio_net driver at the guest has some default processing but what if the underlying hardware device (such as the case of vdpa) needs to take some actions? Virtio splits the bus operations out of device operations. So did the driver. The virtio-net driver depends on a transport driver to talk to the real device. Usually PCI is used as the transport for the device. In this case virtio-pci driver is in charge of dealing with irq allocation/free/configuration and it needs to co-operate with platform specific irqchip (virtualized by KVM) to finish the work like irq acknowledge etc. E.g for x86, the irq offloading can only work when there's a hardware support of virtual irqchip (APICv) then all stuffs could be done without vmexits. So no vendor specific part since the device and transport are all standard. Is there an option to do bounce the interrupt back to the vendor specific driver in the host so it can take these actions? Currently not, but even if we can do this, I'm afraid we will lose the performance advantage of irq bypassing. Does this mean that the host will not handle this interrupt? How does it work in case on level triggered interrupts? There's no guarantee that the KVM arch code can make sure the irq bypass work for any type of irq. So if they the irq will still need to be handled by host first. This means we should keep the host interrupt handler as a slowpath (fallback). In the case of ConnectX, I need to execute some code to acknowledge the interrupt. This turns out to be hard for irq bypassing to work. Is it because the irq is shared or what kind of ack you need to do? I have an EQ which is a queue for events comming from the hardware. This EQ can created so it reports only completion events but I still need to execute code that roughly tells the device that I saw these event records and then arm it again so it can report more interrupts (e.g if more packets are received or sent). This is device specific code. Any chance that the hardware can use MSI (which is not the case here)? Thanks Thanks Can you explain how this should be done?
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Wed Jul 29 2020, Russell King - ARM Linux admin wrote: > On Tue, Jul 28, 2020 at 11:06:08PM +0200, Petr Machata wrote: >> >> Kurt Kanzenbach writes: >> >> > On Mon Jul 27 2020, Petr Machata wrote: >> >> So this looks good, and works, but I'm wondering about one thing. >> > >> > Thanks for testing. >> > >> >> >> >> Your code (and evidently most drivers as well) use a different check >> >> than mlxsw, namely skb->len + ETH_HLEN < X. When I print_hex_dump() >> >> skb_mac_header(skb), skb->len in mlxsw with some test packet, I get e.g. >> >> this: >> >> >> >> 259a4db7: 01 00 5e 00 01 81 00 02 c9 a4 e4 e1 08 00 45 00 >> >> ..^...E. >> >> 5f29f0eb: 00 48 0d c9 40 00 01 11 c8 59 c0 00 02 01 e0 00 >> >> .H..@Y.. >> >> f3663e9e: 01 81 01 3f 01 3f 00 34 9f d3 00 02 00 2c 00 00 >> >> ...?.?.4.,.. >> >> ^sp^^ ^dp^^ ^len^ ^cks^ ^len^ >> >> b3914606: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 >> >> >> >> 2e7828ea: c9 ff fe a4 e4 e1 00 01 09 fa 00 00 00 00 00 00 >> >> >> >> 0b98156e: 00 00 00 00 00 00 >> >> .. >> >> >> >> Both UDP and PTP length fields indicate that the payload ends exactly at >> >> the end of the dump. So apparently skb->len contains all the payload >> >> bytes, including the Ethernet header. >> >> >> >> Is that the case for other drivers as well? Maybe mlxsw is just missing >> >> some SKB magic in the driver. >> > >> > So I run some tests (on other hardware/drivers) and it seems like that >> > the skb->len usually doesn't include the ETH_HLEN. Therefore, it is >> > added to the check. >> > >> > Looking at the driver code: >> > >> > |static void mlxsw_sp_rx_sample_listener(struct sk_buff *skb, u8 >> > local_port, >> > | void *trap_ctx) >> > |{ >> > | [...] >> > | /* The sample handler expects skb->data to point to the start of the >> > | * Ethernet header. >> > | */ >> > | skb_push(skb, ETH_HLEN); >> > | mlxsw_sp_sample_receive(mlxsw_sp, skb, local_port); >> > |} >> > >> > Maybe that's the issue here? >> >> Correct, mlxsw pushes the header very soon. Given that both >> ptp_classify_raw() and eth_type_trans() that are invoked later assume >> the header, it is reasonable. I have shuffled the pushes around and have >> a patch that both works and I think is correct. > > Would it make more sense to do: > > u8 *data = skb_mac_header(skb); > u8 *ptr = data; > > if (type & PTP_CLASS_VLAN) > ptr += VLAN_HLEN; > > switch (type & PTP_CLASS_PMASK) { > case PTP_CLASS_IPV4: > ptr += IPV4_HLEN(ptr) + UDP_HLEN; > break; > > case PTP_CLASS_IPV6: > ptr += IP6_HLEN + UDP_HLEN; > break; > > case PTP_CLASS_L2: > break; > > default: > return NULL; > } > > ptr += ETH_HLEN; > > if (ptr + 34 > skb->data + skb->len) > return NULL; > > return ptr; > > in other words, compare pointers, so that whether skb_push() etc has > been used on the skb is irrelevant? Yes. Avoiding relying on whether skb_{push,pull} has been used is overall the simplest solution and it works for all drivers regardless if DSA or something else is used. Looking at your code, it looks correct to me. I'll test it and send v3. But before, I've got another question that somebody might have an answer to: The ptp v1 code always does locate the message type at msgtype = data + offset + OFF_PTP_CONTROL OFF_PTP_CONTROL is 32. However, looking at the ptp v1 header, the message type is located at offset 20. What am I missing here? Thanks, Kurt signature.asc Description: PGP signature
[PATCH net] net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct
From: wenxu When openvswitch conntrack offload with act_ct action. Fragment packets defrag in the ingress tc act_ct action and miss the next chain. Then the packet pass to the openvswitch datapath without the mru. The over mtu packet will be dropped in output action in openvswitch for over mtu. "kernel: net2: dropped over-mtu packet: 1528 > 1500" This patch add mru in the tc_skb_ext for adefrag and miss next chain situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru to the qdisc_skb_cb when the packet defrag. And When the chain miss, The mru is set to tc_skb_ext which can be got by ovs datapath. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: wenxu --- include/linux/skbuff.h| 1 + include/net/sch_generic.h | 1 + net/openvswitch/flow.c| 1 + net/sched/act_ct.c| 8 ++-- net/sched/cls_api.c | 1 + 5 files changed, 10 insertions(+), 2 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0c0377f..0d842d6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -283,6 +283,7 @@ struct nf_bridge_info { */ struct tc_skb_ext { __u32 chain; + __u16 mru; }; #endif diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index c510b03..45401d5 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -384,6 +384,7 @@ struct qdisc_skb_cb { }; #define QDISC_CB_PRIV_LEN 20 unsigned char data[QDISC_CB_PRIV_LEN]; + u16 mru; }; typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv); diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 9d375e7..03942c3 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -890,6 +890,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, if (static_branch_unlikely(&tc_recirc_sharing_support)) { tc_ext = skb_ext_find(skb, TC_SKB_EXT); key->recirc_id = tc_ext ? tc_ext->chain : 0; + OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0; } else { key->recirc_id = 0; } diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 5928efb..69445ab 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -706,8 +706,10 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) goto out_free; - if (!err) + if (!err) { *defrag = true; + cb.mru = IPCB(skb)->frag_max_size; + } } else { /* NFPROTO_IPV6 */ #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; @@ -717,8 +719,10 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) goto out_free; - if (!err) + if (!err) { *defrag = true; + cb.mru = IP6CB(skb)->frag_max_size; + } #else err = -EOPNOTSUPP; goto out_free; diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 4619cb3..eb6acc5 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1627,6 +1627,7 @@ int tcf_classify_ingress(struct sk_buff *skb, if (WARN_ON_ONCE(!ext)) return TC_ACT_SHOT; ext->chain = last_executed_chain; + ext->mru = qdisc_skb_cb(skb)->mru; } return ret; -- 1.8.3.1
Re: [net-next 5/6] i40e, xsk: increase budget for AF_XDP path
On 2020-07-28 22:15, Jakub Kicinski wrote: On Tue, 28 Jul 2020 12:08:41 -0700 Tony Nguyen wrote: diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 1f2dd591dbf1..99f4afdc403d 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -265,6 +265,8 @@ static void i40e_inc_ntc(struct i40e_ring *rx_ring) rx_ring->next_to_clean = ntc; } +#define I40E_XSK_CLEAN_RX_BUDGET 256U + /** * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring * @rx_ring: Rx ring @@ -280,7 +282,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) bool failure = false; struct sk_buff *skb; - while (likely(total_rx_packets < (unsigned int)budget)) { + while (likely(total_rx_packets < I40E_XSK_CLEAN_RX_BUDGET)) { union i40e_rx_desc *rx_desc; struct xdp_buff **bi; unsigned int size; Should this perhaps be a common things that drivers do? Should we define a "XSK_NAPI_WEIGHT_MULT 4" instead of hard coding the value in a driver? Yes, that's a good idea. I can generalize for the AF_XDP paths in the other drivers as a follow up! Cheers, Björn (in vacation mode)
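A sketch of the generalization being discussed; XSK_NAPI_WEIGHT_MULT is the proposed name from the thread, not an existing define:

/* Shared helper instead of a driver-private 256: scale the NAPI
 * budget for the AF_XDP zero-copy Rx path.
 */
#define XSK_NAPI_WEIGHT_MULT	4

static inline unsigned int xsk_rx_budget(int napi_budget)
{
	return (unsigned int)napi_budget * XSK_NAPI_WEIGHT_MULT;
}

/* i40e_clean_rx_irq_zc() would then loop while
 *	total_rx_packets < xsk_rx_budget(budget);
 * which gives 64 * 4 = 256 with the default NAPI weight.
 */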
[PATCH] uapi, seg6_iptunnel: Add missing include in seg6_iptunnel.h
From: Ioana-Ruxandra Stăncioi Include in uapi/linux/seg6_iptunnel.h to fix the following linux/seg6_iptunnel.h compilation error: invalid application of 'sizeof' to incomplete type 'struct ipv6hdr' head = sizeof(struct ipv6hdr); ^~ This is to allow including this header in places where has not been included but __KERNEL__ is defined. In the kernel the easy workaround is including , but the header may also be used by code analysis tools. Signed-off-by: Ioana-Ruxandra Stăncioi --- include/uapi/linux/seg6_iptunnel.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h index 09fb608a35ec..b904228f463c 100644 --- a/include/uapi/linux/seg6_iptunnel.h +++ b/include/uapi/linux/seg6_iptunnel.h @@ -38,6 +38,7 @@ enum { }; #ifdef __KERNEL__ +#include static inline size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo) { -- 2.28.0.rc0.142.g3c755180ce-goog
Re: [PATCH RFC net-next] net: phy: add Marvell PHY PTP support
On Tue, Jul 14, 2020 at 05:26:28PM +0100, Russell King wrote: > Add PTP basic support for Marvell 88E151x PHYs. These PHYs support > timestamping the egress and ingress of packets, but does not support > any packet modification, nor do we support any filtering beyond > selecting packets that the hardware recognises as PTP/802.1AS. A question has come up concerning the selection of PTP timestamping sources within a network device. I have the situation on a couple of devices where there are multiple places that can do PTP timestamping: - the PHY (slow to access, only event capture which may not be wired, doesn't seem to synchronise well - delay of 58000, frequency changes every second between +/-1500ppb.) - the Ethernet MAC (fast to access, supports event capture and trigger generation which also may not be wired, synchronises well, delay of 700, frequency changes every second +/- 40ppb.) How do we deal with this situation - from what I can see from the ethtool API, we have to make a choice about which to use. How do we make that choice? It's not a case of "just implement one" since hardware may have both available on a particular ethernet interface or just one available. Do we need a property to indicate whether we wish to use the PHY or MAC PTP stamping, or something more elaborate? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
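For context on why a choice currently has to be made, a sketch of the relevant ethtool hook for a hypothetical foo driver: get_ts_info() reports exactly one PHC index per net_device, so either the MAC's clock or the PHY's clock ends up being exposed, not both.

#include <linux/bits.h>
#include <linux/ethtool.h>
#include <linux/net_tstamp.h>
#include <linux/netdevice.h>
#include <linux/ptp_clock_kernel.h>

struct foo_priv {
	struct ptp_clock *ptp_clock;	/* the MAC's PHC in this sketch */
};

static int foo_get_ts_info(struct net_device *dev,
			   struct ethtool_ts_info *info)
{
	struct foo_priv *priv = netdev_priv(dev);

	info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE |
				SOF_TIMESTAMPING_RX_HARDWARE |
				SOF_TIMESTAMPING_RAW_HARDWARE;
	/* Only one clock can be advertised here. */
	info->phc_index = ptp_clock_index(priv->ptp_clock);
	info->tx_types = BIT(HWTSTAMP_TX_OFF) | BIT(HWTSTAMP_TX_ON);
	info->rx_filters = BIT(HWTSTAMP_FILTER_NONE) | BIT(HWTSTAMP_FILTER_ALL);

	return 0;
}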
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On Wed, Jul 29, 2020 at 06:19:52PM +0800, Jason Wang wrote: I am checking internally if we can work in a mode not requiring to acknowledge the interrupt. I will update. Thanks for the explanations. > > On 2020/7/29 下午5:55, Eli Cohen wrote: > >On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote: > >>On 2020/7/28 下午5:04, Eli Cohen wrote: > >>>On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: > +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) > +{ > + struct vhost_virtqueue *vq = &v->vqs[qid]; > + const struct vdpa_config_ops *ops = v->vdpa->config; > + struct vdpa_device *vdpa = v->vdpa; > + int ret, irq; > + > + spin_lock(&vq->call_ctx.ctx_lock); > + irq = ops->get_vq_irq(vdpa, qid); > + if (!vq->call_ctx.ctx || irq == -EINVAL) { > + spin_unlock(&vq->call_ctx.ctx_lock); > + return; > + } > + > >>>If I understand correctly, this will cause these IRQs to be forwarded > >>>directly to the VCPU, e.g. will be handled by the guest/qemu. > >> > >>Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. > >> > >So, usually the network driver knows how to handle interrups for its > >devices. I assume the virtio_net driver at the guest has some default > >processing but what if the underlying hardware device (such as the case > >of vdpa) needs to take some actions? > > > Virtio splits the bus operations out of device operations. So did > the driver. > > The virtio-net driver depends on a transport driver to talk to the > real device. Usually PCI is used as the transport for the device. In > this case virtio-pci driver is in charge of dealing with irq > allocation/free/configuration and it needs to co-operate with > platform specific irqchip (virtualized by KVM) to finish the work > like irq acknowledge etc. E.g for x86, the irq offloading can only > work when there's a hardware support of virtual irqchip (APICv) then > all stuffs could be done without vmexits. > > So no vendor specific part since the device and transport are all standard. > > > > Is there an option to do bounce the > >interrupt back to the vendor specific driver in the host so it can take > >these actions? > > > Currently not, but even if we can do this, I'm afraid we will lose > the performance advantage of irq bypassing. > > > > > >>>Does this mean that the host will not handle this interrupt? How does it > >>>work in case on level triggered interrupts? > >> > >>There's no guarantee that the KVM arch code can make sure the irq > >>bypass work for any type of irq. So if they the irq will still need > >>to be handled by host first. This means we should keep the host > >>interrupt handler as a slowpath (fallback). > >> > >>>In the case of ConnectX, I need to execute some code to acknowledge the > >>>interrupt. > >> > >>This turns out to be hard for irq bypassing to work. Is it because > >>the irq is shared or what kind of ack you need to do? > >I have an EQ which is a queue for events comming from the hardware. This > >EQ can created so it reports only completion events but I still need to > >execute code that roughly tells the device that I saw these event > >records and then arm it again so it can report more interrupts (e.g if > >more packets are received or sent). This is device specific code. > > > Any chance that the hardware can use MSI (which is not the case here)? > > Thanks > > > >>Thanks > >> > >> > >>>Can you explain how this should be done? > >>> >
Re: [PATCH v8 bpf-next 12/13] selftests/bpf: Add test for d_path helper
On Tue, Jul 28, 2020 at 12:53:00PM -0700, Andrii Nakryiko wrote: SNIP > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > + ret = set_pathname(procfd, pid); > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > + ret = set_pathname(devfd, pid); > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > + ret = set_pathname(localfd, pid); > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > + ret = set_pathname(indicatorfd, pid); > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > Please use CHECK instead of CHECK_FAIL. Thanks. ok > > diff --git a/tools/testing/selftests/bpf/progs/test_d_path.c > > b/tools/testing/selftests/bpf/progs/test_d_path.c > > new file mode 100644 > > index ..e02dce614256 > > --- /dev/null > > +++ b/tools/testing/selftests/bpf/progs/test_d_path.c > > @@ -0,0 +1,64 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +#include "vmlinux.h" > > +#include > > +#include > > + > > +#define MAX_PATH_LEN 128 > > +#define MAX_EVENT_NUM 16 > > + > > +pid_t my_pid; > > +__u32 cnt_stat; > > +__u32 cnt_close; > > +char paths_stat[MAX_EVENT_NUM][MAX_PATH_LEN]; > > +char paths_close[MAX_EVENT_NUM][MAX_PATH_LEN]; > > +int rets_stat[MAX_EVENT_NUM]; > > +int rets_close[MAX_EVENT_NUM]; > > + > > please zero-initialize all of these, it causes issues on some Clang versions ook jirka
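A sketch of the requested change to the globals in test_d_path.c: explicit zero-initializers keep the variables out of the COMMON section, which reportedly trips up some Clang/libbpf combinations (MAX_EVENT_NUM and MAX_PATH_LEN are the defines from the patch above).

pid_t my_pid = 0;
__u32 cnt_stat = 0;
__u32 cnt_close = 0;
char paths_stat[MAX_EVENT_NUM][MAX_PATH_LEN] = {};
char paths_close[MAX_EVENT_NUM][MAX_PATH_LEN] = {};
int rets_stat[MAX_EVENT_NUM] = {};
int rets_close[MAX_EVENT_NUM] = {};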
linux-next: build failure after merge of the net-next tree
Hi all, After merging the net-next tree, today's linux-next build (i386 defconfig) failed like this: x86_64-linux-gnu-ld: net/core/fib_rules.o: in function `fib_rules_lookup': fib_rules.c:(.text+0x5c6): undefined reference to `fib6_rule_match' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x5d8): undefined reference to `fib6_rule_match' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x64d): undefined reference to `fib6_rule_action' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x662): undefined reference to `fib6_rule_action' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x67a): undefined reference to `fib6_rule_suppress' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x68d): undefined reference to `fib6_rule_suppress' Caused by commit b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops") # CONFIG_IPV6_MULTIPLE_TABLES is not set I have reverted that commit for today. -- Cheers, Stephen Rothwell
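One illustrative way such breakage is usually resolved (not necessarily the fix that was ultimately applied): an indirect-call wrapper names its possible targets explicitly, so references to fib6_rule_match() and friends have to be made conditional on the configs that actually build them.

#include <linux/indirect_call_wrapper.h>

#if IS_ENABLED(CONFIG_IPV6_MULTIPLE_TABLES)
#define INDIRECT_CALL_MT(f, f2, f1, ...) \
	INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__)
#elif IS_ENABLED(CONFIG_IP_MULTIPLE_TABLES)
#define INDIRECT_CALL_MT(f, f2, f1, ...) \
	INDIRECT_CALL_1(f, f1, __VA_ARGS__)
#else
#define INDIRECT_CALL_MT(f, f2, f1, ...) f(__VA_ARGS__)
#endif

/* fib_rules_lookup() would then use, for example:
 *	err = INDIRECT_CALL_MT(ops->action, fib6_rule_action,
 *			       fib4_rule_action, rule, fl, flags, arg);
 * so the IPv6 symbols are only referenced when they are built.
 */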
Re: [PATCH ipsec] xfrm: esp6: fix the location of the transport header with encapsulation
On Mon, Jul 27, 2020 at 04:03:47PM +0200, Sabrina Dubroca wrote: > commit 17175d1a27c6 ("xfrm: esp6: fix encapsulation header offset > computation") changed esp6_input_done2 to correctly find the size of > the IPv6 header that precedes the TCP/UDP encapsulation header, but > didn't adjust the final call to skb_set_transport_header, which I > assumed was correct in using skb_network_header_len. > > Xiumei Mu reported that when we create xfrm states that include port > numbers in the selector, traffic from the user sockets is dropped. It > turns out that we get a state mismatch in __xfrm_policy_check, because > we end up trying to compare the encapsulation header's ports with the > selector that's based on user traffic ports. > > Fixes: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP") > Fixes: 26333c37fc28 ("xfrm: add IPv6 support for espintcp") > Reported-by: Xiumei Mu > Signed-off-by: Sabrina Dubroca Applied, thanks a lot Sabrina!
[PATCH v3 net-next 03/11] qed: swap param init and publish
In theory, publishing the devlink params before cdev->iwarp_cmt is initialized could lead to a race condition, so set the initial value first and publish afterwards. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed_devlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index a62c47c61edf..4e3316c6beb6 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -75,8 +75,8 @@ struct devlink *qed_devlink_register(struct qed_dev *cdev) QED_DEVLINK_PARAM_ID_IWARP_CMT, value); - devlink_params_publish(dl); cdev->iwarp_cmt = false; + devlink_params_publish(dl); return dl; -- 2.17.1
[PATCH v3 net-next 02/11] qed/qede: make devlink survive recovery
Before that, devlink instance lifecycle was linked to qed_dev object, that causes devlink to be recreated on each recovery. Changing it by making higher level driver (qede) responsible for its life. This way devlink will survive recoveries. qede will store devlink structure pointer as a part of its device object, devlink private data contains a linkage structure, it'll contain extra devlink related content in following patches. The same lifecycle should be applied to storage drivers (qedf/qedi) later. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed.h | 1 - drivers/net/ethernet/qlogic/qed/qed_devlink.c | 40 --- drivers/net/ethernet/qlogic/qed/qed_devlink.h | 4 +- drivers/net/ethernet/qlogic/qed/qed_main.c| 10 + drivers/net/ethernet/qlogic/qede/qede.h | 1 + drivers/net/ethernet/qlogic/qede/qede_main.c | 18 + include/linux/qed/qed_if.h| 9 + 7 files changed, 48 insertions(+), 35 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index b2a7b53ee760..b6ce1488abcc 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -849,7 +849,6 @@ struct qed_dev { u32 rdma_max_srq_sge; u16 tunn_feature_mask; - struct devlink *dl; booliwarp_cmt; }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index eb693787c99e..a62c47c61edf 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -5,6 +5,7 @@ */ #include +#include #include "qed.h" #include "qed_devlink.h" @@ -13,17 +14,12 @@ enum qed_devlink_param_id { QED_DEVLINK_PARAM_ID_IWARP_CMT, }; -struct qed_devlink { - struct qed_dev *cdev; -}; - static int qed_dl_param_get(struct devlink *dl, u32 id, struct devlink_param_gset_ctx *ctx) { - struct qed_devlink *qed_dl; + struct qed_devlink *qed_dl = devlink_priv(dl); struct qed_dev *cdev; - qed_dl = devlink_priv(dl); cdev = qed_dl->cdev; ctx->val.vbool = cdev->iwarp_cmt; @@ -33,10 +29,9 @@ static int qed_dl_param_get(struct devlink *dl, u32 id, static int qed_dl_param_set(struct devlink *dl, u32 id, struct devlink_param_gset_ctx *ctx) { - struct qed_devlink *qed_dl; + struct qed_devlink *qed_dl = devlink_priv(dl); struct qed_dev *cdev; - qed_dl = devlink_priv(dl); cdev = qed_dl->cdev; cdev->iwarp_cmt = ctx->val.vbool; @@ -52,21 +47,19 @@ static const struct devlink_param qed_devlink_params[] = { static const struct devlink_ops qed_dl_ops; -int qed_devlink_register(struct qed_dev *cdev) +struct devlink *qed_devlink_register(struct qed_dev *cdev) { union devlink_param_value value; - struct qed_devlink *qed_dl; + struct qed_devlink *qdevlink; struct devlink *dl; int rc; - dl = devlink_alloc(&qed_dl_ops, sizeof(*qed_dl)); + dl = devlink_alloc(&qed_dl_ops, sizeof(struct qed_devlink)); if (!dl) - return -ENOMEM; + return ERR_PTR(-ENOMEM); - qed_dl = devlink_priv(dl); - - cdev->dl = dl; - qed_dl->cdev = cdev; + qdevlink = devlink_priv(dl); + qdevlink->cdev = cdev; rc = devlink_register(dl, &cdev->pdev->dev); if (rc) @@ -85,26 +78,25 @@ int qed_devlink_register(struct qed_dev *cdev) devlink_params_publish(dl); cdev->iwarp_cmt = false; - return 0; + return dl; err_unregister: devlink_unregister(dl); err_free: - cdev->dl = NULL; devlink_free(dl); - return rc; + return ERR_PTR(rc); } -void qed_devlink_unregister(struct qed_dev *cdev) +void qed_devlink_unregister(struct devlink *devlink) { - if (!cdev->dl) + if (!devlink) return; - 
devlink_params_unregister(cdev->dl, qed_devlink_params, + devlink_params_unregister(devlink, qed_devlink_params, ARRAY_SIZE(qed_devlink_params)); - devlink_unregister(cdev->dl); - devlink_free(cdev->dl); + devlink_unregister(devlink); + devlink_free(devlink); } diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h b/drivers/net/ethernet/qlogic/qed/qed_devlink.h index b94c40e9b7c1..c79dc6bfa194 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.h +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h @@ -9,7 +9,7 @@ #include #include -int qed_devlink_register(struct qed_dev *cdev); -void qed_devlink_unregister(struct qed_dev *cdev); +struct devlink *qed_devlink_register(struct qed_dev *cdev); +void qed_devlink_unregister(struct devlink *devlink); #
[PATCH v3 net-next 08/11] qed*: make use of devlink recovery infrastructure
Remove forcible recovery trigger and put it as a normal devlink callback. This allows user to enable/disable it via devlink health set pci/:03:00.0 reporter fw_fatal auto_recover false Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed.h | 1 + drivers/net/ethernet/qlogic/qed/qed_devlink.c | 14 ++ drivers/net/ethernet/qlogic/qed/qed_main.c| 2 +- drivers/net/ethernet/qlogic/qede/qede_main.c | 10 -- 4 files changed, 16 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index ccd789eeda3e..f34b25a79449 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -981,6 +981,7 @@ void qed_bw_update(struct qed_hwfn *hwfn, struct qed_ptt *ptt); u32 qed_unzip_data(struct qed_hwfn *p_hwfn, u32 input_len, u8 *input_buf, u32 max_size, u8 *unzip_buf); +int qed_recovery_process(struct qed_dev *cdev); void qed_schedule_recovery_handler(struct qed_hwfn *p_hwfn); void qed_hw_error_occurred(struct qed_hwfn *p_hwfn, enum qed_hw_err_type err_type); diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index ffe776a4f99a..b25be68f959c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -31,8 +31,22 @@ int qed_report_fatal_error(struct devlink *devlink, enum qed_hw_err_type err_typ return 0; } +static int +qed_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter, + void *priv_ctx, + struct netlink_ext_ack *extack) +{ + struct qed_devlink *qdl = devlink_health_reporter_priv(reporter); + struct qed_dev *cdev = qdl->cdev; + + qed_recovery_process(cdev); + + return 0; +} + static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { .name = "fw_fatal", + .recover = qed_fw_fatal_reporter_recover, }; #define QED_REPORTER_FW_GRACEFUL_PERIOD 120 diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index a64d594f9294..db5d003770ba 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -2817,7 +2817,7 @@ static int qed_set_led(struct qed_dev *cdev, enum qed_led_mode mode) return status; } -static int qed_recovery_process(struct qed_dev *cdev) +int qed_recovery_process(struct qed_dev *cdev) { struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev); struct qed_ptt *p_ptt; diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index df437c3f1fc9..937d8e69ad39 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -2596,8 +2596,6 @@ static void qede_atomic_hw_err_handler(struct qede_dev *edev) static void qede_generic_hw_err_handler(struct qede_dev *edev) { - struct qed_dev *cdev = edev->cdev; - DP_NOTICE(edev, "Generic sleepable HW error handling started - err_flags 0x%lx\n", edev->err_flags); @@ -2605,14 +2603,6 @@ static void qede_generic_hw_err_handler(struct qede_dev *edev) if (edev->devlink) edev->ops->common->report_fatal_error(edev->devlink, edev->last_err_type); - /* Trigger a recovery process. -* This is placed in the sleep requiring section just to make -* sure it is the last one, and that all the other operations -* were completed. 
-*/ - if (test_bit(QEDE_ERR_IS_RECOVERABLE, &edev->err_flags)) - edev->ops->common->recovery_process(cdev); - clear_bit(QEDE_ERR_IS_HANDLED, &edev->err_flags); DP_NOTICE(edev, "Generic sleepable HW error handling is done\n"); -- 2.17.1
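For context on how the auto_recover knob above takes effect: devlink itself calls the reporter's .recover op when a health report arrives and auto-recovery is enabled, or later when the user runs "devlink health recover". A minimal sketch of that wiring with the qed specifics stripped out; struct my_dev and my_dev_reset() are invented names, not from the patch:

struct my_dev;				/* invented, stands in for the driver's device struct */
int my_dev_reset(struct my_dev *mdev);	/* invented recovery entry point */

static int my_fw_recover(struct devlink_health_reporter *reporter,
			 void *priv_ctx, struct netlink_ext_ack *extack)
{
	struct my_dev *mdev = devlink_health_reporter_priv(reporter);

	/* Reset/reinit the device; returning 0 moves the reporter back
	 * to the healthy state. */
	return my_dev_reset(mdev);
}

static const struct devlink_health_reporter_ops my_fw_fatal_ops = {
	.name	 = "fw_fatal",
	.recover = my_fw_recover,
};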
[PATCH v3 net-next 00/11] qed: introduce devlink health support
This is a followup implementation after series https://patchwork.ozlabs.org/project/netdev/cover/20200514095727.1361-1-irussk...@marvell.com/ This is an implementation of the devlink health infrastructure. With this we are now able to report HW errors to devlink, and it will take its own actions, depending on user configuration, to capture and store a dump at the moment the error occurs and to request the driver to recover the device. So far we do not differentiate between global device failures and failures specific to a single PCI function. This means that some errors specific to one physical function will affect the entire device. This is not yet fully designed and verified; we will follow up on it in the future. The solution was verified with artificially generated HW errors; existing tools could be used for dump analysis. v3: fix uninit var usage in patch 11 v2: fix #include issue from kbuild test robot. Igor Russkikh (11): qed: move out devlink logic into a new file qed/qede: make devlink survive recovery qed: swap param init and publish qed: fix kconfig help entries qed: implement devlink info request qed: health reporter init deinit seq qed: use devlink logic to report errors qed*: make use of devlink recovery infrastructure qed: implement devlink dump qed: align adjacent indent qede: make driver reliable on unload after failures drivers/net/ethernet/qlogic/Kconfig | 5 +- drivers/net/ethernet/qlogic/qed/Makefile | 1 + drivers/net/ethernet/qlogic/qed/qed.h | 3 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 10 + drivers/net/ethernet/qlogic/qed/qed_devlink.c | 256 ++ drivers/net/ethernet/qlogic/qed/qed_devlink.h | 20 ++ drivers/net/ethernet/qlogic/qed/qed_main.c| 116 +--- drivers/net/ethernet/qlogic/qede/qede.h | 2 + drivers/net/ethernet/qlogic/qede/qede_main.c | 39 ++- include/linux/qed/qed_if.h| 23 +- 10 files changed, 345 insertions(+), 130 deletions(-) create mode 100644 drivers/net/ethernet/qlogic/qed/qed_devlink.c create mode 100644 drivers/net/ethernet/qlogic/qed/qed_devlink.h -- 2.17.1
[PATCH v3 net-next 01/11] qed: move out devlink logic into a new file
We are extending devlink infrastructure, thus move the existing stuff into a new file qed_devlink.c Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/Makefile | 1 + drivers/net/ethernet/qlogic/qed/qed_devlink.c | 110 ++ drivers/net/ethernet/qlogic/qed/qed_devlink.h | 15 +++ drivers/net/ethernet/qlogic/qed/qed_main.c| 102 +--- 4 files changed, 127 insertions(+), 101 deletions(-) create mode 100644 drivers/net/ethernet/qlogic/qed/qed_devlink.c create mode 100644 drivers/net/ethernet/qlogic/qed/qed_devlink.h diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile index f947b105cf14..8251755ec18c 100644 --- a/drivers/net/ethernet/qlogic/qed/Makefile +++ b/drivers/net/ethernet/qlogic/qed/Makefile @@ -9,6 +9,7 @@ qed-y :=\ qed_dcbx.o \ qed_debug.o \ qed_dev.o \ + qed_devlink.o \ qed_hw.o\ qed_init_fw_funcs.o \ qed_init_ops.o \ diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c new file mode 100644 index ..eb693787c99e --- /dev/null +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -0,0 +1,110 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Marvell/Qlogic FastLinQ NIC driver + * + * Copyright (C) 2020 Marvell International Ltd. + */ + +#include +#include "qed.h" +#include "qed_devlink.h" + +enum qed_devlink_param_id { + QED_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX, + QED_DEVLINK_PARAM_ID_IWARP_CMT, +}; + +struct qed_devlink { + struct qed_dev *cdev; +}; + +static int qed_dl_param_get(struct devlink *dl, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct qed_devlink *qed_dl; + struct qed_dev *cdev; + + qed_dl = devlink_priv(dl); + cdev = qed_dl->cdev; + ctx->val.vbool = cdev->iwarp_cmt; + + return 0; +} + +static int qed_dl_param_set(struct devlink *dl, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct qed_devlink *qed_dl; + struct qed_dev *cdev; + + qed_dl = devlink_priv(dl); + cdev = qed_dl->cdev; + cdev->iwarp_cmt = ctx->val.vbool; + + return 0; +} + +static const struct devlink_param qed_devlink_params[] = { + DEVLINK_PARAM_DRIVER(QED_DEVLINK_PARAM_ID_IWARP_CMT, +"iwarp_cmt", DEVLINK_PARAM_TYPE_BOOL, +BIT(DEVLINK_PARAM_CMODE_RUNTIME), +qed_dl_param_get, qed_dl_param_set, NULL), +}; + +static const struct devlink_ops qed_dl_ops; + +int qed_devlink_register(struct qed_dev *cdev) +{ + union devlink_param_value value; + struct qed_devlink *qed_dl; + struct devlink *dl; + int rc; + + dl = devlink_alloc(&qed_dl_ops, sizeof(*qed_dl)); + if (!dl) + return -ENOMEM; + + qed_dl = devlink_priv(dl); + + cdev->dl = dl; + qed_dl->cdev = cdev; + + rc = devlink_register(dl, &cdev->pdev->dev); + if (rc) + goto err_free; + + rc = devlink_params_register(dl, qed_devlink_params, +ARRAY_SIZE(qed_devlink_params)); + if (rc) + goto err_unregister; + + value.vbool = false; + devlink_param_driverinit_value_set(dl, + QED_DEVLINK_PARAM_ID_IWARP_CMT, + value); + + devlink_params_publish(dl); + cdev->iwarp_cmt = false; + + return 0; + +err_unregister: + devlink_unregister(dl); + +err_free: + cdev->dl = NULL; + devlink_free(dl); + + return rc; +} + +void qed_devlink_unregister(struct qed_dev *cdev) +{ + if (!cdev->dl) + return; + + devlink_params_unregister(cdev->dl, qed_devlink_params, + ARRAY_SIZE(qed_devlink_params)); + + devlink_unregister(cdev->dl); + devlink_free(cdev->dl); +} diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h b/drivers/net/ethernet/qlogic/qed/qed_devlink.h new file mode 100644 index 
..b94c40e9b7c1 --- /dev/null +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Marvell/Qlogic FastLinQ NIC driver + * + * Copyright (C) 2020 Marvell International Ltd. + */ +#ifndef _QED_DEVLINK_H +#define _QED_DEVLINK_H + +#include +#include + +int qed_devlink_register(struct qed_dev *cdev); +void qed_devlink_unregister(struct qed_dev *cdev); + +#endif diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 2558cb680db3..8751355d9ef7 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_m
[PATCH v3 net-next 04/11] qed: fix kconfig help entries
This patch replaces stubs in kconfig help entries with an actual description. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/Kconfig | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig index 8f743d80760b..4366c7a8de95 100644 --- a/drivers/net/ethernet/qlogic/Kconfig +++ b/drivers/net/ethernet/qlogic/Kconfig @@ -80,7 +80,7 @@ config QED select CRC8 select NET_DEVLINK help - This enables the support for ... + This enables the support for Marvell FastLinQ adapters family. config QED_LL2 bool @@ -100,7 +100,8 @@ config QEDE depends on QED imply PTP_1588_CLOCK help - This enables the support for ... + This enables the support for Marvell FastLinQ adapters family, + ethernet driver. config QED_RDMA bool -- 2.17.1
[PATCH v3 net-next 05/11] qed: implement devlink info request
Here we return existing fw & mfw versions, we also fetch device's serial number. The base device specific structure (qed_dev_info) was not directly available to the base driver before. Thus, here we create and store a private copy of this structure in qed_dev root object. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed.h | 1 + drivers/net/ethernet/qlogic/qed/qed_dev.c | 10 drivers/net/ethernet/qlogic/qed/qed_devlink.c | 52 ++- drivers/net/ethernet/qlogic/qed/qed_main.c| 1 + 4 files changed, 63 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index b6ce1488abcc..ccd789eeda3e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -807,6 +807,7 @@ struct qed_dev { struct qed_llh_info *p_llh_info; /* Linux specific here */ + struct qed_dev_info common_dev_info; struct qede_dev*edev; struct pci_dev *pdev; u32 flags; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index b3c9ebaf2280..377950ce8ea2 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -4290,6 +4290,16 @@ static int qed_hw_get_nvm_info(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) __set_bit(QED_DEV_CAP_ROCE, &p_hwfn->hw_info.device_capabilities); + /* Read device serial number information from shmem */ + addr = MCP_REG_SCRATCH + nvm_cfg1_offset + + offsetof(struct nvm_cfg1, glob) + + offsetof(struct nvm_cfg1_glob, serial_number); + + p_hwfn->hw_info.part_num[0] = qed_rd(p_hwfn, p_ptt, addr); + p_hwfn->hw_info.part_num[1] = qed_rd(p_hwfn, p_ptt, addr + 4); + p_hwfn->hw_info.part_num[2] = qed_rd(p_hwfn, p_ptt, addr + 8); + p_hwfn->hw_info.part_num[3] = qed_rd(p_hwfn, p_ptt, addr + 12); + return qed_mcp_fill_shmem_func_info(p_hwfn, p_ptt); } diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index 4e3316c6beb6..5bd5528dc409 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -45,7 +45,57 @@ static const struct devlink_param qed_devlink_params[] = { qed_dl_param_get, qed_dl_param_set, NULL), }; -static const struct devlink_ops qed_dl_ops; +static int qed_devlink_info_get(struct devlink *devlink, + struct devlink_info_req *req, + struct netlink_ext_ack *extack) +{ + struct qed_devlink *qed_dl = devlink_priv(devlink); + struct qed_dev *cdev = qed_dl->cdev; + struct qed_dev_info *dev_info; + char buf[100]; + int err; + + dev_info = &cdev->common_dev_info; + + err = devlink_info_driver_name_put(req, KBUILD_MODNAME); + if (err) + return err; + + memcpy(buf, cdev->hwfns[0].hw_info.part_num, sizeof(cdev->hwfns[0].hw_info.part_num)); + buf[sizeof(cdev->hwfns[0].hw_info.part_num)] = 0; + + if (buf[0]) { + err = devlink_info_serial_number_put(req, buf); + if (err) + return err; + } + + snprintf(buf, sizeof(buf), "%d.%d.%d.%d", +GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_3), +GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_2), +GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_1), +GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_0)); + + err = devlink_info_version_stored_put(req, + DEVLINK_INFO_VERSION_GENERIC_FW_MGMT, buf); + if (err) + return err; + + snprintf(buf, sizeof(buf), "%d.%d.%d.%d", +dev_info->fw_major, +dev_info->fw_minor, +dev_info->fw_rev, +dev_info->fw_eng); + + err = devlink_info_version_running_put(req, + 
DEVLINK_INFO_VERSION_GENERIC_FW, buf); + + return err; +} + +static const struct devlink_ops qed_dl_ops = { + .info_get = qed_devlink_info_get, +}; struct devlink *qed_devlink_register(struct qed_dev *cdev) { diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index d6f76421379b..d1a559ccf516 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -479,6 +479,7 @@ int qed_fill_dev_info(struct qed_dev *cdev, } dev_info->mtu = hw_info->mtu; + cdev->common_dev_info = *dev_info; return 0; } -- 2.17.1
[PATCH v3 net-next 07/11] qed: use devlink logic to report errors
Use devlink_health_report to push error indications. We implement this in qede via callback function to make it possible to reuse the same for other drivers sitting on top of qed in future. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed_devlink.c | 17 + drivers/net/ethernet/qlogic/qed/qed_devlink.h | 2 ++ drivers/net/ethernet/qlogic/qed/qed_main.c| 1 + drivers/net/ethernet/qlogic/qede/qede.h | 1 + drivers/net/ethernet/qlogic/qede/qede_main.c | 5 - include/linux/qed/qed_if.h| 3 +++ 6 files changed, 28 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index 843a35f14cca..ffe776a4f99a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -14,6 +14,23 @@ enum qed_devlink_param_id { QED_DEVLINK_PARAM_ID_IWARP_CMT, }; +struct qed_fw_fatal_ctx { + enum qed_hw_err_type err_type; +}; + +int qed_report_fatal_error(struct devlink *devlink, enum qed_hw_err_type err_type) +{ + struct qed_devlink *qdl = devlink_priv(devlink); + struct qed_fw_fatal_ctx fw_fatal_ctx = { + .err_type = err_type, + }; + + devlink_health_report(qdl->fw_reporter, + "Fatal error reported", &fw_fatal_ctx); + + return 0; +} + static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { .name = "fw_fatal", }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h b/drivers/net/ethernet/qlogic/qed/qed_devlink.h index c68ecf778826..ccc7d1d1bfd4 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.h +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h @@ -15,4 +15,6 @@ void qed_devlink_unregister(struct devlink *devlink); void qed_fw_reporters_create(struct devlink *devlink); void qed_fw_reporters_destroy(struct devlink *devlink); +int qed_report_fatal_error(struct devlink *dl, enum qed_hw_err_type err_type); + #endif diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index d1a559ccf516..a64d594f9294 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -3007,6 +3007,7 @@ const struct qed_common_ops qed_common_ops_pass = { .update_msglvl = &qed_init_dp, .devlink_register = qed_devlink_register, .devlink_unregister = qed_devlink_unregister, + .report_fatal_error = qed_report_fatal_error, .dbg_all_data = &qed_dbg_all_data, .dbg_all_data_size = &qed_dbg_all_data_size, .chain_alloc = &qed_chain_alloc, diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index 1f0e7505a973..3efc5899f656 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -264,6 +264,7 @@ struct qede_dev { struct bpf_prog *xdp_prog; + enum qed_hw_err_typelast_err_type; unsigned long err_flags; #define QEDE_ERR_IS_HANDLED31 #define QEDE_ERR_ATTN_CLR_EN 0 diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 7c2d948b2035..df437c3f1fc9 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -1181,7 +1181,6 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level, } } else { struct net_device *ndev = pci_get_drvdata(pdev); - edev = netdev_priv(ndev); if (edev && edev->devlink) { @@ -2603,6 +2602,9 @@ static void qede_generic_hw_err_handler(struct qede_dev *edev) "Generic sleepable HW error handling 
started - err_flags 0x%lx\n", edev->err_flags); + if (edev->devlink) + edev->ops->common->report_fatal_error(edev->devlink, edev->last_err_type); + /* Trigger a recovery process. * This is placed in the sleep requiring section just to make * sure it is the last one, and that all the other operations @@ -2663,6 +2665,7 @@ static void qede_schedule_hw_err_handler(void *dev, return; } + edev->last_err_type = err_type; qede_set_hw_err_flags(edev, err_type); qede_atomic_hw_err_handler(edev); set_bit(QEDE_SP_HW_ERR, &edev->sp_flags); diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index 30fe06fe06a0..1297726f2b25 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -906,6 +906,9 @@ struct qed_common_ops { int (*dbg_all_data_size) (struct qed_dev *cdev); + int (*report_fatal_error)(struct devlink *devlink, +
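One detail worth calling out in the hunk above: the context struct handed to devlink_health_report() lives on the stack, which works because devlink consumes it synchronously, passing it straight into the reporter's dump/recover ops during the call and not keeping it afterwards. A stripped-down sketch of the same pattern with invented names (my_err_ctx, my_report()):

struct my_err_ctx {
	u32 err_type;
};

static void my_report(struct devlink_health_reporter *reporter, u32 err_type)
{
	struct my_err_ctx ctx = {
		.err_type = err_type,
	};

	/* &ctx is only dereferenced while devlink_health_report() runs,
	 * inside the reporter's dump/recover callbacks, so a stack
	 * variable is fine here. */
	devlink_health_report(reporter, "fatal error occurred", &ctx);
}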
[PATCH v3 net-next 06/11] qed: health reporter init deinit seq
Here we declare health reporter ops (empty for now) and register these in qed probe and remove callbacks. This way we get devlink attached to all kind of qed* PCI device entities: networking or storage offload entity. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed_devlink.c | 30 +++ drivers/net/ethernet/qlogic/qed/qed_devlink.h | 3 ++ include/linux/qed/qed_if.h| 1 + 3 files changed, 34 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index 5bd5528dc409..843a35f14cca 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -14,6 +14,34 @@ enum qed_devlink_param_id { QED_DEVLINK_PARAM_ID_IWARP_CMT, }; +static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { + .name = "fw_fatal", +}; + +#define QED_REPORTER_FW_GRACEFUL_PERIOD 120 + +void qed_fw_reporters_create(struct devlink *devlink) +{ + struct qed_devlink *dl = devlink_priv(devlink); + + dl->fw_reporter = devlink_health_reporter_create(devlink, &qed_fw_fatal_reporter_ops, + QED_REPORTER_FW_GRACEFUL_PERIOD, dl); + if (IS_ERR(dl->fw_reporter)) + DP_NOTICE(dl->cdev, "Failed to create fw reporter, err = %ld\n", + PTR_ERR(dl->fw_reporter)); +} + +void qed_fw_reporters_destroy(struct devlink *devlink) +{ + struct qed_devlink *dl = devlink_priv(devlink); + struct devlink_health_reporter *rep; + + rep = dl->fw_reporter; + + if (!IS_ERR_OR_NULL(rep)) + devlink_health_reporter_destroy(rep); +} + static int qed_dl_param_get(struct devlink *dl, u32 id, struct devlink_param_gset_ctx *ctx) { @@ -144,6 +172,8 @@ void qed_devlink_unregister(struct devlink *devlink) if (!devlink) return; + qed_fw_reporters_destroy(devlink); + devlink_params_unregister(devlink, qed_devlink_params, ARRAY_SIZE(qed_devlink_params)); diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h b/drivers/net/ethernet/qlogic/qed/qed_devlink.h index c79dc6bfa194..c68ecf778826 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.h +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h @@ -12,4 +12,7 @@ struct devlink *qed_devlink_register(struct qed_dev *cdev); void qed_devlink_unregister(struct devlink *devlink); +void qed_fw_reporters_create(struct devlink *devlink); +void qed_fw_reporters_destroy(struct devlink *devlink); + #endif diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index d8368e1770df..30fe06fe06a0 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -782,6 +782,7 @@ enum qed_nvm_flash_cmd { struct qed_devlink { struct qed_dev *cdev; + struct devlink_health_reporter *fw_reporter; }; struct qed_common_cb_ops { -- 2.17.1
[PATCH v3 net-next 09/11] qed: implement devlink dump
Gather and push out full device dump to devlink. Device dump is the same as with `ethtool -d`, but now its generated exactly at the moment bad thing happens. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed_devlink.c | 43 +++ 1 file changed, 43 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index b25be68f959c..904babe70489 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -6,6 +6,7 @@ #include #include +#include #include "qed.h" #include "qed_devlink.h" @@ -31,6 +32,47 @@ int qed_report_fatal_error(struct devlink *devlink, enum qed_hw_err_type err_typ return 0; } +static int +qed_fw_fatal_reporter_dump(struct devlink_health_reporter *reporter, + struct devlink_fmsg *fmsg, void *priv_ctx, + struct netlink_ext_ack *extack) +{ + struct qed_devlink *qdl = devlink_health_reporter_priv(reporter); + struct qed_fw_fatal_ctx *fw_fatal_ctx = priv_ctx; + struct qed_dev *cdev = qdl->cdev; + u32 dbg_data_buf_size; + u8 *p_dbg_data_buf; + int err; + + /* Having context means that was a dump request after fatal, +* so we enable extra debugging while gathering the dump, +* just in case +*/ + cdev->print_dbg_data = fw_fatal_ctx ? true : false; + + dbg_data_buf_size = qed_dbg_all_data_size(cdev); + p_dbg_data_buf = vzalloc(dbg_data_buf_size); + if (!p_dbg_data_buf) { + DP_NOTICE(cdev, + "Failed to allocate memory for a debug data buffer\n"); + return -ENOMEM; + } + + err = qed_dbg_all_data(cdev, p_dbg_data_buf); + if (err) { + DP_NOTICE(cdev, "Failed to obtain debug data\n"); + vfree(p_dbg_data_buf); + return err; + } + + err = devlink_fmsg_binary_pair_put(fmsg, "dump_data", + p_dbg_data_buf, dbg_data_buf_size); + + vfree(p_dbg_data_buf); + + return err; +} + static int qed_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter, void *priv_ctx, @@ -47,6 +89,7 @@ qed_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter, static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { .name = "fw_fatal", .recover = qed_fw_fatal_reporter_recover, + .dump = qed_fw_fatal_reporter_dump, }; #define QED_REPORTER_FW_GRACEFUL_PERIOD 120 -- 2.17.1
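For completeness, the fmsg API used above can also carry structured key/value pairs next to the raw blob, which keeps small bits of metadata readable in the devlink health dump output without decoding the binary region. A sketch of what that could look like; the field names and the snapshot buffer are illustrative, not qed's:

static int my_fw_dump(struct devlink_health_reporter *reporter,
		      struct devlink_fmsg *fmsg, void *priv_ctx,
		      struct netlink_ext_ack *extack)
{
	u8 snapshot[64] = {};	/* stands in for the real register dump */
	int err;

	err = devlink_fmsg_u32_pair_put(fmsg, "snapshot_len", sizeof(snapshot));
	if (err)
		return err;

	return devlink_fmsg_binary_pair_put(fmsg, "snapshot", snapshot,
					    sizeof(snapshot));
}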
[PATCH v3 net-next 11/11] qede: make driver reliable on unload after failures
If recovery was not successful, the netdev should still be present, but we should clear cdev if something bad happens during recovery. We also check cdev for NULL on device close, which can be the case when recovery was not successful. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qede/qede_main.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 937d8e69ad39..eeb4a7311633 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -1239,7 +1239,10 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level, err4: qede_rdma_dev_remove(edev, (mode == QEDE_PROBE_RECOVERY)); err3: - free_netdev(edev->ndev); + if (mode != QEDE_PROBE_RECOVERY) + free_netdev(edev->ndev); + else + edev->cdev = NULL; err2: qed_ops->common->slowpath_stop(cdev); err1: @@ -2474,7 +2477,8 @@ static int qede_close(struct net_device *ndev) qede_unload(edev, QEDE_UNLOAD_NORMAL, false); - edev->ops->common->update_drv_state(edev->cdev, false); + if (edev->cdev) + edev->ops->common->update_drv_state(edev->cdev, false); return 0; } -- 2.17.1
[PATCH v3 net-next 10/11] qed: align adjacent indent
Fix the indentation of some adjacent declarations. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- include/linux/qed/qed_if.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index 1297726f2b25..b8fb80c9be80 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -897,14 +897,14 @@ struct qed_common_ops { void(*simd_handler_clean)(struct qed_dev *cdev, int index); - int (*dbg_grc)(struct qed_dev *cdev, - void *buffer, u32 *num_dumped_bytes); + int (*dbg_grc)(struct qed_dev *cdev, + void *buffer, u32 *num_dumped_bytes); - int (*dbg_grc_size)(struct qed_dev *cdev); + int (*dbg_grc_size)(struct qed_dev *cdev); - int (*dbg_all_data) (struct qed_dev *cdev, void *buffer); + int (*dbg_all_data)(struct qed_dev *cdev, void *buffer); - int (*dbg_all_data_size) (struct qed_dev *cdev); + int (*dbg_all_data_size)(struct qed_dev *cdev); int (*report_fatal_error)(struct devlink *devlink, enum qed_hw_err_type err_type); -- 2.17.1
Re: Bug: ip utility fails to show routes with large # of multipath next-hops
On Tue, Jul 28, 2020 at 05:52:44PM -0700, Ashutosh Grewal wrote: > Hello David and all, > > I hope this is the correct way to report a bug. Sure > > I observed this problem with 256 v4 next-hops or 128 v6 next-hops (or > 128 or so # of v4 next-hops with labels). > > Here is an example - > > root@a6be8c892bb7:/# ip route show 2.2.2.2 > Error: Buffer too small for object. > Dump terminated > > Kernel details (though I recall running into the same problem on 4.4* > kernel as well) - > root@ubuntu-vm:/# uname -a > Linux ch1 5.4.0-33-generic #37-Ubuntu SMP Thu May 21 12:53:59 UTC 2020 > x86_64 x86_64 x86_64 GNU/Linux > > I think the problem may be to do with the size of the skbuf being > allocated as part of servicing the netlink request. > > static int netlink_dump(struct sock *sk) > { > > > skb = alloc_skb(...) Yes, I believe you are correct. You will get an skb of size 4K and it can't fit the entire RTA_MULTIPATH attribute with all the nested nexthops. Since it's a single attribute it cannot be split across multiple messages. Looking at the code, I think a similar problem was already encountered with IFLA_VFINFO_LIST. See commit c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo dump size"). Maybe we can track the maximum number of IPv4/IPv6 nexthops during insertion and then consult it to adjust 'min_dump_alloc' for RTM_GETROUTE. It's a bit complicated for IPv6 because you can append nexthops, but I believe anyone using so many nexthops is already using RTA_MULTIPATH to insert them, so we can simplify. David, what do you think? You have a better / simpler idea? Maybe one day everyone will be using the new nexthop API and this won't be needed :) > > Thanks, > Ashutosh
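To make the idea above concrete, here is a very rough sketch of what tracking the maximum multipath nexthop count and feeding it into the RTM_GETROUTE dump allocation could look like. Everything here is hypothetical: the names, the global (a real version would be per netns), and the per-nexthop size estimate, which ignores RTA_ENCAP/RTA_VIA and friends.

static unsigned int fib_max_mp_nhs;	/* largest nexthop count ever inserted */

static void fib_note_mp_nhs(unsigned int nhs)
{
	if (nhs > READ_ONCE(fib_max_mp_nhs))
		WRITE_ONCE(fib_max_mp_nhs, nhs);
}

static u32 rtm_getroute_min_dump_alloc(void)
{
	/* roughly one struct rtnexthop plus a gateway attribute per path,
	 * on top of the usual fixed route attributes */
	return NLMSG_GOODSIZE +
	       READ_ONCE(fib_max_mp_nhs) *
	       (RTA_ALIGN(sizeof(struct rtnexthop)) + RTA_SPACE(16));
}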
Re: [PATCH v8 bpf-next 08/13] bpf: Add BTF_SET_START/END macros
On Tue, Jul 28, 2020 at 12:39:06PM -0700, Andrii Nakryiko wrote: SNIP > > [...] > > > +#define BTF_SET_START(name)\ > > +__BTF_ID_LIST(name, local) \ > > +asm( \ > > +".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ > > +".local __BTF_ID__set__" #name "; \n" \ > > +"__BTF_ID__set__" #name ":;\n" \ > > +".zero 4 \n" \ > > +".popsection; \n"); > > + > > +#define BTF_SET_END(name) \ > > +asm( \ > > +".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ > > +".size __BTF_ID__set__" #name ", .-" #name " \n" \ > > +".popsection; \n");\ > > +extern struct btf_id_set name; > > + > > #else > > This local symbol assumption will probably at some point bite us. > Yonghong already did global vs static variants for BTF ID list, we'll > end up doing something like that for sets of BTF IDs as well. Let's do > this similarly from the get go. sure, will add that > > > > > #define BTF_ID_LIST(name) static u32 name[5]; > > #define BTF_ID(prefix, name) > > #define BTF_ID_UNUSED > > #define BTF_ID_LIST_GLOBAL(name) u32 name[1]; > > +#define BTF_SET_START(name) static struct btf_id_set name = { 0 }; > > nit: this zero is unnecessary and misleading (it's initialized for > only the first member of a struct). Just {} is enough. ok > > > +#define BTF_SET_END(name) > > > > #endif /* CONFIG_DEBUG_INFO_BTF */ > > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c > > index 562d4453fad3..06714cdda0a9 100644 > > --- a/kernel/bpf/btf.c > > +++ b/kernel/bpf/btf.c > > @@ -21,6 +21,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > #include > > > > /* BTF (BPF Type Format) is the meta data format which describes > > @@ -4740,3 +4742,15 @@ u32 btf_id(const struct btf *btf) > > { > > return btf->id; > > } > > + > > +static int btf_id_cmp_func(const void *a, const void *b) > > +{ > > + const int *pa = a, *pb = b; > > + > > + return *pa - *pb; > > +} > > + > > +bool btf_id_set_contains(struct btf_id_set *set, u32 id) > > +{ > > + return bsearch(&id, set->ids, set->cnt, sizeof(int), > > btf_id_cmp_func) != NULL; > > very nit ;) sizeof(__u32) sure ;-) thanks, jirka
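For readers wondering how these sets end up being consumed: the resolve_btfids tool is expected to fill and sort the u32 array in the .BTF_ids section at link time, which is why a plain bsearch() in btf_id_set_contains() is enough. A usage sketch; the function placed in the set is an arbitrary example, not something this series adds:

BTF_SET_START(my_allowed_funcs)
BTF_ID(func, tcp_close)
BTF_SET_END(my_allowed_funcs)

static bool my_func_allowed(u32 func_btf_id)
{
	return btf_id_set_contains(&my_allowed_funcs, func_btf_id);
}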
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On 29/07/2020 00:06, Petr Machata wrote: Kurt Kanzenbach writes: On Mon Jul 27 2020, Petr Machata wrote: So this looks good, and works, but I'm wondering about one thing. Thanks for testing. Your code (and evidently most drivers as well) use a different check than mlxsw, namely skb->len + ETH_HLEN < X. When I print_hex_dump() skb_mac_header(skb), skb->len in mlxsw with some test packet, I get e.g. this: 259a4db7: 01 00 5e 00 01 81 00 02 c9 a4 e4 e1 08 00 45 00 ..^...E. 5f29f0eb: 00 48 0d c9 40 00 01 11 c8 59 c0 00 02 01 e0 00 .H..@Y.. f3663e9e: 01 81 01 3f 01 3f 00 34 9f d3 00 02 00 2c 00 00 ...?.?.4.,.. ^sp^^ ^dp^^ ^len^ ^cks^ ^len^ b3914606: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 2e7828ea: c9 ff fe a4 e4 e1 00 01 09 fa 00 00 00 00 00 00 0b98156e: 00 00 00 00 00 00.. Both UDP and PTP length fields indicate that the payload ends exactly at the end of the dump. So apparently skb->len contains all the payload bytes, including the Ethernet header. Is that the case for other drivers as well? Maybe mlxsw is just missing some SKB magic in the driver. So I run some tests (on other hardware/drivers) and it seems like that the skb->len usually doesn't include the ETH_HLEN. Therefore, it is added to the check. Looking at the driver code: |static void mlxsw_sp_rx_sample_listener(struct sk_buff *skb, u8 local_port, | void *trap_ctx) |{ | [...] | /* The sample handler expects skb->data to point to the start of the |* Ethernet header. |*/ | skb_push(skb, ETH_HLEN); | mlxsw_sp_sample_receive(mlxsw_sp, skb, local_port); |} Maybe that's the issue here? Correct, mlxsw pushes the header very soon. Given that both ptp_classify_raw() and eth_type_trans() that are invoked later assume the header, it is reasonable. I have shuffled the pushes around and have a patch that both works and I think is correct. However, I find it odd that ptp_classify_raw() assumes that ->data points at Ethernet, while ptp_parse_header() makes the contrary assumption that ->len does not cover Ethernet. Those functions are likely to be used just a couple calls away from each other, if not outright next to each other. I suspect that ti/cpts.c and ti/am65-cpts.c (patches 5 and 6) actually hit an issue in this. ptp_classify_raw() is called without a surrounding _push / _pull (unlike DSA), which would imply skb->data points at Ethernet header, and indeed, the way the "data" variable is used confirms it. Both drivers, in all cases, will get ->data points at Ethernet header and ->len covers ETH_HLEN So, yes below check is incorrect, in general, and will be false always if other calculation are correct (only IPV4_HLEN(data + offset) can cause issues). if (skb->len + ETH_HLEN < offset + OFF_PTP_SEQUENCE_ID + sizeof(*seqid)) return 0; it might be good to update ptp_parse_header() documentation with expected state of skb. (At the same time the code adds ETH_HLEN to skb->len, but maybe it is just a cut'n'paste.) But then ptp_parse_header() is called, and that makes the assumption that skb->len does not cover the Ethernet header. I was also wondering about something else in that driver driver: The parsing code allows for ptp v1, but the message type was always fetched from offset 0 in the header. Is that indented? Yeah, I noticed that as well. That was a bug in the mlxsw code. Good riddance :) -- Best regards, grygorii
Re: [PATCH v1] farsync: use generic power management
On Wed, Jul 29, 2020 at 03:47:30PM +0530, Vaibhav Gupta wrote: > On Tue, Jul 28, 2020 at 03:04:13PM -0500, Bjorn Helgaas wrote: > > On Tue, Jul 28, 2020 at 09:58:10AM +0530, Vaibhav Gupta wrote: > > > The .suspend() and .resume() callbacks are not defined for this driver. > > > Still, their power management structure follows the legacy framework. To > > > bring it under the generic framework, simply remove the binding of > > > callbacks from "struct pci_driver". > > > > FWIW, this commit log is slightly misleading because .suspend and > > .resume are NULL by default, so this patch actually is a complete > > no-op as far as code generation is concerned. > > > > This change is worthwhile because it simplifies the code a little, but > > it doesn't convert the driver from legacy to generic power management. > > This driver doesn't supply a .pm structure, so it doesn't seem to do > > *any* power management. > > Agreed. Actually, as their presence only causes PCI core to call > pci_legacy_suspend/resume() for them, I thought that after removing > the binding from "struct pci_driver", this driver qualifies to be > grouped under genric framework, so used "use generic power > management" for the heading. > > I should have written "remove legacy bindning". This removed the *mention* of fst_driver.suspend and fst_driver.resume, which is important because we want to eventually remove those members completely from struct pci_driver. But fst_driver.suspend and fst_driver.resume *exist* before and after this patch, and they're initialized to zero before and after this patch. Since they were zero before, and they're still zero after this patch, the PCI core doesn't call pci_legacy_suspend/resume(). This patch doesn't change that at all. > But David has applied the patch, should I send a v2 or fix to update > message? No, I don't think David updates patches after he's applied them. But if the situation comes up again, you'll know how to describe it :) Bjorn
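In code terms, the difference between the two frameworks discussed above: legacy PM fills pci_driver.suspend/.resume (which take a struct pci_dev), while generic PM supplies a struct dev_pm_ops through driver.pm and leaves those members unset. A sketch of what a real conversion of this driver could look like if it ever grows PM support; the callback bodies are empty placeholders and the probe/remove/id-table names are assumed from context, not verified:

static int __maybe_unused fst_suspend(struct device *dev)
{
	/* quiesce the card, stop DMA */
	return 0;
}

static int __maybe_unused fst_resume(struct device *dev)
{
	/* reprogram the card, restart DMA */
	return 0;
}

static SIMPLE_DEV_PM_OPS(fst_pm_ops, fst_suspend, fst_resume);

static struct pci_driver fst_driver = {
	.name		= "farsync",
	.id_table	= fst_pci_dev_id,
	.probe		= fst_add_one,
	.remove		= fst_remove_one,
	.driver.pm	= &fst_pm_ops,	/* generic PM: .suspend/.resume stay unset */
};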
[PATCH] net/mlx5e: fix bpf_prog refcnt leaks in mlx5e_alloc_rq
The function invokes bpf_prog_inc(), which increases the refcount of a bpf_prog object "rq->xdp_prog" if the object isn't NULL. The refcount leak issues take place in two error handling paths. When mlx5_wq_ll_create() or mlx5_wq_cyc_create() fails, the function simply returns the error code and forgets to drop the refcount increased earlier, causing a refcount leak of "rq->xdp_prog". Fix this issue by jumping to the error handling path err_rq_wq_destroy when either function fails. Signed-off-by: Xin Xiong Signed-off-by: Xiyu Yang Signed-off-by: Xin Tan --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a836a02a2116..8e1b1ab416d8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -419,7 +419,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, err = mlx5_wq_ll_create(mdev, &rqp->wq, rqc_wq, &rq->mpwqe.wq, &rq->wq_ctrl); if (err) - return err; + goto err_rq_wq_destroy; rq->mpwqe.wq.db = &rq->mpwqe.wq.db[MLX5_RCV_DBR]; @@ -470,7 +470,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, err = mlx5_wq_cyc_create(mdev, &rqp->wq, rqc_wq, &rq->wqe.wq, &rq->wq_ctrl); if (err) - return err; + goto err_rq_wq_destroy; rq->wqe.wq.db = &rq->wqe.wq.db[MLX5_RCV_DBR]; -- 2.25.1
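The general shape of the fix, for anyone applying the same pattern elsewhere (this is not the mlx5e code itself; setup_wq() is an invented stand-in for mlx5_wq_ll_create()/mlx5_wq_cyc_create()): once the reference is taken, every later failure has to funnel through an unwind label that drops it.

int setup_wq(void);	/* invented placeholder for the fallible step */

static int alloc_rq_like(struct bpf_prog *xdp_prog)
{
	int err;

	if (xdp_prog)
		bpf_prog_inc(xdp_prog);		/* reference taken up front */

	err = setup_wq();			/* any later failure... */
	if (err)
		goto err_put_prog;		/* ...must not just "return err" */

	return 0;

err_put_prog:
	if (xdp_prog)
		bpf_prog_put(xdp_prog);
	return err;
}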
Re: [PATCH v3 net-next 02/11] qed/qede: make devlink survive recovery
Wed, Jul 29, 2020 at 01:38:37PM CEST, irussk...@marvell.com wrote: >Before that, devlink instance lifecycle was linked to qed_dev object, Before what? >that causes devlink to be recreated on each recovery. > >Changing it by making higher level driver (qede) responsible for its >life. This way devlink will survive recoveries. > >qede will store devlink structure pointer as a part of its device >object, devlink private data contains a linkage structure, it'll >contain extra devlink related content in following patches. > >The same lifecycle should be applied to storage drivers (qedf/qedi) later. >From this, one can't really tell if you are talking about existing state or about the matter of this patch. In the patch description, you should be imperative to the codebase telling it what to do. Could you please do that? > >Signed-off-by: Igor Russkikh >Signed-off-by: Alexander Lobakin >Signed-off-by: Michal Kalderon >--- > drivers/net/ethernet/qlogic/qed/qed.h | 1 - > drivers/net/ethernet/qlogic/qed/qed_devlink.c | 40 --- > drivers/net/ethernet/qlogic/qed/qed_devlink.h | 4 +- > drivers/net/ethernet/qlogic/qed/qed_main.c| 10 + > drivers/net/ethernet/qlogic/qede/qede.h | 1 + > drivers/net/ethernet/qlogic/qede/qede_main.c | 18 + > include/linux/qed/qed_if.h| 9 + > 7 files changed, 48 insertions(+), 35 deletions(-) > >diff --git a/drivers/net/ethernet/qlogic/qed/qed.h >b/drivers/net/ethernet/qlogic/qed/qed.h >index b2a7b53ee760..b6ce1488abcc 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed.h >+++ b/drivers/net/ethernet/qlogic/qed/qed.h >@@ -849,7 +849,6 @@ struct qed_dev { > u32 rdma_max_srq_sge; > u16 tunn_feature_mask; > >- struct devlink *dl; > booliwarp_cmt; > }; > >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >index eb693787c99e..a62c47c61edf 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >@@ -5,6 +5,7 @@ > */ > > #include >+#include > #include "qed.h" > #include "qed_devlink.h" > >@@ -13,17 +14,12 @@ enum qed_devlink_param_id { > QED_DEVLINK_PARAM_ID_IWARP_CMT, > }; > >-struct qed_devlink { >- struct qed_dev *cdev; >-}; >- > static int qed_dl_param_get(struct devlink *dl, u32 id, > struct devlink_param_gset_ctx *ctx) > { >- struct qed_devlink *qed_dl; >+ struct qed_devlink *qed_dl = devlink_priv(dl); > struct qed_dev *cdev; > >- qed_dl = devlink_priv(dl); > cdev = qed_dl->cdev; > ctx->val.vbool = cdev->iwarp_cmt; > >@@ -33,10 +29,9 @@ static int qed_dl_param_get(struct devlink *dl, u32 id, > static int qed_dl_param_set(struct devlink *dl, u32 id, > struct devlink_param_gset_ctx *ctx) > { >- struct qed_devlink *qed_dl; >+ struct qed_devlink *qed_dl = devlink_priv(dl); > struct qed_dev *cdev; > >- qed_dl = devlink_priv(dl); > cdev = qed_dl->cdev; > cdev->iwarp_cmt = ctx->val.vbool; > >@@ -52,21 +47,19 @@ static const struct devlink_param qed_devlink_params[] = { > > static const struct devlink_ops qed_dl_ops; > >-int qed_devlink_register(struct qed_dev *cdev) >+struct devlink *qed_devlink_register(struct qed_dev *cdev) > { > union devlink_param_value value; >- struct qed_devlink *qed_dl; >+ struct qed_devlink *qdevlink; > struct devlink *dl; > int rc; > >- dl = devlink_alloc(&qed_dl_ops, sizeof(*qed_dl)); >+ dl = devlink_alloc(&qed_dl_ops, sizeof(struct qed_devlink)); Do "sizeof(*qdevlink)" instead. 
> if (!dl) >- return -ENOMEM; >+ return ERR_PTR(-ENOMEM); > >- qed_dl = devlink_priv(dl); >- >- cdev->dl = dl; >- qed_dl->cdev = cdev; >+ qdevlink = devlink_priv(dl); >+ qdevlink->cdev = cdev; > > rc = devlink_register(dl, &cdev->pdev->dev); > if (rc) >@@ -85,26 +78,25 @@ int qed_devlink_register(struct qed_dev *cdev) > devlink_params_publish(dl); > cdev->iwarp_cmt = false; > >- return 0; >+ return dl; > > err_unregister: > devlink_unregister(dl); > > err_free: >- cdev->dl = NULL; > devlink_free(dl); > >- return rc; >+ return ERR_PTR(rc); > } > >-void qed_devlink_unregister(struct qed_dev *cdev) >+void qed_devlink_unregister(struct devlink *devlink) > { >- if (!cdev->dl) >+ if (!devlink) > return; > >- devlink_params_unregister(cdev->dl, qed_devlink_params, >+ devlink_params_unregister(devlink, qed_devlink_params, > ARRAY_SIZE(qed_devlink_params)); > >- devlink_unregister(cdev->dl); >- devlink_free(cdev->dl); >+ devlink_unregister(devlink); >+ devlink_free(devlink); > } >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h >b/drivers/net/eth
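For reference, the allocation line written with the sizeof(*qdevlink) form Jiri asks for above; it keeps the allocated size tied to the variable's type, so the two cannot drift apart if the struct is ever renamed:

	dl = devlink_alloc(&qed_dl_ops, sizeof(*qdevlink));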
Re: [PATCH v3 net-next 03/11] qed: swap param init and publish
Wed, Jul 29, 2020 at 01:38:38PM CEST, irussk...@marvell.com wrote: >In theory that could lead to race condition Describe the problem, tell the codebase what to do. Plus, this looks like -net material. Please consider pushing this separately with a proper "Fixes" tag. > >Signed-off-by: Igor Russkikh >Signed-off-by: Alexander Lobakin >Signed-off-by: Michal Kalderon >--- > drivers/net/ethernet/qlogic/qed/qed_devlink.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >index a62c47c61edf..4e3316c6beb6 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >@@ -75,8 +75,8 @@ struct devlink *qed_devlink_register(struct qed_dev *cdev) > QED_DEVLINK_PARAM_ID_IWARP_CMT, > value); > >- devlink_params_publish(dl); > cdev->iwarp_cmt = false; >+ devlink_params_publish(dl); > > return dl; > >-- >2.17.1 >
Re: [PATCH v3 net-next 05/11] qed: implement devlink info request
Wed, Jul 29, 2020 at 01:38:40PM CEST, irussk...@marvell.com wrote: >Here we return existing fw & mfw versions, we also fetch device's >serial number. > >The base device specific structure (qed_dev_info) was not directly >available to the base driver before. >Thus, here we create and store a private copy of this structure >in qed_dev root object. > >Signed-off-by: Igor Russkikh >Signed-off-by: Alexander Lobakin >Signed-off-by: Michal Kalderon >--- > drivers/net/ethernet/qlogic/qed/qed.h | 1 + > drivers/net/ethernet/qlogic/qed/qed_dev.c | 10 > drivers/net/ethernet/qlogic/qed/qed_devlink.c | 52 ++- > drivers/net/ethernet/qlogic/qed/qed_main.c| 1 + > 4 files changed, 63 insertions(+), 1 deletion(-) > >diff --git a/drivers/net/ethernet/qlogic/qed/qed.h >b/drivers/net/ethernet/qlogic/qed/qed.h >index b6ce1488abcc..ccd789eeda3e 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed.h >+++ b/drivers/net/ethernet/qlogic/qed/qed.h >@@ -807,6 +807,7 @@ struct qed_dev { > struct qed_llh_info *p_llh_info; > > /* Linux specific here */ >+ struct qed_dev_info common_dev_info; > struct qede_dev*edev; > struct pci_dev *pdev; > u32 flags; >diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c >b/drivers/net/ethernet/qlogic/qed/qed_dev.c >index b3c9ebaf2280..377950ce8ea2 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c >@@ -4290,6 +4290,16 @@ static int qed_hw_get_nvm_info(struct qed_hwfn *p_hwfn, >struct qed_ptt *p_ptt) > __set_bit(QED_DEV_CAP_ROCE, > &p_hwfn->hw_info.device_capabilities); > >+ /* Read device serial number information from shmem */ >+ addr = MCP_REG_SCRATCH + nvm_cfg1_offset + >+ offsetof(struct nvm_cfg1, glob) + >+ offsetof(struct nvm_cfg1_glob, serial_number); >+ >+ p_hwfn->hw_info.part_num[0] = qed_rd(p_hwfn, p_ptt, addr); >+ p_hwfn->hw_info.part_num[1] = qed_rd(p_hwfn, p_ptt, addr + 4); >+ p_hwfn->hw_info.part_num[2] = qed_rd(p_hwfn, p_ptt, addr + 8); >+ p_hwfn->hw_info.part_num[3] = qed_rd(p_hwfn, p_ptt, addr + 12); for() ? 
>+ > return qed_mcp_fill_shmem_func_info(p_hwfn, p_ptt); > } > >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >index 4e3316c6beb6..5bd5528dc409 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >@@ -45,7 +45,57 @@ static const struct devlink_param qed_devlink_params[] = { >qed_dl_param_get, qed_dl_param_set, NULL), > }; > >-static const struct devlink_ops qed_dl_ops; >+static int qed_devlink_info_get(struct devlink *devlink, >+ struct devlink_info_req *req, >+ struct netlink_ext_ack *extack) >+{ >+ struct qed_devlink *qed_dl = devlink_priv(devlink); >+ struct qed_dev *cdev = qed_dl->cdev; >+ struct qed_dev_info *dev_info; >+ char buf[100]; >+ int err; >+ >+ dev_info = &cdev->common_dev_info; >+ >+ err = devlink_info_driver_name_put(req, KBUILD_MODNAME); >+ if (err) >+ return err; >+ >+ memcpy(buf, cdev->hwfns[0].hw_info.part_num, >sizeof(cdev->hwfns[0].hw_info.part_num)); >+ buf[sizeof(cdev->hwfns[0].hw_info.part_num)] = 0; >+ >+ if (buf[0]) { >+ err = devlink_info_serial_number_put(req, buf); >+ if (err) >+ return err; >+ } >+ >+ snprintf(buf, sizeof(buf), "%d.%d.%d.%d", >+ GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_3), >+ GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_2), >+ GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_1), >+ GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_0)); >+ >+ err = devlink_info_version_stored_put(req, >+ >DEVLINK_INFO_VERSION_GENERIC_FW_MGMT, buf); >+ if (err) >+ return err; >+ >+ snprintf(buf, sizeof(buf), "%d.%d.%d.%d", >+ dev_info->fw_major, >+ dev_info->fw_minor, >+ dev_info->fw_rev, >+ dev_info->fw_eng); >+ >+ err = devlink_info_version_running_put(req, >+ DEVLINK_INFO_VERSION_GENERIC_FW, >buf); return = devlink... >+ >+ return err; >+} >+ >+static const struct devlink_ops qed_dl_ops = { >+ .info_get = qed_devlink_info_get, >+}; > > struct devlink *qed_devlink_register(struct qed_dev *cdev) > { >diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c >b/drivers/net/ethernet/qlogic/qed/qed_main.c >index d6f76421379b..d1a559ccf516 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_main.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c >@@ -479,6 +479,7 @@ int qed_fill_dev_info(struct
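One way the "for() ?" suggestion above could be spelled, assuming part_num stays an array of four u32 words as the open-coded reads imply:

	for (i = 0; i < ARRAY_SIZE(p_hwfn->hw_info.part_num); i++)
		p_hwfn->hw_info.part_num[i] =
			qed_rd(p_hwfn, p_ptt, addr + i * sizeof(u32));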
Re: [PATCH net 2/2] net/sched: act_ct: Set offload timeout when setting the offload bit
On 2020-07-28 5:42 PM, Marcelo Ricardo Leitner wrote: > On Tue, Jul 28, 2020 at 02:57:59PM +0300, Roi Dayan wrote: >> On heavily loaded systems the GC can take time to go over all existing >> conns and reset their timeout. At that time other calls like from >> nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as >> expired. To fix this when we set the offload bit we should also reset >> the timeout instead of counting on GC to finish first iteration over >> all conns before the initial timeout. >> >> Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to >> flow table") >> Signed-off-by: Roi Dayan >> Reviewed-by: Oz Shlomo >> --- >> net/sched/act_ct.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c >> index e9f3576cbf71..650c2d78a346 100644 >> --- a/net/sched/act_ct.c >> +++ b/net/sched/act_ct.c >> @@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct >> tcf_ct_flow_table *ct_ft, > > Extra context line: > err = flow_offload_add(&ct_ft->nf_ft, entry); >> if (err) >> goto err_add; >> >> +nf_ct_offload_timeout(ct); >> + > > What about adding this to flow_offload_add() instead? > It is already adjusting the flow_offload timeout there and then it > also effective for nft. > As you said, in flow_offload_add() we adjust the flow timeout. Here we adjust the conn timeout. So it's outside flow_offload_add() which only touch the flow struct. I guess it's like conn offload bit is set outside here and for nft. What do you think? >> return; >> >> err_add: >> -- >> 2.8.4 >>
[PATCH iproute2 master v3] bridge: fdb show: fix fdb entry state output for json context
From: Julien Fortin bridge json fdb show is printing an incorrect / non-machine readable value, when using -j (json output) we are expecting machine readable data that shouldn't require special handling/parsing. $ bridge -j fdb show | \ python -c \ 'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))' [ { "master": "br0", "mac": "56:23:28:4f:4f:e5", "flags": [], "ifname": "vx0", "state": "state=0x80" < with the patch: "state": "0x80" } ] Fixes: c7c1a1ef51aea7c ("bridge: colorize output and use JSON print library") Signed-off-by: Julien Fortin --- bridge/fdb.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/bridge/fdb.c b/bridge/fdb.c index 710dfc99..d59bfb34 100644 --- a/bridge/fdb.c +++ b/bridge/fdb.c @@ -62,7 +62,10 @@ static const char *state_n2a(unsigned int s) if (s & NUD_REACHABLE) return ""; - sprintf(buf, "state=%#x", s); + if (is_json_context()) + sprintf(buf, "%#x", s); + else + sprintf(buf, "state=%#x", s); return buf; } -- 2.27.0
[PATCH] atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
atmtcp_remove_persistent() invokes atm_dev_lookup(), which returns a reference of atm_dev with increased refcount or NULL if fails. The refcount leaks issues occur in two error handling paths. If dev_data->persist is zero or PRIV(dev)->vcc isn't NULL, the function returns 0 without decreasing the refcount kept by a local variable, resulting in refcount leaks. Fix the issue by adding atm_dev_put() before returning 0 both when dev_data->persist is zero or PRIV(dev)->vcc isn't NULL. Signed-off-by: Xin Xiong Signed-off-by: Xiyu Yang Signed-off-by: Xin Tan --- drivers/atm/atmtcp.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/atm/atmtcp.c b/drivers/atm/atmtcp.c index d9fd70280482..7f814da3c2d0 100644 --- a/drivers/atm/atmtcp.c +++ b/drivers/atm/atmtcp.c @@ -433,9 +433,15 @@ static int atmtcp_remove_persistent(int itf) return -EMEDIUMTYPE; } dev_data = PRIV(dev); - if (!dev_data->persist) return 0; + if (!dev_data->persist) { + atm_dev_put(dev); + return 0; + } dev_data->persist = 0; - if (PRIV(dev)->vcc) return 0; + if (PRIV(dev)->vcc) { + atm_dev_put(dev); + return 0; + } kfree(dev_data); atm_dev_put(dev); atm_dev_deregister(dev); -- 2.25.1
Re: [PATCH v3 net-next 07/11] qed: use devlink logic to report errors
Wed, Jul 29, 2020 at 01:38:42PM CEST, irussk...@marvell.com wrote: >Use devlink_health_report to push error indications. >We implement this in qede via callback function to make it possible >to reuse the same for other drivers sitting on top of qed in future. > >Signed-off-by: Igor Russkikh >Signed-off-by: Alexander Lobakin >Signed-off-by: Michal Kalderon >--- > drivers/net/ethernet/qlogic/qed/qed_devlink.c | 17 + > drivers/net/ethernet/qlogic/qed/qed_devlink.h | 2 ++ > drivers/net/ethernet/qlogic/qed/qed_main.c| 1 + > drivers/net/ethernet/qlogic/qede/qede.h | 1 + > drivers/net/ethernet/qlogic/qede/qede_main.c | 5 - > include/linux/qed/qed_if.h| 3 +++ > 6 files changed, 28 insertions(+), 1 deletion(-) > >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >index 843a35f14cca..ffe776a4f99a 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >@@ -14,6 +14,23 @@ enum qed_devlink_param_id { > QED_DEVLINK_PARAM_ID_IWARP_CMT, > }; > >+struct qed_fw_fatal_ctx { >+ enum qed_hw_err_type err_type; >+}; >+ >+int qed_report_fatal_error(struct devlink *devlink, enum qed_hw_err_type >err_type) >+{ >+ struct qed_devlink *qdl = devlink_priv(devlink); >+ struct qed_fw_fatal_ctx fw_fatal_ctx = { >+ .err_type = err_type, >+ }; >+ >+ devlink_health_report(qdl->fw_reporter, >+"Fatal error reported", &fw_fatal_ctx); saying "reported" to the reporter sounds odd. Maybe occurred? >+ >+ return 0; >+} >+ > static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { > .name = "fw_fatal", > }; >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h >b/drivers/net/ethernet/qlogic/qed/qed_devlink.h >index c68ecf778826..ccc7d1d1bfd4 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.h >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h >@@ -15,4 +15,6 @@ void qed_devlink_unregister(struct devlink *devlink); > void qed_fw_reporters_create(struct devlink *devlink); > void qed_fw_reporters_destroy(struct devlink *devlink); > >+int qed_report_fatal_error(struct devlink *dl, enum qed_hw_err_type err_type); >+ > #endif >diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c >b/drivers/net/ethernet/qlogic/qed/qed_main.c >index d1a559ccf516..a64d594f9294 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_main.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c >@@ -3007,6 +3007,7 @@ const struct qed_common_ops qed_common_ops_pass = { > .update_msglvl = &qed_init_dp, > .devlink_register = qed_devlink_register, > .devlink_unregister = qed_devlink_unregister, >+ .report_fatal_error = qed_report_fatal_error, > .dbg_all_data = &qed_dbg_all_data, > .dbg_all_data_size = &qed_dbg_all_data_size, > .chain_alloc = &qed_chain_alloc, >diff --git a/drivers/net/ethernet/qlogic/qede/qede.h >b/drivers/net/ethernet/qlogic/qede/qede.h >index 1f0e7505a973..3efc5899f656 100644 >--- a/drivers/net/ethernet/qlogic/qede/qede.h >+++ b/drivers/net/ethernet/qlogic/qede/qede.h >@@ -264,6 +264,7 @@ struct qede_dev { > > struct bpf_prog *xdp_prog; > >+ enum qed_hw_err_typelast_err_type; > unsigned long err_flags; > #define QEDE_ERR_IS_HANDLED 31 > #define QEDE_ERR_ATTN_CLR_EN 0 >diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c >b/drivers/net/ethernet/qlogic/qede/qede_main.c >index 7c2d948b2035..df437c3f1fc9 100644 >--- a/drivers/net/ethernet/qlogic/qede/qede_main.c >+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c >@@ -1181,7 +1181,6 @@ static int __qede_probe(struct pci_dev *pdev, u32 >dp_module, u8 
dp_level, > } > } else { > struct net_device *ndev = pci_get_drvdata(pdev); >- Leftover. Please remove. > edev = netdev_priv(ndev); > > if (edev && edev->devlink) { >@@ -2603,6 +2602,9 @@ static void qede_generic_hw_err_handler(struct qede_dev >*edev) > "Generic sleepable HW error handling started - err_flags > 0x%lx\n", > edev->err_flags); > >+ if (edev->devlink) >+ edev->ops->common->report_fatal_error(edev->devlink, >edev->last_err_type); >+ > /* Trigger a recovery process. >* This is placed in the sleep requiring section just to make >* sure it is the last one, and that all the other operations >@@ -2663,6 +2665,7 @@ static void qede_schedule_hw_err_handler(void *dev, > return; > } > >+ edev->last_err_type = err_type; > qede_set_hw_err_flags(edev, err_type); > qede_atomic_hw_err_handler(edev); > set_bit(QEDE_SP_HW_ERR, &edev->sp_flags); >diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h >index 30fe06fe06a0..1297726f2b25 100644 >--- a/incl
[PATCH bpf-next] Documentation/bpf: Use valid and new links in index.rst
There exists an error "404 Not Found" when I clik the html link of "Documentation/networking/filter.rst" in the BPF documentation [1], fix it. Additionally, use the new links about "BPF and XDP Reference Guide" and "bpf(2)" to avoid redirects. [1] https://www.kernel.org/doc/html/latest/bpf/ Fixes: d9b9170a2653 ("docs: bpf: Rename README.rst to index.rst") Fixes: cb3f0d56e153 ("docs: networking: convert filter.txt to ReST") Signed-off-by: Tiezhu Yang --- Documentation/bpf/index.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index 26f4bb3..1b901b4 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -68,7 +68,7 @@ Testing and debugging BPF .. Links: -.. _Documentation/networking/filter.rst: ../networking/filter.txt +.. _Documentation/networking/filter.rst: ../networking/filter.html .. _man-pages: https://www.kernel.org/doc/man-pages/ -.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html -.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/ +.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html +.. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/ -- 2.1.0
[PATCH bpf-next v2] Documentation/bpf: Use valid and new links in index.rst
There exists an error "404 Not Found" when I click the html link of "Documentation/networking/filter.rst" in the BPF documentation [1], fix it. Additionally, use the new links about "BPF and XDP Reference Guide" and "bpf(2)" to avoid redirects. [1] https://www.kernel.org/doc/html/latest/bpf/ Fixes: d9b9170a2653 ("docs: bpf: Rename README.rst to index.rst") Fixes: cb3f0d56e153 ("docs: networking: convert filter.txt to ReST") Signed-off-by: Tiezhu Yang --- v2: - Fix a typo "clik" to "click" in the commit message, sorry for that Documentation/bpf/index.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index 26f4bb3..1b901b4 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -68,7 +68,7 @@ Testing and debugging BPF .. Links: -.. _Documentation/networking/filter.rst: ../networking/filter.txt +.. _Documentation/networking/filter.rst: ../networking/filter.html .. _man-pages: https://www.kernel.org/doc/man-pages/ -.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html -.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/ +.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html +.. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/ -- 2.1.0
Re: [PATCH bpf-next v4 05/14] xsk: move queue_id, dev and need_wakeup to buffer pool
On Tue, Jul 28, 2020 at 9:10 AM Björn Töpel wrote: > > > > On 2020-07-21 07:03, Magnus Karlsson wrote: > > Move queue_id, dev, and need_wakeup from the umem to the > > buffer pool. This so that we in a later commit can share the umem > > between multiple HW queues. There is one buffer pool per dev and > > queue id, so these variables should belong to the buffer pool, not > > the umem. Need_wakeup is also something that is set on a per napi > > level, so there is usually one per device and queue id. So move > > this to the buffer pool too. > > > > Signed-off-by: Magnus Karlsson > > --- > > include/net/xdp_sock.h | 3 --- > > include/net/xsk_buff_pool.h | 4 > > net/xdp/xdp_umem.c | 19 +-- > > net/xdp/xdp_umem.h | 4 > > net/xdp/xsk.c | 40 +++- > > net/xdp/xsk_buff_pool.c | 39 ++- > > net/xdp/xsk_diag.c | 4 ++-- > > 7 files changed, 44 insertions(+), 69 deletions(-) > > > [...] > > } > > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c > > index 36287d2..436648a 100644 > > --- a/net/xdp/xsk_buff_pool.c > > +++ b/net/xdp/xsk_buff_pool.c > > @@ -95,10 +95,9 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct > > xdp_rxq_info *rxq) > > } > > EXPORT_SYMBOL(xp_set_rxq_info); > > > > -int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev, > > +int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *netdev, > > u16 queue_id, u16 flags) > > { > > - struct xdp_umem *umem = pool->umem; > > bool force_zc, force_copy; > > struct netdev_bpf bpf; > > int err = 0; > > @@ -111,27 +110,30 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct > > net_device *dev, > > if (force_zc && force_copy) > > return -EINVAL; > > > > - if (xsk_get_pool_from_qid(dev, queue_id)) > > + if (xsk_get_pool_from_qid(netdev, queue_id)) > > return -EBUSY; > > > > - err = xsk_reg_pool_at_qid(dev, pool, queue_id); > > + err = xsk_reg_pool_at_qid(netdev, pool, queue_id); > > if (err) > > return err; > > > > if (flags & XDP_USE_NEED_WAKEUP) { > > - umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; > > + pool->uses_need_wakeup = true; > > /* Tx needs to be explicitly woken up the first time. > >* Also for supporting drivers that do not implement this > >* feature. They will always have to call sendto(). > >*/ > > - umem->need_wakeup = XDP_WAKEUP_TX; > > + pool->cached_need_wakeup = XDP_WAKEUP_TX; > > } > > > > + dev_hold(netdev); > > + > > You have a reference leak here for the error case. Thanks. Will fix. /Magnus > > Björn
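The leak Björn points at: once dev_hold(netdev) has run, any error return further down xp_assign_dev() must be balanced by a dev_put(). A rough sketch of one way to shape the error path; some_later_setup() is an invented placeholder for the remaining steps (the ndo_bpf call and so on), and the unwind assumes a matching xsk_clear_pool_at_qid() helper:

	dev_hold(netdev);

	err = some_later_setup();		/* invented placeholder */
	if (err)
		goto err_unreg_pool;

	return 0;

err_unreg_pool:
	dev_put(netdev);			/* balance the hold on failure */
	xsk_clear_pool_at_qid(netdev, queue_id);
	return err;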
Re: [PATCH RFC net-next] net: phy: add Marvell PHY PTP support
On Wed, Jul 29, 2020 at 11:58:07AM +0100, Russell King - ARM Linux admin wrote: > How do we deal with this situation - from what I can see from the > ethtool API, we have to make a choice about which to use. How do we > make that choice? Unfortunately the stack does not implement simultaneous MAC + PHY time stamping. If your board has both, then you make the choice to use the PHY by selecting NETWORK_PHY_TIMESTAMPING at kernel compile time. (Also some MAC drivers do not defer to the PHY properly. Sometimes you can work around that by de-selecting the MAC's PTP function in the Kconfig if possible, but otherwise you need to patch the MAC driver.) > Do we need a property to indicate whether we wish to use the PHY > or MAC PTP stamping, or something more elaborate? To do this at run time would require quite some work, I expect. Thanks, Richard
Re: [PATCH bpf-next v4 05/14] xsk: move queue_id, dev and need_wakeup to buffer pool
On Tue, Jul 28, 2020 at 11:22 AM Maxim Mikityanskiy wrote: > > On 2020-07-21 08:03, Magnus Karlsson wrote: > > Move queue_id, dev, and need_wakeup from the umem to the > > buffer pool. This so that we in a later commit can share the umem > > between multiple HW queues. There is one buffer pool per dev and > > queue id, so these variables should belong to the buffer pool, not > > the umem. Need_wakeup is also something that is set on a per napi > > level, so there is usually one per device and queue id. So move > > this to the buffer pool too. > > > > Signed-off-by: Magnus Karlsson > > --- > > include/net/xdp_sock.h | 3 --- > > include/net/xsk_buff_pool.h | 4 > > net/xdp/xdp_umem.c | 19 +-- > > net/xdp/xdp_umem.h | 4 > > net/xdp/xsk.c | 40 +++- > > net/xdp/xsk_buff_pool.c | 39 ++- > > net/xdp/xsk_diag.c | 4 ++-- > > 7 files changed, 44 insertions(+), 69 deletions(-) > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > index 2a284e1..b052f1c 100644 > > --- a/include/net/xdp_sock.h > > +++ b/include/net/xdp_sock.h > > @@ -26,11 +26,8 @@ struct xdp_umem { > > refcount_t users; > > struct page **pgs; > > u32 npgs; > > - u16 queue_id; > > - u8 need_wakeup; > > u8 flags; > > int id; > > - struct net_device *dev; > > bool zc; > > spinlock_t xsk_tx_list_lock; > > struct list_head xsk_tx_list; > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h > > index 380d9ae..2d94890 100644 > > --- a/include/net/xsk_buff_pool.h > > +++ b/include/net/xsk_buff_pool.h > > @@ -43,11 +43,15 @@ struct xsk_buff_pool { > > u32 headroom; > > u32 chunk_size; > > u32 frame_len; > > + u16 queue_id; > > + u8 cached_need_wakeup; > > + bool uses_need_wakeup; > > bool dma_need_sync; > > bool unaligned; > > struct xdp_umem *umem; > > void *addrs; > > struct device *dev; > > + struct net_device *netdev; > > refcount_t users; > > struct work_struct work; > > struct xdp_buff_xsk *free_heads[]; > > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c > > index 7d86a63..b1699d0 100644 > > --- a/net/xdp/xdp_umem.c > > +++ b/net/xdp/xdp_umem.c > > @@ -63,26 +63,9 @@ static void xdp_umem_unaccount_pages(struct xdp_umem > > *umem) > > } > > } > > > > -void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, > > - u16 queue_id) > > -{ > > - umem->dev = dev; > > - umem->queue_id = queue_id; > > - > > - dev_hold(dev); > > -} > > - > > -void xdp_umem_clear_dev(struct xdp_umem *umem) > > -{ > > - dev_put(umem->dev); > > - umem->dev = NULL; > > - umem->zc = false; > > -} > > - > > static void xdp_umem_release(struct xdp_umem *umem) > > { > > - xdp_umem_clear_dev(umem); > > - > > + umem->zc = false; > > ida_simple_remove(&umem_ida, umem->id); > > > > xdp_umem_unpin_pages(umem); > > diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h > > index 93e96be..67bf3f3 100644 > > --- a/net/xdp/xdp_umem.h > > +++ b/net/xdp/xdp_umem.h > > @@ -8,10 +8,6 @@ > > > > #include > > > > -void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, > > - u16 queue_id); > > -void xdp_umem_clear_dev(struct xdp_umem *umem); > > -bool xdp_umem_validate_queues(struct xdp_umem *umem); > > void xdp_get_umem(struct xdp_umem *umem); > > void xdp_put_umem(struct xdp_umem *umem); > > void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs); > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > index ee04887..624d0fc 100644 > > --- a/net/xdp/xsk.c > > +++ b/net/xdp/xsk.c > > @@ -41,67 +41,61 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs) > > > > void xsk_set_rx_need_wakeup(struct xsk_buff_pool 
*pool) > > { > > - struct xdp_umem *umem = pool->umem; > > - > > - if (umem->need_wakeup & XDP_WAKEUP_RX) > > + if (pool->cached_need_wakeup & XDP_WAKEUP_RX) > > return; > > > > pool->fq->ring->flags |= XDP_RING_NEED_WAKEUP; > > - umem->need_wakeup |= XDP_WAKEUP_RX; > > + pool->cached_need_wakeup |= XDP_WAKEUP_RX; > > } > > EXPORT_SYMBOL(xsk_set_rx_need_wakeup); > > > > void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) > > { > > - struct xdp_umem *umem = pool->umem; > > struct xdp_sock *xs; > > > > - if (umem->need_wakeup & XDP_WAKEUP_TX) > > + if (pool->cached_need_wakeup & XDP_WAKEUP_TX) > > return; > > > > rcu_read_lock(); > > - list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { > > + list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) { > > I know this is going to be fixed in the next patch, but it breaks > bisectability: this patch is broken because xs is not assigned. I don't > think this cha
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Wed, Jul 29, 2020 at 11:02:57AM +0100, Russell King - ARM Linux admin wrote: > in other words, compare pointers, so that whether skb_push() etc has > been used on the skb is irrelevant? +1
Re: [PATCH v1] farsync: use generic power management
On Wed, Jul 29, 2020 at 07:29:54AM -0500, Bjorn Helgaas wrote: > On Wed, Jul 29, 2020 at 03:47:30PM +0530, Vaibhav Gupta wrote: > > > > Agreed. Actually, as their presence only causes PCI core to call > > pci_legacy_suspend/resume() for them, I thought that after removing > > the binding from "struct pci_driver", this driver qualifies to be > > grouped under the generic framework, so used "use generic power > > management" for the heading. > > > > I should have written "remove legacy binding". > > This removed the *mention* of fst_driver.suspend and fst_driver.resume, > which is important because we want to eventually remove those members > completely from struct pci_driver. > > But fst_driver.suspend and fst_driver.resume *exist* before and after > this patch, and they're initialized to zero before and after this > patch. > > Since they were zero before, and they're still zero after this patch, > the PCI core doesn't call pci_legacy_suspend/resume(). This patch > doesn't change that at all. > Got it. Thanks :) > > But David has applied the patch, should I send a v2 or fix to update > > message? > > No, I don't think David updates patches after he's applied them. But > if the situation comes up again, you'll know how to describe it :) > Thanks a lot. :D Vaibhav Gupta > Bjorn
Re: [PATCH bpf-next v4 08/14] xsk: enable sharing of dma mappings
On Tue, Jul 28, 2020 at 11:00 AM Maxim Mikityanskiy wrote: > > On 2020-07-21 08:04, Magnus Karlsson wrote: > > Enable the sharing of dma mappings by moving them out from the buffer > > pool. Instead we put each dma mapped umem region in a list in the umem > > structure. If dma has already been mapped for this umem and device, it > > is not mapped again and the existing dma mappings are reused. > > > > Signed-off-by: Magnus Karlsson > > --- > > include/net/xdp_sock.h | 1 + > > include/net/xsk_buff_pool.h | 7 +++ > > net/xdp/xdp_umem.c | 1 + > > net/xdp/xsk_buff_pool.c | 112 > > > > 4 files changed, 102 insertions(+), 19 deletions(-) > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > index 126d243..282aeba 100644 > > --- a/include/net/xdp_sock.h > > +++ b/include/net/xdp_sock.h > > @@ -30,6 +30,7 @@ struct xdp_umem { > > u8 flags; > > int id; > > bool zc; > > + struct list_head xsk_dma_list; > > }; > > > > struct xsk_map { > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h > > index 83f100c..8f1dc4c 100644 > > --- a/include/net/xsk_buff_pool.h > > +++ b/include/net/xsk_buff_pool.h > > @@ -28,6 +28,13 @@ struct xdp_buff_xsk { > > struct list_head free_list_node; > > }; > > > > +struct xsk_dma_map { > > + dma_addr_t *dma_pages; > > + struct net_device *dev; > > + refcount_t users; > > + struct list_head list; /* Protected by the RTNL_LOCK */ > > +}; > > + > > struct xsk_buff_pool { > > struct xsk_queue *fq; > > struct xsk_queue *cq; > > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c > > index 372998d..cf27249 100644 > > --- a/net/xdp/xdp_umem.c > > +++ b/net/xdp/xdp_umem.c > > @@ -199,6 +199,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct > > xdp_umem_reg *mr) > > umem->user = NULL; > > umem->flags = mr->flags; > > > > + INIT_LIST_HEAD(&umem->xsk_dma_list); > > refcount_set(&umem->users, 1); > > > > err = xdp_umem_account_pages(umem); > > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c > > index c563874..ca74a3e 100644 > > --- a/net/xdp/xsk_buff_pool.c > > +++ b/net/xdp/xsk_buff_pool.c > > @@ -104,6 +104,25 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, > > struct xdp_rxq_info *rxq) > > } > > EXPORT_SYMBOL(xp_set_rxq_info); > > > > +static void xp_disable_drv_zc(struct xsk_buff_pool *pool) > > +{ > > + struct netdev_bpf bpf; > > + int err; > > + > > + ASSERT_RTNL(); > > + > > + if (pool->umem->zc) { > > + bpf.command = XDP_SETUP_XSK_POOL; > > + bpf.xsk.pool = NULL; > > + bpf.xsk.queue_id = pool->queue_id; > > + > > + err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); > > + > > + if (err) > > + WARN(1, "Failed to disable zero-copy!\n"); > > + } > > +} > > + > > int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *netdev, > > u16 queue_id, u16 flags) > > { > > @@ -122,6 +141,8 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct > > net_device *netdev, > > if (xsk_get_pool_from_qid(netdev, queue_id)) > > return -EBUSY; > > > > + pool->netdev = netdev; > > + pool->queue_id = queue_id; > > err = xsk_reg_pool_at_qid(netdev, pool, queue_id); > > if (err) > > return err; > > @@ -155,11 +176,15 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct > > net_device *netdev, > > if (err) > > goto err_unreg_pool; > > > > - pool->netdev = netdev; > > - pool->queue_id = queue_id; > > + if (!pool->dma_pages) { > > + WARN(1, "Driver did not DMA map zero-copy buffers"); > > + goto err_unreg_xsk; > > + } > > pool->umem->zc = true; > > return 0; > > > > +err_unreg_xsk: > > + xp_disable_drv_zc(pool); > > 
err_unreg_pool: > > if (!force_zc) > > err = 0; /* fallback to copy mode */ > > @@ -170,25 +195,10 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct > > net_device *netdev, > > > > void xp_clear_dev(struct xsk_buff_pool *pool) > > { > > - struct netdev_bpf bpf; > > - int err; > > - > > - ASSERT_RTNL(); > > - > > if (!pool->netdev) > > return; > > > > - if (pool->umem->zc) { > > - bpf.command = XDP_SETUP_XSK_POOL; > > - bpf.xsk.pool = NULL; > > - bpf.xsk.queue_id = pool->queue_id; > > - > > - err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); > > - > > - if (err) > > - WARN(1, "Failed to disable zero-copy!\n"); > > - } > > - > > + xp_disable_drv_zc(pool); > > xsk_clear_pool_at_qid(pool->netdev, pool->queue_id); > > dev_put(pool->netdev); > > pool->netdev = NULL; > > @@ -233,14 +243,61 @@ void xp_put_pool(struct xsk_buf
[PATCH net-next v3 1/2] net: openvswitch: add masks cache hit counter
Add a counter that counts the number of masks cache hits, and export it through the megaflow netlink statistics. Reviewed-by: Paolo Abeni Reviewed-by: Tonghao Zhang Signed-off-by: Eelco Chaudron --- include/uapi/linux/openvswitch.h |2 +- net/openvswitch/datapath.c |5 - net/openvswitch/datapath.h |3 +++ net/openvswitch/flow_table.c | 19 ++- net/openvswitch/flow_table.h |3 ++- 5 files changed, 24 insertions(+), 8 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 9b14519e74d9..7cb76e5ca7cf 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -102,8 +102,8 @@ struct ovs_dp_megaflow_stats { __u64 n_mask_hit;/* Number of masks used for flow lookups. */ __u32 n_masks; /* Number of masks for the datapath. */ __u32 pad0; /* Pad for future expension. */ + __u64 n_cache_hit; /* Number of cache matches for flow lookups. */ __u64 pad1; /* Pad for future expension. */ - __u64 pad2; /* Pad for future expension. */ }; struct ovs_vport_stats { diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 95805f0e27bd..a54df1fe3ec4 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -225,13 +225,14 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key) struct dp_stats_percpu *stats; u64 *stats_counter; u32 n_mask_hit; + u32 n_cache_hit; int error; stats = this_cpu_ptr(dp->stats_percpu); /* Look up flow. */ flow = ovs_flow_tbl_lookup_stats(&dp->table, key, skb_get_hash(skb), -&n_mask_hit); +&n_mask_hit, &n_cache_hit); if (unlikely(!flow)) { struct dp_upcall_info upcall; @@ -262,6 +263,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key) u64_stats_update_begin(&stats->syncp); (*stats_counter)++; stats->n_mask_hit += n_mask_hit; + stats->n_cache_hit += n_cache_hit; u64_stats_update_end(&stats->syncp); } @@ -699,6 +701,7 @@ static void get_dp_stats(const struct datapath *dp, struct ovs_dp_stats *stats, stats->n_missed += local_stats.n_missed; stats->n_lost += local_stats.n_lost; mega_stats->n_mask_hit += local_stats.n_mask_hit; + mega_stats->n_cache_hit += local_stats.n_cache_hit; } } diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h index 697a2354194b..86d78613edb4 100644 --- a/net/openvswitch/datapath.h +++ b/net/openvswitch/datapath.h @@ -38,12 +38,15 @@ * @n_mask_hit: Number of masks looked up for flow match. * @n_mask_hit / (@n_hit + @n_missed) will be the average masks looked * up per packet. + * @n_cache_hit: The number of received packets that had their mask found using + * the mask cache. 
*/ struct dp_stats_percpu { u64 n_hit; u64 n_missed; u64 n_lost; u64 n_mask_hit; + u64 n_cache_hit; struct u64_stats_sync syncp; }; diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c index af22c9ee28dd..a5912ea05352 100644 --- a/net/openvswitch/flow_table.c +++ b/net/openvswitch/flow_table.c @@ -667,6 +667,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl, struct mask_array *ma, const struct sw_flow_key *key, u32 *n_mask_hit, + u32 *n_cache_hit, u32 *index) { u64 *usage_counters = this_cpu_ptr(ma->masks_usage_cntr); @@ -682,6 +683,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl, u64_stats_update_begin(&ma->syncp); usage_counters[*index]++; u64_stats_update_end(&ma->syncp); + (*n_cache_hit)++; return flow; } } @@ -719,7 +721,8 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl, struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl, const struct sw_flow_key *key, u32 skb_hash, - u32 *n_mask_hit) + u32 *n_mask_hit, + u32 *n_cache_hit) { struct mask_array *ma = rcu_dereference(tbl->mask_array); struct table_instance *ti = rcu_dereference(tbl->ti); @@ -729,10 +732,13 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl, int seg; *n_mask_hit
[PATCH net-next v3 0/2] net: openvswitch: masks cache enhancements
This patchset adds two enhancements to the Open vSwitch masks cache. Signed-off-by: Eelco Chaudron Changes in v3 [patch 2/2 only]: - Use is_power_of_2() function - Use array_size() function - Fix remaining sparse errors Changes in v2 [patch 2/2 only]: - Fix sparse warnings - Fix netlink policy items reported by Florian Westphal Eelco Chaudron (2): net: openvswitch: add masks cache hit counter net: openvswitch: make masks cache size configurable include/uapi/linux/openvswitch.h |3 + net/openvswitch/datapath.c | 19 ++ net/openvswitch/datapath.h |3 + net/openvswitch/flow_table.c | 116 -- net/openvswitch/flow_table.h | 13 5 files changed, 132 insertions(+), 22 deletions(-)
[PATCH net-next v3 2/2] net: openvswitch: make masks cache size configurable
This patch makes the masks cache size configurable, or with a size of 0, disable it. Reviewed-by: Paolo Abeni Signed-off-by: Eelco Chaudron --- Changes in v3: - Use is_power_of_2() function - Use array_size() function - Fix remaining sparse errors Changes in v2: - Fix sparse warnings - Fix netlink policy items reported by Florian Westphal include/uapi/linux/openvswitch.h |1 net/openvswitch/datapath.c | 14 + net/openvswitch/flow_table.c | 97 +- net/openvswitch/flow_table.h | 10 4 files changed, 108 insertions(+), 14 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 7cb76e5ca7cf..8300cc29dec8 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -86,6 +86,7 @@ enum ovs_datapath_attr { OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */ OVS_DP_ATTR_USER_FEATURES, /* OVS_DP_F_* */ OVS_DP_ATTR_PAD, + OVS_DP_ATTR_MASKS_CACHE_SIZE, __OVS_DP_ATTR_MAX }; diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index a54df1fe3ec4..114b2ddb8037 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -1498,6 +1498,7 @@ static size_t ovs_dp_cmd_msg_size(void) msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_stats)); msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_megaflow_stats)); msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_USER_FEATURES */ + msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_MASKS_CACHE_SIZE */ return msgsize; } @@ -1535,6 +1536,10 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb, if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features)) goto nla_put_failure; + if (nla_put_u32(skb, OVS_DP_ATTR_MASKS_CACHE_SIZE, + ovs_flow_tbl_masks_cache_size(&dp->table))) + goto nla_put_failure; + genlmsg_end(skb, ovs_header); return 0; @@ -1599,6 +1604,13 @@ static int ovs_dp_change(struct datapath *dp, struct nlattr *a[]) #endif } + if (a[OVS_DP_ATTR_MASKS_CACHE_SIZE]) { + u32 cache_size; + + cache_size = nla_get_u32(a[OVS_DP_ATTR_MASKS_CACHE_SIZE]); + ovs_flow_tbl_masks_cache_resize(&dp->table, cache_size); + } + dp->user_features = user_features; if (dp->user_features & OVS_DP_F_TC_RECIRC_SHARING) @@ -1894,6 +1906,8 @@ static const struct nla_policy datapath_policy[OVS_DP_ATTR_MAX + 1] = { [OVS_DP_ATTR_NAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 }, [OVS_DP_ATTR_UPCALL_PID] = { .type = NLA_U32 }, [OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 }, + [OVS_DP_ATTR_MASKS_CACHE_SIZE] = NLA_POLICY_RANGE(NLA_U32, 0, + PCPU_MIN_UNIT_SIZE / sizeof(struct mask_cache_entry)), }; static const struct genl_ops dp_datapath_genl_ops[] = { diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c index a5912ea05352..5280aeeef628 100644 --- a/net/openvswitch/flow_table.c +++ b/net/openvswitch/flow_table.c @@ -38,8 +38,8 @@ #define MASK_ARRAY_SIZE_MIN16 #define REHASH_INTERVAL(10 * 60 * HZ) +#define MC_DEFAULT_HASH_ENTRIES256 #define MC_HASH_SHIFT 8 -#define MC_HASH_ENTRIES(1u << MC_HASH_SHIFT) #define MC_HASH_SEGS ((sizeof(uint32_t) * 8) / MC_HASH_SHIFT) static struct kmem_cache *flow_cache; @@ -341,15 +341,75 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask) } } +static void __mask_cache_destroy(struct mask_cache *mc) +{ + if (mc->mask_cache) + free_percpu(mc->mask_cache); + kfree(mc); +} + +static void mask_cache_rcu_cb(struct rcu_head *rcu) +{ + struct mask_cache *mc = container_of(rcu, struct mask_cache, rcu); + + __mask_cache_destroy(mc); +} + +static struct mask_cache 
*tbl_mask_cache_alloc(u32 size) +{ + struct mask_cache_entry __percpu *cache = NULL; + struct mask_cache *new; + + /* Only allow size to be 0, or a power of 2, and does not exceed +* percpu allocation size. +*/ + if ((!is_power_of_2(size) && size != 0) || + (size * sizeof(struct mask_cache_entry)) > PCPU_MIN_UNIT_SIZE) + return NULL; + + new = kzalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return NULL; + + new->cache_size = size; + if (new->cache_size > 0) { + cache = __alloc_percpu(array_size(sizeof(struct mask_cache_entry), + new->cache_size), + __alignof__(struct mask_cache_entry)); + if (!cache) { + kfree(new); + return NULL; + } +
Re: [PATCH RFC net-next] net: phy: add Marvell PHY PTP support
On Wed, Jul 29, 2020 at 06:19:32AM -0700, Richard Cochran wrote: > On Wed, Jul 29, 2020 at 11:58:07AM +0100, Russell King - ARM Linux admin > wrote: > > How do we deal with this situation - from what I can see from the > > ethtool API, we have to make a choice about which to use. How do we > > make that choice? > > Unfortunately the stack does not implement simultaneous MAC + PHY time > stamping. If your board has both, then you make the choice to use the > PHY by selecting NETWORK_PHY_TIMESTAMPING at kernel compile time. Which is more or less what I said in my email. However, the important question about how to select between the two, which is really what I'm after, has not been addressed. > (Also some MAC drivers do not defer to the PHY properly. Sometimes > you can work around that by de-selecting the MAC's PTP function in the > Kconfig if possible, but otherwise you need to patch the MAC driver.) ... which really doesn't work if you have a board where only some network interfaces have a PHY with PTP support, but all have PTP support in the MAC. If all MACs or the majority of MACs use a common PTP clock, it seems to me that you would want to use the MACs rather than the PHY, especially if the PHY doesn't offer as good a quality PTP clock as is available from the MAC. Randomly patching the kernel is out of the question, for arm based systems we want one kernel that works correctly across as many platforms as possible, and Kconfig choices to set platform specific details are basically unacceptable, let alone patching the kernel to make those decisions. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
HFSC latency control
If this is not the correct place to ask these questions, please let me know if you know about any forum, mailing list, IRC channel or wherever HFSC questions could be asked and answered. I have read some HFSC theory and some old messages in the archive of this mailing list. I am testing HFSC in my lab. Traffic is correctly shaped but I do not get the expected results regarding latency.

tc qdisc add dev eth0 root handle 1: hfsc default 90
tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 100kbps ul m2 100kbps
tc class add dev eth0 parent 1:1 classid 1:10 hfsc rt m1 50kbps d 100 m2 10kbps
tc class add dev eth0 parent 1:1 classid 1:20 hfsc ls m2 70kbps
tc class add dev eth0 parent 1:1 classid 1:90 hfsc ls m2 10kbps
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip src 192.168.10.0/24 flowid 1:10
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip src 192.168.20.0/24 flowid 1:20

I expected packets from 192.168.10.0/24 to have the lowest latency; however, they get delayed much more than any other traffic. What am I doing wrong? I would like to have one example where I can see how HFSC controls latency. I have seen lots of theory but very few examples, and when I test them in my lab I don't see the expected results regarding latency. I generate traffic with Ostinato and test latency with ping and qperf.
Re: PROBLEM: (DSA/Microchip): 802.1Q-Header lost on KSZ9477-DSA ingress without bridge
On 7/28/2020 11:05 PM, Gaube, Marvin (THSE-TL1) wrote: > Summary: 802.1Q-Header lost on KSZ9477-DSA ingress without bridge > Keywords: networking, dsa, microchip, 802.1q, vlan > Full description: > > Hello, > we're trying to get 802.1Q-tagged Ethernet frames through a KSZ9477 > DSA-enabled switch without creating a bridge on the kernel side. Does it work if you have a bridge that is VLAN aware though? If it does, this would suggest that the default VLAN behavior without a bridge is too restrictive and needs changing. > Following setup: > Switchport 1 <-- KSZ9477 --> eth1 (CPU-Port) <---> lan1 This representation is confusing: is switchport 1 a network device, or is this meant to be physical switch port number 1 of the KSZ9477? > > No bridge is configured, only the interface directly. Untagged packets are > working without problems. The switch uses the ksz9477 DSA driver with > tail tagging ("DSA_TAG_PROTO_KSZ9477"). > When sending packets with an 802.1Q header (tagged VLAN) into the switchport, I > see them including the 802.1Q header on eth1. > They also appear on lan1, but with the 802.1Q header missing. > When I create a VLAN interface over lan1 (e.g. lan1.21), nothing arrives > there. > The other way around, everything works fine: packets transmitted into lan1.21 > appear in 802.1Q VLAN 21 on switchport 1. > > I assume that is not the intended behavior. > I haven't found an obvious reason for this behavior yet, but I suspect the > VLAN header gets stripped off somewhere around "dsa_switch_rcv" in > net/dsa/dsa.c or "ksz9477_rcv" in net/dsa/tag_ksz.c. Not sure how, though: ksz9477_rcv() only removes the trail tag, which should leave any header intact. It seems to me that the switch is incorrectly configured and is not VLAN aware at all, nor passing VLAN tagged frames through on ingress to CPU when it should. -- Florian
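One way to try what Florian suggests, i.e. the same test but behind a VLAN-aware bridge, is sketched below. The interface names follow the report; the commands are a generic iproute2 recipe, not something verified on this particular board.

# Sketch: enslave the DSA user port to a VLAN-aware bridge and permit VID 21.
ip link add name br0 type bridge vlan_filtering 1
ip link set dev lan1 master br0
ip link set dev lan1 up
ip link set dev br0 up

# Allow VLAN 21 (tagged) on the port and on the bridge itself.
bridge vlan add dev lan1 vid 21
bridge vlan add dev br0 vid 21 self

# Terminate VID 21 locally on top of the bridge instead of on lan1.
ip link add link br0 name br0.21 type vlan id 21
ip link set dev br0.21 up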
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Wed, Jul 29, 2020 at 12:29:08PM +0200, Kurt Kanzenbach wrote: > I'll test it and send v3. But before, I've got another question that > somebody might have an answer to: > > The ptp v1 code always does locate the message type at > > msgtype = data + offset + OFF_PTP_CONTROL > > OFF_PTP_CONTROL is 32. However, looking at the ptp v1 header, the > message type is located at offset 20. What am I missing here? My source back in the day was the John Eidson book. In Appendix A it claims:

Table A.1. Common PTP message fields

Field name     Purpose
messageType    Identifies message as event or general
control        Indicates the message type, e.g., Sync

So I think the code is correct. Thanks, Richard
[PATCH 1/5] seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
__SEQ_LOCKDEP() is an expression gate for the seqcount_LOCKNAME_t::lock member. Rename it to be about the member, not the gate condition. Later (PREEMPT_RT) patches will make the member available for !LOCKDEP configs. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -133,20 +133,20 @@ static inline void seqcount_lockdep_read */ #ifdef CONFIG_LOCKDEP -#define __SEQ_LOCKDEP(expr)expr +#define __SEQ_LOCK(expr) expr #else -#define __SEQ_LOCKDEP(expr) +#define __SEQ_LOCK(expr) #endif #define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) { \ .seqcount = SEQCNT_ZERO(seq_name.seqcount), \ - __SEQ_LOCKDEP(.lock = (assoc_lock)) \ + __SEQ_LOCK(.lock= (assoc_lock)) \ } #define seqcount_locktype_init(s, assoc_lock) \ do { \ seqcount_init(&(s)->seqcount); \ - __SEQ_LOCKDEP((s)->lock = (assoc_lock));\ + __SEQ_LOCK((s)->lock = (assoc_lock)); \ } while (0) /** @@ -161,7 +161,7 @@ do { \ */ typedef struct seqcount_spinlock { seqcount_t seqcount; - __SEQ_LOCKDEP(spinlock_t*lock); + __SEQ_LOCK(spinlock_t *lock); } seqcount_spinlock_t; /** @@ -192,7 +192,7 @@ typedef struct seqcount_spinlock { */ typedef struct seqcount_raw_spinlock { seqcount_t seqcount; - __SEQ_LOCKDEP(raw_spinlock_t*lock); + __SEQ_LOCK(raw_spinlock_t *lock); } seqcount_raw_spinlock_t; /** @@ -223,7 +223,7 @@ typedef struct seqcount_raw_spinlock { */ typedef struct seqcount_rwlock { seqcount_t seqcount; - __SEQ_LOCKDEP(rwlock_t *lock); + __SEQ_LOCK(rwlock_t *lock); } seqcount_rwlock_t; /** @@ -257,7 +257,7 @@ typedef struct seqcount_rwlock { */ typedef struct seqcount_mutex { seqcount_t seqcount; - __SEQ_LOCKDEP(struct mutex *lock); + __SEQ_LOCK(struct mutex *lock); } seqcount_mutex_t; /** @@ -291,7 +291,7 @@ typedef struct seqcount_mutex { */ typedef struct seqcount_ww_mutex { seqcount_t seqcount; - __SEQ_LOCKDEP(struct ww_mutex *lock); + __SEQ_LOCK(struct ww_mutex *lock); } seqcount_ww_mutex_t; /** @@ -329,7 +329,7 @@ __seqcount_##locktype##_preemptible(seqc static inline void \ __seqcount_##locktype##_assert(seqcount_##locktype##_t *s) \ { \ - __SEQ_LOCKDEP(lockdep_assert_held(lockmember)); \ + __SEQ_LOCK(lockdep_assert_held(lockmember));\ } /*
[PATCH 3/5] seqlock: Fold seqcount_LOCKNAME_init() definition
Manual repetition is boring and error prone. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 61 +++- 1 file changed, 14 insertions(+), 47 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -143,12 +143,6 @@ static inline void seqcount_lockdep_read __SEQ_LOCK(.lock= (assoc_lock)) \ } -#define seqcount_locktype_init(s, assoc_lock) \ -do { \ - seqcount_init(&(s)->seqcount); \ - __SEQ_LOCK((s)->lock = (assoc_lock)); \ -} while (0) - /** * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t * @name: Name of the seqcount_spinlock_t instance @@ -158,14 +152,6 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_spinlock_init - runtime initializer for seqcount_spinlock_t - * @s: Pointer to the seqcount_spinlock_t instance - * @lock: Pointer to the associated spinlock - */ -#define seqcount_spinlock_init(s, lock) \ - seqcount_locktype_init(s, lock) - -/** * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t * @name: Name of the seqcount_raw_spinlock_t instance * @lock: Pointer to the associated raw_spinlock @@ -174,14 +160,6 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_raw_spinlock_init - runtime initializer for seqcount_raw_spinlock_t - * @s: Pointer to the seqcount_raw_spinlock_t instance - * @lock: Pointer to the associated raw_spinlock - */ -#define seqcount_raw_spinlock_init(s, lock)\ - seqcount_locktype_init(s, lock) - -/** * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t * @name: Name of the seqcount_rwlock_t instance * @lock: Pointer to the associated rwlock @@ -190,14 +168,6 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_rwlock_init - runtime initializer for seqcount_rwlock_t - * @s: Pointer to the seqcount_rwlock_t instance - * @lock: Pointer to the associated rwlock - */ -#define seqcount_rwlock_init(s, lock) \ - seqcount_locktype_init(s, lock) - -/** * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t * @name: Name of the seqcount_mutex_t instance * @lock: Pointer to the associated mutex @@ -206,14 +176,6 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_mutex_init - runtime initializer for seqcount_mutex_t - * @s: Pointer to the seqcount_mutex_t instance - * @lock: Pointer to the associated mutex - */ -#define seqcount_mutex_init(s, lock) \ - seqcount_locktype_init(s, lock) - -/** * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t * @name: Name of the seqcount_ww_mutex_t instance * @lock: Pointer to the associated ww_mutex @@ -222,15 +184,7 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_ww_mutex_init - runtime initializer for seqcount_ww_mutex_t - * @s: Pointer to the seqcount_ww_mutex_t instance - * @lock: Pointer to the associated ww_mutex - */ -#define seqcount_ww_mutex_init(s, lock) \ - seqcount_locktype_init(s, lock) - -/** - * typedef seqcount_LOCKNAME_t - sequence counter with spinlock associated + * typedef seqcount_LOCKNAME_t - sequence counter with LOCKTYPR associated * @seqcount: The real sequence counter * @lock: Pointer to the associated spinlock * @@ -240,6 +194,12 @@ do { \ * that the write side critical section is properly serialized. 
*/ +/** + * seqcount_LOCKNAME_init() - runtime initializer for seqcount_LOCKNAME_t + * @s: Pointer to the seqcount_LOCKNAME_t instance + * @lock: Pointer to the associated LOCKTYPE + */ + /* * SEQCOUNT_LOCKTYPE() - Instantiate seqcount_LOCKNAME_t and helpers * @locktype: actual typename @@ -253,6 +213,13 @@ typedef struct seqcount_##lockname { __SEQ_LOCK(locktype *lock); \ } seqcount_##lockname##_t; \ \ +static __always_inline void\ +seqcount_##lockname##_init(se
[PATCH 0/5] seqlock: Cleanups
Hi, These are some minor cleanups that go on top of darwi's seqlock patches: https://lkml.kernel.org/r/20200720155530.1173732-1-a.darw...@linutronix.de It's mostly trimming excessive manual repetition and a few naming niggles. The series has been exposed to 0-day for a while now, so I'm going to push the lot out to tip/locking/core. [ 0day found a Sparse bug in its _Generic() implementation that has since been fixed by Luc ] --- seqlock.h | 292 +- 1 file changed, 84 insertions(+), 208 deletions(-)
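For anyone not following the parent series, here is a minimal usage sketch of the seqcount_LOCKNAME_t API these cleanups touch. The type, initializer and accessor names are the ones visible in the quoted patches; the data and functions around them are made up for illustration.

/* Sketch only: writer/reader pairing for a spinlock-associated seqcount. */
static DEFINE_SPINLOCK(stats_lock);
static seqcount_spinlock_t stats_seq =
	SEQCNT_SPINLOCK_ZERO(stats_seq, &stats_lock);
static u64 stats_bytes;

static void stats_add(u64 n)
{
	spin_lock(&stats_lock);			/* associated lock held...   */
	write_seqcount_begin(&stats_seq);	/* ...so lockdep stays quiet */
	stats_bytes += n;
	write_seqcount_end(&stats_seq);
	spin_unlock(&stats_lock);
}

static u64 stats_read(void)
{
	unsigned int seq;
	u64 val;

	do {
		seq = read_seqcount_begin(&stats_seq);
		val = stats_bytes;
	} while (read_seqcount_retry(&stats_seq, seq));

	return val;
}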
[PATCH 5/5] seqcount: More consistent seqprop names
Attempt uniformity and brevity. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 52 1 file changed, 26 insertions(+), 26 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -247,9 +247,9 @@ SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mu __seqprop_case((s), mutex, prop), \ __seqprop_case((s), ww_mutex, prop)) -#define __to_seqcount_t(s) __seqprop(s, ptr) -#define __associated_lock_exists_and_is_preemptible(s) __seqprop(s, preemptible) -#define __assert_write_section_is_protected(s) __seqprop(s, assert) +#define __seqcount_ptr(s) __seqprop(s, ptr) +#define __seqcount_lock_preemptible(s) __seqprop(s, preemptible) +#define __seqcount_assert_lock_held(s) __seqprop(s, assert) /** * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier @@ -266,7 +266,7 @@ SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mu * Return: count to be passed to read_seqcount_retry() */ #define __read_seqcount_begin(s) \ - __read_seqcount_t_begin(__to_seqcount_t(s)) + __read_seqcount_t_begin(__seqcount_ptr(s)) static inline unsigned __read_seqcount_t_begin(const seqcount_t *s) { @@ -289,7 +289,7 @@ static inline unsigned __read_seqcount_t * Return: count to be passed to read_seqcount_retry() */ #define raw_read_seqcount_begin(s) \ - raw_read_seqcount_t_begin(__to_seqcount_t(s)) + raw_read_seqcount_t_begin(__seqcount_ptr(s)) static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s) { @@ -305,7 +305,7 @@ static inline unsigned raw_read_seqcount * Return: count to be passed to read_seqcount_retry() */ #define read_seqcount_begin(s) \ - read_seqcount_t_begin(__to_seqcount_t(s)) + read_seqcount_t_begin(__seqcount_ptr(s)) static inline unsigned read_seqcount_t_begin(const seqcount_t *s) { @@ -325,7 +325,7 @@ static inline unsigned read_seqcount_t_b * Return: count to be passed to read_seqcount_retry() */ #define raw_read_seqcount(s) \ - raw_read_seqcount_t(__to_seqcount_t(s)) + raw_read_seqcount_t(__seqcount_ptr(s)) static inline unsigned raw_read_seqcount_t(const seqcount_t *s) { @@ -353,7 +353,7 @@ static inline unsigned raw_read_seqcount * Return: count to be passed to read_seqcount_retry() */ #define raw_seqcount_begin(s) \ - raw_seqcount_t_begin(__to_seqcount_t(s)) + raw_seqcount_t_begin(__seqcount_ptr(s)) static inline unsigned raw_seqcount_t_begin(const seqcount_t *s) { @@ -380,7 +380,7 @@ static inline unsigned raw_seqcount_t_be * Return: true if a read section retry is required, else false */ #define __read_seqcount_retry(s, start) \ - __read_seqcount_t_retry(__to_seqcount_t(s), start) + __read_seqcount_t_retry(__seqcount_ptr(s), start) static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start) { @@ -400,7 +400,7 @@ static inline int __read_seqcount_t_retr * Return: true if a read section retry is required, else false */ #define read_seqcount_retry(s, start) \ - read_seqcount_t_retry(__to_seqcount_t(s), start) + read_seqcount_t_retry(__seqcount_ptr(s), start) static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start) { @@ -414,10 +414,10 @@ static inline int read_seqcount_t_retry( */ #define raw_write_seqcount_begin(s)\ do { \ - if (__associated_lock_exists_and_is_preemptible(s)) \ + if (__seqcount_lock_preemptible(s)) \ preempt_disable(); \ \ - raw_write_seqcount_t_begin(__to_seqcount_t(s)); \ + raw_write_seqcount_t_begin(__seqcount_ptr(s)); \ } while (0) static inline void raw_write_seqcount_t_begin(seqcount_t *s) @@ -433,9 +433,9 @@ static inline void raw_write_seqcount_t_ */ #define raw_write_seqcount_end(s) \ do { \ - 
raw_write_seqcount_t_end(__to_seqcount_t(s)); \ + raw_write_seqcount_t_end(__seqcount_ptr(s));\ \ - if (__associated_lock_exists_and_is_preemptible(s)) \ + if (__seqcount_lock_preemptible(s)) \ preempt_enable();
[PATCH 2/5] seqlock: Fold seqcount_LOCKNAME_t definition
Manual repetition is boring and error prone. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 140 +--- 1 file changed, 38 insertions(+), 102 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -150,21 +150,6 @@ do { \ } while (0) /** - * typedef seqcount_spinlock_t - sequence counter with spinlock associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated spinlock - * - * A plain sequence counter with external writer synchronization by a - * spinlock. The spinlock is associated to the sequence count in the - * static initializer or init function. This enables lockdep to validate - * that the write side critical section is properly serialized. - */ -typedef struct seqcount_spinlock { - seqcount_t seqcount; - __SEQ_LOCK(spinlock_t *lock); -} seqcount_spinlock_t; - -/** * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t * @name: Name of the seqcount_spinlock_t instance * @lock: Pointer to the associated spinlock @@ -181,21 +166,6 @@ typedef struct seqcount_spinlock { seqcount_locktype_init(s, lock) /** - * typedef seqcount_raw_spinlock_t - sequence count with raw spinlock associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated raw spinlock - * - * A plain sequence counter with external writer synchronization by a - * raw spinlock. The raw spinlock is associated to the sequence count in - * the static initializer or init function. This enables lockdep to - * validate that the write side critical section is properly serialized. - */ -typedef struct seqcount_raw_spinlock { - seqcount_t seqcount; - __SEQ_LOCK(raw_spinlock_t *lock); -} seqcount_raw_spinlock_t; - -/** * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t * @name: Name of the seqcount_raw_spinlock_t instance * @lock: Pointer to the associated raw_spinlock @@ -212,21 +182,6 @@ typedef struct seqcount_raw_spinlock { seqcount_locktype_init(s, lock) /** - * typedef seqcount_rwlock_t - sequence count with rwlock associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated rwlock - * - * A plain sequence counter with external writer synchronization by a - * rwlock. The rwlock is associated to the sequence count in the static - * initializer or init function. This enables lockdep to validate that - * the write side critical section is properly serialized. - */ -typedef struct seqcount_rwlock { - seqcount_t seqcount; - __SEQ_LOCK(rwlock_t *lock); -} seqcount_rwlock_t; - -/** * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t * @name: Name of the seqcount_rwlock_t instance * @lock: Pointer to the associated rwlock @@ -243,24 +198,6 @@ typedef struct seqcount_rwlock { seqcount_locktype_init(s, lock) /** - * typedef seqcount_mutex_t - sequence count with mutex associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated mutex - * - * A plain sequence counter with external writer synchronization by a - * mutex. The mutex is associated to the sequence counter in the static - * initializer or init function. This enables lockdep to validate that - * the write side critical section is properly serialized. - * - * The write side API functions write_seqcount_begin()/end() automatically - * disable and enable preemption when used with seqcount_mutex_t. 
- */ -typedef struct seqcount_mutex { - seqcount_t seqcount; - __SEQ_LOCK(struct mutex *lock); -} seqcount_mutex_t; - -/** * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t * @name: Name of the seqcount_mutex_t instance * @lock: Pointer to the associated mutex @@ -277,24 +214,6 @@ typedef struct seqcount_mutex { seqcount_locktype_init(s, lock) /** - * typedef seqcount_ww_mutex_t - sequence count with ww_mutex associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated ww_mutex - * - * A plain sequence counter with external writer synchronization by a - * ww_mutex. The ww_mutex is associated to the sequence counter in the static - * initializer or init function. This enables lockdep to validate that - * the write side critical section is properly serialized. - * - * The write side API functions write_seqcount_begin()/end() automatically - * disable and enable preemption when used with seqcount_ww_mutex_t. - */ -typedef struct seqcount_ww_mutex { - seqcount_t seqcount; - __SEQ_LOCK(struct ww_mutex *lock); -} seqcount_ww_mutex_t; - -/** * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t * @name: Name of the seqcount_ww_mutex_t instance * @lock: Pointer to the associated ww_
[PATCH 4/5] seqcount: Compress SEQCNT_LOCKNAME_ZERO()
Less is more. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 63 +--- 1 file changed, 18 insertions(+), 45 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -138,51 +138,6 @@ static inline void seqcount_lockdep_read #define __SEQ_LOCK(expr) #endif -#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) { \ - .seqcount = SEQCNT_ZERO(seq_name.seqcount), \ - __SEQ_LOCK(.lock= (assoc_lock)) \ -} - -/** - * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t - * @name: Name of the seqcount_spinlock_t instance - * @lock: Pointer to the associated spinlock - */ -#define SEQCNT_SPINLOCK_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - -/** - * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t - * @name: Name of the seqcount_raw_spinlock_t instance - * @lock: Pointer to the associated raw_spinlock - */ -#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - -/** - * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t - * @name: Name of the seqcount_rwlock_t instance - * @lock: Pointer to the associated rwlock - */ -#define SEQCNT_RWLOCK_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - -/** - * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t - * @name: Name of the seqcount_mutex_t instance - * @lock: Pointer to the associated mutex - */ -#define SEQCNT_MUTEX_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - -/** - * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t - * @name: Name of the seqcount_ww_mutex_t instance - * @lock: Pointer to the associated ww_mutex - */ -#define SEQCNT_WW_MUTEX_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - /** * typedef seqcount_LOCKNAME_t - sequence counter with LOCKTYPR associated * @seqcount: The real sequence counter @@ -263,6 +218,24 @@ SEQCOUNT_LOCKTYPE(rwlock_t,rwlock, fa SEQCOUNT_LOCKTYPE(struct mutex,mutex, true, s->lock) SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mutex, true, &s->lock->base) +/** + * SEQCNT_LOCKNAME_ZERO - static initializer for seqcount_LOCKNAME_t + * @name: Name of the seqcount_LOCKNAME_t instance + * @lock: Pointer to the associated LOCKTYPE + */ + +#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) { \ + .seqcount = SEQCNT_ZERO(seq_name.seqcount), \ + __SEQ_LOCK(.lock= (assoc_lock)) \ +} + +#define SEQCNT_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) +#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) +#define SEQCNT_RWLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) +#define SEQCNT_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) +#define SEQCNT_WW_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) + + #define __seqprop_case(s, lockname, prop) \ seqcount_##lockname##_t: __seqcount_##lockname##_##prop((void *)(s))
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
Kurt Kanzenbach writes: > On Wed Jul 29 2020, Russell King - ARM Linux admin wrote: >> Would it make more sense to do: >> >> u8 *data = skb_mac_header(skb); >> u8 *ptr = data; >> >> if (type & PTP_CLASS_VLAN) >> ptr += VLAN_HLEN; >> >> switch (type & PTP_CLASS_PMASK) { >> case PTP_CLASS_IPV4: >> ptr += IPV4_HLEN(ptr) + UDP_HLEN; >> break; >> >> case PTP_CLASS_IPV6: >> ptr += IP6_HLEN + UDP_HLEN; >> break; >> >> case PTP_CLASS_L2: >> break; >> >> default: >> return NULL; >> } >> >> ptr += ETH_HLEN; >> >> if (ptr + 34 > skb->data + skb->len) >> return NULL; >> >> return ptr; >> >> in other words, compare pointers, so that whether skb_push() etc has >> been used on the skb is irrelevant? I like this! > The ptp v1 code always does locate the message type at > > msgtype = data + offset + OFF_PTP_CONTROL > > OFF_PTP_CONTROL is 32. However, looking at the ptp v1 header, the > message type is located at offset 20. What am I missing here? 0x20 == 32? I see it at offset 32 in IEEE 1588.
Re: [PATCH] bpf: Add bpf_ktime_get_real_ns
On 7/28/20 11:15 PM, Andrii Nakryiko wrote: > On Tue, Jul 28, 2020 at 1:57 PM David Ahern wrote: >> >> On 7/28/20 12:28 PM, Andrii Nakryiko wrote: >>> In some, yes, which also means that in some other they can't. So I'm >>> still worried about misuses of REALCLOCK, within (internal daemons >>> within the company) or outside (BCC tools and the like) of data centers. >>> Especially if people will start using it to measure elapsed time >>> between events. I'd rather not have to explain over and over again >>> that REALCLOCK is not for measuring passage of time. >> >> Why is documenting the type of clock and its limitations not sufficient? >> Users are going to make mistakes and use of gettimeofday to measure time >> differences is a common one for userspace code. That should not define >> or limit the ability to correctly and most directly do something in bpf. >> >> I have a patch to export local_clock as bpf_ktime_get_fast_ns. It too >> can be abused given that it has limitations (can not be used across CPUs >> and does not correlate to any exported clock), but it too has important >> use cases (twice as fast as bpf_ktime_get_ns and useful for per-cpu >> delta-time needs). >> >> Users have to know what they are doing; making mistakes is part of >> learning. Proper documentation is all you can do. > > I don't believe that's all we can do. Designing APIs that are less > error-prone is at least one way to go about that. One can find plenty > of examples where well-documented and standardized APIs are > nevertheless misused regularly. Also, "users read and follow > documentation" doesn't match my experience, unfortunately. > This API is about exposing a standard, well-known clock source to bpf programs. Your argument is "no, because some users might use it incorrectly", and I think that argument is wrong. Don't make people who know what they are doing jump through hoops because a novice might make a mistake.
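To make the distinction being argued here concrete, a small sketch of how the two clock families would typically be used from a BPF program. bpf_ktime_get_ns() is an existing helper; bpf_ktime_get_real_ns() is only the helper proposed in this thread, so it is shown commented out.

/* Sketch only: monotonic deltas vs. wall-clock stamps. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_nanosleep")
int clock_demo(void *ctx)
{
	__u64 start = bpf_ktime_get_ns();	/* monotonic: safe for deltas */

	/* ... do some work ... */

	__u64 delta = bpf_ktime_get_ns() - start;

	/*
	 * A realtime helper (the one proposed in this thread) is only useful
	 * as a wall-clock stamp to correlate with external logs, never for
	 * deltas, since CLOCK_REALTIME can jump on clock adjustments:
	 *
	 *	__u64 wall = bpf_ktime_get_real_ns();
	 */
	bpf_printk("delta=%llu\n", delta);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";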
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
OK, we have a mode of operation that does not require driver intervention to manipulate the event queues so I think we're ok with this design. On Wed, Jul 29, 2020 at 06:19:52PM +0800, Jason Wang wrote: > > On 2020/7/29 下午5:55, Eli Cohen wrote: > >On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote: > >>On 2020/7/28 下午5:04, Eli Cohen wrote: > >>>On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: > +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) > +{ > + struct vhost_virtqueue *vq = &v->vqs[qid]; > + const struct vdpa_config_ops *ops = v->vdpa->config; > + struct vdpa_device *vdpa = v->vdpa; > + int ret, irq; > + > + spin_lock(&vq->call_ctx.ctx_lock); > + irq = ops->get_vq_irq(vdpa, qid); > + if (!vq->call_ctx.ctx || irq == -EINVAL) { > + spin_unlock(&vq->call_ctx.ctx_lock); > + return; > + } > + > >>>If I understand correctly, this will cause these IRQs to be forwarded > >>>directly to the VCPU, e.g. will be handled by the guest/qemu. > >> > >>Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. > >> > >So, usually the network driver knows how to handle interrups for its > >devices. I assume the virtio_net driver at the guest has some default > >processing but what if the underlying hardware device (such as the case > >of vdpa) needs to take some actions? > > > Virtio splits the bus operations out of device operations. So did > the driver. > > The virtio-net driver depends on a transport driver to talk to the > real device. Usually PCI is used as the transport for the device. In > this case virtio-pci driver is in charge of dealing with irq > allocation/free/configuration and it needs to co-operate with > platform specific irqchip (virtualized by KVM) to finish the work > like irq acknowledge etc. E.g for x86, the irq offloading can only > work when there's a hardware support of virtual irqchip (APICv) then > all stuffs could be done without vmexits. > > So no vendor specific part since the device and transport are all standard. > > > > Is there an option to do bounce the > >interrupt back to the vendor specific driver in the host so it can take > >these actions? > > > Currently not, but even if we can do this, I'm afraid we will lose > the performance advantage of irq bypassing. > > > > > >>>Does this mean that the host will not handle this interrupt? How does it > >>>work in case on level triggered interrupts? > >> > >>There's no guarantee that the KVM arch code can make sure the irq > >>bypass work for any type of irq. So if they the irq will still need > >>to be handled by host first. This means we should keep the host > >>interrupt handler as a slowpath (fallback). > >> > >>>In the case of ConnectX, I need to execute some code to acknowledge the > >>>interrupt. > >> > >>This turns out to be hard for irq bypassing to work. Is it because > >>the irq is shared or what kind of ack you need to do? > >I have an EQ which is a queue for events comming from the hardware. This > >EQ can created so it reports only completion events but I still need to > >execute code that roughly tells the device that I saw these event > >records and then arm it again so it can report more interrupts (e.g if > >more packets are received or sent). This is device specific code. > > > Any chance that the hardware can use MSI (which is not the case here)? > > Thanks > > > >>Thanks > >> > >> > >>>Can you explain how this should be done? > >>> >
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Wed Jul 29 2020, Richard Cochran wrote: > On Wed, Jul 29, 2020 at 12:29:08PM +0200, Kurt Kanzenbach wrote: >> I'll test it and send v3. But before, I've got another question that >> somebody might have an answer to: >> >> The ptp v1 code always does locate the message type at >> >> msgtype = data + offset + OFF_PTP_CONTROL >> >> OFF_PTP_CONTROL is 32. However, looking at the ptp v1 header, the >> message type is located at offset 20. What am I missing here? > > My source back in the day was the John Eidson book. In Appendix A it claims:
>
> Table A.1. Common PTP message fields
>
> Field name     Purpose
> messageType    Identifies message as event or general
> control        Indicates the message type, e.g., Sync

OK. That was the missing piece. I'll adjust patch 2 accordingly. Thanks a lot. Thanks, Kurt
Re: [PATCH v4 bpf 2/2] selftests/bpf: extend map-in-map selftest to detect memory leaks
On Wed, Jul 29, 2020 at 06:09 AM CEST, Andrii Nakryiko wrote: > Add test validating that all inner maps are released properly after skeleton > is destroyed. To ensure determinism, trigger kernel-side synchronize_rcu() > before checking map existence by their IDs. > > Acked-by: Song Liu > Signed-off-by: Andrii Nakryiko > --- > .../selftests/bpf/prog_tests/btf_map_in_map.c | 124 -- > 1 file changed, 110 insertions(+), 14 deletions(-) > > diff --git a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c > b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c > index f7ee8fa377ad..f6eee3fb933c 100644 > --- a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c > +++ b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c > @@ -5,10 +5,60 @@ > > #include "test_btf_map_in_map.skel.h" > > +static int duration; > + > +static __u32 bpf_map_id(struct bpf_map *map) > +{ > + struct bpf_map_info info; > + __u32 info_len = sizeof(info); > + int err; > + > + memset(&info, 0, info_len); > + err = bpf_obj_get_info_by_fd(bpf_map__fd(map), &info, &info_len); > + if (err) > + return 0; > + return info.id; > +} > + > +/* > + * Trigger synchronize_cpu() in kernel. Nit: synchronize_*r*cu(). > + * > + * ARRAY_OF_MAPS/HASH_OF_MAPS lookup/update operations trigger > + * synchronize_rcu(), if looking up/updating non-NULL element. Use this fact > + * to trigger synchronize_cpu(): create map-in-map, create a trivial ARRAY > + * map, update map-in-map with ARRAY inner map. Then cleanup. At the end, at > + * least one synchronize_rcu() would be called. > + */ That's a cool trick. I'm a bit confused by "looking up/updating non-NULL element". It looks like you're updating an element that is NULL/unset in the code below. What am I missing? > +static int kern_sync_rcu(void) > +{ > + int inner_map_fd, outer_map_fd, err, zero = 0; > + > + inner_map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 4, 1, 0); > + if (CHECK(inner_map_fd < 0, "inner_map_create", "failed %d\n", -errno)) > + return -1; > + > + outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL, > + sizeof(int), inner_map_fd, 1, 0); > + if (CHECK(outer_map_fd < 0, "outer_map_create", "failed %d\n", -errno)) > { > + close(inner_map_fd); > + return -1; > + } > + > + err = bpf_map_update_elem(outer_map_fd, &zero, &inner_map_fd, 0); > + if (err) > + err = -errno; > + CHECK(err, "outer_map_update", "failed %d\n", err); > + close(inner_map_fd); > + close(outer_map_fd); > + return err; > +} > + > void test_btf_map_in_map(void) > { > - int duration = 0, err, key = 0, val; > - struct test_btf_map_in_map* skel; > + int err, key = 0, val, i; > + struct test_btf_map_in_map *skel; > + int outer_arr_fd, outer_hash_fd; > + int fd, map1_fd, map2_fd, map1_id, map2_id; > > skel = test_btf_map_in_map__open_and_load(); > if (CHECK(!skel, "skel_open", "failed to open&load skeleton\n")) > @@ -18,32 +68,78 @@ void test_btf_map_in_map(void) > if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err)) > goto cleanup; > > + map1_fd = bpf_map__fd(skel->maps.inner_map1); > + map2_fd = bpf_map__fd(skel->maps.inner_map2); > + outer_arr_fd = bpf_map__fd(skel->maps.outer_arr); > + outer_hash_fd = bpf_map__fd(skel->maps.outer_hash); > + > /* inner1 = input, inner2 = input + 1 */ > - val = bpf_map__fd(skel->maps.inner_map1); > - bpf_map_update_elem(bpf_map__fd(skel->maps.outer_arr), &key, &val, 0); > - val = bpf_map__fd(skel->maps.inner_map2); > - bpf_map_update_elem(bpf_map__fd(skel->maps.outer_hash), &key, &val, 0); > + map1_fd = bpf_map__fd(skel->maps.inner_map1); 
> + bpf_map_update_elem(outer_arr_fd, &key, &map1_fd, 0); > + map2_fd = bpf_map__fd(skel->maps.inner_map2); > + bpf_map_update_elem(outer_hash_fd, &key, &map2_fd, 0); > skel->bss->input = 1; > usleep(1); > > - bpf_map_lookup_elem(bpf_map__fd(skel->maps.inner_map1), &key, &val); > + bpf_map_lookup_elem(map1_fd, &key, &val); > CHECK(val != 1, "inner1", "got %d != exp %d\n", val, 1); > - bpf_map_lookup_elem(bpf_map__fd(skel->maps.inner_map2), &key, &val); > + bpf_map_lookup_elem(map2_fd, &key, &val); > CHECK(val != 2, "inner2", "got %d != exp %d\n", val, 2); > > /* inner1 = input + 1, inner2 = input */ > - val = bpf_map__fd(skel->maps.inner_map2); > - bpf_map_update_elem(bpf_map__fd(skel->maps.outer_arr), &key, &val, 0); > - val = bpf_map__fd(skel->maps.inner_map1); > - bpf_map_update_elem(bpf_map__fd(skel->maps.outer_hash), &key, &val, 0); > + bpf_map_update_elem(outer_arr_fd, &key, &map2_fd, 0); > + bpf_map_update_elem(outer_hash_fd, &key, &map1_fd, 0); > skel->bss->input = 3; > usleep(1); > > - bpf_map_lookup_elem(bpf_map__fd(
Re: [PATCH 1/5] seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
On Wed, Jul 29, 2020 at 03:52:50PM +0200, Peter Zijlstra wrote: > __SEQ_LOCKDEP() is an expression gate for the > seqcount_LOCKNAME_t::lock member. Rename it to be about the member, > not the gate condition. > > Later (PREEMPT_RT) patches will make the member available for !LOCKDEP > configs. > > Signed-off-by: Peter Zijlstra (Intel) > --- Acked-by: Ahmed S. Darwish