Re: [V9fs-developer] [PATCH kernel] 9p/trans_fd: Check file mode at opening
On Wed, 29 Jul 2020 09:50:21 +1000 Alexey Kardashevskiy wrote: > > > On 29/07/2020 03:42, Greg Kurz wrote: > > Hi Alexey, > > > > Working on 9p now ?!? ;-) > > No, I am running syzkaller and seeing things :) > :) > > > Cc'ing Dominique Martinet who appears to be the person who takes care of 9p > > these days. > > > > On Tue, 28 Jul 2020 22:41:29 +1000 > > Alexey Kardashevskiy wrote: > > > >> The "fd" transport layer uses 2 file descriptors passed externally > >> and calls kernel_write()/kernel_read() on these. If files were opened > >> without FMODE_WRITE/FMODE_READ, WARN_ON_ONCE() will fire. > >> > >> This adds file mode checking in p9_fd_open; this returns -EBADF to > >> preserve the original behavior. > >> > > > > So this would cause open() to fail with EBADF, which might look a bit > > weird to userspace since it didn't pass an fd... Is this to have a > > different error than -EIO that is returned when either rfd or wfd > > doesn't point to an open file descriptor ? > > This is only to preserve the existing behavior. > > > If yes, why do we care ? > > > Without the patch, p9_fd_open() produces a kernel warning which is not > great by itself and becomes crash with panic_on_warn. > I don't question the patch, just the errno. Why not returning -EIO ? > > > > > >> Found by syzkaller. > >> > >> Signed-off-by: Alexey Kardashevskiy > >> --- > >> net/9p/trans_fd.c | 7 ++- > >> 1 file changed, 6 insertions(+), 1 deletion(-) > >> > >> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c > >> index 13cd683a658a..62cdfbd01f0a 100644 > >> --- a/net/9p/trans_fd.c > >> +++ b/net/9p/trans_fd.c > >> @@ -797,6 +797,7 @@ static int parse_opts(char *params, struct p9_fd_opts > >> *opts) > >> > >> static int p9_fd_open(struct p9_client *client, int rfd, int wfd) > >> { > >> + bool perm; > >>struct p9_trans_fd *ts = kzalloc(sizeof(struct p9_trans_fd), > >> GFP_KERNEL); > >>if (!ts) > >> @@ -804,12 +805,16 @@ static int p9_fd_open(struct p9_client *client, int > >> rfd, int wfd) > >> > >>ts->rd = fget(rfd); > >>ts->wr = fget(wfd); > >> - if (!ts->rd || !ts->wr) { > >> + perm = ts->rd && (ts->rd->f_mode & FMODE_READ) && > >> + ts->wr && (ts->wr->f_mode & FMODE_WRITE); > >> + if (!ts->rd || !ts->wr || !perm) { > >>if (ts->rd) > >>fput(ts->rd); > >>if (ts->wr) > >>fput(ts->wr); > >>kfree(ts); > >> + if (!perm) > >> + return -EBADF; > >>return -EIO; > >>} > >> > > >
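On the errno question above, one data point from userspace: read(2) on a descriptor that was not opened for reading already fails with EBADF, so -EBADF is arguably the conventional errno for "valid descriptor, wrong mode". A minimal userspace illustration (not part of the patch, shown only for reference):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        char buf[4];
        int fd = open("/dev/null", O_WRONLY);   /* write-only descriptor */

        if (fd < 0)
                return 1;
        if (read(fd, buf, sizeof(buf)) < 0)
                printf("read: %s\n", strerror(errno));  /* "Bad file descriptor" (EBADF) */
        close(fd);
        return 0;
}

Whether p9_fd_open() should fold this case into the existing -EIO instead is exactly the question Greg raises above.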
Re: INFO: rcu detected stall in tc_modify_qdisc
From: linux-kernel-ow...@vger.kernel.org on behalf of syzbot Sent: 29 July 2020 13:53 To: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; j...@resnulli.us; linux-ker...@vger.kernel.org; mi...@kernel.org; netdev@vger.kernel.org; syzkaller-b...@googlegroups.com; t...@linutronix.de; vinicius.go...@intel.com; xiyou.wangc...@gmail.com Subject: INFO: rcu detected stall in tc_modify_qdisc Hello, syzbot found the following issue on: HEAD commit:181964e6 fix a braino in cmsghdr_from_user_compat_to_kern() git tree: net console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890 kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267 dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15b2d79090 The issue was bisected to: commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22 Author: Vinicius Costa Gomes Date: Sat Sep 29 00:59:43 2018 + tc: Add support for configuring the taprio scheduler bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=160e1bac90 console output: https://syzkaller.appspot.com/x/log.txt?x=110e1bac90 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+9f78d5c664a8c33f4...@syzkaller.appspotmail.com Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") rcu: INFO: rcu_preempt self-detected stall on CPU rcu:1-...!: (1 GPs behind) idle=6f6/1/0x4000 softirq=10195/10196 fqs=1 (t=27930 jiffies g=9233 q=413) rcu: rcu_preempt kthread starved for 27901 jiffies! g9233 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu:Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump: rcu_preempt R running task2911210 2 0x4000 Call Trace: context_switch kernel/sched/core.c:3458 [inline] __schedule+0x8ea/0x2210 kernel/sched/core.c:4219 schedule+0xd0/0x2a0 kernel/sched/core.c:4294 schedule_timeout+0x148/0x250 kernel/time/timer.c:1908 rcu_gp_fqs_loop kernel/rcu/tree.c:1874 [inline] rcu_gp_kthread+0xae5/0x1b50 kernel/rcu/tree.c:2044 kthread+0x3b5/0x4a0 kernel/kthread.c:291 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 NMI backtrace for cpu 1 CPU: 1 PID: 6799 Comm: syz-executor494 Not tainted 5.8.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101 nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x194/0x1cf kernel/rcu/tree_stall.h:320 print_cpu_stall kernel/rcu/tree_stall.h:553 [inline] check_cpu_stall kernel/rcu/tree_stall.h:627 [inline] rcu_pending kernel/rcu/tree.c:3489 [inline] rcu_sched_clock_irq.cold+0x5b3/0xccc kernel/rcu/tree.c:2504 update_process_times+0x25/0x60 kernel/time/timer.c:1737 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176 tick_sched_timer+0x108/0x290 kernel/time/tick-sched.c:1320 __run_hrtimer kernel/time/hrtimer.c:1520 [inline] __hrtimer_run_queues+0x1d5/0xfc0 kernel/time/hrtimer.c:1584 hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191 Code: 48 c7 c0 88 e0 b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 37 48 83 3d e3 52 cc 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 35 e5 66 f9 65 8b 05 fe 70 19 78 RSP: 0018:c900016672c0 EFLAGS: 0282 RAX: 11369c11 RBX: 0282 RCX: 0002 RDX: dc00 RSI: RDI: 0282 RBP: 888093a052e8 R08: R09: R10: 0001 R11: R12: 0282 R13: 0078100c35c3 R14: 888093a05000 R15: spin_unlock_irqrestore include/linux/spinlock.h:408 [inline] taprio_change+0x1fdc/0x2960 net/sched/sch_taprio.c:1557 It looks like
Re: [PATCH] net: ethernet: mtk_eth_soc: Always call mtk_gmac0_rgmii_adjust() for mt7623
Hi, thank you, David, for finally getting this applied. Adding the stable tree as a recipient, since TRGMII on 5.4+ is also broken without this patch. regards Frank > Sent: Wednesday, 29 July 2020 at 02:05 > From: "David Miller" Applied.
Re: [PATCH v3] net: ethernet: mtk_eth_soc: fix mtu warning
Hi Frank, if you send the next version of the patch, please add the Signed-off-by line below. Thanks. Signed-off-by: Landen Chao On Tue, 2020-07-28 at 23:53 +0800, Jakub Kicinski wrote: > On Tue, 28 Jul 2020 14:27:43 +0200 Frank Wunderlich wrote: > > From: Landen Chao > > Hi gents, > > if the patch is from Landen we need his sign-off on it. > > > in recent Kernel-Versions there are warnings about incorrect MTU-Size > > like these: > > > > eth0: mtu greater than device maximum > > mtk_soc_eth 1b10.ethernet eth0: error -22 setting MTU to include DSA > > overhead > > > > Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") > > Fixes: 72579e14a1d3 ("net: dsa: don't fail to probe if we couldn't set the > > MTU") > > Fixes: 7a4c53bee332 ("net: report invalid mtu value via netlink extack") > > Signed-off-by: René van Dorst > > Signed-off-by: Frank Wunderlich >
Re: [PATCH] net: nixge: fix potential memory leak in nixge_probe()
> If some processes in nixge_probe() fail, free_netdev(dev) > needs to be called to aviod a memory leak. * Would you like to avoid a typo in this change description? * An imperative wording can be preferred here, can't it? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=6ba1b005ffc388c2aeaddae20da29e4810dea298#n151 Regards, Markus
RE: [PATCH 2/4] net: make sockptr_is_null strict aliasing safe
From: Christoph Hellwig > Sent: 28 July 2020 17:39 > > While the kernel in general is not strict aliasing safe we can trivially > do that in sockptr_is_null without affecting code generation, so always > check the actually assigned union member. Even with 'strict aliasing' gcc (at least) guarantees that the members of a union alias each other. It is about the only way to safely interpret a float as an int. So when sockptr_t is a union testing either member is enough. When it is a structure the changed form almost certainly adds code. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
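For readers without the patch in front of them, the helper under discussion has roughly the following shape when kernel and user pointers must be distinguished explicitly. This is a sketch based on the sockptr_t hunk quoted further down in this digest, not the exact upstream code:

#include <linux/compiler.h>
#include <linux/types.h>

typedef struct {
        union {
                void            *kernel;
                void __user     *user;
        };
        bool            is_kernel : 1;
} sockptr_t;

/* The change being discussed: test the union member that was actually
 * assigned instead of relying on the two members aliasing each other.
 */
static inline bool sockptr_is_null(sockptr_t sockptr)
{
        if (sockptr.is_kernel)
                return !sockptr.kernel;
        return !sockptr.user;
}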
[PATCH v4] net: ethernet: mtk_eth_soc: fix MTU warnings
From: Landen Chao in recent kernel versions there are warnings about incorrect MTU size like these: eth0: mtu greater than device maximum mtk_soc_eth 1b10.ethernet eth0: error -22 setting MTU to include DSA overhead Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Fixes: 72579e14a1d3 ("net: dsa: don't fail to probe if we couldn't set the MTU") Fixes: 7a4c53bee332 ("net: report invalid mtu value via netlink extack") Signed-off-by: Landen Chao Signed-off-by: Frank Wunderlich Reviewed-by: Andrew Lunn --- v3->v4 - fix commit-message (hyphernations,capitalisation) as suggested by Russell - add Signed-off-by Landen - dropped wrong signed-off from rene (because previous v1/2 was from him) --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 85735d32ecb0..a1c45b39a230 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -2891,6 +2891,8 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) eth->netdev[id]->irq = eth->irq[0]; eth->netdev[id]->dev.of_node = np; + eth->netdev[id]->max_mtu = MTK_MAX_RX_LENGTH - MTK_RX_ETH_HLEN; + return 0; free_netdev: -- 2.25.1
Re: fentry/fexit attach to EXT type XDP program does not work
On Wed, Jul 29, 2020 at 08:23:56AM +0200, Eelco Chaudron wrote: SNIP > > > > > a patch > > > > > that would nice. > > > > > You can also send it to me before bpf-next opens and I can verify > > > > > it, and > > > > > clean up the self-test so it can be included as well. > > > > > > > > > > > > > hi, > > > > it seems that you cannot exten fentry/fexit programs, > > > > but it's possible to attach fentry/fexit to ext program. > > > > > > > >/* Program extensions can extend all program types > > > > * except fentry/fexit. The reason is the following. > > > > * The fentry/fexit programs are used for performance > > > > * analysis, stats and can be attached to any program > > > > * type except themselves. When extension program is > > > > * replacing XDP function it is necessary to allow > > > > * performance analysis of all functions. Both original > > > > * XDP program and its program extension. Hence > > > > * attaching fentry/fexit to BPF_PROG_TYPE_EXT is > > > > * allowed. If extending of fentry/fexit was allowed it > > > > * would be possible to create long call chain > > > > * fentry->extension->fentry->extension beyond > > > > * reasonable stack size. Hence extending fentry is not > > > > * allowed. > > > > */ > > > > > > > > I changed fexit_bpf2bpf.c test just to do a quick check > > > > and it seems to work: > > > > > > Hi Jiri this is exactly what I’m trying, however when you do this > > > where the > > > first argument is a pointer to some context data which you are > > > accessing > > > it’s failing in the verifier. > > > This is a link to the original email, which has a test patch > > > attached that > > > will show the failure when trying to load/attach the fentry function > > > and > > > access the context: > > > > > > https://lore.kernel.org/bpf/159162546868.10791.12432342618156330247.stgit@ebuild/ > > > > ok, I tried to trace ext program with __sk_buff argument and I can see > > the issue as well.. can't acess the skb argument > > > > patch below fixes it for me, I can access the skb pointer and its data > > via probe read, like: > > > > SEC("fexit/new_get_skb_ifindex") > > int BPF_PROG(fexit_new_get_skb_ifindex, int val, struct __sk_buff *skb, > > int var, int ret) > > { > > __u32 data; > > int err; > > > > bpf_printk("EXIT skb %p", skb); > > bpf_probe_read_kernel(&data, sizeof(data), &skb->data); > > bpf_printk("EXIT ret %d, data %p", err, data); > > return 0; > > } > > > > I think it should fix the xdp_md acess as well > > Excellent patch ;) It works with xdp_md as well, and even better it does not > require the bpf_probe_read_kernel(), so the test_xdp_bpf2bpf.c code just > works. great ;-) will check on xdp_md > > Are you planning to send the patch upstream? yep, I'll add some test for that and send it thanks, jirka
KASAN: vmalloc-out-of-bounds Read in get_counters
Hello, syzbot found the following issue on: HEAD commit:68845a55 Merge branch 'akpm' into master (patches from And.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1366896490 kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267 dashboard link: https://syzkaller.appspot.com/bug?extid=a450cb4aa95912e62487 compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+a450cb4aa95912e62...@syzkaller.appspotmail.com == BUG: KASAN: vmalloc-out-of-bounds in get_counters+0x593/0x610 net/ipv6/netfilter/ip6_tables.c:780 Read of size 8 at addr c9000528b048 by task syz-executor.1/6968 CPU: 1 PID: 6968 Comm: syz-executor.1 Not tainted 5.8.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0x5/0x436 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 get_counters+0x593/0x610 net/ipv6/netfilter/ip6_tables.c:780 do_ip6t_get_ctl+0x516/0x910 net/ipv6/netfilter/ip6_tables.c:821 nf_sockopt net/netfilter/nf_sockopt.c:104 [inline] nf_getsockopt+0x72/0xd0 net/netfilter/nf_sockopt.c:122 ipv6_getsockopt+0x1bf/0x270 net/ipv6/ipv6_sockglue.c:1468 tcp_getsockopt+0x86/0xd0 net/ipv4/tcp.c:3893 __sys_getsockopt+0x14b/0x2e0 net/socket.c:2172 __do_sys_getsockopt net/socket.c:2187 [inline] __se_sys_getsockopt net/socket.c:2184 [inline] __x64_sys_getsockopt+0xba/0x150 net/socket.c:2184 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45ee7a Code: Bad RIP value. RSP: 002b:00c9f618 EFLAGS: 0212 ORIG_RAX: 0037 RAX: ffda RBX: 00c9f640 RCX: 0045ee7a RDX: 0041 RSI: 0029 RDI: 0003 RBP: 00744ca0 R08: 00c9f63c R09: 4000 R10: 00c9f740 R11: 0212 R12: 0003 R13: R14: 0029 R15: 007445e0 Memory state around the buggy address: c9000528af00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 c9000528af80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 >c9000528b000: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 f9 ^ c9000528b080: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 c9000528b100: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 == --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
Re: [PATCH net] devlink: ignore -EOPNOTSUPP errors on dumpit
Wed, Jul 29, 2020 at 01:15:07AM CEST, k...@kernel.org wrote: >Number of .dumpit functions try to ignore -EOPNOTSUPP errors. >Recent change missed that, and started reporting all errors >but -EMSGSIZE back from dumps. This leads to situation like >this: > >$ devlink dev info >devlink answers: Operation not supported > >Dump should not report an error just because the last device >to be queried could not provide an answer. > >To fix this and avoid similar confusion make sure we clear >err properly, and not leave it set to an error if we don't >terminate the iteration. > >Fixes: c62c2cfb801b ("net: devlink: don't ignore errors during dumpit") >Signed-off-by: Jakub Kicinski Yeah, that makes perfect sense. Thanks for the fix Kuba! Reviewed-by: Jiri Pirko
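The failure mode is easiest to see in a toy model of a .dumpit loop: an -EOPNOTSUPP from one device has to be cleared before moving on, otherwise a trailing device that cannot answer turns an otherwise successful dump into an error. A self-contained sketch (plain C, not the actual devlink code):

#include <errno.h>
#include <stdio.h>

/* Stand-in for devlink_nl_info_fill(): the last device iterated has
 * nothing to report and returns -EOPNOTSUPP.
 */
static int fill_one(int idx)
{
        return idx == 3 ? -EOPNOTSUPP : 0;
}

int main(void)
{
        int idx, err = 0;

        for (idx = 0; idx < 4; idx++) {
                err = fill_one(idx);
                if (err == -EOPNOTSUPP) {
                        err = 0;        /* ignore: keep iterating, do not report it */
                        continue;
                }
                if (err)                /* e.g. -EMSGSIZE: stop, resume in the next message */
                        break;
        }
        /* Without the "err = 0" reset above, the dump as a whole would report
         * "Operation not supported" even though three devices answered.
         */
        printf("dump result: %d\n", err);
        return 0;
}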
RE: [PATCH 4/4] net: improve the user pointer check in init_user_sockptr
From: Christoph Hellwig > Sent: 28 July 2020 17:39 > > Make sure not just the pointer itself but the whole range lies in > the user address space. For that pass the length and then use > the access_ok helper to do the check. Now that the address is never changed it is enough to check the base address (although this would be slightly safer if sockaddr_t were 'const'). This is especially true given some code paths ignore the length (so the length must be checked later - which it will be). Isn't TASK_SIZE the wrong 'constant'. Looks pretty 'heavy' on x86-64. You just need something between valid user and valid kernel addresses. So (1ull << 63) is fine on x86-64 (and probably all 64bit with non-overlapping user/kernel addresses). For i386 you need (3ul << 30) - probably also common. David > > Fixes: 6d04fe15f78a ("net: optimize the sockptr_t for unified kernel/user > address spaces") > Reported-by: David Laight > Signed-off-by: Christoph Hellwig > --- > include/linux/sockptr.h | 18 ++ > net/ipv4/bpfilter/sockopt.c | 2 +- > net/socket.c| 2 +- > 3 files changed, 8 insertions(+), 14 deletions(-) > > diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h > index 9e6c81d474cba8..96840def9d69cc 100644 > --- a/include/linux/sockptr.h > +++ b/include/linux/sockptr.h > @@ -27,14 +27,6 @@ static inline sockptr_t KERNEL_SOCKPTR(void *p) > { > return (sockptr_t) { .kernel = p }; > } > - > -static inline int __must_check init_user_sockptr(sockptr_t *sp, void __user > *p) > -{ > - if ((unsigned long)p >= TASK_SIZE) > - return -EFAULT; > - sp->user = p; > - return 0; > -} > #else /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */ > typedef struct { > union { > @@ -53,14 +45,16 @@ static inline sockptr_t KERNEL_SOCKPTR(void *p) > { > return (sockptr_t) { .kernel = p, .is_kernel = true }; > } > +#endif /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */ > > -static inline int __must_check init_user_sockptr(sockptr_t *sp, void __user > *p) > +static inline int __must_check init_user_sockptr(sockptr_t *sp, void __user > *p, > + size_t size) > { > - sp->user = p; > - sp->is_kernel = false; > + if (!access_ok(p, size)) > + return -EFAULT; > + *sp = (sockptr_t) { .user = p }; > return 0; > } > -#endif /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */ > > static inline bool sockptr_is_null(sockptr_t sockptr) > { > diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c > index 94f18d2352d007..545b2640f0194d 100644 > --- a/net/ipv4/bpfilter/sockopt.c > +++ b/net/ipv4/bpfilter/sockopt.c > @@ -65,7 +65,7 @@ int bpfilter_ip_get_sockopt(struct sock *sk, int optname, > > if (get_user(len, optlen)) > return -EFAULT; > - err = init_user_sockptr(&optval, user_optval); > + err = init_user_sockptr(&optval, user_optval, len); > if (err) > return err; > return bpfilter_mbox_request(sk, optname, optval, len, false); > diff --git a/net/socket.c b/net/socket.c > index 94ca4547cd7c53..aff52e81653ce3 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -2105,7 +2105,7 @@ int __sys_setsockopt(int fd, int level, int optname, > char __user *user_optval, > if (optlen < 0) > return -EINVAL; > > - err = init_user_sockptr(&optval, user_optval); > + err = init_user_sockptr(&optval, user_optval, optlen); > if (err) > return err; > > -- > 2.27.0 - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
[PATCH net] vxlan: Ensure FDB dump is performed under RCU
From: Ido Schimmel The commit cited below removed the RCU read-side critical section from rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked without RCU protection. This results in the following warning [1] in the VXLAN driver, which relied on the callback being invoked from an RCU read-side critical section. Fix this by calling rcu_read_lock() in the VXLAN driver, as already done in the bridge driver. [1] WARNING: suspicious RCU usage 5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted - drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by bridge/166: #0: 85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090 stack backtrace: CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x153/0x15d vxlan_fdb_dump+0x51e/0x6d0 rtnl_fdb_dump+0x4dc/0xad0 netlink_dump+0x540/0x1090 __netlink_dump_start+0x695/0x950 rtnetlink_rcv_msg+0x802/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 __sys_sendto+0x279/0x3b0 __x64_sys_sendto+0xe6/0x1a0 do_syscall_64+0x54/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fe14fa2ade0 Code: Bad RIP value. RSP: 002b:7fff75bb5b88 EFLAGS: 0246 ORIG_RAX: 002c RAX: ffda RBX: 5614b1ba0020 RCX: 7fe14fa2ade0 RDX: 011c RSI: 7fff75bb5b90 RDI: 0003 RBP: 7fff75bb5b90 R08: R09: R10: R11: 0246 R12: 5614b1b89160 R13: R14: R15: Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl") Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- drivers/net/vxlan.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 89d85dcb200e..5efe1e28f270 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1376,6 +1376,7 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, for (h = 0; h < FDB_HASH_SIZE; ++h) { struct vxlan_fdb *f; + rcu_read_lock(); hlist_for_each_entry_rcu(f, &vxlan->fdb_head[h], hlist) { struct vxlan_rdst *rd; @@ -1387,8 +1388,10 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, cb->nlh->nlmsg_seq, RTM_NEWNEIGH, NLM_F_MULTI, NULL); - if (err < 0) + if (err < 0) { + rcu_read_unlock(); goto out; + } skip_nh: *idx += 1; continue; @@ -1403,12 +1406,15 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, cb->nlh->nlmsg_seq, RTM_NEWNEIGH, NLM_F_MULTI, rd); - if (err < 0) + if (err < 0) { + rcu_read_unlock(); goto out; + } skip: *idx += 1; } } + rcu_read_unlock(); } out: return err; -- 2.26.2
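Stripped of the vxlan specifics, the rule the fix applies is that an hlist_for_each_entry_rcu() walk which is not covered by some other protection must run inside an RCU read-side critical section, and the unlock must be reached on every exit path. A generic kernel-side sketch (type and helper names are placeholders, not the vxlan code):

#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/types.h>

struct fdb_entry {                      /* stand-in for struct vxlan_fdb */
        struct hlist_node hlist;
        u32 ifindex;
};

static int dump_one(const struct fdb_entry *f)  /* stand-in for the fill helper */
{
        return 0;
}

static int walk_fdb_bucket(struct hlist_head *head)
{
        struct fdb_entry *f;
        int err = 0;

        rcu_read_lock();
        hlist_for_each_entry_rcu(f, head, hlist) {
                err = dump_one(f);
                if (err < 0)
                        break;          /* still falls through to rcu_read_unlock() */
        }
        rcu_read_unlock();
        return err;
}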
[PATCH net] ipv4: Silence suspicious RCU usage warning
From: Ido Schimmel fib_trie_unmerge() is called with RTNL held, but not from an RCU read-side critical section. This leads to the following warning [1] when the FIB alias list in a leaf is traversed with hlist_for_each_entry_rcu(). Since the function is always called with RTNL held and since modification of the list is protected by RTNL, simply use hlist_for_each_entry() and silence the warning. [1] WARNING: suspicious RCU usage 5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Not tainted - net/ipv4/fib_trie.c:1867 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/164: #0: 85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0 stack backtrace: CPU: 0 PID: 164 Comm: ip Not tainted 5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x153/0x15d fib_trie_unmerge+0x608/0xdb0 fib_unmerge+0x44/0x360 fib4_rule_configure+0xc8/0xad0 fib_nl_newrule+0x37a/0x1dd0 rtnetlink_rcv_msg+0x4f7/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 sys_sendmsg+0x879/0xa00 ___sys_sendmsg+0x122/0x190 __sys_sendmsg+0x103/0x1d0 __x64_sys_sendmsg+0x7d/0xb0 do_syscall_64+0x54/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc80a234e97 Code: Bad RIP value. RSP: 002b:7ffef8b66798 EFLAGS: 0246 ORIG_RAX: 002e RAX: ffda RBX: RCX: 7fc80a234e97 RDX: RSI: 7ffef8b66800 RDI: 0003 RBP: 5f141b1c R08: 0001 R09: R10: 7fc80a2a8ac0 R11: 0246 R12: 0001 R13: R14: 7ffef8b67008 R15: 556fccb10020 Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- net/ipv4/fib_trie.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 248f1c1959a6..3c65f71d0e82 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1864,7 +1864,7 @@ struct fib_table *fib_trie_unmerge(struct fib_table *oldtb) while ((l = leaf_walk_rcu(&tp, key)) != NULL) { struct key_vector *local_l = NULL, *local_tp; - hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) { + hlist_for_each_entry(fa, &l->leaf, fa_list) { struct fib_alias *new_fa; if (local_tb->tb_id != fa->tb_id) -- 2.26.2
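This fix is the complementary case to the previous one: when every writer of the list is serialized by a lock the walker already holds (RTNL here), the plain iterator is correct and does not trip the lockdep check. A generic sketch with placeholder names:

#include <linux/list.h>
#include <linux/rtnetlink.h>

struct alias_entry {                    /* stand-in for struct fib_alias */
        struct hlist_node fa_list;
};

/* Caller holds RTNL, which serializes all modifications of this list,
 * so neither rcu_read_lock() nor the _rcu iterator is needed.
 */
static unsigned int count_aliases_rtnl(struct hlist_head *head)
{
        struct alias_entry *fa;
        unsigned int n = 0;

        ASSERT_RTNL();
        hlist_for_each_entry(fa, head, fa_list)
                n++;
        return n;
}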
Re: [PATCH] ethernet: fix potential memory leak in gemini_ethernet_port_probe()
> If some processes in gemini_ethernet_port_probe() fail, > free_netdev(dev) needs to be called to avoid a memory leak. Would you like to use an imperative wording for this change description? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=6ba1b005ffc388c2aeaddae20da29e4810dea298#n151 … > +++ b/drivers/net/ethernet/cortina/gemini.c > @@ -2501,8 +2513,10 @@ static int gemini_ethernet_port_probe(struct > platform_device *pdev) > IRQF_SHARED, > port_names[port->id], > port); > - if (ret) > + if (ret) { > + free_netdev(netdev); > return ret; > + } > > ret = register_netdev(netdev); … I suggest to add a jump target for the desired exception handling in this function implementation. if (ret) - return ret; + goto free_netdev; Regards, Markus
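The jump-target style Markus points to looks roughly like this in schematic form; every name here (functions, struct, label) is illustrative only and not the gemini driver itself:

#include <linux/errno.h>
#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/platform_device.h>

struct example_port {
        int id;
};

static int example_setup_irq(struct net_device *netdev) /* hypothetical setup step */
{
        return 0;
}

static int example_port_probe(struct platform_device *pdev)
{
        struct net_device *netdev;
        int ret;

        netdev = alloc_etherdev(sizeof(struct example_port));
        if (!netdev)
                return -ENOMEM;

        ret = example_setup_irq(netdev);
        if (ret)
                goto err_free_netdev;   /* a single exit label owns the cleanup */

        ret = register_netdev(netdev);
        if (ret)
                goto err_free_netdev;

        return 0;

err_free_netdev:
        free_netdev(netdev);
        return ret;
}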
Re: [net-next PATCH v7 1/6] Documentation: ACPI: DSD: Document MDIO PHY
On Wed, Jul 29, 2020 at 4:53 AM Jeremy Linton wrote: > > Hi, > > On 7/28/20 7:39 PM, Florian Fainelli wrote: > > On 7/28/2020 3:30 PM, Jeremy Linton wrote: > >> Hi, > >> > >> On 7/28/20 3:06 AM, Dan Callaghan wrote: > >>> Excerpts from Andrew Lunn's message of 2020-07-24 21:14:36 +02:00: > Now i could be wrong, but are Ethernet switches something you expect > to see on ACPI/SBSA platforms? Or is this a legitimate use of the > escape hatch? > >>> > >>> As an extra data point: right now I am working on an x86 embedded > >>> appliance (ACPI not Device Tree) with 3x integrated Marvell switches. > >>> I have been watching this patch series with great interest, because > >>> right now there is no way for me to configure a complex switch topology > >>> in DSA without Device Tree. > >> > >> DSA though, the switch is hung off a normal MAC/MDIO, right? (ignoring > >> whether that NIC/MAC is actually hug off PCIe for the moment). > > > > There is no specific bus, we have memory mapped, MDIO, SPI, I2C swiches > > all supported within the driver framework right now. > > > >> > >> It just has a bunch of phy devices strung out on that single MAC/MDIO. > > > > It has a number of built-in PHYs that typically appear on a MDIO bus, > > whether that bus is the switch's internal MDIO bus, or another MDIO bus > > (which could be provided with just two GPIOs) depends on how the switch > > is connected to its management host. > > > > When the switch is interfaced via MDIO the switch also typically has a > > MDIO interface called the pseudo-PHY which is how you can actually tap > > into the control interface of the switch, as opposed to reading its > > internal PHYs from the MDIO bus. > > > >> So in ACPI land it would still have a relationship similar to the one > >> Andrew pointed out with his DT example where the eth0->mdio->phy are all > >> contained in their physical parent. The phy in that case associated with > >> the parent adapter would be the first direct decedent of the mdio, the > >> switch itself could then be represented with another device, with a > >> further string of device/phys representing the devices. (I dislike > >> drawing acsii art, but if this isn't clear I will, its also worthwhile > >> to look at the dpaa2 docs for how the mac/phys work on this device for > >> contrast.). > > > > The eth0->mdio->phy relationship you describe is the simple case that > > you are well aware of which is say what we have on the Raspberry Pi 4 > > with GENET and the external Broadcom PHY. > > > > For an Ethernet switch connected to an Ethernet MAC, we have 4 different > > types of objects: > > > > - the Ethernet MAC which sits on its specific bus > > > > - the Ethernet switch which also sits on its specific bus > > > > - the built-in PHYs of the Ethernet switch which sit on whatever > > bus/interface the switch provides to make them accessible > > > > - the specific bus controller that provides access to the Ethernet switch > > > > and this is a simplification that does not take into account Physical > > Coding Sublayer devices, pure MDIO devices (with no foot in the Ethernet > > land such as PCIe, USB3 or SATA PHYs), SFP, SFF and other pluggable modules. > > Which is why I've stayed away from much of the switch discussion. There > are a lot of edge cases to fall into, because for whatever reason > networking seems to be unique in all this non-enumerable customization > vs many other areas of the system. 
Storage, being an example i'm more > familiar with which has very similar problems yet, somehow has managed > to avoid much of this, despite having run on fabrics significantly more > complex than basic ethernet (including running on a wide range of hot > pluggable GBIC/SFP/QSFP/etc media layers). > > ACPI's "problem" here is that its strongly influenced by the "Plug and > Play" revolution of the 1990's where the industry went from having > humans describing hardware using machine readable languages, to hardware > which was enumerable using standard protocols. ACPI's device > descriptions are there as a crutch for the remaining non plug an play > hardware in the system. > > So at a basic level, if your describing hardware in ACPI rather than > abstracting it, that is a problem. > This is also my first run with ACPI and this discussion needs to be brought back to the powers that be regarding sorting this out. This is where I see it from an Armada and Layerscape perspective. This isn't a silver bullet fix but the small things I think that need to be done to move this forward. >From Microsoft's documentation. Device dependencies Typically, there are hardware dependencies between devices on a particular platform. Windows requires that all such dependencies be described so that it can ensure that all devices function correctly as things change dynamically in the system (device power is removed, drivers are stopped and started, and so on). In ACPI, dependencies between devices are described in the following ways:
Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
Hi Daniel, On Tue, Jul 28, 2020 at 10:47 PM CEST, Daniel Borkmann wrote: [...] > Jakub, I'm actually seeing a slightly different one on my test machine with > sk_lookup: > > # ./test_progs -t sk_lookup > #14 cgroup_skb_sk_lookup:OK > #73/1 query lookup prog:OK > #73/2 TCP IPv4 redir port:OK > #73/3 TCP IPv4 redir addr:OK > #73/4 TCP IPv4 redir with reuseport:OK > #73/5 TCP IPv4 redir skip reuseport:OK > #73/6 TCP IPv6 redir port:OK > #73/7 TCP IPv6 redir addr:OK > #73/8 TCP IPv4->IPv6 redir port:OK > #73/9 TCP IPv6 redir with reuseport:OK > #73/10 TCP IPv6 redir skip reuseport:OK > #73/11 UDP IPv4 redir port:OK > #73/12 UDP IPv4 redir addr:OK > #73/13 UDP IPv4 redir with reuseport:OK > attach_lookup_prog:PASS:open 0 nsec > attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > update_lookup_map:PASS:bpf_map__fd 0 nsec > update_lookup_map:PASS:bpf_map_update_elem 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > update_lookup_map:PASS:bpf_map__fd 0 nsec > update_lookup_map:PASS:bpf_map_update_elem 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > run_lookup_prog:PASS:getsockname 0 nsec > run_lookup_prog:PASS:connect 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_client:PASS:make_client 0 nsec > send_byte:PASS:send_byte 0 nsec > udp_recv_send:FAIL:recvmsg failed > (/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: > errno: Resource temporarily unavailable) failed to receive > #73/14 UDP IPv4 redir and reuseport with conns:FAIL > #73/15 UDP IPv4 redir skip reuseport:OK > #73/16 UDP IPv6 redir port:OK > #73/17 UDP IPv6 redir addr:OK > #73/18 UDP IPv4->IPv6 redir port:OK > #73/19 UDP IPv6 redir and reuseport:OK > attach_lookup_prog:PASS:open 0 nsec > attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > update_lookup_map:PASS:bpf_map__fd 0 nsec > update_lookup_map:PASS:bpf_map_update_elem 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 
nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > update_lookup_map:PASS:bpf_map__fd 0 nsec > update_lookup_map:PASS:bpf_map_update_elem 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec > make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec > make_server:PASS:bind 0 nsec > make_server:PASS:attach_reuseport 0 nsec > run_lookup_prog:PASS:getsockname 0 nsec > run_lookup_prog:PASS:connect 0 nsec > make_socket:PASS:make_address 0 nsec > make_socket:PASS:socket 0 nsec > make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec > make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec > make_client:PASS:make_client 0 nsec > send_byte:PASS:send_byte 0 nsec > udp_recv_send:FAIL:recvmsg failed > (/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: > errno: Resource temporarily unavailable) failed to receive > #73/20 UDP IPv6 redir and reuseport with conns:FAIL > #73/21 UDP IPv6 redir skip reuseport:OK > #73/22 TCP IPv4 drop on lookup:OK > #73/23 TCP IPv6 drop on lookup:OK > #73/24 UDP IPv4 drop o
[PATCH net] selftests/bpf: add xdpdrv mode for test_xdp_redirect
This patch add xdpdrv mode for test_xdp_redirect.sh since veth has support native mode. After update here is the test result: ]# ./test_xdp_redirect.sh selftests: test_xdp_redirect xdpgeneric [PASS] selftests: test_xdp_redirect xdpdrv [PASS] Signed-off-by: Hangbin Liu --- .../selftests/bpf/test_xdp_redirect.sh| 84 --- 1 file changed, 52 insertions(+), 32 deletions(-) diff --git a/tools/testing/selftests/bpf/test_xdp_redirect.sh b/tools/testing/selftests/bpf/test_xdp_redirect.sh index c4b17e08d431..dd80f0c84afb 100755 --- a/tools/testing/selftests/bpf/test_xdp_redirect.sh +++ b/tools/testing/selftests/bpf/test_xdp_redirect.sh @@ -10,52 +10,72 @@ # | xdp forwarding | # -- -cleanup() +ret=0 + +setup() { - if [ "$?" = "0" ]; then - echo "selftests: test_xdp_redirect [PASS]"; - else - echo "selftests: test_xdp_redirect [FAILED]"; - fi - set +e + local xdpmode=$1 + + ip netns add ns1 + ip netns add ns2 + + ip link add veth1 index 111 type veth peer name veth11 netns ns1 + ip link add veth2 index 222 type veth peer name veth22 netns ns2 + + ip link set veth1 up + ip link set veth2 up + ip -n ns1 link set dev veth11 up + ip -n ns2 link set dev veth22 up + + ip -n ns1 addr add 10.1.1.11/24 dev veth11 + ip -n ns2 addr add 10.1.1.22/24 dev veth22 +} + +cleanup() +{ ip link del veth1 2> /dev/null ip link del veth2 2> /dev/null ip netns del ns1 2> /dev/null ip netns del ns2 2> /dev/null } -ip link set dev lo xdpgeneric off 2>/dev/null > /dev/null -if [ $? -ne 0 ];then - echo "selftests: [SKIP] Could not run test without the ip xdpgeneric support" - exit 0 -fi -set -e - -ip netns add ns1 -ip netns add ns2 +test_xdp_redirect() +{ + local xdpmode=$1 -trap cleanup 0 2 3 6 9 + setup -ip link add veth1 index 111 type veth peer name veth11 -ip link add veth2 index 222 type veth peer name veth22 + ip link set dev veth1 $xdpmode off &> /dev/null + if [ $? -ne 0 ];then + echo "selftests: test_xdp_redirect $xdpmode [SKIP]" + return 0 + fi -ip link set veth11 netns ns1 -ip link set veth22 netns ns2 + ip -n ns1 link set veth11 $xdpmode obj xdp_dummy.o sec xdp_dummy &> /dev/null + ip -n ns2 link set veth22 $xdpmode obj xdp_dummy.o sec xdp_dummy &> /dev/null + ip link set dev veth1 $xdpmode obj test_xdp_redirect.o sec redirect_to_222 &> /dev/null + ip link set dev veth2 $xdpmode obj test_xdp_redirect.o sec redirect_to_111 &> /dev/null -ip link set veth1 up -ip link set veth2 up + ip netns exec ns1 ping -c 1 10.1.1.22 &> /dev/null + local ret1=$? + ip netns exec ns2 ping -c 1 10.1.1.11 &> /dev/null + local ret2=$? -ip netns exec ns1 ip addr add 10.1.1.11/24 dev veth11 -ip netns exec ns2 ip addr add 10.1.1.22/24 dev veth22 + if [ $ret1 -eq 0 -a $ret2 -eq 0 ]; then + echo "selftests: test_xdp_redirect $xdpmode [PASS]"; + else + ret=1 + echo "selftests: test_xdp_redirect $xdpmode [FAILED]"; + fi -ip netns exec ns1 ip link set dev veth11 up -ip netns exec ns2 ip link set dev veth22 up + cleanup +} -ip link set dev veth1 xdpgeneric obj test_xdp_redirect.o sec redirect_to_222 -ip link set dev veth2 xdpgeneric obj test_xdp_redirect.o sec redirect_to_111 +set -e +trap cleanup 2 3 6 9 -ip netns exec ns1 ping -c 1 10.1.1.22 -ip netns exec ns2 ping -c 1 10.1.1.11 +test_xdp_redirect xdpgeneric +test_xdp_redirect xdpdrv -exit 0 +exit $ret -- 2.25.4
Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
Hi Andrii, On Tue, Jul 28, 2020 at 10:13 PM CEST, Andrii Nakryiko wrote: [...] > We are getting this failure in Travis CI when syncing libbpf [0]: > > ``` > ip: either "local" is duplicate, or "nodad" is garbage > > switch_netns:PASS:unshare 0 nsec > > switch_netns:FAIL:system failed > > (/home/travis/build/libbpf/libbpf/travis-ci/vmtest/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:1310: > errno: No such file or directory) system(ip -6 addr add dev lo > fd00::1/128 nodad) > > #73 sk_lookup:FAIL > ``` > > > Can you please help fix it so that it works in a Travis CI environment > as well? For now I disabled sk_lookup selftests altogether. You can > try to repro it locally by forking https://github.com/libbpf/libbpf > and enabling Travis CI for your account. See [1] for the PR that > disabled sk_lookup. > > > [0] https://travis-ci.com/github/libbpf/libbpf/jobs/365878309#L5408 > [1] > https://github.com/libbpf/libbpf/pull/182/commits/78368c2eaed8b0681381fc34d6016c9b5a443be8 > > > Thanks for your help! "nodad is garbage" message smells like old iproute2. I will take a look. Thanks for letting me know.
Re: general protection fault in vsock_poll
syzbot has bisected this issue to: commit 408624af4c89989117bb2c6517bd50b7708a2fcd Author: Stefano Garzarella Date: Tue Dec 10 10:43:06 2019 + vsock: use local transport when it is loaded bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17e6489b10 start commit: 92ed3019 Linux 5.8-rc7 git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=1416489b10 console output: https://syzkaller.appspot.com/x/log.txt?x=1016489b10 kernel config: https://syzkaller.appspot.com/x/.config?x=84f076779e989e69 dashboard link: https://syzkaller.appspot.com/bug?extid=a61bac2fcc1a7c6623fe syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15930b6490 Reported-by: syzbot+a61bac2fcc1a7c662...@syzkaller.appspotmail.com Fixes: 408624af4c89 ("vsock: use local transport when it is loaded") For information about bisection process see: https://goo.gl/tpsmEJ#bisection
RE: [PATCH 2/4] net: make sockptr_is_null strict aliasing safe
On Wednesday 2020-07-29 10:04, David Laight wrote: >From: Christoph Hellwig >> Sent: 28 July 2020 17:39 >> >> While the kernel in general is not strict aliasing safe we can trivially >> do that in sockptr_is_null without affecting code generation, so always >> check the actually assigned union member. > >Even with 'strict aliasing' gcc (at least) guarantees that >the members of a union alias each other. >It is about the only way so safely interpret a float as an int. The only? float given; int i; memcpy(&i, &given, sizeof(i)); BUILD_BUG_ON(sizeof(i) > sizeof(given));
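Both idioms read the same bit pattern, and with GCC both are defined behaviour: type punning through a union is explicitly documented as allowed even under -fstrict-aliasing, and memcpy() always is. A standalone comparison, for illustration only:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        float given = 1.5f;
        uint32_t via_union, via_memcpy;

        /* Union type punning, the form David refers to. */
        union { float f; uint32_t u; } pun = { .f = given };
        via_union = pun.u;

        /* The memcpy() form from the reply; optimizing compilers emit the
         * same single load for it, so there is no runtime cost.
         */
        memcpy(&via_memcpy, &given, sizeof(via_memcpy));

        printf("union:  0x%08x\nmemcpy: 0x%08x\n", via_union, via_memcpy);
        return via_union != via_memcpy; /* exits 0: identical bits */
}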
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On 2020/7/28 下午5:04, Eli Cohen wrote: On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) +{ + struct vhost_virtqueue *vq = &v->vqs[qid]; + const struct vdpa_config_ops *ops = v->vdpa->config; + struct vdpa_device *vdpa = v->vdpa; + int ret, irq; + + spin_lock(&vq->call_ctx.ctx_lock); + irq = ops->get_vq_irq(vdpa, qid); + if (!vq->call_ctx.ctx || irq == -EINVAL) { + spin_unlock(&vq->call_ctx.ctx_lock); + return; + } + If I understand correctly, this will cause these IRQs to be forwarded directly to the VCPU, e.g. will be handled by the guest/qemu. Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. Does this mean that the host will not handle this interrupt? How does it work in case on level triggered interrupts? There's no guarantee that the KVM arch code can make sure the irq bypass work for any type of irq. So if they the irq will still need to be handled by host first. This means we should keep the host interrupt handler as a slowpath (fallback). In the case of ConnectX, I need to execute some code to acknowledge the interrupt. This turns out to be hard for irq bypassing to work. Is it because the irq is shared or what kind of ack you need to do? Thanks Can you explain how this should be done?
[PATCH net 0/6] mlxsw fixes
From: Ido Schimmel This patch set contains various fixes for mlxsw. Patches #1-#2 fix two trap related issues introduced in previous cycle. Patches #3-#5 fix rare use-after-frees discovered by syzkaller. After over a week of fuzzing with the fixes, the bugs did not reproduce. Patch #6 from Amit fixes an issue in the ethtool selftest that was recently discovered after running the test on a new platform that supports only 1Gbps and 10Gbps speeds. Amit Cohen (1): selftests: ethtool: Fix test when only two speeds are supported Ido Schimmel (5): mlxsw: spectrum_router: Allow programming link-local host routes mlxsw: spectrum: Use different trap group for externally routed packets mlxsw: core: Increase scope of RCU read-side critical section mlxsw: core: Free EMAD transactions using kfree_rcu() mlxsw: spectrum_router: Fix use-after-free in router init / de-init .../networking/devlink/devlink-trap.rst | 4 ++ drivers/net/ethernet/mellanox/mlxsw/core.c| 8 ++- drivers/net/ethernet/mellanox/mlxsw/reg.h | 1 + .../ethernet/mellanox/mlxsw/spectrum_router.c | 59 --- .../ethernet/mellanox/mlxsw/spectrum_trap.c | 14 - include/net/devlink.h | 3 + net/core/devlink.c| 1 + .../selftests/net/forwarding/ethtool.sh | 2 - 8 files changed, 51 insertions(+), 41 deletions(-) -- 2.26.2
[PATCH net 1/6] mlxsw: spectrum_router: Allow programming link-local host routes
From: Ido Schimmel Cited commit added the ability to program link-local prefix routes to the ASIC so that relevant packets are routed and trapped correctly. However, host routes were not included in the change and thus not programmed to the ASIC. This can result in packets being trapped via an external route trap instead of a local route trap as in IPv4. Fix this by programming all the link-local routes to the ASIC. Fixes: 10d3757fcb07 ("mlxsw: spectrum_router: Allow programming link-local prefix routes") Reported-by: Alex Veber Tested-by: Alex Veber Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 9 - 1 file changed, 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 019ed503aadf..bd4803074776 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -5001,15 +5001,6 @@ static void mlxsw_sp_router_fib4_del(struct mlxsw_sp *mlxsw_sp, static bool mlxsw_sp_fib6_rt_should_ignore(const struct fib6_info *rt) { - /* Packets with link-local destination IP arriving to the router -* are trapped to the CPU, so no need to program specific routes -* for them. Only allow prefix routes (usually one fe80::/64) so -* that packets are trapped for the right reason. -*/ - if ((ipv6_addr_type(&rt->fib6_dst.addr) & IPV6_ADDR_LINKLOCAL) && - (rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST))) - return true; - /* Multicast routes aren't supported, so ignore them. Neighbour * Discovery packets are specifically trapped. */ -- 2.26.2
[PATCH net 5/6] mlxsw: spectrum_router: Fix use-after-free in router init / de-init
From: Ido Schimmel Several notifiers are registered as part of router initialization. Since some of these notifiers are registered before the end of the initialization, it is possible for them to access uninitialized or freed memory when processing notifications [1]. Additionally, some of these notifiers queue work items on a workqueue. If these work items are executed after the router was de-initialized, they will access freed memory. Fix both problems by moving the registration of the notifiers to the end of the router initialization and flush the work queue after they are unregistered. [1] BUG: KASAN: use-after-free in __mutex_lock_common kernel/locking/mutex.c:938 [inline] BUG: KASAN: use-after-free in __mutex_lock+0xeea/0x1340 kernel/locking/mutex.c:1103 Read of size 8 at addr 888038c3a6e0 by task kworker/u4:1/61 CPU: 1 PID: 61 Comm: kworker/u4:1 Not tainted 5.8.0-rc2+ #36 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: mlxsw_core_ordered mlxsw_sp_inet6addr_event_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 __mutex_lock_common kernel/locking/mutex.c:938 [inline] __mutex_lock+0xeea/0x1340 kernel/locking/mutex.c:1103 mlxsw_sp_inet6addr_event_work+0xb3/0x1b0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7123 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 Allocated by task 1298: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:494 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467 kmalloc include/linux/slab.h:555 [inline] kzalloc include/linux/slab.h:669 [inline] mlxsw_sp_router_init+0xb2/0x1d20 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:8074 mlxsw_sp_init+0xbd8/0x3ac0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2932 __mlxsw_core_bus_device_register+0x657/0x10d0 drivers/net/ethernet/mellanox/mlxsw/core.c:1375 mlxsw_core_bus_device_register drivers/net/ethernet/mellanox/mlxsw/core.c:1436 [inline] mlxsw_devlink_core_bus_device_reload_up+0xcd/0x150 drivers/net/ethernet/mellanox/mlxsw/core.c:1133 devlink_reload net/core/devlink.c:2959 [inline] devlink_reload+0x281/0x3b0 net/core/devlink.c:2944 devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987 genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline] genl_family_rcv_msg net/netlink/genetlink.c:736 [inline] genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469 genl_rcv+0x24/0x40 net/netlink/genetlink.c:764 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0x150/0x190 net/socket.c:672 sys_sendmsg+0x6d8/0x840 net/socket.c:2363 ___sys_sendmsg+0xff/0x170 net/socket.c:2417 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 1348: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 
[inline] __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455 slab_free_hook mm/slub.c:1474 [inline] slab_free_freelist_hook mm/slub.c:1507 [inline] slab_free mm/slub.c:3072 [inline] kfree+0xe6/0x320 mm/slub.c:4063 mlxsw_sp_fini+0x340/0x4e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3132 mlxsw_core_bus_device_unregister+0x16c/0x6d0 drivers/net/ethernet/mellanox/mlxsw/core.c:1474 mlxsw_devlink_core_bus_device_reload_down+0x8e/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:1123 devlink_reload+0xc6/0x3b0 net/core/devlink.c:2952 devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987 genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline] genl_family_rcv_msg net/netlink/genetlink.c:736 [inline] genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469 genl_rcv+0x24/0x40 net/netlink/genetlink.c:764 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0x150/0x190 net/socket.c:672 sys_sendmsg+0x6d8/0x840 net/socket.c:2363 ___sys_sendmsg+0xff/0x170 net/socket.c:2417 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 entry_SYSCALL_64_after_hwframe
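Abstracted away from mlxsw, the ordering the fix enforces is: notifiers only go live once everything their handlers may dereference is initialized, and teardown unregisters them first and then drains the workqueue they feed before freeing anything. A schematic sketch in which all names are placeholders:

#include <linux/inetdevice.h>
#include <linux/notifier.h>
#include <linux/workqueue.h>

struct example_router {
        struct notifier_block inetaddr_nb;
        struct workqueue_struct *owq;   /* workqueue used by the notifier handler */
};

static int example_init_tables(struct example_router *r) { return 0; }  /* hypothetical */
static void example_fini_tables(struct example_router *r) { }           /* hypothetical */

static int example_router_init(struct example_router *r)
{
        int err;

        err = example_init_tables(r);   /* set up everything the handler touches */
        if (err)
                return err;

        /* Register last: from this point on events may arrive. */
        err = register_inetaddr_notifier(&r->inetaddr_nb);
        if (err)
                goto err_fini_tables;
        return 0;

err_fini_tables:
        example_fini_tables(r);
        return err;
}

static void example_router_fini(struct example_router *r)
{
        /* Unregister first, then flush work the handler may have queued,
         * and only then tear down what that work dereferences.
         */
        unregister_inetaddr_notifier(&r->inetaddr_nb);
        flush_workqueue(r->owq);
        example_fini_tables(r);
}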
[PATCH net 4/6] mlxsw: core: Free EMAD transactions using kfree_rcu()
From: Ido Schimmel The lifetime of EMAD transactions (i.e., 'struct mlxsw_reg_trans') is managed using RCU. They are freed using kfree_rcu() once the transaction ends. However, in case the transaction failed it is freed immediately after being removed from the active transactions list. This is problematic because it is still possible for a different CPU to dereference the transaction from an RCU read-side critical section while traversing the active transaction list in mlxsw_emad_rx_listener_func(). In which case, a use-after-free is triggered [1]. Fix this by freeing the transaction after a grace period by calling kfree_rcu(). [1] BUG: KASAN: use-after-free in mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671 Read of size 8 at addr 88800b7964e8 by task syz-executor.2/2881 CPU: 0 PID: 2881 Comm: syz-executor.2 Not tainted 5.8.0-rc4+ #44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671 mlxsw_core_skb_receive+0x571/0x700 drivers/net/ethernet/mellanox/mlxsw/core.c:2061 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline] mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550 __do_softirq+0x223/0x964 kernel/softirq.c:292 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] do_softirq_own_stack+0x109/0x140 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:387 [inline] __irq_exit_rcu kernel/softirq.c:417 [inline] irq_exit_rcu+0x16f/0x1a0 kernel/softirq.c:429 sysvec_apic_timer_interrupt+0x4e/0xd0 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/irqflags.h:85 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0x3b/0x40 kernel/locking/spinlock.c:191 Code: e8 2a c3 f4 fc 48 89 ef e8 12 96 f5 fc f6 c7 02 75 11 53 9d e8 d6 db 11 fd 65 ff 0d 1f 21 b3 56 5b 5d c3 e8 a7 d7 11 fd 53 9d ed 0f 1f 00 55 48 89 fd 65 ff 05 05 21 b3 56 ff 74 24 08 48 8d RSP: 0018:8880446ffd80 EFLAGS: 0286 RAX: 0006 RBX: 0286 RCX: 0006 RDX: RSI: RDI: a94ecea9 RBP: 888012934408 R08: R09: R10: 0001 R11: fbfff57be301 R12: 1110088dffc1 R13: 888037b817c0 R14: 88802442415a R15: 888024424000 __do_sys_perf_event_open+0x1b5d/0x2bd0 kernel/events/core.c:11874 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x473dbd Code: Bad RIP value. 
RSP: 002b:7f21e5e9cc28 EFLAGS: 0246 ORIG_RAX: 012a RAX: ffda RBX: 0057bf00 RCX: 00473dbd RDX: RSI: RDI: 2040 RBP: 0057bf00 R08: R09: R10: 0003 R11: 0246 R12: 0057bf0c R13: 7ffd0493503f R14: 004d0f46 R15: 7f21e5e9cd80 Allocated by task 871: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:494 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467 kmalloc include/linux/slab.h:555 [inline] kzalloc include/linux/slab.h:669 [inline] mlxsw_core_reg_access_emad+0x70/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c:1812 mlxsw_core_reg_access+0xeb/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1991 mlxsw_sp_port_get_hw_xstats+0x335/0x7e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1130 update_stats_cache+0xf4/0x140 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1173 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 Freed by task 871: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455 slab_free_hook mm/slub.c:1474 [inline] slab_free_freelist_hook mm/slub.c:1507 [inline] slab_free mm/slub.c:3072 [inline] kfree+0xe6/0x320 mm/slub.c:4052 mlxsw_core_reg_access_emad+0xd45/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c
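Reduced to the generic pattern, the fix is: once readers can reach an object under rcu_read_lock(), even the error path must defer its release across a grace period. Sketch with placeholder names (writer-side locking of the list omitted):

#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct emad_trans {                     /* stand-in for struct mlxsw_reg_trans */
        struct list_head list;          /* linked into the RCU-traversed active list */
        struct rcu_head rcu;
};

static void emad_trans_destroy(struct emad_trans *trans)
{
        /* Unlink so new readers cannot find the transaction... */
        list_del_rcu(&trans->list);
        /* ...but a reader already inside rcu_read_lock() may still be
         * looking at it, so reclaim only after a grace period. An
         * immediate kfree() here is the use-after-free being fixed.
         */
        kfree_rcu(trans, rcu);
}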
[PATCH net 6/6] selftests: ethtool: Fix test when only two speeds are supported
From: Amit Cohen The test case check_highest_speed_is_chosen() configures $h1 to advertise a subset of its supported speeds and checks that $h2 chooses the highest speed from the subset. To find the common advertised speeds between $h1 and $h2, common_speeds_get() is called. Currently, the first speed returned from common_speeds_get() is removed claiming "h1 does not advertise this speed". The claim is wrong because the function is called after $h1 already advertised a subset of speeds. In case $h1 supports only two speeds, it will advertise a single speed which will be later removed because of previously mentioned bug. This results in the test needlessly failing. When more than two speeds are supported this is not an issue because the first advertised speed is the lowest one. Fix this by not removing any speed from the list of commonly advertised speeds. Fixes: 64916b57c0b1 ("selftests: forwarding: Add speed and auto-negotiation test") Reported-by: Danielle Ratson Signed-off-by: Amit Cohen Signed-off-by: Ido Schimmel --- tools/testing/selftests/net/forwarding/ethtool.sh | 2 -- 1 file changed, 2 deletions(-) diff --git a/tools/testing/selftests/net/forwarding/ethtool.sh b/tools/testing/selftests/net/forwarding/ethtool.sh index eb8e2a23bbb4..43a948feed26 100755 --- a/tools/testing/selftests/net/forwarding/ethtool.sh +++ b/tools/testing/selftests/net/forwarding/ethtool.sh @@ -252,8 +252,6 @@ check_highest_speed_is_chosen() fi local -a speeds_arr=($(common_speeds_get $h1 $h2 0 1)) - # Remove the first speed, h1 does not advertise this speed. - unset speeds_arr[0] max_speed=${speeds_arr[0]} for current in ${speeds_arr[@]}; do -- 2.26.2
[PATCH net 2/6] mlxsw: spectrum: Use different trap group for externally routed packets
From: Ido Schimmel Cited commit mistakenly removed the trap group for externally routed packets (e.g., via the management interface) and grouped locally routed and externally routed packet traps under the same group, thereby subjecting them to the same policer. This can result in problems, for example, when FRR is restarted and suddenly all transient traffic is trapped to the CPU because of a default route through the management interface. Locally routed packets required to re-establish a BGP connection will never reach the CPU and the routing tables will not be re-populated. Fix this by using a different trap group for externally routed packets. Fixes: 8110668ecd9a ("mlxsw: spectrum_trap: Register layer 3 control traps") Reported-by: Alex Veber Tested-by: Alex Veber Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- Documentation/networking/devlink/devlink-trap.rst | 4 drivers/net/ethernet/mellanox/mlxsw/reg.h | 1 + .../net/ethernet/mellanox/mlxsw/spectrum_trap.c| 14 +++--- include/net/devlink.h | 3 +++ net/core/devlink.c | 1 + 5 files changed, 20 insertions(+), 3 deletions(-) diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst index 1e3f3ffee248..2014307fbe63 100644 --- a/Documentation/networking/devlink/devlink-trap.rst +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -486,6 +486,10 @@ narrow. The description of these groups must be added to the following table: - Contains packet traps for packets that should be locally delivered after routing, but do not match more specific packet traps (e.g., ``ipv4_bgp``) + * - ``external_delivery`` + - Contains packet traps for packets that should be routed through an + external interface (e.g., management interface) that does not belong to + the same device (e.g., switch ASIC) as the ingress interface * - ``ipv6`` - Contains packet traps for various IPv6 control packets (e.g., Router Advertisements) diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index fcb88d4271bf..8ac987c8c8bc 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -5536,6 +5536,7 @@ enum mlxsw_reg_htgt_trap_group { MLXSW_REG_HTGT_TRAP_GROUP_SP_MULTICAST, MLXSW_REG_HTGT_TRAP_GROUP_SP_NEIGH_DISCOVERY, MLXSW_REG_HTGT_TRAP_GROUP_SP_ROUTER_EXP, + MLXSW_REG_HTGT_TRAP_GROUP_SP_EXTERNAL_ROUTE, MLXSW_REG_HTGT_TRAP_GROUP_SP_IP2ME, MLXSW_REG_HTGT_TRAP_GROUP_SP_DHCP, MLXSW_REG_HTGT_TRAP_GROUP_SP_EVENT, diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c index 157a42c63066..1e38dfe7cf64 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c @@ -328,6 +328,9 @@ mlxsw_sp_trap_policer_items_arr[] = { { .policer = MLXSW_SP_TRAP_POLICER(18, 1024, 128), }, + { + .policer = MLXSW_SP_TRAP_POLICER(19, 1024, 512), + }, }; static const struct mlxsw_sp_trap_group_item mlxsw_sp_trap_group_items_arr[] = { @@ -421,6 +424,11 @@ static const struct mlxsw_sp_trap_group_item mlxsw_sp_trap_group_items_arr[] = { .hw_group_id = MLXSW_REG_HTGT_TRAP_GROUP_SP_IP2ME, .priority = 2, }, + { + .group = DEVLINK_TRAP_GROUP_GENERIC(EXTERNAL_DELIVERY, 19), + .hw_group_id = MLXSW_REG_HTGT_TRAP_GROUP_SP_EXTERNAL_ROUTE, + .priority = 1, + }, { .group = DEVLINK_TRAP_GROUP_GENERIC(IPV6, 15), .hw_group_id = MLXSW_REG_HTGT_TRAP_GROUP_SP_IPV6, @@ -882,11 +890,11 @@ static const struct mlxsw_sp_trap_item 
mlxsw_sp_trap_items_arr[] = { }, }, { - .trap = MLXSW_SP_TRAP_CONTROL(EXTERNAL_ROUTE, LOCAL_DELIVERY, + .trap = MLXSW_SP_TRAP_CONTROL(EXTERNAL_ROUTE, EXTERNAL_DELIVERY, TRAP), .listeners_arr = { - MLXSW_SP_RXL_MARK(RTR_INGRESS0, IP2ME, TRAP_TO_CPU, - false), + MLXSW_SP_RXL_MARK(RTR_INGRESS0, EXTERNAL_ROUTE, + TRAP_TO_CPU, false), }, }, { diff --git a/include/net/devlink.h b/include/net/devlink.h index 1df6dfec26c2..95b0322a2a82 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -718,6 +718,7 @@ enum devlink_trap_group_generic_id { DEVLINK_TRAP_GROUP_GENERIC_ID_PIM, DEVLINK_TRAP_GROUP_GENERIC_ID_UC_LB, DEVLINK_TRAP_GROUP_GENERIC_ID_LOCAL_DELIVERY, + DEVLINK_TRAP_GROUP_GENERIC_ID_EXTERNAL_DELIVERY, DEVLI
[PATCH net 3/6] mlxsw: core: Increase scope of RCU read-side critical section
From: Ido Schimmel The lifetime of the Rx listener item ('rxl_item') is managed using RCU, but is dereferenced outside of RCU read-side critical section, which can lead to a use-after-free. Fix this by increasing the scope of the RCU read-side critical section. Fixes: 93c1edb27f9e ("mlxsw: Introduce Mellanox switch driver core") Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/core.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index d6d6fe64887b..5e76a96a118e 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -2051,11 +2051,13 @@ void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb, break; } } - rcu_read_unlock(); - if (!found) + if (!found) { + rcu_read_unlock(); goto drop; + } rxl->func(skb, local_port, rxl_item->priv); + rcu_read_unlock(); return; drop: -- 2.26.2
Re: [PATCH v2 0/9] ptp: Add generic header parsing function
On Tue Jul 28 2020, David Miller wrote: > It looks like some mlxsw et al. issues wrt. which header is expected at > skb->data when certain helper functions are invoked need to be resolved > still. Yes, the length check needs to be sorted out first. So, please don't merge this version. Thanks, Kurt
Re: [net-next PATCH v7 1/6] Documentation: ACPI: DSD: Document MDIO PHY
On Wed, Jul 29, 2020 at 10:43:34AM +0200, Jon Nettleton wrote: > On Wed, Jul 29, 2020 at 4:53 AM Jeremy Linton wrote: > > > > Hi, > > > > On 7/28/20 7:39 PM, Florian Fainelli wrote: > > > On 7/28/2020 3:30 PM, Jeremy Linton wrote: > > >> Hi, > > >> > > >> On 7/28/20 3:06 AM, Dan Callaghan wrote: > > >>> Excerpts from Andrew Lunn's message of 2020-07-24 21:14:36 +02:00: > > Now i could be wrong, but are Ethernet switches something you expect > > to see on ACPI/SBSA platforms? Or is this a legitimate use of the > > escape hatch? > > >>> > > >>> As an extra data point: right now I am working on an x86 embedded > > >>> appliance (ACPI not Device Tree) with 3x integrated Marvell switches. > > >>> I have been watching this patch series with great interest, because > > >>> right now there is no way for me to configure a complex switch topology > > >>> in DSA without Device Tree. > > >> > > >> DSA though, the switch is hung off a normal MAC/MDIO, right? (ignoring > > >> whether that NIC/MAC is actually hug off PCIe for the moment). > > > > > > There is no specific bus, we have memory mapped, MDIO, SPI, I2C swiches > > > all supported within the driver framework right now. > > > > > >> > > >> It just has a bunch of phy devices strung out on that single MAC/MDIO. > > > > > > It has a number of built-in PHYs that typically appear on a MDIO bus, > > > whether that bus is the switch's internal MDIO bus, or another MDIO bus > > > (which could be provided with just two GPIOs) depends on how the switch > > > is connected to its management host. > > > > > > When the switch is interfaced via MDIO the switch also typically has a > > > MDIO interface called the pseudo-PHY which is how you can actually tap > > > into the control interface of the switch, as opposed to reading its > > > internal PHYs from the MDIO bus. > > > > > >> So in ACPI land it would still have a relationship similar to the one > > >> Andrew pointed out with his DT example where the eth0->mdio->phy are all > > >> contained in their physical parent. The phy in that case associated with > > >> the parent adapter would be the first direct decedent of the mdio, the > > >> switch itself could then be represented with another device, with a > > >> further string of device/phys representing the devices. (I dislike > > >> drawing acsii art, but if this isn't clear I will, its also worthwhile > > >> to look at the dpaa2 docs for how the mac/phys work on this device for > > >> contrast.). > > > > > > The eth0->mdio->phy relationship you describe is the simple case that > > > you are well aware of which is say what we have on the Raspberry Pi 4 > > > with GENET and the external Broadcom PHY. > > > > > > For an Ethernet switch connected to an Ethernet MAC, we have 4 different > > > types of objects: > > > > > > - the Ethernet MAC which sits on its specific bus > > > > > > - the Ethernet switch which also sits on its specific bus > > > > > > - the built-in PHYs of the Ethernet switch which sit on whatever > > > bus/interface the switch provides to make them accessible > > > > > > - the specific bus controller that provides access to the Ethernet switch > > > > > > and this is a simplification that does not take into account Physical > > > Coding Sublayer devices, pure MDIO devices (with no foot in the Ethernet > > > land such as PCIe, USB3 or SATA PHYs), SFP, SFF and other pluggable > > > modules. > > > > Which is why I've stayed away from much of the switch discussion. 
There > > are a lot of edge cases to fall into, because for whatever reason > > networking seems to be unique in all this non-enumerable customization > > vs many other areas of the system. Storage, being an example i'm more > > familiar with which has very similar problems yet, somehow has managed > > to avoid much of this, despite having run on fabrics significantly more > > complex than basic ethernet (including running on a wide range of hot > > pluggable GBIC/SFP/QSFP/etc media layers). > > > > ACPI's "problem" here is that its strongly influenced by the "Plug and > > Play" revolution of the 1990's where the industry went from having > > humans describing hardware using machine readable languages, to hardware > > which was enumerable using standard protocols. ACPI's device > > descriptions are there as a crutch for the remaining non plug an play > > hardware in the system. > > > > So at a basic level, if your describing hardware in ACPI rather than > > abstracting it, that is a problem. > > > This is also my first run with ACPI and this discussion needs to be > brought back to the powers that be regarding sorting this out. This > is where I see it from an Armada and Layerscape perspective. This > isn't a silver bullet fix but the small things I think that need to be > done to move this forward. > > From Microsoft's documentation. > > Device dependencies > > Typically, there are hardware dependencies between devices on a > particular platform. Windows requires that all such dep
Re: general protection fault in vsock_poll
On Wed, Jul 29, 2020 at 01:59:05AM -0700, syzbot wrote: > syzbot has bisected this issue to: > > commit 408624af4c89989117bb2c6517bd50b7708a2fcd > Author: Stefano Garzarella > Date: Tue Dec 10 10:43:06 2019 + > > vsock: use local transport when it is loaded > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17e6489b10 > start commit: 92ed3019 Linux 5.8-rc7 > git tree: upstream > final oops: https://syzkaller.appspot.com/x/report.txt?x=1416489b10 > console output: https://syzkaller.appspot.com/x/log.txt?x=1016489b10 > kernel config: https://syzkaller.appspot.com/x/.config?x=84f076779e989e69 > dashboard link: https://syzkaller.appspot.com/bug?extid=a61bac2fcc1a7c6623fe > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15930b6490 > > Reported-by: syzbot+a61bac2fcc1a7c662...@syzkaller.appspotmail.com > Fixes: 408624af4c89 ("vsock: use local transport when it is loaded") > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > I'll take a look. At first glance it seems strange, because if sk_state is TCP_ESTABLISHED, the transport shouldn't be NULL, that's why we didn't put the check. Stefano
[PATCH net-next v2] net: mvneta: fix comment about phylink_speed_down
mvneta has switched to phylink, so the comment should look like "We may have called phylink_speed_down before". Signed-off-by: Jisheng Zhang --- Since v1: - drop patch2 which tries to avoid link flapping when changing mtu I need more time on the change mtu refactoring. drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 2c9277e73cef..c9b6b0f85bb0 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -3637,7 +3637,7 @@ static void mvneta_start_dev(struct mvneta_port *pp) phylink_start(pp->phylink); - /* We may have called phy_speed_down before */ + /* We may have called phylink_speed_down before */ phylink_speed_up(pp->phylink); netif_tx_start_all_queues(pp->dev); -- 2.28.0.rc0
Re: [PATCH net-nex 2/2] net: mvneta: Don't speed down the PHY when changing mtu
Hi David, On Tue, 28 Jul 2020 17:52:02 -0700 (PDT) David Miller wrote: > > > > @@ -3651,7 +3651,8 @@ static void mvneta_stop_dev(struct mvneta_port *pp) > > > > set_bit(__MVNETA_DOWN, &pp->state); > > > > - if (device_may_wakeup(&pp->dev->dev)) > > + if (device_may_wakeup(&pp->dev->dev) && > > + pp->pkt_size == MVNETA_RX_PKT_SIZE(pp->dev->mtu)) > > phylink_speed_down(pp->phylink, false); > > > > This is too much for me. > > You shouldn't have to shut down the entire device and take it back up > again just to change the MTU. > > Unfortunately, this is a common pattern in many drivers and it is very > dangerous to take this lazy path of just doing "stop/start" around > the MTU change. > > It means you can't recover from partial failures properly, > f.e. recovering from an inability to allocate queue resources for the > new MTU. > > To solve this properly, you must restructure the MTU change such that > is specifically stops the necessary and only the units of the chip > necessary to change the MTU. > > It should next try to allocate the necessary resources to satisfy the > MTU change, keeping the existing resources allocated in case of > failure. > > Then, only is all resources are successfully allocated, it should > commit the MTU change fully and without errors. > > Then none of these link flapping issues are even possible. Thanks a lot for pointing out the correct direction. Refactoring change mtu method needs more time(maybe for linux-5.10 is reasonable), so I just drop patch2 in v2.
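As a rough illustration of the structure David describes (allocate the new resources first, commit only on success), a sketch with a hypothetical foo driver and made-up ring helpers; this is not mvneta code:

#include <linux/errno.h>
#include <linux/netdevice.h>

struct foo_rx_ring;

struct foo_priv {
	struct foo_rx_ring *rx_rings;
};

/* Hypothetical helpers standing in for the driver's real ring management. */
struct foo_rx_ring *foo_alloc_rx_rings(struct foo_priv *priv, int new_mtu);
void foo_free_rx_rings(struct foo_rx_ring *rings);
void foo_stop_rx(struct foo_priv *priv);
void foo_start_rx(struct foo_priv *priv, struct foo_rx_ring *rings);

static int foo_change_mtu(struct net_device *dev, int new_mtu)
{
	struct foo_priv *priv = netdev_priv(dev);
	struct foo_rx_ring *old_rings = priv->rx_rings;
	struct foo_rx_ring *new_rings;

	/* Allocate resources for the new MTU while the old ones are
	 * still in place; on failure the device keeps running with its
	 * previous configuration and the link never flaps.
	 */
	new_rings = foo_alloc_rx_rings(priv, new_mtu);
	if (!new_rings)
		return -ENOMEM;

	foo_stop_rx(priv);		/* quiesce only what must change */
	priv->rx_rings = new_rings;
	dev->mtu = new_mtu;
	foo_start_rx(priv, new_rings);

	foo_free_rx_rings(old_rings);
	return 0;
}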
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote: > > On 2020/7/28 下午5:04, Eli Cohen wrote: > >On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: > >>+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) > >>+{ > >>+ struct vhost_virtqueue *vq = &v->vqs[qid]; > >>+ const struct vdpa_config_ops *ops = v->vdpa->config; > >>+ struct vdpa_device *vdpa = v->vdpa; > >>+ int ret, irq; > >>+ > >>+ spin_lock(&vq->call_ctx.ctx_lock); > >>+ irq = ops->get_vq_irq(vdpa, qid); > >>+ if (!vq->call_ctx.ctx || irq == -EINVAL) { > >>+ spin_unlock(&vq->call_ctx.ctx_lock); > >>+ return; > >>+ } > >>+ > >If I understand correctly, this will cause these IRQs to be forwarded > >directly to the VCPU, e.g. will be handled by the guest/qemu. > > > Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. > So, usually the network driver knows how to handle interrups for its devices. I assume the virtio_net driver at the guest has some default processing but what if the underlying hardware device (such as the case of vdpa) needs to take some actions? Is there an option to do bounce the interrupt back to the vendor specific driver in the host so it can take these actions? > > >Does this mean that the host will not handle this interrupt? How does it > >work in case on level triggered interrupts? > > > There's no guarantee that the KVM arch code can make sure the irq > bypass work for any type of irq. So if they the irq will still need > to be handled by host first. This means we should keep the host > interrupt handler as a slowpath (fallback). > > > > >In the case of ConnectX, I need to execute some code to acknowledge the > >interrupt. > > > This turns out to be hard for irq bypassing to work. Is it because > the irq is shared or what kind of ack you need to do? I have an EQ which is a queue for events comming from the hardware. This EQ can created so it reports only completion events but I still need to execute code that roughly tells the device that I saw these event records and then arm it again so it can report more interrupts (e.g if more packets are received or sent). This is device specific code. > > Thanks > > > > > >Can you explain how this should be done? > > >
[PATCH v1 0/3] net: ethernet: use generic power management
Linux Kernel Mentee: Remove Legacy Power Management. The purpose of this patch series is to upgrade power management in net ethernet drivers. This has been done by upgrading .suspend() and .resume() callbacks. The upgrade makes sure that the involvement of PCI Core does not change the order of operations executed in a driver. Thus, does not change its behavior. In general, drivers with legacy PM, .suspend() and .resume() make use of PCI helper functions like pci_enable/disable_device_mem(), pci_set_power_state(), pci_save/restore_state(), pci_enable/disable_device(), etc. to complete their job. The conversion requires the removal of those function calls, change the callbacks' definition accordingly and make use of dev_pm_ops structure. All patches are compile-tested only. Test tools: - Compiler: gcc (GCC) 10.1.0 - allmodconfig build: make -j$(nproc) W=1 all Vaibhav Gupta (3): sc92031: use generic power management sis900: use generic power management tlan: use generic power management drivers/net/ethernet/silan/sc92031.c | 26 --- drivers/net/ethernet/sis/sis900.c| 23 +++-- drivers/net/ethernet/ti/tlan.c | 31 ++-- 3 files changed, 22 insertions(+), 58 deletions(-) -- 2.27.0
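The conversion pattern the cover letter describes, sketched for a hypothetical foo driver (the real patches touch sc92031, sis900 and tlan): the legacy callbacks and their PCI helper calls go away, and a dev_pm_ops structure is wired up instead, leaving power-state handling to the PCI core.

#include <linux/netdevice.h>
#include <linux/pci.h>
#include <linux/pm.h>

static int __maybe_unused foo_suspend(struct device *dev)
{
	/* assumes probe did pci_set_drvdata(pdev, ndev) */
	struct net_device *ndev = dev_get_drvdata(dev);

	/* Device-specific quiescing only; no pci_save_state(),
	 * pci_set_power_state() or pci_disable_device() calls here.
	 */
	netif_device_detach(ndev);
	return 0;
}

static int __maybe_unused foo_resume(struct device *dev)
{
	struct net_device *ndev = dev_get_drvdata(dev);

	netif_device_attach(ndev);
	return 0;
}

static SIMPLE_DEV_PM_OPS(foo_pm_ops, foo_suspend, foo_resume);

static struct pci_driver foo_driver = {
	.name		= "foo",
	/* legacy .suspend/.resume hooks removed */
	.driver.pm	= &foo_pm_ops,
};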
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Tue, Jul 28, 2020 at 11:06:08PM +0200, Petr Machata wrote: > > Kurt Kanzenbach writes: > > > On Mon Jul 27 2020, Petr Machata wrote: > >> So this looks good, and works, but I'm wondering about one thing. > > > > Thanks for testing. > > > >> > >> Your code (and evidently most drivers as well) use a different check > >> than mlxsw, namely skb->len + ETH_HLEN < X. When I print_hex_dump() > >> skb_mac_header(skb), skb->len in mlxsw with some test packet, I get e.g. > >> this: > >> > >> 259a4db7: 01 00 5e 00 01 81 00 02 c9 a4 e4 e1 08 00 45 00 > >> ..^...E. > >> 5f29f0eb: 00 48 0d c9 40 00 01 11 c8 59 c0 00 02 01 e0 00 > >> .H..@Y.. > >> f3663e9e: 01 81 01 3f 01 3f 00 34 9f d3 00 02 00 2c 00 00 > >> ...?.?.4.,.. > >> ^sp^^ ^dp^^ ^len^ ^cks^ ^len^ > >> b3914606: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 > >> > >> 2e7828ea: c9 ff fe a4 e4 e1 00 01 09 fa 00 00 00 00 00 00 > >> > >> 0b98156e: 00 00 00 00 00 00 > >> .. > >> > >> Both UDP and PTP length fields indicate that the payload ends exactly at > >> the end of the dump. So apparently skb->len contains all the payload > >> bytes, including the Ethernet header. > >> > >> Is that the case for other drivers as well? Maybe mlxsw is just missing > >> some SKB magic in the driver. > > > > So I run some tests (on other hardware/drivers) and it seems like that > > the skb->len usually doesn't include the ETH_HLEN. Therefore, it is > > added to the check. > > > > Looking at the driver code: > > > > |static void mlxsw_sp_rx_sample_listener(struct sk_buff *skb, u8 local_port, > > | void *trap_ctx) > > |{ > > | [...] > > | /* The sample handler expects skb->data to point to the start of the > > |* Ethernet header. > > |*/ > > | skb_push(skb, ETH_HLEN); > > | mlxsw_sp_sample_receive(mlxsw_sp, skb, local_port); > > |} > > > > Maybe that's the issue here? > > Correct, mlxsw pushes the header very soon. Given that both > ptp_classify_raw() and eth_type_trans() that are invoked later assume > the header, it is reasonable. I have shuffled the pushes around and have > a patch that both works and I think is correct. Would it make more sense to do: u8 *data = skb_mac_header(skb); u8 *ptr = data; if (type & PTP_CLASS_VLAN) ptr += VLAN_HLEN; switch (type & PTP_CLASS_PMASK) { case PTP_CLASS_IPV4: ptr += IPV4_HLEN(ptr) + UDP_HLEN; break; case PTP_CLASS_IPV6: ptr += IP6_HLEN + UDP_HLEN; break; case PTP_CLASS_L2: break; default: return NULL; } ptr += ETH_HLEN; if (ptr + 34 > skb->data + skb->len) return NULL; return ptr; in other words, compare pointers, so that whether skb_push() etc has been used on the skb is irrelevant? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
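For clarity, Russell's suggestion written out as a self-contained helper; this is a sketch of the idea under discussion, not the final v3 patch:

#include <linux/if_ether.h>
#include <linux/if_vlan.h>
#include <linux/ptp_classify.h>
#include <linux/skbuff.h>

static u8 *foo_ptp_header(struct sk_buff *skb, unsigned int type)
{
	u8 *ptr = skb_mac_header(skb);

	if (type & PTP_CLASS_VLAN)
		ptr += VLAN_HLEN;

	switch (type & PTP_CLASS_PMASK) {
	case PTP_CLASS_IPV4:
		ptr += IPV4_HLEN(ptr) + UDP_HLEN;
		break;
	case PTP_CLASS_IPV6:
		ptr += IP6_HLEN + UDP_HLEN;
		break;
	case PTP_CLASS_L2:
		break;
	default:
		return NULL;
	}

	ptr += ETH_HLEN;

	/* Compare against the true end of the packet data, so the
	 * result does not depend on whether the driver already did
	 * skb_push(skb, ETH_HLEN) or not.  34 bytes covers the common
	 * PTP header, as in the snippet above.
	 */
	if (ptr + 34 > skb->data + skb->len)
		return NULL;

	return ptr;
}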
Re: [PATCH] net: ethernet: fix potential memory leak in gemini_ethernet_port_probe()
On Wed, 29 Jul 2020 11:46:06 +0800 Lu Wei wrote: > > > If some processes in gemini_ethernet_port_probe() fail, > free_netdev(dev) needs to be called to avoid a memory leak. Using devm_alloc_etherdev_mqs() would be much simpler > > Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit > ethernet") > Reported-by: Hulk Robot > Signed-off-by: Lu Wei > --- > drivers/net/ethernet/cortina/gemini.c | 24 +++- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/cortina/gemini.c > b/drivers/net/ethernet/cortina/gemini.c > index 8d13ea370db1..5e93a1a570b6 100644 > --- a/drivers/net/ethernet/cortina/gemini.c > +++ b/drivers/net/ethernet/cortina/gemini.c > @@ -2407,37 +2407,48 @@ static int gemini_ethernet_port_probe(struct > platform_device *pdev) > dmares = platform_get_resource(pdev, IORESOURCE_MEM, 0); > if (!dmares) { > dev_err(dev, "no DMA resource\n"); > + free_netdev(netdev); > return -ENODEV; > } > port->dma_base = devm_ioremap_resource(dev, dmares); > - if (IS_ERR(port->dma_base)) > + if (IS_ERR(port->dma_base)) { > + free_netdev(netdev); > return PTR_ERR(port->dma_base); > + } > > /* GMAC config memory */ > gmacres = platform_get_resource(pdev, IORESOURCE_MEM, 1); > if (!gmacres) { > dev_err(dev, "no GMAC resource\n"); > + free_netdev(netdev); > return -ENODEV; > } > port->gmac_base = devm_ioremap_resource(dev, gmacres); > - if (IS_ERR(port->gmac_base)) > + if (IS_ERR(port->gmac_base)) { > + free_netdev(netdev); > return PTR_ERR(port->gmac_base); > + } > > /* Interrupt */ > irq = platform_get_irq(pdev, 0); > - if (irq <= 0) > + if (irq <= 0) { > + free_netdev(netdev); > return irq ? irq : -ENODEV; > + } > port->irq = irq; > > /* Clock the port */ > port->pclk = devm_clk_get(dev, "PCLK"); > if (IS_ERR(port->pclk)) { > dev_err(dev, "no PCLK\n"); > + free_netdev(netdev); > return PTR_ERR(port->pclk); > } > ret = clk_prepare_enable(port->pclk); > - if (ret) > + if (ret) { > + free_netdev(netdev); > return ret; > + } > > /* Maybe there is a nice ethernet address we should use */ > gemini_port_save_mac_addr(port); > @@ -2446,6 +2457,7 @@ static int gemini_ethernet_port_probe(struct > platform_device *pdev) > port->reset = devm_reset_control_get_exclusive(dev, NULL); > if (IS_ERR(port->reset)) { > dev_err(dev, "no reset\n"); > + free_netdev(netdev); > return PTR_ERR(port->reset); > } > reset_control_reset(port->reset); > @@ -2501,8 +2513,10 @@ static int gemini_ethernet_port_probe(struct > platform_device *pdev) > IRQF_SHARED, > port_names[port->id], > port); > - if (ret) > + if (ret) { > + free_netdev(netdev); > return ret; > + } > > ret = register_netdev(netdev); > if (!ret) { > -- > 2.17.1 > > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Darm-2Dkernel&d=DwICAg&c=7dfBJ8cXbWjhc0BhImu8wQ&r=wlaKTGoVCDxOzHc2QUzpzGEf9oY3eidXlAe3OF1omvo&m=56ruRjtNY-BIzquRyoO0KSbrR7UBB81VfqotT_rfFus&s=Axiqv0SZYKFXgMc1zJLilZCk9wbRAt4LkKtW6VjKTgw&e=
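A sketch of Jakub's suggestion for contrast (hypothetical foo_port driver, not the gemini code): with a device-managed netdev allocation the probe error paths need no explicit free_netdev() at all.

#include <linux/err.h>
#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/platform_device.h>

struct foo_port {
	void __iomem *dma_base;
};

static int foo_port_probe(struct platform_device *pdev)
{
	struct net_device *netdev;
	struct foo_port *port;

	/* Freed automatically if probe fails or the device is unbound. */
	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(*port));
	if (!netdev)
		return -ENOMEM;

	port = netdev_priv(netdev);

	port->dma_base = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(port->dma_base))
		return PTR_ERR(port->dma_base);	/* no free_netdev() needed */

	/* remove() must still call unregister_netdev(); only the
	 * allocation itself is device-managed here.
	 */
	return register_netdev(netdev);
}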
Re: [PATCH net-next v2] net: mvneta: fix comment about phylink_speed_down
On Wed, Jul 29, 2020 at 05:49:09PM +0800, Jisheng Zhang wrote: > mvneta has switched to phylink, so the comment should look > like "We may have called phylink_speed_down before". > > Signed-off-by: Jisheng Zhang Reviewed-by: Russell King > --- > Since v1: > - drop patch2 which tries to avoid link flapping when changing mtu > I need more time on the change mtu refactoring. > > drivers/net/ethernet/marvell/mvneta.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index 2c9277e73cef..c9b6b0f85bb0 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -3637,7 +3637,7 @@ static void mvneta_start_dev(struct mvneta_port *pp) > > phylink_start(pp->phylink); > > - /* We may have called phy_speed_down before */ > + /* We may have called phylink_speed_down before */ > phylink_speed_up(pp->phylink); > > netif_tx_start_all_queues(pp->dev); > -- > 2.28.0.rc0 > > -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH v1] farsync: use generic power management
On Tue, Jul 28, 2020 at 03:04:13PM -0500, Bjorn Helgaas wrote: > On Tue, Jul 28, 2020 at 09:58:10AM +0530, Vaibhav Gupta wrote: > > The .suspend() and .resume() callbacks are not defined for this driver. > > Still, their power management structure follows the legacy framework. To > > bring it under the generic framework, simply remove the binding of > > callbacks from "struct pci_driver". > > FWIW, this commit log is slightly misleading because .suspend and > .resume are NULL by default, so this patch actually is a complete > no-op as far as code generation is concerned. > > This change is worthwhile because it simplifies the code a little, but > it doesn't convert the driver from legacy to generic power management. > This driver doesn't supply a .pm structure, so it doesn't seem to do > *any* power management. > Agreed. Actually, as their presence only causes the PCI core to call pci_legacy_suspend/resume() for them, I thought that after removing the binding from "struct pci_driver", this driver qualifies to be grouped under the generic framework, so I used "use generic power management" for the heading. I should have written "remove legacy binding". But David has applied the patch, so should I send a v2 or a fix to update the message? Thanks Vaibhav Gupta > > Change code indentation from space to tab in "struct pci_driver". > >
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On 2020/7/29 下午5:55, Eli Cohen wrote: On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote: On 2020/7/28 下午5:04, Eli Cohen wrote: On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) +{ + struct vhost_virtqueue *vq = &v->vqs[qid]; + const struct vdpa_config_ops *ops = v->vdpa->config; + struct vdpa_device *vdpa = v->vdpa; + int ret, irq; + + spin_lock(&vq->call_ctx.ctx_lock); + irq = ops->get_vq_irq(vdpa, qid); + if (!vq->call_ctx.ctx || irq == -EINVAL) { + spin_unlock(&vq->call_ctx.ctx_lock); + return; + } + If I understand correctly, this will cause these IRQs to be forwarded directly to the VCPU, e.g. will be handled by the guest/qemu. Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. So, usually the network driver knows how to handle interrups for its devices. I assume the virtio_net driver at the guest has some default processing but what if the underlying hardware device (such as the case of vdpa) needs to take some actions? Virtio splits the bus operations out of device operations. So did the driver. The virtio-net driver depends on a transport driver to talk to the real device. Usually PCI is used as the transport for the device. In this case virtio-pci driver is in charge of dealing with irq allocation/free/configuration and it needs to co-operate with platform specific irqchip (virtualized by KVM) to finish the work like irq acknowledge etc. E.g for x86, the irq offloading can only work when there's a hardware support of virtual irqchip (APICv) then all stuffs could be done without vmexits. So no vendor specific part since the device and transport are all standard. Is there an option to do bounce the interrupt back to the vendor specific driver in the host so it can take these actions? Currently not, but even if we can do this, I'm afraid we will lose the performance advantage of irq bypassing. Does this mean that the host will not handle this interrupt? How does it work in case on level triggered interrupts? There's no guarantee that the KVM arch code can make sure the irq bypass work for any type of irq. So if they the irq will still need to be handled by host first. This means we should keep the host interrupt handler as a slowpath (fallback). In the case of ConnectX, I need to execute some code to acknowledge the interrupt. This turns out to be hard for irq bypassing to work. Is it because the irq is shared or what kind of ack you need to do? I have an EQ which is a queue for events comming from the hardware. This EQ can created so it reports only completion events but I still need to execute code that roughly tells the device that I saw these event records and then arm it again so it can report more interrupts (e.g if more packets are received or sent). This is device specific code. Any chance that the hardware can use MSI (which is not the case here)? Thanks Thanks Can you explain how this should be done?
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Wed Jul 29 2020, Russell King - ARM Linux admin wrote: > On Tue, Jul 28, 2020 at 11:06:08PM +0200, Petr Machata wrote: >> >> Kurt Kanzenbach writes: >> >> > On Mon Jul 27 2020, Petr Machata wrote: >> >> So this looks good, and works, but I'm wondering about one thing. >> > >> > Thanks for testing. >> > >> >> >> >> Your code (and evidently most drivers as well) use a different check >> >> than mlxsw, namely skb->len + ETH_HLEN < X. When I print_hex_dump() >> >> skb_mac_header(skb), skb->len in mlxsw with some test packet, I get e.g. >> >> this: >> >> >> >> 259a4db7: 01 00 5e 00 01 81 00 02 c9 a4 e4 e1 08 00 45 00 >> >> ..^...E. >> >> 5f29f0eb: 00 48 0d c9 40 00 01 11 c8 59 c0 00 02 01 e0 00 >> >> .H..@Y.. >> >> f3663e9e: 01 81 01 3f 01 3f 00 34 9f d3 00 02 00 2c 00 00 >> >> ...?.?.4.,.. >> >> ^sp^^ ^dp^^ ^len^ ^cks^ ^len^ >> >> b3914606: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 >> >> >> >> 2e7828ea: c9 ff fe a4 e4 e1 00 01 09 fa 00 00 00 00 00 00 >> >> >> >> 0b98156e: 00 00 00 00 00 00 >> >> .. >> >> >> >> Both UDP and PTP length fields indicate that the payload ends exactly at >> >> the end of the dump. So apparently skb->len contains all the payload >> >> bytes, including the Ethernet header. >> >> >> >> Is that the case for other drivers as well? Maybe mlxsw is just missing >> >> some SKB magic in the driver. >> > >> > So I run some tests (on other hardware/drivers) and it seems like that >> > the skb->len usually doesn't include the ETH_HLEN. Therefore, it is >> > added to the check. >> > >> > Looking at the driver code: >> > >> > |static void mlxsw_sp_rx_sample_listener(struct sk_buff *skb, u8 >> > local_port, >> > | void *trap_ctx) >> > |{ >> > | [...] >> > | /* The sample handler expects skb->data to point to the start of the >> > | * Ethernet header. >> > | */ >> > | skb_push(skb, ETH_HLEN); >> > | mlxsw_sp_sample_receive(mlxsw_sp, skb, local_port); >> > |} >> > >> > Maybe that's the issue here? >> >> Correct, mlxsw pushes the header very soon. Given that both >> ptp_classify_raw() and eth_type_trans() that are invoked later assume >> the header, it is reasonable. I have shuffled the pushes around and have >> a patch that both works and I think is correct. > > Would it make more sense to do: > > u8 *data = skb_mac_header(skb); > u8 *ptr = data; > > if (type & PTP_CLASS_VLAN) > ptr += VLAN_HLEN; > > switch (type & PTP_CLASS_PMASK) { > case PTP_CLASS_IPV4: > ptr += IPV4_HLEN(ptr) + UDP_HLEN; > break; > > case PTP_CLASS_IPV6: > ptr += IP6_HLEN + UDP_HLEN; > break; > > case PTP_CLASS_L2: > break; > > default: > return NULL; > } > > ptr += ETH_HLEN; > > if (ptr + 34 > skb->data + skb->len) > return NULL; > > return ptr; > > in other words, compare pointers, so that whether skb_push() etc has > been used on the skb is irrelevant? Yes. Avoiding relying on whether skb_{push,pull} has been used is overall the simplest solution and it works for all drivers regardless if DSA or something else is used. Looking at your code, it looks correct to me. I'll test it and send v3. But before, I've got another question that somebody might have an answer to: The ptp v1 code always does locate the message type at msgtype = data + offset + OFF_PTP_CONTROL OFF_PTP_CONTROL is 32. However, looking at the ptp v1 header, the message type is located at offset 20. What am I missing here? Thanks, Kurt signature.asc Description: PGP signature
[PATCH net] net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct
From: wenxu When openvswitch conntrack offload with act_ct action. Fragment packets defrag in the ingress tc act_ct action and miss the next chain. Then the packet pass to the openvswitch datapath without the mru. The over mtu packet will be dropped in output action in openvswitch for over mtu. "kernel: net2: dropped over-mtu packet: 1528 > 1500" This patch add mru in the tc_skb_ext for adefrag and miss next chain situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru to the qdisc_skb_cb when the packet defrag. And When the chain miss, The mru is set to tc_skb_ext which can be got by ovs datapath. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: wenxu --- include/linux/skbuff.h| 1 + include/net/sch_generic.h | 1 + net/openvswitch/flow.c| 1 + net/sched/act_ct.c| 8 ++-- net/sched/cls_api.c | 1 + 5 files changed, 10 insertions(+), 2 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0c0377f..0d842d6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -283,6 +283,7 @@ struct nf_bridge_info { */ struct tc_skb_ext { __u32 chain; + __u16 mru; }; #endif diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index c510b03..45401d5 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -384,6 +384,7 @@ struct qdisc_skb_cb { }; #define QDISC_CB_PRIV_LEN 20 unsigned char data[QDISC_CB_PRIV_LEN]; + u16 mru; }; typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv); diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 9d375e7..03942c3 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -890,6 +890,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, if (static_branch_unlikely(&tc_recirc_sharing_support)) { tc_ext = skb_ext_find(skb, TC_SKB_EXT); key->recirc_id = tc_ext ? tc_ext->chain : 0; + OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0; } else { key->recirc_id = 0; } diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 5928efb..69445ab 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -706,8 +706,10 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) goto out_free; - if (!err) + if (!err) { *defrag = true; + cb.mru = IPCB(skb)->frag_max_size; + } } else { /* NFPROTO_IPV6 */ #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; @@ -717,8 +719,10 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) goto out_free; - if (!err) + if (!err) { *defrag = true; + cb.mru = IP6CB(skb)->frag_max_size; + } #else err = -EOPNOTSUPP; goto out_free; diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 4619cb3..eb6acc5 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1627,6 +1627,7 @@ int tcf_classify_ingress(struct sk_buff *skb, if (WARN_ON_ONCE(!ext)) return TC_ACT_SHOT; ext->chain = last_executed_chain; + ext->mru = qdisc_skb_cb(skb)->mru; } return ret; -- 1.8.3.1
Re: [net-next 5/6] i40e, xsk: increase budget for AF_XDP path
On 2020-07-28 22:15, Jakub Kicinski wrote: On Tue, 28 Jul 2020 12:08:41 -0700 Tony Nguyen wrote: diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 1f2dd591dbf1..99f4afdc403d 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -265,6 +265,8 @@ static void i40e_inc_ntc(struct i40e_ring *rx_ring) rx_ring->next_to_clean = ntc; } +#define I40E_XSK_CLEAN_RX_BUDGET 256U + /** * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring * @rx_ring: Rx ring @@ -280,7 +282,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) bool failure = false; struct sk_buff *skb; - while (likely(total_rx_packets < (unsigned int)budget)) { + while (likely(total_rx_packets < I40E_XSK_CLEAN_RX_BUDGET)) { union i40e_rx_desc *rx_desc; struct xdp_buff **bi; unsigned int size; Should this perhaps be a common things that drivers do? Should we define a "XSK_NAPI_WEIGHT_MULT 4" instead of hard coding the value in a driver? Yes, that's a good idea. I can generalize for the AF_XDP paths in the other drivers as a follow up! Cheers, Björn (in vacation mode)
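A sketch of the generalization being discussed; XSK_NAPI_WEIGHT_MULT is the proposed name from the thread, not an existing define:

/* Shared helper instead of a driver-private 256: scale the NAPI
 * budget for the AF_XDP zero-copy Rx path.
 */
#define XSK_NAPI_WEIGHT_MULT	4

static inline unsigned int xsk_rx_budget(int napi_budget)
{
	return (unsigned int)napi_budget * XSK_NAPI_WEIGHT_MULT;
}

/* i40e_clean_rx_irq_zc() would then loop while
 *	total_rx_packets < xsk_rx_budget(budget);
 * which gives 64 * 4 = 256 with the default NAPI weight.
 */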
[PATCH] uapi, seg6_iptunnel: Add missing include in seg6_iptunnel.h
From: Ioana-Ruxandra Stăncioi Include in uapi/linux/seg6_iptunnel.h to fix the following linux/seg6_iptunnel.h compilation error: invalid application of 'sizeof' to incomplete type 'struct ipv6hdr' head = sizeof(struct ipv6hdr); ^~ This is to allow including this header in places where has not been included but __KERNEL__ is defined. In the kernel the easy workaround is including , but the header may also be used by code analysis tools. Signed-off-by: Ioana-Ruxandra Stăncioi --- include/uapi/linux/seg6_iptunnel.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h index 09fb608a35ec..b904228f463c 100644 --- a/include/uapi/linux/seg6_iptunnel.h +++ b/include/uapi/linux/seg6_iptunnel.h @@ -38,6 +38,7 @@ enum { }; #ifdef __KERNEL__ +#include static inline size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo) { -- 2.28.0.rc0.142.g3c755180ce-goog
Re: [PATCH RFC net-next] net: phy: add Marvell PHY PTP support
On Tue, Jul 14, 2020 at 05:26:28PM +0100, Russell King wrote: > Add PTP basic support for Marvell 88E151x PHYs. These PHYs support > timestamping the egress and ingress of packets, but does not support > any packet modification, nor do we support any filtering beyond > selecting packets that the hardware recognises as PTP/802.1AS. A question has come up concerning the selection of PTP timestamping sources within a network device. I have the situation on a couple of devices where there are multiple places that can do PTP timestamping: - the PHY (slow to access, only event capture which may not be wired, doesn't seem to synchronise well - delay of 58000, frequency changes every second between +/-1500ppb.) - the Ethernet MAC (fast to access, supports event capture and trigger generation which also may not be wired, synchronises well, delay of 700, frequency changes every second +/- 40ppb.) How do we deal with this situation - from what I can see from the ethtool API, we have to make a choice about which to use. How do we make that choice? It's not a case of "just implement one" since hardware may have both available on a particular ethernet interface or just one available. Do we need a property to indicate whether we wish to use the PHY or MAC PTP stamping, or something more elaborate? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
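For context on why a choice currently has to be made, a sketch of the relevant ethtool hook for a hypothetical foo driver: get_ts_info() reports exactly one PHC index per net_device, so either the MAC's clock or the PHY's clock ends up being exposed, not both.

#include <linux/bits.h>
#include <linux/ethtool.h>
#include <linux/net_tstamp.h>
#include <linux/netdevice.h>
#include <linux/ptp_clock_kernel.h>

struct foo_priv {
	struct ptp_clock *ptp_clock;	/* the MAC's PHC in this sketch */
};

static int foo_get_ts_info(struct net_device *dev,
			   struct ethtool_ts_info *info)
{
	struct foo_priv *priv = netdev_priv(dev);

	info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE |
				SOF_TIMESTAMPING_RX_HARDWARE |
				SOF_TIMESTAMPING_RAW_HARDWARE;
	/* Only one clock can be advertised here. */
	info->phc_index = ptp_clock_index(priv->ptp_clock);
	info->tx_types = BIT(HWTSTAMP_TX_OFF) | BIT(HWTSTAMP_TX_ON);
	info->rx_filters = BIT(HWTSTAMP_FILTER_NONE) | BIT(HWTSTAMP_FILTER_ALL);

	return 0;
}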
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On Wed, Jul 29, 2020 at 06:19:52PM +0800, Jason Wang wrote: I am checking internally if we can work in a mode not requiring to acknowledge the interrupt. I will update. Thanks for the explanations. > > On 2020/7/29 下午5:55, Eli Cohen wrote: > >On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote: > >>On 2020/7/28 下午5:04, Eli Cohen wrote: > >>>On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: > +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) > +{ > + struct vhost_virtqueue *vq = &v->vqs[qid]; > + const struct vdpa_config_ops *ops = v->vdpa->config; > + struct vdpa_device *vdpa = v->vdpa; > + int ret, irq; > + > + spin_lock(&vq->call_ctx.ctx_lock); > + irq = ops->get_vq_irq(vdpa, qid); > + if (!vq->call_ctx.ctx || irq == -EINVAL) { > + spin_unlock(&vq->call_ctx.ctx_lock); > + return; > + } > + > >>>If I understand correctly, this will cause these IRQs to be forwarded > >>>directly to the VCPU, e.g. will be handled by the guest/qemu. > >> > >>Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. > >> > >So, usually the network driver knows how to handle interrups for its > >devices. I assume the virtio_net driver at the guest has some default > >processing but what if the underlying hardware device (such as the case > >of vdpa) needs to take some actions? > > > Virtio splits the bus operations out of device operations. So did > the driver. > > The virtio-net driver depends on a transport driver to talk to the > real device. Usually PCI is used as the transport for the device. In > this case virtio-pci driver is in charge of dealing with irq > allocation/free/configuration and it needs to co-operate with > platform specific irqchip (virtualized by KVM) to finish the work > like irq acknowledge etc. E.g for x86, the irq offloading can only > work when there's a hardware support of virtual irqchip (APICv) then > all stuffs could be done without vmexits. > > So no vendor specific part since the device and transport are all standard. > > > > Is there an option to do bounce the > >interrupt back to the vendor specific driver in the host so it can take > >these actions? > > > Currently not, but even if we can do this, I'm afraid we will lose > the performance advantage of irq bypassing. > > > > > >>>Does this mean that the host will not handle this interrupt? How does it > >>>work in case on level triggered interrupts? > >> > >>There's no guarantee that the KVM arch code can make sure the irq > >>bypass work for any type of irq. So if they the irq will still need > >>to be handled by host first. This means we should keep the host > >>interrupt handler as a slowpath (fallback). > >> > >>>In the case of ConnectX, I need to execute some code to acknowledge the > >>>interrupt. > >> > >>This turns out to be hard for irq bypassing to work. Is it because > >>the irq is shared or what kind of ack you need to do? > >I have an EQ which is a queue for events comming from the hardware. This > >EQ can created so it reports only completion events but I still need to > >execute code that roughly tells the device that I saw these event > >records and then arm it again so it can report more interrupts (e.g if > >more packets are received or sent). This is device specific code. > > > Any chance that the hardware can use MSI (which is not the case here)? > > Thanks > > > >>Thanks > >> > >> > >>>Can you explain how this should be done? > >>> >
Re: [PATCH v8 bpf-next 12/13] selftests/bpf: Add test for d_path helper
On Tue, Jul 28, 2020 at 12:53:00PM -0700, Andrii Nakryiko wrote: SNIP > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > + ret = set_pathname(procfd, pid); > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > + ret = set_pathname(devfd, pid); > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > + ret = set_pathname(localfd, pid); > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > + ret = set_pathname(indicatorfd, pid); > > + if (CHECK_FAIL(ret < 0)) > > + goto out_close; > > Please use CHECK instead of CHECK_FAIL. Thanks. ok > > diff --git a/tools/testing/selftests/bpf/progs/test_d_path.c > > b/tools/testing/selftests/bpf/progs/test_d_path.c > > new file mode 100644 > > index ..e02dce614256 > > --- /dev/null > > +++ b/tools/testing/selftests/bpf/progs/test_d_path.c > > @@ -0,0 +1,64 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +#include "vmlinux.h" > > +#include > > +#include > > + > > +#define MAX_PATH_LEN 128 > > +#define MAX_EVENT_NUM 16 > > + > > +pid_t my_pid; > > +__u32 cnt_stat; > > +__u32 cnt_close; > > +char paths_stat[MAX_EVENT_NUM][MAX_PATH_LEN]; > > +char paths_close[MAX_EVENT_NUM][MAX_PATH_LEN]; > > +int rets_stat[MAX_EVENT_NUM]; > > +int rets_close[MAX_EVENT_NUM]; > > + > > please zero-initialize all of these, it causes issues on some Clang versions ook jirka
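A sketch of the requested change to the globals in test_d_path.c: explicit zero-initializers keep the variables out of the COMMON section, which reportedly trips up some Clang/libbpf combinations (MAX_EVENT_NUM and MAX_PATH_LEN are the defines from the patch above).

pid_t my_pid = 0;
__u32 cnt_stat = 0;
__u32 cnt_close = 0;
char paths_stat[MAX_EVENT_NUM][MAX_PATH_LEN] = {};
char paths_close[MAX_EVENT_NUM][MAX_PATH_LEN] = {};
int rets_stat[MAX_EVENT_NUM] = {};
int rets_close[MAX_EVENT_NUM] = {};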
linux-next: build failure after merge of the net-next tree
Hi all, After merging the net-next tree, today's linux-next build (i386 defconfig) failed like this: x86_64-linux-gnu-ld: net/core/fib_rules.o: in function `fib_rules_lookup': fib_rules.c:(.text+0x5c6): undefined reference to `fib6_rule_match' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x5d8): undefined reference to `fib6_rule_match' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x64d): undefined reference to `fib6_rule_action' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x662): undefined reference to `fib6_rule_action' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x67a): undefined reference to `fib6_rule_suppress' x86_64-linux-gnu-ld: fib_rules.c:(.text+0x68d): undefined reference to `fib6_rule_suppress' Caused by commit b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops") # CONFIG_IPV6_MULTIPLE_TABLES is not set I have reverted that commit for today. -- Cheers, Stephen Rothwell
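One illustrative way such breakage is usually resolved (not necessarily the fix that was ultimately applied): an indirect-call wrapper names its possible targets explicitly, so references to fib6_rule_match() and friends have to be made conditional on the configs that actually build them.

#include <linux/indirect_call_wrapper.h>

#if IS_ENABLED(CONFIG_IPV6_MULTIPLE_TABLES)
#define INDIRECT_CALL_MT(f, f2, f1, ...) \
	INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__)
#elif IS_ENABLED(CONFIG_IP_MULTIPLE_TABLES)
#define INDIRECT_CALL_MT(f, f2, f1, ...) \
	INDIRECT_CALL_1(f, f1, __VA_ARGS__)
#else
#define INDIRECT_CALL_MT(f, f2, f1, ...) f(__VA_ARGS__)
#endif

/* fib_rules_lookup() would then use, for example:
 *	err = INDIRECT_CALL_MT(ops->action, fib6_rule_action,
 *			       fib4_rule_action, rule, fl, flags, arg);
 * so the IPv6 symbols are only referenced when they are built.
 */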
Re: [PATCH ipsec] xfrm: esp6: fix the location of the transport header with encapsulation
On Mon, Jul 27, 2020 at 04:03:47PM +0200, Sabrina Dubroca wrote: > commit 17175d1a27c6 ("xfrm: esp6: fix encapsulation header offset > computation") changed esp6_input_done2 to correctly find the size of > the IPv6 header that precedes the TCP/UDP encapsulation header, but > didn't adjust the final call to skb_set_transport_header, which I > assumed was correct in using skb_network_header_len. > > Xiumei Mu reported that when we create xfrm states that include port > numbers in the selector, traffic from the user sockets is dropped. It > turns out that we get a state mismatch in __xfrm_policy_check, because > we end up trying to compare the encapsulation header's ports with the > selector that's based on user traffic ports. > > Fixes: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP") > Fixes: 26333c37fc28 ("xfrm: add IPv6 support for espintcp") > Reported-by: Xiumei Mu > Signed-off-by: Sabrina Dubroca Applied, thanks a lot Sabrina!
[PATCH v3 net-next 03/11] qed: swap param init and publish
In theory, publishing the devlink params before cdev->iwarp_cmt is initialized could lead to a race condition, so set the initial value first and publish afterwards. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed_devlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index a62c47c61edf..4e3316c6beb6 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -75,8 +75,8 @@ struct devlink *qed_devlink_register(struct qed_dev *cdev) QED_DEVLINK_PARAM_ID_IWARP_CMT, value); - devlink_params_publish(dl); cdev->iwarp_cmt = false; + devlink_params_publish(dl); return dl; -- 2.17.1
[PATCH v3 net-next 02/11] qed/qede: make devlink survive recovery
Before that, devlink instance lifecycle was linked to qed_dev object, that causes devlink to be recreated on each recovery. Changing it by making higher level driver (qede) responsible for its life. This way devlink will survive recoveries. qede will store devlink structure pointer as a part of its device object, devlink private data contains a linkage structure, it'll contain extra devlink related content in following patches. The same lifecycle should be applied to storage drivers (qedf/qedi) later. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed.h | 1 - drivers/net/ethernet/qlogic/qed/qed_devlink.c | 40 --- drivers/net/ethernet/qlogic/qed/qed_devlink.h | 4 +- drivers/net/ethernet/qlogic/qed/qed_main.c| 10 + drivers/net/ethernet/qlogic/qede/qede.h | 1 + drivers/net/ethernet/qlogic/qede/qede_main.c | 18 + include/linux/qed/qed_if.h| 9 + 7 files changed, 48 insertions(+), 35 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index b2a7b53ee760..b6ce1488abcc 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -849,7 +849,6 @@ struct qed_dev { u32 rdma_max_srq_sge; u16 tunn_feature_mask; - struct devlink *dl; booliwarp_cmt; }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index eb693787c99e..a62c47c61edf 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -5,6 +5,7 @@ */ #include +#include #include "qed.h" #include "qed_devlink.h" @@ -13,17 +14,12 @@ enum qed_devlink_param_id { QED_DEVLINK_PARAM_ID_IWARP_CMT, }; -struct qed_devlink { - struct qed_dev *cdev; -}; - static int qed_dl_param_get(struct devlink *dl, u32 id, struct devlink_param_gset_ctx *ctx) { - struct qed_devlink *qed_dl; + struct qed_devlink *qed_dl = devlink_priv(dl); struct qed_dev *cdev; - qed_dl = devlink_priv(dl); cdev = qed_dl->cdev; ctx->val.vbool = cdev->iwarp_cmt; @@ -33,10 +29,9 @@ static int qed_dl_param_get(struct devlink *dl, u32 id, static int qed_dl_param_set(struct devlink *dl, u32 id, struct devlink_param_gset_ctx *ctx) { - struct qed_devlink *qed_dl; + struct qed_devlink *qed_dl = devlink_priv(dl); struct qed_dev *cdev; - qed_dl = devlink_priv(dl); cdev = qed_dl->cdev; cdev->iwarp_cmt = ctx->val.vbool; @@ -52,21 +47,19 @@ static const struct devlink_param qed_devlink_params[] = { static const struct devlink_ops qed_dl_ops; -int qed_devlink_register(struct qed_dev *cdev) +struct devlink *qed_devlink_register(struct qed_dev *cdev) { union devlink_param_value value; - struct qed_devlink *qed_dl; + struct qed_devlink *qdevlink; struct devlink *dl; int rc; - dl = devlink_alloc(&qed_dl_ops, sizeof(*qed_dl)); + dl = devlink_alloc(&qed_dl_ops, sizeof(struct qed_devlink)); if (!dl) - return -ENOMEM; + return ERR_PTR(-ENOMEM); - qed_dl = devlink_priv(dl); - - cdev->dl = dl; - qed_dl->cdev = cdev; + qdevlink = devlink_priv(dl); + qdevlink->cdev = cdev; rc = devlink_register(dl, &cdev->pdev->dev); if (rc) @@ -85,26 +78,25 @@ int qed_devlink_register(struct qed_dev *cdev) devlink_params_publish(dl); cdev->iwarp_cmt = false; - return 0; + return dl; err_unregister: devlink_unregister(dl); err_free: - cdev->dl = NULL; devlink_free(dl); - return rc; + return ERR_PTR(rc); } -void qed_devlink_unregister(struct qed_dev *cdev) +void qed_devlink_unregister(struct devlink *devlink) { - if (!cdev->dl) + if (!devlink) return; - 
devlink_params_unregister(cdev->dl, qed_devlink_params, + devlink_params_unregister(devlink, qed_devlink_params, ARRAY_SIZE(qed_devlink_params)); - devlink_unregister(cdev->dl); - devlink_free(cdev->dl); + devlink_unregister(devlink); + devlink_free(devlink); } diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h b/drivers/net/ethernet/qlogic/qed/qed_devlink.h index b94c40e9b7c1..c79dc6bfa194 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.h +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h @@ -9,7 +9,7 @@ #include #include -int qed_devlink_register(struct qed_dev *cdev); -void qed_devlink_unregister(struct qed_dev *cdev); +struct devlink *qed_devlink_register(struct qed_dev *cdev); +void qed_devlink_unregister(struct devlink *devlink); #
[PATCH v3 net-next 08/11] qed*: make use of devlink recovery infrastructure
Remove forcible recovery trigger and put it as a normal devlink callback. This allows user to enable/disable it via devlink health set pci/:03:00.0 reporter fw_fatal auto_recover false Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed.h | 1 + drivers/net/ethernet/qlogic/qed/qed_devlink.c | 14 ++ drivers/net/ethernet/qlogic/qed/qed_main.c| 2 +- drivers/net/ethernet/qlogic/qede/qede_main.c | 10 -- 4 files changed, 16 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index ccd789eeda3e..f34b25a79449 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -981,6 +981,7 @@ void qed_bw_update(struct qed_hwfn *hwfn, struct qed_ptt *ptt); u32 qed_unzip_data(struct qed_hwfn *p_hwfn, u32 input_len, u8 *input_buf, u32 max_size, u8 *unzip_buf); +int qed_recovery_process(struct qed_dev *cdev); void qed_schedule_recovery_handler(struct qed_hwfn *p_hwfn); void qed_hw_error_occurred(struct qed_hwfn *p_hwfn, enum qed_hw_err_type err_type); diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index ffe776a4f99a..b25be68f959c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -31,8 +31,22 @@ int qed_report_fatal_error(struct devlink *devlink, enum qed_hw_err_type err_typ return 0; } +static int +qed_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter, + void *priv_ctx, + struct netlink_ext_ack *extack) +{ + struct qed_devlink *qdl = devlink_health_reporter_priv(reporter); + struct qed_dev *cdev = qdl->cdev; + + qed_recovery_process(cdev); + + return 0; +} + static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { .name = "fw_fatal", + .recover = qed_fw_fatal_reporter_recover, }; #define QED_REPORTER_FW_GRACEFUL_PERIOD 120 diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index a64d594f9294..db5d003770ba 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -2817,7 +2817,7 @@ static int qed_set_led(struct qed_dev *cdev, enum qed_led_mode mode) return status; } -static int qed_recovery_process(struct qed_dev *cdev) +int qed_recovery_process(struct qed_dev *cdev) { struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev); struct qed_ptt *p_ptt; diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index df437c3f1fc9..937d8e69ad39 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -2596,8 +2596,6 @@ static void qede_atomic_hw_err_handler(struct qede_dev *edev) static void qede_generic_hw_err_handler(struct qede_dev *edev) { - struct qed_dev *cdev = edev->cdev; - DP_NOTICE(edev, "Generic sleepable HW error handling started - err_flags 0x%lx\n", edev->err_flags); @@ -2605,14 +2603,6 @@ static void qede_generic_hw_err_handler(struct qede_dev *edev) if (edev->devlink) edev->ops->common->report_fatal_error(edev->devlink, edev->last_err_type); - /* Trigger a recovery process. -* This is placed in the sleep requiring section just to make -* sure it is the last one, and that all the other operations -* were completed. 
-*/ - if (test_bit(QEDE_ERR_IS_RECOVERABLE, &edev->err_flags)) - edev->ops->common->recovery_process(cdev); - clear_bit(QEDE_ERR_IS_HANDLED, &edev->err_flags); DP_NOTICE(edev, "Generic sleepable HW error handling is done\n"); -- 2.17.1
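For context on how the auto_recover knob above takes effect: devlink itself calls the reporter's .recover op when a health report arrives and auto-recovery is enabled, or later when the user runs "devlink health recover". A minimal sketch of that wiring with the qed specifics stripped out; struct my_dev and my_dev_reset() are invented names, not from the patch:

struct my_dev;				/* invented, stands in for the driver's device struct */
int my_dev_reset(struct my_dev *mdev);	/* invented recovery entry point */

static int my_fw_recover(struct devlink_health_reporter *reporter,
			 void *priv_ctx, struct netlink_ext_ack *extack)
{
	struct my_dev *mdev = devlink_health_reporter_priv(reporter);

	/* Reset/reinit the device; returning 0 moves the reporter back
	 * to the healthy state. */
	return my_dev_reset(mdev);
}

static const struct devlink_health_reporter_ops my_fw_fatal_ops = {
	.name	 = "fw_fatal",
	.recover = my_fw_recover,
};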
[PATCH v3 net-next 00/11] qed: introduce devlink health support
This is a followup implementation after series https://patchwork.ozlabs.org/project/netdev/cover/20200514095727.1361-1-irussk...@marvell.com/ This is an implementation of the devlink health infrastructure. With this we are now able to report HW errors to devlink, and it will take its own actions, depending on user configuration, to capture and store a dump at the moment the error occurs and to request the driver to recover the device. So far we do not differentiate between global device failures and failures specific to a single PCI function. This means that some errors specific to one physical function will affect the entire device. This is not yet fully designed and verified; we will follow up on it in the future. The solution was verified with artificially generated HW errors; existing tools could be used for dump analysis. v3: fix uninit var usage in patch 11 v2: fix #include issue from kbuild test robot. Igor Russkikh (11): qed: move out devlink logic into a new file qed/qede: make devlink survive recovery qed: swap param init and publish qed: fix kconfig help entries qed: implement devlink info request qed: health reporter init deinit seq qed: use devlink logic to report errors qed*: make use of devlink recovery infrastructure qed: implement devlink dump qed: align adjacent indent qede: make driver reliable on unload after failures drivers/net/ethernet/qlogic/Kconfig | 5 +- drivers/net/ethernet/qlogic/qed/Makefile | 1 + drivers/net/ethernet/qlogic/qed/qed.h | 3 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 10 + drivers/net/ethernet/qlogic/qed/qed_devlink.c | 256 ++ drivers/net/ethernet/qlogic/qed/qed_devlink.h | 20 ++ drivers/net/ethernet/qlogic/qed/qed_main.c| 116 +--- drivers/net/ethernet/qlogic/qede/qede.h | 2 + drivers/net/ethernet/qlogic/qede/qede_main.c | 39 ++- include/linux/qed/qed_if.h| 23 +- 10 files changed, 345 insertions(+), 130 deletions(-) create mode 100644 drivers/net/ethernet/qlogic/qed/qed_devlink.c create mode 100644 drivers/net/ethernet/qlogic/qed/qed_devlink.h -- 2.17.1
[PATCH v3 net-next 01/11] qed: move out devlink logic into a new file
We are extending devlink infrastructure, thus move the existing stuff into a new file qed_devlink.c Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/Makefile | 1 + drivers/net/ethernet/qlogic/qed/qed_devlink.c | 110 ++ drivers/net/ethernet/qlogic/qed/qed_devlink.h | 15 +++ drivers/net/ethernet/qlogic/qed/qed_main.c| 102 +--- 4 files changed, 127 insertions(+), 101 deletions(-) create mode 100644 drivers/net/ethernet/qlogic/qed/qed_devlink.c create mode 100644 drivers/net/ethernet/qlogic/qed/qed_devlink.h diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile index f947b105cf14..8251755ec18c 100644 --- a/drivers/net/ethernet/qlogic/qed/Makefile +++ b/drivers/net/ethernet/qlogic/qed/Makefile @@ -9,6 +9,7 @@ qed-y :=\ qed_dcbx.o \ qed_debug.o \ qed_dev.o \ + qed_devlink.o \ qed_hw.o\ qed_init_fw_funcs.o \ qed_init_ops.o \ diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c new file mode 100644 index ..eb693787c99e --- /dev/null +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -0,0 +1,110 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Marvell/Qlogic FastLinQ NIC driver + * + * Copyright (C) 2020 Marvell International Ltd. + */ + +#include +#include "qed.h" +#include "qed_devlink.h" + +enum qed_devlink_param_id { + QED_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX, + QED_DEVLINK_PARAM_ID_IWARP_CMT, +}; + +struct qed_devlink { + struct qed_dev *cdev; +}; + +static int qed_dl_param_get(struct devlink *dl, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct qed_devlink *qed_dl; + struct qed_dev *cdev; + + qed_dl = devlink_priv(dl); + cdev = qed_dl->cdev; + ctx->val.vbool = cdev->iwarp_cmt; + + return 0; +} + +static int qed_dl_param_set(struct devlink *dl, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct qed_devlink *qed_dl; + struct qed_dev *cdev; + + qed_dl = devlink_priv(dl); + cdev = qed_dl->cdev; + cdev->iwarp_cmt = ctx->val.vbool; + + return 0; +} + +static const struct devlink_param qed_devlink_params[] = { + DEVLINK_PARAM_DRIVER(QED_DEVLINK_PARAM_ID_IWARP_CMT, +"iwarp_cmt", DEVLINK_PARAM_TYPE_BOOL, +BIT(DEVLINK_PARAM_CMODE_RUNTIME), +qed_dl_param_get, qed_dl_param_set, NULL), +}; + +static const struct devlink_ops qed_dl_ops; + +int qed_devlink_register(struct qed_dev *cdev) +{ + union devlink_param_value value; + struct qed_devlink *qed_dl; + struct devlink *dl; + int rc; + + dl = devlink_alloc(&qed_dl_ops, sizeof(*qed_dl)); + if (!dl) + return -ENOMEM; + + qed_dl = devlink_priv(dl); + + cdev->dl = dl; + qed_dl->cdev = cdev; + + rc = devlink_register(dl, &cdev->pdev->dev); + if (rc) + goto err_free; + + rc = devlink_params_register(dl, qed_devlink_params, +ARRAY_SIZE(qed_devlink_params)); + if (rc) + goto err_unregister; + + value.vbool = false; + devlink_param_driverinit_value_set(dl, + QED_DEVLINK_PARAM_ID_IWARP_CMT, + value); + + devlink_params_publish(dl); + cdev->iwarp_cmt = false; + + return 0; + +err_unregister: + devlink_unregister(dl); + +err_free: + cdev->dl = NULL; + devlink_free(dl); + + return rc; +} + +void qed_devlink_unregister(struct qed_dev *cdev) +{ + if (!cdev->dl) + return; + + devlink_params_unregister(cdev->dl, qed_devlink_params, + ARRAY_SIZE(qed_devlink_params)); + + devlink_unregister(cdev->dl); + devlink_free(cdev->dl); +} diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h b/drivers/net/ethernet/qlogic/qed/qed_devlink.h new file mode 100644 index 
..b94c40e9b7c1 --- /dev/null +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Marvell/Qlogic FastLinQ NIC driver + * + * Copyright (C) 2020 Marvell International Ltd. + */ +#ifndef _QED_DEVLINK_H +#define _QED_DEVLINK_H + +#include +#include + +int qed_devlink_register(struct qed_dev *cdev); +void qed_devlink_unregister(struct qed_dev *cdev); + +#endif diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 2558cb680db3..8751355d9ef7 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_m
[PATCH v3 net-next 04/11] qed: fix kconfig help entries
This patch replaces stubs in kconfig help entries with an actual description. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/Kconfig | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig index 8f743d80760b..4366c7a8de95 100644 --- a/drivers/net/ethernet/qlogic/Kconfig +++ b/drivers/net/ethernet/qlogic/Kconfig @@ -80,7 +80,7 @@ config QED select CRC8 select NET_DEVLINK help - This enables the support for ... + This enables the support for Marvell FastLinQ adapters family. config QED_LL2 bool @@ -100,7 +100,8 @@ config QEDE depends on QED imply PTP_1588_CLOCK help - This enables the support for ... + This enables the support for Marvell FastLinQ adapters family, + ethernet driver. config QED_RDMA bool -- 2.17.1
[PATCH v3 net-next 05/11] qed: implement devlink info request
Here we return existing fw & mfw versions, we also fetch device's serial number. The base device specific structure (qed_dev_info) was not directly available to the base driver before. Thus, here we create and store a private copy of this structure in qed_dev root object. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed.h | 1 + drivers/net/ethernet/qlogic/qed/qed_dev.c | 10 drivers/net/ethernet/qlogic/qed/qed_devlink.c | 52 ++- drivers/net/ethernet/qlogic/qed/qed_main.c| 1 + 4 files changed, 63 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index b6ce1488abcc..ccd789eeda3e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -807,6 +807,7 @@ struct qed_dev { struct qed_llh_info *p_llh_info; /* Linux specific here */ + struct qed_dev_info common_dev_info; struct qede_dev*edev; struct pci_dev *pdev; u32 flags; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index b3c9ebaf2280..377950ce8ea2 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -4290,6 +4290,16 @@ static int qed_hw_get_nvm_info(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) __set_bit(QED_DEV_CAP_ROCE, &p_hwfn->hw_info.device_capabilities); + /* Read device serial number information from shmem */ + addr = MCP_REG_SCRATCH + nvm_cfg1_offset + + offsetof(struct nvm_cfg1, glob) + + offsetof(struct nvm_cfg1_glob, serial_number); + + p_hwfn->hw_info.part_num[0] = qed_rd(p_hwfn, p_ptt, addr); + p_hwfn->hw_info.part_num[1] = qed_rd(p_hwfn, p_ptt, addr + 4); + p_hwfn->hw_info.part_num[2] = qed_rd(p_hwfn, p_ptt, addr + 8); + p_hwfn->hw_info.part_num[3] = qed_rd(p_hwfn, p_ptt, addr + 12); + return qed_mcp_fill_shmem_func_info(p_hwfn, p_ptt); } diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index 4e3316c6beb6..5bd5528dc409 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -45,7 +45,57 @@ static const struct devlink_param qed_devlink_params[] = { qed_dl_param_get, qed_dl_param_set, NULL), }; -static const struct devlink_ops qed_dl_ops; +static int qed_devlink_info_get(struct devlink *devlink, + struct devlink_info_req *req, + struct netlink_ext_ack *extack) +{ + struct qed_devlink *qed_dl = devlink_priv(devlink); + struct qed_dev *cdev = qed_dl->cdev; + struct qed_dev_info *dev_info; + char buf[100]; + int err; + + dev_info = &cdev->common_dev_info; + + err = devlink_info_driver_name_put(req, KBUILD_MODNAME); + if (err) + return err; + + memcpy(buf, cdev->hwfns[0].hw_info.part_num, sizeof(cdev->hwfns[0].hw_info.part_num)); + buf[sizeof(cdev->hwfns[0].hw_info.part_num)] = 0; + + if (buf[0]) { + err = devlink_info_serial_number_put(req, buf); + if (err) + return err; + } + + snprintf(buf, sizeof(buf), "%d.%d.%d.%d", +GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_3), +GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_2), +GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_1), +GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_0)); + + err = devlink_info_version_stored_put(req, + DEVLINK_INFO_VERSION_GENERIC_FW_MGMT, buf); + if (err) + return err; + + snprintf(buf, sizeof(buf), "%d.%d.%d.%d", +dev_info->fw_major, +dev_info->fw_minor, +dev_info->fw_rev, +dev_info->fw_eng); + + err = devlink_info_version_running_put(req, + 
DEVLINK_INFO_VERSION_GENERIC_FW, buf); + + return err; +} + +static const struct devlink_ops qed_dl_ops = { + .info_get = qed_devlink_info_get, +}; struct devlink *qed_devlink_register(struct qed_dev *cdev) { diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index d6f76421379b..d1a559ccf516 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -479,6 +479,7 @@ int qed_fill_dev_info(struct qed_dev *cdev, } dev_info->mtu = hw_info->mtu; + cdev->common_dev_info = *dev_info; return 0; } -- 2.17.1
[PATCH v3 net-next 07/11] qed: use devlink logic to report errors
Use devlink_health_report to push error indications. We implement this in qede via callback function to make it possible to reuse the same for other drivers sitting on top of qed in future. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed_devlink.c | 17 + drivers/net/ethernet/qlogic/qed/qed_devlink.h | 2 ++ drivers/net/ethernet/qlogic/qed/qed_main.c| 1 + drivers/net/ethernet/qlogic/qede/qede.h | 1 + drivers/net/ethernet/qlogic/qede/qede_main.c | 5 - include/linux/qed/qed_if.h| 3 +++ 6 files changed, 28 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index 843a35f14cca..ffe776a4f99a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -14,6 +14,23 @@ enum qed_devlink_param_id { QED_DEVLINK_PARAM_ID_IWARP_CMT, }; +struct qed_fw_fatal_ctx { + enum qed_hw_err_type err_type; +}; + +int qed_report_fatal_error(struct devlink *devlink, enum qed_hw_err_type err_type) +{ + struct qed_devlink *qdl = devlink_priv(devlink); + struct qed_fw_fatal_ctx fw_fatal_ctx = { + .err_type = err_type, + }; + + devlink_health_report(qdl->fw_reporter, + "Fatal error reported", &fw_fatal_ctx); + + return 0; +} + static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { .name = "fw_fatal", }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h b/drivers/net/ethernet/qlogic/qed/qed_devlink.h index c68ecf778826..ccc7d1d1bfd4 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.h +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h @@ -15,4 +15,6 @@ void qed_devlink_unregister(struct devlink *devlink); void qed_fw_reporters_create(struct devlink *devlink); void qed_fw_reporters_destroy(struct devlink *devlink); +int qed_report_fatal_error(struct devlink *dl, enum qed_hw_err_type err_type); + #endif diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index d1a559ccf516..a64d594f9294 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -3007,6 +3007,7 @@ const struct qed_common_ops qed_common_ops_pass = { .update_msglvl = &qed_init_dp, .devlink_register = qed_devlink_register, .devlink_unregister = qed_devlink_unregister, + .report_fatal_error = qed_report_fatal_error, .dbg_all_data = &qed_dbg_all_data, .dbg_all_data_size = &qed_dbg_all_data_size, .chain_alloc = &qed_chain_alloc, diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index 1f0e7505a973..3efc5899f656 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -264,6 +264,7 @@ struct qede_dev { struct bpf_prog *xdp_prog; + enum qed_hw_err_typelast_err_type; unsigned long err_flags; #define QEDE_ERR_IS_HANDLED31 #define QEDE_ERR_ATTN_CLR_EN 0 diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 7c2d948b2035..df437c3f1fc9 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -1181,7 +1181,6 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level, } } else { struct net_device *ndev = pci_get_drvdata(pdev); - edev = netdev_priv(ndev); if (edev && edev->devlink) { @@ -2603,6 +2602,9 @@ static void qede_generic_hw_err_handler(struct qede_dev *edev) "Generic sleepable HW error handling 
started - err_flags 0x%lx\n", edev->err_flags); + if (edev->devlink) + edev->ops->common->report_fatal_error(edev->devlink, edev->last_err_type); + /* Trigger a recovery process. * This is placed in the sleep requiring section just to make * sure it is the last one, and that all the other operations @@ -2663,6 +2665,7 @@ static void qede_schedule_hw_err_handler(void *dev, return; } + edev->last_err_type = err_type; qede_set_hw_err_flags(edev, err_type); qede_atomic_hw_err_handler(edev); set_bit(QEDE_SP_HW_ERR, &edev->sp_flags); diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index 30fe06fe06a0..1297726f2b25 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -906,6 +906,9 @@ struct qed_common_ops { int (*dbg_all_data_size) (struct qed_dev *cdev); + int (*report_fatal_error)(struct devlink *devlink, +
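One detail worth calling out in the hunk above: the context struct handed to devlink_health_report() lives on the stack, which works because devlink consumes it synchronously, passing it straight into the reporter's dump/recover ops during the call and not keeping it afterwards. A stripped-down sketch of the same pattern with invented names (my_err_ctx, my_report()):

struct my_err_ctx {
	u32 err_type;
};

static void my_report(struct devlink_health_reporter *reporter, u32 err_type)
{
	struct my_err_ctx ctx = {
		.err_type = err_type,
	};

	/* &ctx is only dereferenced while devlink_health_report() runs,
	 * inside the reporter's dump/recover callbacks, so a stack
	 * variable is fine here. */
	devlink_health_report(reporter, "fatal error occurred", &ctx);
}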
[PATCH v3 net-next 06/11] qed: health reporter init deinit seq
Here we declare health reporter ops (empty for now) and register these in qed probe and remove callbacks. This way we get devlink attached to all kind of qed* PCI device entities: networking or storage offload entity. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed_devlink.c | 30 +++ drivers/net/ethernet/qlogic/qed/qed_devlink.h | 3 ++ include/linux/qed/qed_if.h| 1 + 3 files changed, 34 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index 5bd5528dc409..843a35f14cca 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -14,6 +14,34 @@ enum qed_devlink_param_id { QED_DEVLINK_PARAM_ID_IWARP_CMT, }; +static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { + .name = "fw_fatal", +}; + +#define QED_REPORTER_FW_GRACEFUL_PERIOD 120 + +void qed_fw_reporters_create(struct devlink *devlink) +{ + struct qed_devlink *dl = devlink_priv(devlink); + + dl->fw_reporter = devlink_health_reporter_create(devlink, &qed_fw_fatal_reporter_ops, + QED_REPORTER_FW_GRACEFUL_PERIOD, dl); + if (IS_ERR(dl->fw_reporter)) + DP_NOTICE(dl->cdev, "Failed to create fw reporter, err = %ld\n", + PTR_ERR(dl->fw_reporter)); +} + +void qed_fw_reporters_destroy(struct devlink *devlink) +{ + struct qed_devlink *dl = devlink_priv(devlink); + struct devlink_health_reporter *rep; + + rep = dl->fw_reporter; + + if (!IS_ERR_OR_NULL(rep)) + devlink_health_reporter_destroy(rep); +} + static int qed_dl_param_get(struct devlink *dl, u32 id, struct devlink_param_gset_ctx *ctx) { @@ -144,6 +172,8 @@ void qed_devlink_unregister(struct devlink *devlink) if (!devlink) return; + qed_fw_reporters_destroy(devlink); + devlink_params_unregister(devlink, qed_devlink_params, ARRAY_SIZE(qed_devlink_params)); diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h b/drivers/net/ethernet/qlogic/qed/qed_devlink.h index c79dc6bfa194..c68ecf778826 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.h +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h @@ -12,4 +12,7 @@ struct devlink *qed_devlink_register(struct qed_dev *cdev); void qed_devlink_unregister(struct devlink *devlink); +void qed_fw_reporters_create(struct devlink *devlink); +void qed_fw_reporters_destroy(struct devlink *devlink); + #endif diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index d8368e1770df..30fe06fe06a0 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -782,6 +782,7 @@ enum qed_nvm_flash_cmd { struct qed_devlink { struct qed_dev *cdev; + struct devlink_health_reporter *fw_reporter; }; struct qed_common_cb_ops { -- 2.17.1
[PATCH v3 net-next 09/11] qed: implement devlink dump
Gather and push out full device dump to devlink. Device dump is the same as with `ethtool -d`, but now its generated exactly at the moment bad thing happens. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qed/qed_devlink.c | 43 +++ 1 file changed, 43 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c index b25be68f959c..904babe70489 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c +++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c @@ -6,6 +6,7 @@ #include #include +#include #include "qed.h" #include "qed_devlink.h" @@ -31,6 +32,47 @@ int qed_report_fatal_error(struct devlink *devlink, enum qed_hw_err_type err_typ return 0; } +static int +qed_fw_fatal_reporter_dump(struct devlink_health_reporter *reporter, + struct devlink_fmsg *fmsg, void *priv_ctx, + struct netlink_ext_ack *extack) +{ + struct qed_devlink *qdl = devlink_health_reporter_priv(reporter); + struct qed_fw_fatal_ctx *fw_fatal_ctx = priv_ctx; + struct qed_dev *cdev = qdl->cdev; + u32 dbg_data_buf_size; + u8 *p_dbg_data_buf; + int err; + + /* Having context means that was a dump request after fatal, +* so we enable extra debugging while gathering the dump, +* just in case +*/ + cdev->print_dbg_data = fw_fatal_ctx ? true : false; + + dbg_data_buf_size = qed_dbg_all_data_size(cdev); + p_dbg_data_buf = vzalloc(dbg_data_buf_size); + if (!p_dbg_data_buf) { + DP_NOTICE(cdev, + "Failed to allocate memory for a debug data buffer\n"); + return -ENOMEM; + } + + err = qed_dbg_all_data(cdev, p_dbg_data_buf); + if (err) { + DP_NOTICE(cdev, "Failed to obtain debug data\n"); + vfree(p_dbg_data_buf); + return err; + } + + err = devlink_fmsg_binary_pair_put(fmsg, "dump_data", + p_dbg_data_buf, dbg_data_buf_size); + + vfree(p_dbg_data_buf); + + return err; +} + static int qed_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter, void *priv_ctx, @@ -47,6 +89,7 @@ qed_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter, static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { .name = "fw_fatal", .recover = qed_fw_fatal_reporter_recover, + .dump = qed_fw_fatal_reporter_dump, }; #define QED_REPORTER_FW_GRACEFUL_PERIOD 120 -- 2.17.1
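For completeness, the fmsg API used above can also carry structured key/value pairs next to the raw blob, which keeps small bits of metadata readable in the devlink health dump output without decoding the binary region. A sketch of what that could look like; the field names and the snapshot buffer are illustrative, not qed's:

static int my_fw_dump(struct devlink_health_reporter *reporter,
		      struct devlink_fmsg *fmsg, void *priv_ctx,
		      struct netlink_ext_ack *extack)
{
	u8 snapshot[64] = {};	/* stands in for the real register dump */
	int err;

	err = devlink_fmsg_u32_pair_put(fmsg, "snapshot_len", sizeof(snapshot));
	if (err)
		return err;

	return devlink_fmsg_binary_pair_put(fmsg, "snapshot", snapshot,
					    sizeof(snapshot));
}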
[PATCH v3 net-next 11/11] qede: make driver reliable on unload after failures
If recovery was not successful, the netdev should still be present, but we should clear cdev if something bad happens during recovery. We also check cdev for NULL on device close, which can be the case when recovery was not successful. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- drivers/net/ethernet/qlogic/qede/qede_main.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 937d8e69ad39..eeb4a7311633 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -1239,7 +1239,10 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level, err4: qede_rdma_dev_remove(edev, (mode == QEDE_PROBE_RECOVERY)); err3: - free_netdev(edev->ndev); + if (mode != QEDE_PROBE_RECOVERY) + free_netdev(edev->ndev); + else + edev->cdev = NULL; err2: qed_ops->common->slowpath_stop(cdev); err1: @@ -2474,7 +2477,8 @@ static int qede_close(struct net_device *ndev) qede_unload(edev, QEDE_UNLOAD_NORMAL, false); - edev->ops->common->update_drv_state(edev->cdev, false); + if (edev->cdev) + edev->ops->common->update_drv_state(edev->cdev, false); return 0; } -- 2.17.1
[PATCH v3 net-next 10/11] qed: align adjacent indent
Fix the indentation of some adjacent declarations. Signed-off-by: Igor Russkikh Signed-off-by: Alexander Lobakin Signed-off-by: Michal Kalderon --- include/linux/qed/qed_if.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index 1297726f2b25..b8fb80c9be80 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -897,14 +897,14 @@ struct qed_common_ops { void(*simd_handler_clean)(struct qed_dev *cdev, int index); - int (*dbg_grc)(struct qed_dev *cdev, - void *buffer, u32 *num_dumped_bytes); + int (*dbg_grc)(struct qed_dev *cdev, + void *buffer, u32 *num_dumped_bytes); - int (*dbg_grc_size)(struct qed_dev *cdev); + int (*dbg_grc_size)(struct qed_dev *cdev); - int (*dbg_all_data) (struct qed_dev *cdev, void *buffer); + int (*dbg_all_data)(struct qed_dev *cdev, void *buffer); - int (*dbg_all_data_size) (struct qed_dev *cdev); + int (*dbg_all_data_size)(struct qed_dev *cdev); int (*report_fatal_error)(struct devlink *devlink, enum qed_hw_err_type err_type); -- 2.17.1
Re: Bug: ip utility fails to show routes with large # of multipath next-hops
On Tue, Jul 28, 2020 at 05:52:44PM -0700, Ashutosh Grewal wrote: > Hello David and all, > > I hope this is the correct way to report a bug. Sure > > I observed this problem with 256 v4 next-hops or 128 v6 next-hops (or > 128 or so # of v4 next-hops with labels). > > Here is an example - > > root@a6be8c892bb7:/# ip route show 2.2.2.2 > Error: Buffer too small for object. > Dump terminated > > Kernel details (though I recall running into the same problem on 4.4* > kernel as well) - > root@ubuntu-vm:/# uname -a > Linux ch1 5.4.0-33-generic #37-Ubuntu SMP Thu May 21 12:53:59 UTC 2020 > x86_64 x86_64 x86_64 GNU/Linux > > I think the problem may be to do with the size of the skbuf being > allocated as part of servicing the netlink request. > > static int netlink_dump(struct sock *sk) > { > > > skb = alloc_skb(...) Yes, I believe you are correct. You will get an skb of size 4K and it can't fit the entire RTA_MULTIPATH attribute with all the nested nexthops. Since it's a single attribute it cannot be split across multiple messages. Looking at the code, I think a similar problem was already encountered with IFLA_VFINFO_LIST. See commit c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo dump size"). Maybe we can track the maximum number of IPv4/IPv6 nexthops during insertion and then consult it to adjust 'min_dump_alloc' for RTM_GETROUTE. It's a bit complicated for IPv6 because you can append nexthops, but I believe anyone using so many nexthops is already using RTA_MULTIPATH to insert them, so we can simplify. David, what do you think? You have a better / simpler idea? Maybe one day everyone will be using the new nexthop API and this won't be needed :) > > Thanks, > Ashutosh
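To make the idea above concrete, here is a very rough sketch of what tracking the maximum multipath nexthop count and feeding it into the RTM_GETROUTE dump allocation could look like. Everything here is hypothetical: the names, the global (a real version would be per netns), and the per-nexthop size estimate, which ignores RTA_ENCAP/RTA_VIA and friends.

static unsigned int fib_max_mp_nhs;	/* largest nexthop count ever inserted */

static void fib_note_mp_nhs(unsigned int nhs)
{
	if (nhs > READ_ONCE(fib_max_mp_nhs))
		WRITE_ONCE(fib_max_mp_nhs, nhs);
}

static u32 rtm_getroute_min_dump_alloc(void)
{
	/* roughly one struct rtnexthop plus a gateway attribute per path,
	 * on top of the usual fixed route attributes */
	return NLMSG_GOODSIZE +
	       READ_ONCE(fib_max_mp_nhs) *
	       (RTA_ALIGN(sizeof(struct rtnexthop)) + RTA_SPACE(16));
}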
Re: [PATCH v8 bpf-next 08/13] bpf: Add BTF_SET_START/END macros
On Tue, Jul 28, 2020 at 12:39:06PM -0700, Andrii Nakryiko wrote: SNIP > > [...] > > > +#define BTF_SET_START(name)\ > > +__BTF_ID_LIST(name, local) \ > > +asm( \ > > +".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ > > +".local __BTF_ID__set__" #name "; \n" \ > > +"__BTF_ID__set__" #name ":;\n" \ > > +".zero 4 \n" \ > > +".popsection; \n"); > > + > > +#define BTF_SET_END(name) \ > > +asm( \ > > +".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ > > +".size __BTF_ID__set__" #name ", .-" #name " \n" \ > > +".popsection; \n");\ > > +extern struct btf_id_set name; > > + > > #else > > This local symbol assumption will probably at some point bite us. > Yonghong already did global vs static variants for BTF ID list, we'll > end up doing something like that for sets of BTF IDs as well. Let's do > this similarly from the get go. sure, will add that > > > > > #define BTF_ID_LIST(name) static u32 name[5]; > > #define BTF_ID(prefix, name) > > #define BTF_ID_UNUSED > > #define BTF_ID_LIST_GLOBAL(name) u32 name[1]; > > +#define BTF_SET_START(name) static struct btf_id_set name = { 0 }; > > nit: this zero is unnecessary and misleading (it's initialized for > only the first member of a struct). Just {} is enough. ok > > > +#define BTF_SET_END(name) > > > > #endif /* CONFIG_DEBUG_INFO_BTF */ > > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c > > index 562d4453fad3..06714cdda0a9 100644 > > --- a/kernel/bpf/btf.c > > +++ b/kernel/bpf/btf.c > > @@ -21,6 +21,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > #include > > > > /* BTF (BPF Type Format) is the meta data format which describes > > @@ -4740,3 +4742,15 @@ u32 btf_id(const struct btf *btf) > > { > > return btf->id; > > } > > + > > +static int btf_id_cmp_func(const void *a, const void *b) > > +{ > > + const int *pa = a, *pb = b; > > + > > + return *pa - *pb; > > +} > > + > > +bool btf_id_set_contains(struct btf_id_set *set, u32 id) > > +{ > > + return bsearch(&id, set->ids, set->cnt, sizeof(int), > > btf_id_cmp_func) != NULL; > > very nit ;) sizeof(__u32) sure ;-) thanks, jirka
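For readers wondering how these sets end up being consumed: the resolve_btfids tool is expected to fill and sort the u32 array in the .BTF_ids section at link time, which is why a plain bsearch() in btf_id_set_contains() is enough. A usage sketch; the function placed in the set is an arbitrary example, not something this series adds:

BTF_SET_START(my_allowed_funcs)
BTF_ID(func, tcp_close)
BTF_SET_END(my_allowed_funcs)

static bool my_func_allowed(u32 func_btf_id)
{
	return btf_id_set_contains(&my_allowed_funcs, func_btf_id);
}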
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On 29/07/2020 00:06, Petr Machata wrote: Kurt Kanzenbach writes: On Mon Jul 27 2020, Petr Machata wrote: So this looks good, and works, but I'm wondering about one thing. Thanks for testing. Your code (and evidently most drivers as well) use a different check than mlxsw, namely skb->len + ETH_HLEN < X. When I print_hex_dump() skb_mac_header(skb), skb->len in mlxsw with some test packet, I get e.g. this: 259a4db7: 01 00 5e 00 01 81 00 02 c9 a4 e4 e1 08 00 45 00 ..^...E. 5f29f0eb: 00 48 0d c9 40 00 01 11 c8 59 c0 00 02 01 e0 00 .H..@Y.. f3663e9e: 01 81 01 3f 01 3f 00 34 9f d3 00 02 00 2c 00 00 ...?.?.4.,.. ^sp^^ ^dp^^ ^len^ ^cks^ ^len^ b3914606: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 2e7828ea: c9 ff fe a4 e4 e1 00 01 09 fa 00 00 00 00 00 00 0b98156e: 00 00 00 00 00 00.. Both UDP and PTP length fields indicate that the payload ends exactly at the end of the dump. So apparently skb->len contains all the payload bytes, including the Ethernet header. Is that the case for other drivers as well? Maybe mlxsw is just missing some SKB magic in the driver. So I run some tests (on other hardware/drivers) and it seems like that the skb->len usually doesn't include the ETH_HLEN. Therefore, it is added to the check. Looking at the driver code: |static void mlxsw_sp_rx_sample_listener(struct sk_buff *skb, u8 local_port, | void *trap_ctx) |{ | [...] | /* The sample handler expects skb->data to point to the start of the |* Ethernet header. |*/ | skb_push(skb, ETH_HLEN); | mlxsw_sp_sample_receive(mlxsw_sp, skb, local_port); |} Maybe that's the issue here? Correct, mlxsw pushes the header very soon. Given that both ptp_classify_raw() and eth_type_trans() that are invoked later assume the header, it is reasonable. I have shuffled the pushes around and have a patch that both works and I think is correct. However, I find it odd that ptp_classify_raw() assumes that ->data points at Ethernet, while ptp_parse_header() makes the contrary assumption that ->len does not cover Ethernet. Those functions are likely to be used just a couple calls away from each other, if not outright next to each other. I suspect that ti/cpts.c and ti/am65-cpts.c (patches 5 and 6) actually hit an issue in this. ptp_classify_raw() is called without a surrounding _push / _pull (unlike DSA), which would imply skb->data points at Ethernet header, and indeed, the way the "data" variable is used confirms it. Both drivers, in all cases, will get ->data points at Ethernet header and ->len covers ETH_HLEN So, yes below check is incorrect, in general, and will be false always if other calculation are correct (only IPV4_HLEN(data + offset) can cause issues). if (skb->len + ETH_HLEN < offset + OFF_PTP_SEQUENCE_ID + sizeof(*seqid)) return 0; it might be good to update ptp_parse_header() documentation with expected state of skb. (At the same time the code adds ETH_HLEN to skb->len, but maybe it is just a cut'n'paste.) But then ptp_parse_header() is called, and that makes the assumption that skb->len does not cover the Ethernet header. I was also wondering about something else in that driver driver: The parsing code allows for ptp v1, but the message type was always fetched from offset 0 in the header. Is that indented? Yeah, I noticed that as well. That was a bug in the mlxsw code. Good riddance :) -- Best regards, grygorii
Re: [PATCH v1] farsync: use generic power management
On Wed, Jul 29, 2020 at 03:47:30PM +0530, Vaibhav Gupta wrote: > On Tue, Jul 28, 2020 at 03:04:13PM -0500, Bjorn Helgaas wrote: > > On Tue, Jul 28, 2020 at 09:58:10AM +0530, Vaibhav Gupta wrote: > > > The .suspend() and .resume() callbacks are not defined for this driver. > > > Still, their power management structure follows the legacy framework. To > > > bring it under the generic framework, simply remove the binding of > > > callbacks from "struct pci_driver". > > > > FWIW, this commit log is slightly misleading because .suspend and > > .resume are NULL by default, so this patch actually is a complete > > no-op as far as code generation is concerned. > > > > This change is worthwhile because it simplifies the code a little, but > > it doesn't convert the driver from legacy to generic power management. > > This driver doesn't supply a .pm structure, so it doesn't seem to do > > *any* power management. > > Agreed. Actually, as their presence only causes PCI core to call > pci_legacy_suspend/resume() for them, I thought that after removing > the binding from "struct pci_driver", this driver qualifies to be > grouped under genric framework, so used "use generic power > management" for the heading. > > I should have written "remove legacy bindning". This removed the *mention* of fst_driver.suspend and fst_driver.resume, which is important because we want to eventually remove those members completely from struct pci_driver. But fst_driver.suspend and fst_driver.resume *exist* before and after this patch, and they're initialized to zero before and after this patch. Since they were zero before, and they're still zero after this patch, the PCI core doesn't call pci_legacy_suspend/resume(). This patch doesn't change that at all. > But David has applied the patch, should I send a v2 or fix to update > message? No, I don't think David updates patches after he's applied them. But if the situation comes up again, you'll know how to describe it :) Bjorn
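In code terms, the difference between the two frameworks discussed above: legacy PM fills pci_driver.suspend/.resume (which take a struct pci_dev), while generic PM supplies a struct dev_pm_ops through driver.pm and leaves those members unset. A sketch of what a real conversion of this driver could look like if it ever grows PM support; the callback bodies are empty placeholders and the probe/remove/id-table names are assumed from context, not verified:

static int __maybe_unused fst_suspend(struct device *dev)
{
	/* quiesce the card, stop DMA */
	return 0;
}

static int __maybe_unused fst_resume(struct device *dev)
{
	/* reprogram the card, restart DMA */
	return 0;
}

static SIMPLE_DEV_PM_OPS(fst_pm_ops, fst_suspend, fst_resume);

static struct pci_driver fst_driver = {
	.name		= "farsync",
	.id_table	= fst_pci_dev_id,
	.probe		= fst_add_one,
	.remove		= fst_remove_one,
	.driver.pm	= &fst_pm_ops,	/* generic PM: .suspend/.resume stay unset */
};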
[PATCH] net/mlx5e: fix bpf_prog refcnt leaks in mlx5e_alloc_rq
The function invokes bpf_prog_inc(), which increases the refcount of a bpf_prog object "rq->xdp_prog" if the object isn't NULL. The refcount leak issues take place in two error handling paths. When mlx5_wq_ll_create() or mlx5_wq_cyc_create() fails, the function simply returns the error code and forgets to drop the refcount increased earlier, causing a refcount leak of "rq->xdp_prog". Fix this issue by jumping to the error handling path err_rq_wq_destroy when either function fails. Signed-off-by: Xin Xiong Signed-off-by: Xiyu Yang Signed-off-by: Xin Tan --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a836a02a2116..8e1b1ab416d8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -419,7 +419,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, err = mlx5_wq_ll_create(mdev, &rqp->wq, rqc_wq, &rq->mpwqe.wq, &rq->wq_ctrl); if (err) - return err; + goto err_rq_wq_destroy; rq->mpwqe.wq.db = &rq->mpwqe.wq.db[MLX5_RCV_DBR]; @@ -470,7 +470,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, err = mlx5_wq_cyc_create(mdev, &rqp->wq, rqc_wq, &rq->wqe.wq, &rq->wq_ctrl); if (err) - return err; + goto err_rq_wq_destroy; rq->wqe.wq.db = &rq->wqe.wq.db[MLX5_RCV_DBR]; -- 2.25.1
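The general shape of the fix, for anyone applying the same pattern elsewhere (this is not the mlx5e code itself; setup_wq() is an invented stand-in for mlx5_wq_ll_create()/mlx5_wq_cyc_create()): once the reference is taken, every later failure has to funnel through an unwind label that drops it.

int setup_wq(void);	/* invented placeholder for the fallible step */

static int alloc_rq_like(struct bpf_prog *xdp_prog)
{
	int err;

	if (xdp_prog)
		bpf_prog_inc(xdp_prog);		/* reference taken up front */

	err = setup_wq();			/* any later failure... */
	if (err)
		goto err_put_prog;		/* ...must not just "return err" */

	return 0;

err_put_prog:
	if (xdp_prog)
		bpf_prog_put(xdp_prog);
	return err;
}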
Re: [PATCH v3 net-next 02/11] qed/qede: make devlink survive recovery
Wed, Jul 29, 2020 at 01:38:37PM CEST, irussk...@marvell.com wrote: >Before that, devlink instance lifecycle was linked to qed_dev object, Before what? >that causes devlink to be recreated on each recovery. > >Changing it by making higher level driver (qede) responsible for its >life. This way devlink will survive recoveries. > >qede will store devlink structure pointer as a part of its device >object, devlink private data contains a linkage structure, it'll >contain extra devlink related content in following patches. > >The same lifecycle should be applied to storage drivers (qedf/qedi) later. >From this, one can't really tell if you are talking about existing state or about the matter of this patch. In the patch description, you should be imperative to the codebase telling it what to do. Could you please do that? > >Signed-off-by: Igor Russkikh >Signed-off-by: Alexander Lobakin >Signed-off-by: Michal Kalderon >--- > drivers/net/ethernet/qlogic/qed/qed.h | 1 - > drivers/net/ethernet/qlogic/qed/qed_devlink.c | 40 --- > drivers/net/ethernet/qlogic/qed/qed_devlink.h | 4 +- > drivers/net/ethernet/qlogic/qed/qed_main.c| 10 + > drivers/net/ethernet/qlogic/qede/qede.h | 1 + > drivers/net/ethernet/qlogic/qede/qede_main.c | 18 + > include/linux/qed/qed_if.h| 9 + > 7 files changed, 48 insertions(+), 35 deletions(-) > >diff --git a/drivers/net/ethernet/qlogic/qed/qed.h >b/drivers/net/ethernet/qlogic/qed/qed.h >index b2a7b53ee760..b6ce1488abcc 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed.h >+++ b/drivers/net/ethernet/qlogic/qed/qed.h >@@ -849,7 +849,6 @@ struct qed_dev { > u32 rdma_max_srq_sge; > u16 tunn_feature_mask; > >- struct devlink *dl; > booliwarp_cmt; > }; > >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >index eb693787c99e..a62c47c61edf 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >@@ -5,6 +5,7 @@ > */ > > #include >+#include > #include "qed.h" > #include "qed_devlink.h" > >@@ -13,17 +14,12 @@ enum qed_devlink_param_id { > QED_DEVLINK_PARAM_ID_IWARP_CMT, > }; > >-struct qed_devlink { >- struct qed_dev *cdev; >-}; >- > static int qed_dl_param_get(struct devlink *dl, u32 id, > struct devlink_param_gset_ctx *ctx) > { >- struct qed_devlink *qed_dl; >+ struct qed_devlink *qed_dl = devlink_priv(dl); > struct qed_dev *cdev; > >- qed_dl = devlink_priv(dl); > cdev = qed_dl->cdev; > ctx->val.vbool = cdev->iwarp_cmt; > >@@ -33,10 +29,9 @@ static int qed_dl_param_get(struct devlink *dl, u32 id, > static int qed_dl_param_set(struct devlink *dl, u32 id, > struct devlink_param_gset_ctx *ctx) > { >- struct qed_devlink *qed_dl; >+ struct qed_devlink *qed_dl = devlink_priv(dl); > struct qed_dev *cdev; > >- qed_dl = devlink_priv(dl); > cdev = qed_dl->cdev; > cdev->iwarp_cmt = ctx->val.vbool; > >@@ -52,21 +47,19 @@ static const struct devlink_param qed_devlink_params[] = { > > static const struct devlink_ops qed_dl_ops; > >-int qed_devlink_register(struct qed_dev *cdev) >+struct devlink *qed_devlink_register(struct qed_dev *cdev) > { > union devlink_param_value value; >- struct qed_devlink *qed_dl; >+ struct qed_devlink *qdevlink; > struct devlink *dl; > int rc; > >- dl = devlink_alloc(&qed_dl_ops, sizeof(*qed_dl)); >+ dl = devlink_alloc(&qed_dl_ops, sizeof(struct qed_devlink)); Do "sizeof(*qdevlink)" instead. 
> if (!dl) >- return -ENOMEM; >+ return ERR_PTR(-ENOMEM); > >- qed_dl = devlink_priv(dl); >- >- cdev->dl = dl; >- qed_dl->cdev = cdev; >+ qdevlink = devlink_priv(dl); >+ qdevlink->cdev = cdev; > > rc = devlink_register(dl, &cdev->pdev->dev); > if (rc) >@@ -85,26 +78,25 @@ int qed_devlink_register(struct qed_dev *cdev) > devlink_params_publish(dl); > cdev->iwarp_cmt = false; > >- return 0; >+ return dl; > > err_unregister: > devlink_unregister(dl); > > err_free: >- cdev->dl = NULL; > devlink_free(dl); > >- return rc; >+ return ERR_PTR(rc); > } > >-void qed_devlink_unregister(struct qed_dev *cdev) >+void qed_devlink_unregister(struct devlink *devlink) > { >- if (!cdev->dl) >+ if (!devlink) > return; > >- devlink_params_unregister(cdev->dl, qed_devlink_params, >+ devlink_params_unregister(devlink, qed_devlink_params, > ARRAY_SIZE(qed_devlink_params)); > >- devlink_unregister(cdev->dl); >- devlink_free(cdev->dl); >+ devlink_unregister(devlink); >+ devlink_free(devlink); > } >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h >b/drivers/net/eth
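For reference, the allocation line written with the sizeof(*qdevlink) form Jiri asks for above; it keeps the allocated size tied to the variable's type, so the two cannot drift apart if the struct is ever renamed:

	dl = devlink_alloc(&qed_dl_ops, sizeof(*qdevlink));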
Re: [PATCH v3 net-next 03/11] qed: swap param init and publish
Wed, Jul 29, 2020 at 01:38:38PM CEST, irussk...@marvell.com wrote: >In theory that could lead to race condition Describe the problem, tell the codebase what to do. Plus, this looks like -net material. Please consider pushing this separately with a proper "Fixes" tag. > >Signed-off-by: Igor Russkikh >Signed-off-by: Alexander Lobakin >Signed-off-by: Michal Kalderon >--- > drivers/net/ethernet/qlogic/qed/qed_devlink.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >index a62c47c61edf..4e3316c6beb6 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >@@ -75,8 +75,8 @@ struct devlink *qed_devlink_register(struct qed_dev *cdev) > QED_DEVLINK_PARAM_ID_IWARP_CMT, > value); > >- devlink_params_publish(dl); > cdev->iwarp_cmt = false; >+ devlink_params_publish(dl); > > return dl; > >-- >2.17.1 >
Re: [PATCH v3 net-next 05/11] qed: implement devlink info request
Wed, Jul 29, 2020 at 01:38:40PM CEST, irussk...@marvell.com wrote: >Here we return existing fw & mfw versions, we also fetch device's >serial number. > >The base device specific structure (qed_dev_info) was not directly >available to the base driver before. >Thus, here we create and store a private copy of this structure >in qed_dev root object. > >Signed-off-by: Igor Russkikh >Signed-off-by: Alexander Lobakin >Signed-off-by: Michal Kalderon >--- > drivers/net/ethernet/qlogic/qed/qed.h | 1 + > drivers/net/ethernet/qlogic/qed/qed_dev.c | 10 > drivers/net/ethernet/qlogic/qed/qed_devlink.c | 52 ++- > drivers/net/ethernet/qlogic/qed/qed_main.c| 1 + > 4 files changed, 63 insertions(+), 1 deletion(-) > >diff --git a/drivers/net/ethernet/qlogic/qed/qed.h >b/drivers/net/ethernet/qlogic/qed/qed.h >index b6ce1488abcc..ccd789eeda3e 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed.h >+++ b/drivers/net/ethernet/qlogic/qed/qed.h >@@ -807,6 +807,7 @@ struct qed_dev { > struct qed_llh_info *p_llh_info; > > /* Linux specific here */ >+ struct qed_dev_info common_dev_info; > struct qede_dev*edev; > struct pci_dev *pdev; > u32 flags; >diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c >b/drivers/net/ethernet/qlogic/qed/qed_dev.c >index b3c9ebaf2280..377950ce8ea2 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c >@@ -4290,6 +4290,16 @@ static int qed_hw_get_nvm_info(struct qed_hwfn *p_hwfn, >struct qed_ptt *p_ptt) > __set_bit(QED_DEV_CAP_ROCE, > &p_hwfn->hw_info.device_capabilities); > >+ /* Read device serial number information from shmem */ >+ addr = MCP_REG_SCRATCH + nvm_cfg1_offset + >+ offsetof(struct nvm_cfg1, glob) + >+ offsetof(struct nvm_cfg1_glob, serial_number); >+ >+ p_hwfn->hw_info.part_num[0] = qed_rd(p_hwfn, p_ptt, addr); >+ p_hwfn->hw_info.part_num[1] = qed_rd(p_hwfn, p_ptt, addr + 4); >+ p_hwfn->hw_info.part_num[2] = qed_rd(p_hwfn, p_ptt, addr + 8); >+ p_hwfn->hw_info.part_num[3] = qed_rd(p_hwfn, p_ptt, addr + 12); for() ? 
>+ > return qed_mcp_fill_shmem_func_info(p_hwfn, p_ptt); > } > >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >index 4e3316c6beb6..5bd5528dc409 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >@@ -45,7 +45,57 @@ static const struct devlink_param qed_devlink_params[] = { >qed_dl_param_get, qed_dl_param_set, NULL), > }; > >-static const struct devlink_ops qed_dl_ops; >+static int qed_devlink_info_get(struct devlink *devlink, >+ struct devlink_info_req *req, >+ struct netlink_ext_ack *extack) >+{ >+ struct qed_devlink *qed_dl = devlink_priv(devlink); >+ struct qed_dev *cdev = qed_dl->cdev; >+ struct qed_dev_info *dev_info; >+ char buf[100]; >+ int err; >+ >+ dev_info = &cdev->common_dev_info; >+ >+ err = devlink_info_driver_name_put(req, KBUILD_MODNAME); >+ if (err) >+ return err; >+ >+ memcpy(buf, cdev->hwfns[0].hw_info.part_num, >sizeof(cdev->hwfns[0].hw_info.part_num)); >+ buf[sizeof(cdev->hwfns[0].hw_info.part_num)] = 0; >+ >+ if (buf[0]) { >+ err = devlink_info_serial_number_put(req, buf); >+ if (err) >+ return err; >+ } >+ >+ snprintf(buf, sizeof(buf), "%d.%d.%d.%d", >+ GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_3), >+ GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_2), >+ GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_1), >+ GET_MFW_FIELD(dev_info->mfw_rev, QED_MFW_VERSION_0)); >+ >+ err = devlink_info_version_stored_put(req, >+ >DEVLINK_INFO_VERSION_GENERIC_FW_MGMT, buf); >+ if (err) >+ return err; >+ >+ snprintf(buf, sizeof(buf), "%d.%d.%d.%d", >+ dev_info->fw_major, >+ dev_info->fw_minor, >+ dev_info->fw_rev, >+ dev_info->fw_eng); >+ >+ err = devlink_info_version_running_put(req, >+ DEVLINK_INFO_VERSION_GENERIC_FW, >buf); return = devlink... >+ >+ return err; >+} >+ >+static const struct devlink_ops qed_dl_ops = { >+ .info_get = qed_devlink_info_get, >+}; > > struct devlink *qed_devlink_register(struct qed_dev *cdev) > { >diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c >b/drivers/net/ethernet/qlogic/qed/qed_main.c >index d6f76421379b..d1a559ccf516 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_main.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c >@@ -479,6 +479,7 @@ int qed_fill_dev_info(struct
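One way the "for() ?" suggestion above could be spelled, assuming part_num stays an array of four u32 words as the open-coded reads imply:

	for (i = 0; i < ARRAY_SIZE(p_hwfn->hw_info.part_num); i++)
		p_hwfn->hw_info.part_num[i] =
			qed_rd(p_hwfn, p_ptt, addr + i * sizeof(u32));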
Re: [PATCH net 2/2] net/sched: act_ct: Set offload timeout when setting the offload bit
On 2020-07-28 5:42 PM, Marcelo Ricardo Leitner wrote: > On Tue, Jul 28, 2020 at 02:57:59PM +0300, Roi Dayan wrote: >> On heavily loaded systems the GC can take time to go over all existing >> conns and reset their timeout. At that time other calls like from >> nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as >> expired. To fix this when we set the offload bit we should also reset >> the timeout instead of counting on GC to finish first iteration over >> all conns before the initial timeout. >> >> Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to >> flow table") >> Signed-off-by: Roi Dayan >> Reviewed-by: Oz Shlomo >> --- >> net/sched/act_ct.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c >> index e9f3576cbf71..650c2d78a346 100644 >> --- a/net/sched/act_ct.c >> +++ b/net/sched/act_ct.c >> @@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct >> tcf_ct_flow_table *ct_ft, > > Extra context line: > err = flow_offload_add(&ct_ft->nf_ft, entry); >> if (err) >> goto err_add; >> >> +nf_ct_offload_timeout(ct); >> + > > What about adding this to flow_offload_add() instead? > It is already adjusting the flow_offload timeout there and then it > also effective for nft. > As you said, in flow_offload_add() we adjust the flow timeout. Here we adjust the conn timeout. So it's outside flow_offload_add() which only touch the flow struct. I guess it's like conn offload bit is set outside here and for nft. What do you think? >> return; >> >> err_add: >> -- >> 2.8.4 >>
[PATCH iproute2 master v3] bridge: fdb show: fix fdb entry state output for json context
From: Julien Fortin bridge json fdb show is printing an incorrect / non-machine readable value, when using -j (json output) we are expecting machine readable data that shouldn't require special handling/parsing. $ bridge -j fdb show | \ python -c \ 'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))' [ { "master": "br0", "mac": "56:23:28:4f:4f:e5", "flags": [], "ifname": "vx0", "state": "state=0x80" < with the patch: "state": "0x80" } ] Fixes: c7c1a1ef51aea7c ("bridge: colorize output and use JSON print library") Signed-off-by: Julien Fortin --- bridge/fdb.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/bridge/fdb.c b/bridge/fdb.c index 710dfc99..d59bfb34 100644 --- a/bridge/fdb.c +++ b/bridge/fdb.c @@ -62,7 +62,10 @@ static const char *state_n2a(unsigned int s) if (s & NUD_REACHABLE) return ""; - sprintf(buf, "state=%#x", s); + if (is_json_context()) + sprintf(buf, "%#x", s); + else + sprintf(buf, "state=%#x", s); return buf; } -- 2.27.0
[PATCH] atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
atmtcp_remove_persistent() invokes atm_dev_lookup(), which returns a reference of atm_dev with increased refcount or NULL if fails. The refcount leaks issues occur in two error handling paths. If dev_data->persist is zero or PRIV(dev)->vcc isn't NULL, the function returns 0 without decreasing the refcount kept by a local variable, resulting in refcount leaks. Fix the issue by adding atm_dev_put() before returning 0 both when dev_data->persist is zero or PRIV(dev)->vcc isn't NULL. Signed-off-by: Xin Xiong Signed-off-by: Xiyu Yang Signed-off-by: Xin Tan --- drivers/atm/atmtcp.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/atm/atmtcp.c b/drivers/atm/atmtcp.c index d9fd70280482..7f814da3c2d0 100644 --- a/drivers/atm/atmtcp.c +++ b/drivers/atm/atmtcp.c @@ -433,9 +433,15 @@ static int atmtcp_remove_persistent(int itf) return -EMEDIUMTYPE; } dev_data = PRIV(dev); - if (!dev_data->persist) return 0; + if (!dev_data->persist) { + atm_dev_put(dev); + return 0; + } dev_data->persist = 0; - if (PRIV(dev)->vcc) return 0; + if (PRIV(dev)->vcc) { + atm_dev_put(dev); + return 0; + } kfree(dev_data); atm_dev_put(dev); atm_dev_deregister(dev); -- 2.25.1
Re: [PATCH v3 net-next 07/11] qed: use devlink logic to report errors
Wed, Jul 29, 2020 at 01:38:42PM CEST, irussk...@marvell.com wrote: >Use devlink_health_report to push error indications. >We implement this in qede via callback function to make it possible >to reuse the same for other drivers sitting on top of qed in future. > >Signed-off-by: Igor Russkikh >Signed-off-by: Alexander Lobakin >Signed-off-by: Michal Kalderon >--- > drivers/net/ethernet/qlogic/qed/qed_devlink.c | 17 + > drivers/net/ethernet/qlogic/qed/qed_devlink.h | 2 ++ > drivers/net/ethernet/qlogic/qed/qed_main.c| 1 + > drivers/net/ethernet/qlogic/qede/qede.h | 1 + > drivers/net/ethernet/qlogic/qede/qede_main.c | 5 - > include/linux/qed/qed_if.h| 3 +++ > 6 files changed, 28 insertions(+), 1 deletion(-) > >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >index 843a35f14cca..ffe776a4f99a 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c >@@ -14,6 +14,23 @@ enum qed_devlink_param_id { > QED_DEVLINK_PARAM_ID_IWARP_CMT, > }; > >+struct qed_fw_fatal_ctx { >+ enum qed_hw_err_type err_type; >+}; >+ >+int qed_report_fatal_error(struct devlink *devlink, enum qed_hw_err_type >err_type) >+{ >+ struct qed_devlink *qdl = devlink_priv(devlink); >+ struct qed_fw_fatal_ctx fw_fatal_ctx = { >+ .err_type = err_type, >+ }; >+ >+ devlink_health_report(qdl->fw_reporter, >+"Fatal error reported", &fw_fatal_ctx); saying "reported" to the reporter sounds odd. Maybe occurred? >+ >+ return 0; >+} >+ > static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = { > .name = "fw_fatal", > }; >diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.h >b/drivers/net/ethernet/qlogic/qed/qed_devlink.h >index c68ecf778826..ccc7d1d1bfd4 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.h >+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.h >@@ -15,4 +15,6 @@ void qed_devlink_unregister(struct devlink *devlink); > void qed_fw_reporters_create(struct devlink *devlink); > void qed_fw_reporters_destroy(struct devlink *devlink); > >+int qed_report_fatal_error(struct devlink *dl, enum qed_hw_err_type err_type); >+ > #endif >diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c >b/drivers/net/ethernet/qlogic/qed/qed_main.c >index d1a559ccf516..a64d594f9294 100644 >--- a/drivers/net/ethernet/qlogic/qed/qed_main.c >+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c >@@ -3007,6 +3007,7 @@ const struct qed_common_ops qed_common_ops_pass = { > .update_msglvl = &qed_init_dp, > .devlink_register = qed_devlink_register, > .devlink_unregister = qed_devlink_unregister, >+ .report_fatal_error = qed_report_fatal_error, > .dbg_all_data = &qed_dbg_all_data, > .dbg_all_data_size = &qed_dbg_all_data_size, > .chain_alloc = &qed_chain_alloc, >diff --git a/drivers/net/ethernet/qlogic/qede/qede.h >b/drivers/net/ethernet/qlogic/qede/qede.h >index 1f0e7505a973..3efc5899f656 100644 >--- a/drivers/net/ethernet/qlogic/qede/qede.h >+++ b/drivers/net/ethernet/qlogic/qede/qede.h >@@ -264,6 +264,7 @@ struct qede_dev { > > struct bpf_prog *xdp_prog; > >+ enum qed_hw_err_typelast_err_type; > unsigned long err_flags; > #define QEDE_ERR_IS_HANDLED 31 > #define QEDE_ERR_ATTN_CLR_EN 0 >diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c >b/drivers/net/ethernet/qlogic/qede/qede_main.c >index 7c2d948b2035..df437c3f1fc9 100644 >--- a/drivers/net/ethernet/qlogic/qede/qede_main.c >+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c >@@ -1181,7 +1181,6 @@ static int __qede_probe(struct pci_dev *pdev, u32 >dp_module, u8 
dp_level, > } > } else { > struct net_device *ndev = pci_get_drvdata(pdev); >- Leftover. Please remove. > edev = netdev_priv(ndev); > > if (edev && edev->devlink) { >@@ -2603,6 +2602,9 @@ static void qede_generic_hw_err_handler(struct qede_dev >*edev) > "Generic sleepable HW error handling started - err_flags > 0x%lx\n", > edev->err_flags); > >+ if (edev->devlink) >+ edev->ops->common->report_fatal_error(edev->devlink, >edev->last_err_type); >+ > /* Trigger a recovery process. >* This is placed in the sleep requiring section just to make >* sure it is the last one, and that all the other operations >@@ -2663,6 +2665,7 @@ static void qede_schedule_hw_err_handler(void *dev, > return; > } > >+ edev->last_err_type = err_type; > qede_set_hw_err_flags(edev, err_type); > qede_atomic_hw_err_handler(edev); > set_bit(QEDE_SP_HW_ERR, &edev->sp_flags); >diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h >index 30fe06fe06a0..1297726f2b25 100644 >--- a/incl
[PATCH bpf-next] Documentation/bpf: Use valid and new links in index.rst
There exists an error "404 Not Found" when I clik the html link of "Documentation/networking/filter.rst" in the BPF documentation [1], fix it. Additionally, use the new links about "BPF and XDP Reference Guide" and "bpf(2)" to avoid redirects. [1] https://www.kernel.org/doc/html/latest/bpf/ Fixes: d9b9170a2653 ("docs: bpf: Rename README.rst to index.rst") Fixes: cb3f0d56e153 ("docs: networking: convert filter.txt to ReST") Signed-off-by: Tiezhu Yang --- Documentation/bpf/index.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index 26f4bb3..1b901b4 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -68,7 +68,7 @@ Testing and debugging BPF .. Links: -.. _Documentation/networking/filter.rst: ../networking/filter.txt +.. _Documentation/networking/filter.rst: ../networking/filter.html .. _man-pages: https://www.kernel.org/doc/man-pages/ -.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html -.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/ +.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html +.. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/ -- 2.1.0
[PATCH bpf-next v2] Documentation/bpf: Use valid and new links in index.rst
There exists an error "404 Not Found" when I click the html link of "Documentation/networking/filter.rst" in the BPF documentation [1], fix it. Additionally, use the new links about "BPF and XDP Reference Guide" and "bpf(2)" to avoid redirects. [1] https://www.kernel.org/doc/html/latest/bpf/ Fixes: d9b9170a2653 ("docs: bpf: Rename README.rst to index.rst") Fixes: cb3f0d56e153 ("docs: networking: convert filter.txt to ReST") Signed-off-by: Tiezhu Yang --- v2: - Fix a typo "clik" to "click" in the commit message, sorry for that Documentation/bpf/index.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index 26f4bb3..1b901b4 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -68,7 +68,7 @@ Testing and debugging BPF .. Links: -.. _Documentation/networking/filter.rst: ../networking/filter.txt +.. _Documentation/networking/filter.rst: ../networking/filter.html .. _man-pages: https://www.kernel.org/doc/man-pages/ -.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html -.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/ +.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html +.. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/ -- 2.1.0
Re: [PATCH bpf-next v4 05/14] xsk: move queue_id, dev and need_wakeup to buffer pool
On Tue, Jul 28, 2020 at 9:10 AM Björn Töpel wrote: > > > > On 2020-07-21 07:03, Magnus Karlsson wrote: > > Move queue_id, dev, and need_wakeup from the umem to the > > buffer pool. This so that we in a later commit can share the umem > > between multiple HW queues. There is one buffer pool per dev and > > queue id, so these variables should belong to the buffer pool, not > > the umem. Need_wakeup is also something that is set on a per napi > > level, so there is usually one per device and queue id. So move > > this to the buffer pool too. > > > > Signed-off-by: Magnus Karlsson > > --- > > include/net/xdp_sock.h | 3 --- > > include/net/xsk_buff_pool.h | 4 > > net/xdp/xdp_umem.c | 19 +-- > > net/xdp/xdp_umem.h | 4 > > net/xdp/xsk.c | 40 +++- > > net/xdp/xsk_buff_pool.c | 39 ++- > > net/xdp/xsk_diag.c | 4 ++-- > > 7 files changed, 44 insertions(+), 69 deletions(-) > > > [...] > > } > > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c > > index 36287d2..436648a 100644 > > --- a/net/xdp/xsk_buff_pool.c > > +++ b/net/xdp/xsk_buff_pool.c > > @@ -95,10 +95,9 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct > > xdp_rxq_info *rxq) > > } > > EXPORT_SYMBOL(xp_set_rxq_info); > > > > -int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev, > > +int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *netdev, > > u16 queue_id, u16 flags) > > { > > - struct xdp_umem *umem = pool->umem; > > bool force_zc, force_copy; > > struct netdev_bpf bpf; > > int err = 0; > > @@ -111,27 +110,30 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct > > net_device *dev, > > if (force_zc && force_copy) > > return -EINVAL; > > > > - if (xsk_get_pool_from_qid(dev, queue_id)) > > + if (xsk_get_pool_from_qid(netdev, queue_id)) > > return -EBUSY; > > > > - err = xsk_reg_pool_at_qid(dev, pool, queue_id); > > + err = xsk_reg_pool_at_qid(netdev, pool, queue_id); > > if (err) > > return err; > > > > if (flags & XDP_USE_NEED_WAKEUP) { > > - umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; > > + pool->uses_need_wakeup = true; > > /* Tx needs to be explicitly woken up the first time. > >* Also for supporting drivers that do not implement this > >* feature. They will always have to call sendto(). > >*/ > > - umem->need_wakeup = XDP_WAKEUP_TX; > > + pool->cached_need_wakeup = XDP_WAKEUP_TX; > > } > > > > + dev_hold(netdev); > > + > > You have a reference leak here for the error case. Thanks. Will fix. /Magnus > > Björn
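The leak Björn points at: once dev_hold(netdev) has run, any error return further down xp_assign_dev() must be balanced by a dev_put(). A rough sketch of one way to shape the error path; some_later_setup() is an invented placeholder for the remaining steps (the ndo_bpf call and so on), and the unwind assumes a matching xsk_clear_pool_at_qid() helper:

	dev_hold(netdev);

	err = some_later_setup();		/* invented placeholder */
	if (err)
		goto err_unreg_pool;

	return 0;

err_unreg_pool:
	dev_put(netdev);			/* balance the hold on failure */
	xsk_clear_pool_at_qid(netdev, queue_id);
	return err;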
Re: [PATCH RFC net-next] net: phy: add Marvell PHY PTP support
On Wed, Jul 29, 2020 at 11:58:07AM +0100, Russell King - ARM Linux admin wrote: > How do we deal with this situation - from what I can see from the > ethtool API, we have to make a choice about which to use. How do we > make that choice? Unfortunately the stack does not implement simultaneous MAC + PHY time stamping. If your board has both, then you make the choice to use the PHY by selecting NETWORK_PHY_TIMESTAMPING at kernel compile time. (Also some MAC drivers do not defer to the PHY properly. Sometimes you can work around that by de-selecting the MAC's PTP function in the Kconfig if possible, but otherwise you need to patch the MAC driver.) > Do we need a property to indicate whether we wish to use the PHY > or MAC PTP stamping, or something more elaborate? To do this at run time would require quite some work, I expect. Thanks, Richard
Re: [PATCH bpf-next v4 05/14] xsk: move queue_id, dev and need_wakeup to buffer pool
On Tue, Jul 28, 2020 at 11:22 AM Maxim Mikityanskiy wrote: > > On 2020-07-21 08:03, Magnus Karlsson wrote: > > Move queue_id, dev, and need_wakeup from the umem to the > > buffer pool. This so that we in a later commit can share the umem > > between multiple HW queues. There is one buffer pool per dev and > > queue id, so these variables should belong to the buffer pool, not > > the umem. Need_wakeup is also something that is set on a per napi > > level, so there is usually one per device and queue id. So move > > this to the buffer pool too. > > > > Signed-off-by: Magnus Karlsson > > --- > > include/net/xdp_sock.h | 3 --- > > include/net/xsk_buff_pool.h | 4 > > net/xdp/xdp_umem.c | 19 +-- > > net/xdp/xdp_umem.h | 4 > > net/xdp/xsk.c | 40 +++- > > net/xdp/xsk_buff_pool.c | 39 ++- > > net/xdp/xsk_diag.c | 4 ++-- > > 7 files changed, 44 insertions(+), 69 deletions(-) > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > index 2a284e1..b052f1c 100644 > > --- a/include/net/xdp_sock.h > > +++ b/include/net/xdp_sock.h > > @@ -26,11 +26,8 @@ struct xdp_umem { > > refcount_t users; > > struct page **pgs; > > u32 npgs; > > - u16 queue_id; > > - u8 need_wakeup; > > u8 flags; > > int id; > > - struct net_device *dev; > > bool zc; > > spinlock_t xsk_tx_list_lock; > > struct list_head xsk_tx_list; > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h > > index 380d9ae..2d94890 100644 > > --- a/include/net/xsk_buff_pool.h > > +++ b/include/net/xsk_buff_pool.h > > @@ -43,11 +43,15 @@ struct xsk_buff_pool { > > u32 headroom; > > u32 chunk_size; > > u32 frame_len; > > + u16 queue_id; > > + u8 cached_need_wakeup; > > + bool uses_need_wakeup; > > bool dma_need_sync; > > bool unaligned; > > struct xdp_umem *umem; > > void *addrs; > > struct device *dev; > > + struct net_device *netdev; > > refcount_t users; > > struct work_struct work; > > struct xdp_buff_xsk *free_heads[]; > > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c > > index 7d86a63..b1699d0 100644 > > --- a/net/xdp/xdp_umem.c > > +++ b/net/xdp/xdp_umem.c > > @@ -63,26 +63,9 @@ static void xdp_umem_unaccount_pages(struct xdp_umem > > *umem) > > } > > } > > > > -void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, > > - u16 queue_id) > > -{ > > - umem->dev = dev; > > - umem->queue_id = queue_id; > > - > > - dev_hold(dev); > > -} > > - > > -void xdp_umem_clear_dev(struct xdp_umem *umem) > > -{ > > - dev_put(umem->dev); > > - umem->dev = NULL; > > - umem->zc = false; > > -} > > - > > static void xdp_umem_release(struct xdp_umem *umem) > > { > > - xdp_umem_clear_dev(umem); > > - > > + umem->zc = false; > > ida_simple_remove(&umem_ida, umem->id); > > > > xdp_umem_unpin_pages(umem); > > diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h > > index 93e96be..67bf3f3 100644 > > --- a/net/xdp/xdp_umem.h > > +++ b/net/xdp/xdp_umem.h > > @@ -8,10 +8,6 @@ > > > > #include > > > > -void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, > > - u16 queue_id); > > -void xdp_umem_clear_dev(struct xdp_umem *umem); > > -bool xdp_umem_validate_queues(struct xdp_umem *umem); > > void xdp_get_umem(struct xdp_umem *umem); > > void xdp_put_umem(struct xdp_umem *umem); > > void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs); > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > index ee04887..624d0fc 100644 > > --- a/net/xdp/xsk.c > > +++ b/net/xdp/xsk.c > > @@ -41,67 +41,61 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs) > > > > void xsk_set_rx_need_wakeup(struct xsk_buff_pool 
*pool) > > { > > - struct xdp_umem *umem = pool->umem; > > - > > - if (umem->need_wakeup & XDP_WAKEUP_RX) > > + if (pool->cached_need_wakeup & XDP_WAKEUP_RX) > > return; > > > > pool->fq->ring->flags |= XDP_RING_NEED_WAKEUP; > > - umem->need_wakeup |= XDP_WAKEUP_RX; > > + pool->cached_need_wakeup |= XDP_WAKEUP_RX; > > } > > EXPORT_SYMBOL(xsk_set_rx_need_wakeup); > > > > void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) > > { > > - struct xdp_umem *umem = pool->umem; > > struct xdp_sock *xs; > > > > - if (umem->need_wakeup & XDP_WAKEUP_TX) > > + if (pool->cached_need_wakeup & XDP_WAKEUP_TX) > > return; > > > > rcu_read_lock(); > > - list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { > > + list_for_each_entry_rcu(xs, &xs->umem->xsk_tx_list, list) { > > I know this is going to be fixed in the next patch, but it breaks > bisectability: this patch is broken because xs is not assigned. I don't > think this cha
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Wed, Jul 29, 2020 at 11:02:57AM +0100, Russell King - ARM Linux admin wrote: > in other words, compare pointers, so that whether skb_push() etc has > been used on the skb is irrelevant? +1
Re: [PATCH v1] farsync: use generic power management
On Wed, Jul 29, 2020 at 07:29:54AM -0500, Bjorn Helgaas wrote: > On Wed, Jul 29, 2020 at 03:47:30PM +0530, Vaibhav Gupta wrote: > > > > Agreed. Actually, as their presence only causes PCI core to call > > pci_legacy_suspend/resume() for them, I thought that after removing > > the binding from "struct pci_driver", this driver qualifies to be > > grouped under the generic framework, so used "use generic power > > management" for the heading. > > > > I should have written "remove legacy binding". > > This removed the *mention* of fst_driver.suspend and fst_driver.resume, > which is important because we want to eventually remove those members > completely from struct pci_driver. > > But fst_driver.suspend and fst_driver.resume *exist* before and after > this patch, and they're initialized to zero before and after this > patch. > > Since they were zero before, and they're still zero after this patch, > the PCI core doesn't call pci_legacy_suspend/resume(). This patch > doesn't change that at all. > Got it. Thanks :) > > But David has applied the patch, should I send a v2 or fix to update > > message? > > No, I don't think David updates patches after he's applied them. But > if the situation comes up again, you'll know how to describe it :) > Thanks a lot. :D Vaibhav Gupta > Bjorn
Re: [PATCH bpf-next v4 08/14] xsk: enable sharing of dma mappings
On Tue, Jul 28, 2020 at 11:00 AM Maxim Mikityanskiy wrote: > > On 2020-07-21 08:04, Magnus Karlsson wrote: > > Enable the sharing of dma mappings by moving them out from the buffer > > pool. Instead we put each dma mapped umem region in a list in the umem > > structure. If dma has already been mapped for this umem and device, it > > is not mapped again and the existing dma mappings are reused. > > > > Signed-off-by: Magnus Karlsson > > --- > > include/net/xdp_sock.h | 1 + > > include/net/xsk_buff_pool.h | 7 +++ > > net/xdp/xdp_umem.c | 1 + > > net/xdp/xsk_buff_pool.c | 112 > > > > 4 files changed, 102 insertions(+), 19 deletions(-) > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > index 126d243..282aeba 100644 > > --- a/include/net/xdp_sock.h > > +++ b/include/net/xdp_sock.h > > @@ -30,6 +30,7 @@ struct xdp_umem { > > u8 flags; > > int id; > > bool zc; > > + struct list_head xsk_dma_list; > > }; > > > > struct xsk_map { > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h > > index 83f100c..8f1dc4c 100644 > > --- a/include/net/xsk_buff_pool.h > > +++ b/include/net/xsk_buff_pool.h > > @@ -28,6 +28,13 @@ struct xdp_buff_xsk { > > struct list_head free_list_node; > > }; > > > > +struct xsk_dma_map { > > + dma_addr_t *dma_pages; > > + struct net_device *dev; > > + refcount_t users; > > + struct list_head list; /* Protected by the RTNL_LOCK */ > > +}; > > + > > struct xsk_buff_pool { > > struct xsk_queue *fq; > > struct xsk_queue *cq; > > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c > > index 372998d..cf27249 100644 > > --- a/net/xdp/xdp_umem.c > > +++ b/net/xdp/xdp_umem.c > > @@ -199,6 +199,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct > > xdp_umem_reg *mr) > > umem->user = NULL; > > umem->flags = mr->flags; > > > > + INIT_LIST_HEAD(&umem->xsk_dma_list); > > refcount_set(&umem->users, 1); > > > > err = xdp_umem_account_pages(umem); > > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c > > index c563874..ca74a3e 100644 > > --- a/net/xdp/xsk_buff_pool.c > > +++ b/net/xdp/xsk_buff_pool.c > > @@ -104,6 +104,25 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, > > struct xdp_rxq_info *rxq) > > } > > EXPORT_SYMBOL(xp_set_rxq_info); > > > > +static void xp_disable_drv_zc(struct xsk_buff_pool *pool) > > +{ > > + struct netdev_bpf bpf; > > + int err; > > + > > + ASSERT_RTNL(); > > + > > + if (pool->umem->zc) { > > + bpf.command = XDP_SETUP_XSK_POOL; > > + bpf.xsk.pool = NULL; > > + bpf.xsk.queue_id = pool->queue_id; > > + > > + err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); > > + > > + if (err) > > + WARN(1, "Failed to disable zero-copy!\n"); > > + } > > +} > > + > > int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *netdev, > > u16 queue_id, u16 flags) > > { > > @@ -122,6 +141,8 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct > > net_device *netdev, > > if (xsk_get_pool_from_qid(netdev, queue_id)) > > return -EBUSY; > > > > + pool->netdev = netdev; > > + pool->queue_id = queue_id; > > err = xsk_reg_pool_at_qid(netdev, pool, queue_id); > > if (err) > > return err; > > @@ -155,11 +176,15 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct > > net_device *netdev, > > if (err) > > goto err_unreg_pool; > > > > - pool->netdev = netdev; > > - pool->queue_id = queue_id; > > + if (!pool->dma_pages) { > > + WARN(1, "Driver did not DMA map zero-copy buffers"); > > + goto err_unreg_xsk; > > + } > > pool->umem->zc = true; > > return 0; > > > > +err_unreg_xsk: > > + xp_disable_drv_zc(pool); > > 
err_unreg_pool: > > if (!force_zc) > > err = 0; /* fallback to copy mode */ > > @@ -170,25 +195,10 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct > > net_device *netdev, > > > > void xp_clear_dev(struct xsk_buff_pool *pool) > > { > > - struct netdev_bpf bpf; > > - int err; > > - > > - ASSERT_RTNL(); > > - > > if (!pool->netdev) > > return; > > > > - if (pool->umem->zc) { > > - bpf.command = XDP_SETUP_XSK_POOL; > > - bpf.xsk.pool = NULL; > > - bpf.xsk.queue_id = pool->queue_id; > > - > > - err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); > > - > > - if (err) > > - WARN(1, "Failed to disable zero-copy!\n"); > > - } > > - > > + xp_disable_drv_zc(pool); > > xsk_clear_pool_at_qid(pool->netdev, pool->queue_id); > > dev_put(pool->netdev); > > pool->netdev = NULL; > > @@ -233,14 +243,61 @@ void xp_put_pool(struct xsk_buf
[PATCH net-next v3 1/2] net: openvswitch: add masks cache hit counter
Add a counter that counts the number of masks cache hits, and export it through the megaflow netlink statistics. Reviewed-by: Paolo Abeni Reviewed-by: Tonghao Zhang Signed-off-by: Eelco Chaudron --- include/uapi/linux/openvswitch.h |2 +- net/openvswitch/datapath.c |5 - net/openvswitch/datapath.h |3 +++ net/openvswitch/flow_table.c | 19 ++- net/openvswitch/flow_table.h |3 ++- 5 files changed, 24 insertions(+), 8 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 9b14519e74d9..7cb76e5ca7cf 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -102,8 +102,8 @@ struct ovs_dp_megaflow_stats { __u64 n_mask_hit;/* Number of masks used for flow lookups. */ __u32 n_masks; /* Number of masks for the datapath. */ __u32 pad0; /* Pad for future expension. */ + __u64 n_cache_hit; /* Number of cache matches for flow lookups. */ __u64 pad1; /* Pad for future expension. */ - __u64 pad2; /* Pad for future expension. */ }; struct ovs_vport_stats { diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 95805f0e27bd..a54df1fe3ec4 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -225,13 +225,14 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key) struct dp_stats_percpu *stats; u64 *stats_counter; u32 n_mask_hit; + u32 n_cache_hit; int error; stats = this_cpu_ptr(dp->stats_percpu); /* Look up flow. */ flow = ovs_flow_tbl_lookup_stats(&dp->table, key, skb_get_hash(skb), -&n_mask_hit); +&n_mask_hit, &n_cache_hit); if (unlikely(!flow)) { struct dp_upcall_info upcall; @@ -262,6 +263,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key) u64_stats_update_begin(&stats->syncp); (*stats_counter)++; stats->n_mask_hit += n_mask_hit; + stats->n_cache_hit += n_cache_hit; u64_stats_update_end(&stats->syncp); } @@ -699,6 +701,7 @@ static void get_dp_stats(const struct datapath *dp, struct ovs_dp_stats *stats, stats->n_missed += local_stats.n_missed; stats->n_lost += local_stats.n_lost; mega_stats->n_mask_hit += local_stats.n_mask_hit; + mega_stats->n_cache_hit += local_stats.n_cache_hit; } } diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h index 697a2354194b..86d78613edb4 100644 --- a/net/openvswitch/datapath.h +++ b/net/openvswitch/datapath.h @@ -38,12 +38,15 @@ * @n_mask_hit: Number of masks looked up for flow match. * @n_mask_hit / (@n_hit + @n_missed) will be the average masks looked * up per packet. + * @n_cache_hit: The number of received packets that had their mask found using + * the mask cache. 
*/ struct dp_stats_percpu { u64 n_hit; u64 n_missed; u64 n_lost; u64 n_mask_hit; + u64 n_cache_hit; struct u64_stats_sync syncp; }; diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c index af22c9ee28dd..a5912ea05352 100644 --- a/net/openvswitch/flow_table.c +++ b/net/openvswitch/flow_table.c @@ -667,6 +667,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl, struct mask_array *ma, const struct sw_flow_key *key, u32 *n_mask_hit, + u32 *n_cache_hit, u32 *index) { u64 *usage_counters = this_cpu_ptr(ma->masks_usage_cntr); @@ -682,6 +683,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl, u64_stats_update_begin(&ma->syncp); usage_counters[*index]++; u64_stats_update_end(&ma->syncp); + (*n_cache_hit)++; return flow; } } @@ -719,7 +721,8 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl, struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl, const struct sw_flow_key *key, u32 skb_hash, - u32 *n_mask_hit) + u32 *n_mask_hit, + u32 *n_cache_hit) { struct mask_array *ma = rcu_dereference(tbl->mask_array); struct table_instance *ti = rcu_dereference(tbl->ti); @@ -729,10 +732,13 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl, int seg; *n_mask_hit
[PATCH net-next v3 0/2] net: openvswitch: masks cache enhancements
This patchset adds two enhancements to the Open vSwitch masks cache. Signed-off-by: Eelco Chaudron Changes in v3 [patch 2/2 only]: - Use is_power_of_2() function - Use array_size() function - Fix remaining sparse errors Changes in v2 [patch 2/2 only]: - Fix sparse warnings - Fix netlink policy items reported by Florian Westphal Eelco Chaudron (2): net: openvswitch: add masks cache hit counter net: openvswitch: make masks cache size configurable include/uapi/linux/openvswitch.h |3 + net/openvswitch/datapath.c | 19 ++ net/openvswitch/datapath.h |3 + net/openvswitch/flow_table.c | 116 -- net/openvswitch/flow_table.h | 13 5 files changed, 132 insertions(+), 22 deletions(-)
[PATCH net-next v3 2/2] net: openvswitch: make masks cache size configurable
This patch makes the masks cache size configurable, or with a size of 0, disable it. Reviewed-by: Paolo Abeni Signed-off-by: Eelco Chaudron --- Changes in v3: - Use is_power_of_2() function - Use array_size() function - Fix remaining sparse errors Changes in v2: - Fix sparse warnings - Fix netlink policy items reported by Florian Westphal include/uapi/linux/openvswitch.h |1 net/openvswitch/datapath.c | 14 + net/openvswitch/flow_table.c | 97 +- net/openvswitch/flow_table.h | 10 4 files changed, 108 insertions(+), 14 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 7cb76e5ca7cf..8300cc29dec8 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -86,6 +86,7 @@ enum ovs_datapath_attr { OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */ OVS_DP_ATTR_USER_FEATURES, /* OVS_DP_F_* */ OVS_DP_ATTR_PAD, + OVS_DP_ATTR_MASKS_CACHE_SIZE, __OVS_DP_ATTR_MAX }; diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index a54df1fe3ec4..114b2ddb8037 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -1498,6 +1498,7 @@ static size_t ovs_dp_cmd_msg_size(void) msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_stats)); msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_megaflow_stats)); msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_USER_FEATURES */ + msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_MASKS_CACHE_SIZE */ return msgsize; } @@ -1535,6 +1536,10 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb, if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features)) goto nla_put_failure; + if (nla_put_u32(skb, OVS_DP_ATTR_MASKS_CACHE_SIZE, + ovs_flow_tbl_masks_cache_size(&dp->table))) + goto nla_put_failure; + genlmsg_end(skb, ovs_header); return 0; @@ -1599,6 +1604,13 @@ static int ovs_dp_change(struct datapath *dp, struct nlattr *a[]) #endif } + if (a[OVS_DP_ATTR_MASKS_CACHE_SIZE]) { + u32 cache_size; + + cache_size = nla_get_u32(a[OVS_DP_ATTR_MASKS_CACHE_SIZE]); + ovs_flow_tbl_masks_cache_resize(&dp->table, cache_size); + } + dp->user_features = user_features; if (dp->user_features & OVS_DP_F_TC_RECIRC_SHARING) @@ -1894,6 +1906,8 @@ static const struct nla_policy datapath_policy[OVS_DP_ATTR_MAX + 1] = { [OVS_DP_ATTR_NAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 }, [OVS_DP_ATTR_UPCALL_PID] = { .type = NLA_U32 }, [OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 }, + [OVS_DP_ATTR_MASKS_CACHE_SIZE] = NLA_POLICY_RANGE(NLA_U32, 0, + PCPU_MIN_UNIT_SIZE / sizeof(struct mask_cache_entry)), }; static const struct genl_ops dp_datapath_genl_ops[] = { diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c index a5912ea05352..5280aeeef628 100644 --- a/net/openvswitch/flow_table.c +++ b/net/openvswitch/flow_table.c @@ -38,8 +38,8 @@ #define MASK_ARRAY_SIZE_MIN16 #define REHASH_INTERVAL(10 * 60 * HZ) +#define MC_DEFAULT_HASH_ENTRIES256 #define MC_HASH_SHIFT 8 -#define MC_HASH_ENTRIES(1u << MC_HASH_SHIFT) #define MC_HASH_SEGS ((sizeof(uint32_t) * 8) / MC_HASH_SHIFT) static struct kmem_cache *flow_cache; @@ -341,15 +341,75 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask) } } +static void __mask_cache_destroy(struct mask_cache *mc) +{ + if (mc->mask_cache) + free_percpu(mc->mask_cache); + kfree(mc); +} + +static void mask_cache_rcu_cb(struct rcu_head *rcu) +{ + struct mask_cache *mc = container_of(rcu, struct mask_cache, rcu); + + __mask_cache_destroy(mc); +} + +static struct mask_cache 
*tbl_mask_cache_alloc(u32 size) +{ + struct mask_cache_entry __percpu *cache = NULL; + struct mask_cache *new; + + /* Only allow size to be 0, or a power of 2, and does not exceed +* percpu allocation size. +*/ + if ((!is_power_of_2(size) && size != 0) || + (size * sizeof(struct mask_cache_entry)) > PCPU_MIN_UNIT_SIZE) + return NULL; + + new = kzalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return NULL; + + new->cache_size = size; + if (new->cache_size > 0) { + cache = __alloc_percpu(array_size(sizeof(struct mask_cache_entry), + new->cache_size), + __alignof__(struct mask_cache_entry)); + if (!cache) { + kfree(new); + return NULL; + } +
Re: [PATCH RFC net-next] net: phy: add Marvell PHY PTP support
On Wed, Jul 29, 2020 at 06:19:32AM -0700, Richard Cochran wrote: > On Wed, Jul 29, 2020 at 11:58:07AM +0100, Russell King - ARM Linux admin > wrote: > > How do we deal with this situation - from what I can see from the > > ethtool API, we have to make a choice about which to use. How do we > > make that choice? > > Unfortunately the stack does not implement simultaneous MAC + PHY time > stamping. If your board has both, then you make the choice to use the > PHY by selecting NETWORK_PHY_TIMESTAMPING at kernel compile time. Which is more or less what I said in my email. However, the important question about how to select between the two, which is really what I'm after, has not been addressed. > (Also some MAC drivers do not defer to the PHY properly. Sometimes > you can work around that by de-selecting the MAC's PTP function in the > Kconfig if possible, but otherwise you need to patch the MAC driver.) ... which really doesn't work if you have a board where only some network interfaces have a PHY with PTP support, but all have PTP support in the MAC. If all MACs or the majority of MACs use a common PTP clock, it seems to me that you would want to use the MACs rather than the PHY, especially if the PHY doesn't offer as good a quality PTP clock as is available from the MAC. Randomly patching the kernel is out of the question, for arm based systems we want one kernel that works correctly across as many platforms as possible, and Kconfig choices to set platform specific details are basically unacceptable, let alone patching the kernel to make those decisions. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
HFSC latency control
If this is not the correct place to ask these questions, please let me know if you know about any forum, mailing list, IRC channel or wherever HFSC questions could be asked and answered. I have read some HFSC theory and some old messages in the archive of this mailing list. I am testing HFSC in my lab. Traffic is correctly shaped but I do not get the expected results regarding latency.

tc qdisc add dev eth0 root handle 1: hfsc default 90
tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 100kbps ul m2 100kbps
tc class add dev eth0 parent 1:1 classid 1:10 hfsc rt m1 50kbps d 100 m2 10kbps
tc class add dev eth0 parent 1:1 classid 1:20 hfsc ls m2 70kbps
tc class add dev eth0 parent 1:1 classid 1:90 hfsc ls m2 10kbps
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip src 192.168.10.0/24 flowid 1:10
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip src 192.168.20.0/24 flowid 1:20

I expected packets from 192.168.10.0/24 to have the lowest latency; however, they get delayed much more than any other traffic. What am I doing wrong? I would like to have one example where I can see how HFSC controls latency. I have seen lots of theory but very few examples, and when I test them in my lab I don't see the expected results regarding latency. I generate traffic with Ostinato and test latency with ping and qperf.
Re: PROBLEM: (DSA/Microchip): 802.1Q-Header lost on KSZ9477-DSA ingress without bridge
On 7/28/2020 11:05 PM, Gaube, Marvin (THSE-TL1) wrote: > Summary: 802.1Q-Header lost on KSZ9477-DSA ingress without bridge > Keywords: networking, dsa, microchip, 802.1q, vlan > Full description: > > Hello, > we're trying to get 802.1Q-tagged Ethernet frames through a KSZ9477 > DSA-enabled switch without creating a bridge on the kernel side. Does it work if you have a bridge that is VLAN aware though? If it does, this would suggest that the default VLAN behavior without a bridge is too restrictive and needs changing. > Following setup: > Switchport 1 <-- KSZ9477 --> eth1 (CPU-Port) <---> lan1 This representation is confusing: is switchport 1 a network device, or is this meant to be physical switch port number 1 of the KSZ9477? > > No bridge is configured, only the interface directly. Untagged packets are > working without problems. The switch uses the ksz9477 DSA driver with > tail tagging ("DSA_TAG_PROTO_KSZ9477"). > When sending packets with an 802.1Q header (tagged VLAN) into the switchport, I > see them including the 802.1Q header on eth1. > They also appear on lan1, but with the 802.1Q header missing. > When I create a VLAN interface over lan1 (e.g. lan1.21), nothing arrives > there. > The other way around, everything works fine: packets transmitted into lan1.21 > appear in 802.1Q VLAN 21 on switchport 1. > > I assume that is not the intended behavior. > I haven't found an obvious reason for this behavior yet, but I suspect the > VLAN header gets stripped off somewhere around "dsa_switch_rcv" in > net/dsa/dsa.c or "ksz9477_rcv" in net/dsa/tag_ksz.c. Not sure how, though: ksz9477_rcv() only removes the trail tag, which should leave any header intact. It seems to me that the switch is incorrectly configured and is not VLAN aware at all, nor passing VLAN tagged frames through on ingress to CPU when it should. -- Florian
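One way to try what Florian suggests, i.e. the same test but behind a VLAN-aware bridge, is sketched below. The interface names follow the report; the commands are a generic iproute2 recipe, not something verified on this particular board.

# Sketch: enslave the DSA user port to a VLAN-aware bridge and permit VID 21.
ip link add name br0 type bridge vlan_filtering 1
ip link set dev lan1 master br0
ip link set dev lan1 up
ip link set dev br0 up

# Allow VLAN 21 (tagged) on the port and on the bridge itself.
bridge vlan add dev lan1 vid 21
bridge vlan add dev br0 vid 21 self

# Terminate VID 21 locally on top of the bridge instead of on lan1.
ip link add link br0 name br0.21 type vlan id 21
ip link set dev br0.21 up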
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Wed, Jul 29, 2020 at 12:29:08PM +0200, Kurt Kanzenbach wrote: > I'll test it and send v3. But before, I've got another question that > somebody might have an answer to: > > The ptp v1 code always does locate the message type at > > msgtype = data + offset + OFF_PTP_CONTROL > > OFF_PTP_CONTROL is 32. However, looking at the ptp v1 header, the > message type is located at offset 20. What am I missing here? My source back in the day was the John Eidson book. In Appendix A it claims:

Table A.1. Common PTP message fields

Field name     Purpose
messageType    Identifies message as event or general
control        Indicates the message type, e.g., Sync

So I think the code is correct. Thanks, Richard
[PATCH 1/5] seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
__SEQ_LOCKDEP() is an expression gate for the seqcount_LOCKNAME_t::lock member. Rename it to be about the member, not the gate condition. Later (PREEMPT_RT) patches will make the member available for !LOCKDEP configs. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -133,20 +133,20 @@ static inline void seqcount_lockdep_read */ #ifdef CONFIG_LOCKDEP -#define __SEQ_LOCKDEP(expr)expr +#define __SEQ_LOCK(expr) expr #else -#define __SEQ_LOCKDEP(expr) +#define __SEQ_LOCK(expr) #endif #define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) { \ .seqcount = SEQCNT_ZERO(seq_name.seqcount), \ - __SEQ_LOCKDEP(.lock = (assoc_lock)) \ + __SEQ_LOCK(.lock= (assoc_lock)) \ } #define seqcount_locktype_init(s, assoc_lock) \ do { \ seqcount_init(&(s)->seqcount); \ - __SEQ_LOCKDEP((s)->lock = (assoc_lock));\ + __SEQ_LOCK((s)->lock = (assoc_lock)); \ } while (0) /** @@ -161,7 +161,7 @@ do { \ */ typedef struct seqcount_spinlock { seqcount_t seqcount; - __SEQ_LOCKDEP(spinlock_t*lock); + __SEQ_LOCK(spinlock_t *lock); } seqcount_spinlock_t; /** @@ -192,7 +192,7 @@ typedef struct seqcount_spinlock { */ typedef struct seqcount_raw_spinlock { seqcount_t seqcount; - __SEQ_LOCKDEP(raw_spinlock_t*lock); + __SEQ_LOCK(raw_spinlock_t *lock); } seqcount_raw_spinlock_t; /** @@ -223,7 +223,7 @@ typedef struct seqcount_raw_spinlock { */ typedef struct seqcount_rwlock { seqcount_t seqcount; - __SEQ_LOCKDEP(rwlock_t *lock); + __SEQ_LOCK(rwlock_t *lock); } seqcount_rwlock_t; /** @@ -257,7 +257,7 @@ typedef struct seqcount_rwlock { */ typedef struct seqcount_mutex { seqcount_t seqcount; - __SEQ_LOCKDEP(struct mutex *lock); + __SEQ_LOCK(struct mutex *lock); } seqcount_mutex_t; /** @@ -291,7 +291,7 @@ typedef struct seqcount_mutex { */ typedef struct seqcount_ww_mutex { seqcount_t seqcount; - __SEQ_LOCKDEP(struct ww_mutex *lock); + __SEQ_LOCK(struct ww_mutex *lock); } seqcount_ww_mutex_t; /** @@ -329,7 +329,7 @@ __seqcount_##locktype##_preemptible(seqc static inline void \ __seqcount_##locktype##_assert(seqcount_##locktype##_t *s) \ { \ - __SEQ_LOCKDEP(lockdep_assert_held(lockmember)); \ + __SEQ_LOCK(lockdep_assert_held(lockmember));\ } /*
[PATCH 3/5] seqlock: Fold seqcount_LOCKNAME_init() definition
Manual repetition is boring and error prone. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 61 +++- 1 file changed, 14 insertions(+), 47 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -143,12 +143,6 @@ static inline void seqcount_lockdep_read __SEQ_LOCK(.lock= (assoc_lock)) \ } -#define seqcount_locktype_init(s, assoc_lock) \ -do { \ - seqcount_init(&(s)->seqcount); \ - __SEQ_LOCK((s)->lock = (assoc_lock)); \ -} while (0) - /** * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t * @name: Name of the seqcount_spinlock_t instance @@ -158,14 +152,6 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_spinlock_init - runtime initializer for seqcount_spinlock_t - * @s: Pointer to the seqcount_spinlock_t instance - * @lock: Pointer to the associated spinlock - */ -#define seqcount_spinlock_init(s, lock) \ - seqcount_locktype_init(s, lock) - -/** * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t * @name: Name of the seqcount_raw_spinlock_t instance * @lock: Pointer to the associated raw_spinlock @@ -174,14 +160,6 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_raw_spinlock_init - runtime initializer for seqcount_raw_spinlock_t - * @s: Pointer to the seqcount_raw_spinlock_t instance - * @lock: Pointer to the associated raw_spinlock - */ -#define seqcount_raw_spinlock_init(s, lock)\ - seqcount_locktype_init(s, lock) - -/** * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t * @name: Name of the seqcount_rwlock_t instance * @lock: Pointer to the associated rwlock @@ -190,14 +168,6 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_rwlock_init - runtime initializer for seqcount_rwlock_t - * @s: Pointer to the seqcount_rwlock_t instance - * @lock: Pointer to the associated rwlock - */ -#define seqcount_rwlock_init(s, lock) \ - seqcount_locktype_init(s, lock) - -/** * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t * @name: Name of the seqcount_mutex_t instance * @lock: Pointer to the associated mutex @@ -206,14 +176,6 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_mutex_init - runtime initializer for seqcount_mutex_t - * @s: Pointer to the seqcount_mutex_t instance - * @lock: Pointer to the associated mutex - */ -#define seqcount_mutex_init(s, lock) \ - seqcount_locktype_init(s, lock) - -/** * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t * @name: Name of the seqcount_ww_mutex_t instance * @lock: Pointer to the associated ww_mutex @@ -222,15 +184,7 @@ do { \ SEQCOUNT_LOCKTYPE_ZERO(name, lock) /** - * seqcount_ww_mutex_init - runtime initializer for seqcount_ww_mutex_t - * @s: Pointer to the seqcount_ww_mutex_t instance - * @lock: Pointer to the associated ww_mutex - */ -#define seqcount_ww_mutex_init(s, lock) \ - seqcount_locktype_init(s, lock) - -/** - * typedef seqcount_LOCKNAME_t - sequence counter with spinlock associated + * typedef seqcount_LOCKNAME_t - sequence counter with LOCKTYPR associated * @seqcount: The real sequence counter * @lock: Pointer to the associated spinlock * @@ -240,6 +194,12 @@ do { \ * that the write side critical section is properly serialized. 
*/ +/** + * seqcount_LOCKNAME_init() - runtime initializer for seqcount_LOCKNAME_t + * @s: Pointer to the seqcount_LOCKNAME_t instance + * @lock: Pointer to the associated LOCKTYPE + */ + /* * SEQCOUNT_LOCKTYPE() - Instantiate seqcount_LOCKNAME_t and helpers * @locktype: actual typename @@ -253,6 +213,13 @@ typedef struct seqcount_##lockname { __SEQ_LOCK(locktype *lock); \ } seqcount_##lockname##_t; \ \ +static __always_inline void\ +seqcount_##lockname##_init(se
[PATCH 0/5] seqlock: Cleanups
Hi, These are some minor cleanups that go on top of darwi's seqlock patches: https://lkml.kernel.org/r/20200720155530.1173732-1-a.darw...@linutronix.de It's mostly trimming excessive manual repetition and a few naming niggles. The series has been exposed to 0-day for a while now, so I'm going to push the lot out to tip/locking/core. [ 0day found a Sparse bug in its _Generic() implementation that has since been fixed by Luc ] --- seqlock.h | 292 +- 1 file changed, 84 insertions(+), 208 deletions(-)
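For anyone not following the parent series, here is a minimal usage sketch of the seqcount_LOCKNAME_t API these cleanups touch. The type, initializer and accessor names are the ones visible in the quoted patches; the data and functions around them are made up for illustration.

/* Sketch only: writer/reader pairing for a spinlock-associated seqcount. */
static DEFINE_SPINLOCK(stats_lock);
static seqcount_spinlock_t stats_seq =
	SEQCNT_SPINLOCK_ZERO(stats_seq, &stats_lock);
static u64 stats_bytes;

static void stats_add(u64 n)
{
	spin_lock(&stats_lock);			/* associated lock held...   */
	write_seqcount_begin(&stats_seq);	/* ...so lockdep stays quiet */
	stats_bytes += n;
	write_seqcount_end(&stats_seq);
	spin_unlock(&stats_lock);
}

static u64 stats_read(void)
{
	unsigned int seq;
	u64 val;

	do {
		seq = read_seqcount_begin(&stats_seq);
		val = stats_bytes;
	} while (read_seqcount_retry(&stats_seq, seq));

	return val;
}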
[PATCH 5/5] seqcount: More consistent seqprop names
Attempt uniformity and brevity. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 52 1 file changed, 26 insertions(+), 26 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -247,9 +247,9 @@ SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mu __seqprop_case((s), mutex, prop), \ __seqprop_case((s), ww_mutex, prop)) -#define __to_seqcount_t(s) __seqprop(s, ptr) -#define __associated_lock_exists_and_is_preemptible(s) __seqprop(s, preemptible) -#define __assert_write_section_is_protected(s) __seqprop(s, assert) +#define __seqcount_ptr(s) __seqprop(s, ptr) +#define __seqcount_lock_preemptible(s) __seqprop(s, preemptible) +#define __seqcount_assert_lock_held(s) __seqprop(s, assert) /** * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier @@ -266,7 +266,7 @@ SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mu * Return: count to be passed to read_seqcount_retry() */ #define __read_seqcount_begin(s) \ - __read_seqcount_t_begin(__to_seqcount_t(s)) + __read_seqcount_t_begin(__seqcount_ptr(s)) static inline unsigned __read_seqcount_t_begin(const seqcount_t *s) { @@ -289,7 +289,7 @@ static inline unsigned __read_seqcount_t * Return: count to be passed to read_seqcount_retry() */ #define raw_read_seqcount_begin(s) \ - raw_read_seqcount_t_begin(__to_seqcount_t(s)) + raw_read_seqcount_t_begin(__seqcount_ptr(s)) static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s) { @@ -305,7 +305,7 @@ static inline unsigned raw_read_seqcount * Return: count to be passed to read_seqcount_retry() */ #define read_seqcount_begin(s) \ - read_seqcount_t_begin(__to_seqcount_t(s)) + read_seqcount_t_begin(__seqcount_ptr(s)) static inline unsigned read_seqcount_t_begin(const seqcount_t *s) { @@ -325,7 +325,7 @@ static inline unsigned read_seqcount_t_b * Return: count to be passed to read_seqcount_retry() */ #define raw_read_seqcount(s) \ - raw_read_seqcount_t(__to_seqcount_t(s)) + raw_read_seqcount_t(__seqcount_ptr(s)) static inline unsigned raw_read_seqcount_t(const seqcount_t *s) { @@ -353,7 +353,7 @@ static inline unsigned raw_read_seqcount * Return: count to be passed to read_seqcount_retry() */ #define raw_seqcount_begin(s) \ - raw_seqcount_t_begin(__to_seqcount_t(s)) + raw_seqcount_t_begin(__seqcount_ptr(s)) static inline unsigned raw_seqcount_t_begin(const seqcount_t *s) { @@ -380,7 +380,7 @@ static inline unsigned raw_seqcount_t_be * Return: true if a read section retry is required, else false */ #define __read_seqcount_retry(s, start) \ - __read_seqcount_t_retry(__to_seqcount_t(s), start) + __read_seqcount_t_retry(__seqcount_ptr(s), start) static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start) { @@ -400,7 +400,7 @@ static inline int __read_seqcount_t_retr * Return: true if a read section retry is required, else false */ #define read_seqcount_retry(s, start) \ - read_seqcount_t_retry(__to_seqcount_t(s), start) + read_seqcount_t_retry(__seqcount_ptr(s), start) static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start) { @@ -414,10 +414,10 @@ static inline int read_seqcount_t_retry( */ #define raw_write_seqcount_begin(s)\ do { \ - if (__associated_lock_exists_and_is_preemptible(s)) \ + if (__seqcount_lock_preemptible(s)) \ preempt_disable(); \ \ - raw_write_seqcount_t_begin(__to_seqcount_t(s)); \ + raw_write_seqcount_t_begin(__seqcount_ptr(s)); \ } while (0) static inline void raw_write_seqcount_t_begin(seqcount_t *s) @@ -433,9 +433,9 @@ static inline void raw_write_seqcount_t_ */ #define raw_write_seqcount_end(s) \ do { \ - 
raw_write_seqcount_t_end(__to_seqcount_t(s)); \ + raw_write_seqcount_t_end(__seqcount_ptr(s));\ \ - if (__associated_lock_exists_and_is_preemptible(s)) \ + if (__seqcount_lock_preemptible(s)) \ preempt_enable();
[PATCH 2/5] seqlock: Fold seqcount_LOCKNAME_t definition
Manual repetition is boring and error prone. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 140 +--- 1 file changed, 38 insertions(+), 102 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -150,21 +150,6 @@ do { \ } while (0) /** - * typedef seqcount_spinlock_t - sequence counter with spinlock associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated spinlock - * - * A plain sequence counter with external writer synchronization by a - * spinlock. The spinlock is associated to the sequence count in the - * static initializer or init function. This enables lockdep to validate - * that the write side critical section is properly serialized. - */ -typedef struct seqcount_spinlock { - seqcount_t seqcount; - __SEQ_LOCK(spinlock_t *lock); -} seqcount_spinlock_t; - -/** * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t * @name: Name of the seqcount_spinlock_t instance * @lock: Pointer to the associated spinlock @@ -181,21 +166,6 @@ typedef struct seqcount_spinlock { seqcount_locktype_init(s, lock) /** - * typedef seqcount_raw_spinlock_t - sequence count with raw spinlock associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated raw spinlock - * - * A plain sequence counter with external writer synchronization by a - * raw spinlock. The raw spinlock is associated to the sequence count in - * the static initializer or init function. This enables lockdep to - * validate that the write side critical section is properly serialized. - */ -typedef struct seqcount_raw_spinlock { - seqcount_t seqcount; - __SEQ_LOCK(raw_spinlock_t *lock); -} seqcount_raw_spinlock_t; - -/** * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t * @name: Name of the seqcount_raw_spinlock_t instance * @lock: Pointer to the associated raw_spinlock @@ -212,21 +182,6 @@ typedef struct seqcount_raw_spinlock { seqcount_locktype_init(s, lock) /** - * typedef seqcount_rwlock_t - sequence count with rwlock associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated rwlock - * - * A plain sequence counter with external writer synchronization by a - * rwlock. The rwlock is associated to the sequence count in the static - * initializer or init function. This enables lockdep to validate that - * the write side critical section is properly serialized. - */ -typedef struct seqcount_rwlock { - seqcount_t seqcount; - __SEQ_LOCK(rwlock_t *lock); -} seqcount_rwlock_t; - -/** * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t * @name: Name of the seqcount_rwlock_t instance * @lock: Pointer to the associated rwlock @@ -243,24 +198,6 @@ typedef struct seqcount_rwlock { seqcount_locktype_init(s, lock) /** - * typedef seqcount_mutex_t - sequence count with mutex associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated mutex - * - * A plain sequence counter with external writer synchronization by a - * mutex. The mutex is associated to the sequence counter in the static - * initializer or init function. This enables lockdep to validate that - * the write side critical section is properly serialized. - * - * The write side API functions write_seqcount_begin()/end() automatically - * disable and enable preemption when used with seqcount_mutex_t. 
- */ -typedef struct seqcount_mutex { - seqcount_t seqcount; - __SEQ_LOCK(struct mutex *lock); -} seqcount_mutex_t; - -/** * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t * @name: Name of the seqcount_mutex_t instance * @lock: Pointer to the associated mutex @@ -277,24 +214,6 @@ typedef struct seqcount_mutex { seqcount_locktype_init(s, lock) /** - * typedef seqcount_ww_mutex_t - sequence count with ww_mutex associated - * @seqcount: The real sequence counter - * @lock: Pointer to the associated ww_mutex - * - * A plain sequence counter with external writer synchronization by a - * ww_mutex. The ww_mutex is associated to the sequence counter in the static - * initializer or init function. This enables lockdep to validate that - * the write side critical section is properly serialized. - * - * The write side API functions write_seqcount_begin()/end() automatically - * disable and enable preemption when used with seqcount_ww_mutex_t. - */ -typedef struct seqcount_ww_mutex { - seqcount_t seqcount; - __SEQ_LOCK(struct ww_mutex *lock); -} seqcount_ww_mutex_t; - -/** * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t * @name: Name of the seqcount_ww_mutex_t instance * @lock: Pointer to the associated ww_
[PATCH 4/5] seqcount: Compress SEQCNT_LOCKNAME_ZERO()
Less is more. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/seqlock.h | 63 +--- 1 file changed, 18 insertions(+), 45 deletions(-) --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -138,51 +138,6 @@ static inline void seqcount_lockdep_read #define __SEQ_LOCK(expr) #endif -#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) { \ - .seqcount = SEQCNT_ZERO(seq_name.seqcount), \ - __SEQ_LOCK(.lock= (assoc_lock)) \ -} - -/** - * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t - * @name: Name of the seqcount_spinlock_t instance - * @lock: Pointer to the associated spinlock - */ -#define SEQCNT_SPINLOCK_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - -/** - * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t - * @name: Name of the seqcount_raw_spinlock_t instance - * @lock: Pointer to the associated raw_spinlock - */ -#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - -/** - * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t - * @name: Name of the seqcount_rwlock_t instance - * @lock: Pointer to the associated rwlock - */ -#define SEQCNT_RWLOCK_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - -/** - * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t - * @name: Name of the seqcount_mutex_t instance - * @lock: Pointer to the associated mutex - */ -#define SEQCNT_MUTEX_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - -/** - * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t - * @name: Name of the seqcount_ww_mutex_t instance - * @lock: Pointer to the associated ww_mutex - */ -#define SEQCNT_WW_MUTEX_ZERO(name, lock) \ - SEQCOUNT_LOCKTYPE_ZERO(name, lock) - /** * typedef seqcount_LOCKNAME_t - sequence counter with LOCKTYPR associated * @seqcount: The real sequence counter @@ -263,6 +218,24 @@ SEQCOUNT_LOCKTYPE(rwlock_t,rwlock, fa SEQCOUNT_LOCKTYPE(struct mutex,mutex, true, s->lock) SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mutex, true, &s->lock->base) +/** + * SEQCNT_LOCKNAME_ZERO - static initializer for seqcount_LOCKNAME_t + * @name: Name of the seqcount_LOCKNAME_t instance + * @lock: Pointer to the associated LOCKTYPE + */ + +#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) { \ + .seqcount = SEQCNT_ZERO(seq_name.seqcount), \ + __SEQ_LOCK(.lock= (assoc_lock)) \ +} + +#define SEQCNT_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) +#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) +#define SEQCNT_RWLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) +#define SEQCNT_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) +#define SEQCNT_WW_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock) + + #define __seqprop_case(s, lockname, prop) \ seqcount_##lockname##_t: __seqcount_##lockname##_##prop((void *)(s))
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
Kurt Kanzenbach writes: > On Wed Jul 29 2020, Russell King - ARM Linux admin wrote: >> Would it make more sense to do: >> >> u8 *data = skb_mac_header(skb); >> u8 *ptr = data; >> >> if (type & PTP_CLASS_VLAN) >> ptr += VLAN_HLEN; >> >> switch (type & PTP_CLASS_PMASK) { >> case PTP_CLASS_IPV4: >> ptr += IPV4_HLEN(ptr) + UDP_HLEN; >> break; >> >> case PTP_CLASS_IPV6: >> ptr += IP6_HLEN + UDP_HLEN; >> break; >> >> case PTP_CLASS_L2: >> break; >> >> default: >> return NULL; >> } >> >> ptr += ETH_HLEN; >> >> if (ptr + 34 > skb->data + skb->len) >> return NULL; >> >> return ptr; >> >> in other words, compare pointers, so that whether skb_push() etc has >> been used on the skb is irrelevant? I like this! > The ptp v1 code always does locate the message type at > > msgtype = data + offset + OFF_PTP_CONTROL > > OFF_PTP_CONTROL is 32. However, looking at the ptp v1 header, the > message type is located at offset 20. What am I missing here? 0x20 == 32? I see it at offset 32 in IEEE 1588.
Re: [PATCH] bpf: Add bpf_ktime_get_real_ns
On 7/28/20 11:15 PM, Andrii Nakryiko wrote: > On Tue, Jul 28, 2020 at 1:57 PM David Ahern wrote: >> >> On 7/28/20 12:28 PM, Andrii Nakryiko wrote: >>> In some, yes, which also means that in some other they can't. So I'm >>> still worried about misuses of REALCLOCK, within (internal daemons >>> within the company) or outside (BCC tools and the like) of data centers. >>> Especially if people will start using it to measure elapsed time >>> between events. I'd rather not have to explain over and over again >>> that REALCLOCK is not for measuring passage of time. >> >> Why is documenting the type of clock and its limitations not sufficient? >> Users are going to make mistakes and use of gettimeofday to measure time >> differences is a common one for userspace code. That should not define >> or limit the ability to correctly and most directly do something in bpf. >> >> I have a patch to export local_clock as bpf_ktime_get_fast_ns. It too >> can be abused given that it has limitations (can not be used across CPUs >> and does not correlate to any exported clock), but it too has important >> use cases (twice as fast as bpf_ktime_get_ns and useful for per-cpu >> delta-time needs). >> >> Users have to know what they are doing; making mistakes is part of >> learning. Proper documentation is all you can do. > > I don't believe that's all we can do. Designing APIs that are less > error-prone is at least one way to go about that. One can find plenty > of examples where well-documented and standardized APIs are > nevertheless misused regularly. Also, "users read and follow > documentation" doesn't match my experience, unfortunately. > This API is about exposing a standard, well-known clock source to bpf programs. Your argument is "no, because some users might use it incorrectly", and I think that argument is wrong. Don't make people who know what they are doing jump through hoops because a novice might make a mistake.
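To make the distinction being argued here concrete, a small sketch of how the two clock families would typically be used from a BPF program. bpf_ktime_get_ns() is an existing helper; bpf_ktime_get_real_ns() is only the helper proposed in this thread, so it is shown commented out.

/* Sketch only: monotonic deltas vs. wall-clock stamps. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_nanosleep")
int clock_demo(void *ctx)
{
	__u64 start = bpf_ktime_get_ns();	/* monotonic: safe for deltas */

	/* ... do some work ... */

	__u64 delta = bpf_ktime_get_ns() - start;

	/*
	 * A realtime helper (the one proposed in this thread) is only useful
	 * as a wall-clock stamp to correlate with external logs, never for
	 * deltas, since CLOCK_REALTIME can jump on clock adjustments:
	 *
	 *	__u64 wall = bpf_ktime_get_real_ns();
	 */
	bpf_printk("delta=%llu\n", delta);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";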
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
OK, we have a mode of operation that does not require driver intervention to manipulate the event queues so I think we're ok with this design. On Wed, Jul 29, 2020 at 06:19:52PM +0800, Jason Wang wrote: > > On 2020/7/29 下午5:55, Eli Cohen wrote: > >On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote: > >>On 2020/7/28 下午5:04, Eli Cohen wrote: > >>>On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote: > +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid) > +{ > + struct vhost_virtqueue *vq = &v->vqs[qid]; > + const struct vdpa_config_ops *ops = v->vdpa->config; > + struct vdpa_device *vdpa = v->vdpa; > + int ret, irq; > + > + spin_lock(&vq->call_ctx.ctx_lock); > + irq = ops->get_vq_irq(vdpa, qid); > + if (!vq->call_ctx.ctx || irq == -EINVAL) { > + spin_unlock(&vq->call_ctx.ctx_lock); > + return; > + } > + > >>>If I understand correctly, this will cause these IRQs to be forwarded > >>>directly to the VCPU, e.g. will be handled by the guest/qemu. > >> > >>Yes, if it can bypassed, the interrupt will be delivered to vCPU directly. > >> > >So, usually the network driver knows how to handle interrups for its > >devices. I assume the virtio_net driver at the guest has some default > >processing but what if the underlying hardware device (such as the case > >of vdpa) needs to take some actions? > > > Virtio splits the bus operations out of device operations. So did > the driver. > > The virtio-net driver depends on a transport driver to talk to the > real device. Usually PCI is used as the transport for the device. In > this case virtio-pci driver is in charge of dealing with irq > allocation/free/configuration and it needs to co-operate with > platform specific irqchip (virtualized by KVM) to finish the work > like irq acknowledge etc. E.g for x86, the irq offloading can only > work when there's a hardware support of virtual irqchip (APICv) then > all stuffs could be done without vmexits. > > So no vendor specific part since the device and transport are all standard. > > > > Is there an option to do bounce the > >interrupt back to the vendor specific driver in the host so it can take > >these actions? > > > Currently not, but even if we can do this, I'm afraid we will lose > the performance advantage of irq bypassing. > > > > > >>>Does this mean that the host will not handle this interrupt? How does it > >>>work in case on level triggered interrupts? > >> > >>There's no guarantee that the KVM arch code can make sure the irq > >>bypass work for any type of irq. So if they the irq will still need > >>to be handled by host first. This means we should keep the host > >>interrupt handler as a slowpath (fallback). > >> > >>>In the case of ConnectX, I need to execute some code to acknowledge the > >>>interrupt. > >> > >>This turns out to be hard for irq bypassing to work. Is it because > >>the irq is shared or what kind of ack you need to do? > >I have an EQ which is a queue for events comming from the hardware. This > >EQ can created so it reports only completion events but I still need to > >execute code that roughly tells the device that I saw these event > >records and then arm it again so it can report more interrupts (e.g if > >more packets are received or sent). This is device specific code. > > > Any chance that the hardware can use MSI (which is not the case here)? > > Thanks > > > >>Thanks > >> > >> > >>>Can you explain how this should be done? > >>> >
Re: [PATCH v2 4/9] mlxsw: spectrum_ptp: Use generic helper function
On Wed Jul 29 2020, Richard Cochran wrote: > On Wed, Jul 29, 2020 at 12:29:08PM +0200, Kurt Kanzenbach wrote: >> I'll test it and send v3. But before, I've got another question that >> somebody might have an answer to: >> >> The ptp v1 code always does locate the message type at >> >> msgtype = data + offset + OFF_PTP_CONTROL >> >> OFF_PTP_CONTROL is 32. However, looking at the ptp v1 header, the >> message type is located at offset 20. What am I missing here? > > My source back in the day was the John Eidson book. In Appendix A it claims:
>
> Table A.1. Common PTP message fields
>
> Field name     Purpose
> messageType    Identifies message as event or general
> control        Indicates the message type, e.g., Sync

OK. That was the missing piece. I'll adjust patch 2 accordingly. Thanks a lot. Thanks, Kurt
Re: [PATCH v4 bpf 2/2] selftests/bpf: extend map-in-map selftest to detect memory leaks
On Wed, Jul 29, 2020 at 06:09 AM CEST, Andrii Nakryiko wrote: > Add test validating that all inner maps are released properly after skeleton > is destroyed. To ensure determinism, trigger kernel-side synchronize_rcu() > before checking map existence by their IDs. > > Acked-by: Song Liu > Signed-off-by: Andrii Nakryiko > --- > .../selftests/bpf/prog_tests/btf_map_in_map.c | 124 -- > 1 file changed, 110 insertions(+), 14 deletions(-) > > diff --git a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c > b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c > index f7ee8fa377ad..f6eee3fb933c 100644 > --- a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c > +++ b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c > @@ -5,10 +5,60 @@ > > #include "test_btf_map_in_map.skel.h" > > +static int duration; > + > +static __u32 bpf_map_id(struct bpf_map *map) > +{ > + struct bpf_map_info info; > + __u32 info_len = sizeof(info); > + int err; > + > + memset(&info, 0, info_len); > + err = bpf_obj_get_info_by_fd(bpf_map__fd(map), &info, &info_len); > + if (err) > + return 0; > + return info.id; > +} > + > +/* > + * Trigger synchronize_cpu() in kernel. Nit: synchronize_*r*cu(). > + * > + * ARRAY_OF_MAPS/HASH_OF_MAPS lookup/update operations trigger > + * synchronize_rcu(), if looking up/updating non-NULL element. Use this fact > + * to trigger synchronize_cpu(): create map-in-map, create a trivial ARRAY > + * map, update map-in-map with ARRAY inner map. Then cleanup. At the end, at > + * least one synchronize_rcu() would be called. > + */ That's a cool trick. I'm a bit confused by "looking up/updating non-NULL element". It looks like you're updating an element that is NULL/unset in the code below. What am I missing? > +static int kern_sync_rcu(void) > +{ > + int inner_map_fd, outer_map_fd, err, zero = 0; > + > + inner_map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 4, 1, 0); > + if (CHECK(inner_map_fd < 0, "inner_map_create", "failed %d\n", -errno)) > + return -1; > + > + outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL, > + sizeof(int), inner_map_fd, 1, 0); > + if (CHECK(outer_map_fd < 0, "outer_map_create", "failed %d\n", -errno)) > { > + close(inner_map_fd); > + return -1; > + } > + > + err = bpf_map_update_elem(outer_map_fd, &zero, &inner_map_fd, 0); > + if (err) > + err = -errno; > + CHECK(err, "outer_map_update", "failed %d\n", err); > + close(inner_map_fd); > + close(outer_map_fd); > + return err; > +} > + > void test_btf_map_in_map(void) > { > - int duration = 0, err, key = 0, val; > - struct test_btf_map_in_map* skel; > + int err, key = 0, val, i; > + struct test_btf_map_in_map *skel; > + int outer_arr_fd, outer_hash_fd; > + int fd, map1_fd, map2_fd, map1_id, map2_id; > > skel = test_btf_map_in_map__open_and_load(); > if (CHECK(!skel, "skel_open", "failed to open&load skeleton\n")) > @@ -18,32 +68,78 @@ void test_btf_map_in_map(void) > if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err)) > goto cleanup; > > + map1_fd = bpf_map__fd(skel->maps.inner_map1); > + map2_fd = bpf_map__fd(skel->maps.inner_map2); > + outer_arr_fd = bpf_map__fd(skel->maps.outer_arr); > + outer_hash_fd = bpf_map__fd(skel->maps.outer_hash); > + > /* inner1 = input, inner2 = input + 1 */ > - val = bpf_map__fd(skel->maps.inner_map1); > - bpf_map_update_elem(bpf_map__fd(skel->maps.outer_arr), &key, &val, 0); > - val = bpf_map__fd(skel->maps.inner_map2); > - bpf_map_update_elem(bpf_map__fd(skel->maps.outer_hash), &key, &val, 0); > + map1_fd = bpf_map__fd(skel->maps.inner_map1); 
> + bpf_map_update_elem(outer_arr_fd, &key, &map1_fd, 0); > + map2_fd = bpf_map__fd(skel->maps.inner_map2); > + bpf_map_update_elem(outer_hash_fd, &key, &map2_fd, 0); > skel->bss->input = 1; > usleep(1); > > - bpf_map_lookup_elem(bpf_map__fd(skel->maps.inner_map1), &key, &val); > + bpf_map_lookup_elem(map1_fd, &key, &val); > CHECK(val != 1, "inner1", "got %d != exp %d\n", val, 1); > - bpf_map_lookup_elem(bpf_map__fd(skel->maps.inner_map2), &key, &val); > + bpf_map_lookup_elem(map2_fd, &key, &val); > CHECK(val != 2, "inner2", "got %d != exp %d\n", val, 2); > > /* inner1 = input + 1, inner2 = input */ > - val = bpf_map__fd(skel->maps.inner_map2); > - bpf_map_update_elem(bpf_map__fd(skel->maps.outer_arr), &key, &val, 0); > - val = bpf_map__fd(skel->maps.inner_map1); > - bpf_map_update_elem(bpf_map__fd(skel->maps.outer_hash), &key, &val, 0); > + bpf_map_update_elem(outer_arr_fd, &key, &map2_fd, 0); > + bpf_map_update_elem(outer_hash_fd, &key, &map1_fd, 0); > skel->bss->input = 3; > usleep(1); > > - bpf_map_lookup_elem(bpf_map__fd(
Re: [PATCH 1/5] seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
On Wed, Jul 29, 2020 at 03:52:50PM +0200, Peter Zijlstra wrote: > __SEQ_LOCKDEP() is an expression gate for the > seqcount_LOCKNAME_t::lock member. Rename it to be about the member, > not the gate condition. > > Later (PREEMPT_RT) patches will make the member available for !LOCKDEP > configs. > > Signed-off-by: Peter Zijlstra (Intel) > --- Acked-by: Ahmed S. Darwish