date:20201014

[PATCH] Bluetooth: Use lock_sock() when acquiring lock in sco_conn_del

2020-10-14 Thread yanfei . xu

From: Yanfei Xu 

The slock-AF_BLUETOOTH-BTPROTO_SCO lock may be taken in process context or in
BH context. In process context we should use lock_sock(). As the warning
below shows, sco_conn_del() is called in process context, so use
lock_sock() instead of bh_lock_sock().
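The usage conflict that lockdep reports below can be modelled in a few lines. This is a hypothetical userspace sketch, not lockdep itself. It assumes one lock class and only the two usage bits named in the report: {IN-SOFTIRQ-W} (taken from a softirq, here sco_sock_timeout() run from a timer) and {SOFTIRQ-ON-W} (taken with softirqs enabled, here bh_lock_sock() in process context). lock_sock() takes the slock with spin_lock_bh(), so in the model it counts as an acquisition with softirqs disabled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified model of one lockdep lock class. */
struct lock_class_model {
	const char *name;
	bool in_softirq_w;  /* ever acquired from softirq context      */
	bool softirq_on_w;  /* ever acquired with softirqs enabled     */
};

/*
 * Record one acquisition and report whether the class now has both usages.
 * in_softirq:       the acquisition happens inside a softirq (timer callback).
 * softirqs_enabled: the caller did not disable BHs before taking the lock.
 */
bool model_acquire(struct lock_class_model *c, bool in_softirq,
		   bool softirqs_enabled)
{
	if (in_softirq)
		c->in_softirq_w = true;
	else if (softirqs_enabled)
		c->softirq_on_w = true;

	if (c->in_softirq_w && c->softirq_on_w) {
		printf("inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage: %s\n",
		       c->name);
		return true;
	}
	return false;
}
```

With the bh_lock_sock() sequence the process-context acquisition completes the forbidden pair; with lock_sock() (softirqs_enabled = false) it does not, which is what the patch relies on.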


WARNING: inconsistent lock state
5.9.0-rc4-syzkaller #0 Not tainted

inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
syz-executor675/31233 [HC0[0]:SC0[0]:HE1:SE1] takes:
8880a75c50a0 (slock-AF_BLUETOOTH-BTPROTO_SCO){+.?.}-{2:2}, at:
spin_lock include/linux/spinlock.h:354 [inline]
8880a75c50a0 (slock-AF_BLUETOOTH-BTPROTO_SCO){+.?.}-{2:2}, at:
sco_conn_del+0x128/0x270 net/bluetooth/sco.c:176
{IN-SOFTIRQ-W} state was registered at:
  lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:354 [inline]
  sco_sock_timeout+0x24/0x140 net/bluetooth/sco.c:83
  call_timer_fn+0x1ac/0x760 kernel/time/timer.c:1413
  expire_timers kernel/time/timer.c:1458 [inline]
  __run_timers.part.0+0x67c/0xaa0 kernel/time/timer.c:1755
  __run_timers kernel/time/timer.c:1736 [inline]
  run_timer_softirq+0xae/0x1a0 kernel/time/timer.c:1768
  __do_softirq+0x1f7/0xa91 kernel/softirq.c:298
  asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
  __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
  run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
  do_softirq_own_stack+0x9d/0xd0 arch/x86/kernel/irq_64.c:77
  invoke_softirq kernel/softirq.c:393 [inline]
  __irq_exit_rcu kernel/softirq.c:423 [inline]
  irq_exit_rcu+0x235/0x280 kernel/softirq.c:435
  sysvec_apic_timer_interrupt+0x51/0xf0 arch/x86/kernel/apic/apic.c:1091
  asm_sysvec_apic_timer_interrupt+0x12/0x20
  arch/x86/include/asm/idtentry.h:581
  unwind_next_frame+0x139a/0x1f90 arch/x86/kernel/unwind_orc.c:607
  arch_stack_walk+0x81/0xf0 arch/x86/kernel/stacktrace.c:25
  stack_trace_save+0x8c/0xc0 kernel/stacktrace.c:123
  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
  kasan_set_track mm/kasan/common.c:56 [inline]
  __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
  slab_post_alloc_hook mm/slab.h:518 [inline]
  slab_alloc mm/slab.c:3312 [inline]
  kmem_cache_alloc+0x13a/0x3a0 mm/slab.c:3482
  __d_alloc+0x2a/0x950 fs/dcache.c:1709
  d_alloc+0x4a/0x230 fs/dcache.c:1788
  d_alloc_parallel+0xe9/0x18e0 fs/dcache.c:2540
  lookup_open.isra.0+0x9ac/0x1350 fs/namei.c:3030
  open_last_lookups fs/namei.c:3177 [inline]
  path_openat+0x96d/0x2730 fs/namei.c:3365
  do_filp_open+0x17e/0x3c0 fs/namei.c:3395
  do_sys_openat2+0x16d/0x420 fs/open.c:1168
  do_sys_open fs/open.c:1184 [inline]
  __do_sys_open fs/open.c:1192 [inline]
  __se_sys_open fs/open.c:1188 [inline]
  __x64_sys_open+0x119/0x1c0 fs/open.c:1188
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
irq event stamp: 853
hardirqs last  enabled at (853): []
__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
hardirqs last  enabled at (853): []
_raw_spin_unlock_irq+0x1f/0x80 kernel/locking/spinlock.c:199
hardirqs last disabled at (852): []
__raw_spin_lock_irq include/linux/spinlock_api_smp.h:126 [inline]
hardirqs last disabled at (852): []
_raw_spin_lock_irq+0xa4/0xd0 kernel/locking/spinlock.c:167
softirqs last  enabled at (0): []
copy_process+0x1a99/0x6920 kernel/fork.c:2018
softirqs last disabled at (0): [<>] 0x0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
  <Interrupt>
    lock(slock-AF_BLUETOOTH-BTPROTO_SCO);

 *** DEADLOCK ***

3 locks held by syz-executor675/31233:
 #0: 88809f104f40 (&hdev->req_lock){+.+.}-{3:3}, at:
hci_dev_do_close+0xf5/0x1080 net/bluetooth/hci_core.c:1720
 #1: 88809f104078 (&hdev->lock){+.+.}-{3:3}, at:
hci_dev_do_close+0x253/0x1080 net/bluetooth/hci_core.c:1757
 #2: 8a9188c8 (hci_cb_list_lock){+.+.}-{3:3}, at:
hci_disconn_cfm include/net/bluetooth/hci_core.h:1435 [inline]
 #2: 8a9188c8 (hci_cb_list_lock){+.+.}-{3:3}, at:
hci_conn_hash_flush+0xc7/0x220 net/bluetooth/hci_conn.c:1557

stack backtrace:
CPU: 1 PID: 31233 Comm: syz-executor675 Not tainted 5.9.0-rc4-syzkaller
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x198/0x1fd lib/dump_stack.c:118
 print_usage_bug kernel/locking/lockdep.c:4020 [inline]
 valid_state kernel/locking/lockdep.c:3361 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3560 [inline]
 mark_lock.cold+0x7a/0x7f kernel/locking/lockdep.c:4006
 mark_usage kernel/locking/lockdep.c:3923 [inline]
 __lock_acquire+0x876/0x5570 kernel/locking/lockdep.c:4380
 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]

Re: [PATCH net-next v2 00/12] net: add and use function dev_fetch_sw_netstats for fetching pcpu_sw_netstats

2020-10-14 Thread Leon Romanovsky

On Wed, Oct 14, 2020 at 08:13:47AM +0200, Heiner Kallweit wrote:
> On 14.10.2020 07:42, Leon Romanovsky wrote:
> > On Tue, Oct 13, 2020 at 05:39:51PM -0700, Jakub Kicinski wrote:
> >> On Mon, 12 Oct 2020 10:00:11 +0200 Heiner Kallweit wrote:
> >>> In several places the same code is used to populate rtnl_link_stats64
> >>> fields with data from pcpu_sw_netstats. Therefore factor out this code
> >>> to a new function dev_fetch_sw_netstats().
> >>>
> >>> v2:
> >>> - constify argument netstats
> >>> - don't ignore netstats being NULL or an ERRPTR
> >>> - switch to EXPORT_SYMBOL_GPL
> >>
> >> Applied, thank you!
> >
> > Jakub,
> >
> > Is it possible to make sure that changelogs are not part of the commit
> > messages? We don't store previous revisions in the git repo, so it doesn't
> > give too much to anyone who is looking on git log later. The lore link
> > to the patch is more than enough.
> >
> I remember that once I did it the usual way (changelog below the ---) David
> requested the changelog to be part of the commit message. So obviously he
> sees some benefit in doing so.

Do you have a link? What is the benefit and how can we use it?

Usually such a request is made to ensure that the commit message is updated
with extra information (explanation) that existed in the changelog but is
missing from the patch.

Thanks

>
> > 44fa32f008ab ("net: add function dev_fetch_sw_netstats for fetching 
> > pcpu_sw_netstats")
> >
> > Thanks
> >
>
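A minimal userspace sketch of what the helper described in the quoted cover letter does: sum each CPU's software counters into one rtnl_link_stats64-like structure. The struct and function names are hypothetical stand-ins. The kernel code additionally has to read each CPU's counters consistently (pcpu_sw_netstats carries a u64_stats_sync for that), which this single-threaded sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pcpu_sw_netstats (one per CPU). */
struct sw_netstats_model {
	uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes;
};

/* Hypothetical stand-in for the fields of struct rtnl_link_stats64 used here. */
struct link_stats_model {
	uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes;
};

/*
 * Add every CPU's counters into *s. This sketch accumulates (+=) rather than
 * overwriting, so a caller can combine it with other counters in *s.
 */
void fetch_sw_netstats_model(struct link_stats_model *s,
			     const struct sw_netstats_model *percpu,
			     size_t ncpus)
{
	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		/* Take a snapshot first; the kernel retries this read on races. */
		struct sw_netstats_model tmp = percpu[cpu];

		s->rx_packets += tmp.rx_packets;
		s->rx_bytes   += tmp.rx_bytes;
		s->tx_packets += tmp.tx_packets;
		s->tx_bytes   += tmp.tx_bytes;
	}
}
```

Factoring this loop into one function is what lets the twelve patches replace open-coded copies of it in the individual drivers.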

RE: [PATCH 1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue

2020-10-14 Thread Dylan Hung

Hi Joel,

> -Original Message-
> From: Joel Stanley [mailto:j...@jms.id.au]
> Sent: Wednesday, October 14, 2020 2:41 PM
> To: Dylan Hung 
> Cc: David S . Miller ; Jakub Kicinski
> ; netdev@vger.kernel.org; Linux Kernel Mailing List
> ; Po-Yu Chuang ;
> linux-aspeed ; OpenBMC Maillist
> ; BMC-SW 
> Subject: Re: [PATCH 1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue
> 
> On Wed, 14 Oct 2020 at 06:07, Dylan Hung 
> wrote:
> >
> > The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
> > hang when handling scatter-gather DMA.  Disable the problematic
> > feature by setting MAC register 0x58 bit28 and bit27.
> 
> Hi Dylan,
> 
> What are the symptoms of this issue? We are seeing this on our systems:
> 
> [29376.090637] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:442 dev_watchdog+0x2f0/0x2f4
> [29376.099898] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0 timed out
> 

Which SoC version are you using? This issue happens on ast2600 version A1. The
register bits used to fix this issue are meaningless/reserved on the A0 chip, so
it is okay to set them on either A0 or A1.
I encountered this issue while running an iperf TX test. The symptom is that
the TX descriptors are consumed, but no complete packet is sent out.

> > Signed-off-by: Dylan Hung 
> 
> This fixes support for the ast2600, so we can put:
> 
> Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
> 
> Reviewed-by: Joel Stanley 
> 
> > ---
> >  drivers/net/ethernet/faraday/ftgmac100.c | 5 +
> > drivers/net/ethernet/faraday/ftgmac100.h | 8 
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> > index 87236206366f..00024dd41147 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -1817,6 +1817,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
> > priv->rxdes0_edorr_mask = BIT(30);
> > priv->txdes0_edotr_mask = BIT(30);
> > priv->is_aspeed = true;
> > +   /* Disable ast2600 problematic HW arbitration */
> > +   if (of_device_is_compatible(np, "aspeed,ast2600-mac")) {
> > +   iowrite32(FTGMAC100_TM_DEFAULT,
> > + priv->base + FTGMAC100_OFFSET_TM);
> > +   }
> > } else {
> > priv->rxdes0_edorr_mask = BIT(15);
> > priv->txdes0_edotr_mask = BIT(15);
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.h b/drivers/net/ethernet/faraday/ftgmac100.h
> > index e5876a3fda91..63b3e02fab16 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.h
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.h
> > @@ -169,6 +169,14 @@
> >  #define FTGMAC100_MACCR_FAST_MODE  (1 << 19)
> >  #define FTGMAC100_MACCR_SW_RST (1 << 31)
> >
> > +/*
> > + * test mode control register
> > + */
> > +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> > +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> > +#define FTGMAC100_TM_DEFAULT \
> > +   (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
> 
> Will aspeed issue an updated datasheet with this register documented?
> 
> 
> > +
> >  /*
> >   * PHY control register
> >   */
> > --
> > 2.17.1
> >
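The bit arithmetic behind the fix can be checked in isolation. This sketch reuses the macro names from the patch (with unsigned constants), but the register window, the offset name MODEL_OFFSET_TM and the helper are hypothetical; the 0x58 offset comes from the commit message. Like the hunk, it overwrites the whole register instead of doing a read-modify-write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions as proposed in the patch (test mode control register). */
#define FTGMAC100_TM_RQ_TX_VALID_DIS (1U << 28)
#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1U << 27)
#define FTGMAC100_TM_DEFAULT \
	(FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)

/* Assumption: byte offset 0x58, from "MAC register 0x58" in the commit message. */
#define MODEL_OFFSET_TM 0x58

/* Hypothetical userspace stand-in for the MAC's MMIO register window. */
struct mac_model {
	uint32_t regs[0x100 / 4];
};

/* Mirrors the probe-time hunk: only the ast2600 MAC gets the write. */
void model_disable_hw_arbitration(struct mac_model *mac, bool is_ast2600)
{
	if (is_ast2600)
		mac->regs[MODEL_OFFSET_TM / 4] = FTGMAC100_TM_DEFAULT;
}
```

FTGMAC100_TM_DEFAULT evaluates to 0x18000000, that is, bits 28 and 27 set and everything else cleared.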

Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register

2020-10-14 Thread Pablo Neira Ayuso

On Wed, Oct 14, 2020 at 02:06:28AM +0200, Pablo Neira Ayuso wrote:
> On Fri, Oct 09, 2020 at 10:05:48PM +0200, Florian Westphal wrote:
> > Jozsef Kadlecsik  wrote:
> > > > The "delay unregister" remark was wrt. the "all rules were deleted"
> > > > case, i.e. add a "grace period" rather than acting right away when
> > > > conntrack use count did hit 0.
> > > 
> > > Now I understand it, thanks really. The hooks are removed, so conntrack 
> > > cannot "see" the packets and the entries become stale. 
> > 
> > Yes.
> > 
> > > What is the rationale behind "remove the conntrack hooks when there are no
> > > rule left referring to conntrack"? Performance optimization? But then the
> > > content of the whole conntrack table could be deleted too... ;-)
> > 
> > Yes, this isn't the case at the moment -- only hooks are removed,
> > entries will eventually time out.
> > 
> > > > Conntrack entries are not removed, only the base hooks get 
> > > > unregistered. 
> > > > This is a problem for tcp window tracking.
> > > > 
> > > > When re-register occurs, kernel is supposed to switch the existing 
> > > > entries to "loose" mode so window tracking won't flag packets as 
> > > > invalid, but apparently this isn't enough to handle keepalive case.
> > > 
> > > "loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
> > > enables/disables picking up already established connections. 
> > > 
> > > nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
> > > for non RST packets.
> > 
> > You are right, mixup on my part.
> > 
> > > But both seems to be modified only via the proc entries.
> > 
> > Yes, we iterate table on re-register and modify the existing entries.
> 
> For iptables-nft, it might be possible to avoid this deregister +
> register ct hooks in the same transaction: Maybe add something like
> nf_ct_netns_get_all() to bump refcounters by one _iff_ they are > 0
> before starting the transaction processing, then call
> nf_ct_netns_put_all() which decrements refcounters and unregister
> hooks if they reach 0.

Hm, scratch that, put_all() would create an imbalance with this
conditional increment.

> The only problem with this approach is that this pulls in the
> conntrack module, to solve that, struct nf_ct_hook in
> net/netfilter/core.c could be used to store the reference to
> ->netns_get_all and ->net_put_all.
> 
> Legacy would still be flawed though.
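The imbalance Pablo points out can be reproduced with a toy refcount. The sketch below is hypothetical (it is not nf_ct_netns_get()/nf_ct_netns_put()). It assumes hooks are registered on the 0 -> 1 transition and unregistered on 1 -> 0, and covers the case where the count starts at zero: the conditional get does nothing, but the final put still runs.

```c
#include <stdbool.h>

/* Hypothetical model of one conntrack hook refcount in a netns. */
struct ct_hooks_model {
	int refcnt;
	bool registered;
};

/* A rule starts using conntrack: register hooks on 0 -> 1. */
void model_get(struct ct_hooks_model *h)
{
	if (h->refcnt++ == 0)
		h->registered = true;
}

/* A rule stops using conntrack: unregister hooks on 1 -> 0. */
void model_put(struct ct_hooks_model *h)
{
	if (--h->refcnt == 0)
		h->registered = false;
}

/* Proposed pre-transaction step: bump only if already in use ... */
void model_get_all(struct ct_hooks_model *h)
{
	if (h->refcnt > 0)
		h->refcnt++;
}

/* ... but drop unconditionally after the transaction. */
void model_put_all(struct ct_hooks_model *h)
{
	model_put(h);
}
```

If the transaction adds the first conntrack rule, the unconditional put cancels that rule's reference and the hooks disappear while the rule still needs them. When the count was already positive, the pair does balance.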

[PATCH 2/3] dt-bindings: net: bluetooth: Add broadcom BCM4389 support

2020-10-14 Thread Amitesh Chandra

From: Amitesh Chandra 

Add bindings for BCM4389 bluetooth controller.

Signed-off-by: Amitesh Chandra 
---
 Documentation/devicetree/bindings/net/broadcom-bluetooth.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt 
b/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
index 4fa00e2..ae48e42 100644
--- a/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
+++ b/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
@@ -14,6 +14,7 @@ Required properties:
* "brcm,bcm4330-bt"
* "brcm,bcm43438-bt"
* "brcm,bcm4345c5"
+   * "brcm,bcm4389-bt"
 
 Optional properties:
 
-- 
2.7.4

Re: [PATCH net-next v2 00/12] net: add and use function dev_fetch_sw_netstats for fetching pcpu_sw_netstats

2020-10-14 Thread Heiner Kallweit

On 14.10.2020 09:53, Leon Romanovsky wrote:
> On Wed, Oct 14, 2020 at 08:13:47AM +0200, Heiner Kallweit wrote:
>> On 14.10.2020 07:42, Leon Romanovsky wrote:
>>> On Tue, Oct 13, 2020 at 05:39:51PM -0700, Jakub Kicinski wrote:
 On Mon, 12 Oct 2020 10:00:11 +0200 Heiner Kallweit wrote:
> In several places the same code is used to populate rtnl_link_stats64
> fields with data from pcpu_sw_netstats. Therefore factor out this code
> to a new function dev_fetch_sw_netstats().
>
> v2:
> - constify argument netstats
> - don't ignore netstats being NULL or an ERRPTR
> - switch to EXPORT_SYMBOL_GPL

 Applied, thank you!
>>>
>>> Jakub,
>>>
>>> Is it possible to make sure that changelogs are not part of the commit
>>> messages? We don't store previous revisions in the git repo, so it doesn't
>>> give too much to anyone who is looking on git log later. The lore link
>>> to the patch is more than enough.
>>>
>> I remember that once I did it the usual way (changelog below the ---) David
>> requested the changelog to be part of the commit message. So obviously he
>> sees some benefit in doing so.
> 
> Do you have a link? What is the benefit and how can we use it?
> 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1873080.html

> Usually such request comes to ensure that commit message is updated with
> extra information (explanation) existed in changelog which is missing in
> the patch.
> 
> Thanks
> 
>>
>>> 44fa32f008ab ("net: add function dev_fetch_sw_netstats for fetching 
>>> pcpu_sw_netstats")
>>>
>>> Thanks
>>>
>>

Re: [PATCH net-next v2 00/12] net: add and use function dev_fetch_sw_netstats for fetching pcpu_sw_netstats

2020-10-14 Thread Johannes Berg

On Wed, 2020-10-14 at 09:59 +0200, Heiner Kallweit wrote:
> 
> > Do you have a link? What is the benefit and how can we use it?
> > 
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1873080.html

There was also a long discussion a year or so back, starting at

http://lore.kernel.org/r/7b73e1b7-cc34-982d-2a9c-acf62b88d...@linuxfoundation.org

johannes

Re: [PATCH net v2] net: fix pos incrementment in ipv6_route_seq_next

2020-10-14 Thread Eric Dumazet




On 10/13/20 8:31 PM, Yonghong Song wrote:
> Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> tried to fix the issue where seq_file pos is not increased
> if a NULL element is returned with seq_ops->next(). See bug
>   https://bugzilla.kernel.org/show_bug.cgi?id=206283
> The commit effectively does:
>   - increase pos for all seq_ops->start()
>   - increase pos for all seq_ops->next()
> 
> For ipv6_route, increasing pos for all seq_ops->next() is correct.
> But increasing pos for seq_ops->start() is not correct
> since pos is used to determine how many items to skip during
> seq_ops->start():
>   iter->skip = *pos;
> seq_ops->start() just fetches the *current* pos item.
> The item can be skipped only after seq_ops->show() which essentially
> is the beginning of seq_ops->next().
> 
> For example, I have 7 ipv6 route entries,
>   root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
>    40  00 
>  0400 0001  0001 eth0
>   fe80 40  00 
>  0100 0001  0001 eth0
>    00  00 
>   0001  00200200   lo
>   0001 80  00 
>   0003  8021   lo
>   fe802050e3fffebd3be8 80  00 
>   0002  8021 eth0
>   ff00 08  00 
>  0100 0004  0001 eth0
>    00  00 
>   0001  00200200   lo
>   0+1 records in
>   0+1 records out
>   1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
>   root@arch-fb-vm1:~/net-next
> 
> In the above, I specify buffer size 4096, so all records can be returned
> to user space with a single trip to the kernel.
> 
> If I use buffer size 128, since each record size is 149, internally
> kernel seq_read() will read 149 into its internal buffer and return the data
> to user space in two read() syscalls. Then user read() syscall will trigger
> next seq_ops->start(). Since the current implementation increased pos even
> for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
> record is #1.
> 
>   root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
>    40  00 
>  0400 0001  0001 eth0
>    00  00 
>   0001  00200200   lo
>   fe802050e3fffebd3be8 80  00 
>   0002  8021 eth0
>    00  00 
>   0001  00200200   lo
> 4+1 records in
> 4+1 records out
> 600 bytes copied, 0.00127758 s, 470 kB/s
> 
> To fix the problem, create a fake pos pointer so seq_ops->start()
> won't actually increase seq_file pos. With this fix, the
> above `dd` command with `bs=128` will show correct result.
> 
> Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> Cc: Vasily Averin 
> Cc: Andrii Nakryiko 
> Cc: Alexei Starovoitov 
> Suggested-by: Vasily Averin 
> Signed-off-by: Yonghong Song 
> ---
>  net/ipv6/ip6_fib.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Changelog:
>  v1 -> v2:
>   - instead of moving the *pos increment in ipv6_route_seq_next() so that
>     it applies to seq_ops->next() only, add a fake pos pointer in
>     seq_ops->start() and use it when calling ipv6_route_seq_next().
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 141c0a4c569a..e633b2b7deda 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -2622,8 +2622,10 @@ static void *ipv6_route_seq_start(struct seq_file 
> *seq, loff_t *pos)
>   iter->skip = *pos;
>  
>   if (iter->tbl) {
> + loff_t p;

Please init this, otherwise I can guarantee syzbot will not be happy.

p = *pos;

> +
>   ipv6_route_seq_setup_walk(iter, net);
> - return ipv6_route_seq_next(seq, NULL, pos);
> + return ipv6_route_seq_next(seq, NULL, &p);
>   } else {
>   return NULL;
>   }
>
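The skipping can be reproduced with a small model of the start()/next() pair. Everything here is a hypothetical userspace sketch: records are just indices 0..6 (record #1 is index 0), and, as in the bs=128 case above, every start() is followed by one show() and one next() before the buffer is full. Passing fixed = 1 walks on an initialised local copy of *pos, as v2 does with Eric's `p = *pos;`.

```c
#define NREC 7  /* seven routes, as in the example above */

/*
 * Model of ipv6_route_seq_start(): return the record at index *pos by
 * "walking" with ipv6_route_seq_next(seq, NULL, walk_pos), which always
 * bumps the position it is given.
 */
int model_seq_start(long *pos, int fixed)
{
	long p = *pos;                    /* initialised copy, as requested */
	long *walk_pos = fixed ? &p : pos;
	int idx = (int)*walk_pos;

	++*walk_pos;                      /* the increment added by 4fc427e05158 */
	return idx < NREC ? idx : -1;
}

/* Model of seq_ops->next(): advance past v and bump *pos (correct). */
int model_seq_next(int v, long *pos)
{
	++*pos;
	return v + 1 < NREC ? v + 1 : -1;
}

/* Drive the iterator the way seq_read() does here; shown records go to out[]. */
int model_read_all(int fixed, int out[NREC])
{
	long pos = 0;
	int n = 0;

	for (;;) {
		int v = model_seq_start(&pos, fixed);

		if (v < 0)
			break;
		out[n++] = v;               /* seq_ops->show() */
		model_seq_next(v, &pos);    /* called before the buffer is seen full */
	}
	return n;
}
```

With fixed = 0 the model shows indices 0, 2, 4 and 6 (records #1, #3, #5, #7), matching the four records in the bs=128 output; with fixed = 1 it shows all seven.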

Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register

2020-10-14 Thread Florian Westphal

Pablo Neira Ayuso  wrote:
> > Yes, we iterate table on re-register and modify the existing entries.
> 
> For iptables-nft, it might be possible to avoid this deregister +
> register ct hooks in the same transaction: Maybe add something like
> nf_ct_netns_get_all() to bump refcounters by one _iff_ they are > 0
> before starting the transaction processing, then call
> nf_ct_netns_put_all() which decrements refcounters and unregister
> hooks if they reach 0.

No need, it's already fine. The decrement happens from the destroy path,
so the new rules are already in place.

> The only problem with this approach is that this pulls in the
> conntrack module, to solve that, struct nf_ct_hook in
> net/netfilter/core.c could be used to store the reference to
> ->netns_get_all and ->net_put_all.
> 
> Legacy would still be flawed though.

It's fine too: the new rule blob gets handled (and match/target checkentry
called) before the old one is dismantled.

We only have a 0 refcount + hook unregister when rules get
flushed/removed explicitly.
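Florian's ordering argument can be checked with a toy refcount (hypothetical names, not the conntrack API). When a ruleset is replaced, the new rules take their reference before the old rules drop theirs, so the count never passes through zero; only an explicit flush does.

```c
#include <stdbool.h>

/* Hypothetical model of the conntrack hook refcount. */
struct ct_hooks_model {
	int refcnt;
	bool registered;
	int unregister_events;  /* how often the hooks went away */
};

void model_get(struct ct_hooks_model *h)
{
	if (h->refcnt++ == 0)
		h->registered = true;
}

void model_put(struct ct_hooks_model *h)
{
	if (--h->refcnt == 0) {
		h->registered = false;
		h->unregister_events++;
	}
}

/* Replace a ct-using ruleset with another ct-using one: new rules are
 * checked (get) before the old ones reach the destroy path (put). */
void model_replace(struct ct_hooks_model *h)
{
	model_get(h);
	model_put(h);
}

/* Explicit flush: only a put, so the count can reach zero. */
void model_flush(struct ct_hooks_model *h)
{
	model_put(h);
}
```

In this model, repeated replacements leave the hooks registered the whole time, and the only unregister event comes from the flush.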

Re: [PATCH net-next v2 00/12] net: add and use function dev_fetch_sw_netstats for fetching pcpu_sw_netstats

2020-10-14 Thread Leon Romanovsky

On Wed, Oct 14, 2020 at 10:01:20AM +0200, Johannes Berg wrote:
> On Wed, 2020-10-14 at 09:59 +0200, Heiner Kallweit wrote:
> >
> > > Do you have a link? What is the benefit and how can we use it?
> > >
> > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1873080.html

So why is it useful?
The combination of the Link: tag, b4 and git range-diff provides all of this
in a much more reliable way.

>
> There was also a long discussion a year or so back, starting at
>
> http://lore.kernel.org/r/7b73e1b7-cc34-982d-2a9c-acf62b88d...@linuxfoundation.org

I participated in that discussion too :)

Thanks

>
> johannes
>

Re: [PATCH net-next v2 00/12] net: add and use function dev_fetch_sw_netstats for fetching pcpu_sw_netstats

2020-10-14 Thread Heiner Kallweit

On 14.10.2020 07:42, Leon Romanovsky wrote:
> On Tue, Oct 13, 2020 at 05:39:51PM -0700, Jakub Kicinski wrote:
>> On Mon, 12 Oct 2020 10:00:11 +0200 Heiner Kallweit wrote:
>>> In several places the same code is used to populate rtnl_link_stats64
>>> fields with data from pcpu_sw_netstats. Therefore factor out this code
>>> to a new function dev_fetch_sw_netstats().
>>>
>>> v2:
>>> - constify argument netstats
>>> - don't ignore netstats being NULL or an ERRPTR
>>> - switch to EXPORT_SYMBOL_GPL
>>
>> Applied, thank you!
> 
> Jakub,
> 
> Is it possible to make sure that changelogs are not part of the commit
> messages? We don't store previous revisions in the git repo, so it doesn't
> give too much to anyone who is looking on git log later. The lore link
> to the patch is more than enough.
> 
I remember that once, when I did it the usual way (changelog below the ---),
David requested the changelog to be part of the commit message. So obviously
he sees some benefit in doing so.

> 44fa32f008ab ("net: add function dev_fetch_sw_netstats for fetching 
> pcpu_sw_netstats")
> 
> Thanks
>

Re: [PATCH] Add support for mv88e6393x family of Marvell.

2020-10-14 Thread kernel test robot

Hi Pavana,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.9 next-20201013]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Pavana-Sharma/Add-support-for-mv88e6393x-family-of-Marvell/20201014-130754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
b5fc7a89e58bcc059a3d5e4db79c481fb437de59
config: riscv-randconfig-r035-20201014 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
e7fe3c6dfede8d5781bd000741c3dea7088307a4)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/0baa6c1f154d28ded190828d5b70521d78bf1239
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Pavana-Sharma/Add-support-for-mv88e6393x-family-of-Marvell/20201014-130754
git checkout 0baa6c1f154d28ded190828d5b70521d78bf1239
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:564:9: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   return inw(addr);
  ^
   arch/riscv/include/asm/io.h:55:76: note: expanded from macro 'inw'
   #define inw(c)  ({ u16 __v; __io_pbr(); __v = 
readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
   
~~ ^
   arch/riscv/include/asm/mmio.h:88:76: note: expanded from macro 'readw_cpu'
   #define readw_cpu(c)({ u16 __r = le16_to_cpu((__force 
__le16)__raw_readw(c)); __r; })

^
   include/uapi/linux/byteorder/little_endian.h:36:51: note: expanded from 
macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
 ^
   In file included from drivers/net/dsa/mv88e6xxx/serdes.c:10:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:572:9: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   return inl(addr);
  ^
   arch/riscv/include/asm/io.h:56:76: note: expanded from macro 'inl'
   #define inl(c)  ({ u32 __v; __io_pbr(); __v = 
readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
   
~~ ^
   arch/riscv/include/asm/mmio.h:89:76: note: expanded from macro 'readl_cpu'
   #define readl_cpu(c)({ u32 __r = le32_to_cpu((__force 
__le32)__raw_readl(c)); __r; })

^
   include/uapi/linux/byteorder/little_endian.h:34:51: note: expanded from 
macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
 ^
   In file included from drivers/net/dsa/mv88e6xxx/serdes.c:10:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:580:2: warning: performing pointer arithmetic on a 
nul

Re: [PATCH bpf] bpf: sockmap: add locking annotations to iterator

2020-10-14 Thread Jakub Sitnicki

On Mon, Oct 12, 2020 at 11:18 AM CEST, Lorenz Bauer wrote:
> The sparse checker currently outputs the following warnings:
>
> include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 
> 'sock_hash_seq_start' - wrong count at exit
> include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 
> 'sock_map_seq_start' - wrong count at exit
>
> Add the necessary __acquires and __release annotations to make the
> iterator locking schema palatable to sparse. Also add __must_hold
> for good measure.
>
> The kernel codebase uses both __acquires(rcu) and __acquires(RCU).
> I couldn't find any guidance which one is preferred, so I used
> what is easier to type out.
>
> Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash")
> Reported-by: kernel test robot 
> Signed-off-by: Lorenz Bauer 
> ---

Acked-by: Jakub Sitnicki
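A sketch of the annotation pattern, assuming a userspace build. The macro fallbacks follow the shape of the kernel's definitions in compiler_types.h (empty unless sparse defines __CHECKER__), and the model_* functions are hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock() and the sockmap seq_start/next/stop callbacks. The lock helpers are static inline, like the real ones, so sparse can follow the context change into the caller.

```c
#include <stddef.h>

/* Under sparse, these use the context attribute; otherwise they vanish. */
#ifdef __CHECKER__
# define __acquires(x)  __attribute__((context(x, 0, 1)))
# define __releases(x)  __attribute__((context(x, 1, 0)))
# define __must_hold(x) __attribute__((context(x, 1, 1)))
# define __acquire(x)   __context__(x, 1)
# define __release(x)   __context__(x, -1)
#else
# define __acquires(x)
# define __releases(x)
# define __must_hold(x)
# define __acquire(x)   (void)0
# define __release(x)   (void)0
#endif

/* Hypothetical stand-in for RCU read-side nesting depth. */
static int model_rcu_depth;

static inline void model_rcu_read_lock(void)
{
	__acquire(rcu);
	model_rcu_depth++;
}

static inline void model_rcu_read_unlock(void)
{
	__release(rcu);
	model_rcu_depth--;
}

/* seq_start returns with the lock held ... */
void *model_seq_start(void) __acquires(rcu)
{
	model_rcu_read_lock();
	return NULL;
}

/* ... seq_next runs with it held ... */
void *model_seq_next(void *v) __must_hold(rcu)
{
	(void)v;
	return NULL;
}

/* ... and seq_stop drops it. */
void model_seq_stop(void *v) __releases(rcu)
{
	(void)v;
	model_rcu_read_unlock();
}
```

Without the annotations, sparse would see start exit with one more context reference than it entered with, which is the "wrong count at exit" warning quoted above.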

Re: [Patch net v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly

2020-10-14 Thread Xie He

On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
 wrote:
>
> There is agreement that hard_header_len should be the length of link
> layer headers visible to the upper layers, needed_headroom the
> additional room required for headers that are not exposed, i.e., those
> pushed inside ndo_start_xmit.
>
> The link layer header length also has to agree with the interface
> hardware type (ARPHRD_..).
>
> Tunnel devices have not always been consistent in this, but today
> "bare" ip tunnel devices without additional headers (ipip, sit, ..) do
> match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
> geneve also conform to this. Known exception that probably needs to be
> addressed is sit, which still advertises LL_MAX_HEADER and so has
> exposed quite a few syzkaller issues. Side note, it is not entirely
> clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
> why they are needed.
>
> GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
> The second makes sense, as it appears as an Ethernet device. The first
> should match "bare" ip tunnel devices, if following the above logic.
> Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
> header around in collect medata mode") implements. It changes
> dev->type to ARPHRD_NONE in collect_md mode.
>
> Some of the inconsistency comes from the various modes of the GRE
> driver. Which brings us to ipgre_header_ops. It is set only in two
> special cases.
>
> Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
> added ipgre_header_ops.parse to be able to receive the inner ip source
> address with PF_PACKET recvfrom. And apparently relies on
> ipgre_header_ops.create to be able to set an address, which implies
> SOCK_DGRAM.
>
> The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
> implementation starts with the beautiful comment "/* Nice toy.
> Unfortunately, useless in real life :-)". From the rest of that
> detailed comment, it is not clear to me why it would need to expose
> the headers. The example does not use packet sockets.
>
> A packet socket cannot know devices details such as which configurable
> mode a device may be in. And different modes conflict with the basic
> rule that for a given well defined link layer type, i.e., dev->type,
> header length can be expected to be consistent. In an ideal world
> these exceptions would not exist, therefore.
>
> Unfortunately, this is legacy behavior that will have to continue to
> be supported.

Thanks for your explanation. So header_ops for GRE devices is only
used in 2 special situations. In normal situations, header_ops is not
used for GRE devices. And we consider not using header_ops to be
the ideal arrangement for GRE devices.

Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE
devices that use header_ops? I guess changing dev->type will not
affect the interface to the user space? This way we can solve the
problem of the same dev->type having different hard_header_len values.

Also, for the second special situation, if there's no obvious reason
to use header_ops, maybe we can consider removing header_ops for this
situation.
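The rule Willem states, and the conflict that motivates Xie's question, can be expressed as a small model. The struct, type constants and helpers are hypothetical. Header sizes assume an IPv4 outer header without options (20 bytes) plus a base GRE header without key, checksum or sequence number (4 bytes), and the lower device's own headroom is ignored.

```c
#include <stdbool.h>

#define MODEL_IPV4_HLEN 20u  /* outer IPv4 header, no options       */
#define MODEL_GRE_HLEN   4u  /* base GRE header, no key/csum/seq    */

/* Hypothetical stand-ins for dev->type values. */
enum model_dev_type { MODEL_TYPE_IPGRE, MODEL_TYPE_IPGRE_SPECIAL };

/* Hypothetical subset of struct net_device. */
struct gre_dev_model {
	enum model_dev_type type;
	bool has_header_ops;          /* ipgre_header_ops installed? */
	unsigned int hard_header_len;
	unsigned int needed_headroom;
};

/* Headers visible to upper layers go in hard_header_len; headers pushed
 * privately in ndo_start_xmit go in needed_headroom. */
void model_set_gre_lengths(struct gre_dev_model *d)
{
	unsigned int encap = MODEL_IPV4_HLEN + MODEL_GRE_HLEN;

	if (d->has_header_ops) {
		d->hard_header_len = encap;  /* exposed via header_ops->create */
		d->needed_headroom = 0;
	} else {
		d->hard_header_len = 0;      /* "bare" tunnel */
		d->needed_headroom = encap;
	}
}

/* A packet socket only knows dev->type, so equal types should imply equal
 * hard_header_len. */
bool model_type_consistent(const struct gre_dev_model *a,
			   const struct gre_dev_model *b)
{
	return a->type != b->type || a->hard_header_len == b->hard_header_len;
}
```

Two devices sharing one type but differing in header_ops fail the check; giving the header_ops variant its own type (MODEL_TYPE_IPGRE_SPECIAL is only a placeholder for the proposed ARPHRD_IPGRE_SPECIAL) makes it pass.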

RE: [PATCH 14/17] can: flexcan: remove ack_grp and ack_bit handling from driver

2020-10-14 Thread Joakim Zhang


> -Original Message-
> From: Marc Kleine-Budde 
> Sent: October 8, 2020 5:32
> To: netdev@vger.kernel.org
> Cc: da...@davemloft.net; linux-...@vger.kernel.org;
> ker...@pengutronix.de; Marc Kleine-Budde ; Joakim
> Zhang 
> Subject: [PATCH 14/17] can: flexcan: remove ack_grp and ack_bit handling from
> driver
> 
> Since commit:
> 
> 048e3a34a2e7 can: flexcan: poll MCR_LPM_ACK instead of GPR ACK for
> stop mode acknowledgment
> 
> the driver polls the IP core's internal bit MCR[LPM_ACK] as stop mode
> acknowledge and not the acknowledgment on chip level.
> 
> This means the 4th and 5th values of the property "fsl,stop-mode" aren't used
> anymore. This patch removes the unused "ack_gpr" and "ack_bit" from the
> driver.
> 
> Link: http://lore.kernel.org/r/20201006203748.1750156-15-mkl@pengutronix.de
> Fixes: 048e3a34a2e7 ("can: flexcan: poll MCR_LPM_ACK instead of GPR ACK
> for stop mode acknowledgment")
> Cc: Joakim Zhang 
> Signed-off-by: Marc Kleine-Budde 

[...]
>   /* stop mode property format is:
> -  * <&gpr req_gpr req_bit ack_gpr ack_bit>.
> +  * <&gpr req_gpr>.

Hi Marc,

Sorry for the delayed response. The stop mode property format should be
"<&gpr req_gpr req_bit>". I saw this change has already gone into linux-next,
so I will correct it the next time I upstream the wakeup function for i.MX8.
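
To illustrate the three-cell layout described above, here is a minimal userspace sketch that extracts req_gpr and req_bit from the raw cells of an "fsl,stop-mode" property. The struct and function names are hypothetical (this is not the flexcan driver's code), the cells are assumed to be already converted to host byte order, and the phandle is treated as an opaque value. Trailing cells from the old five-cell layout are simply ignored, which is one way older device trees could keep working without a dts update.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of "fsl,stop-mode" = <&gpr req_gpr req_bit>. */
struct stop_mode_cfg {
	uint32_t gpr_phandle; /* cell 0: phandle of the GPR syscon node */
	uint32_t req_gpr;     /* cell 1: GPR register offset */
	uint32_t req_bit;     /* cell 2: bit used to request stop mode */
};

/*
 * Decode the first three cells. Returns 0 on success, -1 if fewer than
 * three cells are present. Extra cells (the old ack_gpr/ack_bit values)
 * are ignored.
 */
static int parse_stop_mode(const uint32_t *cells, size_t ncells,
			   struct stop_mode_cfg *cfg)
{
	if (ncells < 3)
		return -1;
	cfg->gpr_phandle = cells[0];
	cfg->req_gpr = cells[1];
	cfg->req_bit = cells[2];
	return 0;
}
```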

Do I need to update the stop mode property in the dts files? This function
won't break without a dts update, though.

Best Regards,
Joakim Zhang

[PATCH net] net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info

2020-10-14 Thread Leon Romanovsky

From: Leon Romanovsky 

Accessing tcf_tunnel_info() produces the following splat, so fix it by
dereferencing the tcf_tunnel_key_params pointer with a marker indicating
that the internal tcfa_lock is held.

 =
 WARNING: suspicious RCU usage
 5.9.0+ #1 Not tainted
 -
 include/net/tc_act/tc_tunnel_key.h:59 suspicious rcu_dereference_protected() usage!
 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by tc/34839:
  #0: 88828572c2a0 (&p->tcfa_lock){+...}-{2:2}, at: tc_setup_flow_action+0xb3/0x48b5
 stack backtrace:
 CPU: 1 PID: 34839 Comm: tc Not tainted 5.9.0+ #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 Call Trace:
  dump_stack+0x9a/0xd0
  tc_setup_flow_action+0x14cb/0x48b5
  fl_hw_replace_filter+0x347/0x690 [cls_flower]
  fl_change+0x2bad/0x4875 [cls_flower]
  tc_new_tfilter+0xf6f/0x1ba0
  rtnetlink_rcv_msg+0x5f2/0x870
  netlink_rcv_skb+0x124/0x350
  netlink_unicast+0x433/0x700
  netlink_sendmsg+0x6f1/0xbd0
  sock_sendmsg+0xb0/0xe0
  sys_sendmsg+0x4fa/0x6d0
  ___sys_sendmsg+0x12e/0x1b0
  __sys_sendmsg+0xa4/0x120
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f1f8cd4fe57
 Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 
8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 
c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
 RSP: 002b:7ffdc1e193b8 EFLAGS: 0246 ORIG_RAX: 002e
 RAX: ffda RBX:  RCX: 7f1f8cd4fe57
 RDX:  RSI: 7ffdc1e19420 RDI: 0003
 RBP: 5f85aafa R08: 0001 R09: 7ffdc1e1936c
 R10: 0040522d R11: 0246 R12: 0001
 R13:  R14: 7ffdc1e1d6f0 R15: 00482420

Fixes: 3ebaf6da0716 ("net: sched: Do not assume RTNL is held in tunnel key action helpers")
Fixes: 7a47281439ba ("net: sched: lock action when translating it to flow_action infra")
Signed-off-by: Leon Romanovsky 
---
 include/net/tc_act/tc_tunnel_key.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/tc_act/tc_tunnel_key.h b/include/net/tc_act/tc_tunnel_key.h
index e1057b255f69..879fe8cff581 100644
--- a/include/net/tc_act/tc_tunnel_key.h
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -56,7 +56,10 @@ static inline struct ip_tunnel_info *tcf_tunnel_info(const struct tc_action *a)
 {
 #ifdef CONFIG_NET_CLS_ACT
struct tcf_tunnel_key *t = to_tunnel_key(a);
-   struct tcf_tunnel_key_params *params = rtnl_dereference(t->params);
+   struct tcf_tunnel_key_params *params;
+
+   params = rcu_dereference_protected(t->params,
+  lockdep_is_held(&a->tcfa_lock));

return &params->tcft_enc_metadata->u.tun_info;
 #else
--
2.26.2
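
To illustrate what the fix changes, here is a userspace analogy (not kernel code): the protected dereference takes a caller-supplied condition naming the lock that actually guards the pointer, rather than assuming RTNL. A plain flag models "tcfa_lock is held" and assert() plays the role of the lockdep check; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tcf_tunnel_key_params. */
struct params { int tun_id; };

/* Hypothetical stand-in for an action with its own lock. */
struct action {
	bool lock_held;        /* models "tcfa_lock is held"; no real lock here */
	struct params *params; /* pointer normally published via RCU */
};

static void action_lock(struct action *a)   { a->lock_held = true; }
static void action_unlock(struct action *a) { a->lock_held = false; }

/*
 * Analogue of rcu_dereference_protected(p, cond): check the caller-supplied
 * condition (assert() here; the kernel uses lockdep), then load p.
 */
#define deref_protected(p, cond) (assert(cond), (p))

static int action_tun_id(const struct action *a)
{
	const struct params *p = deref_protected(a->params, a->lock_held);

	return p->tun_id;
}
```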

Re: [PATCH] net: mscc: ocelot: Allow using without PCI on t1040 SoC

2020-10-14 Thread Vladimir Oltean

Hi Maxim,

On Wed, Oct 14, 2020 at 09:11:05AM +0300, Maxim Kochetkov wrote:
> There is no need to select FSL_ENETC_MDIO on t1040 SoC (ppc).
> 
> Signed-off-by: Maxim Kochetkov 
> ---

Please submit your changes after you've used net-next for a while.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/dsa/ocelot/Kconfig

Re: [PATCH] can: Explain PDU in CAN_ISOTP help text

2020-10-14 Thread Marc Kleine-Budde

On 10/13/20 6:43 PM, Oliver Hartkopp wrote:
> On 13.10.20 16:13, Geert Uytterhoeven wrote:
>> The help text for the CAN_ISOTP config symbol uses the acronym "PDU".
>> However, this acronym is not explained here, nor in
>> Documentation/networking/can.rst.
>> Expand the acronym to make it easier for users to decide if they need to
>> enable the CAN_ISOTP option or not.
>>
>> Signed-off-by: Geert Uytterhoeven 
> 
> Acked-by: Oliver Hartkopp 
> 
> Yes, when you are so deep into it that PDU becomes a word like dog or 
> cat ;-)

As v5.9 is out and isotp will hit mainline quite soon, I'll queue this via
can/master.

Tnx,
Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |


Re: [PATCH 1/1] net: ftgmac100: add handling of mdio/phy nodes for ast2400/2500

2020-10-14 Thread Joel Stanley

Hi Ivan,

On Tue, 13 Oct 2020 at 12:38, Ivan Mikhaylov  wrote:
>
> phy-handle can't be handled well for ast2400/2500 which has an embedded
> MDIO controller. Add ftgmac100_mdio_setup for ast2400/2500 and initialize
> PHYs from mdio child node with of_mdiobus_register.

Good idea. The driver has become a mess of different ways to connect
the phy and it needs to be cleaned up. I have a patch that fixes
rmmod, which is currently broken.



>
> Signed-off-by: Ivan Mikhaylov 
> ---
>  drivers/net/ethernet/faraday/ftgmac100.c | 114 ++-
>  1 file changed, 69 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index 87236206366f..e32066519ec1 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1044,11 +1044,47 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
> schedule_work(&priv->reset_task);
>  }
>
> -static int ftgmac100_mii_probe(struct ftgmac100 *priv, phy_interface_t intf)
> +static int ftgmac100_mii_probe(struct net_device *netdev)
>  {
> -   struct net_device *netdev = priv->netdev;
> +   struct ftgmac100 *priv = netdev_priv(netdev);
> +   struct platform_device *pdev = to_platform_device(priv->dev);
> +   struct device_node *np = pdev->dev.of_node;
> +   phy_interface_t phy_intf = PHY_INTERFACE_MODE_RGMII;
> struct phy_device *phydev;
>
> +   /* Get PHY mode from device-tree */
> +   if (np) {
> +   /* Default to RGMII. It's a gigabit part after all */
> +   phy_intf = of_get_phy_mode(np, &phy_intf);
> +   if (phy_intf < 0)
> +   phy_intf = PHY_INTERFACE_MODE_RGMII;
> +
> +   /* Aspeed only supports these. I don't know about other IP
> +* block vendors so I'm going to just let them through for
> +* now. Note that this is only a warning if for some obscure
> +* reason the DT really means to lie about it or it's a newer
> +* part we don't know about.
> +*
> +* On the Aspeed SoC there are additionally straps and SCU
> +* control bits that could tell us what the interface is
> +* (or allow us to configure it while the IP block is held
> +* in reset). For now I chose to keep this driver away from
> +* those SoC specific bits and assume the device-tree is
> +* right and the SCU has been configured properly by pinmux
> +* or the firmware.
> +*/
> +   if (priv->is_aspeed &&
> +   phy_intf != PHY_INTERFACE_MODE_RMII &&
> +   phy_intf != PHY_INTERFACE_MODE_RGMII &&
> +   phy_intf != PHY_INTERFACE_MODE_RGMII_ID &&
> +   phy_intf != PHY_INTERFACE_MODE_RGMII_RXID &&
> +   phy_intf != PHY_INTERFACE_MODE_RGMII_TXID) {
> +   netdev_warn(netdev,
> +   "Unsupported PHY mode %s !\n",
> +   phy_modes(phy_intf));
> +   }

Why do we move this?

> +   }
> +
> phydev = phy_find_first(priv->mii_bus);
> if (!phydev) {
> netdev_info(netdev, "%s: no PHY found\n", netdev->name);
> @@ -1056,7 +1092,7 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv, phy_interface_t intf)
> }
>
> phydev = phy_connect(netdev, phydev_name(phydev),
> -&ftgmac100_adjust_link, intf);
> +&ftgmac100_adjust_link, phy_intf);
>
> if (IS_ERR(phydev)) {
> netdev_err(netdev, "%s: Could not attach to PHY\n", netdev->name);
> @@ -1601,8 +1637,8 @@ static int ftgmac100_setup_mdio(struct net_device *netdev)
>  {
> struct ftgmac100 *priv = netdev_priv(netdev);
> struct platform_device *pdev = to_platform_device(priv->dev);
> -   phy_interface_t phy_intf = PHY_INTERFACE_MODE_RGMII;
> struct device_node *np = pdev->dev.of_node;
> +   struct device_node *mdio_np;
> int i, err = 0;
> u32 reg;
>
> @@ -1623,39 +1659,6 @@ static int ftgmac100_setup_mdio(struct net_device *netdev)
> iowrite32(reg, priv->base + FTGMAC100_OFFSET_REVR);
> }
>
> -   /* Get PHY mode from device-tree */
> -   if (np) {
> -   /* Default to RGMII. It's a gigabit part after all */
> -   err = of_get_phy_mode(np, &phy_intf);
> -   if (err)
> -   phy_intf = PHY_INTERFACE_MODE_RGMII;
> -
> -   /* Aspeed only supports these. I don't know about other IP
> -* block vendors so I'm going to just let them through for
> -* now. Note that this is only a warning if for some obscure
> -* reason the

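The device-tree handling that the patch moves into ftgmac100_mii_probe() (default to RGMII, then warn on modes Aspeed does not support) can be sketched in userspace C as below. The enum is a hypothetical stand-in for the kernel's phy_interface_t (the real values differ), and lookup_phy_mode() only imitates the convention of the removed hunk, where the return value of of_get_phy_mode() is treated as an error code and the mode is written through the pointer. The moved copy instead assigns that return value to phy_intf itself, so the two hunks do not use the same convention.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of phy_interface_t; the kernel's values differ. */
enum phy_mode {
	MODE_NA,
	MODE_MII,
	MODE_RMII,
	MODE_RGMII,
	MODE_RGMII_ID,
	MODE_RGMII_RXID,
	MODE_RGMII_TXID,
};

/* Imitates "err = of_get_phy_mode(np, &mode)": 0 on success, -1 otherwise. */
static int lookup_phy_mode(const char *name, enum phy_mode *mode)
{
	static const struct { const char *name; enum phy_mode mode; } table[] = {
		{ "mii", MODE_MII },               { "rmii", MODE_RMII },
		{ "rgmii", MODE_RGMII },           { "rgmii-id", MODE_RGMII_ID },
		{ "rgmii-rxid", MODE_RGMII_RXID }, { "rgmii-txid", MODE_RGMII_TXID },
	};

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (name && strcmp(name, table[i].name) == 0) {
			*mode = table[i].mode;
			return 0;
		}
	}
	return -1;
}

/* Default to RGMII when the property is missing or unknown. */
static enum phy_mode probe_phy_mode(const char *dt_value)
{
	enum phy_mode mode;

	if (lookup_phy_mode(dt_value, &mode))
		mode = MODE_RGMII;
	return mode;
}

/* The modes the Aspeed MAC supports, per the driver comment. */
static bool aspeed_mode_supported(enum phy_mode mode)
{
	return mode == MODE_RMII || mode == MODE_RGMII ||
	       mode == MODE_RGMII_ID || mode == MODE_RGMII_RXID ||
	       mode == MODE_RGMII_TXID;
}
```
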
[PATCH] Add support for mv88e6393x family of Marvell.

2020-10-14 Thread Pavana Sharma

The Marvell 88E6393X device is a single-chip integration of an 11-port
Ethernet switch with eight integrated Gigabit Ethernet (GbE) transceivers
and three 10-Gigabit interfaces.

This patch adds functionality specific to the mv88e6393x family (88E6393X,
88E6193X and 88E6191X).

Signed-off-by: Pavana Sharma 
---
 drivers/net/dsa/mv88e6xxx/chip.c|  90 +
 drivers/net/dsa/mv88e6xxx/chip.h|   2 +
 drivers/net/dsa/mv88e6xxx/global1.h |   2 +
 drivers/net/dsa/mv88e6xxx/global2.c |   7 +
 drivers/net/dsa/mv88e6xxx/global2.h |   8 +
 drivers/net/dsa/mv88e6xxx/port.c| 302 
 drivers/net/dsa/mv88e6xxx/port.h|  39 +++-
 drivers/net/dsa/mv88e6xxx/serdes.c  | 242 ++
 drivers/net/dsa/mv88e6xxx/serdes.h  |  39 
 9 files changed, 730 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index f0dbc05e30a4..241ff788b0b1 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -634,6 +634,23 @@ static void mv88e6390x_phylink_validate(struct mv88e6xxx_chip *chip, int port,
mv88e6390_phylink_validate(chip, port, mask, state);
 }
 
+static void mv88e6393x_phylink_validate(struct mv88e6xxx_chip *chip, int port,
+   unsigned long *mask,
+   struct phylink_link_state *state)
+{
+   if (port == 0 || port >= 9) {
+   phylink_set(mask, 10000baseT_Full);
+   phylink_set(mask, 10000baseKR_Full);
+   phylink_set(mask, 2500baseX_Full);
+   phylink_set(mask, 2500baseT_Full);
+   }
+
+   phylink_set(mask, 1000baseT_Full);
+   phylink_set(mask, 1000baseX_Full);
+
+   mv88e6065_phylink_validate(chip, port, mask, state);
+}
+
 static void mv88e6xxx_validate(struct dsa_switch *ds, int port,
   unsigned long *supported,
   struct phylink_link_state *state)
@@ -4141,6 +4158,56 @@ static const struct mv88e6xxx_ops mv88e6191_ops = {
.phylink_validate = mv88e6390_phylink_validate,
 };
 
+static const struct mv88e6xxx_ops mv88e6193x_ops = {
+   /* MV88E6XXX_FAMILY_6393X */
+   .setup_errata = mv88e6393x_setup_errata,
+   .irl_init_all = mv88e6390_g2_irl_init_all,
+   .get_eeprom = mv88e6xxx_g2_get_eeprom8,
+   .set_eeprom = mv88e6xxx_g2_set_eeprom8,
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+   .port_set_link = mv88e6xxx_port_set_link,
+   .port_set_speed_duplex = mv88e6393x_port_set_speed_duplex,
+   .port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
+   .port_tag_remap = mv88e6390_port_tag_remap,
+   .port_set_frame_mode = mv88e6351_port_set_frame_mode,
+   .port_set_egress_floods = mv88e6352_port_set_egress_floods,
+   .port_set_ether_type = mv88e6393x_port_set_ether_type,
+   .port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
+   .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
+   .port_pause_limit = mv88e6390_port_pause_limit,
+   .port_set_cmode = mv88e6393x_port_set_cmode,
+   .port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
+   .port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
+   .port_get_cmode = mv88e6352_port_get_cmode,
+   .stats_snapshot = mv88e6390_g1_stats_snapshot,
+   .stats_set_histogram = mv88e6390_g1_stats_set_histogram,
+   .stats_get_sset_count = mv88e6320_stats_get_sset_count,
+   .stats_get_strings = mv88e6320_stats_get_strings,
+   .stats_get_stats = mv88e6390_stats_get_stats,
+   .set_cpu_port = mv88e6393x_port_set_cpu_dest,
+   .set_egress_port = mv88e6393x_set_egress_port,
+   .watchdog_ops = &mv88e6390_watchdog_ops,
+   .mgmt_rsvd2cpu = mv88e6393x_port_mgmt_rsvd2cpu,
+   .pot_clear = mv88e6xxx_g2_pot_clear,
+   .reset = mv88e6352_g1_reset,
+   .rmu_disable = mv88e6390_g1_rmu_disable,
+   .vtu_getnext = mv88e6390_g1_vtu_getnext,
+   .vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+   .serdes_power = mv88e6393x_serdes_power,
+   .serdes_get_lane = mv88e6393x_serdes_get_lane,
+   /* Check status register pause & lpa register */
+   .serdes_pcs_get_state = mv88e6390_serdes_pcs_get_state,
+   .serdes_irq_mapping = mv88e6390_serdes_irq_mapping,
+   .serdes_irq_enable = mv88e6393x_serdes_irq_enable,
+   .serdes_irq_status = mv88e6393x_serdes_irq_status,
+   .gpio_ops = &mv88e6352_gpio_ops,
+   .avb_ops = &mv88e6390_avb_ops,
+   .ptp_ops = &mv88e6352_ptp_ops,
+   .phylink_validate = mv88e6393x_phylink_validate,
+};
+
 static const struct mv88e6xxx_ops mv88e6240_ops = {
/* MV88E6XXX_FAMILY_6352 */
.ieee_pri_map = mv88e6085_g1_ieee_pri_map,
@@ -5073,6 +5140,29 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] =

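The port split in mv88e6393x_phylink_validate() can be modeled with a plain bitmask. Ports 0, 9 and 10 (the three 10-Gigabit interfaces of the 11-port switch) additionally get the 10G and 2.5G modes, while every port gets the 1000BASE modes. The bit names below are hypothetical, not the ethtool link-mode bit numbers, and the sketch omits the baseline modes that mv88e6065_phylink_validate() adds.

```c
#include <stdint.h>

/* Hypothetical link-mode bits; not the kernel's ETHTOOL_LINK_MODE_* values. */
enum {
	LM_1000baseT_Full   = 1u << 0,
	LM_1000baseX_Full   = 1u << 1,
	LM_2500baseX_Full   = 1u << 2,
	LM_2500baseT_Full   = 1u << 3,
	LM_10000baseT_Full  = 1u << 4,
	LM_10000baseKR_Full = 1u << 5,
};

/* Mirrors the port split in mv88e6393x_phylink_validate(). */
static uint32_t mv88e6393x_port_modes(int port)
{
	uint32_t mask = 0;

	if (port == 0 || port >= 9)
		mask |= LM_10000baseT_Full | LM_10000baseKR_Full |
			LM_2500baseX_Full | LM_2500baseT_Full;

	mask |= LM_1000baseT_Full | LM_1000baseX_Full;
	return mask;
}
```
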
Re: [PATCH 1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue

2020-10-14 Thread Joel Stanley

On Wed, 14 Oct 2020 at 06:07, Dylan Hung  wrote:
>
> The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
> hang when handling scatter-gather DMA.  Disable the problematic feature
> by setting MAC register 0x58 bit28 and bit27.

Hi Dylan,

What are the symptoms of this issue? We are seeing this on our systems:

[29376.090637] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:442 dev_watchdog+0x2f0/0x2f4
[29376.099898] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0 timed out

> Signed-off-by: Dylan Hung 

This fixes support for the ast2600, so we can put:

Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")

Reviewed-by: Joel Stanley 

> ---
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +
>  drivers/net/ethernet/faraday/ftgmac100.h | 8 
>  2 files changed, 13 insertions(+)
>
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index 87236206366f..00024dd41147 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1817,6 +1817,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
> priv->rxdes0_edorr_mask = BIT(30);
> priv->txdes0_edotr_mask = BIT(30);
> priv->is_aspeed = true;
> +   /* Disable ast2600 problematic HW arbitration */
> +   if (of_device_is_compatible(np, "aspeed,ast2600-mac")) {
> +   iowrite32(FTGMAC100_TM_DEFAULT,
> + priv->base + FTGMAC100_OFFSET_TM);
> +   }
> } else {
> priv->rxdes0_edorr_mask = BIT(15);
> priv->txdes0_edotr_mask = BIT(15);
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.h b/drivers/net/ethernet/faraday/ftgmac100.h
> index e5876a3fda91..63b3e02fab16 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.h
> +++ b/drivers/net/ethernet/faraday/ftgmac100.h
> @@ -169,6 +169,14 @@
>  #define FTGMAC100_MACCR_FAST_MODE  (1 << 19)
>  #define FTGMAC100_MACCR_SW_RST (1 << 31)
>
> +/*
> + * test mode control register
> + */
> +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> +#define FTGMAC100_TM_DEFAULT \
> +   (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)

Will Aspeed issue an updated datasheet with this register documented?


> +
>  /*
>   * PHY control register
>   */
> --
> 2.17.1
>
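
A quick arithmetic check of the new macros: with bits 28 and 27 set, FTGMAC100_TM_DEFAULT is 0x18000000, and the probe writes that whole value to the register at offset 0x58. The offset comes from the commit message; the definition of FTGMAC100_OFFSET_TM itself is not shown in the quoted hunks. The fake register array and fake_iowrite32() below are hypothetical stand-ins for MMIO.

```c
#include <stdint.h>

/* Test mode control register bits, as defined in the patch. */
#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
#define FTGMAC100_TM_DEFAULT \
	(FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)

/* Assumed offset, taken from the commit message ("MAC register 0x58"). */
#define FTGMAC100_OFFSET_TM 0x58

/*
 * Hypothetical stand-in for iowrite32() on a memory-mapped register block:
 * the whole 32-bit word is replaced, like the patch's plain write.
 */
static void fake_iowrite32(uint32_t val, uint32_t *regs, unsigned int offset)
{
	regs[offset / 4] = val;
}
```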

Re: [PATCH 0/7] TC-ETF support PTP clocks series

2020-10-14 Thread Meisinger, Andreas

Hello Thomas,
Sorry about the wrong format/style of my last mail; I hope I got it right
this time.

Let me first point out an important thing, because we have had discussions
about it here too. According to the man pages, Linux CLOCK_TAI seems to be
defined as a nonsettable clock which behaves like International Atomic Time
(TAI). Yet if it's nonsettable, how can it be linked or synchronized to the
value of International Atomic Time?
On the other hand, there seems to be code in place to change the CLOCK_TAI
offset, which basically makes it sort of "settable"?

> The user space daemon which does the correlation between these PTP domains 
> and TAI is required in any case, so the magic
> clock TAI_PRIVATE is not having any advantage.

I think a userspace daemon handling the translation information between
different clocks would be fine. It's not really that relevant who exactly
applies the translation. Doing it at kernel level might be a little more
precise, but I guess that would depend on other factors too.
Yet all translation-based approaches have one thing in common: setting a clock
renders translations done in the past invalid. Old translated values would
have to be fixed along with setting the clock. I assume we couldn't
distinguish between "translated" values and genuine values after translation,
so fixing them might not be possible at all.
In our use case we have a clock for network operation which can't be set. We
can guarantee this because we are able to define the network as not
operational when the defined time is not available 😉.
With this defined, the remaining option would be for the target clock to be
set. Following your suggestion, that would be CLOCK_TAI. So at this point,
"settable" versus "nonsettable" becomes important: "settable" would introduce
a dependency between the clocks. Being independent from clocks outside of our
control was exactly the reason to introduce the separate network clock. To me
this means: if CLOCK_TAI can be changed by anything, it can't be the base
clock for our use case; if it can't, it might be a solution.

> Depending on the frequency drift between CLOCK_TAI and clock PTP/$N the timer 
> expiry might be slightly inaccurate, but
> surely not more inaccurate than if that conversion is done purely in user 
> space.
>
> The self rearming posix timers would work too, but the self rearming is based 
> on CLOCK_TAI, so rounding errors and drift
> would be accumulative. So I'd rather stay away from them.

Given the time ranges typically used in TSN networks, the drift error for
single-shot timers most likely isn't a big deal. But as you suggest, I'd stay
away from long-running timers as well as rearming ones; I guess they'd be
mostly unusable.

> If such a coordination exists, then the whole problem in the TSN stack is 
> gone. The core can always operate on TAI and the
> network device which runs in a different time universe would use the same 
> conversion data e.g. to queue a packet for HW
> based time triggered transmission. Again subject to slight inaccuracy, but it 
> does not come with all the problems of dynamic
> clocks, locking issues etc. As the frequency drift between PTP domains is 
> neither fast changing nor randomly jumping around
> the inaccuracy might even be a mostly academic problem.

These multiple conversion errors would happen even for applications that are
aware of their target timescale.
This might usually be an academic issue, but for some of our use cases
conversion errors aren't allowed at all.
In any case, adding conversion errors sounds strange, as our goal is to
improve precision.
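
To make the conversion discussed above concrete, here is a minimal sketch of translating a CLOCK_TAI timestamp into a PTP clock's timescale from a reference pair and a frequency offset in parts per billion, the kind of data a userspace daemon could publish. The struct and function are hypothetical. Integer division truncates, so each translation can lose up to 1 ns, and the result is only as good as the last reference pair: if either clock is stepped, translations computed earlier no longer match.

```c
#include <stdint.h>

/*
 * Hypothetical conversion data: at tai_ref (CLOCK_TAI, ns) the PTP clock
 * read ptp_ref (ns), and the PTP clock runs ppb parts per billion faster
 * than CLOCK_TAI.
 */
struct clk_xlate {
	int64_t tai_ref;
	int64_t ptp_ref;
	int64_t ppb;
};

/*
 * ptp = ptp_ref + delta + delta * ppb / 1e9 (truncated toward zero).
 * The product fits in int64_t while |delta * ppb| < ~9.2e18, i.e. about
 * a day even at 100000 ppb.
 */
static int64_t tai_to_ptp(const struct clk_xlate *x, int64_t tai_ns)
{
	int64_t delta = tai_ns - x->tai_ref;

	return x->ptp_ref + delta + delta * x->ppb / 1000000000;
}
```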

I don't see a way to avoid conversion errors other than somehow passing the
original timestamp down to the network card level.
Right now there's only one timestamp, in CLOCK_TAI format, which is used by ETF
as well as by the network card, thus causing trouble if the time is not the
same there.
If we added an (optional) second timestamp to the SKB, which would have to be
set in network card time, we could avoid the conversion error. As we know
which network card will be used for sending the SKB, we wouldn't need a clock
identifier for the second timestamp.
In situations where the application is not aware of the network card's
timescale, it wouldn't provide the second timestamp. In these situations it
would behave identically to your suggestion: the CLOCK_TAI timestamp would be
translated to network card time based on the information from the userspace
daemon.
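
The second-timestamp proposal can be sketched as follows: a hypothetical per-packet record carries the mandatory CLOCK_TAI launch time plus an optional timestamp already expressed in the network card's timescale, and the transmit path prefers the latter so that no conversion (and no conversion error) is introduced. None of these names exist in the kernel; the translation callback stands in for whatever the userspace daemon's data would drive, and offset_xlate() is only an example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet launch-time fields (not struct sk_buff). */
struct tx_times {
	int64_t tai_ns;   /* CLOCK_TAI launch time, always set */
	int64_t nic_ns;   /* launch time in the network card's timescale */
	bool has_nic_ns;  /* set only by applications aware of the NIC clock */
};

/* Hypothetical translation hook, e.g. driven by daemon-supplied data. */
typedef int64_t (*tai_to_nic_fn)(int64_t tai_ns, void *ctx);

/* Example translation: a fixed offset in nanoseconds passed via ctx. */
static int64_t offset_xlate(int64_t tai_ns, void *ctx)
{
	return tai_ns + *(const int64_t *)ctx;
}

/* Pick the launch time to program into the network card. */
static int64_t nic_launch_time(const struct tx_times *t,
			       tai_to_nic_fn xlate, void *ctx)
{
	if (t->has_nic_ns)
		return t->nic_ns;           /* genuine value, no conversion */
	return xlate(t->tai_ns, ctx);       /* fallback: translate CLOCK_TAI */
}
```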

What's your opinion about a second timestamp?

Best regards
Andreas Meisinger

Siemens AG
Digital Industries
Process Automation
DI PA DCP
Gleiwitzer Str. 555
90475 Nürnberg, Deutschland
Tel.: +49 911 95822720
mailto:andreas.meisin...@siemens.com

Re: [PATCH 1/1] net: dsa: seville: fix buffer size of the queue system

2020-10-14 Thread Vladimir Oltean

On Wed, Oct 14, 2020 at 08:14:04AM +0300, Maxim Kochetkov wrote:
> The VSC9953 Seville switch has 2 megabits of buffer split into 4360
> words of 60 bytes each. 2048 * 1024 is 2 megabytes instead of 2 megabytes.
                                                                  ~~~~~~~~~
                                                                  megabits
> 2 megabytes is (2048 / 8) * 1024 = 256 * 1024.
    ~~~~~~~~~                                  ^
    megabits                                   bytes
> 
> Signed-off-by: Maxim Kochetkov 
> Fixes: a63ed92d217f ("net: dsa: seville: fix buffer size of the queue system")
> ---

Others:
Can you please use a different commit message? Like
"net: dsa: seville: the packet buffer is 2 megabits, not megabytes"
It simplifies the work of backporters to not have more than 1 commit
with the same title.

>  drivers/net/dsa/ocelot/seville_vsc9953.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/ocelot/seville_vsc9953.c 
> b/drivers/net/dsa/ocelot/seville_vsc9953.c
> index 9e9fd19e1d00..e2cd49eec037 100644
> --- a/drivers/net/dsa/ocelot/seville_vsc9953.c
> +++ b/drivers/net/dsa/ocelot/seville_vsc9953.c
> @@ -1010,7 +1010,7 @@ static const struct felix_info seville_info_vsc9953 = {
>   .vcap_is2_keys  = vsc9953_vcap_is2_keys,
>   .vcap_is2_actions   = vsc9953_vcap_is2_actions,
>   .vcap   = vsc9953_vcap_props,
> - .shared_queue_sz= 2048 * 1024,
> + .shared_queue_sz= 256 * 1024,

I suppose I haven't caught this because I've been running with this
patch in my tree:
https://patchwork.ozlabs.org/project/netdev/patch/20201013134849.395986-2-vladimir.olt...@nxp.com/

>   .num_mact_rows  = 2048,
>   .num_ports  = 10,
>   .mdio_bus_alloc = vsc9953_mdio_bus_alloc,
> --
> 2.27.0
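
The unit arithmetic in this thread is easy to double-check: 2 megabits is 2 * 1024 * 1024 bits, i.e. 256 * 1024 = 262144 bytes, whereas the old value 2048 * 1024 is 2097152 bytes (2 megabytes). The 4360 words of 60 bytes from the commit message come to 261600 bytes, slightly below 256 KiB. A small sketch with hypothetical macro names:

```c
#include <stdint.h>

/* Hypothetical names for the figures quoted in the thread. */
#define SEVILLE_BUF_BITS   (2u * 1024 * 1024)  /* 2 megabits */
#define SEVILLE_WORDS      4360u
#define SEVILLE_WORD_BYTES 60u

static uint32_t bits_to_bytes(uint32_t bits)
{
	return bits / 8;
}
```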

Re: [PATCH bpf] bpf: selftest: Fix flaky tcp_hdr_options test when adding addr to lo

2020-10-14 Thread Andrii Nakryiko

On Tue, Oct 13, 2020 at 4:13 AM Martin KaFai Lau  wrote:
>
> The tcp_hdr_options test adds a "::eB9F" addr to the lo dev.
> However, this non-loopback address will race with IPv6 DAD,
> which may lead to an EADDRNOTAVAIL error from time to time.
>
> Even if nodad is used in the iproute2 command, there is still a race on
> when the route gets added.  This will then lead to ENETUNREACH from
> time to time.
>
> To avoid the above, this patch uses the default loopback address "::1"
> to do the test.
>
> Fixes: ad2f8eb0095e ("bpf: selftests: Tcp header options")
> Reported-by: Andrii Nakryiko 
> Signed-off-by: Martin KaFai Lau 
> ---

Less shelling out is always good :)

Acked-by: Andrii Nakryiko 

>  .../bpf/prog_tests/tcp_hdr_options.c  | 26 +--
>  .../bpf/progs/test_misc_tcp_hdr_options.c |  2 +-
>  2 files changed, 2 insertions(+), 26 deletions(-)
>

[...]

Re: vxlan_asymmetric.sh test failed every time

2020-10-14 Thread Hangbin Liu

On Tue, Oct 13, 2020 at 10:49:30AM +0300, Ido Schimmel wrote:
> On Tue, Oct 13, 2020 at 12:39:43PM +0800, Hangbin Liu wrote:
> > Hi Ido,
> > 
> > When I run vxlan_asymmetric.sh on RHEL8, it fails every time. I thought
> > it might fail because the kernel version is too old, but today I tried
> > with the latest kernel and it still failed. Would you please help check
> > if I missed any configuration?
> 
> Works OK for me:
> 
> $ sudo ./vxlan_asymmetric.sh veth0 veth1 veth2 veth3 veth4 veth5
> TEST: ping: local->local vid 10->vid 20 [ OK ]
> TEST: ping: local->remote vid 10->vid 10[ OK ]
> TEST: ping: local->remote vid 20->vid 20[ OK ]
> TEST: ping: local->remote vid 10->vid 20[ OK ]
> TEST: ping: local->remote vid 20->vid 10[ OK ]
> INFO: deleting neighbours from vlan interfaces
> TEST: ping: local->local vid 10->vid 20 [ OK ]
> TEST: ping: local->remote vid 10->vid 10[ OK ]
> TEST: ping: local->remote vid 20->vid 20[ OK ]
> TEST: ping: local->remote vid 10->vid 20[ OK ]
> TEST: ping: local->remote vid 20->vid 10[ OK ]
> TEST: neigh_suppress: on / neigh exists: yes[ OK ]
> TEST: neigh_suppress: on / neigh exists: no [ OK ]
> TEST: neigh_suppress: off / neigh exists: no[ OK ]
> TEST: neigh_suppress: off / neigh exists: yes   [ OK ]
> 
> # uname -r
> 5.9.0-rc8-custom-36808-gccdf7fae3afa
> 
> # ip -V
> ip utility, iproute2-5.8.0
> 
> # netsniff-ng -v
> netsniff-ng 0.6.7 (Polygon Window), Git id: (none)
> 
> The first failure might be related to your rp_filter settings. Can you
> please try with this patch?
> 
> diff --git a/tools/testing/selftests/net/forwarding/vxlan_asymmetric.sh b/tools/testing/selftests/net/forwarding/vxlan_asymmetric.sh
> index a0b5f57d6bd3..0727e2012b68 100755
> --- a/tools/testing/selftests/net/forwarding/vxlan_asymmetric.sh
> +++ b/tools/testing/selftests/net/forwarding/vxlan_asymmetric.sh
> @@ -215,10 +215,16 @@ switch_create()
>  
> bridge fdb add 00:00:5e:00:01:01 dev br1 self local vlan 10
> bridge fdb add 00:00:5e:00:01:01 dev br1 self local vlan 20
> +
> +   sysctl_set net.ipv4.conf.all.rp_filter 0
> +   sysctl_set net.ipv4.conf.vlan10-v.rp_filter 0
> +   sysctl_set net.ipv4.conf.vlan20-v.rp_filter 0
>  }
>  
>  switch_destroy()
>  {
> +   sysctl_restore net.ipv4.conf.all.rp_filter
> +
> bridge fdb del 00:00:5e:00:01:01 dev br1 self local vlan 20
> bridge fdb del 00:00:5e:00:01:01 dev br1 self local vlan 10
>  
> @@ -359,6 +365,10 @@ ns_switch_create()
>  
> bridge fdb add 00:00:5e:00:01:01 dev br1 self local vlan 10
> bridge fdb add 00:00:5e:00:01:01 dev br1 self local vlan 20
> +
> +   sysctl_set net.ipv4.conf.all.rp_filter 0
> +   sysctl_set net.ipv4.conf.vlan10-v.rp_filter 0
> +   sysctl_set net.ipv4.conf.vlan20-v.rp_filter 0
>  }
>  export -f ns_switch_create

Thanks a lot for helping debug this issue; this patch works for me.

Tested-by: Hangbin Liu
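
Why the patch clears both net.ipv4.conf.all.rp_filter and the per-VLAN knobs: according to the kernel's ip-sysctl documentation, source validation on an interface uses the maximum of conf/all/rp_filter and conf/<interface>/rp_filter (0 = off, 1 = strict, 2 = loose), so zeroing only one of them is not enough. A tiny sketch of that rule, with a hypothetical helper name:

```c
/* rp_filter modes as documented in ip-sysctl: 0 off, 1 strict, 2 loose. */
enum rp_mode { RP_OFF = 0, RP_STRICT = 1, RP_LOOSE = 2 };

/* Effective mode for an interface: the larger of conf/all and conf/<iface>. */
static int effective_rp_filter(int all, int iface)
{
	return all > iface ? all : iface;
}
```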

Re: [PATCH] net: sockmap: Don't call bpf_prog_put() on NULL pointer

2020-10-14 Thread Jakub Sitnicki

On Mon, Oct 12, 2020 at 07:09 PM CEST, Alex Dewar wrote:
> If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is
> called unconditionally on skb_verdict, even though it may be NULL. Fix
> and tidy up error path.
>
> Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL)
> Fixes: 743df8b7749f ("bpf, sockmap: Check skb_verdict and skb_parser programs explicitly")
> Signed-off-by: Alex Dewar 
> ---

Acked-by: Jakub Sitnicki
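
The bug pattern here is a two-reference acquire whose error path drops a reference that may never have been taken. Below is a hypothetical userspace sketch of a tidy error path: each program pointer is optional, and only references that were actually obtained are released. prog_get() and prog_put() are made-up stand-ins, not bpf_prog_inc_not_zero()/bpf_prog_put().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted program object. */
struct prog { int refcnt; };

/* Like an inc_not_zero: fails if the object is already being freed. */
static int prog_get(struct prog *p)
{
	if (p->refcnt == 0)
		return -ENOENT;
	p->refcnt++;
	return 0;
}

static void prog_put(struct prog *p)
{
	p->refcnt--;
}

/*
 * Take references on the optional verdict and parser programs.
 * On failure, release only what was acquired; NULL pointers are skipped.
 */
static int get_progs(struct prog *verdict, struct prog *parser)
{
	int err;

	if (verdict) {
		err = prog_get(verdict);
		if (err)
			return err;
	}
	if (parser) {
		err = prog_get(parser);
		if (err)
			goto out_put_verdict;
	}
	return 0;

out_put_verdict:
	if (verdict)
		prog_put(verdict);
	return err;
}
```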

WARNING: at net/netfilter/nf_tables_api.c:622 lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables]

2020-10-14 Thread Naresh Kamboju

While running the kselftest netfilter suite on an arm64 HiKey device with
linux-next 20201013, the following kernel warning was noticed.

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  git commit: f2fb1afc57304f9dd68c20a08270e287470af2eb
  git describe: next-20201013
  make_kernelversion: 5.9.0
  kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/hikey/lkft/linux-next/879/config

steps to reproduce:
---
# cd /opt/kselftests/default-in-kernel/
# ./run_kselftest.sh -c netfilter

crash log:

# selftests: netfilter: nft_trans_stress.sh
[ 1913.862919] [ cut here ]
[ 1913.869773] WARNING: CPU: 2 PID: 31416 at /usr/src/kernel/net/netfilter/nf_tables_api.c:622 lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables]
[ 1913.885399] Modules linked in: nf_tables nfnetlink act_mirred
cls_u32 sch_etf xt_conntrack nf_conntrack nf_defrag_ipv4 libcrc32c
ip6_tables nf_defrag_ipv6 ip_tables x_tables netdevsim 8021q garp mrp
bridge stp llc sch_fq sch_ingress veth algif_hash wl18xx wlcore
mac80211 cfg80211 snd_soc_hdmi_codec hci_uart btqca btbcm crct10dif_ce
snd_soc_audio_graph_card snd_soc_simple_card_utils adv7511 wlcore_sdio
cec bluetooth kirin_drm lima rfkill dw_drm_dsi gpu_sched
drm_kms_helper drm fuse [last unloaded: test_blackhole_dev]
[ 1913.941924] CPU: 2 PID: 31416 Comm: nft Tainted: GW 5.9.0-next-20201013 #1
[ 1913.954131] Hardware name: HiKey Development Board (DT)
[ 1913.963342] pstate: 0005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 1913.973483] pc : lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables]
[ 1913.984271] lr : lockdep_nfnl_nft_mutex_not_held+0x18/0x38 [nf_tables]
[ 1913.995018] sp : 800013bc3550
[ 1914.002559] x29: 800013bc3550 x28: 800013bc3930
[ 1914.012197] x27: 0001 x26: 45dc4e00
[ 1914.021880] x25: 0001 x24: 45dc4e00
[ 1914.031565] x23: 800013bc3930 x22: 0001
[ 1914.041298] x21: 80001275 x20: 800013bc3668
[ 1914.051068] x19: 80001275 x18: 
[ 1914.060876] x17:  x16: 
[ 1914.070699] x15:  x14: 89996d48
[ 1914.080534] x13: ff00 x12: 0028
[ 1914.090418] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[ 1914.100355] x9 : fefefefefefefeff x8 : 7f7f7f7f7f7f7f7f
[ 1914.110325] x7 : fefeff53544f4d48 x6 : 7ab8
[ 1914.120339] x5 : 0005 x4 : 0001
[ 1914.130388] x3 : 0001 x2 : 
[ 1914.140454] x1 :  x0 : 0001
[ 1914.150529] Call trace:
[ 1914.157789]  lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables]
[ 1914.168967]  nft_chain_parse_hook+0x58/0x320 [nf_tables]
[ 1914.179342]  nf_tables_addchain.isra.66+0xb8/0x510 [nf_tables]
[ 1914.190340]  nf_tables_newchain+0x408/0x618 [nf_tables]
[ 1914.200734]  nfnetlink_rcv_batch+0x4a0/0x610 [nfnetlink]
[ 1914.211284]  nfnetlink_rcv+0x174/0x1a8 [nfnetlink]
[ 1914.221351]  netlink_unicast+0x1dc/0x290
[ 1914.230589]  netlink_sendmsg+0x2b8/0x3f8
[ 1914.239840]  sys_sendmsg+0x288/0x2d0
[ 1914.249117]  ___sys_sendmsg+0x90/0xd0
[ 1914.258154]  __sys_sendmsg+0x78/0xd0
[ 1914.267140]  __arm64_sys_sendmsg+0x2c/0x38
[ 1914.276705]  el0_svc_common.constprop.3+0x7c/0x198
[ 1914.287041]  do_el0_svc+0x34/0xa0
[ 1914.295928]  el0_sync_handler+0x128/0x190
[ 1914.305567]  el0_sync+0x140/0x180
[ 1914.314535] CPU: 2 PID: 31416 Comm: nft Tainted: GW 5.9.0-next-20201013 #1
[ 1914.328670] Hardware name: HiKey Development Board (DT)
[ 1914.339812] Call trace:
[ 1914.348184]  dump_backtrace+0x0/0x1f0
[ 1914.357841]  show_stack+0x2c/0x80
[ 1914.367181]  dump_stack+0xf8/0x160
[ 1914.376615]  __warn+0xac/0x168
[ 1914.385732]  report_bug+0xcc/0x180
[ 1914.395242]  bug_handler+0x24/0x78
[ 1914.404783]  call_break_hook+0x80/0xa0
[ 1914.414725]  brk_handler+0x28/0x68
[ 1914.424358]  do_debug_exception+0xbc/0x128
[ 1914.434744]  el1_sync_handler+0x7c/0x128
[ 1914.445017]  el1_sync+0x7c/0x100
[ 1914.454625]  lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables]
[ 1914.467351]  nft_chain_parse_hook+0x58/0x320 [nf_tables]
[ 1914.479276]  nf_tables_addchain.isra.66+0xb8/0x510 [nf_tables]
[ 1914.491818]  nf_tables_newchain+0x408/0x618 [nf_tables]
[ 1914.503774]  nfnetlink_rcv_batch+0x4a0/0x610 [nfnetlink]
[ 1914.515899]  nfnetlink_rcv+0x174/0x1a8 [nfnetlink]
[ 1914.527525]  netlink_unicast+0x1dc/0x290
[ 1914.538318]  netlink_sendmsg+0x2b8/0x3f8
[ 1914.549125]  sys_sendmsg+0x288/0x2d0
[ 1914.559959]  ___sys_sendmsg+0x90/0xd0
[ 1914.570557]  __sys_sendmsg+0x78/0xd0
[ 1914.58]  __arm64_sys_sendmsg+0x2c/0x38
[ 1914.592241]  el0_svc_common.constprop.3+0x7c/0x198
[ 1914.604152]  do_el0_svc+0x34/0xa0
[ 1914.614497]  el0_sync_handler+0x128/0x190
[ 1914.625540]  el0_sync+0x140/0x180
[ 1914.635652] irq event stamp: 0
[ 1914.645091] hardirqs last  enabled at (0): [<>] 0x0
[ 1914.6

Re: [PATCH v2 bpf-next 3/4] selftests/bpf: Add profiler test

2020-10-14 Thread Andrii Nakryiko

On Tue, Oct 13, 2020 at 2:03 PM Alexei Starovoitov
 wrote:
>
> On Tue, Oct 13, 2020 at 12:56 PM Jiri Olsa  wrote:
> >
> > On Thu, Oct 08, 2020 at 06:12:39PM -0700, Alexei Starovoitov wrote:
> >
> > SNIP
> >
> > > +
> > > +#ifdef UNROLL
> > > +#pragma unroll
> > > +#endif
> > > + for (int i = 0; i < MAX_CGROUPS_PATH_DEPTH; i++) {
> > > + filepart_length =
> > > + bpf_probe_read_str(payload, MAX_PATH, 
> > > BPF_CORE_READ(cgroup_node, name));
> > > + if (!cgroup_node)
> > > + return payload;
> > > + if (cgroup_node == cgroup_root_node)
> > > + *root_pos = payload - payload_start;
> > > + if (filepart_length <= MAX_PATH) {
> > > + barrier_var(filepart_length);
> > > + payload += filepart_length;
> > > + }
> > > + cgroup_node = BPF_CORE_READ(cgroup_node, parent);
> > > + }
> > > + return payload;
> > > +}
> > > +
> > > +static ino_t get_inode_from_kernfs(struct kernfs_node* node)
> > > +{
> > > + struct kernfs_node___52* node52 = (void*)node;
> > > +
> > > + if (bpf_core_field_exists(node52->id.ino)) {
> > > + barrier_var(node52);
> > > + return BPF_CORE_READ(node52, id.ino);
> > > + } else {
> > > + barrier_var(node);
> > > + return (u64)BPF_CORE_READ(node, id);
> > > + }
> > > +}
> > > +
> > > +int pids_cgrp_id = 1;
> >
> >
> > hi,
> > I'm getting compilation failure with this:
> >
> >   CLNG-LLC [test_maps] profiler2.o
> > In file included from progs/profiler2.c:6:
> > progs/profiler.inc.h:246:5: error: redefinition of 'pids_cgrp_id' 
> > as different kind of symbol
> > int pids_cgrp_id = 1;
> > ^
> > 
> > /home/jolsa/linux-qemu/tools/testing/selftests/bpf/tools/include/vmlinux.h:14531:2:
> >  note: previous definition is here
> > pids_cgrp_id = 11,
>
> Interesting.
> You probably have CONFIG_CGROUP_PIDS in your .config?
> I don't and bpf CI doesn't have it either, so this issue wasn't spotted 
> earlier.
>
> I can hard code 11, of course, but
> but maybe Andrii has a cool way to use co-re to deal with this?

Cool or not, but we do have a way to deal with it. :)

> I think
> "extern bool CONFIG_CGROUP_PIDS __kconfig"
> won't work.
> A good opportunity to try to use bpf_core_enum_value_exists() ?

There are a few parts here.

First, we can't rely on vmlinux.h having the pids_cgrp_id enum value
defined, so we need to define our own, but in such a way that it
doesn't collide with enum cgroup_subsys_id in vmlinux.h. ___suffix to
the rescue here:

enum cgroup_subsys_id___local {
pids_cgrp_id___local = 1, /* anything but zero */
};

If the value is zero, the built-in will complain. Need to check with
Yonghong on why that is. For all things CO-RE, the ___local suffix is
going to be ignored.

Second, detecting whether the kernel even has pids_cgrp_id defined.
That could actually be done with __kconfig, as you mentioned (but it
needs __weak as well). One can also use bpf_core_enum_value_exists().
So it's either:

extern bool CONFIG_CGROUP_PIDS __kconfig __weak;

...

if (ENABLE_CGROUP_V1_RESOLVER && CONFIG_CGROUP_PIDS) {
   ...
}


Or a bit more verbose way with relos:

if (ENABLE_CGROUP_V1_RESOLVER &&
bpf_core_enum_value_exists(enum cgroup_subsys_id___local,
pids_cgrp_id___local)) {
   ...
}

Third, actually getting the enum value. This one would be impossible
without a CO-RE relocation, but that's exactly what bpf_core_enum_value()
exists for:


int cgrp_id = bpf_core_enum_value(enum cgroup_subsys_id___local,
pids_cgrp_id___local);


I'd go with Kconfig + bpf_core_enum_value(), as it's shorter and
nicer. This compiles and works with my Kconfig, but I haven't checked
with CONFIG_CGROUP_PIDS defined.


diff --git a/tools/testing/selftests/bpf/progs/profiler.inc.h b/tools/testing/selftests/bpf/progs/profiler.inc.h
index 00578311a423..79b8d2860a5c 100644
--- a/tools/testing/selftests/bpf/progs/profiler.inc.h
+++ b/tools/testing/selftests/bpf/progs/profiler.inc.h
@@ -243,7 +243,11 @@ static ino_t get_inode_from_kernfs(struct kernfs_node* node)
}
 }

-int pids_cgrp_id = 1;
+extern bool CONFIG_CGROUP_PIDS __kconfig __weak;
+
+enum cgroup_subsys_id___local {
+   pids_cgrp_id___local = 1, /* anything but zero */
+};

 static INLINE void* populate_cgroup_info(struct cgroup_data_t* cgroup_data,
 struct task_struct* task,
@@ -253,7 +257,9 @@ static INLINE void* populate_cgroup_info(struct cgroup_data_t* cgroup_data,
BPF_CORE_READ(task, nsproxy, cgroup_ns, root_cset,
dfl_cgrp, kn);
struct kernfs_node* proc_kernfs = BPF_CORE_READ(task, cgroups,
dfl_cgrp, kn);

-   if (ENABLE_CGROUP_V1_RESOLVER) {
+   if (ENABLE_CGROUP_V1_RESOLVER && CONFIG_CGROUP_PIDS) {
+   int cgrp_id = bpf_core_enum_value(enum cgroup_subsys_id___local, pids_cgrp_id__

[PATCH] bpfilter: Fix build error with CONFIG_BPFILTER_UMH

2020-10-14 Thread YueHaibing

If CONFIG_BPFILTER_UMH is set, the build fails:

In file included from /usr/include/sys/socket.h:33:0,
 from net/bpfilter/main.c:6:
/usr/include/bits/socket.h:390:10: fatal error: asm/socket.h: No such file or directory
 #include 
  ^~
compilation terminated.
scripts/Makefile.userprogs:43: recipe for target 'net/bpfilter/main.o' failed
make[2]: *** [net/bpfilter/main.o] Error 1

Add missing include path to fix this.

Signed-off-by: YueHaibing 
---
 net/bpfilter/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index cdac82b8c53a..389ea76ccc0b 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -5,7 +5,7 @@
 
 userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o
+userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi -I usr/include/
 
 ifeq ($(CONFIG_BPFILTER_UMH), y)
 # builtin bpfilter_umh should be linked with -static
-- 
2.17.1

Re: [PATCH bpf-next V3 0/6] bpf: New approach for BPF MTU handling

2020-10-14 Thread Alexei Starovoitov

On Tue, Oct 13, 2020 at 04:07:26PM -0700, Jakub Kicinski wrote:
> On Tue, 13 Oct 2020 22:40:09 +0200 Jesper Dangaard Brouer wrote:
> > > FWIW I took a quick swing at testing it with the HW I have and it did
> > > exactly what hardware should do. The TX unit entered an error state 
> > > and then the driver detected that and reset it a few seconds later.  
> > 
> > The drivers (i40e, mlx5, ixgbe) I tested with didn't entered an error
> > state, when getting packets exceeding the MTU.  I didn't go much above
> > 4K, so maybe I didn't trigger those cases.
> 
> You probably need to go above 16k to get out of the acceptable jumbo
> frame size. I tested ixgbe by converting TSO frames to large TCP frames,
> at low probability.

How about we set __bpf_skb_max_len() to a jumbo size like 8k and be done with it.

I guess some badly written driver/fw may still hang with <= 8k skb
that bpf redirected from one netdev with mtu=jumbo to another
netdev with mtu=1500, but then it's really a job of the driver/fw
to deal with it cleanly.

I think checking skb->tx_dev->mtu for every transmitted packet is not great.
For a typical load balancer it would be good to have MRU 1500 and MTU 15xx,
especially if it's internet facing, just to drop all known big
packets in hw via the MRU check.
But the stack doesn't have MRU vs MTU distinction and XDP_TX doesn't
adhere to MTU. xdp_data_hard_end is the limit.
So xdp already allows growing the packet beyond MTU.
I think raising the artificial limit in __bpf_skb_max_len() to 8k will
keep it safe enough for all practical cases and will avoid unnecessary
checks and complexity in the xmit path.

Re: [PATCH] net: sockmap: Don't call bpf_prog_put() on NULL pointer

2020-10-14 Thread Alex Dewar


On 14/10/2020 10:32, Jakub Sitnicki wrote:

On Mon, Oct 12, 2020 at 07:09 PM CEST, Alex Dewar wrote:

If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is
called unconditionally on skb_verdict, even though it may be NULL. Fix
and tidy up error path.

Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL)
Fixes: 743df8b7749f ("bpf, sockmap: Check skb_verdict and skb_parser programs explicitly")
Signed-off-by: Alex Dewar 
---

Note to maintainers: the issue exists only in bpf-next where we have:

   
https://lore.kernel.org/bpf/160239294756.8495.5796595770890272219.stgit@john-Precision-5820-Tower/

The patch also looks like it is supposed to be applied on top of the above.

Yes, the patch is based on linux-next.

Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping

2020-10-14 Thread Vladimir Oltean

On Mon, Oct 12, 2020 at 02:42:54PM -0700, Richard Cochran wrote:
> If you want, you can run your PHC using the linuxptp "free_running"
> option.  Then, you can use the TIME_STATUS_NP management request to
> use the remote time signal in your application.

I was expecting some sort of reaction to this from Kamil or Kurt.

I don't think that 'using the remote time signal in an application' is
all that needs to be done with the gPTP time, at least for a switch with
the hardware features that hellcreek has. Ultimately it should be fed
back into the hardware, so that the scheduler based on 802.1Q clause
8.6.8.4 "Enhancements for scheduled traffic" has a time scale to run
on. Running tc-taprio offload on top of an unsynchronized clock is not
productive.

So the discussion is about how to have the cake and eat it at the same
time. Silicon vendors eager to follow the latest trends in standards are
implementing hybrid PTP clocks, where an unsynchronizable version of the
clock delivers MAC timestamps to the application stack, and a
synchronizable wrapper over that same clock is what gets fed into the
offloading engines, like the ones behind the tc-taprio and tc-gate
offload. Some of these vendors perform cross-timestamping (the MAC
delivers 2, 3, or 4 timestamps for the same event, depending on how
many PHCs are wired to it); others don't, and just deliver a single
timestamp from a configurable source.

The operating system is supposed to ??? in order to synchronize the
synchronizable clock to the virtual time retrieved via TIME_STATUS_NP
that you're talking about. The question is what to replace that ???
with, of course.

> > I'm not an expert in kernel implementation but we have plans to discuss
> > possible approaches in the near future.
>
> I don't see any need for kernel changes in this area.

I'm not an expert in kernel implementation either, but perhaps in the
light of this, you can revisit the idea that kernel changes will not be
needed (or explain more, if you still think they aren't).

Since IEEE 60802 keeps talking about multiple time domains to be used
with 802.1AS-rev (a 'universal clock domain' and a 'working clock
domain'), a decision needs to be taken somewhere about which time base
you're going to use as a source for synchronizing your tc-taprio clock.
That decision should best be taken at the application level, so in my
opinion this is an argument that the application should have explicit
access to the unsynchronizable and to the synchronizable versions of the
PTP clock.

In the Linux kernel API, a network interface can have at most one PHC.

--

DISCLAIMER
Yes, I know full well that everyone can write a standard, but not
everyone can implement one. At the end of the day, I'm not trying to
make an argument whether the end result is worth making all these
changes. I'm only here to learn what other people are doing, how, and
most importantly, why.

[PATCH net-next V3] cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr

2020-10-14 Thread Ayush Sawal

This patch changes the module name to "ch_ipsec" and prepends
"ch_ipsec" string instead of "chcr" in all debug messages and
function names.

V1->V2:
-Removed inline keyword from functions.
-Removed CH_IPSEC prefix from pr_debug.
-Used proper indentation for the continuation line of the function
arguments.

V2->V3:
Fix the checkpatch.pl warnings.

Fixes: 1b77be463929 ("crypto/chcr: Moving chelsio's inline ipsec functionality to /drivers/net")
Signed-off-by: Ayush Sawal 
---
 drivers/crypto/chelsio/chcr_core.h|   2 -
 .../inline_crypto/ch_ipsec/chcr_ipsec.c   | 135 +-
 2 files changed, 68 insertions(+), 69 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_core.h b/drivers/crypto/chelsio/chcr_core.h
index bb092b6b36b2..b02f981e7c32 100644
--- a/drivers/crypto/chelsio/chcr_core.h
+++ b/drivers/crypto/chelsio/chcr_core.h
@@ -137,6 +137,4 @@ int chcr_uld_rx_handler(void *handle, const __be64 *rsp,
 int chcr_uld_tx_handler(struct sk_buff *skb, struct net_device *dev);
 int chcr_handle_resp(struct crypto_async_request *req, unsigned char *input,
 int err);
-int chcr_ipsec_xmit(struct sk_buff *skb, struct net_device *dev);
-void chcr_add_xfrmops(const struct cxgb4_lld_info *lld);
 #endif /* __CHCR_CORE_H__ */
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
index 0e7d25169407..072299b14b8d 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
@@ -35,7 +35,7 @@
  * Atul Gupta (atul.gu...@chelsio.com)
  */
 
-#define pr_fmt(fmt) "chcr:" fmt
+#define pr_fmt(fmt) "ch_ipsec: " fmt
 
 #include 
 #include 
@@ -72,20 +72,21 @@
 static LIST_HEAD(uld_ctx_list);
 static DEFINE_MUTEX(dev_mutex);
 
-static int chcr_xfrm_add_state(struct xfrm_state *x);
-static void chcr_xfrm_del_state(struct xfrm_state *x);
-static void chcr_xfrm_free_state(struct xfrm_state *x);
-static bool chcr_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
-static void chcr_advance_esn_state(struct xfrm_state *x);
+static bool ch_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
 static int ch_ipsec_uld_state_change(void *handle, enum cxgb4_state new_state);
+static int ch_ipsec_xmit(struct sk_buff *skb, struct net_device *dev);
 static void *ch_ipsec_uld_add(const struct cxgb4_lld_info *infop);
-
-static const struct xfrmdev_ops chcr_xfrmdev_ops = {
-   .xdo_dev_state_add  = chcr_xfrm_add_state,
-   .xdo_dev_state_delete   = chcr_xfrm_del_state,
-   .xdo_dev_state_free = chcr_xfrm_free_state,
-   .xdo_dev_offload_ok = chcr_ipsec_offload_ok,
-   .xdo_dev_state_advance_esn = chcr_advance_esn_state,
+static void ch_ipsec_advance_esn_state(struct xfrm_state *x);
+static void ch_ipsec_xfrm_free_state(struct xfrm_state *x);
+static void ch_ipsec_xfrm_del_state(struct xfrm_state *x);
+static int ch_ipsec_xfrm_add_state(struct xfrm_state *x);
+
+static const struct xfrmdev_ops ch_ipsec_xfrmdev_ops = {
+   .xdo_dev_state_add  = ch_ipsec_xfrm_add_state,
+   .xdo_dev_state_delete   = ch_ipsec_xfrm_del_state,
+   .xdo_dev_state_free = ch_ipsec_xfrm_free_state,
+   .xdo_dev_offload_ok = ch_ipsec_offload_ok,
+   .xdo_dev_state_advance_esn = ch_ipsec_advance_esn_state,
 };
 
 static struct cxgb4_uld_info ch_ipsec_uld_info = {
@@ -95,8 +96,8 @@ static struct cxgb4_uld_info ch_ipsec_uld_info = {
.rxq_size = 1024,
.add = ch_ipsec_uld_add,
.state_change = ch_ipsec_uld_state_change,
-   .tx_handler = chcr_ipsec_xmit,
-   .xfrmdev_ops = &chcr_xfrmdev_ops,
+   .tx_handler = ch_ipsec_xmit,
+   .xfrmdev_ops = &ch_ipsec_xfrmdev_ops,
 };
 
 static void *ch_ipsec_uld_add(const struct cxgb4_lld_info *infop)
@@ -119,7 +120,7 @@ static int ch_ipsec_uld_state_change(void *handle, enum cxgb4_state new_state)
 {
struct ipsec_uld_ctx *u_ctx = handle;
 
-   pr_info("new_state %u\n", new_state);
+   pr_debug("new_state %u\n", new_state);
switch (new_state) {
case CXGB4_STATE_UP:
pr_info("%s: Up\n", pci_name(u_ctx->lldi.pdev));
@@ -140,8 +141,8 @@ static int ch_ipsec_uld_state_change(void *handle, enum cxgb4_state new_state)
return 0;
 }
 
-static inline int chcr_ipsec_setauthsize(struct xfrm_state *x,
-struct ipsec_sa_entry *sa_entry)
+static int ch_ipsec_setauthsize(struct xfrm_state *x,
+   struct ipsec_sa_entry *sa_entry)
 {
int hmac_ctrl;
int authsize = x->aead->alg_icv_len / 8;
@@ -164,8 +165,8 @@ static inline int chcr_ipsec_setauthsize(struct xfrm_state *x,
return hmac_ctrl;
 }
 
-static inline int chcr_ipsec_setkey(struct xfrm_state *x,
-   struct ipsec_sa_entry *sa_entry)
+static int ch_ipsec_setkey(struct xfrm_state *x,
+

Re: [PATCH 14/17] can: flexcan: remove ack_grp and ack_bit handling from driver

2020-10-14 Thread Marc Kleine-Budde

On 10/14/20 10:53 AM, Joakim Zhang wrote:
>> Since commit:
>>
>> 048e3a34a2e7 can: flexcan: poll MCR_LPM_ACK instead of GPR ACK for
>> stop mode acknowledgment
>>
>> the driver polls the IP core's internal bit MCR[LPM_ACK] as stop mode
>> acknowledge and not the acknowledgment on chip level.
>>
>> This means the 4th and 5th value of the property "fsl,stop-mode" isn't used
>> anymore. This patch removes the used "ack_gpr" and "ack_bit" from the
>> driver.
>>
>> Link: http://lore.kernel.org/r/20201006203748.1750156-15-mkl@pengutronix.de
>> Fixes: 048e3a34a2e7 ("can: flexcan: poll MCR_LPM_ACK instead of GPR ACK
>> for stop mode acknowledgment")
>> Cc: Joakim Zhang 
>> Signed-off-by: Marc Kleine-Budde 
> 
> [...]
>>  /* stop mode property format is:
>> - * <&gpr req_gpr req_bit ack_gpr ack_bit>.
>> + * <&gpr req_gpr>.
> 
> Hi Marc,
> 
> Sorry for response delay, stop mode property format should be "<&gpr req_gpr
> req_bit>", I saw this code change has went into linux-next, so I will correct
> it by the way next time when I upsteam wakeup function for i.MX8.

Doh! I wrongly deleted "req_bit" in the comment, but the code should be all
right. I'll add that back.

> Need I update stop mode property in dts file? Although this function won't be
> broken without dts update.

Yes, you can send patches to update the dts (after net-next has been
merged into Linus's tree).

Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |




Re: WARNING: at net/netfilter/nf_tables_api.c:622 lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables]

2020-10-14 Thread Naresh Kamboju

On Wed, 14 Oct 2020 at 12:20, Naresh Kamboju  wrote:
>
> While running kselftest netfilter on arm64 hikey device on Linux next
> 20201013 the following
> kernel warning noticed.

Same issue noticed on i386.

# selftests: netfilter: nft_trans_stress.sh
[ 1092.615814] [ cut here ]
[ 1092.620454] WARNING: CPU: 0 PID: 4504 at
/usr/src/kernel/net/netfilter/nf_tables_api.c:622
lockdep_nfnl_nft_mutex_not_held+0x20/0x30 [nf_tables]
[ 1092.633405] Modules linked in: nf_tables act_mirred cls_u32
mpls_iptunnel mpls_router sch_etf xt_conntrack nf_conntrack
nf_defrag_ipv4 libcrc32c ip6_tables nf_defrag_ipv6 ip_tables netdevsim
vrf 8021q bridge stp llc sch_fq veth algif_hash x86_pkg_temp_thermal
fuse [last unloaded: test_blackhole_dev]
[ 1092.659896] CPU: 0 PID: 4504 Comm: nft Tainted: GW
5.9.0-next-20201013 #1
[ 1092.668078] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 1092.675558] EIP: lockdep_nfnl_nft_mutex_not_held+0x20/0x30 [nf_tables]
[ 1092.682091] Code: 26 00 31 c0 5d c3 8d 74 26 00 3e 8d 74 26 00 55
b8 0a 00 00 00 89 e5 e8 3e 1a 90 e2 84 c0 75 0a 5d c3 90 8d b4 26 00
00 00 00 <0f> 0b 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 3e 8d 74 26
00 55
[ 1092.700837] EAX: 0001 EBX: c3d76300 ECX: 0001 EDX: 
[ 1092.707105] ESI: e4ec7a7c EDI: e4ec7c84 EBP: e4ec7a00 ESP: e4ec7a00
[ 1092.713377] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010202
[ 1092.720173] CR0: 80050033 CR2: b7b85002 CR3: 03616000 CR4: 003506d0
[ 1092.726441] Call Trace:
[ 1092.728901]  nft_chain_parse_hook+0x3f/0x2b0 [nf_tables]
[ 1092.734221]  ? prep_new_page+0x12a/0x130
[ 1092.738146]  ? get_page_from_freelist+0xdc5/0xf50
[ 1092.742850]  ? lock_acquire+0x191/0x330
[ 1092.746692]  nf_tables_addchain.constprop.68+0xb3/0x630 [nf_tables]
[ 1092.752957]  ? nft_chain_lookup.part.38+0x19d/0x350 [nf_tables]
[ 1092.758883]  nf_tables_newchain+0x408/0x660 [nf_tables]
[ 1092.764122]  ? nf_tables_addchain.constprop.68+0x630/0x630 [nf_tables]
[ 1092.770654]  nfnetlink_rcv_batch+0x4fc/0x740
[ 1092.774930]  ? security_capable+0x33/0x50
[ 1092.778950]  ? __nla_parse+0x1e/0x30
[ 1092.782536]  nfnetlink_rcv+0x10d/0x130
[ 1092.786288]  netlink_unicast+0x195/0x250
[ 1092.790215]  netlink_sendmsg+0x27d/0x430
[ 1092.794141]  ? netlink_unicast+0x250/0x250
[ 1092.798238]  sock_sendmsg+0x5c/0x60
[ 1092.801733]  sys_sendmsg+0x199/0x1e0
[ 1092.805659]  ? __vma_adjust+0x28e/0x8e0
[ 1092.809497]  ___sys_sendmsg+0x5e/0xa0
[ 1092.813164]  ? lock_acquire+0x191/0x330
[ 1092.817004]  ? __local_bh_enable_ip+0x78/0xd0
[ 1092.821370]  ? __local_bh_enable_ip+0x78/0xd0
[ 1092.825730]  ? _raw_spin_unlock_bh+0x2a/0x30
[ 1092.830002]  ? trace_hardirqs_on+0x48/0xd0
[ 1092.834102]  ? __local_bh_enable_ip+0x78/0xd0
[ 1092.838458]  ? release_sock+0x71/0xa0
[ 1092.842126]  ? _raw_spin_unlock_bh+0x2a/0x30
[ 1092.846400]  ? release_sock+0x71/0xa0
[ 1092.850071]  ? lock_acquire+0x191/0x330
[ 1092.853914]  ? sock_setsockopt+0x54f/0xf80
[ 1092.858013]  ? ktime_get_coarse_real_ts64+0xde/0xf0
[ 1092.862889]  ? ktime_get_coarse_real_ts64+0xde/0xf0
[ 1092.867769]  __sys_sendmsg+0x3e/0x80
[ 1092.871352]  __ia32_sys_socketcall+0x20a/0x340
[ 1092.875806]  __do_fast_syscall_32+0x54/0x90
[ 1092.88]  do_fast_syscall_32+0x29/0x60
[ 1092.884012]  do_SYSENTER_32+0x15/0x20
[ 1092.887677]  entry_SYSENTER_32+0x9f/0xf2
[ 1092.891603] EIP: 0xb7f3b549
[ 1092.894401] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08
03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f
34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90
8d 76
[ 1092.913147] EAX: ffda EBX: 0010 ECX: bff259a4 EDX: 
[ 1092.919412] ESI:  EDI: 0006 EBP: bff26ad8 ESP: bff25990
[ 1092.925677] DS: 007b ES: 007b FS:  GS: 0033 SS: 007b EFLAGS: 0282
[ 1092.932466] CPU: 0 PID: 4504 Comm: nft Tainted: GW
5.9.0-next-20201013 #1
[ 1092.940643] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 1092.948112] Call Trace:
[ 1092.950558]  dump_stack+0x6d/0x8b
[ 1092.953877]  ? lockdep_nfnl_nft_mutex_not_held+0x20/0x30 [nf_tables]
[ 1092.960228]  __warn+0x7a/0xe0
[ 1092.963195]  ? lockdep_nfnl_nft_mutex_not_held+0x20/0x30 [nf_tables]
[ 1092.969546]  report_bug+0xa9/0x150
[ 1092.972953]  ? exc_overflow+0x40/0x40
[ 1092.976617]  handle_bug+0x2d/0x60
[ 1092.979927]  exc_invalid_op+0x1b/0x70
[ 1092.983584]  handle_exception+0x140/0x140
[ 1092.987589] EIP: lockdep_nfnl_nft_mutex_not_held+0x20/0x30 [nf_tables]
[ 1092.994106] Code: 26 00 31 c0 5d c3 8d 74 26 00 3e 8d 74 26 00 55
b8 0a 00 00 00 89 e5 e8 3e 1a 90 e2 84 c0 75 0a 5d c3 90 8d b4 26 00
00 00 00 <0f> 0b 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 3e 8d 74 26
00 55
[ 1093.012850] EAX: 0001 EBX: c3d76300 ECX: 0001 EDX: 
[ 1093.019108] ESI: e4ec7a7c EDI: e4ec7c84 EBP: e4ec7a00 ESP: e4ec7a00
[ 1093.025364] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010202
[ 1093.032144]  ? exc_overflow+0x40/0x40
[ 1093.035811

[PATCH v2 3/7] staging: qlge: coredump via devlink health reporter

2020-10-14 Thread Coiby Xu

$ devlink health dump show DEVICE reporter coredump -p -j
{
"Core Registers": {
"segment": 1,
"values": [ 
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 ]
},
"Test Logic Regs": {
"segment": 2,
"values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
},
"RMII Registers": {
"segment": 3,
"values": [ 
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 ]
},
...
"Sem Registers": {
"segment": 50,
"values": [ 0,0,0,0 ]
}
}

Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/qlge_devlink.c | 130 ++--
 1 file changed, 124 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/qlge/qlge_devlink.c b/drivers/staging/qlge/qlge_devlink.c
index d9c71f45211f..b75ec5bff26a 100644
--- a/drivers/staging/qlge/qlge_devlink.c
+++ b/drivers/staging/qlge/qlge_devlink.c
@@ -2,16 +2,134 @@
 #include "qlge.h"
 #include "qlge_devlink.h"
 
-static int
-qlge_reporter_coredump(struct devlink_health_reporter *reporter,
-  struct devlink_fmsg *fmsg, void *priv_ctx,
-  struct netlink_ext_ack *extack)
+static int qlge_fill_seg_(struct devlink_fmsg *fmsg,
+ struct mpi_coredump_segment_header *seg_header,
+ u32 *reg_data)
 {
-   return 0;
+   int regs_num = (seg_header->seg_size
+   - sizeof(struct mpi_coredump_segment_header)) / sizeof(u32);
+   int err;
+   int i;
+
+   err = devlink_fmsg_pair_nest_start(fmsg, seg_header->description);
+   if (err)
+   return err;
+   err = devlink_fmsg_obj_nest_start(fmsg);
+   if (err)
+   return err;
+   err = devlink_fmsg_u32_pair_put(fmsg, "segment", seg_header->seg_num);
+   if (err)
+   return err;
+   err = devlink_fmsg_arr_pair_nest_start(fmsg, "values");
+   if (err)
+   return err;
+   for (i = 0; i < regs_num; i++) {
+   err = devlink_fmsg_u32_put(fmsg, *reg_data);
+   if (err)
+   return err;
+   reg_data++;
+   }
+   err = devlink_fmsg_obj_nest_end(fmsg);
+   if (err)
+   return err;
+   err = devlink_fmsg_arr_pair_nest_end(fmsg);
+   if (err)
+   return err;
+   err = devlink_fmsg_pair_nest_end(fmsg);
+   return err;
+}
+
+#define FILL_SEG(seg_hdr, seg_regs)\
+   do {\
+   err = qlge_fill_seg_(fmsg, &dump->seg_hdr, dump->seg_regs); \
+   if (err) {  \
+   kvfree(dump);   \
+   return err; \
+   }   \
+   } while (0)
+
+static int qlge_reporter_coredump(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg, void *priv_ctx,
+ struct netlink_ext_ack *extack)
+{
+   int err = 0;
+
+   struct qlge_adapter *qdev = devlink_health_reporter_priv(reporter);
+   struct qlge_mpi_coredump *dump;
+
+   if (!netif_running(qdev->ndev))
+   return 0;
+
+   dump = kvmalloc(sizeof(*dump), GFP_KERNEL);
+   if (!dump)
+   return -ENOMEM;
+
+   err = qlge_core_dump(qdev, dump);
+   if (err) {
+   kvfree(dump);
+   return err;
+   }
+
+   FILL_SEG(core_regs_seg_hdr, mpi_core_regs);
+   FILL_SEG(test_logic_regs_seg_hdr, test_logic_regs);
+   FILL_SEG(rmii_regs_seg_hdr, rmii_regs);
+   FILL_SEG(fcmac1_regs_seg_hdr, fcmac1_regs);
+   FILL_SEG(fcmac2_regs_seg_hdr, fcmac2_regs);
+   FILL_SEG(fc1_mbx_regs_seg_hdr, fc1_mbx_regs);
+   FILL_SEG(ide_regs_seg_hdr, ide_regs);
+   FILL_SEG(nic1_mbx_regs_seg_hdr, nic1_mbx_regs);
+   FILL_SEG(smbus_regs_seg_hdr, smbus_regs);
+   FILL_SEG(fc2_mbx_regs_seg_hdr, fc2_mbx_regs);
+   FILL_SEG(nic2_mbx_regs_seg_hdr, nic2_mbx_regs);
+   FILL_SEG(i2c_regs_seg_hdr, i2c_regs);
+   FILL_SEG(memc_regs_seg_hdr, memc_regs);
+   FILL_SEG(pbus_regs_seg_hdr, pbus_regs);
+   FILL_SEG(mde_regs_seg_hdr, mde_regs);
+   FILL_SEG(nic_regs_seg_hdr, nic_regs);
+   FILL_SEG(nic2_regs_seg_hdr, nic2_regs);
+   FILL_SEG(xgmac1_seg_hdr, xgmac1);
+   FILL_SEG(xgmac2_seg_hdr, xgmac2);
+

[PATCH v2 4/7] staging: qlge: support force_coredump option for devlink health dump

2020-10-14 Thread Coiby Xu

With the force_coredump module parameter set, devlink health dump will
first reset the MPI RISC, which takes 5 seconds to finish.

Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/qlge_devlink.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/staging/qlge/qlge_devlink.c b/drivers/staging/qlge/qlge_devlink.c
index b75ec5bff26a..92db531ad5e0 100644
--- a/drivers/staging/qlge/qlge_devlink.c
+++ b/drivers/staging/qlge/qlge_devlink.c
@@ -56,10 +56,17 @@ static int qlge_reporter_coredump(struct devlink_health_reporter *reporter,
 
struct qlge_adapter *qdev = devlink_health_reporter_priv(reporter);
struct qlge_mpi_coredump *dump;
+   wait_queue_head_t wait;
 
if (!netif_running(qdev->ndev))
return 0;
 
+   if (test_bit(QL_FRC_COREDUMP, &qdev->flags)) {
+   qlge_queue_fw_error(qdev);
+   init_waitqueue_head(&wait);
+   wait_event_timeout(wait, 0, 5 * HZ);
+   }
+
dump = kvmalloc(sizeof(*dump), GFP_KERNEL);
if (!dump)
return -ENOMEM;
-- 
2.28.0

[PATCH v2 7/7] staging: qlge: add documentation for debugging qlge

2020-10-14 Thread Coiby Xu

Add instructions and examples for dumping kernel data structures and
for coredumps.

Signed-off-by: Coiby Xu 
---
 .../networking/device_drivers/index.rst   |   1 +
 .../device_drivers/qlogic/index.rst   |  18 +++
 .../networking/device_drivers/qlogic/qlge.rst | 118 ++
 MAINTAINERS   |   6 +
 4 files changed, 143 insertions(+)
 create mode 100644 Documentation/networking/device_drivers/qlogic/index.rst
 create mode 100644 Documentation/networking/device_drivers/qlogic/qlge.rst

diff --git a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst
index a3113ffd7a16..d8279de7bf25 100644
--- a/Documentation/networking/device_drivers/index.rst
+++ b/Documentation/networking/device_drivers/index.rst
@@ -15,6 +15,7 @@ Contents:
ethernet/index
fddi/index
hamradio/index
+   qlogic/index
wan/index
wifi/index
 
diff --git a/Documentation/networking/device_drivers/qlogic/index.rst b/Documentation/networking/device_drivers/qlogic/index.rst
new file mode 100644
index ..ad05b04286e4
--- /dev/null
+++ b/Documentation/networking/device_drivers/qlogic/index.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+QLogic QLGE Device Drivers
+==========================
+
+Contents:
+
+.. toctree::
+   :maxdepth: 2
+
+   qlge
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/networking/device_drivers/qlogic/qlge.rst b/Documentation/networking/device_drivers/qlogic/qlge.rst
new file mode 100644
index ..0b888253d152
--- /dev/null
+++ b/Documentation/networking/device_drivers/qlogic/qlge.rst
@@ -0,0 +1,118 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+QLogic QLGE 10Gb Ethernet device driver
+=======================================
+
+This driver uses drgn and devlink for debugging.
+
+Dump kernel data structures in drgn
+-----------------------------------
+
+To dump kernel data structures, the following Python script can be used
+in drgn:
+
+.. code-block:: python
+
+   def align(x, a):
+       """the alignment a should be a power of 2
+       """
+       mask = a - 1
+       return (x + mask) & ~mask
+
+   def struct_size(struct_type):
+       struct_str = "struct {}".format(struct_type)
+       return sizeof(Object(prog, struct_str, address=0x0))
+
+   def netdev_priv(netdevice):
+       NETDEV_ALIGN = 32
+       return netdevice.value_() + align(struct_size("net_device"), NETDEV_ALIGN)
+
+   name = 'xxx'  # name of the qlge network interface
+   qlge_device = None
+   netdevices = prog['init_net'].dev_base_head.address_of_()
+   for netdevice in list_for_each_entry("struct net_device", netdevices, "dev_list"):
+       if netdevice.name.string_().decode('ascii') == name:
+           print(netdevice.name)
+           qlge_device = netdevice
+           break
+
+   ql_adapter = Object(prog, "struct ql_adapter", address=netdev_priv(qlge_device))
+
+drgn prints struct ql_adapter as follows:
+
+>>> ql_adapter
+(struct ql_adapter){
+.ricb = (struct ricb){
+.base_cq = (u8)0,
+.flags = (u8)120,
+.mask = (__le16)26637,
+.hash_cq_id = (u8 [1024]){ 172, 142, 255, 255 },
+.ipv6_hash_key = (__le32 [10]){},
+.ipv4_hash_key = (__le32 [4]){},
+},
+.flags = (unsigned long)0,
+.wol = (u32)0,
+.nic_stats = (struct nic_stats){
+.tx_pkts = (u64)0,
+.tx_bytes = (u64)0,
+.tx_mcast_pkts = (u64)0,
+.tx_bcast_pkts = (u64)0,
+.tx_ucast_pkts = (u64)0,
+.tx_ctl_pkts = (u64)0,
+.tx_pause_pkts = (u64)0,
+...
+},
+.active_vlans = (unsigned long [64]){
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 52780853100545, 18446744073709551615,
+18446619461681283072, 0, 42949673024, 2147483647,
+},
+.rx_ring = (struct rx_ring [17]){
+{
+.cqicb = (struct cqicb){
+.msix_vect = (u8)0,
+.reserved1 = (u8)0,
+.reserved2 = (u8)0,
+.flags = (u8)0,
+.len = (__le16)0,
+.rid = (__le16)0,
+...
+},
+.cq_base = (void *)0x0,
+.cq_base_dma = (dma_addr_t)0,
+}
+

[PATCH v2 6/7] staging: qlge: clean up debugging code in the QL_ALL_DUMP ifdef land

2020-10-14 Thread Coiby Xu

The debugging code in the following ifdef blocks
 - QL_ALL_DUMP
 - QL_REG_DUMP
 - QL_DEV_DUMP
 - QL_CB_DUMP
 - QL_IB_DUMP
 - QL_OB_DUMP

has become unnecessary because:
 - Device status and general registers can be obtained by ethtool.
 - A coredump can be taken via the devlink health reporter.
 - Structures related to the hardware (struct ql_adapter) can be
   inspected with crash or drgn.

Suggested-by: Benjamin Poirier 
Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/TODO   |   4 -
 drivers/staging/qlge/qlge.h |  82 
 drivers/staging/qlge/qlge_dbg.c | 688 
 drivers/staging/qlge/qlge_ethtool.c |   2 -
 drivers/staging/qlge/qlge_main.c|   7 +-
 5 files changed, 1 insertion(+), 782 deletions(-)

diff --git a/drivers/staging/qlge/TODO b/drivers/staging/qlge/TODO
index e68c95f47754..c76394b9451b 100644
--- a/drivers/staging/qlge/TODO
+++ b/drivers/staging/qlge/TODO
@@ -14,10 +14,6 @@
   queues" is confusing.
 * struct rx_ring is used for rx and tx completions, with some members relevant
   to one case only
-* there is an inordinate amount of disparate debugging code, most of which is
-  of questionable value. In particular, qlge_dbg.c has hundreds of lines of
-  code bitrotting away in ifdef land (doesn't compile since commit
-  18c49b91777c ("qlge: do vlan cleanup", v3.1-rc1), 8 years ago).
 * the flow control implementation in firmware is buggy (sends a flood of pause
   frames, resets the link, device and driver buffer queues become
   desynchronized), disable it by default
diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h
index 5eb5c9a6fb84..1a26cbc67b79 100644
--- a/drivers/staging/qlge/qlge.h
+++ b/drivers/staging/qlge/qlge.h
@@ -2281,86 +2281,4 @@ void qlge_check_lb_frame(struct qlge_adapter *qdev, struct sk_buff *skb);
 int qlge_own_firmware(struct qlge_adapter *qdev);
 int qlge_clean_lb_rx_ring(struct rx_ring *rx_ring, int budget);
 
-/* #define QL_ALL_DUMP */
-/* #define QL_REG_DUMP */
-/* #define QL_DEV_DUMP */
-/* #define QL_CB_DUMP */
-/* #define QL_IB_DUMP */
-/* #define QL_OB_DUMP */
-
-#ifdef QL_REG_DUMP
-void qlge_dump_xgmac_control_regs(struct qlge_adapter *qdev);
-void qlge_dump_routing_entries(struct qlge_adapter *qdev);
-void qlge_dump_regs(struct qlge_adapter *qdev);
-#define QL_DUMP_REGS(qdev) qlge_dump_regs(qdev)
-#define QL_DUMP_ROUTE(qdev) qlge_dump_routing_entries(qdev)
-#define QL_DUMP_XGMAC_CONTROL_REGS(qdev) qlge_dump_xgmac_control_regs(qdev)
-#else
-#define QL_DUMP_REGS(qdev)
-#define QL_DUMP_ROUTE(qdev)
-#define QL_DUMP_XGMAC_CONTROL_REGS(qdev)
-#endif
-
-#ifdef QL_STAT_DUMP
-void qlge_dump_stat(struct qlge_adapter *qdev);
-#define QL_DUMP_STAT(qdev) qlge_dump_stat(qdev)
-#else
-#define QL_DUMP_STAT(qdev)
-#endif
-
-#ifdef QL_DEV_DUMP
-void qlge_dump_qdev(struct qlge_adapter *qdev);
-#define QL_DUMP_QDEV(qdev) qlge_dump_qdev(qdev)
-#else
-#define QL_DUMP_QDEV(qdev)
-#endif
-
-#ifdef QL_CB_DUMP
-void qlge_dump_wqicb(struct wqicb *wqicb);
-void qlge_dump_tx_ring(struct tx_ring *tx_ring);
-void qlge_dump_ricb(struct ricb *ricb);
-void qlge_dump_cqicb(struct cqicb *cqicb);
-void qlge_dump_rx_ring(struct rx_ring *rx_ring);
-void qlge_dump_hw_cb(struct qlge_adapter *qdev, int size, u32 bit, u16 q_id);
-#define QL_DUMP_RICB(ricb) qlge_dump_ricb(ricb)
-#define QL_DUMP_WQICB(wqicb) qlge_dump_wqicb(wqicb)
-#define QL_DUMP_TX_RING(tx_ring) qlge_dump_tx_ring(tx_ring)
-#define QL_DUMP_CQICB(cqicb) qlge_dump_cqicb(cqicb)
-#define QL_DUMP_RX_RING(rx_ring) qlge_dump_rx_ring(rx_ring)
-#define QL_DUMP_HW_CB(qdev, size, bit, q_id) \
-   qlge_dump_hw_cb(qdev, size, bit, q_id)
-#else
-#define QL_DUMP_RICB(ricb)
-#define QL_DUMP_WQICB(wqicb)
-#define QL_DUMP_TX_RING(tx_ring)
-#define QL_DUMP_CQICB(cqicb)
-#define QL_DUMP_RX_RING(rx_ring)
-#define QL_DUMP_HW_CB(qdev, size, bit, q_id)
-#endif
-
-#ifdef QL_OB_DUMP
-void qlge_dump_tx_desc(struct qlge_adapter *qdev, struct tx_buf_desc *tbd);
-void qlge_dump_ob_mac_iocb(struct qlge_adapter *qdev, struct ob_mac_iocb_req *ob_mac_iocb);
-void qlge_dump_ob_mac_rsp(struct qlge_adapter *qdev, struct ob_mac_iocb_rsp *ob_mac_rsp);
-#define QL_DUMP_OB_MAC_IOCB(qdev, ob_mac_iocb) qlge_dump_ob_mac_iocb(qdev, ob_mac_iocb)
-#define QL_DUMP_OB_MAC_RSP(qdev, ob_mac_rsp) qlge_dump_ob_mac_rsp(qdev, ob_mac_rsp)
-#else
-#define QL_DUMP_OB_MAC_IOCB(qdev, ob_mac_iocb)
-#define QL_DUMP_OB_MAC_RSP(qdev, ob_mac_rsp)
-#endif
-
-#ifdef QL_IB_DUMP
-void qlge_dump_ib_mac_rsp(struct qlge_adapter *qdev, struct ib_mac_iocb_rsp *ib_mac_rsp);
-#define QL_DUMP_IB_MAC_RSP(qdev, ib_mac_rsp) qlge_dump_ib_mac_rsp(qdev, ib_mac_rsp)
-#else
-#define QL_DUMP_IB_MAC_RSP(qdev, ib_mac_rsp)
-#endif
-
-#ifdef QL_ALL_DUMP
-void qlge_dump_all(struct qlge_adapter *qdev);
-#define QL_DUMP_ALL(qdev) qlge_dump_all(qdev)
-#else
-#define QL_DUMP_ALL(qdev)
-#endif
-
 #endif /* _QLGE_H_ */
diff --git a/drivers/staging/qlge/qlge_dbg.c b/drivers/staging/qlge/qlge_dbg.c
index 43bc9580da9e..37e593f0fd82 100644
--- a/drivers/staging/

[PATCH v2 5/7] staging: qlge: remove mpi_core_to_log which sends coredump to the kernel ring buffer

2020-10-14 Thread Coiby Xu

devlink health can be used to get the coredump, so there is no need to
send so much data to the kernel ring buffer.
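To put "so much data" into numbers: the removed worker calls print_hex_dump() with a row size of 32 bytes, and print_hex_dump() emits one log line per row. Taking the "176k struct" mentioned in the TODO file as exactly 176 * 1024 bytes (an assumption; the real sizeof(struct qlge_mpi_coredump) may differ slightly), a quick estimate of the number of lines one dump writes to the ring buffer:

```python
def hexdump_lines(nbytes, rowsize=32):
    """Log lines print_hex_dump() emits for nbytes at rowsize bytes per line."""
    return -(-nbytes // rowsize)  # ceiling division


coredump_size = 176 * 1024  # assumed size of struct qlge_mpi_coredump
print(hexdump_lines(coredump_size))  # 5632 lines for a single dump
```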

Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/TODO   |  2 --
 drivers/staging/qlge/qlge.h |  3 ---
 drivers/staging/qlge/qlge_dbg.c | 11 ---
 drivers/staging/qlge/qlge_ethtool.c |  1 -
 drivers/staging/qlge/qlge_main.c|  2 --
 drivers/staging/qlge/qlge_mpi.c |  6 --
 6 files changed, 25 deletions(-)

diff --git a/drivers/staging/qlge/TODO b/drivers/staging/qlge/TODO
index 5ac55664c3e2..e68c95f47754 100644
--- a/drivers/staging/qlge/TODO
+++ b/drivers/staging/qlge/TODO
@@ -18,8 +18,6 @@
   of questionable value. In particular, qlge_dbg.c has hundreds of lines of
   code bitrotting away in ifdef land (doesn't compile since commit
   18c49b91777c ("qlge: do vlan cleanup", v3.1-rc1), 8 years ago).
-* triggering an ethtool regdump will hexdump a 176k struct to dmesg depending
-  on some module parameters.
 * the flow control implementation in firmware is buggy (sends a flood of pause
   frames, resets the link, device and driver buffer queues become
   desynchronized), disable it by default
diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h
index 4a48bcc88fbd..5eb5c9a6fb84 100644
--- a/drivers/staging/qlge/qlge.h
+++ b/drivers/staging/qlge/qlge.h
@@ -2145,7 +2145,6 @@ struct qlge_adapter {
u32 port_init;
u32 link_status;
struct qlge_mpi_coredump *mpi_coredump;
-   u32 core_is_dumped;
u32 link_config;
u32 led_config;
u32 max_frame_size;
@@ -2158,7 +2157,6 @@ struct qlge_adapter {
struct delayed_work mpi_work;
struct delayed_work mpi_port_cfg_work;
struct delayed_work mpi_idc_work;
-   struct delayed_work mpi_core_to_log;
struct completion ide_completion;
const struct nic_operations *nic_ops;
u16 device_id;
@@ -2249,7 +2247,6 @@ int qlge_write_cfg(struct qlge_adapter *qdev, void *ptr, int size, u32 bit,
 void qlge_queue_fw_error(struct qlge_adapter *qdev);
 void qlge_mpi_work(struct work_struct *work);
 void qlge_mpi_reset_work(struct work_struct *work);
-void qlge_mpi_core_to_log(struct work_struct *work);
 int qlge_wait_reg_rdy(struct qlge_adapter *qdev, u32 reg, u32 bit, u32 ebit);
 void qlge_queue_asic_error(struct qlge_adapter *qdev);
 void qlge_set_ethtool_ops(struct net_device *ndev);
diff --git a/drivers/staging/qlge/qlge_dbg.c b/drivers/staging/qlge/qlge_dbg.c
index 3d904f15568d..43bc9580da9e 100644
--- a/drivers/staging/qlge/qlge_dbg.c
+++ b/drivers/staging/qlge/qlge_dbg.c
@@ -1313,17 +1313,6 @@ void qlge_get_dump(struct qlge_adapter *qdev, void *buff)
}
 }
 
-/* Coredump to messages log file using separate worker thread */
-void qlge_mpi_core_to_log(struct work_struct *work)
-{
-   struct qlge_adapter *qdev =
-   container_of(work, struct qlge_adapter, mpi_core_to_log.work);
-
-   print_hex_dump(KERN_DEBUG, "Core is dumping to log file!\n",
-  DUMP_PREFIX_OFFSET, 32, 4, qdev->mpi_coredump,
-  sizeof(*qdev->mpi_coredump), false);
-}
-
 #ifdef QL_REG_DUMP
 static void qlge_dump_intr_states(struct qlge_adapter *qdev)
 {
diff --git a/drivers/staging/qlge/qlge_ethtool.c b/drivers/staging/qlge/qlge_ethtool.c
index 3e577e1bc27c..c65d58fe159b 100644
--- a/drivers/staging/qlge/qlge_ethtool.c
+++ b/drivers/staging/qlge/qlge_ethtool.c
@@ -617,7 +617,6 @@ static void qlge_get_regs(struct net_device *ndev,
struct qlge_adapter *qdev = netdev_priv(ndev);
 
qlge_get_dump(qdev, p);
-   qdev->core_is_dumped = 0;
if (!test_bit(QL_FRC_COREDUMP, &qdev->flags))
regs->len = sizeof(struct qlge_mpi_coredump);
else
diff --git a/drivers/staging/qlge/qlge_main.c b/drivers/staging/qlge/qlge_main.c
index 7a4bae3c12d0..128dd2fa2d41 100644
--- a/drivers/staging/qlge/qlge_main.c
+++ b/drivers/staging/qlge/qlge_main.c
@@ -3808,7 +3808,6 @@ static void qlge_cancel_all_work_sync(struct qlge_adapter *qdev)
cancel_delayed_work_sync(&qdev->mpi_reset_work);
cancel_delayed_work_sync(&qdev->mpi_work);
cancel_delayed_work_sync(&qdev->mpi_idc_work);
-   cancel_delayed_work_sync(&qdev->mpi_core_to_log);
cancel_delayed_work_sync(&qdev->mpi_port_cfg_work);
 }
 
@@ -4503,7 +4502,6 @@ static int qlge_init_device(struct pci_dev *pdev, struct qlge_adapter *qdev,
INIT_DELAYED_WORK(&qdev->mpi_work, qlge_mpi_work);
INIT_DELAYED_WORK(&qdev->mpi_port_cfg_work, qlge_mpi_port_cfg_work);
INIT_DELAYED_WORK(&qdev->mpi_idc_work, qlge_mpi_idc_work);
-   INIT_DELAYED_WORK(&qdev->mpi_core_to_log, qlge_mpi_core_to_log);
init_completion(&qdev->ide_completion);
mutex_init(&qdev->mpi_mutex);
 
diff --git a/drivers/staging/qlge/qlge_mpi.c b/drivers/staging/qlge/qlge_mpi.c
index e67d2f8652a3..7dd9e2de30e5 100644
--- a/drivers/staging/qlge/qlge_mpi.c
+++ b/drivers/staging/qlge/qlge_mpi.c
@@ -1269,11 +1269,

[PATCH v2 2/7] staging: qlge: Initialize devlink health dump framework

2020-10-14 Thread Coiby Xu

Initialize the devlink health dump framework for the qlge driver so
that the coredump can be taken via devlink.

struct qlge_adapter is now used as the private data struct of struct
devlink, so that it can exist independently of struct net_device and
devlink reload can be supported in the future.
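The ownership change can be pictured with a small model of the address arithmetic. devlink_alloc() allocates struct devlink together with a caller-sized private area, devlink_priv() returns the address of that area, and priv_to_devlink() maps it back; the net_device private area then only needs to hold a pointer to qdev (struct qlge_netdev_priv). In the Python below, the fixed offset, base address and all `model_*` names are hypothetical stand-ins, not the kernel API:

```python
from dataclasses import dataclass

DEVLINK_PRIV_OFFSET = 0x400  # hypothetical offset of the private area


@dataclass
class DevlinkAlloc:
    base: int       # address of the (modelled) struct devlink
    priv_size: int  # bytes requested for the private area


def model_devlink_priv(devlink: DevlinkAlloc) -> int:
    """Address of the private area: a fixed offset past the devlink header."""
    return devlink.base + DEVLINK_PRIV_OFFSET


def model_priv_to_devlink(priv_addr: int) -> int:
    """Inverse mapping, container_of() style: subtract the fixed offset."""
    return priv_addr - DEVLINK_PRIV_OFFSET


devlink = DevlinkAlloc(base=0xFFFF888000100000, priv_size=4096)
qdev = model_devlink_priv(devlink)   # where struct qlge_adapter would live
netdev_priv_area = {"qdev": qdev}    # models struct qlge_netdev_priv
# The adapter is reachable from the netdev, but owned by the devlink allocation.
assert model_priv_to_devlink(netdev_priv_area["qdev"]) == devlink.base
print(hex(qdev))
```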

Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/Kconfig|  1 +
 drivers/staging/qlge/Makefile   |  2 +-
 drivers/staging/qlge/qlge.h |  5 +++
 drivers/staging/qlge/qlge_devlink.c | 31 +++
 drivers/staging/qlge/qlge_devlink.h |  9 ++
 drivers/staging/qlge/qlge_main.c| 48 +
 6 files changed, 82 insertions(+), 14 deletions(-)
 create mode 100644 drivers/staging/qlge/qlge_devlink.c
 create mode 100644 drivers/staging/qlge/qlge_devlink.h

diff --git a/drivers/staging/qlge/Kconfig b/drivers/staging/qlge/Kconfig
index a3cb25a3ab80..6d831ed67965 100644
--- a/drivers/staging/qlge/Kconfig
+++ b/drivers/staging/qlge/Kconfig
@@ -3,6 +3,7 @@
 config QLGE
tristate "QLogic QLGE 10Gb Ethernet Driver Support"
depends on ETHERNET && PCI
+   select NET_DEVLINK
help
This driver supports QLogic ISP8XXX 10Gb Ethernet cards.
 
diff --git a/drivers/staging/qlge/Makefile b/drivers/staging/qlge/Makefile
index 1dc2568e820c..07c1898a512e 100644
--- a/drivers/staging/qlge/Makefile
+++ b/drivers/staging/qlge/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_QLGE) += qlge.o
 
-qlge-objs := qlge_main.o qlge_dbg.o qlge_mpi.o qlge_ethtool.o
+qlge-objs := qlge_main.o qlge_dbg.o qlge_mpi.o qlge_ethtool.o qlge_devlink.o
diff --git a/drivers/staging/qlge/qlge.h b/drivers/staging/qlge/qlge.h
index 6ee83e7efd7c..4a48bcc88fbd 100644
--- a/drivers/staging/qlge/qlge.h
+++ b/drivers/staging/qlge/qlge.h
@@ -2060,6 +2060,10 @@ struct nic_operations {
int (*port_initialize)(struct qlge_adapter *qdev);
 };
 
+struct qlge_netdev_priv {
+   struct ql_adapter *qdev;
+};
+
 /*
  * The main Adapter structure definition.
  * This structure has all fields relevant to the hardware.
@@ -2077,6 +2081,7 @@ struct qlge_adapter {
struct pci_dev *pdev;
struct net_device *ndev;/* Parent NET device */
 
+   struct devlink_health_reporter *reporter;
/* Hardware information */
u32 chip_rev_id;
u32 fw_rev_id;
diff --git a/drivers/staging/qlge/qlge_devlink.c b/drivers/staging/qlge/qlge_devlink.c
new file mode 100644
index ..d9c71f45211f
--- /dev/null
+++ b/drivers/staging/qlge/qlge_devlink.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include "qlge.h"
+#include "qlge_devlink.h"
+
+static int
+qlge_reporter_coredump(struct devlink_health_reporter *reporter,
+  struct devlink_fmsg *fmsg, void *priv_ctx,
+  struct netlink_ext_ack *extack)
+{
+   return 0;
+}
+
+static const struct devlink_health_reporter_ops qlge_reporter_ops = {
+   .name = "dummy",
+   .dump = qlge_reporter_coredump,
+};
+
+void qlge_health_create_reporters(struct qlge_adapter *priv)
+{
+   struct devlink_health_reporter *reporter;
+   struct devlink *devlink;
+
+   devlink = priv_to_devlink(priv);
+   priv->reporter =
+   devlink_health_reporter_create(devlink, &qlge_reporter_ops,
+  0, priv);
+   if (IS_ERR(priv->reporter))
+   netdev_warn(priv->ndev,
+   "Failed to create reporter, err = %ld\n",
+   PTR_ERR(priv->reporter));
+}
diff --git a/drivers/staging/qlge/qlge_devlink.h b/drivers/staging/qlge/qlge_devlink.h
new file mode 100644
index ..19078e1ac694
--- /dev/null
+++ b/drivers/staging/qlge/qlge_devlink.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef QLGE_DEVLINK_H
+#define QLGE_DEVLINK_H
+
+#include 
+
+void qlge_health_create_reporters(struct qlge_adapter *priv);
+
+#endif /* QLGE_DEVLINK_H */
diff --git a/drivers/staging/qlge/qlge_main.c b/drivers/staging/qlge/qlge_main.c
index 19e72279b0ce..7a4bae3c12d0 100644
--- a/drivers/staging/qlge/qlge_main.c
+++ b/drivers/staging/qlge/qlge_main.c
@@ -42,6 +42,7 @@
 #include 
 
 #include "qlge.h"
+#include "qlge_devlink.h"
 
 char qlge_driver_name[] = DRV_NAME;
 const char qlge_driver_version[] = DRV_VERSION;
@@ -4382,10 +4383,10 @@ static void qlge_release_all(struct pci_dev *pdev)
pci_release_regions(pdev);
 }
 
-static int qlge_init_device(struct pci_dev *pdev, struct net_device *ndev,
+static int qlge_init_device(struct pci_dev *pdev, struct qlge_adapter *qdev,
int cards_found)
 {
-   struct qlge_adapter *qdev = netdev_priv(ndev);
+   struct net_device *ndev = qdev->ndev;
int err = 0;
 
memset((void *)qdev, 0, sizeof(*qdev));
@@ -4548,27 +4549,34 @@ static void qlge_timer(struct timer_list *t)
mod_timer(&qdev->timer, jiffies + (5 * HZ));
 }
 
+static const struct devlink_ops qlge_devlink_o

Re: [PATCH net-next] net: openvswitch: fix to make sure flow_lookup() is not preempted

2020-10-14 Thread Eelco Chaudron

On 13 Oct 2020, at 14:53, Sebastian Andrzej Siewior wrote:

On 2020-10-13 14:44:19 [+0200], Eelco Chaudron wrote:

The flow_lookup() function uses per-CPU variables, so it must not be
preempted. This is fine in the general NAPI use case, where local BHs
are disabled. However, it is also called in the netlink context, which
is preemptible. The patch below makes sure that preemption is disabled
in the netlink path as well.

Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
Reported-by: Juri Lelli 
Signed-off-by: Eelco Chaudron 
---
 net/openvswitch/flow_table.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 87c286ad660e..16289386632b 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -850,9 +850,17 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
u32 __always_unused n_mask_hit;
u32 __always_unused n_cache_hit;
+   struct sw_flow *flow;
u32 index = 0;

-	return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &n_cache_hit, &index);
+	/* This function gets called through the netlink interface and therefore
+	 * is preemptible. However, flow_lookup() function needs to be called
+	 * with preemption disabled due to CPU specific variables.
+	 */

Once again. u64_stats_update_begin(). What protects you against
concurrent access?


Thanks, Sebastian, for repeating this. I thought I had gone over the
seqcount code and that it was fine for my use case. However, based on
this comment I went over it again and found the part of the logic I
kept missing :)

My idea is to send a v2 patch that, in addition to the preempt_disable(),
also makes the seqcount part per CPU. I noticed other parts of the
networking stack doing it the same way. So the patch would look
something like:


@@ -731,7 +732,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
   u32 *n_cache_hit,
   u32 *index)
 {
-   u64 *usage_counters = this_cpu_ptr(ma->masks_usage_cntr);
+   struct mask_array_stats *stats = this_cpu_ptr(ma->masks_usage_stats);
struct sw_flow *flow;
struct sw_flow_mask *mask;
int i;
@@ -741,9 +742,9 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
if (mask) {
flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
if (flow) {
-   u64_stats_update_begin(&ma->syncp);
-   usage_counters[*index]++;
-   u64_stats_update_end(&ma->syncp);
+   u64_stats_update_begin(&stats->syncp);
+   stats->usage_cntr[*index]++;
+   u64_stats_update_end(&stats->syncp);
(*n_cache_hit)++;
return flow;
}

Let me know your thoughts.


Thanks,

Eelco
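Seqcount writers must be serialized. With one syncp shared by the whole mask array, two CPUs (or two preempted tasks on one CPU) can be inside u64_stats_update_begin()/end() at the same time. Moving the sequence counter next to the per-CPU counters, as proposed above, gives each slot a single writer once preemption is disabled. A rough single-threaded Python model of that protocol follows; PerCpuMaskStats, record_hit() and read_usage() are hypothetical names, and the model ignores the memory barriers the kernel's seqcount implementation relies on:

```python
class PerCpuMaskStats:
    """Hypothetical model of one CPU's slot: a sequence counter and usage counters."""

    def __init__(self, nmasks):
        self.seq = 0
        self.usage_cntr = [0] * nmasks

    # Writer side: only the owning CPU writes, with preemption disabled.
    def update_begin(self):
        self.seq += 1  # odd value: update in progress

    def update_end(self):
        self.seq += 1  # even value: update finished

    # Reader side.
    def fetch_begin(self):
        return self.seq

    def fetch_retry(self, start):
        # Retry if a write was in progress at the start, or one happened since.
        return start % 2 == 1 or self.seq != start


def record_hit(stats, index):
    """Writer: bump one usage counter inside the update section."""
    stats.update_begin()
    stats.usage_cntr[index] += 1
    stats.update_end()


def read_usage(per_cpu, index):
    """Reader: sum one counter over all CPUs, re-reading a slot if it changed."""
    total = 0
    for stats in per_cpu:
        while True:
            start = stats.fetch_begin()
            value = stats.usage_cntr[index]
            if not stats.fetch_retry(start):
                break
        total += value
    return total


cpus = [PerCpuMaskStats(nmasks=4) for _ in range(2)]
record_hit(cpus[0], 1)
record_hit(cpus[1], 1)
record_hit(cpus[1], 3)
print(read_usage(cpus, 1))  # 2
```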

Re: [PATCH net-next v5 01/10] net: bridge: extend the process of special frames

2020-10-14 Thread Nikolay Aleksandrov

On Mon, 2020-10-12 at 14:04 +, Henrik Bjoernlund wrote:
> This patch extends the processing of frames in the bridge. Currently MRP
> frames need special processing and the current implementation doesn't
> allow a nice way to process different frame types. Therefore try to
> improve this by adding a list that contains frame types that need
> special processing. This list is iterated for each input frame and if
> there is a match based on frame type then these functions will be called
> and decide what to do with the frame. A handler can either process the
> frame, in which case the bridge doesn't need to do anything, or not
> process it, in which case the bridge does normal forwarding.
> 
> Signed-off-by: Henrik Bjoernlund  
> Reviewed-by: Horatiu Vultur  
> ---
>  net/bridge/br_device.c  |  1 +
>  net/bridge/br_input.c   | 33 -
>  net/bridge/br_mrp.c | 19 +++
>  net/bridge/br_private.h | 19 ---
>  4 files changed, 60 insertions(+), 12 deletions(-)
> 

Looks good.
Acked-by: Nikolay Aleksandrov
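The dispatch described in the quoted commit message (walk a list of registered frame types, let a matching handler decide, otherwise fall back to normal forwarding) can be sketched as below. The EtherType values are the standard ones for MRP and CFM; the handler signature and the "True means consumed" convention are assumptions for illustration, not the bridge code's actual interface:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ETH_P_MRP = 0x88E3  # Media Redundancy Protocol EtherType
ETH_P_CFM = 0x8902  # Connectivity Fault Management EtherType


@dataclass
class FrameTypeHandler:
    ethertype: int
    # Returns True if it consumed the frame (convention assumed here).
    handle: Callable[[Dict], bool]


def bridge_input(handlers: List[FrameTypeHandler], frame: Dict) -> str:
    """Run the first matching special-frame handler, else forward normally."""
    for entry in handlers:
        if entry.ethertype == frame["ethertype"]:
            if entry.handle(frame):
                return "consumed"
            break  # handler declined: fall back to normal forwarding
    return "forwarded"


handlers = [
    FrameTypeHandler(ETH_P_MRP, lambda f: True),  # e.g. MRP always consumed
    FrameTypeHandler(ETH_P_CFM, lambda f: f.get("for_us", False)),
]
print(bridge_input(handlers, {"ethertype": ETH_P_MRP}))  # consumed
print(bridge_input(handlers, {"ethertype": 0x0800}))     # forwarded
```

Keeping the handlers in a list means a new protocol only registers an entry instead of adding another special case to the input path.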

Re: [PATCH net-next v5 08/10] bridge: cfm: Netlink GET configuration Interface.

2020-10-14 Thread Nikolay Aleksandrov

On Mon, 2020-10-12 at 14:04 +, Henrik Bjoernlund wrote:
> This is the implementation of CFM netlink configuration
> get information interface.
> 
> Add new nested netlink attributes. These attributes are used by the
> user space to get configuration information.
> 
> GETLINK:
> Request filter RTEXT_FILTER_CFM_CONFIG:
> Indicating that CFM configuration information must be delivered.
> 
> IFLA_BRIDGE_CFM:
> Points to the CFM information.
> 
> IFLA_BRIDGE_CFM_MEP_CREATE_INFO:
> This indicates that MEP instance create parameters follow.
> IFLA_BRIDGE_CFM_MEP_CONFIG_INFO:
> This indicates that MEP instance config parameters follow.
> IFLA_BRIDGE_CFM_CC_CONFIG_INFO:
> This indicates that MEP instance CC functionality
> parameters follow.
> IFLA_BRIDGE_CFM_CC_RDI_INFO:
> This indicates that CC transmitted CCM PDU RDI
> parameters follow.
> IFLA_BRIDGE_CFM_CC_CCM_TX_INFO:
> This indicates that CC transmitted CCM PDU parameters
> follow.
> IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO:
> This indicates that the added peer MEP IDs follow.
> 
> The CFM nested attribute has the following attributes at the next level.
> 
> GETLINK RTEXT_FILTER_CFM_CONFIG:
> IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE:
> The created MEP instance number.
> The type is u32.
> IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN:
> The created MEP domain.
> The type is u32 (br_cfm_domain).
> It must be BR_CFM_PORT.
> This means that CFM frames are transmitted and received
> directly on the port - untagged. Not in a VLAN.
> IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION:
> The created MEP direction.
> The type is u32 (br_cfm_mep_direction).
> It must be BR_CFM_MEP_DIRECTION_DOWN.
> This means that CFM frames are transmitted and received on
> the port. Not in the bridge.
> IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX:
> The created MEP residence port ifindex.
> The type is u32 (ifindex).
> 
> IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE:
> The deleted MEP instance number.
> The type is u32.
> 
> IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE:
> The configured MEP instance number.
> The type is u32.
> IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC:
> The configured MEP unicast MAC address.
> The type is 6*u8 (array).
> This is used as SMAC in all transmitted CFM frames.
> IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL:
> The configured MEP MD level.
> The type is u32.
> It must be in the range 1-7.
> No CFM frames at lower levels pass through this MEP.
> IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID:
> The configured MEP ID.
> The type is u32.
> It must be in the range 0-0x1FFF.
> This MEP ID is inserted in any transmitted CCM frame.
> 
> IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE:
> The configured MEP instance number.
> The type is u32.
> IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE:
> The Continuity Check (CC) functionality is enabled or disabled.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL:
> The CC expected receive interval of CCM frames.
> The type is u32 (br_cfm_ccm_interval).
> This is also the transmission interval of CCM frames when enabled.
> IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID:
> The CC expected receive MAID in CCM frames.
> The type is CFM_MAID_LENGTH*u8.
> This MAID is also inserted in transmitted CCM frames.
> 
> IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE:
> The configured MEP instance number.
> The type is u32.
> IFLA_BRIDGE_CFM_CC_PEER_MEPID:
> The CC Peer MEP ID added.
> The type is u32.
> When a Peer MEP ID is added and CC is enabled it is expected to
> receive CCM frames from that Peer MEP.
> 
> IFLA_BRIDGE_CFM_CC_RDI_INSTANCE:
> The configured MEP instance number.
> The type is u32.
> IFLA_BRIDGE_CFM_CC_RDI_RDI:
> The RDI that is inserted in transmitted CCM PDU.
> The type is u32 (bool).
> 
> IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE:
> The configured MEP instance number.
> The type is u32.
> IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC:
> The transmitted CCM frame destination MAC address.
> The type is 6*u8 (array).
> This is used as DMAC in all transmitted CFM frames.
> IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE:
> The transmitted CCM frame update (increment) of sequence
> number is enabled or disabled.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD:
> The period of time during which CCM frames are transmitted.
> The type is u32.
> The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX
> must be done before timeout to k

Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping

2020-10-14 Thread Richard Cochran

On Wed, Oct 14, 2020 at 12:57:47PM +0300, Vladimir Oltean wrote:
> So the discussion is about how to have the cake and eat it at the same
> time.

And I wish for a pony.  With sparkles.  And a unicorn.  And a rainbow.

> Silicon vendors eager to follow the latest trends in standards are
> implementing hybrid PTP clocks, where an unsynchronizable version of the
> clock delivers MAC timestamps to the application stack, and a
> synchronizable wrapper over that same clock is what gets fed into the
> offloading engines, like the ones behind the tc-taprio and tc-gate
> offload. Some of these vendors perform cross-timestamping (they deliver
> a timestamp from the MAC with 2, or 3, or 4, timestamps, depending on
> how many PHCs that MAC has wired to it), some don't, and just deliver a
> single timestamp from a configurable source.

Sounds like it will be nearly impossible to make a single tc-taprio
framework that fits all the hardware variants.

> The operating system is supposed to ??? in order to synchronize the
> synchronizable clock to the virtual time retrieved via TIME_STATUS_NP
> that you're talking about. The question is what to replace that ???
> with, of course.

You have a choice.  Either you synchronize the local PHC to the global
TAI time base or not.  If you do synchronize the PHC, then everything
(like the globally scheduled time slots) just works.  If you decide to
follow the nonsensical idea (following 802.1-AS) and leave the PHC
free running, then you will have a difficult time scheduling those
time windows.

So it is all up to you.

> I'm not an expert in kernel implementation either, but perhaps in the
> light of this, you can revisit the idea that kernel changes will not be
> needed (or explain more, if you still think they aren't).

I am not opposed to kernel changes, but there must be:

- A clear statement of the background context, and
- an explanation of the issue to solved, and
- a realistic solution that will support the wide variety of HW. 

> DISCLAIMER
> Yes, I know full well that everyone can write a standard, but not
> everyone can implement one. At the end of the day, I'm not trying to
> make an argument whether the end result is worth making all these
> changes.

+1

That is the question.  You can easily solve this issue by simply
synchronizing the PHC to the global time base.

Thanks,
Richard
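Richard's scheduling point can be made concrete. If the PHC is left free-running, every TAI instant in a gate schedule (for example a tc-taprio base time) has to be translated into the free-running clock's timescale using an estimated offset and frequency ratio, and the result is only as good as that estimate. A minimal sketch, assuming the estimate is expressed as a pair of reference timestamps for the same instant plus a frequency error in parts per billion; the function and its parameters are hypothetical:

```python
NSEC_PER_SEC = 1_000_000_000


def tai_to_free_running(tai_ns, ref_tai_ns, ref_free_ns, freq_err_ppb):
    """Map a TAI time to the free-running PHC timescale.

    ref_tai_ns/ref_free_ns: the same instant read on both timescales.
    freq_err_ppb: how much faster (positive) the free-running clock ticks
    than TAI, in parts per billion.
    """
    delta = tai_ns - ref_tai_ns
    # Integer arithmetic; floor division rounds toward minus infinity.
    return ref_free_ns + delta * (NSEC_PER_SEC + freq_err_ppb) // NSEC_PER_SEC


# A gate event 1 s after the reference instant, with the free-running
# clock 1000 ppb (1 ppm) fast and 5 ms ahead at the reference instant.
print(tai_to_free_running(1_000_000_000, 0, 5_000_000, 1000))  # 1005001000
```

With a synchronized PHC the mapping is the identity, which is why the schedule then "just works".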

Re: [PATCH net-next v5 09/10] bridge: cfm: Netlink GET status Interface.

2020-10-14 Thread Nikolay Aleksandrov

On Mon, 2020-10-12 at 14:04 +, Henrik Bjoernlund wrote:
> This is the implementation of CFM netlink status
> get information interface.
> 
> Add new nested netlink attributes. These attributes are used by the
> user space to get status information.
> 
> GETLINK:
> Request filter RTEXT_FILTER_CFM_STATUS:
> Indicating that CFM status information must be delivered.
> 
> IFLA_BRIDGE_CFM:
> Points to the CFM information.
> 
> IFLA_BRIDGE_CFM_MEP_STATUS_INFO:
> This indicates that the MEP instance status follows.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO:
> This indicates that the peer MEP status follows.
> 
> The CFM nested attribute has the following attributes at the next level.
> 
> GETLINK RTEXT_FILTER_CFM_STATUS:
> IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE:
> The MEP instance number of the delivered status.
> The type is u32.
> IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN:
> The MEP instance received a CFM PDU with an unexpected Opcode.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN:
> The MEP instance received a CFM PDU with an unexpected version.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN:
> The MEP instance received a CCM PDU with an MD level lower than the
> configured level. This frame is discarded.
> The type is u32 (bool).
> 
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE:
> The MEP instance number of the delivered status.
> The type is u32.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID:
> The added Peer MEP ID of the delivered status.
> The type is u32.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT:
> The CCM defect status.
> The type is u32 (bool).
> True means no CCM frame has been received for 3.25 intervals
> (the interval given by IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL).
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI:
> The last received CCM PDU RDI.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE:
> The last received CCM PDU Port Status TLV value field.
> The type is u8.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE:
> The last received CCM PDU Interface Status TLV value field.
> The type is u8.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN:
> A CCM frame has been received from Peer MEP.
> The type is u32 (bool).
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN:
> A CCM frame with TLV has been received from Peer MEP.
> The type is u32 (bool).
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN:
> A CCM frame with unexpected sequence number has been received
> from Peer MEP.
> The type is u32 (bool).
> A sequence number is unexpected when it is not one higher than the
> previously received one.
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> 
> Signed-off-by: Henrik Bjoernlund  
> Reviewed-by: Horatiu Vultur  
> ---
>  include/uapi/linux/if_bridge.h |  29 +
>  include/uapi/linux/rtnetlink.h |   1 +
>  net/bridge/br_cfm_netlink.c| 105 +
>  net/bridge/br_netlink.c|  16 -
>  net/bridge/br_private.h|   6 ++
>  5 files changed, 154 insertions(+), 3 deletions(-)
> 
> 

Acked-by: Nikolay Aleksandrov
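The CCM defect rule in the quoted description is simple arithmetic: with an expected CCM interval T, a defect is flagged once no CCM frame has arrived for 3.25 × T. A sketch in integer microseconds to avoid floating point; the function names and the use of `>=` at the exact boundary are assumptions, not taken from the patch:

```python
def ccm_defect_timeout_us(interval_us):
    """3.25 intervals, computed exactly as 13/4 of the interval."""
    return interval_us * 13 // 4


def ccm_defect(now_us, last_rx_us, interval_us):
    """True if no CCM frame has been received for 3.25 intervals."""
    return now_us - last_rx_us >= ccm_defect_timeout_us(interval_us)


print(ccm_defect_timeout_us(10_000))     # 10 ms interval -> 32500 us
print(ccm_defect_timeout_us(1_000_000))  # 1 s interval -> 3250000 us
```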

Re: [PATCH net-next v6 4/7] net: dsa: hellcreek: Add support for hardware timestamping

2020-10-14 Thread Kurt Kanzenbach

On Wed Oct 14 2020, Richard Cochran wrote:
> On Wed, Oct 14, 2020 at 12:57:47PM +0300, Vladimir Oltean wrote:
>> So the discussion is about how to have the cake and eat it at the same
>> time.
>
> And I wish for a pony.  With sparkles.  And a unicorn.  And a rainbow.
>
>> Silicon vendors eager to follow the latest trends in standards are
>> implementing hybrid PTP clocks, where an unsynchronizable version of the
>> clock delivers MAC timestamps to the application stack, and a
>> synchronizable wrapper over that same clock is what gets fed into the
>> offloading engines, like the ones behind the tc-taprio and tc-gate
>> offload. Some of these vendors perform cross-timestamping (they deliver
>> a timestamp from the MAC with 2, or 3, or 4, timestamps, depending on
>> how many PHCs that MAC has wired to it), some don't, and just deliver a
>> single timestamp from a configurable source.
>
> Sounds like it will be nearly impossible to make a single tc-taprio
> framework that fits all the hardware variants.

Why? All the gate operations work on the synchronized clock. I assume
all Qbv capable switches have a synchronized clock?

It's just that some switches have multiple PHCs instead of a single
one. It seems to be quite common to have a free-running as well as a
synchronized clock. In order for a better(?) or more accurate(?) ptp
implementation they expose not a single but rather multiple timestamps
from all PHCs (-> cross-timestamping) to user space for the ptp event
messages. That's at least my very limited understanding.

>
>> The operating system is supposed to ??? in order to synchronize the
>> synchronizable clock to the virtual time retrieved via TIME_STATUS_NP
>> that you're talking about. The question is what to replace that ???
>> with, of course.
>
> You have a choice.  Either you synchronize the local PHC to the global
> TAI time base or not.  If you do synchronize the PHC, then everything
> (like the globally scheduled time slots) just works.  If you decide to
> follow the nonsensical idea (following 802.1-AS) and leave the PHC
> free running, then you will have a difficult time scheduling those
> time windows.
>
> So it is all up to you.
>
>> I'm not an expert in kernel implementation either, but perhaps in the
>> light of this, you can revisit the idea that kernel changes will not be
>> needed (or explain more, if you still think they aren't).
>
> I am not opposed to kernel changes, but there must be:
>
> - A clear statement of the background context, and
> - an explanation of the issue to solved, and
> - a realistic solution that will support the wide variety of HW. 

Agreed.

Thanks,
Kurt



Re: [PATCH v3 2/2] vhost-vdpa: fix page pinning leakage in error path

2020-10-14 Thread Jason Wang

On 2020/10/14 2:52 PM, Michael S. Tsirkin wrote:

On Tue, Oct 13, 2020 at 04:42:59PM -0700, si-wei liu wrote:

On 10/9/2020 7:27 PM, Jason Wang wrote:

On 2020/10/3 1:02 PM, Si-Wei Liu wrote:

Pinned pages are not properly accounted for, particularly when a
mapping error occurs on IOTLB update. Clean up dangling pinned pages
in the error path. The in-flight pinned pages, specifically for a
memory region that strides across multiple chunks, would need more
than one free page for bookkeeping and accounting. For simplicity,
pin pages for all memory in the IOVA range in one go rather than
making multiple pin_user_pages() calls to make up the entire region.
This way it's easier to track and account the pages already mapped,
particularly for clean-up in the error path.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Si-Wei Liu
---
Changes in v3:
- Factor out vhost_vdpa_map() change to a separate patch

Changes in v2:
- Fix incorrect target SHA1 referenced

 drivers/vhost/vdpa.c | 119 ++-
 1 file changed, 71 insertions(+), 48 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 0f27919..dad41dae 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -595,21 +595,19 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
   struct vhost_dev *dev = &v->vdev;
   struct vhost_iotlb *iotlb = dev->iotlb;
   struct page **page_list;
-unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
+struct vm_area_struct **vmas;
   unsigned int gup_flags = FOLL_LONGTERM;
-unsigned long npages, cur_base, map_pfn, last_pfn = 0;
-unsigned long locked, lock_limit, pinned, i;
+unsigned long map_pfn, last_pfn = 0;
+unsigned long npages, lock_limit;
+unsigned long i, nmap = 0;
   u64 iova = msg->iova;
+long pinned;
   int ret = 0;
 if (vhost_iotlb_itree_first(iotlb, msg->iova,
   msg->iova + msg->size - 1))
   return -EEXIST;
   -page_list = (struct page **) __get_free_page(GFP_KERNEL);
-if (!page_list)
-return -ENOMEM;
-
   if (msg->perm & VHOST_ACCESS_WO)
   gup_flags |= FOLL_WRITE;
@@ -617,61 +615,86 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
   if (!npages)
   return -EINVAL;
+page_list = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
+vmas = kvmalloc_array(npages, sizeof(struct vm_area_struct *),
+  GFP_KERNEL);

This will result in a high-order memory allocation, which is what the
code originally tried to avoid.

Using an unlimited size will cause a lot of side effects, considering that
a VM or userspace may try to pin several TB of memory.

Hmmm, that's a good point. Indeed, if the guest memory demand is huge or the
host system is running short of free pages, kvmalloc will be problematic and
less efficient than the __get_free_page implementation.
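To put numbers on Jason's concern, here is a small userspace C sketch (not driver code). It assumes 4 KiB pages and 8-byte pointers, and the helper names are hypothetical. It compares the size of one `struct page *` array covering the whole IOVA range, as the kvmalloc_array() version requests, with the number of pinning rounds needed when a single page-sized list is reused, as the original __get_free_page() code did.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages, 64-bit pointers. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PTR_SIZE  8ULL

/* Bytes for a one-pointer-per-page array covering 'bytes' of memory,
 * i.e. what kvmalloc_array(npages, sizeof(struct page *)) would request. */
static uint64_t one_shot_array_bytes(uint64_t bytes)
{
	uint64_t npages = bytes / SKETCH_PAGE_SIZE;

	return npages * SKETCH_PTR_SIZE;
}

/* Pinning rounds when one page-sized list is reused: each round holds
 * PAGE_SIZE / sizeof(pointer) entries (512 under these assumptions). */
static uint64_t chunked_rounds(uint64_t npages)
{
	const uint64_t list_size = SKETCH_PAGE_SIZE / SKETCH_PTR_SIZE;

	assert(npages > 0);
	return (npages + list_size - 1) / list_size;
}
```

For a 1 TiB region this gives a 2 GiB page array (and the patch's parallel `vmas` array needs as much again), whereas the chunked loop keeps its bookkeeping at one 4 KiB page at the cost of 2^19 rounds.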

OK so ... Jason, what's the plan?

How about you send a patchset with
1. revert this change
2. fix error handling leak



Works for me, but it looks like Si-Wei wants to do this.

So it's better for him to send the patchset.

Thanks

[PATCH v3] net: Add mhi-net driver

2020-10-14 Thread Loic Poulain

This patch adds a new network driver implementing MHI transport for
network packets. Packets can be in any format, though QMAP (rmnet)
is the usual protocol (flow control + PDN mux).

It supports two MHI devices: IP_HW0, which is the path to the IPA
(IP accelerator) on the qcom modem, and IP_SW0, which is the
software-driven IP path (to the modem CPU).

Signed-off-by: Loic Poulain 
---
  v2: - rebase on net-next
  - remove useless skb_linearize
  - check error type on mhi_queue return
  - rate limited errors
  - Schedule RX refill only on 'low' buf level
  - SET_NETDEV_DEV in probe
  - reorder device remove sequence
  v3: - Stop channels on net_register error
  - Remove useless parentheses
  - Add driver .owner

 drivers/net/Kconfig   |   7 ++
 drivers/net/Makefile  |   1 +
 drivers/net/mhi_net.c | 284 ++
 3 files changed, 292 insertions(+)
 create mode 100644 drivers/net/mhi_net.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 1368d1d..11a6357 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -426,6 +426,13 @@ config VSOCKMON
  mostly intended for developers or support to debug vsock issues. If
  unsure, say N.
 
+config MHI_NET
+   tristate "MHI network driver"
+   depends on MHI_BUS
+   help
+ This is the network driver for MHI.  It can be used with
+ QCOM based WWAN modems (like SDX55).  Say Y or M.
+
 endif # NET_CORE
 
 config SUNGEM_PHY
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 94b6080..8312037 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
 obj-$(CONFIG_VSOCKMON) += vsockmon.o
+obj-$(CONFIG_MHI_NET) += mhi_net.o
 
 #
 # Networking Drivers
diff --git a/drivers/net/mhi_net.c b/drivers/net/mhi_net.c
new file mode 100644
index 000..c184aa9
--- /dev/null
+++ b/drivers/net/mhi_net.c
@@ -0,0 +1,284 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* MHI Network driver - Network over MHI
+ *
+ * Copyright (C) 2020 Linaro Ltd 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MIN_MTU		ETH_MIN_MTU
+#define MAX_MTU		0x
+#define DEFAULT_MTU	8192
+
+struct mhi_net_stats {
+   u64 rx_packets;
+   u64 rx_bytes;
+   u64 rx_errors;
+   u64 rx_dropped;
+   u64 tx_packets;
+   u64 tx_bytes;
+   u64 tx_errors;
+   u64 tx_dropped;
+   atomic_t rx_queued;
+};
+
+struct mhi_net_dev {
+   struct mhi_device *mdev;
+   struct net_device *ndev;
+   struct delayed_work rx_refill;
+   struct mhi_net_stats stats;
+   u32 rx_queue_sz;
+};
+
+static int mhi_ndo_open(struct net_device *ndev)
+{
+   struct mhi_net_dev *mhi_netdev = netdev_priv(ndev);
+
+   /* Feed the rx buffer pool */
+   schedule_delayed_work(&mhi_netdev->rx_refill, 0);
+
+   /* Carrier is established via out-of-band channel (e.g. qmi) */
+   netif_carrier_on(ndev);
+
+   netif_start_queue(ndev);
+
+   return 0;
+}
+
+static int mhi_ndo_stop(struct net_device *ndev)
+{
+   struct mhi_net_dev *mhi_netdev = netdev_priv(ndev);
+
+   netif_stop_queue(ndev);
+   netif_carrier_off(ndev);
+   cancel_delayed_work_sync(&mhi_netdev->rx_refill);
+
+   return 0;
+}
+
+static int mhi_ndo_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+   struct mhi_net_dev *mhi_netdev = netdev_priv(ndev);
+   struct mhi_device *mdev = mhi_netdev->mdev;
+   int err;
+
+   skb_tx_timestamp(skb);
+
+   /* mhi_queue_skb is not thread-safe, but xmit is serialized by the
+    * network core. Once the MHI core is thread-safe, migrate to
+    * NETIF_F_LLTX support.
+    */
+   err = mhi_queue_skb(mdev, DMA_TO_DEVICE, skb, skb->len, MHI_EOT);
+   if (err == -ENOMEM) {
+   netif_stop_queue(ndev);
+   return NETDEV_TX_BUSY;
+   } else if (unlikely(err)) {
+   net_err_ratelimited("%s: Failed to queue TX buf (%d)\n",
+   ndev->name, err);
+   mhi_netdev->stats.tx_dropped++;
+   kfree_skb(skb);
+   }
+
+   return NETDEV_TX_OK;
+}
+
+static void mhi_ndo_get_stats64(struct net_device *ndev,
+   struct rtnl_link_stats64 *stats)
+{
+   struct mhi_net_dev *mhi_netdev = netdev_priv(ndev);
+
+   stats->rx_packets = mhi_netdev->stats.rx_packets;
+   stats->rx_bytes = mhi_netdev->stats.rx_bytes;
+   stats->rx_errors = mhi_netdev->stats.rx_errors;
+   stats->rx_dropped = mhi_netdev->stats.rx_dropped;
+   stats->tx_packets = mhi_netdev->stats.tx_packets;
+   stats->tx_bytes = mhi_netdev->stats.tx_bytes;
+   stats->tx_errors = mhi_netdev->stats.tx_errors;
+   stats->tx_dropped = mhi_netdev->stats.tx_dropped;
+}
+
+static const struct net_device_ops mhi

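The TX error handling in mhi_ndo_xmit() above follows a three-way policy. The userspace C model below restates it; the enum and function names are hypothetical, and only the decision logic is taken from the patch: -ENOMEM stops the queue and returns NETDEV_TX_BUSY, any other error drops and frees the skb but still returns NETDEV_TX_OK, and success returns NETDEV_TX_OK.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names modelling what mhi_ndo_xmit() does for each
 * result of mhi_queue_skb(); these are not kernel identifiers. */
enum xmit_action {
	XMIT_QUEUED,          /* return NETDEV_TX_OK, skb now owned by MHI */
	XMIT_BUSY_STOP_QUEUE, /* netif_stop_queue(), return NETDEV_TX_BUSY */
	XMIT_DROPPED,         /* tx_dropped++, kfree_skb(), NETDEV_TX_OK */
};

static enum xmit_action classify_queue_result(int err)
{
	if (err == -ENOMEM)
		return XMIT_BUSY_STOP_QUEUE; /* treated as transient: retry later */
	if (err)
		return XMIT_DROPPED;         /* any other error: drop the packet */
	return XMIT_QUEUED;
}
```

Returning NETDEV_TX_BUSY hands the skb back to the networking core for a later retry, which is why the queue is stopped first; on the drop path the driver still owns the skb and must free it itself.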
selftests: netfilter: nft_nat.sh: /dev/stdin:2:9-15: Error: syntax error, unexpected counter

2020-10-14 Thread Naresh Kamboju

While running the kselftest netfilter tests on x86_64 devices with the
linux-next tag 20201013 kernel, we noticed these errors. This is not
specific to this kernel version; we have noticed these errors earlier
as well.

Am I missing configs?
Please refer to the config file we are using.
We are using the minimal BusyBox shell:
BusyBox v1.27.2 (2020-07-17 18:42:50 UTC) multi-call binary.

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  git commit: f2fb1afc57304f9dd68c20a08270e287470af2eb
  git describe: next-20201013
  make_kernelversion: 5.9.0
  kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-corei7-64/lkft/linux-next/879/config

Test output log:

selftests: netfilter: nft_nat.sh
[ 1207.251385] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 1207.342479] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
# /dev/stdin:2:9-15: Error: syntax error, unexpected counter
# counter ns0in {}
# ^^^
# /dev/stdin:3:9-15: Error: syntax error, unexpected counter
# counter ns1in {}
# ^^^
# /dev/stdin:4:9-15: Error: syntax error, unexpected counter
# counter ns2in {}
# ^^^
# /dev/stdin:6:9-15: Error: syntax error, unexpected counter
# counter ns0out {}
# ^^^



# /dev/stdin:12:9-15: Error: syntax error, unexpected counter
# counter ns2in6 {}
# ^^^
# /dev/stdin:14:9-15: Error: syntax error, unexpected counter
# counter ns0out6 {}
# ^^^
[ 1208.229989] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter ns0in
#  ^^^
# ERROR: ns0in counter in ns1-loU9Vlmj has unexpected value (expected packets 1 bytes 84) at check_counters 1
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter ns0in
#  ^^^
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter ns0out
#  ^^^
# ERROR: ns0out counter in ns1-loU9Vlmj has unexpected value (expected packets 1 bytes 84) at check_counters 2
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter ns0out
#  ^^^
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter ns0in6

# ERROR: ns1 counter in ns0-loU9Vlmj has unexpected value (expected packets 1 bytes 104) at check_ns0_counters 5
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter ns1
#  ^^^



# :1:16-19: Error: syntax error, unexpected inet
# reset counters inet
#
# :1:16-19: Error: syntax error, unexpected inet
# reset counters inet
#
# FAIL: nftables v0.7 (Scrooge McDuck)
not ok 2 selftests: netfilter: nft_nat.sh # exit=1
# selftests: netfilter: bridge_brouter.sh
# SKIP: Could not run test without ebtables
ok 3 selftests: netfilter: bridge_brouter.sh # SKIP
# selftests: netfilter: conntrack_icmp_related.sh
[ 1215.679815] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 1215.698932] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 1215.711612] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 1216.678043] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
# internal:0:0-0: Error: Could not open file \"-\": No such file or directory
#
#
# internal:0:0-0: Error: Could not open file \"-\": No such file or directory
#
#
# internal:0:0-0: Error: Could not open file \"-\": No such file or directory
#
#
# internal:0:0-0: Error: Could not open file \"-\": No such file or directory
#
#
# internal:0:0-0: Error: Could not open file \"-\": No such file or directory
#
#
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter unknown
#  ^^^
# ERROR: counter unknown in nsclient1 has unexpected value (expected packets 0 bytes 0)
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter unknown
#  ^^^



# ERROR: counter related in nsclient1 has unexpected value (expected packets 2 bytes 1856)
# :1:6-12: Error: syntax error, unexpected counter
# list counter inet filter related
#  ^^^
# ERROR: icmp error RELATED state test has failed
not ok 4 selftests: netfilter: conntrack_icmp_related.sh # exit=1
# selftests: netfilter: nft_flowtable.sh
# Cannot create namespace file \"/var/run/netns/ns1\": File exists
[ 1230.570705] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 1230.757525] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 1230.843221] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
# internal:0:0-0: Error: Could not open file \"-\": No such file or directory
#
#
# PASS: netns routing/connectivity: ns1 can reach ns2
# BusyBox v1.27.2 (2020-07-17 18:42:50 UTC) multi-call binary.
#
# Usage: nc [IPADDR PORT]
# BusyBox v1.27.2 (2020-07-17 18:42:50 UTC) multi-call binary.
#
# Usage: nc [IPADDR PORT]
# FAIL: file mismatch for ns1 -> ns2
# -rw--- 1 root root 1079296 Oct 13 09:54 /tmp/

Re: [PATCH 07/23] wfx: add bus_sdio.c

2020-10-14 Thread Jérôme Pouiller

Hello Pali,

On Tuesday 13 October 2020 22:11:56 CEST Pali Rohár wrote:
> Hello!
> 
> On Monday 12 October 2020 12:46:32 Jerome Pouiller wrote:
> > +#define SDIO_VENDOR_ID_SILABS        0x
> > +#define SDIO_DEVICE_ID_SILABS_WF200  0x1000
> > +static const struct sdio_device_id wfx_sdio_ids[] = {
> > + { SDIO_DEVICE(SDIO_VENDOR_ID_SILABS, SDIO_DEVICE_ID_SILABS_WF200) },
> 
> Please move the ids into the common include file include/linux/mmc/sdio_ids.h,
> where all SDIO ids are. All drivers now have their ids defined in that file.
> 
> > + // FIXME: ignore VID/PID and only rely on device tree
> > + // { SDIO_DEVICE(SDIO_ANY_ID, SDIO_ANY_ID) },
> 
> What is the reason for ignoring vendor and device ids?

The device has a peculiarity: its VID/PID is :1000 (as you can see
above). This value is weird, and the risk of collision with another
device is high.

So, maybe the device should be probed only if it appears in the DT. Since
WF200 targets embedded platforms, I don't think it is a problem to rely on
DT. You will find another FIXME further in the code about that:

+   dev_warn(&func->dev,
+"device is not declared in DT, features will be limited\n");
+   // FIXME: ignore VID/PID and only rely on device tree
+   // return -ENODEV;

However, it wouldn't be the usual way to manage SDIO devices (and that is
the reason why the code is commented out).

Anyway, if we choose to rely on the DT, should we also check the VID/PID?

Personally, I am in favor of probing the device only if the VID/PID match
and a DT node is found, even if it is not the usual way.
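The policy proposed here (bind only if the VID/PID match and a DT node is present) can be sketched as a pure decision function. This is a userspace model with hypothetical names; a real driver would look at the device's DT node and its SDIO id table rather than at booleans, and in practice the SDIO core only calls probe() after an id-table match.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposed wfx SDIO probe policy.
 * Returns 0 to bind, -ENODEV to refuse. */
static int sketch_wfx_probe_policy(bool id_matches, bool has_dt_node)
{
	if (!id_matches)
		return -ENODEV; /* id table did not match: never bind */
	if (!has_dt_node)
		return -ENODEV; /* the unusual VID/PID alone is not trusted */
	return 0;
}
```

Compared with the current code, which only warns "device is not declared in DT, features will be limited" and continues, this corresponds to enabling the commented-out `return -ENODEV`.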

-- 
Jérôme Pouiller

[PATCH net-next] net: ptp: get rid of IPV4_HLEN() and OFF_IHL macros

2020-10-14 Thread Christian Eggers

Both macros are already marked for removal. IPV4_HLEN(data) is
misleading, as it expects an Ethernet header instead of an IPv4 header
as its argument. Because it is defined (and only used) within PTP, it
should be named PTP_IPV4_HLEN or similar.

Since the rest of the IPv4 stack has no problem using iphdr->ihl
directly, PTP should be able to do so as well.

OFF_IHL has only been used by IPV4_HLEN. Additionally, it is superfluous,
as ETH_HLEN already exists for the same purpose.
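The replacement code open-codes the same arithmetic: skip the Ethernet header, read the 4-bit IHL field (the low nibble of the first IPv4 byte, counted in 32-bit words) and multiply it by 4, then skip the UDP header. Below is a minimal userspace C sketch over a raw frame buffer, assuming an untagged Ethernet II frame; the SKETCH_* constants mirror ETH_HLEN (14) and UDP_HLEN (8), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14 /* mirrors ETH_HLEN: untagged Ethernet II */
#define SKETCH_UDP_HLEN 8  /* mirrors UDP_HLEN */

/* Offset of the UDP payload (the PTP header) in an Ethernet/IPv4/UDP
 * frame; (byte & 0x0f) << 2 is what iphdr->ihl << 2 computes. */
static size_t sketch_ptp_offset(const uint8_t *frame)
{
	size_t offset = SKETCH_ETH_HLEN;

	offset += (size_t)(frame[offset] & 0x0f) << 2;
	offset += SKETCH_UDP_HLEN;
	return offset;
}
```

With a plain 20-byte IPv4 header (first byte 0x45) the offset is 14 + 20 + 8 = 42; one 4-byte option word (0x46) moves it to 46.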

Signed-off-by: Christian Eggers 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c   | 4 ++--
 drivers/net/ethernet/chelsio/cxgb4/sge.c | 5 -
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 4 +++-
 drivers/net/ethernet/xscale/ixp4xx_eth.c | 4 +++-
 include/linux/ptp_classify.h | 2 --
 net/core/ptp_classifier.c| 6 +++---
 6 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
index 70dbee89118e..b32a9006b222 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
@@ -83,8 +83,8 @@ bool is_ptp_enabled(struct sk_buff *skb, struct net_device *dev)
  */
 bool cxgb4_ptp_is_ptp_rx(struct sk_buff *skb)
 {
-   struct udphdr *uh = (struct udphdr *)(skb->data + ETH_HLEN +
- IPV4_HLEN(skb->data));
+   struct iphdr *ih = (struct iphdr *)(skb->data + ETH_HLEN);
+   struct udphdr *uh = (struct udphdr *)((char *)ih + (ih->ihl << 2));
 
return  uh->dest == htons(PTP_EVENT_PORT) &&
uh->source == htons(PTP_EVENT_PORT);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index a9e9c7ae565d..c8bec874bc66 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3386,7 +3387,9 @@ static noinline int t4_systim_to_hwstamp(struct adapter *adapter,
 
data = skb->data + sizeof(*cpl);
skb_pull(skb, 2 * sizeof(u64) + sizeof(struct cpl_rx_mps_pkt));
-   offset = ETH_HLEN + IPV4_HLEN(skb->data) + UDP_HLEN;
+   offset = ETH_HLEN;
+   offset += ((struct iphdr *)(skb->data + offset))->ihl << 2;
+   offset += UDP_HLEN;
if (skb->len < offset + OFF_PTP_SEQUENCE_ID + sizeof(short))
return RX_PTP_PKT_ERR;
 
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index ade8c44c01cd..4e95621997d1 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -113,7 +113,9 @@ static int pch_ptp_match(struct sk_buff *skb, u16 uid_hi, u32 uid_lo, u16 seqid)
if (ptp_classify_raw(skb) == PTP_CLASS_NONE)
return 0;
 
-   offset = ETH_HLEN + IPV4_HLEN(data) + UDP_HLEN;
+   offset = ETH_HLEN;
+   offset += ((struct iphdr *)(data + offset))->ihl << 2;
+   offset += UDP_HLEN;
 
if (skb->len < offset + OFF_PTP_SEQUENCE_ID + sizeof(seqid))
return 0;
diff --git a/drivers/net/ethernet/xscale/ixp4xx_eth.c b/drivers/net/ethernet/xscale/ixp4xx_eth.c
index 2e5202923510..7443bc1f9bec 100644
--- a/drivers/net/ethernet/xscale/ixp4xx_eth.c
+++ b/drivers/net/ethernet/xscale/ixp4xx_eth.c
@@ -264,7 +264,9 @@ static int ixp_ptp_match(struct sk_buff *skb, u16 uid_hi, u32 uid_lo, u16 seqid)
if (ptp_classify_raw(skb) != PTP_CLASS_V1_IPV4)
return 0;
 
-   offset = ETH_HLEN + IPV4_HLEN(data) + UDP_HLEN;
+   offset = ETH_HLEN;
+   offset += ((struct iphdr *)(data + offset))->ihl << 2;
+   offset += UDP_HLEN;
 
if (skb->len < offset + OFF_PTP_SEQUENCE_ID + sizeof(seqid))
return 0;
diff --git a/include/linux/ptp_classify.h b/include/linux/ptp_classify.h
index c6487b7ab026..56b2d7d66177 100644
--- a/include/linux/ptp_classify.h
+++ b/include/linux/ptp_classify.h
@@ -40,8 +40,6 @@
 /* Below defines should actually be removed at some point in time. */
 #define IP6_HLEN   40
 #define UDP_HLEN   8
-#define OFF_IHL    14
-#define IPV4_HLEN(data) (((struct iphdr *)(data + OFF_IHL))->ihl << 2)
 
 struct clock_identity {
u8 id[8];
diff --git a/net/core/ptp_classifier.c b/net/core/ptp_classifier.c
index e33fde06d528..6a964639b704 100644
--- a/net/core/ptp_classifier.c
+++ b/net/core/ptp_classifier.c
@@ -114,9 +114,11 @@ struct ptp_header *ptp_parse_header(struct sk_buff *skb, unsigned int type)
if (type & PTP_CLASS_VLAN)
ptr += VLAN_HLEN;
 
+   ptr += ETH_HLEN;
+
switch (type & PTP_CLASS_PMASK) {
case PTP_CLASS_IPV4:
-   ptr += IPV4_HLEN(ptr) + UDP_HLEN;
+   ptr += (((struct iphdr *)ptr)->ihl << 2) + UDP_

WARNING in __rate_control_send_low

2020-10-14 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:bbf5c979 Linux 5.9
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12dc474f90
kernel config:  https://syzkaller.appspot.com/x/.config?x=3d8333c88fe898d7
dashboard link: https://syzkaller.appspot.com/bug?extid=fdc5123366fb9c3fdc6d
compiler:   gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+fdc5123366fb9c3fd...@syzkaller.appspotmail.com

[ cut here ]
no supported rates for sta (null) (0x, band 0) in rate_mask 0xfff with flags 0x20
WARNING: CPU: 1 PID: 169 at net/mac80211/rate.c:349 __rate_control_send_low+0x4eb/0x5e0 net/mac80211/rate.c:349
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 169 Comm: kworker/u4:5 Not tainted 5.9.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: phy9 ieee80211_scan_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x198/0x1fd lib/dump_stack.c:118
 panic+0x382/0x7fb kernel/panic.c:231
 __warn.cold+0x20/0x4b kernel/panic.c:600
 report_bug+0x1bd/0x210 lib/bug.c:198
 handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:__rate_control_send_low+0x4eb/0x5e0 net/mac80211/rate.c:349
Code: 14 48 89 44 24 08 e8 d4 8d b0 f9 44 8b 44 24 24 45 89 e9 44 89 e1 48 8b 74 24 08 44 89 f2 48 c7 c7 40 24 5f 89 e8 b7 ca 80 f9 <0f> 0b e9 e0 fd ff ff e8 a9 8d b0 f9 41 83 cd 10 e9 02 fc ff ff e8
RSP: 0018:c900013f7688 EFLAGS: 00010282
RAX:  RBX: 88801e243468 RCX: 
RDX: 8880a884e100 RSI: 815f5a55 RDI: f5200027eec3
RBP: 88805f373148 R08: 0001 R09: 8880ae5318e7
R10:  R11:  R12: 
R13: 0020 R14:  R15: 0090
 rate_control_send_low+0x261/0x610 net/mac80211/rate.c:374
 rate_control_get_rate+0x1b9/0x5a0 net/mac80211/rate.c:887
 ieee80211_tx_h_rate_ctrl+0xa0f/0x1660 net/mac80211/tx.c:749
 invoke_tx_handlers_early+0xaf3/0x25e0 net/mac80211/tx.c:1784
 ieee80211_tx+0x250/0x430 net/mac80211/tx.c:1926
 ieee80211_xmit+0x2dd/0x3b0 net/mac80211/tx.c:2015
 __ieee80211_tx_skb_tid_band+0x20a/0x290 net/mac80211/tx.c:5351
 ieee80211_tx_skb_tid_band net/mac80211/ieee80211_i.h:1986 [inline]
 ieee80211_send_scan_probe_req net/mac80211/scan.c:610 [inline]
 ieee80211_scan_state_send_probe+0x39f/0x910 net/mac80211/scan.c:638
 ieee80211_scan_work+0x6df/0x19e0 net/mac80211/scan.c:1071
 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
 kthread+0x3b5/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Re: [PATCH 07/23] wfx: add bus_sdio.c

2020-10-14 Thread Pali Rohár

On Wednesday 14 October 2020 13:52:15 Jérôme Pouiller wrote:
> Hello Pali,
> 
> On Tuesday 13 October 2020 22:11:56 CEST Pali Rohár wrote:
> > Hello!
> > 
> > On Monday 12 October 2020 12:46:32 Jerome Pouiller wrote:
> > > +#define SDIO_VENDOR_ID_SILABS        0x
> > > +#define SDIO_DEVICE_ID_SILABS_WF200  0x1000
> > > +static const struct sdio_device_id wfx_sdio_ids[] = {
> > > + { SDIO_DEVICE(SDIO_VENDOR_ID_SILABS, SDIO_DEVICE_ID_SILABS_WF200) },
> > 
> > Please move the ids into the common include file include/linux/mmc/sdio_ids.h,
> > where all SDIO ids are. All drivers now have their ids defined in that file.
> > 
> > > + // FIXME: ignore VID/PID and only rely on device tree
> > > + // { SDIO_DEVICE(SDIO_ANY_ID, SDIO_ANY_ID) },
> > 
> > What is the reason for ignoring vendor and device ids?
> 
> The device has a peculiarity: its VID/PID is :1000 (as you can see
> above). This value is weird, and the risk of collision with another
> device is high.

Those ids look strange. You are from Silabs; can you check internally
at Silabs whether the ids are really correct? And which sdio vendor id
was Silabs assigned for your products?

I know that sdio devices with multiple functions may have different sdio
vendor/device ids for a particular function and in the common CIS (function 0).

Could the problem be that the vendor/device id is correct in one place
and has that strange value in the other?

I have sent the following patch (now part of the upstream kernel), which
exports these ids to userspace:
https://lore.kernel.org/linux-mmc/20200527110858.17504-2-p...@kernel.org/T/#u

Also, for debugging ids and information about sdio cards, I sent another
patch which exports additional data:
https://lore.kernel.org/linux-mmc/20200727133837.19086-1-p...@kernel.org/T/#u

Could you try them and look at /sys/class/mmc_host/ attribute outputs?

> So, maybe the device should be probed only if it appears in the DT. Since
> WF200 targets embedded platforms, I don't think it is a problem to rely on
> DT. You will find another FIXME further in the code about that:
> 
> +   dev_warn(&func->dev,
> +"device is not declared in DT, features will be limited\n");
> +   // FIXME: ignore VID/PID and only rely on device tree
> +   // return -ENODEV;
> 
> However, it wouldn't be the usual way to manage SDIO devices (and that is
> the reason why the code is commented out).
> 
> Anyway, if we choose to rely on the DT, should we also check the VID/PID?
> 
> Personally, I am in favor of probing the device only if the VID/PID match
> and a DT node is found, even if it is not the usual way.

Normally, all sdio devices are hotplugged in the Linux kernel based on
their sdio device and vendor ids. These ids are unique identifiers of sdio
devices, so they should be enough for detection.
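Id-based matching of this kind can be modelled in a few lines of userspace C. The struct, function, and table names below are hypothetical, and the ids in the example tables are made up; the sketch assumes wildcard semantics like the kernel's `SDIO_ANY_ID` (an all-ones value that matches anything) and an all-zero terminating entry. It also shows why the commented-out `{ SDIO_DEVICE(SDIO_ANY_ID, SDIO_ANY_ID) }` entry would let a driver claim every SDIO function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's SDIO_ANY_ID wildcard. */
#define SKETCH_ANY_ID 0xffffu

struct sketch_sdio_id {
	uint16_t vendor;
	uint16_t device;
};

/* Return the first entry matching (vendor, device), or NULL.
 * The table is terminated by an all-zero entry. */
static const struct sketch_sdio_id *
sketch_sdio_match(const struct sketch_sdio_id *ids, uint16_t vendor,
		  uint16_t device)
{
	for (; ids->vendor || ids->device; ids++) {
		if (ids->vendor != SKETCH_ANY_ID && ids->vendor != vendor)
			continue;
		if (ids->device != SKETCH_ANY_ID && ids->device != device)
			continue;
		return ids;
	}
	return NULL;
}

/* Example tables with made-up ids. */
static const struct sketch_sdio_id sketch_specific_ids[] = {
	{ 0x1234, 0x5678 },
	{ 0, 0 },
};

static const struct sketch_sdio_id sketch_wildcard_ids[] = {
	{ SKETCH_ANY_ID, SKETCH_ANY_ID }, /* matches every function */
	{ 0, 0 },
};
```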

Months ago I checked this and moved all SDIO device and vendor ids
into the common include/linux/mmc/sdio_ids.h file. I would like not to
have this "mess" again, now that it has basically been fully cleaned up.

I'm adding the linux-mmc mailing list and Ulf Hansson to the loop.

Ulf, can you look at this "problem"? What do you think about those
"strange" sdio ids?

[PATCH] net: phy: Prevent reporting advertised modes when autoneg is off

2020-10-14 Thread Łukasz Stelmach

Do not report advertised link modes when autonegotiation is turned
off. mii_ethtool_get_link_ksettings() exhibits the same behaviour.

Signed-off-by: Łukasz Stelmach 
---
 drivers/net/phy/phy.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 35525a671400..3cadf224fdb2 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -315,7 +315,8 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
   struct ethtool_link_ksettings *cmd)
 {
linkmode_copy(cmd->link_modes.supported, phydev->supported);
-   linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
+   if (phydev->autoneg)
+   linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
linkmode_copy(cmd->link_modes.lp_advertising, phydev->lp_advertising);
 
cmd->base.speed = phydev->speed;
-- 
2.26.2
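The behavioural change can be modelled in userspace with plain 64-bit masks standing in for the kernel's linkmode bitmaps. The struct and function names are hypothetical, and the sketch zeroes the result itself to model a caller that hands in a zeroed settings struct, so the advertising mask simply stays empty when autoneg is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct sketch_phy {
	bool autoneg;
	uint64_t supported, advertising, lp_advertising;
};

struct sketch_ksettings {
	uint64_t supported, advertising, lp_advertising;
};

/* Mirrors the patched phy_ethtool_ksettings_get(): advertising is only
 * reported when autonegotiation is enabled. */
static void sketch_ksettings_get(const struct sketch_phy *phy,
				 struct sketch_ksettings *cmd)
{
	memset(cmd, 0, sizeof(*cmd)); /* models a caller-zeroed result */
	cmd->supported = phy->supported;
	if (phy->autoneg)
		cmd->advertising = phy->advertising;
	cmd->lp_advertising = phy->lp_advertising;
}

/* Convenience wrapper: the mask that would be reported as "advertising". */
static uint64_t sketch_reported_advertising(bool autoneg, uint64_t adv)
{
	struct sketch_phy phy = { .autoneg = autoneg, .supported = adv,
				  .advertising = adv };
	struct sketch_ksettings cmd;

	sketch_ksettings_get(&phy, &cmd);
	return cmd.advertising;
}
```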

Re: iptables userspace API broken due to added value in nf_inet_hooks

2020-10-14 Thread Pablo Neira Ayuso

On Wed, Oct 14, 2020 at 02:59:47PM +0200, Jason A. Donenfeld wrote:
> Hey Pablo,
> 
> In 60a3815da702fd9e4759945f26cce5c47d3967ad, you added another enum
> value to nf_inet_hooks:
> 
> --- a/include/uapi/linux/netfilter.h
> +++ b/include/uapi/linux/netfilter.h
> @@ -45,6 +45,7 @@ enum nf_inet_hooks {
>NF_INET_FORWARD,
>NF_INET_LOCAL_OUT,
>NF_INET_POST_ROUTING,
> +   NF_INET_INGRESS,
>NF_INET_NUMHOOKS
> };
> 
> That seems fine, but actually it changes the value of
> NF_INET_NUMHOOKS, which is used in struct ipt_getinfo:
> 
> /* The argument to IPT_SO_GET_INFO */
> struct ipt_getinfo {
>/* Which table: caller fills this in. */
>char name[XT_TABLE_MAXNAMELEN];
> 
>/* Kernel fills these in. */
>/* Which hook entry points are valid: bitmask */
>unsigned int valid_hooks;
> 
>/* Hook entry points: one per netfilter hook. */
>unsigned int hook_entry[NF_INET_NUMHOOKS];
> 
>/* Underflow points. */
>unsigned int underflow[NF_INET_NUMHOOKS];
> 
>/* Number of entries */
>unsigned int num_entries;
> 
>/* Size of entries. */
>unsigned int size;
> };
> 
> This in turn makes that struct bigger, which means this check in
> net/ipv4/netfilter/ip_tables.c fails:
> 
> static int get_info(struct net *net, void __user *user, const int *len)
> {
>char name[XT_TABLE_MAXNAMELEN];
>struct xt_table *t;
>int ret;
> 
>if (*len != sizeof(struct ipt_getinfo))
>return -EINVAL;
> 
> This is affecting my CI, which attempts to use an older iptables with
> net-next and fails with:
> 
> iptables v1.8.4 (legacy): can't initialize iptables table `filter':
> Module is wrong version
> Perhaps iptables or your kernel needs to be upgraded.
> 
> Is this kind of breakage okay? If there's an exception carved out for
> breaking the iptables API, just let me know, and I'll look into making
> adjustments to work around it in my CI. On the other hand, if this
> breakage was unintentional, now you know.

Oh right, I'll need a new IPT_INET_NUMHOOKS for this.

I'll submit a patch, that's for the heads up.

Re: iptables userspace API broken due to added value in nf_inet_hooks

2020-10-14 Thread Pablo Neira Ayuso

On Wed, Oct 14, 2020 at 03:01:15PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Oct 14, 2020 at 02:59:47PM +0200, Jason A. Donenfeld wrote:
> > Hey Pablo,
> > 
> > In 60a3815da702fd9e4759945f26cce5c47d3967ad, you added another enum
> > value to nf_inet_hooks:
> > 
> > --- a/include/uapi/linux/netfilter.h
> > +++ b/include/uapi/linux/netfilter.h
> > @@ -45,6 +45,7 @@ enum nf_inet_hooks {
> >NF_INET_FORWARD,
> >NF_INET_LOCAL_OUT,
> >NF_INET_POST_ROUTING,
> > +   NF_INET_INGRESS,
> >NF_INET_NUMHOOKS
> > };
> > 
> > That seems fine, but actually it changes the value of
> > NF_INET_NUMHOOKS, which is used in struct ipt_getinfo:
> > 
> > /* The argument to IPT_SO_GET_INFO */
> > struct ipt_getinfo {
> >/* Which table: caller fills this in. */
> >char name[XT_TABLE_MAXNAMELEN];
> > 
> >/* Kernel fills these in. */
> >/* Which hook entry points are valid: bitmask */
> >unsigned int valid_hooks;
> > 
> >/* Hook entry points: one per netfilter hook. */
> >unsigned int hook_entry[NF_INET_NUMHOOKS];
> > 
> >/* Underflow points. */
> >unsigned int underflow[NF_INET_NUMHOOKS];
> > 
> >/* Number of entries */
> >unsigned int num_entries;
> > 
> >/* Size of entries. */
> >unsigned int size;
> > };
> > 
> > This in turn makes that struct bigger, which means this check in
> > net/ipv4/netfilter/ip_tables.c fails:
> > 
> > static int get_info(struct net *net, void __user *user, const int *len)
> > {
> >char name[XT_TABLE_MAXNAMELEN];
> >struct xt_table *t;
> >int ret;
> > 
> >if (*len != sizeof(struct ipt_getinfo))
> >return -EINVAL;
> > 
> > This is affecting my CI, which attempts to use an older iptables with
> > net-next and fails with:
> > 
> > iptables v1.8.4 (legacy): can't initialize iptables table `filter':
> > Module is wrong version
> > Perhaps iptables or your kernel needs to be upgraded.
> > 
> > Is this kind of breakage okay? If there's an exception carved out for
> > breaking the iptables API, just let me know, and I'll look into making
> > adjustments to work around it in my CI. On the other hand, if this
> > breakage was unintentional, now you know.
> 
> Oh right, I'll need a new IPT_INET_NUMHOOKS for this.
> 
> I'll submit a patch, that's for the heads up.

s/that's/thanks

iptables userspace API broken due to added value in nf_inet_hooks

2020-10-14 Thread Jason A. Donenfeld

Hey Pablo,

In 60a3815da702fd9e4759945f26cce5c47d3967ad, you added another enum
value to nf_inet_hooks:

--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -45,6 +45,7 @@ enum nf_inet_hooks {
   NF_INET_FORWARD,
   NF_INET_LOCAL_OUT,
   NF_INET_POST_ROUTING,
+   NF_INET_INGRESS,
   NF_INET_NUMHOOKS
};

That seems fine, but actually it changes the value of
NF_INET_NUMHOOKS, which is used in struct ipt_getinfo:

/* The argument to IPT_SO_GET_INFO */
struct ipt_getinfo {
   /* Which table: caller fills this in. */
   char name[XT_TABLE_MAXNAMELEN];

   /* Kernel fills these in. */
   /* Which hook entry points are valid: bitmask */
   unsigned int valid_hooks;

   /* Hook entry points: one per netfilter hook. */
   unsigned int hook_entry[NF_INET_NUMHOOKS];

   /* Underflow points. */
   unsigned int underflow[NF_INET_NUMHOOKS];

   /* Number of entries */
   unsigned int num_entries;

   /* Size of entries. */
   unsigned int size;
};

This in turn makes that struct bigger, which means this check in
net/ipv4/netfilter/ip_tables.c fails:

static int get_info(struct net *net, void __user *user, const int *len)
{
   char name[XT_TABLE_MAXNAMELEN];
   struct xt_table *t;
   int ret;

   if (*len != sizeof(struct ipt_getinfo))
   return -EINVAL;

This is affecting my CI, which attempts to use an older iptables with
net-next and fails with:

iptables v1.8.4 (legacy): can't initialize iptables table `filter':
Module is wrong version
Perhaps iptables or your kernel needs to be upgraded.

Is this kind of breakage okay? If there's an exception carved out for
breaking the iptables API, just let me know, and I'll look into making
adjustments to work around it in my CI. On the other hand, if this
breakage was unintentional, now you know.

Jason
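The breakage can be reproduced in a self-contained userspace C sketch. All names are prefixed stand-ins rather than the real uapi identifiers, and SKETCH_TABLE_MAXNAMELEN = 32 is an assumption mirroring XT_TABLE_MAXNAMELEN. Both structs copy the layout of struct ipt_getinfo, sized by the enum sentinel before and after NF_INET_INGRESS was added, and the length check copies the strict comparison in get_info().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_TABLE_MAXNAMELEN 32 /* assumption: mirrors XT_TABLE_MAXNAMELEN */

/* Hook enum before and after the NF_INET_INGRESS addition. */
enum old_hooks { OLD_PRE_ROUTING, OLD_LOCAL_IN, OLD_FORWARD, OLD_LOCAL_OUT,
		 OLD_POST_ROUTING, OLD_NUMHOOKS };
enum new_hooks { NEW_PRE_ROUTING, NEW_LOCAL_IN, NEW_FORWARD, NEW_LOCAL_OUT,
		 NEW_POST_ROUTING, NEW_INGRESS, NEW_NUMHOOKS };

/* Same layout as struct ipt_getinfo, with arrays sized by 'numhooks'. */
#define SKETCH_GETINFO(numhooks)                        \
	struct {                                        \
		char name[SKETCH_TABLE_MAXNAMELEN];     \
		unsigned int valid_hooks;               \
		unsigned int hook_entry[numhooks];      \
		unsigned int underflow[numhooks];       \
		unsigned int num_entries;               \
		unsigned int size;                      \
	}

typedef SKETCH_GETINFO(OLD_NUMHOOKS) old_getinfo; /* older iptables binary */
typedef SKETCH_GETINFO(NEW_NUMHOOKS) new_getinfo; /* newer kernel headers */

/* The strict length check from get_info(), using the new kernel layout. */
static int sketch_get_info_len_check(size_t len)
{
	return len == sizeof(new_getinfo) ? 0 : -EINVAL;
}
```

The struct grows by two `unsigned int`s, so an old binary's length fails the check with -EINVAL. Sizing the uapi arrays with a count that never grows (which the new IPT_INET_NUMHOOKS Pablo mentions would presumably provide) keeps the struct at its old size.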

Re: [PATCH v2 2/7] staging: qlge: Initialize devlink health dump framework

2020-10-14 Thread Dan Carpenter

On Wed, Oct 14, 2020 at 06:43:01PM +0800, Coiby Xu wrote:
>  static int qlge_probe(struct pci_dev *pdev,
> const struct pci_device_id *pci_entry)
>  {
>   struct net_device *ndev = NULL;
>   struct qlge_adapter *qdev = NULL;
> + struct devlink *devlink;
>   static int cards_found;
>   int err = 0;
>  
> - ndev = alloc_etherdev_mq(sizeof(struct qlge_adapter),
> + devlink = devlink_alloc(&qlge_devlink_ops, sizeof(struct qlge_adapter));
> + if (!devlink)
> + return -ENOMEM;
> +
> + qdev = devlink_priv(devlink);
> +
> + ndev = alloc_etherdev_mq(sizeof(struct qlge_netdev_priv),
>min(MAX_CPUS,
>netif_get_num_default_rss_queues()));
>   if (!ndev)
> - return -ENOMEM;
> + goto devlink_free;
>  
> - err = qlge_init_device(pdev, ndev, cards_found);
> - if (err < 0) {
> - free_netdev(ndev);
> - return err;

In the old code, if qlge_init_device() fails then it frees "ndev".

> - }
> + qdev->ndev = ndev;
> + err = qlge_init_device(pdev, qdev, cards_found);
> + if (err < 0)
> + goto devlink_free;

But the patch introduces a new resource leak.

>  
> - qdev = netdev_priv(ndev);
>   SET_NETDEV_DEV(ndev, &pdev->dev);
>   ndev->hw_features = NETIF_F_SG |
>   NETIF_F_IP_CSUM |
> @@ -4611,8 +4619,14 @@ static int qlge_probe(struct pci_dev *pdev,
>   qlge_release_all(pdev);
>   pci_disable_device(pdev);
>   free_netdev(ndev);
> - return err;
> + goto devlink_free;
>   }
> +
> + err = devlink_register(devlink, &pdev->dev);
> + if (err)
> + goto devlink_free;
> +
> + qlge_health_create_reporters(qdev);
>   /* Start up the timer to trigger EEH if
>* the bus goes dead
>*/
> @@ -4623,6 +4637,10 @@ static int qlge_probe(struct pci_dev *pdev,
>   atomic_set(&qdev->lb_count, 0);
>   cards_found++;
>   return 0;
> +
> +devlink_free:
> + devlink_free(devlink);
> + return err;
>  }

The best way to write error handling code is to keep track of the most
recent allocation which was allocated successfully.

	one = alloc();
	if (!one)
		return -ENOMEM;  //  <-- nothing allocated successfully

	two = alloc();
	if (!two) {
		ret = -ENOMEM;
		goto free_one; // <-- one was allocated successfully
			       // Notice that the label name says what
			       // the goto does.
	}

	three = alloc();
	if (!three) {
		ret = -ENOMEM;
		goto free_two; // <-- two allocated, two freed.
	}

	...

	return 0;

free_two:
	free(two);
free_one:
	free(one);

	return ret;

In the old code qlge_probe() freed things before returning, and that's
fine if there are only two allocations in the function, but when there
are three or more allocations, use gotos to unwind.

Ideally there would be a ql_deinit_device() function to mirror the
ql_init_device() function.  The ql_init_device() is staging quality
code with leaks and bad label names.  It should be re-written to free
things one step at a time instead of calling ql_release_all().

Anyway, let's not introduce new leaks at least.
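Here is one way the same pattern could apply to the probe path quoted above. This is a standalone model, not qlge code. `fake_alloc()`, `fake_free()`, `init_device()` and `probe()` are hypothetical stand-ins for devlink_alloc(), alloc_etherdev_mq() and qlge_init_device(). `live_allocs` counts outstanding allocations, so a leak shows up as a nonzero count. Each label frees exactly what was live when the jump was taken:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Number of live allocations; a leak-free error path leaves it at 0. */
static int live_allocs;

/* Hypothetical allocator that fails on request. */
static void *fake_alloc(int fail_now)
{
	void *p;

	if (fail_now)
		return NULL;
	p = malloc(16);
	if (p)
		live_allocs++;
	return p;
}

static void fake_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Hypothetical init step standing in for qlge_init_device(). */
static int init_device(int fail_now)
{
	return fail_now ? -EIO : 0;
}

/* fail_at picks the failing step: 1 = devlink, 2 = netdev, 3 = init,
 * 0 = none. On success the caller owns both objects. */
static int probe(int fail_at, void **devlink_out, void **ndev_out)
{
	void *devlink, *ndev;
	int err;

	assert(devlink_out && ndev_out);

	devlink = fake_alloc(fail_at == 1);
	if (!devlink)
		return -ENOMEM;		/* nothing to undo */

	ndev = fake_alloc(fail_at == 2);
	if (!ndev) {
		err = -ENOMEM;
		goto devlink_free;	/* only devlink is live */
	}

	err = init_device(fail_at == 3);
	if (err)
		goto netdev_free;	/* devlink and ndev are live */

	*devlink_out = devlink;
	*ndev_out = ndev;
	return 0;

netdev_free:
	fake_free(ndev);
devlink_free:
	fake_free(devlink);
	return err;
}
```

A failure at any step leaves `live_allocs` at zero. The `netdev_free` label covers the case that the reviewed patch leaks, where the init step fails after the netdev was allocated.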

regards,
dan carpenter

Re: [PATCH 1/1] net: ftgmac100: add handling of mdio/phy nodes for ast2400/2500

2020-10-14 Thread Ivan Mikhaylov

On Wed, 2020-10-14 at 05:23 +, Joel Stanley wrote:
> Hi Ivan,
> 
> On Tue, 13 Oct 2020 at 12:38, Ivan Mikhaylov  wrote:
> > phy-handle can't be handled well for ast2400/2500 which has an embedded
> > MDIO controller. Add ftgmac100_mdio_setup for ast2400/2500 and initialize
> > PHYs from mdio child node with of_mdiobus_register.
> 
> Good idea. The driver has become a mess of different ways to connect
> the phy and it needs to be cleaned up. I have a patch that fixes
> rmmod, which is currently broken.
> 
> 
> 
> > Signed-off-by: Ivan Mikhaylov 
> > ---
> >  drivers/net/ethernet/faraday/ftgmac100.c | 114 ++-
> >  1 file changed, 69 insertions(+), 45 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> > b/drivers/net/ethernet/faraday/ftgmac100.c
> > index 87236206366f..e32066519ec1 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -1044,11 +1044,47 @@ static void ftgmac100_adjust_link(struct net_device
> > *netdev)
> > schedule_work(&priv->reset_task);
> >  }
> > 
> > -static int ftgmac100_mii_probe(struct ftgmac100 *priv, phy_interface_t
> > intf)
> > +static int ftgmac100_mii_probe(struct net_device *netdev)
> >  {
> > -   struct net_device *netdev = priv->netdev;
> > +   struct ftgmac100 *priv = netdev_priv(netdev);
> > +   struct platform_device *pdev = to_platform_device(priv->dev);
> > +   struct device_node *np = pdev->dev.of_node;
> > +   phy_interface_t phy_intf = PHY_INTERFACE_MODE_RGMII;
> > struct phy_device *phydev;
> > 
> > +   /* Get PHY mode from device-tree */
> > +   if (np) {
> > +   /* Default to RGMII. It's a gigabit part after all */
> > +   phy_intf = of_get_phy_mode(np, &phy_intf);
> > +   if (phy_intf < 0)
> > +   phy_intf = PHY_INTERFACE_MODE_RGMII;
> > +
> > +   /* Aspeed only supports these. I don't know about other IP
> > +* block vendors so I'm going to just let them through for
> > +* now. Note that this is only a warning if for some obscure
> > +* reason the DT really means to lie about it or it's a
> > newer
> > +* part we don't know about.
> > +*
> > +* On the Aspeed SoC there are additionally straps and SCU
> > +* control bits that could tell us what the interface is
> > +* (or allow us to configure it while the IP block is held
> > +* in reset). For now I chose to keep this driver away from
> > +* those SoC specific bits and assume the device-tree is
> > +* right and the SCU has been configured properly by pinmux
> > +* or the firmware.
> > +*/
> > +   if (priv->is_aspeed &&
> > +   phy_intf != PHY_INTERFACE_MODE_RMII &&
> > +   phy_intf != PHY_INTERFACE_MODE_RGMII &&
> > +   phy_intf != PHY_INTERFACE_MODE_RGMII_ID &&
> > +   phy_intf != PHY_INTERFACE_MODE_RGMII_RXID &&
> > +   phy_intf != PHY_INTERFACE_MODE_RGMII_TXID) {
> > +   netdev_warn(netdev,
> > +   "Unsupported PHY mode %s !\n",
> > +   phy_modes(phy_intf));
> > +   }
> 
> Why do we move this?

I tried to detach the PHY connection from the ftgmac100_setup_mdio()
registration function, to decouple the MDIO and PHY levels.

> 
> > +   }
> > +
> > phydev = phy_find_first(priv->mii_bus);
> > if (!phydev) {
> > netdev_info(netdev, "%s: no PHY found\n", netdev->name);
> > @@ -1056,7 +1092,7 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv,
> > phy_interface_t intf)
> > }
> > 
> > phydev = phy_connect(netdev, phydev_name(phydev),
> > -&ftgmac100_adjust_link, intf);
> > +&ftgmac100_adjust_link, phy_intf);
> > 
> > if (IS_ERR(phydev)) {
> > netdev_err(netdev, "%s: Could not attach to PHY\n", netdev-
> > >name);
> > @@ -1601,8 +1637,8 @@ static int ftgmac100_setup_mdio(struct net_device
> > *netdev)
> >  {
> > struct ftgmac100 *priv = netdev_priv(netdev);
> > struct platform_device *pdev = to_platform_device(priv->dev);
> > -   phy_interface_t phy_intf = PHY_INTERFACE_MODE_RGMII;
> > struct device_node *np = pdev->dev.of_node;
> > +   struct device_node *mdio_np;
> > int i, err = 0;
> > u32 reg;
> > 
> > @@ -1623,39 +1659,6 @@ static int ftgmac100_setup_mdio(struct net_device
> > *netdev)
> > iowrite32(reg, priv->base + FTGMAC100_OFFSET_REVR);
> > }
> > 
> > -   /* Get PHY mode from device-tree */
> > -   if (np) {
> > -   /* Default to RGMII. It's a gigabit part after all */
> > -

RE: [PATCH net] can: peak_usb: add range checking in decode operations

2020-10-14 Thread Stéphane Grosjean

Hello Dan,

I don't know whether this patch is still relevant, but:

- there is absolutely no reason for the device firmware to provide a channel 
index greater than or equal to 2, because the IP core of these USB devices 
handles only 2 channels. Anyway, these changes are correct.
- as for the check on the length "cfd->len", on the other hand, this value 
comes directly from can_send() via dev_queue_xmit() AFAIK, and it seems to 
me that the underlying driver can assume that it does not exceed 64.

Regards,
---
Stéphane Grosjean
PEAK-System France
132, rue André Bisiaux
F-54320 MAXEVILLE
Tel: +(33) 9.72.54.51.97



From: Dan Carpenter 
Sent: Thursday, August 13, 2020 16:06
To: Wolfgang Grandegger ; Stéphane Grosjean 

Cc: Marc Kleine-Budde ; David S. Miller 
; Jakub Kicinski ; Andri Yngvason 
; Oliver Hartkopp ; 
linux-...@vger.kernel.org ; netdev@vger.kernel.org 
; kernel-janit...@vger.kernel.org 

Subject: [PATCH net] can: peak_usb: add range checking in decode operations

These values come from skb->data so Smatch considers them untrusted.  I
believe Smatch is correct but I don't have a way to test this.

The usb_if->dev[] array has 2 elements but the index is in the 0-15
range without checks.  The cfd->len can be up to 255 but the maximum
valid size is CANFD_MAX_DLEN (64) so that could lead to memory
corruption.
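The following standalone sketch of the two checks is for illustration only. `struct fake_if`, `struct fake_dev` and `decode_msg()` are hypothetical. The 4-bit channel field is assumed from the 0-15 range above, and only the bounds (2 channels, CANFD_MAX_DLEN of 64) come from the text. Both untrusted values are validated before they are used as an index or as a copy length:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define CANFD_MAX_DLEN 64

/* Hypothetical per-channel state. */
struct fake_dev {
	uint8_t data[CANFD_MAX_DLEN];
	uint8_t len;
};

/* Two channels, matching the 2-element usb_if->dev[] array. */
struct fake_if {
	struct fake_dev *dev[2];
};

/*
 * channel_field and len are untrusted (they come from the USB buffer).
 * The channel is assumed to be a 4-bit field, so it can be 0-15.
 */
static int decode_msg(struct fake_if *usb_if, uint8_t channel_field,
		      uint8_t len, const uint8_t *payload)
{
	unsigned int ch = channel_field & 0x0f;
	struct fake_dev *dev;

	assert(usb_if && payload);

	if (ch >= ARRAY_SIZE(usb_if->dev))
		return -EINVAL;		/* would index past dev[] */
	if (len > CANFD_MAX_DLEN)
		return -EINVAL;		/* would overflow data[] */

	dev = usb_if->dev[ch];
	if (!dev)
		return -ENODEV;

	memcpy(dev->data, payload, len);
	dev->len = len;
	return 0;
}
```

The sketch returns -EINVAL for both range failures and -ENODEV for a channel with no device attached.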

Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB 
adapters")
Signed-off-by: Dan Carpenter 
---
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 +-
 1 file changed, 37 insertions(+), 11 deletions(-)

diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c 
b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
index 47cc1ff5b88e..dee3e689b54d 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
@@ -468,12 +468,18 @@ static int pcan_usb_fd_decode_canmsg(struct 
pcan_usb_fd_if *usb_if,
  struct pucan_msg *rx_msg)
 {
 struct pucan_rx_msg *rm = (struct pucan_rx_msg *)rx_msg;
-   struct peak_usb_device *dev = usb_if->dev[pucan_msg_get_channel(rm)];
-   struct net_device *netdev = dev->netdev;
+   struct peak_usb_device *dev;
+   struct net_device *netdev;
 struct canfd_frame *cfd;
 struct sk_buff *skb;
 const u16 rx_msg_flags = le16_to_cpu(rm->flags);

+   if (pucan_msg_get_channel(rm) >= ARRAY_SIZE(usb_if->dev))
+   return -ENOMEM;
+
+   dev = usb_if->dev[pucan_msg_get_channel(rm)];
+   netdev = dev->netdev;
+
 if (rx_msg_flags & PUCAN_MSG_EXT_DATA_LEN) {
 /* CANFD frame case */
 skb = alloc_canfd_skb(netdev, &cfd);
@@ -519,15 +525,21 @@ static int pcan_usb_fd_decode_status(struct 
pcan_usb_fd_if *usb_if,
  struct pucan_msg *rx_msg)
 {
 struct pucan_status_msg *sm = (struct pucan_status_msg *)rx_msg;
-   struct peak_usb_device *dev = usb_if->dev[pucan_stmsg_get_channel(sm)];
-   struct pcan_usb_fd_device *pdev =
-   container_of(dev, struct pcan_usb_fd_device, dev);
+   struct pcan_usb_fd_device *pdev;
 enum can_state new_state = CAN_STATE_ERROR_ACTIVE;
 enum can_state rx_state, tx_state;
-   struct net_device *netdev = dev->netdev;
+   struct peak_usb_device *dev;
+   struct net_device *netdev;
 struct can_frame *cf;
 struct sk_buff *skb;

+   if (pucan_stmsg_get_channel(sm) >= ARRAY_SIZE(usb_if->dev))
+   return -ENOMEM;
+
+   dev = usb_if->dev[pucan_stmsg_get_channel(sm)];
+   pdev = container_of(dev, struct pcan_usb_fd_device, dev);
+   netdev = dev->netdev;
+
 /* nothing should be sent while in BUS_OFF state */
 if (dev->can.state == CAN_STATE_BUS_OFF)
 return 0;
@@ -579,9 +591,14 @@ static int pcan_usb_fd_decode_error(struct pcan_usb_fd_if 
*usb_if,
 struct pucan_msg *rx_msg)
 {
 struct pucan_error_msg *er = (struct pucan_error_msg *)rx_msg;
-   struct peak_usb_device *dev = usb_if->dev[pucan_ermsg_get_channel(er)];
-   struct pcan_usb_fd_device *pdev =
-   container_of(dev, struct pcan_usb_fd_device, dev);
+   struct pcan_usb_fd_device *pdev;
+   struct peak_usb_device *dev;
+
+   if (pucan_ermsg_get_channel(er) >= ARRAY_SIZE(usb_if->dev))
+   return -EINVAL;
+
+   dev = usb_if->dev[pucan_ermsg_get_channel(er)];
+   pdev = container_of(dev, struct pcan_usb_fd_device, dev);

 /* keep a trace of tx and rx error counters for later use */
 pdev->bec.txerr = er->tx_err_cnt;
@@ -595,11 +612,17 @@ static int pcan_usb_fd_decode_overrun(struct 
pcan_usb_fd_if *usb_if,
   struct pucan_msg *rx_msg)
 {
 struct pcan_ufd_ovr_msg *ov = (struct pcan_ufd_ovr_msg *)rx_msg;
-   struct peak_usb_device *dev = usb_if->dev[pufd_omsg_get_ch

[PATCH v2 1/1] net: dsa: seville: the packet buffer is 2 megabits, not megabytes

2020-10-14 Thread Maxim Kochetkov

The VSC9953 Seville switch has 2 megabits of buffer split into 4360
words of 60 bytes each. 2048 * 1024 is 2 megabytes instead of 2 megabits.
2 megabits is (2048 / 8) * 1024 = 256 * 1024.
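The unit conversion can be checked at compile time. In this sketch the macro names are illustrative and do not come from the driver. The figures are the ones given in the commit message:

```c
#include <assert.h>

/* 2 megabits, in bits and then in bytes. */
#define BUF_BITS	(2UL * 1024 * 1024)
#define BUF_BYTES	(BUF_BITS / 8)

/* Queue memory geometry quoted in the commit message. */
#define NUM_WORDS	4360UL
#define WORD_BYTES	60UL

static_assert(BUF_BYTES == 256UL * 1024, "2 Mbit is 256 KiB");
static_assert(2048UL * 1024 == 8 * BUF_BYTES,
	      "the old value, 2048 * 1024, is 8x too large");
static_assert(NUM_WORDS * WORD_BYTES <= BUF_BYTES,
	      "4360 words of 60 bytes fit in 2 Mbit");
```

4360 * 60 = 261,600 bytes, which is just under 256 * 1024 = 262,144. This is consistent with 2 megabits being the right size.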

Signed-off-by: Maxim Kochetkov 
Fixes: a63ed92d217f ("net: dsa: seville: fix buffer size of the queue system")
---
 drivers/net/dsa/ocelot/seville_vsc9953.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/ocelot/seville_vsc9953.c 
b/drivers/net/dsa/ocelot/seville_vsc9953.c
index 9e9fd19e1d00..e2cd49eec037 100644
--- a/drivers/net/dsa/ocelot/seville_vsc9953.c
+++ b/drivers/net/dsa/ocelot/seville_vsc9953.c
@@ -1010,7 +1010,7 @@ static const struct felix_info seville_info_vsc9953 = {
.vcap_is2_keys  = vsc9953_vcap_is2_keys,
.vcap_is2_actions   = vsc9953_vcap_is2_actions,
.vcap   = vsc9953_vcap_props,
-   .shared_queue_sz= 2048 * 1024,
+   .shared_queue_sz= 256 * 1024,
.num_mact_rows  = 2048,
.num_ports  = 10,
.mdio_bus_alloc = vsc9953_mdio_bus_alloc,
--
2.27.0

Re: [PATCH] net: phy: Prevent reporting advertised modes when autoneg is off

2020-10-14 Thread Russell King - ARM Linux admin

On Wed, Oct 14, 2020 at 02:56:50PM +0200, Łukasz Stelmach wrote:
> Do not report advertised link modes when autonegotiation is turned
> off. mii_ethtool_get_link_ksettings() exhibits the same behaviour.

Please explain why this is a desirable change.

Referring to some other piece of code isn't a particularly good reason
especially when that piece of code is likely derived from fairly old
code (presumably mii_ethtool_get_link_ksettings()'s behaviour is
designed to be compatible with mii_ethtool_gset()).

In any case, the mii.c code does fill in the advertising mask even when
autoneg is disabled, because, rightly or wrongly, the advertising mask
contains more than just the link modes, it includes the interface(s)
as well. Your change means phylib no longer reports the interface modes
which is at odds with the mii.c code.

> Signed-off-by: Łukasz Stelmach 
> ---
>  drivers/net/phy/phy.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index 35525a671400..3cadf224fdb2 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -315,7 +315,8 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
>  struct ethtool_link_ksettings *cmd)
>  {
>   linkmode_copy(cmd->link_modes.supported, phydev->supported);
> - linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
> + if (phydev->autoneg)
> + linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
>   linkmode_copy(cmd->link_modes.lp_advertising, phydev->lp_advertising);
>  
>   cmd->base.speed = phydev->speed;
> -- 
> 2.26.2
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Re: [PATCH] net: sockmap: Don't call bpf_prog_put() on NULL pointer

2020-10-14 Thread Jakub Sitnicki

On Mon, Oct 12, 2020 at 07:09 PM CEST, Alex Dewar wrote:
> If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is
> called unconditionally on skb_verdict, even though it may be NULL. Fix
> and tidy up error path.
>
> Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL)
> Fixes: 743df8b7749f ("bpf, sockmap: Check skb_verdict and skb_parser programs 
> explicitly")
> Signed-off-by: Alex Dewar 
> ---

Note to maintainers: the issue exists only in bpf-next where we have:

  
https://lore.kernel.org/bpf/160239294756.8495.5796595770890272219.stgit@john-Precision-5820-Tower/

The patch also looks like it is supposed to be applied on top of the above.
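The bug class in the quoted commit message can be shown in a standalone sketch. It takes references on two optional programs and, on failure, drops only the references that were actually taken. `struct prog`, `prog_inc_not_zero()` and `prog_put()` are hypothetical stand-ins for bpf_prog_inc_not_zero() and bpf_prog_put(). Mirroring the dereference that Coverity flagged, the stand-in put does not accept NULL, so the error path checks first:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct prog {
	int refcnt;
};

/* Takes a reference unless the count has already dropped to zero. */
static int prog_inc_not_zero(struct prog *p)
{
	if (p->refcnt == 0)
		return -ENOENT;
	p->refcnt++;
	return 0;
}

/* Stand-in put: dereferences p, so NULL is not allowed. */
static void prog_put(struct prog *p)
{
	assert(p != NULL);
	p->refcnt--;
}

/* Either program may be NULL (not attached). On failure, release only
 * the references that were actually taken. */
static int take_progs(struct prog *verdict, struct prog *parser)
{
	int err;

	if (verdict) {
		err = prog_inc_not_zero(verdict);
		if (err)
			return err;
	}
	if (parser) {
		err = prog_inc_not_zero(parser);
		if (err)
			goto out_put_verdict;
	}
	return 0;

out_put_verdict:
	if (verdict)
		prog_put(verdict);
	return err;
}
```

Calling `take_progs(NULL, parser)` with a dead parser program is the case from the report. Here it returns the error without ever passing NULL to `prog_put()`.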

Re: [PATCH 01/23] dt-bindings: introduce silabs,wfx.yaml

2020-10-14 Thread Jérôme Pouiller

On Tuesday 13 October 2020 18:49:35 CEST Rob Herring wrote:
> On Mon, Oct 12, 2020 at 12:46:26PM +0200, Jerome Pouiller wrote:
> > From: Jérôme Pouiller 
[...]
> > +  Note that in addition to the properties below, the WFx driver also supports
> > +  `mac-address` and `local-mac-address` as described in
> > +  Documentation/devicetree/bindings/net/ethernet.txt
> 
> Note what ethernet.txt contains... This should have a $ref to
> ethernet-controller.yaml to express the above.
> 
> You can add 'mac-address: true' if you want to be explicit about what
> properties are used.

Here, only mac-address and local-mac-address are supported. So, would the
code below do the job?

  local-mac-address:
    $ref: ethernet-controller.yaml#/properties/local-mac-address

  mac-address:
    $ref: ethernet-controller.yaml#/properties/mac-address

[...]
> > +  spi-max-frequency:
> > +description: (SPI only) Maximum SPI clocking speed of device in Hz.
> 
> No need to redefine a common property.

When a property is specific to a bus, I would have liked to state it
explicitly. That's why I redefined the description.

[...]
> > +  config-file:
> > +description: Use an alternative file as PDS. Default is `wf200.pds`. 
> > Only
> > +  necessary for development/debug purpose.
> 
> 'firmware-name' is typically what we'd use here. Though if just for
> debug/dev, perhaps do a debugfs interface for this instead. As DT should
> come from the firmware/bootloader, requiring changing the DT for
> dev/debug is not the easiest workflow compared to doing something from
> userspace.

This file is not firmware. It mainly contains antenna-related data.
This property was originally added for development. Over time, I came to
think it could be used to have a single disk image for several devices
that differ only in their antenna.

I am going to remove the part about development/debug purposes.

[...]
> Will need additionalProperties or unevaluatedProperties depending on
> whether you list out properties from ethernet-controller.yaml or not.

I think I need to specify "additionalProperties: true" since the user can
also use properties defined for SPI devices.

In fact, I would like to write something like:

allOf:
  $ref: spi-controller.yaml#/patternProperties/^.*@[0-9a-f]+$/properties

-- 
Jérôme Pouiller

Re: [PATCH v2 1/1] net: dsa: seville: the packet buffer is 2 megabits, not megabytes

2020-10-14 Thread Vladimir Oltean

On Wed, Oct 14, 2020 at 04:27:43PM +0300, Maxim Kochetkov wrote:
> The VSC9953 Seville switch has 2 megabits of buffer split into 4360
> words of 60 bytes each. 2048 * 1024 is 2 megabytes instead of 2 megabits.
> 2 megabits is (2048 / 8) * 1024 = 256 * 1024.
> 
> Signed-off-by: Maxim Kochetkov 
> Fixes: a63ed92d217f ("net: dsa: seville: fix buffer size of the queue system")
> ---

Reviewed-by: Vladimir Oltean 

This should go to "net".

Re: [PATCH AUTOSEL 5.8 18/24] net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails

2020-10-14 Thread Sasha Levin


On Tue, Oct 13, 2020 at 12:01:06AM +0300, Petko Manolov wrote:
>On 20-10-12 12:11:18, Joe Perches wrote:
>>On Mon, 2020-10-12 at 15:02 -0400, Sasha Levin wrote:
>>> From: Anant Thazhemadam 
>>>
>>> [ Upstream commit f45a4248ea4cc13ed50618ff066849f9587226b2 ]
>>>
>>> When get_registers() fails in set_ethernet_addr(), the uninitialized
>>> value of node_id gets copied over as the address.
>>> So, check the return value of get_registers().
>>>
>>> If get_registers() executed successfully (i.e., it returns
>>> sizeof(node_id)), copy over the MAC address using ether_addr_copy()
>>> (instead of using memcpy()).
>>>
>>> Else, if get_registers() failed instead, a randomly generated MAC
>>> address is set as the MAC address instead.
>>
>>This autosel is premature.
>>
>>This patch always sets a random MAC.
>>See the follow-on patch: https://lkml.org/lkml/2020/10/11/131
>>To my knowledge, this follow-on has yet to be applied:
>
>ACK, the follow-on patch has got the correct semantics.

I'll hold off on this patch until the follow-on is merged, thanks!

--
Thanks,
Sasha

[PATCH] ixgbe: fail to create xfrm offload of IPsec tunnel mode SA

2020-10-14 Thread Antony Antony

Based on discussions and indirect references, ixgbe IPsec offload does not
support IPsec tunnel mode offload; it only supports IPsec transport mode
offload. Now explicitly fail when creating a non-transport-mode SA with
offload, to avoid false performance expectations.

Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Antony Antony 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 5 +
 drivers/net/ethernet/intel/ixgbevf/ipsec.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index eca73526ac86..54d47265a7ac 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -575,6 +575,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
return -EINVAL;
}
 
+   if (xs->props.mode != XFRM_MODE_TRANSPORT) {
+   netdev_err(dev, "Unsupported mode for ipsec offload\n");
+   return -EINVAL;
+   }
+
if (ixgbe_ipsec_check_mgmt_ip(xs)) {
netdev_err(dev, "IPsec IP addr clash with mgmt filters\n");
return -EINVAL;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c 
b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 5170dd9d8705..caaea2c920a6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -272,6 +272,11 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs)
return -EINVAL;
}
 
+   if (xs->props.mode != XFRM_MODE_TRANSPORT) {
+   netdev_err(dev, "Unsupported mode for ipsec offload\n");
+   return -EINVAL;
+   }
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa rsa;
 
-- 
2.21.3

Re: [PATCH] net: phy: Prevent reporting advertised modes when autoneg is off

2020-10-14 Thread Lukasz Stelmach

It was <2020-10-14 śro 14:32>, when Russell King - ARM Linux admin wrote:
> On Wed, Oct 14, 2020 at 02:56:50PM +0200, Łukasz Stelmach wrote:
>> Do not report advertised link modes when autonegotiation is turned
>> off. mii_ethtool_get_link_ksettings() exhibits the same behaviour.
>
> Please explain why this is a desirable change.
>

To make the behavior uniform across different drivers. For example,
ethtool shows different reports on different hardware depending on
whether the driver uses phylib or mii. I don't insist on accepting my
patch; I merely propose it as a means of unification. Maybe it is
mii.c that should be changed.

> Referring to some other piece of code isn't a particularly good reason
> especially when that piece of code is likely derived from fairly old
> code (presumably mii_ethtool_get_link_ksettings()'s behaviour is
> designed to be compatible with mii_ethtool_gset()).

Well, according to git, phy_ethtool_ksettings_get() came first (2011-04-15,
phy_ethtool_get_link_ksettings() soon after), while
mii_ethtool_get_link_ksettings() is half a year younger. Indeed, maybe I
should patch mii_ethtool_get_link_ksettings() instead.

> In any case, the mii.c code does fill in the advertising mask even
> when autoneg is disabled, because, rightly or wrongly, the advertising
> mask contains more than just the link modes, it includes the
> interface(s) as well. Your change means phylib no longer reports the
> interface modes which is at odds with the mii.c code.

I am afraid you are wrong. There is a rather big if()[1] which, depending
on whether AN is enabled, fills in more or less information. Yes, this
if() looks like it has been yanked from mii_ethtool_gset(). When
advertising is converted and copied to cmd->link_modes.advertising at
the end of mii_ethtool_get_link_ksettings(), it is 0[2] if autonegotiation
is disabled.

[1] https://elixir.bootlin.com/linux/v5.9/source/drivers/net/mii.c#L174
[2] https://elixir.bootlin.com/linux/v5.9/source/drivers/net/mii.c#L215

>> Signed-off-by: Łukasz Stelmach 
>> ---
>>  drivers/net/phy/phy.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index 35525a671400..3cadf224fdb2 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -315,7 +315,8 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
>> struct ethtool_link_ksettings *cmd)
>>  {
>>  linkmode_copy(cmd->link_modes.supported, phydev->supported);
>> -linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
>> +if (phydev->autoneg)
>> +linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
>>  linkmode_copy(cmd->link_modes.lp_advertising, phydev->lp_advertising);
>>  
>>  cmd->base.speed = phydev->speed;
>> -- 
>> 2.26.2
>> 
>> 

-- 
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics


Re: vxlan_asymmetric.sh test failed every time

2020-10-14 Thread Ido Schimmel

On Wed, Oct 14, 2020 at 09:39:16AM +0800, Hangbin Liu wrote:
> Thanks a lot for help debugging this issue, this patch works for me.

Also patched vxlan_symmetric.sh and applied to our tree. Will submit
tomorrow if nothing explodes in regression.

Thanks for reporting and testing.

[PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next

2020-10-14 Thread Yonghong Song

Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
tried to fix the issue where seq_file pos is not increased
if a NULL element is returned with seq_ops->next(). See bug
  https://bugzilla.kernel.org/show_bug.cgi?id=206283
The commit effectively does:
  - increase pos for all seq_ops->start()
  - increase pos for all seq_ops->next()

For ipv6_route, increasing pos for all seq_ops->next() is correct.
But increasing pos for seq_ops->start() is not correct
since pos is used to determine how many items to skip during
seq_ops->start():
  iter->skip = *pos;
seq_ops->start() just fetches the *current* pos item.
The item can be skipped only after seq_ops->show() which essentially
is the beginning of seq_ops->next().

For example, I have 7 ipv6 route entries,
  root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
   40  00 
 0400 0001  0001 eth0
  fe80 40  00 
 0100 0001  0001 eth0
   00  00 
  0001  00200200   lo
  0001 80  00 
  0003  8021   lo
  fe802050e3fffebd3be8 80  00 
  0002  8021 eth0
  ff00 08  00 
 0100 0004  0001 eth0
   00  00 
  0001  00200200   lo
  0+1 records in
  0+1 records out
  1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
  root@arch-fb-vm1:~/net-next

In the above, I specify a buffer size of 4096, so all records can be
returned to user space in a single trip to the kernel.

If I use a buffer size of 128, since each record is 149 bytes, the kernel's
seq_read() internally reads 149 bytes into its buffer and returns the data
to user space in two read() syscalls. The next user read() syscall then
triggers another seq_ops->start(). Since the current implementation
increases pos even for seq_ops->start(), it skips records #2, #4 and #6,
assuming the first record is #1.

  root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
   40  00 
 0400 0001  0001 eth0
   00  00 
  0001  00200200   lo
  fe802050e3fffebd3be8 80  00 
  0002  8021 eth0
   00  00 
  0001  00200200   lo
4+1 records in
4+1 records out
600 bytes copied, 0.00127758 s, 470 kB/s

To fix the problem, create a fake pos pointer so seq_ops->start()
won't actually increase seq_file pos. With this fix, the
above `dd` command with `bs=128` will show correct result.
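A toy model of the read pattern described above may help. It assumes, as in the bs=128 case, that each refill calls start() once, shows a single record and then calls next() once. `refill()` and `read_all()` are hypothetical, and the real seq_read() does considerably more:

```c
#include <assert.h>
#include <stddef.h>

#define NRECORDS 7	/* route entries in the example above */

/*
 * One refill of the seq_file buffer: start() at *m_index, show one
 * record, then call next() once, which advances *m_index. With
 * bug_in_start set, start() also advances *m_index, modelling the
 * pre-fix ipv6_route_seq_start(). Returns the 1-based record shown,
 * or -1 when start() finds nothing (end of file).
 */
static int refill(long *m_index, int bug_in_start)
{
	long skip = *m_index;		/* iter->skip = *pos */

	if (skip >= NRECORDS)
		return -1;
	if (bug_in_start)
		(*m_index)++;		/* start() wrongly moves pos */
	(*m_index)++;			/* next() after show() */
	return (int)skip + 1;		/* show() emitted record skip + 1 */
}

/* Repeats refills until EOF; out[] receives the records userspace sees. */
static size_t read_all(int bug_in_start, int out[NRECORDS])
{
	long m_index = 0;
	size_t n = 0;
	int rec;

	while ((rec = refill(&m_index, bug_in_start)) > 0) {
		assert(n < NRECORDS);
		out[n++] = rec;
	}
	return n;
}
```

With `bug_in_start` set, the model returns records 1, 3, 5 and 7, which matches the four entries in the bs=128 output above. With it cleared, as with the fake pos pointer, all seven records come back.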

Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
Cc: Andrii Nakryiko 
Cc: Alexei Starovoitov 
Suggested-by: Vasily Averin 
Reviewed-by: Vasily Averin 
Signed-off-by: Yonghong Song 
---
 net/ipv6/ip6_fib.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Changelog:
 v2 -> v3:
  - initialize local variable "p" to avoid potential syzbot complaint. (Eric)
 v1 -> v2:
  - instead of restricting the increment of *pos in ipv6_route_seq_next()
    to seq_ops->next() only, add a fake pos pointer in seq_ops->start()
    and use it when calling ipv6_route_seq_next().

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 141c0a4c569a..605cdd38a919 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -2622,8 +2622,10 @@ static void *ipv6_route_seq_start(struct seq_file *seq, 
loff_t *pos)
iter->skip = *pos;
 
if (iter->tbl) {
+   loff_t p = 0;
+
ipv6_route_seq_setup_walk(iter, net);
-   return ipv6_route_seq_next(seq, NULL, pos);
+   return ipv6_route_seq_next(seq, NULL, &p);
} else {
return NULL;
}
-- 
2.24.1

Re: [Patch net v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly

2020-10-14 Thread Willem de Bruijn

On Wed, Oct 14, 2020 at 4:52 AM Xie He  wrote:
>
> On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
>  wrote:
> >
> > There is agreement that hard_header_len should be the length of link
> > layer headers visible to the upper layers, needed_headroom the
> > additional room required for headers that are not exposed, i.e., those
> > pushed inside ndo_start_xmit.
> >
> > The link layer header length also has to agree with the interface
> > hardware type (ARPHRD_..).
> >
> > Tunnel devices have not always been consistent in this, but today
> > "bare" ip tunnel devices without additional headers (ipip, sit, ..) do
> > match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
> > geneve also conform to this. Known exception that probably needs to be
> > addressed is sit, which still advertises LL_MAX_HEADER and so has
> > exposed quite a few syzkaller issues. Side note, it is not entirely
> > clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
> > why they are needed.
> >
> > GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
> > The second makes sense, as it appears as an Ethernet device. The first
> > should match "bare" ip tunnel devices, if following the above logic.
> > Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
> > header around in collect medata mode") implements. It changes
> > dev->type to ARPHRD_NONE in collect_md mode.
> >
> > Some of the inconsistency comes from the various modes of the GRE
> > driver. Which brings us to ipgre_header_ops. It is set only in two
> > special cases.
> >
> > Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
> > added ipgre_header_ops.parse to be able to receive the inner ip source
> > address with PF_PACKET recvfrom. And apparently relies on
> > ipgre_header_ops.create to be able to set an address, which implies
> > SOCK_DGRAM.
> >
> > The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
> > implementation starts with the beautiful comment "/* Nice toy.
> > Unfortunately, useless in real life :-)". From the rest of that
> > detailed comment, it is not clear to me why it would need to expose
> > the headers. The example does not use packet sockets.
> >
> > A packet socket cannot know devices details such as which configurable
> > mode a device may be in. And different modes conflict with the basic
> > rule that for a given well defined link layer type, i.e., dev->type,
> > header length can be expected to be consistent. In an ideal world
> > these exceptions would not exist, therefore.
> >
> > Unfortunately, this is legacy behavior that will have to continue to
> > be supported.
>
> Thanks for your explanation. So header_ops for GRE devices is only
> used in 2 special situations. In normal situations, header_ops is not
> used for GRE devices. And we consider not using header_ops should be
> the ideal arrangement for GRE devices.
>
> Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE
> devices that use header_ops? I guess changing dev->type will not
> affect the interface to the user space? This way we can solve the
> problem of the same dev->type having different hard_header_len values.

But does that address any real issue?

If anything, it would make sense to keep ARPHRD_IPGRE for tunnels
that expect headers and switch to ARPHRD_NONE for those that do not,
as the collect_md commit I mentioned above does.

> Also, for the second special situation, if there's no obvious reason
> to use header_ops, maybe we can consider removing header_ops for this
> situation.

Unfortunately, there's no knowing if some application is using this
broadcast mode *with* a process using packet sockets.

Re: [PATCH bpf-next] selftests/bpf: fix compilation error in progs/profiler.inc.h

2020-10-14 Thread Song Liu




> On Oct 13, 2020, at 9:36 PM, Song Liu  wrote:
> 
> Fix the following error when compiling selftests/bpf
> 
> progs/profiler.inc.h:246:5: error: redefinition of 'pids_cgrp_id' as 
> different kind of symbol
> 
> pids_cgrp_id is used in cgroup code, and included in vmlinux.h. Fix the
> error by renaming pids_cgrp_id as pids_cgroup_id.
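To illustrate the "different kind of symbol" error, here is a minimal userspace sketch. The enum is a hypothetical stand-in for the cgroup subsystem enum that vmlinux.h pulls in (the real enumerator list and values differ); only the name collision matters.

```c
#include <assert.h>

/* Hypothetical stand-in for the cgroup subsystem enum that vmlinux.h
 * provides; the real enumerator list and values are different. */
enum cgroup_subsys_id {
	cpuset_cgrp_id,
	pids_cgrp_id,		/* an enumerator, i.e. a constant */
	CGROUP_SUBSYS_COUNT,
};

/* int pids_cgrp_id = 1;
 * Rejected: the name already denotes an enumerator, so clang reports
 * "redefinition of 'pids_cgrp_id' as different kind of symbol". */

/* The renamed global from the patch no longer collides. */
int pids_cgroup_id = 1;

/* Mirrors the comparison the profiler program performs per subsystem. */
static int is_pids_subsys(int subsys_id)
{
	return subsys_id == pids_cgroup_id;
}
```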
> 
> Fixes: 03d4d13fab3f ("selftests/bpf: Add profiler test")
> Signed-off-by: Song Liu 

I forgot to mention

Reported-by: Jiri Olsa 

> ---
> tools/testing/selftests/bpf/progs/profiler.inc.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/progs/profiler.inc.h b/tools/testing/selftests/bpf/progs/profiler.inc.h
> index 00578311a4233..b554c1e40b9fb 100644
> --- a/tools/testing/selftests/bpf/progs/profiler.inc.h
> +++ b/tools/testing/selftests/bpf/progs/profiler.inc.h
> @@ -243,7 +243,7 @@ static ino_t get_inode_from_kernfs(struct kernfs_node* node)
>   }
> }
> 
> -int pids_cgrp_id = 1;
> +int pids_cgroup_id = 1;
> 
> static INLINE void* populate_cgroup_info(struct cgroup_data_t* cgroup_data,
>struct task_struct* task,
> @@ -262,7 +262,7 @@ static INLINE void* populate_cgroup_info(struct cgroup_data_t* cgroup_data,
>   BPF_CORE_READ(task, cgroups, subsys[i]);
>   if (subsys != NULL) {
>   int subsys_id = BPF_CORE_READ(subsys, ss, id);
> - if (subsys_id == pids_cgrp_id) {
> + if (subsys_id == pids_cgroup_id) {
>   proc_kernfs = BPF_CORE_READ(subsys, cgroup, kn);
>   root_kernfs = BPF_CORE_READ(subsys, ss, root, kf_root, kn);
>   break;
> -- 
> 2.24.1
>

[PATCH net] net: dsa: ksz: fix padding size of skb

2020-10-14 Thread Christian Eggers

__skb_put_padto() is called in order to ensure a minimal size of the
sk_buff. The required minimal size is ETH_ZLEN + the size required for
the tail tag.

The current argument misses the size for the tail tag. The expression
"skb->len + padlen" can be simplified to ETH_ZLEN.

Too small sk_buffs typically result from cloning in
dsa_skb_tx_timestamp(). The cloned sk_buff may not meet the minimum size
requirements.

Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing")
Signed-off-by: Christian Eggers 
---
 net/dsa/tag_ksz.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 945a9bd5ba35..8ef2085349e7 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -24,7 +24,7 @@ static struct sk_buff *ksz_common_xmit(struct sk_buff *skb,
 
if (skb_tailroom(skb) >= padlen + len) {
/* Let dsa_slave_xmit() free skb */
-   if (__skb_put_padto(skb, skb->len + padlen, false))
+   if (__skb_put_padto(skb, ETH_ZLEN + len, false))
return NULL;
 
nskb = skb;
@@ -45,7 +45,7 @@ static struct sk_buff *ksz_common_xmit(struct sk_buff *skb,
/* Let skb_put_padto() free nskb, and let dsa_slave_xmit() free
 * skb
 */
-   if (skb_put_padto(nskb, nskb->len + padlen))
+   if (skb_put_padto(nskb, ETH_ZLEN + len))
return NULL;
 
consume_skb(skb);
-- 
Christian Eggers
Embedded software developer

Arnold & Richter Cine Technik GmbH & Co. Betriebs KG
Registered office: Muenchen - Register court: Amtsgericht Muenchen - Commercial register number: HRA 57918
Personally liable partner: Arnold & Richter Cine Technik GmbH
Registered office: Muenchen - Register court: Amtsgericht Muenchen - Commercial register number: HRB 54477
Managing directors: Dr. Michael Neuhaeuser; Stephan Schenk; Walter Trauninger; Markus Zeiler

Re: [ PATCH v2 1/2] ibmveth: Switch order of ibmveth_helper calls.

2020-10-14 Thread Willem de Bruijn

On Tue, Oct 13, 2020 at 7:21 PM David Wilder  wrote:
>
> ibmveth_rx_csum_helper() must be called after ibmveth_rx_mss_helper()
> as ibmveth_rx_csum_helper() may alter ip and tcp checksum values.
>
> Fixes: 66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
> Signed-off-by: David Wilder 
> Reviewed-by: Thomas Falcon 
> Reviewed-by: Cristobal Forno 
> Reviewed-by: Pradeep Satyanarayana 

Acked-by: Willem de Bruijn

Re: [ PATCH v2 2/2] ibmveth: Identify ingress large send packets.

2020-10-14 Thread Willem de Bruijn

On Tue, Oct 13, 2020 at 7:21 PM David Wilder  wrote:
>
> Ingress large send packets are identified by either:
> The IBMVETH_RXQ_LRG_PKT flag in the receive buffer
> or with a -1 placed in the ip header checksum.
> The method used depends on firmware version. Frame
> geometry and sufficient header validation is performed by the
> hypervisor eliminating the need for further header checks here.
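The identification rule in the commit message can be sketched as a predicate. This is a hypothetical helper, not driver code: RXQ_LRG_PKT_EXAMPLE is a placeholder standing in for IBMVETH_RXQ_LRG_PKT (its real value is not reproduced), and the IP checksum is assumed to be read as a raw 16-bit field, where -1 appears as 0xffff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit: stands in for IBMVETH_RXQ_LRG_PKT, whose real value is
 * defined in the driver and not reproduced here. */
#define RXQ_LRG_PKT_EXAMPLE 0x1u

/* A frame is treated as large send if the receive-queue flag is set
 * (one firmware variant) or the IP header checksum is -1, i.e. 0xffff
 * as a 16-bit field (the other variant). */
static bool is_large_send(uint32_t rxq_flags, uint16_t ip_check)
{
	return (rxq_flags & RXQ_LRG_PKT_EXAMPLE) != 0 || ip_check == 0xffff;
}
```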
>
> Fixes: 7b5967389f5a ("ibmveth: set correct gso_size and gso_type")
> Signed-off-by: David Wilder 
> Reviewed-by: Thomas Falcon 
> Reviewed-by: Cristobal Forno 
> Reviewed-by: Pradeep Satyanarayana 

Acked-by: Willem de Bruijn 

Thanks for clarifying the header validation. I clearly had missed that :)

Re: [PATCH net-next 3/3] macb: support the two tx descriptors on at91rm9200

2020-10-14 Thread Willy Tarreau

Hi Claudiu,

first, thanks for your feedback!

On Wed, Oct 14, 2020 at 04:08:00PM +, claudiu.bez...@microchip.com wrote:
> > @@ -3994,11 +3996,10 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
> > struct net_device *dev)
> >  {
> > struct macb *lp = netdev_priv(dev);
> > +   unsigned long flags;
> > 
> > -   if (macb_readl(lp, TSR) & MACB_BIT(RM9200_BNQ)) {
> > -   int desc = 0;
> > -
> > -   netif_stop_queue(dev);
> > +   if (lp->rm9200_tx_len < 2) {
> > +   int desc = lp->rm9200_tx_tail;
> 
> I think you also want to protect these reads with spin_lock() to avoid
> concurrency with the interrupt handler.

I don't think it's needed because the condition cannot change under us:
the interrupt handler only ever decrements the length. However, I could
use READ_ONCE() to make things cleaner. In practice this test was only
kept as a sanity check and it never fails: once the queue length reaches
2, the queue is stopped (and I never got the "device busy" message either
before or after the patch).

> > /* Store packet information (to free when Tx completed) */
> > lp->rm9200_txq[desc].skb = skb;
> > @@ -4012,6 +4013,15 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
> > return NETDEV_TX_OK;
> > }
> > 
> > +   spin_lock_irqsave(&lp->lock, flags);
> > +
> > +   lp->rm9200_tx_tail = (desc + 1) & 1;
> > +   lp->rm9200_tx_len++;
> > +   if (lp->rm9200_tx_len > 1)
> > +   netif_stop_queue(dev);

This is where we guarantee that we won't call start_xmit() again with
rm9200_tx_len >= 2.
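The bookkeeping described here (a tail index that toggles between the two descriptors, an in-flight count, and stopping the queue once both descriptors are busy) can be modelled in plain C. This is a hypothetical userspace sketch, not the macb driver code: the struct and function names are made up, locking is omitted (the patch does these updates under lp->lock), and the oldest in-flight descriptor is assumed to be derived from the tail and the length.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two-descriptor TX ring; not macb code. */
struct tx2_ring {
	int tail;	/* next descriptor to fill: 0 or 1 (rm9200_tx_tail) */
	int len;	/* descriptors in flight: 0..2 (rm9200_tx_len) */
	bool stopped;	/* stands in for netif_stop_queue()/netif_wake_queue() */
};

/* xmit path: returns the descriptor used, or -1 if both are busy. */
static int tx2_xmit(struct tx2_ring *q)
{
	int desc;

	if (q->len >= 2)
		return -1;
	desc = q->tail;
	q->tail = (desc + 1) & 1;	/* toggle between descriptor 0 and 1 */
	q->len++;
	if (q->len > 1)
		q->stopped = true;	/* no third start_xmit() until a completion */
	return desc;
}

/* Completion path: retires the oldest descriptor, or returns -1 if idle. */
static int tx2_complete(struct tx2_ring *q)
{
	int desc;

	if (q->len == 0)
		return -1;
	desc = (q->tail + 2 - q->len) & 1;	/* oldest in-flight descriptor */
	q->len--;			/* the handler only ever decrements */
	q->stopped = false;
	return desc;
}
```

Because the queue is stopped as soon as the second descriptor is taken, the `len >= 2` branch in tx2_xmit() is only reachable if a caller ignores the stopped flag, which matches the observation that the sanity check never fires.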

> > @@ -4088,21 +4100,39 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id)
> > at91ether_rx(dev);
> > 
> > /* Transmit complete */
> > -   if (intstatus & MACB_BIT(TCOMP)) {
> > +   if (intstatus & (MACB_BIT(TCOMP) | MACB_BIT(RM9200_TBRE))) {
> > /* The TCOM bit is set even if the transmission failed */
> > if (intstatus & (MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE)))
> > dev->stats.tx_errors++;
> > 
> > -   desc = 0;
> > -   if (lp->rm9200_txq[desc].skb) {
> > +   spin_lock(&lp->lock);
> 
> Also, this lock could be moved before while, below, as you want to protect
> the rm9200_tx_len and rm9200_tx_tails members of lp as I understand.

Sure, but it actually *is* before the while(). I'm sorry if that was not
visible from the context of the diff. The while() is just a few lines below,
so rm9200_tx_len and rm9200_tx_tail are properly protected. Do not
hesitate to tell me if something is not clear or if I'm wrong!

Thanks!
Willy

Re: [PATCH net] net: dsa: ksz: fix padding size of skb

2020-10-14 Thread Vladimir Oltean

On Wed, Oct 14, 2020 at 06:17:19PM +0200, Christian Eggers wrote:
> __skb_put_padto() is called in order to ensure a minimal size of the
> sk_buff. The required minimal size is ETH_ZLEN + the size required for
> the tail tag.
>
> The current argument misses the size for the tail tag. The expression
> "skb->len + padlen" can be simplified to ETH_ZLEN.
>
> Too small sk_buffs typically result from cloning in
> dsa_skb_tx_timestamp(). The cloned sk_buff may not meet the minimum size
> requirements.
>
> Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing")
> Signed-off-by: Christian Eggers 
> ---

Reviewed-by: Vladimir Oltean

[PATCH ethtool] netlink: fix allocation failure handling in dump_features()

2020-10-14 Thread Michal Kubecek

On allocation failure, dump_features() would set ret to -ENOMEM but then
return 0 anyway. As there is nothing to free in this case, the easiest
fix is to simply return -ENOMEM rather than jumping to the out_free
label, which can then be dropped as well because this was its only use.

Fixes: f2c17e107900 ("netlink: add netlink handler for gfeatures (-k)")
Reported-by: Ivan Vecera 
Signed-off-by: Michal Kubecek 
---
 netlink/features.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/netlink/features.c b/netlink/features.c
index 3f1240437350..2a0899e6eb04 100644
--- a/netlink/features.c
+++ b/netlink/features.c
@@ -117,11 +117,9 @@ int dump_features(const struct nlattr *const *tb,
ret = prepare_feature_results(tb, &results);
if (ret < 0)
return -EFAULT;
-
-   ret = -ENOMEM;
feature_flags = calloc(results.count, sizeof(feature_flags[0]));
if (!feature_flags)
-   goto out_free;
+   return -ENOMEM;
 
/* map netdev features to legacy flags */
for (i = 0; i < results.count; i++) {
@@ -182,7 +180,6 @@ int dump_features(const struct nlattr *const *tb,
dump_feature(&results, NULL, NULL, i, name, "");
}
 
-out_free:
free(feature_flags);
return 0;
 }
-- 
2.28.0

Re: [PATCH net] net: dsa: ksz: fix padding size of skb

2020-10-14 Thread Vladimir Oltean

On Wed, Oct 14, 2020 at 07:47:50PM +0300, Vladimir Oltean wrote:
> On Wed, Oct 14, 2020 at 06:17:19PM +0200, Christian Eggers wrote:
> > __skb_put_padto() is called in order to ensure a minimal size of the
> > sk_buff. The required minimal size is ETH_ZLEN + the size required for
> > the tail tag.
> >
> > The current argument misses the size for the tail tag. The expression
> > "skb->len + padlen" can be simplified to ETH_ZLEN.
> >
> > Too small sk_buffs typically result from cloning in
> > dsa_skb_tx_timestamp(). The cloned sk_buff may not meet the minimum size
> > requirements.
> >
> > Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing")
> > Signed-off-by: Christian Eggers 
> > ---
> 
> Reviewed-by: Vladimir Oltean 

Actually no, I take that back.

This statement:

> The expression "skb->len + padlen" can be simplified to ETH_ZLEN.

is false.
skb->len + padlen == ETH_ZLEN only if skb->len is less than ETH_ZLEN.
Otherwise, skb->len + padlen == skb->len.

Otherwise said, the frame must be padded to
max(skb->len, ETH_ZLEN) + tail tag length.

So please keep the "skb->len + padlen + len".
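The point about "skb->len + padlen" can be checked with a little arithmetic. The sketch below assumes padlen is computed in ksz_common_xmit() as the number of bytes missing to reach ETH_ZLEN (0 for frames that are already long enough); under that assumption skb->len + padlen equals max(skb->len, ETH_ZLEN), not ETH_ZLEN. The function names are hypothetical.

```c
#include <assert.h>

#define ETH_ZLEN 60	/* minimum Ethernet frame length, excluding FCS */

/* Assumed padlen computation: bytes missing to reach ETH_ZLEN, else 0. */
static unsigned int ksz_padlen(unsigned int skb_len)
{
	return skb_len >= ETH_ZLEN ? 0 : ETH_ZLEN - skb_len;
}

/* Target passed to __skb_put_padto() before the patch; this always
 * equals max(skb_len, ETH_ZLEN), not ETH_ZLEN. */
static unsigned int padto_target(unsigned int skb_len)
{
	return skb_len + ksz_padlen(skb_len);
}

/* The requested formula: max(skb->len, ETH_ZLEN) + tail tag length. */
static unsigned int padto_with_tag(unsigned int skb_len, unsigned int tag_len)
{
	return padto_target(skb_len) + tag_len;
}
```

For a full-size 1514-byte frame the target stays 1514, whereas ETH_ZLEN + len would be only 61 for a 1-byte tag.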

Thanks,
-Vladimir

Re: [PATCH net] net: dsa: ksz: fix padding size of skb

2020-10-14 Thread Christian Eggers

Hi Vladimir,

On Wednesday, 14 October 2020, 18:54:10 CEST, Vladimir Oltean wrote:
> On Wed, Oct 14, 2020 at 07:47:50PM +0300, Vladimir Oltean wrote:
> > On Wed, Oct 14, 2020 at 06:17:19PM +0200, Christian Eggers wrote:
> > > __skb_put_padto() is called in order to ensure a minimal size of the
> > > sk_buff. The required minimal size is ETH_ZLEN + the size required for
> > > the tail tag.
> > > 
> > > The current argument misses the size for the tail tag. The expression
> > > "skb->len + padlen" can be simplified to ETH_ZLEN.
> > > 
> > > Too small sk_buffs typically result from cloning in
> > > dsa_skb_tx_timestamp(). The cloned sk_buff may not meet the minimum size
> > > requirements.
> > > 
> > > Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing")
> > > Signed-off-by: Christian Eggers 
> > > ---
> > 
> > Reviewed-by: Vladimir Oltean 
> 
> Actually no, I take that back.
> 
> This statement:
> > The expression "skb->len + padlen" can be simplified to ETH_ZLEN.
> 
> is false.
> skb->len + padlen == ETH_ZLEN only if skb->len is less than ETH_ZLEN.
ok, my comment is false.

> Otherwise, skb->len + padlen == skb->len.
> 
> Otherwise said, the frame must be padded to
> max(skb->len, ETH_ZLEN) + tail tag length.
At first I thought the same when working on this. But IMHO the padding must 
only ensure the minimum required size, there is no need to pad to the "real" 
size of the skb. The check for the tailroom above ensures that enough memory 
for the "real" size is available.

> So please keep the "skb->len + padlen + len".
> 
> Thanks,
> -Vladimir
Best regards
Christian

Re: [PATCH net] net: dsa: ksz: fix padding size of skb

2020-10-14 Thread Vladimir Oltean

On Wed, Oct 14, 2020 at 07:02:13PM +0200, Christian Eggers wrote:
> > Otherwise said, the frame must be padded to
> > max(skb->len, ETH_ZLEN) + tail tag length.
> At first I thought the same when working on this. But IMHO the padding must
> only ensure the minimum required size, there is no need to pad to the "real"
> size of the skb. The check for the tailroom above ensures that enough memory
> for the "real" size is available.

Yes, that's right, that's the current logic, but what's the point of
your patch, then, if the call to __skb_put_padto is only supposed to
ensure ETH_ZLEN length?
In fact, __skb_put_padto fundamentally does:
- an extension of skb->len to the requested argument, via __skb_put
- a zero-filling of the extra area
So if you include the length of the tag in the call to __skb_put_padto,
then what's the other skb_put() from ksz8795_xmit, ksz9477_xmit,
ksz9893_xmit going to do? Aren't you increasing the frame length twice
by the length of one tag when you are doing this? What problem are you
actually trying to solve?
Can you show a skb_dump(KERN_ERR, skb, true) before and after your change?
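The double-counting question can be illustrated with a toy model of the two steps listed above for __skb_put_padto() (extend the length, zero-fill the new area), followed by the tagger's own skb_put() of the tail tag. struct fake_skb and both helpers are hypothetical userspace stand-ins, a 1-byte tail tag is assumed for the example, and tailroom handling and the free-on-error behaviour of the real helpers are ignored.

```c
#include <assert.h>
#include <string.h>

#define ETH_ZLEN 60

/* Hypothetical userspace stand-in for an skb: fixed storage plus a length. */
struct fake_skb {
	unsigned char data[128];
	unsigned int len;
};

/* Models __skb_put_padto() as described: grow len up to 'target' and
 * zero-fill the added bytes; frames already that long are left alone. */
static void put_padto(struct fake_skb *skb, unsigned int target)
{
	assert(target <= sizeof(skb->data));
	if (skb->len >= target)
		return;
	memset(skb->data + skb->len, 0, target - skb->len);
	skb->len = target;
}

/* Models the tagger's subsequent skb_put() that appends the tail tag. */
static unsigned char *put_tail_tag(struct fake_skb *skb, unsigned int tag_len)
{
	unsigned char *tag = skb->data + skb->len;

	assert(skb->len + tag_len <= sizeof(skb->data));
	skb->len += tag_len;
	return tag;
}
```

With ETH_ZLEN as the padding target, a 42-byte frame ends up at 61 bytes (60 plus the tag). With ETH_ZLEN + len as the target it ends up at 62, because the tag length is added once by the padding and once more by the tag skb_put().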

Re: [PATCH net] net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info

2020-10-14 Thread Cong Wang

On Wed, Oct 14, 2020 at 1:56 AM Leon Romanovsky  wrote:
>
> From: Leon Romanovsky 
>
> The access of tcf_tunnel_info() produces the following splat, so fix it
> by dereferencing the tcf_tunnel_key_params pointer with a marker that
> the internal tcfa_lock is held.

Looks reasonable to me,

Acked-by: Cong Wang 

Thanks.

[PATCH net 2/3] net/smc: fix valid DMBE buffer sizes

2020-10-14 Thread Karsten Graul

The SMCD_DMBE_SIZES should include all valid DMBE buffer sizes, so the
correct value is 6 which means 1MB. With 7 the registration of an ISM
buffer would always fail because of the invalid size requested.
Fix that and set the value to 6.
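The index-to-size mapping in the comment next to SMCD_DMBE_SIZES is a power-of-two series starting at 16KB, so index 6 is 1MB and index 7 would already be 2MB. A minimal sketch, assuming the constant acts as the largest valid (inclusive) index; dmbe_clamp_index() is a hypothetical helper, not SMC code.

```c
#include <assert.h>

/* Value from the patch; assumed here to be the largest valid index. */
#define SMCD_DMBE_SIZES 6

/* Mapping from the comment in smc_core.c: 0 -> 16KB, 1 -> 32KB, ... */
static unsigned long dmbe_index_to_size(int idx)
{
	return 16384UL << idx;
}

/* Hypothetical helper: cap a requested size index at the largest one
 * that ISM buffer registration accepts. */
static int dmbe_clamp_index(int idx)
{
	return idx > SMCD_DMBE_SIZES ? SMCD_DMBE_SIZES : idx;
}
```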

Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Karsten Graul 
---
 net/smc/smc_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index f1dbb5025c0b..5de637472a11 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1596,7 +1596,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr,
return rc;
 }
 
-#define SMCD_DMBE_SIZES		7 /* 0 -> 16KB, 1 -> 32KB, .. 6 -> 1MB */
+#define SMCD_DMBE_SIZES		6 /* 0 -> 16KB, 1 -> 32KB, .. 6 -> 1MB */
 
 static struct smc_buf_desc *smcd_new_buf_create(struct smc_link_group *lgr,
bool is_dmb, int bufsize)
-- 
2.17.1

[PATCH net 1/3] net/smc: fix use-after-free of delayed events

2020-10-14 Thread Karsten Graul

When a delayed event is enqueued then the event worker will send this
event the next time it is running and no other flow is currently
active. The event handler is called for the delayed event, and the
pointer to the event remains set in lgr->delayed_event. This pointer is
cleared later in the processing by smc_llc_flow_start().
This can lead to a use-after-free condition when the processing does not
reach smc_llc_flow_start(), but frees the event because of an error
situation. Then the delayed_event pointer is still set but the event is
freed.
Fix this by always clearing the delayed event pointer when the event is
provided to the event handler for processing, and remove the code to
clear it in smc_llc_flow_start().
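The fix follows a common ownership pattern: detach the pointer from the shared structure before passing the object to code that may free it. Below is a hypothetical userspace sketch of that pattern; struct group, struct event and the helpers are illustrative and are not the SMC types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the SMC structures. */
struct event {
	bool link_usable;
};

struct group {
	struct event *delayed_event;
};

static int handled;	/* counts events passed to the handler */

/* The handler owns the event from here on and may free it, e.g. on an
 * error path. */
static void event_handler(struct event *ev)
{
	handled++;
	free(ev);
}

/* Fixed pattern: clear the shared pointer first, then hand the event
 * over (or drop it). No path can leave a dangling delayed_event. */
static void process_delayed(struct group *g)
{
	struct event *ev = g->delayed_event;

	if (!ev)
		return;
	g->delayed_event = NULL;
	if (ev->link_usable)
		event_handler(ev);
	else
		free(ev);
}
```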

Fixes: 555da9af827d ("net/smc: add event-based llc_flow framework")
Signed-off-by: Karsten Graul 
---
 net/smc/smc_llc.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index 2db967f2fb50..39039a82f24f 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -233,8 +233,6 @@ static bool smc_llc_flow_start(struct smc_llc_flow *flow,
default:
flow->type = SMC_LLC_FLOW_NONE;
}
-   if (qentry == lgr->delayed_event)
-   lgr->delayed_event = NULL;
smc_llc_flow_qentry_set(flow, qentry);
spin_unlock_bh(&lgr->llc_flow_lock);
return true;
@@ -1603,13 +1601,12 @@ static void smc_llc_event_work(struct work_struct *work)
struct smc_llc_qentry *qentry;
 
if (!lgr->llc_flow_lcl.type && lgr->delayed_event) {
-   if (smc_link_usable(lgr->delayed_event->link)) {
-   smc_llc_event_handler(lgr->delayed_event);
-   } else {
-   qentry = lgr->delayed_event;
-   lgr->delayed_event = NULL;
+   qentry = lgr->delayed_event;
+   lgr->delayed_event = NULL;
+   if (smc_link_usable(qentry->link))
+   smc_llc_event_handler(qentry);
+   else
kfree(qentry);
-   }
}
 
 again:
-- 
2.17.1

[PATCH net 3/3] net/smc: fix invalid return code in smcd_new_buf_create()

2020-10-14 Thread Karsten Graul

smc_ism_register_dmb() returns error codes set by the ISM driver which
are not guaranteed to be negative or in the errno range. Such values
would not be handled by ERR_PTR() and finally the return code will be
used as a memory address.
Fix that by using a valid negative errno value with ERR_PTR().
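Why a non-errno return code is dangerous becomes clear from how the ERR_PTR()/IS_ERR() encoding works: only values in the top MAX_ERRNO (4095) addresses are recognised as errors. The sketch below re-implements those helpers in userspace purely for illustration (they are not the kernel's headers). A positive ISM status such as 5 passes through ERR_PTR() as the address 0x5, and IS_ERR() does not flag it.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace re-implementation for illustration only; the kernel's own
 * versions live in include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Only the top MAX_ERRNO values of the address space count as errors. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```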

Fixes: 72b7f6c48708 ("net/smc: unique reason code for exceeded max dmb count")
Signed-off-by: Karsten Graul 
---
 net/smc/smc_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 5de637472a11..d790c43c473f 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1615,7 +1615,8 @@ static struct smc_buf_desc *smcd_new_buf_create(struct smc_link_group *lgr,
rc = smc_ism_register_dmb(lgr, bufsize, buf_desc);
if (rc) {
kfree(buf_desc);
-   return (rc == -ENOMEM) ? ERR_PTR(-EAGAIN) : ERR_PTR(rc);
+   return (rc == -ENOMEM) ? ERR_PTR(-EAGAIN) :
+ERR_PTR(-EIO);
}
buf_desc->pages = virt_to_page(buf_desc->cpu_addr);
/* CDC header stored in buf. So, pretend it was smaller */
-- 
2.17.1

[PATCH net 0/3] net/smc: fixes 2020-10-14

2020-10-14 Thread Karsten Graul

Please apply the following patch series for smc to netdev's net tree.

The first patch fixes a possible use-after-free of delayed llc events.
Patch 2 corrects the number of DMB buffer sizes. And patch 3 ensures
a correctly formatted return code when smc_ism_register_dmb() fails to
create a new DMB.

Karsten Graul (3):
  net/smc: fix use-after-free of delayed events
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix invalid return code in smcd_new_buf_create()

 net/smc/smc_core.c |  5 +++--
 net/smc/smc_llc.c  | 13 +
 2 files changed, 8 insertions(+), 10 deletions(-)

-- 
2.17.1

Re: [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next

2020-10-14 Thread Martin KaFai Lau

On Wed, Oct 14, 2020 at 07:46:12AM -0700, Yonghong Song wrote:
> Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> tried to fix the issue where seq_file pos is not increased
> if a NULL element is returned with seq_ops->next(). See bug
>   https://bugzilla.kernel.org/show_bug.cgi?id=206283
> The commit effectively does:
>   - increase pos for all seq_ops->start()
>   - increase pos for all seq_ops->next()
> 
> For ipv6_route, increasing pos for all seq_ops->next() is correct.
> But increasing pos for seq_ops->start() is not correct
> since pos is used to determine how many items to skip during
> seq_ops->start():
>   iter->skip = *pos;
> seq_ops->start() just fetches the *current* pos item.
> The item can be skipped only after seq_ops->show() which essentially
> is the beginning of seq_ops->next().
> 
> For example, I have 7 ipv6 route entries,
>   root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
>    40  00 
>  0400 0001  0001 eth0
>   fe80 40  00 
>  0100 0001  0001 eth0
>    00  00 
>   0001  00200200   lo
>   0001 80  00 
>   0003  8021   lo
>   fe802050e3fffebd3be8 80  00 
>   0002  8021 eth0
>   ff00 08  00 
>  0100 0004  0001 eth0
>    00  00 
>   0001  00200200   lo
>   0+1 records in
>   0+1 records out
>   1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
>   root@arch-fb-vm1:~/net-next
> 
> In the above, I specify buffer size 4096, so all records can be returned
> to user space with a single trip to the kernel.
> 
> If I use buffer size 128, since each record size is 149, internally
> kernel seq_read() will read 149 into its internal buffer and return the data
> to user space in two read() syscalls. Then user read() syscall will trigger
> next seq_ops->start(). Since the current implementation increased pos even
> for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
> record is #1.
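A deliberately simplified model helps show why records #2, #4 and #6 disappear. It assumes that each call into start() delivers exactly one record before the next restart, as in the bs=128 case with 149-byte records. It is not the real seq_read() logic, and read_all() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

#define NREC 7	/* the seven ipv6_route entries in the example */

/* Simplified model, not fs/seq_file.c: each restart calls start(&pos)
 * and returns one record, after which next() advances pos.
 * 'buggy_start' adds the extra increment that start() performed.
 * Stores 1-based record numbers in out[] and returns how many were read. */
static int read_all(bool buggy_start, int out[NREC])
{
	long long pos = 0;
	int n = 0;

	while (pos < NREC) {
		long long item = pos;	/* start(): fetch the item at *pos */

		if (buggy_start)
			pos++;		/* the increment the fix removes */
		out[n++] = (int)item + 1;	/* show() */
		pos++;			/* next(): move past the shown item */
	}
	return n;
}
```

With the extra increment the model yields records 1, 3, 5 and 7, which matches the four entries in the `dd bs=128` output below; without it all seven records are returned.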
> 
>   root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
>    40  00 
>  0400 0001  0001 eth0
>    00  00 
>   0001  00200200   lo
>   fe802050e3fffebd3be8 80  00 
>   0002  8021 eth0
>    00  00 
>   0001  00200200   lo
> 4+1 records in
> 4+1 records out
> 600 bytes copied, 0.00127758 s, 470 kB/s
> 
> To fix the problem, create a fake pos pointer so seq_ops->start()
> won't actually increase seq_file pos. With this fix, the
> above `dd` command with `bs=128` will show correct result.
> 
> Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> Cc: Andrii Nakryiko 
> Cc: Alexei Starovoitov 
> Suggested-by: Vasily Averin 
> Reviewed-by: Vasily Averin 
> Signed-off-by: Yonghong Song 
> ---
>  net/ipv6/ip6_fib.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Changelog:
>  v2 -> v3:
>   - initialize local variable "p" to avoid potential syzbot complaint. (Eric)
Acked-by: Martin KaFai Lau

Re: [PATCH net] can: peak_usb: add range checking in decode operations

2020-10-14 Thread Oliver Hartkopp


Hi Stephane,

On 14.10.20 15:22, Stéphane Grosjean wrote:

Hello Dan,

Don't know if this patch is still relevant, but:

- there is absolutely no reason for the device firmware to provide a channel 
index greater than or equal to 2, because the IP core of these USB devices 
handles 2 channels only. Anyway, these changes are correct.
- considering the verification of the length "cfd->len" on the other hand, this 
one comes directly from can_send() via dev_queue_xmit() AFAIK and it seems to me that the 
underlying driver can assume that its value is smaller than 64.


In fact there are many inbound checks e.g. with 
can_dropped_invalid_skb() to make sure the network layer gets proper CAN 
skbs (with ETH_P_CAN(FD) ethertypes).


On the outgoing path the CAN driver gets these ETH_P_CAN(FD) CAN skbs and
just copies the CAN ID and up to 64 bytes of data from that skb.


But remember that you can also generate CAN frames via AF_PACKET sockets,
which do not perform the sanity checks from can_send():

https://github.com/linux-can/can-tests/blob/master/netlayer/tst-packet.c

Copying 64 bytes from the skb into an I/O-attached CAN controller is
always a safe operation, but when you send the content through another
medium (e.g. USB) the length values should be checked.


Best regards,
Oliver



Regards,
---
Stéphane Grosjean
PEAK-System France
132, rue André Bisiaux
F-54320 MAXEVILLE
Tel: +(33) 9.72.54.51.97



From: Dan Carpenter
Sent: Thursday, 13 August 2020 16:06
To: Wolfgang Grandegger; Stéphane Grosjean
Cc: Marc Kleine-Budde; David S. Miller; Jakub Kicinski; Andri Yngvason; Oliver Hartkopp; linux-...@vger.kernel.org; netdev@vger.kernel.org; kernel-janit...@vger.kernel.org
Subject: [PATCH net] can: peak_usb: add range checking in decode operations

These values come from skb->data so Smatch considers them untrusted.  I
believe Smatch is correct but I don't have a way to test this.

The usb_if->dev[] array has 2 elements but the index is in the 0-15
range without checks.  The cfd->len can be up to 255 but the maximum
valid size is CANFD_MAX_DLEN (64) so that could lead to memory
corruption.
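Both checks amount to validating device-supplied fields before using them as an array index or a copy length. A minimal userspace sketch of that idea: struct usb_if_model and decode_frame() are hypothetical, the two-channel array mirrors usb_if->dev[], and -1 is used as a generic error instead of a specific errno.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CANFD_MAX_DLEN 64
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-interface state: two channels, as on these adapters. */
struct usb_if_model {
	int dev[2];
};

/* Validate untrusted fields before indexing or copying; returns 0 on
 * success, -1 if the channel or the length is out of range. */
static int decode_frame(const struct usb_if_model *uif, unsigned int channel,
			const unsigned char *payload, unsigned int len,
			unsigned char out[CANFD_MAX_DLEN], int *dev_out)
{
	if (channel >= ARRAY_SIZE(uif->dev))
		return -1;	/* channel field is 0-15, array has 2 slots */
	if (len > CANFD_MAX_DLEN)
		return -1;	/* len can be up to 255 */
	*dev_out = uif->dev[channel];
	memcpy(out, payload, len);
	return 0;
}
```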

Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters")
Signed-off-by: Dan Carpenter 
---
  drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 +-
  1 file changed, 37 insertions(+), 11 deletions(-)

diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
index 47cc1ff5b88e..dee3e689b54d 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
@@ -468,12 +468,18 @@ static int pcan_usb_fd_decode_canmsg(struct pcan_usb_fd_if *usb_if,
   struct pucan_msg *rx_msg)
  {
  struct pucan_rx_msg *rm = (struct pucan_rx_msg *)rx_msg;
-   struct peak_usb_device *dev = usb_if->dev[pucan_msg_get_channel(rm)];
-   struct net_device *netdev = dev->netdev;
+   struct peak_usb_device *dev;
+   struct net_device *netdev;
  struct canfd_frame *cfd;
  struct sk_buff *skb;
  const u16 rx_msg_flags = le16_to_cpu(rm->flags);

+   if (pucan_msg_get_channel(rm) >= ARRAY_SIZE(usb_if->dev))
+   return -ENOMEM;
+
+   dev = usb_if->dev[pucan_msg_get_channel(rm)];
+   netdev = dev->netdev;
+
  if (rx_msg_flags & PUCAN_MSG_EXT_DATA_LEN) {
  /* CANFD frame case */
  skb = alloc_canfd_skb(netdev, &cfd);
@@ -519,15 +525,21 @@ static int pcan_usb_fd_decode_status(struct pcan_usb_fd_if *usb_if,
   struct pucan_msg *rx_msg)
  {
  struct pucan_status_msg *sm = (struct pucan_status_msg *)rx_msg;
-   struct peak_usb_device *dev = usb_if->dev[pucan_stmsg_get_channel(sm)];
-   struct pcan_usb_fd_device *pdev =
-   container_of(dev, struct pcan_usb_fd_device, dev);
+   struct pcan_usb_fd_device *pdev;
  enum can_state new_state = CAN_STATE_ERROR_ACTIVE;
  enum can_state rx_state, tx_state;
-   struct net_device *netdev = dev->netdev;
+   struct peak_usb_device *dev;
+   struct net_device *netdev;
  struct can_frame *cf;
  struct sk_buff *skb;

+   if (pucan_stmsg_get_channel(sm) >= ARRAY_SIZE(usb_if->dev))
+   return -ENOMEM;
+
+   dev = usb_if->dev[pucan_stmsg_get_channel(sm)];
+   pdev = container_of(dev, struct pcan_usb_fd_device, dev);
+   netdev = dev->netdev;
+
  /* nothing should be sent while in BUS_OFF state */
  if (dev->can.state == CAN_STATE_BUS_OFF)
  return 0;
@@ -579,9 +591,14 @@ static int pcan_usb_fd_decode_error(struct pcan_usb_fd_if *usb_if,
  struct pucan_msg *rx_msg)
  {
  struct pucan_error_msg *er = (struct pucan_error_msg *)rx_msg;
-   struct peak_usb_device *dev = usb_if->dev[pucan_ermsg_get_channel(er)];
-   struct

[PATCH bpf-next] bpf: Fix register equivalence tracking.

2020-10-14 Thread Alexei Starovoitov

From: Alexei Starovoitov 

The 64-bit JEQ/JNE handling in reg_set_min_max() was clearing reg->id in either
the true or the false branch. When the 'if (reg->id)' check was done on the other
branch, the counterpart register would have reg->id == 0 when passed to
find_equal_scalars(). In that case the helper would incorrectly identify other
registers with id == 0 as equivalent and propagate the state incorrectly.
Fix it by preserving the ID across reg_set_min_max().
In other words, any kind of comparison operator on a scalar register
should preserve its ID so that the verifier recognizes:
r1 = r2
if (r1 == 20) {
  #1 here both r1 and r2 == 20
} else if (r2 < 20) {
  #2 here both r1 and r2 < 20
}

The patch is addressing #1 case. The #2 was working correctly already.
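The effect of losing the ID can be reproduced with a toy model of scalar register state. This is a hypothetical sketch, not the verifier's structures: struct reg keeps only an ID and a [min, max] range, and find_equal_scalars() here only mirrors the idea of copying the refined state to every register that shares the known register's ID.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of verifier scalar state: id links registers known to hold
 * the same value (id 0 = not linked). Not the verifier's real structs. */
struct reg {
	unsigned int id;
	long long min, max;
};

/* Mirrors the idea of find_equal_scalars(): copy the refined state to
 * every register sharing known->id. */
static void find_equal_scalars(struct reg *regs, int n, const struct reg *known)
{
	struct reg k = *known;

	for (int i = 0; i < n; i++)
		if (regs[i].id == k.id)
			regs[i] = k;
}

/* True branch of "if (r == imm)": r becomes the constant imm.
 * keep_id == false models the old behaviour of clearing the ID. */
static void mark_known(struct reg *r, long long imm, bool keep_id)
{
	if (!keep_id)
		r->id = 0;
	r->min = imm;
	r->max = imm;
}
```

With r1 and r2 sharing ID 1 and an unrelated r3 with ID 0, clearing r1's ID makes the propagation skip r2 and overwrite r3 instead; preserving the ID refines r2 and leaves r3 alone.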

Fixes: 75748837b7e5 ("bpf: Propagate scalar ranges through register assignments.")
Signed-off-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 38 ---
 .../testing/selftests/bpf/verifier/regalloc.c | 26 +
 2 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c43a5e8f0818..39d7f44e7c92 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1010,14 +1010,9 @@ static const int caller_saved[CALLER_SAVED_REGS] = {
 static void __mark_reg_not_init(const struct bpf_verifier_env *env,
struct bpf_reg_state *reg);
 
-/* Mark the unknown part of a register (variable offset or scalar value) as
- * known to have the value @imm.
- */
-static void __mark_reg_known(struct bpf_reg_state *reg, u64 imm)
+/* This helper doesn't clear reg->id */
+static void ___mark_reg_known(struct bpf_reg_state *reg, u64 imm)
 {
-   /* Clear id, off, and union(map_ptr, range) */
-   memset(((u8 *)reg) + sizeof(reg->type), 0,
-  offsetof(struct bpf_reg_state, var_off) - sizeof(reg->type));
reg->var_off = tnum_const(imm);
reg->smin_value = (s64)imm;
reg->smax_value = (s64)imm;
@@ -1030,6 +1025,17 @@ static void __mark_reg_known(struct bpf_reg_state *reg, u64 imm)
reg->u32_max_value = (u32)imm;
 }
 
+/* Mark the unknown part of a register (variable offset or scalar value) as
+ * known to have the value @imm.
+ */
+static void __mark_reg_known(struct bpf_reg_state *reg, u64 imm)
+{
+   /* Clear id, off, and union(map_ptr, range) */
+   memset(((u8 *)reg) + sizeof(reg->type), 0,
+  offsetof(struct bpf_reg_state, var_off) - sizeof(reg->type));
+   ___mark_reg_known(reg, imm);
+}
+
 static void __mark_reg32_known(struct bpf_reg_state *reg, u64 imm)
 {
reg->var_off = tnum_const_subreg(reg->var_off, imm);
@@ -7001,14 +7007,18 @@ static void reg_set_min_max(struct bpf_reg_state *true_reg,
struct bpf_reg_state *reg =
opcode == BPF_JEQ ? true_reg : false_reg;
 
-   /* For BPF_JEQ, if this is false we know nothing Jon Snow, but
-* if it is true we know the value for sure. Likewise for
-* BPF_JNE.
+   /* JEQ/JNE comparison doesn't change the register equivalence.
+* r1 = r2;
+* if (r1 == 42) goto label;
+* ...
+* label: // here both r1 and r2 are known to be 42.
+*
+* Hence when marking register as known preserve its ID.
 */
if (is_jmp32)
__mark_reg32_known(reg, val32);
else
-   __mark_reg_known(reg, val);
+   ___mark_reg_known(reg, val);
break;
}
case BPF_JSET:
@@ -7551,7 +7561,8 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,

reg_combine_min_max(&other_branch_regs[insn->src_reg],
&other_branch_regs[insn->dst_reg],
src_reg, dst_reg, opcode);
-   if (src_reg->id) {
+   if (src_reg->id &&
+   !WARN_ON_ONCE(src_reg->id != other_branch_regs[insn->src_reg].id)) {
find_equal_scalars(this_branch, src_reg);
find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
}
@@ -7563,7 +7574,8 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
opcode, is_jmp32);
}
 
-   if (dst_reg->type == SCALAR_VALUE && dst_reg->id) {
+   if (dst_reg->type == SCALAR_VALUE && dst_reg->id &&
+   !WARN_ON_ONCE(dst_reg->id != other_branch_regs[insn->dst_reg].id)) {
find_equal_scalars(this_branch, dst_reg);
find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
}
diff --git a/tools/testing/selftests/bpf/verifier/regalloc.c b/tools/testing/

Re: [PATCH] ixgbe: fail to create xfrm offload of IPsec tunnel mode SA

2020-10-14 Thread Shannon Nelson


On 10/14/20 7:17 AM, Antony Antony wrote:

Based on talks and indirect references, ixgbe IPsec offload does not
support IPsec tunnel mode offload. It can only support IPsec transport
mode offload. Now explicitly fail when creating a non-transport mode SA
with offload to avoid false performance expectations.

Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Antony Antony 


Acked-by: Shannon Nelson 


---
  drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 5 +
  drivers/net/ethernet/intel/ixgbevf/ipsec.c | 5 +
  2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index eca73526ac86..54d47265a7ac 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -575,6 +575,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
return -EINVAL;
}
  
+   if (xs->props.mode != XFRM_MODE_TRANSPORT) {
+   netdev_err(dev, "Unsupported mode for ipsec offload\n");
+   return -EINVAL;
+   }
+
if (ixgbe_ipsec_check_mgmt_ip(xs)) {
netdev_err(dev, "IPsec IP addr clash with mgmt filters\n");
return -EINVAL;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 5170dd9d8705..caaea2c920a6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -272,6 +272,11 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs)
return -EINVAL;
}
  
+	if (xs->props.mode != XFRM_MODE_TRANSPORT) {
+		netdev_err(dev, "Unsupported mode for ipsec offload\n");
+		return -EINVAL;
+	}
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa rsa;

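The check added to both `ixgbe_ipsec_add_sa()` and `ixgbevf_ipsec_add_sa()` is a single guard on the SA mode. The following is a rough user-space sketch of that guard. The `sa_mode` enum is a local stand-in whose values are assumed to match the `XFRM_MODE_*` constants in the UAPI `xfrm.h`. `validate_offload_mode()` is a hypothetical helper, not a function in either driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Local stand-in for the xfrm SA modes; values assumed to match the
 * XFRM_MODE_* constants in include/uapi/linux/xfrm.h. */
enum sa_mode {
	SA_MODE_TRANSPORT = 0,
	SA_MODE_TUNNEL = 1,
	SA_MODE_ROUTEOPTIMIZATION = 2,
	SA_MODE_IN_TRIGGER = 3,
	SA_MODE_BEET = 4,
};

/* Hypothetical helper mirroring the guard added to both drivers:
 * only transport-mode SAs may be offloaded. */
static int validate_offload_mode(enum sa_mode mode)
{
	if (mode != SA_MODE_TRANSPORT) {
		fprintf(stderr, "Unsupported mode for ipsec offload\n");
		return -EINVAL;
	}
	return 0;
}
```

Any mode other than transport, including tunnel and BEET, is refused with `-EINVAL` when the SA is created. That is the intent of the patch: the driver no longer accepts an offload request it cannot honor.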
Re: [PATCH] IPv6: sr: Fix End.X nexthop to use oif.

2020-10-14 Thread kernel test robot

Hi Reji,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.9 next-20201013]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Reji-Thomas/IPv6-sr-Fix-End-X-nexthop-to-use-oif/20201013-204935
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 865c50e1d279671728c2936cb7680eb89355eeea
config: riscv-randconfig-r035-20201014 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project e7fe3c6dfede8d5781bd000741c3dea7088307a4)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# https://github.com/0day-ci/linux/commit/8d40085b9b014197ce7a7e8927730796bf50adb0
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Reji-Thomas/IPv6-sr-Fix-End-X-nexthop-to-use-oif/20201013-204935
git checkout 8d40085b9b014197ce7a7e8927730796bf50adb0
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from net/ipv6/seg6_local.c:11:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:556:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   return inb(addr);
  ^
   arch/riscv/include/asm/io.h:54:76: note: expanded from macro 'inb'
   #define inb(c)  ({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
~~ ^
   arch/riscv/include/asm/mmio.h:87:48: note: expanded from macro 'readb_cpu'
   #define readb_cpu(c) ({ u8  __r = __raw_readb(c); __r; })
^
   In file included from net/ipv6/seg6_local.c:11:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:564:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   return inw(addr);
  ^
   arch/riscv/include/asm/io.h:55:76: note: expanded from macro 'inw'
   #define inw(c)  ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
~~ ^
   arch/riscv/include/asm/mmio.h:88:76: note: expanded from macro 'readw_cpu'
   #define readw_cpu(c) ({ u16 __r = le16_to_cpu((__force __le16)__raw_readw(c)); __r; })
^
   include/uapi/linux/byteorder/little_endian.h:36:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
 ^
   In file included from net/ipv6/seg6_local.c:11:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:572:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   return inl(addr);
  ^
   arch/riscv/include/asm/io.h:56:76: note: expanded from macro 'inl'
   #define inl(c)  ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
~~ ^
   arch/riscv/include/asm/mmio.h:89:76: note: expanded from macro 'readl_cpu'
   #define readl_cpu(c) ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; })
^
   include/uapi/linux/byteorder/little_endian.h:3

Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register

2020-10-14 Thread Francesco Ruggeri

On Wed, Oct 14, 2020 at 1:23 AM Florian Westphal  wrote:
>
> Pablo Neira Ayuso  wrote:
> > Legacy would still be flawed though.
>
> It's fine too, new rule blob gets handled (and match/target checkentry
> called) before old one is dismantled.
>
> We only have a 0 refcount + hook unregister when rules get
> flushed/removed explicitly.

Should the patch be used in the meantime while this gets
worked out?

Francesco

Re: selftests: netfilter: nft_nat.sh: /dev/stdin:2:9-15: Error: syntax error, unexpected counter

2020-10-14 Thread Pablo Neira Ayuso

On Wed, Oct 14, 2020 at 05:19:33PM +0530, Naresh Kamboju wrote:
> While running the kselftest netfilter tests on x86_64 devices with the
> linux-next tag 20201013 kernel, we noticed these errors. This is not
> specific to this kernel version; we have noticed these errors earlier
> also.
> 
> Am I missing configs ?

What nftables version are you using?

> Please refer to the config file we are using.
> We are using the minimal busybox shell.
> BusyBox v1.27.2 (2020-07-17 18:42:50 UTC) multi-call binary.
> 
> metadata:
>   git branch: master
>   git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   git commit: f2fb1afc57304f9dd68c20a08270e287470af2eb
>   git describe: next-20201013
>   make_kernelversion: 5.9.0
>   kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-corei7-64/lkft/linux-next/879/config
> 
> Test output log:
> 
> selftests: netfilter: nft_nat.sh
> [ 1207.251385] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 1207.342479] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> # /dev/stdin:2:9-15: Error: syntax error, unexpected counter
> # counter ns0in {}
> # ^^^
> # /dev/stdin:3:9-15: Error: syntax error, unexpected counter
> # counter ns1in {}
> # ^^^
> # /dev/stdin:4:9-15: Error: syntax error, unexpected counter
> # counter ns2in {}
> # ^^^
> # /dev/stdin:6:9-15: Error: syntax error, unexpected counter
> # counter ns0out {}
> # ^^^
> 
> 
> 
> # /dev/stdin:12:9-15: Error: syntax error, unexpected counter
> # counter ns2in6 {}
> # ^^^
> # /dev/stdin:14:9-15: Error: syntax error, unexpected counter
> # counter ns0out6 {}
> # ^^^
> [ 1208.229989] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter ns0in
> #  ^^^
> # ERROR: ns0in counter in ns1-loU9Vlmj has unexpected value (expected
> packets 1 bytes 84) at check_counters 1
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter ns0in
> #  ^^^
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter ns0out
> #  ^^^
> # ERROR: ns0out counter in ns1-loU9Vlmj has unexpected value (expected
> packets 1 bytes 84) at check_counters 2
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter ns0out
> #  ^^^
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter ns0in6
> 
> # ERROR: ns1 counter in ns0-loU9Vlmj has unexpected value (expected
> packets 1 bytes 104) at check_ns0_counters 5
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter ns1
> #  ^^^
> 
> 
> 
> # :1:16-19: Error: syntax error, unexpected inet
> # reset counters inet
> #
> # :1:16-19: Error: syntax error, unexpected inet
> # reset counters inet
> #
> # FAIL: nftables v0.7 (Scrooge McDuck)
> not ok 2 selftests: netfilter: nft_nat.sh # exit=1
> # selftests: netfilter: bridge_brouter.sh
> # SKIP: Could not run test without ebtables
> ok 3 selftests: netfilter: bridge_brouter.sh # SKIP
> # selftests: netfilter: conntrack_icmp_related.sh
> [ 1215.679815] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 1215.698932] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 1215.711612] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 1216.678043] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> # internal:0:0-0: Error: Could not open file \"-\": No such file or directory
> #
> #
> # internal:0:0-0: Error: Could not open file \"-\": No such file or directory
> #
> #
> # internal:0:0-0: Error: Could not open file \"-\": No such file or directory
> #
> #
> # internal:0:0-0: Error: Could not open file \"-\": No such file or directory
> #
> #
> # internal:0:0-0: Error: Could not open file \"-\": No such file or directory
> #
> #
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter unknown
> #  ^^^
> # ERROR: counter unknown in nsclient1 has unexpected value (expected
> packets 0 bytes 0)
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter unknown
> #  ^^^
> 
> 
> 
> # ERROR: counter related in nsclient1 has unexpected value (expected
> packets 2 bytes 1856)
> # :1:6-12: Error: syntax error, unexpected counter
> # list counter inet filter related
> #  ^^^
> # ERROR: icmp error RELATED state test has failed
> not ok 4 selftests: netfilter: conntrack_icmp_related.sh # exit=1
> # selftests: netfilter: nft_flowtable.sh
> # Cannot create namespace file \"/var/run/netns/ns1\": File exists
> [ 1230.570705] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 1230.757525] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 1230.843221] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> # internal:0:0-0: Error: Could not open fi

[PATCH net-next] netfilter: restore NF_INET_NUMHOOKS

2020-10-14 Thread Pablo Neira Ayuso

This definition is used by the iptables legacy UAPI, restore it.

Fixes: d3519cb89f6d ("netfilter: nf_tables: add inet ingress support")
Reported-by: Jason A. Donenfeld 
Tested-by: Jason A. Donenfeld 
Signed-off-by: Pablo Neira Ayuso 
---
@Jakub: could you please take this into net-next? It fixes fallout
from the inet ingress support. Thank you.

 include/net/netfilter/nf_tables.h | 4 +++-
 include/uapi/linux/netfilter.h    | 4 ++--
 net/netfilter/nf_tables_api.c     | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 3965ce18226f..3f7e56b1171e 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -14,6 +14,8 @@
 #include 
 #include 
 
+#define NFT_MAX_HOOKS  (NF_INET_INGRESS + 1)
+
 struct module;
 
 #define NFT_JUMP_STACK_SIZE	16
@@ -979,7 +981,7 @@ struct nft_chain_type {
 	int				family;
 	struct module			*owner;
 	unsigned int			hook_mask;
-	nf_hookfn			*hooks[NF_MAX_HOOKS];
+	nf_hookfn			*hooks[NFT_MAX_HOOKS];
 	int				(*ops_register)(struct net *net, const struct nf_hook_ops *ops);
 	void				(*ops_unregister)(struct net *net, const struct nf_hook_ops *ops);
 };
diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index 6a6179af0d7c..ef9a44286e23 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -45,8 +45,8 @@ enum nf_inet_hooks {
NF_INET_FORWARD,
NF_INET_LOCAL_OUT,
NF_INET_POST_ROUTING,
-   NF_INET_INGRESS,
-   NF_INET_NUMHOOKS
+   NF_INET_NUMHOOKS,
+   NF_INET_INGRESS = NF_INET_NUMHOOKS,
 };
 
 enum nf_dev_hooks {
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index f22ad21d0230..7f1c184c00d2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1864,7 +1864,7 @@ static int nft_chain_parse_hook(struct net *net,
if (IS_ERR(type))
return PTR_ERR(type);
}
-   if (hook->num > NF_MAX_HOOKS || !(type->hook_mask & (1 << hook->num)))
+   if (hook->num >= NFT_MAX_HOOKS || !(type->hook_mask & (1 << hook->num)))
return -EOPNOTSUPP;
 
if (type->type == NFT_CHAIN_T_NAT &&
-- 
2.20.1

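The reordering restores `NF_INET_NUMHOOKS` to the value it had before inet ingress support was added. `NF_INET_INGRESS` now aliases it, and the new `NFT_MAX_HOOKS` sizes nf_tables' own arrays. A standalone copy of the enum makes the values easy to check. This is a sketch. The first two enumerators are not visible in the hunk and are assumed from the UAPI header. `hook_num_valid()` is a hypothetical stand-in for the range check in `nft_chain_parse_hook()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Copy of enum nf_inet_hooks after the patch. The first two members
 * are not shown in the hunk and are assumed from the UAPI header. */
enum nf_inet_hooks {
	NF_INET_PRE_ROUTING,
	NF_INET_LOCAL_IN,
	NF_INET_FORWARD,
	NF_INET_LOCAL_OUT,
	NF_INET_POST_ROUTING,
	NF_INET_NUMHOOKS,
	NF_INET_INGRESS = NF_INET_NUMHOOKS,
};

/* Same definition the patch adds to nf_tables.h. */
#define NFT_MAX_HOOKS	(NF_INET_INGRESS + 1)

/* Hypothetical stand-in for the bound check: hooks[] has NFT_MAX_HOOKS
 * entries, so valid indices are 0 .. NFT_MAX_HOOKS - 1. */
static bool hook_num_valid(unsigned int num)
{
	return num < NFT_MAX_HOOKS;
}
```

With this ordering, `NF_INET_NUMHOOKS` and `NF_INET_INGRESS` are both 5 and `NFT_MAX_HOOKS` is 6. The `hooks[]` member of `struct nft_chain_type` now has `NFT_MAX_HOOKS` entries, so the highest valid index is `NFT_MAX_HOOKS - 1`. That is why the comparison in the last hunk becomes `>=`.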
Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register

2020-10-14 Thread Florian Westphal

Francesco Ruggeri  wrote:
> On Wed, Oct 14, 2020 at 1:23 AM Florian Westphal  wrote:
> >
> > Pablo Neira Ayuso  wrote:
> > > Legacy would still be flawed though.
> >
> > It's fine too, new rule blob gets handled (and match/target checkentry
> > called) before old one is dismantled.
> >
> > We only have a 0 refcount + hook unregister when rules get
> > flushed/removed explicitly.
> 
> Should the patch be used in the meantime while this gets
> worked out?

I think the patch is correct, and I do NOT see a better solution.

Re: [Patch net v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly

2020-10-14 Thread Xie He

On Wed, Oct 14, 2020 at 8:12 AM Willem de Bruijn
 wrote:
>
> On Wed, Oct 14, 2020 at 4:52 AM Xie He  wrote:
> >
> > On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
> >  wrote:
> > >
> > > There is agreement that hard_header_len should be the length of link
> > > layer headers visible to the upper layers, needed_headroom the
> > > additional room required for headers that are not exposed, i.e., those
> > > pushed inside ndo_start_xmit.
> > >
> > > The link layer header length also has to agree with the interface
> > > hardware type (ARPHRD_..).
> > >
> > > Tunnel devices have not always been consistent in this, but today
> > > "bare" ip tunnel devices without additional headers (ipip, sit, ..) do
> > > match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
> > > geneve also conform to this. Known exception that probably needs to be
> > > addressed is sit, which still advertises LL_MAX_HEADER and so has
> > > exposed quite a few syzkaller issues. Side note, it is not entirely
> > > clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
> > > why they are needed.
> > >
> > > GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
> > > The second makes sense, as it appears as an Ethernet device. The first
> > > should match "bare" ip tunnel devices, if following the above logic.
> > > Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
> > > header around in collect medata mode") implements. It changes
> > > dev->type to ARPHRD_NONE in collect_md mode.
> > >
> > > Some of the inconsistency comes from the various modes of the GRE
> > > driver. Which brings us to ipgre_header_ops. It is set only in two
> > > special cases.
> > >
> > > Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
> > > added ipgre_header_ops.parse to be able to receive the inner ip source
> > > address with PF_PACKET recvfrom. And apparently relies on
> > > ipgre_header_ops.create to be able to set an address, which implies
> > > SOCK_DGRAM.
> > >
> > > The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
> > > implementation starts with the beautiful comment "/* Nice toy.
> > > Unfortunately, useless in real life :-)". From the rest of that
> > > detailed comment, it is not clear to me why it would need to expose
> > > the headers. The example does not use packet sockets.
> > >
> > > A packet socket cannot know device details such as which configurable
> > > mode a device may be in. And different modes conflict with the basic
> > > rule that for a given well defined link layer type, i.e., dev->type,
> > > header length can be expected to be consistent. In an ideal world
> > > these exceptions would not exist, therefore.
> > >
> > > Unfortunately, this is legacy behavior that will have to continue to
> > > be supported.
> >
> > Thanks for your explanation. So header_ops for GRE devices is only
> > used in 2 special situations. In normal situations, header_ops is not
> > used for GRE devices. And we consider not using header_ops should be
> > the ideal arrangement for GRE devices.
> >
> > Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE
> > devices that use header_ops? I guess changing dev->type will not
> > affect the interface to the user space? This way we can solve the
> > problem of the same dev->type having different hard_header_len values.
>
> But does that address any real issue?

It doesn't address any user-visible issue. It only solves the problem
you mentioned, of the same dev->type having different hard_header_len
values. Making this change will not affect users in any way, so I
think it is worth doing.

> If anything, it would make sense to keep ARPHRD_IPGRE for tunnels
> that expect headers and switch to ARPHRD_NONE for those that do not.
> As the collect_md commit I mentioned above does.

I thought we agreed that ideally GRE devices would not have
header_ops. Currently GRE devices (in normal situations) indeed do not
use header_ops (and use ARPHRD_IPGRE as dev->type). I think we should
keep this behavior.

To solve the problem you mentioned, of the same dev->type having
different hard_header_len values, I think we should create a new
dev->type (ARPHRD_IPGRE_SPECIAL) for GRE devices that use header_ops.

Also, for collect_md, I think we should use ARPHRD_IPGRE. I see no
reason to use ARPHRD_NONE.

> > Also, for the second special situation, if there's no obvious reason
> > to use header_ops, maybe we can consider removing header_ops for this
> > situation.
>
> Unfortunately, there's no knowing if some application is using this
> broadcast mode *with* a process using packet sockets.

We can't always keep the interface to user space unchanged when
fixing problems. When we fix drivers by adding or removing
hard_header_len, we ARE changing the interface. I have made many such
fixes. I also changed skb->protocol when sending skbs for some
drivers, which in fact was also

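Willem's distinction between `hard_header_len` (link-layer headers visible to upper layers) and `needed_headroom` (headers pushed inside `ndo_start_xmit`) can be illustrated with the headroom an upper layer reserves. The macro below is assumed to follow `LL_RESERVED_SPACE()` in include/linux/netdevice.h. `struct fake_dev` is a hypothetical stand-in for `struct net_device`. The 24-byte figure (a 20-byte IPv4 header plus a 4-byte base GRE header) is only illustrative and is not necessarily what ip_gre configures.

```c
#include <assert.h>

/* Assumed to follow include/linux/netdevice.h. */
#define HH_DATA_MOD	16
#define LL_RESERVED_SPACE(dev) \
	((((dev)->hard_header_len + (dev)->needed_headroom) & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD)

/* Hypothetical stand-in for the two struct net_device fields. */
struct fake_dev {
	unsigned short hard_header_len;	/* headers visible to upper layers */
	unsigned short needed_headroom;	/* headers pushed in ndo_start_xmit */
};

/* Headroom an upper layer would reserve: the sum of both fields,
 * rounded down to a multiple of HH_DATA_MOD, plus one HH_DATA_MOD. */
static int reserved_headroom(const struct fake_dev *dev)
{
	return LL_RESERVED_SPACE(dev);
}
```

Declaring the 24 bytes as `needed_headroom` (a "bare" tunnel) or as `hard_header_len` reserves the same 32 bytes. The choice between the two fields is therefore mostly about what is exposed. Roughly speaking, a packet socket sending with SOCK_RAW is expected to supply the first `hard_header_len` bytes itself. This is the consistency concern raised above about a single `dev->type` advertising different values depending on the GRE mode.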
Re: [PATCH net-next 3/3] macb: support the two tx descriptors on at91rm9200

2020-10-14 Thread Claudiu.Beznea

Hi Willy,

On 11.10.2020 12:09, Willy Tarreau wrote:
> The at91rm9200 variant used by a few chips including the MSC313 supports
> two Tx descriptors (one frame being serialized and another one queued).
> However, the driver only implemented a single one, which adds dead time
> after each transfer to receive and process the interrupt and wake the
> queue up, preventing it from reaching line rate.
> 
> This patch implements a very basic 2-deep queue to address this limitation.
> The tests run on a Breadbee board equipped with an MSC313E show that at
> 1 GHz, HTTP traffic on medium-sized objects (45kB) was limited to exactly
> 50 Mbps before this patch, and jumped to 76 Mbps with this patch. And tests
> on a single TCP stream with an MTU of 576 jump from 10kpps to 15kpps. With
> 1500 byte packets it's now possible to reach line rate versus 75 Mbps
> before.
> 
> Cc: Nicolas Ferre 
> Cc: Claudiu Beznea 
> Cc: Daniel Palmer 
> Signed-off-by: Willy Tarreau 
> ---
>  drivers/net/ethernet/cadence/macb.h  |  2 ++
>  drivers/net/ethernet/cadence/macb_main.c | 46 +++-
>  2 files changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index e87db95fb0f6..f8133003981f 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -1208,6 +1208,8 @@ struct macb {
> 
> /* AT91RM9200 transmit queue (1 on wire + 1 queued) */
> struct macb_tx_skb  rm9200_txq[2];
> +   unsigned int rm9200_tx_tail;
> +   unsigned int rm9200_tx_len;
> unsigned int max_tx_length;
> 
> u64 ethtool_stats[GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES];
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index ca6e5456906a..6ff8e4b0b95d 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -3909,6 +3909,7 @@ static int at91ether_start(struct macb *lp)
>  MACB_BIT(ISR_TUND) |
>  MACB_BIT(ISR_RLE)  |
>  MACB_BIT(TCOMP)|
> +MACB_BIT(RM9200_TBRE)  |
>  MACB_BIT(ISR_ROVR) |
>  MACB_BIT(HRESP));
> 
> @@ -3925,6 +3926,7 @@ static void at91ether_stop(struct macb *lp)
>  MACB_BIT(ISR_TUND) |
>  MACB_BIT(ISR_RLE)  |
>  MACB_BIT(TCOMP)|
> +MACB_BIT(RM9200_TBRE)  |
>  MACB_BIT(ISR_ROVR) |
>  MACB_BIT(HRESP));
> 
> @@ -3994,11 +3996,10 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
> struct net_device *dev)
>  {
> struct macb *lp = netdev_priv(dev);
> +   unsigned long flags;
> 
> -   if (macb_readl(lp, TSR) & MACB_BIT(RM9200_BNQ)) {
> -   int desc = 0;
> -
> -   netif_stop_queue(dev);
> +   if (lp->rm9200_tx_len < 2) {
> +   int desc = lp->rm9200_tx_tail;

I think you also want to protect these reads with spin_lock() to avoid
concurrency with the interrupt handler.

> 
> /* Store packet information (to free when Tx completed) */
> lp->rm9200_txq[desc].skb = skb;
> @@ -4012,6 +4013,15 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
> return NETDEV_TX_OK;
> }
> 
> +   spin_lock_irqsave(&lp->lock, flags);
> +
> +   lp->rm9200_tx_tail = (desc + 1) & 1;
> +   lp->rm9200_tx_len++;
> +   if (lp->rm9200_tx_len > 1)
> +   netif_stop_queue(dev);
> +
> +   spin_unlock_irqrestore(&lp->lock, flags);
> +
> /* Set address of the data in the Transmit Address register */
> macb_writel(lp, TAR, lp->rm9200_txq[desc].mapping);
> /* Set length of the packet in the Transmit Control register */
> @@ -4077,6 +4087,8 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id)
> struct macb *lp = netdev_priv(dev);
> u32 intstatus, ctl;
> unsigned int desc;
> +   unsigned int qlen;
> +   u32 tsr;
> 
> /* MAC Interrupt Status register indicates what interrupts are pending.
>  * It is automatically cleared once read.
> @@ -4088,21 +4100,39 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id)
> at91ether_rx(dev);
> 
> /* Transmit complete */
> -   if (intstatus & MACB_BIT(TCOMP)) {
> +   if (intstatus & (MACB_BIT(TCOMP) | MACB_BIT(RM9200_TBRE))) {
>

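The patch adds two pieces of bookkeeping: a tail index that alternates between the two `rm9200_txq` slots, and a count of frames handed to the MAC. Both can be modeled on their own. This is a user-space sketch under stated assumptions. `struct txq2`, `txq2_push()` and `txq2_complete()` are hypothetical names. The oldest slot is assumed to be `(tail - len) & 1`, because the completion path is truncated above. The locking that Claudiu asks for appears only in comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of rm9200_txq[2], rm9200_tx_tail and rm9200_tx_len. */
struct txq2 {
	int slot[2];		/* stands in for struct macb_tx_skb */
	unsigned int tail;	/* next free slot, 0 or 1 */
	unsigned int len;	/* frames handed to the MAC, 0..2 */
	bool stopped;		/* models netif_stop_queue()/netif_wake_queue() */
};

/* Transmit path. In the driver, the len check and the updates should sit
 * in one lp->lock critical section, since the IRQ handler touches them.
 * Returns false when both descriptors are busy. */
static bool txq2_push(struct txq2 *q, int frame)
{
	if (q->len >= 2)
		return false;
	q->slot[q->tail] = frame;
	q->tail = (q->tail + 1) & 1;
	q->len++;
	if (q->len > 1)
		q->stopped = true;	/* both descriptors now in use */
	return true;
}

/* Completion path (assumed): the oldest frame is at (tail - len) & 1.
 * Returns -1 if nothing is queued. */
static int txq2_complete(struct txq2 *q)
{
	unsigned int head;
	int frame;

	if (q->len == 0)
		return -1;
	head = (q->tail - q->len) & 1;
	frame = q->slot[head];
	q->len--;
	q->stopped = false;	/* a descriptor is free again */
	return frame;
}
```

The xmit path and the interrupt handler both read and write `tail` and `len`. The check of `len` therefore belongs in the same critical section as the update, which is the point of the review comment on the unlocked `rm9200_tx_len` read.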
kernel panic: Fatal exception (3)

2020-10-14 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:c77fb07f Merge branch 'netlink-export-policy-on-validation..
git tree:   net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1722ff0050
kernel config:  https://syzkaller.appspot.com/x/.config?x=fa2bf4058104211
dashboard link: https://syzkaller.appspot.com/bug?extid=ec762a6342ad0d3c0d8f
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1281459f90
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=175556f050

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11932b4850
final oops: https://syzkaller.appspot.com/x/report.txt?x=13932b4850
console output: https://syzkaller.appspot.com/x/log.txt?x=15932b4850

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ec762a6342ad0d3c0...@syzkaller.appspotmail.com

FS:  () GS:8880ae50() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004c9428 CR3: 09e8d000 CR4: 001506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Re: [Patch net v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly

2020-10-14 Thread Willem de Bruijn

On Wed, Oct 14, 2020 at 3:48 PM Xie He  wrote:
>
> On Wed, Oct 14, 2020 at 8:12 AM Willem de Bruijn
>  wrote:
> >
> > On Wed, Oct 14, 2020 at 4:52 AM Xie He  wrote:
> > >
> > > On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
> > >  wrote:
> > > >
> > > > There is agreement that hard_header_len should be the length of link
> > > > layer headers visible to the upper layers, needed_headroom the
> > > > additional room required for headers that are not exposed, i.e., those
> > > > pushed inside ndo_start_xmit.
> > > >
> > > > The link layer header length also has to agree with the interface
> > > > hardware type (ARPHRD_..).
> > > >
> > > > Tunnel devices have not always been consistent in this, but today
> > > > "bare" ip tunnel devices without additional headers (ipip, sit, ..) do
> > > > match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
> > > > geneve also conform to this. Known exception that probably needs to be
> > > > addressed is sit, which still advertises LL_MAX_HEADER and so has
> > > > exposed quite a few syzkaller issues. Side note, it is not entirely
> > > > clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
> > > > why they are needed.
> > > >
> > > > GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
> > > > The second makes sense, as it appears as an Ethernet device. The first
> > > > should match "bare" ip tunnel devices, if following the above logic.
> > > > Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
> > > > header around in collect medata mode") implements. It changes
> > > > dev->type to ARPHRD_NONE in collect_md mode.
> > > >
> > > > Some of the inconsistency comes from the various modes of the GRE
> > > > driver. Which brings us to ipgre_header_ops. It is set only in two
> > > > special cases.
> > > >
> > > > Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
> > > > added ipgre_header_ops.parse to be able to receive the inner ip source
> > > > address with PF_PACKET recvfrom. And apparently relies on
> > > > ipgre_header_ops.create to be able to set an address, which implies
> > > > SOCK_DGRAM.
> > > >
> > > > The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
> > > > implementation starts with the beautiful comment "/* Nice toy.
> > > > Unfortunately, useless in real life :-)". From the rest of that
> > > > detailed comment, it is not clear to me why it would need to expose
> > > > the headers. The example does not use packet sockets.
> > > >
> > > > A packet socket cannot know device details such as which configurable
> > > > mode a device may be in. And different modes conflict with the basic
> > > > rule that for a given well defined link layer type, i.e., dev->type,
> > > > header length can be expected to be consistent. In an ideal world
> > > > these exceptions would not exist, therefore.
> > > >
> > > > Unfortunately, this is legacy behavior that will have to continue to
> > > > be supported.
> > >
> > > Thanks for your explanation. So header_ops for GRE devices is only
> > > used in 2 special situations. In normal situations, header_ops is not
> > > used for GRE devices. And we consider not using header_ops should be
> > > the ideal arrangement for GRE devices.
> > >
> > > Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE
> > > devices that use header_ops? I guess changing dev->type will not
> > > affect the interface to the user space? This way we can solve the
> > > problem of the same dev->type having different hard_header_len values.
> >
> > But does that address any real issue?
>
> It doesn't address any user-visible issue. It only solves the problem
> you mentioned, of the same dev->type having different hard_header_len
> values. Making this change will not affect users in any way, so I
> think it is worth doing.
>
> > If anything, it would make sense to keep ARPHRD_IPGRE for tunnels
> > that expect headers and switch to ARPHRD_NONE for those that do not.
> > As the collect_md commit I mentioned above does.
>
> I thought we agreed that ideally GRE devices would not have
> header_ops. Currently GRE devices (in normal situations) indeed do not
> use header_ops (and use ARPHRD_IPGRE as dev->type). I think we should
> keep this behavior.
>
> To solve the problem you mentioned, of the same dev->type having
> different hard_header_len values, I think we should create a new
> dev->type (ARPHRD_IPGRE_SPECIAL) for GRE devices that use header_ops.
>
> Also, for collect_md, I think we should use ARPHRD_IPGRE. I see no
> reason to use ARPHRD_NONE.

What does ARPHRD_IPGRE define beyond ARPHRD_NONE? And same for
ARPHRD_TUNNEL variants. If they are indistinguishable, they are the
same and might as well have the same label.

> > > Also, for the second special situation, if there's no obvious reason
> > > to use header_ops, maybe we can consider removing header_ops for this
> > > situation.
> >
> > Unfortunately, ther
