Re: [PATCH] ipvs: Fix use-after-free in ip_vs_in

2019-05-17 Thread Julian Anastasov


Hello,

On Wed, 15 May 2019, YueHaibing wrote:

> BUG: KASAN: use-after-free in ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
> Read of size 4 at addr 8881e9b26e2c by task sshd/5603
> 
> CPU: 0 PID: 5603 Comm: sshd Not tainted 4.19.39+ #30
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> Call Trace:
>  dump_stack+0x71/0xab
>  print_address_description+0x6a/0x270
>  kasan_report+0x179/0x2c0
>  ? ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
>  ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
>  ? tcp_in_window+0xfe0/0xfe0 [nf_conntrack]
>  ? ip_vs_in_icmp+0xcc0/0xcc0 [ip_vs]
>  ? ipt_do_table+0x4f1/0xad0 [ip_tables]
>  ? ip_vs_out+0x126/0x8f0 [ip_vs]
>  ? common_interrupt+0xa/0xf
>  ip_vs_in+0xd8/0x170 [ip_vs]
>  ? ip_vs_in.part.29+0xd20/0xd20 [ip_vs]
>  ? nf_nat_ipv4_fn+0x21/0xc0 [nf_nat_ipv4]
>  ? nf_nat_packet+0x4b/0x90 [nf_nat]
>  ? nf_nat_ipv4_local_fn+0xf9/0x160 [nf_nat_ipv4]
>  ? ip_vs_remote_request4+0x50/0x50 [ip_vs]
>  nf_hook_slow+0x5f/0xe0
>  ? sock_write_iter+0x121/0x1c0
>  __ip_local_out+0x1d5/0x250
>  ? ip_finish_output+0x430/0x430
>  ? ip_forward_options+0x2d0/0x2d0
>  ? ip_copy_addrs+0x2d/0x40
>  ? __ip_queue_xmit+0x2ca/0x730
>  ip_local_out+0x19/0x60
>  __tcp_transmit_skb+0xba1/0x14f0
>  ? __tcp_select_window+0x330/0x330
>  ? pvclock_clocksource_read+0xd1/0x180
>  ? kvm_sched_clock_read+0xd/0x20
>  ? sched_clock+0x5/0x10
>  ? sched_clock_cpu+0x18/0x100
>  tcp_write_xmit+0x41f/0x1ed0
>  ? _copy_from_iter_full+0xca/0x340
>  __tcp_push_pending_frames+0x52/0x140
>  tcp_sendmsg_locked+0x787/0x1600
>  ? __wake_up_common_lock+0x80/0x130
>  ? tcp_sendpage+0x60/0x60
>  ? remove_wait_queue+0x84/0xb0
>  ? mutex_unlock+0x1d/0x40
>  ? n_tty_read+0x4f7/0xd20
>  ? check_stack_object+0x21/0x60
>  ? inet_sk_set_state+0xb0/0xb0
>  tcp_sendmsg+0x27/0x40
>  sock_sendmsg+0x6d/0x80
>  sock_write_iter+0x121/0x1c0
>  ? sock_sendmsg+0x80/0x80
>  ? ldsem_up_read+0x13/0x40
>  ? iov_iter_init+0x77/0xb0
>  __vfs_write+0x23e/0x370
>  ? kernel_read+0xa0/0xa0
>  ? do_vfs_ioctl+0x134/0x900
>  ? __set_current_blocked+0x7e/0x90
>  ? __audit_syscall_entry+0x18e/0x1f0
>  ? ktime_get_coarse_real_ts64+0x51/0x70
>  vfs_write+0xe7/0x230
>  ksys_write+0xa1/0x120
>  ? __ia32_sys_read+0x50/0x50
>  ? __audit_syscall_exit+0x3ce/0x450
>  do_syscall_64+0x73/0x200
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7ff6f6147c60
> Code: 73 01 c3 48 8b 0d 28 12 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 5d 73 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83
> RSP: 002b:7ffd772ead18 EFLAGS: 0246 ORIG_RAX: 0001
> RAX: ffda RBX: 0034 RCX: 7ff6f6147c60
> RDX: 0034 RSI: 55df30a31270 RDI: 0003
> RBP: 55df30a31270 R08:  R09: 
> R10: 7ffd772ead70 R11: 0246 R12: 7ffd772ead74
> R13: 7ffd772eae20 R14: 7ffd772eae24 R15: 55df2f12ddc0
> 
> Allocated by task 6052:
>  kasan_kmalloc+0xa0/0xd0
>  __kmalloc+0x10a/0x220
>  ops_init+0x97/0x190
>  register_pernet_operations+0x1ac/0x360
>  register_pernet_subsys+0x24/0x40
>  0xc0ea016d
>  do_one_initcall+0x8b/0x253
>  do_init_module+0xe3/0x335
>  load_module+0x2fc0/0x3890
>  __do_sys_finit_module+0x192/0x1c0
>  do_syscall_64+0x73/0x200
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Freed by task 6067:
>  __kasan_slab_free+0x130/0x180
>  kfree+0x90/0x1a0
>  ops_free_list.part.7+0xa6/0xc0
>  unregister_pernet_operations+0x18b/0x1f0
>  unregister_pernet_subsys+0x1d/0x30
>  ip_vs_cleanup+0x1d/0xd2f [ip_vs]
>  __x64_sys_delete_module+0x20c/0x300
>  do_syscall_64+0x73/0x200
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The buggy address belongs to the object at 8881e9b26600 which belongs to 
> the cache kmalloc-4096 of size 4096
> The buggy address is located 2092 bytes inside of 4096-byte region 
> [8881e9b26600, 8881e9b27600)
> The buggy address belongs to the page:
> page:ea0007a6c800 count:1 mapcount:0 mapping:888107c0e600 index:0x0 
> compound_mapcount: 0
> flags: 0x17c0008100(slab|head)
> raw: 0017c0008100 dead0100 dead0200 888107c0e600
> raw:  80070007 0001 
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  8881e9b26d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  8881e9b26d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >8881e9b26e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ^
>  8881e9b26e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  8881e9b26f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 
> While unregistering the ipvs module, ops_free_list calls
> __ip_vs_cleanup, and then nf_unregister_net_hooks is called to
> remove the nf hook entries. This needs an RCU grace period to finish,
> however net->ipvs is set to NULL immediately, which will
> trigger a NULL pointer dereference when a packet is hooked
> an

Re: [RFC bpf-next 0/7] busy poll support for AF_XDP sockets

2019-05-17 Thread Magnus Karlsson
On Fri, May 17, 2019 at 1:50 AM Samudrala, Sridhar wrote:
>
> On 5/16/2019 5:37 AM, Magnus Karlsson wrote:
> >
> > After a number of surprises and issues in the driver here are now the
> > first set of results. 64 byte packets at 40Gbit/s line rate. All
> > results in Mpps. Note that I just used my local system and kernel build
> > for these numbers so they are not performance tuned. Jesper would
> > likely get better results on his setup :-). Explanation follows after
> > the table.
> >
> >                            Applications
> > method    cores  irqs  txpush  rxdrop  l2fwd
> > ---------------------------------------------
> > r-t-c       2     y     35.9    11.2     8.6
> > poll        2     y     34.2     9.4     8.3
> > r-t-c       1     y     18.1     N/A     6.2
> > poll        1     y     14.6     8.4     5.9
> > busypoll    2     y     31.9    10.5     7.9
> > busypoll    1     y     21.5     8.7     6.2
> > busypoll    1     n     22.0    10.3     7.3
> >
> > r-t-c = Run-to-completion, the mode where we in Rx use no syscalls
> >         and only spin on the pointers in the ring.
> > poll = Use the regular syscall poll()
> > busypoll = Use the regular syscall poll() in busy-poll mode. The RFC I
> >            sent out.
> >
> > cores == 2 means that softirq/ksoftirqd is on a different core from
> > the application. 2 cores are consumed in total.
> > cores == 1 means that both softirq/ksoftirqd and the application run
> > on the same core. Only 1 core is used in total.
> >
> > irqs == 'y' is the normal case. irqs == 'n' means that I have created a
> >  new napi context with the AF_XDP queues inside that does not
> >  have any interrupts associated with it. No other traffic goes
> >  to this napi context.
> >
> > N/A = This combination does not make sense since the application will
> >       not yield due to run-to-completion without any syscalls
> >       whatsoever. It works, but it crawls in the 30 Kpps
> >       range. Creating huge rings would help, but I did not do that.
> >
> > The applications are the ones from the xdpsock sample application in
> > samples/bpf/.
> >
> > Some things I had to do to get these results:
> >
> > * The current buffer allocation scheme in i40e, where we continuously
> >   try to access the fill queue until we find some entries, is not
> >   effective if we are on a single core. Instead, we try once and call
> >   a function that sets a flag. This flag is then checked in the xsk
> >   poll code, and if it is set we schedule napi so that it can try to
> >   allocate some buffers from the fill ring again. Note that this flag
> >   has to propagate all the way to user space so that the application
> >   knows that it has to call poll(). I currently set a flag in the Rx
> >   ring to indicate that the application should call poll() to resume
> >   the driver. This is similar to what the io_uring in the storage
> >   subsystem does. It is not enough to return POLLERR from poll() as
> >   that will only work for the case when we are using poll(). But I do
> >   that as well.
> >
> > * Implemented Sridhar's suggestion on adding busy_loop_end callbacks
> >   that terminate the busy poll loop if the Rx queue is empty or the Tx
> >   queue is full.
> >
> > * There is a race in the setup code in i40e when it is used with
> >   busy-poll. The fact that busy-poll calls the napi_busy_loop code
> >   before interrupts have been registered and enabled seems to trigger
> >   some bug where nothing gets transmitted. This only happens for
> >   busy-poll. Poll and run-to-completion only enter the napi loop of
> >   i40e by interrupts and only then after interrupts have been enabled,
> >   which is the last thing that is done after setup. I have just worked
> >   around it by introducing a sleep(1) in the application for these
> >   experiments. Ugly, but should not impact the numbers, I believe.
> >
> > * The 1 core case is sensitive to the amount of work done reported
> >   from the driver. This was not correct in the XDP code of i40e and
> >   led to bad performance. Now it reports the correct values for
> >   Rx. Note that i40e does not honor the napi budget on Tx and sets
> >   that to 256, and these are not reported back to the napi
> >   library.
> >
> > Some observations:
> >
> > * Cannot really explain the drop in performance for txpush when going
> >   from 2 cores to 1. As stated before, the reporting of Tx work is not
> >   really propagated to the napi infrastructure. Tried reporting this
> >   in a correct manner (completely ignoring Rx for this experiment) but
> >   the results were the same. Will dig deeper into this to screen out
> >   any stupid mistakes.
> >
> > * With the fixes above, all my driver processing is in softirq for 1
> >   core. It ne

[PATCH] net: caif: fix the value of size argument of snprintf

2019-05-17 Thread Weikang shi
From: swkhack 

The function snprintf() writes at most size bytes (including the
terminating null byte), so the size argument does not need to be
reduced by one.

Signed-off-by: swkhack 
---
 net/caif/cfdbgl.c  | 2 +-
 net/caif/cfdgml.c  | 3 +--
 net/caif/cfutill.c | 2 +-
 net/caif/cfveil.c  | 2 +-
 net/caif/cfvidl.c  | 2 +-
 5 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/caif/cfdbgl.c b/net/caif/cfdbgl.c
index 7aae0b568..cce839bf4 100644
--- a/net/caif/cfdbgl.c
+++ b/net/caif/cfdbgl.c
@@ -26,7 +26,7 @@ struct cflayer *cfdbgl_create(u8 channel_id, struct dev_info *dev_info)
cfsrvl_init(dbg, channel_id, dev_info, false);
dbg->layer.receive = cfdbgl_receive;
dbg->layer.transmit = cfdbgl_transmit;
-   snprintf(dbg->layer.name, CAIF_LAYER_NAME_SZ - 1, "dbg%d", channel_id);
+   snprintf(dbg->layer.name, CAIF_LAYER_NAME_SZ, "dbg%d", channel_id);
return &dbg->layer;
 }
 
diff --git a/net/caif/cfdgml.c b/net/caif/cfdgml.c
index 3bdddb32d..58fdb99a3 100644
--- a/net/caif/cfdgml.c
+++ b/net/caif/cfdgml.c
@@ -33,8 +33,7 @@ struct cflayer *cfdgml_create(u8 channel_id, struct dev_info *dev_info)
cfsrvl_init(dgm, channel_id, dev_info, true);
dgm->layer.receive = cfdgml_receive;
dgm->layer.transmit = cfdgml_transmit;
-   snprintf(dgm->layer.name, CAIF_LAYER_NAME_SZ - 1, "dgm%d", channel_id);
-   dgm->layer.name[CAIF_LAYER_NAME_SZ - 1] = '\0';
+   snprintf(dgm->layer.name, CAIF_LAYER_NAME_SZ, "dgm%d", channel_id);
return &dgm->layer;
 }
 
diff --git a/net/caif/cfutill.c b/net/caif/cfutill.c
index 1728fa447..be7c43a92 100644
--- a/net/caif/cfutill.c
+++ b/net/caif/cfutill.c
@@ -33,7 +33,7 @@ struct cflayer *cfutill_create(u8 channel_id, struct dev_info *dev_info)
cfsrvl_init(util, channel_id, dev_info, true);
util->layer.receive = cfutill_receive;
util->layer.transmit = cfutill_transmit;
-   snprintf(util->layer.name, CAIF_LAYER_NAME_SZ - 1, "util1");
+   snprintf(util->layer.name, CAIF_LAYER_NAME_SZ, "util1");
return &util->layer;
 }
 
diff --git a/net/caif/cfveil.c b/net/caif/cfveil.c
index 262224581..35dd3a600 100644
--- a/net/caif/cfveil.c
+++ b/net/caif/cfveil.c
@@ -32,7 +32,7 @@ struct cflayer *cfvei_create(u8 channel_id, struct dev_info *dev_info)
cfsrvl_init(vei, channel_id, dev_info, true);
vei->layer.receive = cfvei_receive;
vei->layer.transmit = cfvei_transmit;
-   snprintf(vei->layer.name, CAIF_LAYER_NAME_SZ - 1, "vei%d", channel_id);
+   snprintf(vei->layer.name, CAIF_LAYER_NAME_SZ, "vei%d", channel_id);
return &vei->layer;
 }
 
diff --git a/net/caif/cfvidl.c b/net/caif/cfvidl.c
index b3b110e8a..73615e3b3 100644
--- a/net/caif/cfvidl.c
+++ b/net/caif/cfvidl.c
@@ -29,7 +29,7 @@ struct cflayer *cfvidl_create(u8 channel_id, struct dev_info *dev_info)
cfsrvl_init(vid, channel_id, dev_info, false);
vid->layer.receive = cfvidl_receive;
vid->layer.transmit = cfvidl_transmit;
-   snprintf(vid->layer.name, CAIF_LAYER_NAME_SZ - 1, "vid1");
+   snprintf(vid->layer.name, CAIF_LAYER_NAME_SZ, "vid1");
return &vid->layer;
 }
 
-- 
2.17.1
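
The size semantics this patch relies on can be checked in user space. The following is only a minimal sketch, not CAIF code: NAME_SZ and format_layer_name() are hypothetical stand-ins (NAME_SZ is deliberately small so that truncation is visible). It passes the full buffer size to snprintf() and relies on snprintf() to terminate the string itself.

```c
#include <stdio.h>

#define NAME_SZ 8 /* hypothetical; stands in for CAIF_LAYER_NAME_SZ */

/*
 * Format "dbg<id>" into a NAME_SZ-byte buffer.
 * snprintf() is given the full buffer size: it writes at most
 * NAME_SZ - 1 characters and then appends the terminating '\0'.
 * The return value is the length the untruncated string would have had.
 */
int format_layer_name(char name[NAME_SZ], int channel_id)
{
	return snprintf(name, NAME_SZ, "dbg%d", channel_id);
}
```

With a channel id of 1234567 the full string would be 10 characters long, so the buffer ends up holding the first 7 characters followed by the terminator. No separate assignment of the last byte is needed.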



Re: [PATCH] ipvs: Fix use-after-free in ip_vs_in

2019-05-17 Thread YueHaibing
On 2019/5/17 15:30, Julian Anastasov wrote:
> 
>   Hello,
> 
> On Wed, 15 May 2019, YueHaibing wrote:
> 
>> [...]

[PATCH] net/mlx5e: Add bonding device for indr block to offload the packet received from bonding device

2019-05-17 Thread wenxu
From: wenxu 

The mlx5e support the lag mode. When add mlx_p0 and mlx_p1 to bond0.
packet received from mlx_p0 or mlx_p1 and in the ingress tc flower
forward to vf0. The tc rule can't be offloaded for the non indr
rejistor block for the bonding device.

Signed-off-by: wenxu 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 91e24f1..134fa0b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -796,6 +796,7 @@ static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
 
if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
+   !netif_is_bond_master(netdev) &&
!is_vlan_dev(netdev))
return NOTIFY_OK;
 
-- 
1.8.3.1



Re: [PATCH] net/mlx5e: Add bonding device for indr block to offload the packet received from bonding device

2019-05-17 Thread Sergei Shtylyov

Hello!

On 17.05.2019 11:21, we...@ucloud.cn wrote:


From: wenxu 

The mlx5e support the lag mode. When add mlx_p0 and mlx_p1 to bond0.
packet received from mlx_p0 or mlx_p1 and in the ingress tc flower
forward to vf0. The tc rule can't be offloaded for the non indr
rejistor block for the bonding device.


   What is "non indr rejistor"?


Signed-off-by: wenxu 

[...]

MBR, Sergei


[PATCH v2] net/mlx5e: Add bonding device for indr block to offload the packet received from bonding device

2019-05-17 Thread wenxu
From: wenxu 

The mlx5e driver supports LAG mode. When mlx_p0 and mlx_p1 are added
to bond0, packets received from mlx_p0 or mlx_p1 are forwarded to vf0
by an ingress tc flower rule. The tc rule can't be offloaded because
there is no indr_register_block for the bonding device.

Signed-off-by: wenxu 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 91e24f1..134fa0b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -796,6 +796,7 @@ static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
 
if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
+   !netif_is_bond_master(netdev) &&
!is_vlan_dev(netdev))
return NOTIFY_OK;
 
-- 
1.8.3.1



5.1 `ip route get addr/cidr` regression

2019-05-17 Thread Jason A. Donenfeld
Hi,

I'm back now and catching up with a lot of things. A few people have
mentioned to me that wg-quick(8), a bash script that makes a bunch of
iproute2 invocations, appears to be broken on 5.1. I've distilled the
behavior change down to the following.

Behavior on 5.0:

+ ip link add wg0 type dummy
+ ip address add 192.168.50.2/24 dev wg0
+ ip link set mtu 1420 up dev wg0
+ ip route get 192.168.50.0/24
broadcast 192.168.50.0 dev wg0 src 192.168.50.2 uid 0
   cache 

Behavior on 5.1:

+ ip link add wg0 type dummy
+ ip address add 192.168.50.2/24 dev wg0
+ ip link set mtu 1420 up dev wg0
+ ip route get 192.168.50.0/24
RTNETLINK answers: Invalid argument

Upon investigating, I'm not sure that `ip route get` was ever suitable
for getting details on a particular route. So I'll adjust the
wg-quick(8) code accordingly. But FYI, this is unexpected userspace
breakage.

Jason

On Sat, Mar 23, 2019 at 2:00 PM  wrote:
>
> After upgrading iproute2 from 4.20 to 5.0 the following error occurs:
>
> $ ip route show table default
> Error: ipv4: FIB table does not exist.
> Dump terminated
>
> The command works for all tables other than the 'default' one. It seems
> related to this commit[1].
>
> I also saw an "Error: ipv4: MR table does not exist." message in logs related
> to this commit[2] but don't know the exact command to reproduce it. I've seen
> some fixups[3] for the mentioned commits and wonder if they were incomplete.
>
> I reproduced it on Linux 4.20.17 and 5.0.3.
>
> [1] https://github.com/torvalds/linux/commit/18a8021a7be3207686851208f91a2f105b2d4703#diff-04a14e4f51765994f87e7e5e7681d0e1R861
>
> [2] https://github.com/torvalds/linux/commit/cb167893f41e21e6bd283d78e53489289dc0592d#diff-9900db808ce5e5dd24a7341cd8ed1609R2545
>
> [3] https://github.com/torvalds/linux/commit/73155879b3c1ac3ace35208a54a3a160ec520bef


Re: [PATCH] can: gw: Fix error path of cgw_module_init

2019-05-17 Thread Marc Kleine-Budde
On 5/16/19 5:54 PM, YueHaibing wrote:
> This patch fixes the error path of cgw_module_init
> to avoid a possible crash if an error occurs.
> 
> Fixes: c1aabdf379bc ("can-gw: add netlink based CAN routing")
> Signed-off-by: YueHaibing 
> ---
>  net/can/gw.c | 46 +++---
>  1 file changed, 31 insertions(+), 15 deletions(-)
> 
> diff --git a/net/can/gw.c b/net/can/gw.c
> index 53859346..8b53ec7 100644
> --- a/net/can/gw.c
> +++ b/net/can/gw.c
> @@ -1046,32 +1046,48 @@ static __init int cgw_module_init(void)
>   pr_info("can: netlink gateway (rev " CAN_GW_VERSION ") max_hops=%d\n",
>   max_hops);
>  
> - register_pernet_subsys(&cangw_pernet_ops);
> + ret = register_pernet_subsys(&cangw_pernet_ops);
> + if (ret)
> + return ret;
> +
> + ret = -ENOMEM;
>   cgw_cache = kmem_cache_create("can_gw", sizeof(struct cgw_job),
> 0, 0, NULL);
> -
>   if (!cgw_cache)
> - return -ENOMEM;
> + goto out_cache_create;
>  
>   /* set notifier */
>   notifier.notifier_call = cgw_notifier;
> - register_netdevice_notifier(&notifier);
> + ret = register_netdevice_notifier(&notifier);
> + if (ret)
> + goto out_register_notifier;
>  
>   ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE,
>  NULL, cgw_dump_jobs, 0);
> - if (ret) {
> - unregister_netdevice_notifier(&notifier);
> - kmem_cache_destroy(cgw_cache);
> - return -ENOBUFS;
> - }
> -
> - /* Only the first call to rtnl_register_module can fail */
> - rtnl_register_module(THIS_MODULE, PF_CAN, RTM_NEWROUTE,
> -  cgw_create_job, NULL, 0);
> - rtnl_register_module(THIS_MODULE, PF_CAN, RTM_DELROUTE,
> -  cgw_remove_job, NULL, 0);
> + if (ret)
> + goto out_rtnl_register1;
> +
> + ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_NEWROUTE,
> +cgw_create_job, NULL, 0);
> + if (ret)
> + goto out_rtnl_register2;
> + ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_DELROUTE,
> +cgw_remove_job, NULL, 0);
> + if (ret)
> + goto out_rtnl_register2;
>  
>   return 0;
> +
> +out_rtnl_register2:
> + rtnl_unregister_all(PF_CAN);

Currently gw.c is the only user of rtnl_register_module(PF_CAN), but
PF_CAN is not specific to gw. Better change this to individual
rtnl_unregister(int protocol, int msgtype) calls.

> +out_rtnl_register1:
> + unregister_netdevice_notifier(&notifier);
> +out_register_notifier:
> + kmem_cache_destroy(cgw_cache);
> +out_cache_create:
> + unregister_pernet_subsys(&cangw_pernet_ops);
> +
> + return ret;
>  }
>  
>  static __exit void cgw_module_exit(void)
> 
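
The unwinding order discussed in this review can be modelled in user space. This is only a sketch under stated assumptions: fake_register(), fake_unregister(), the MSG_* values and the fail_at knob are hypothetical stand-ins, not the rtnl API. It shows each error label undoing exactly one earlier registration, in reverse order, instead of dropping everything registered for a whole protocol family.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three message types gw.c registers. */
enum { MSG_GET, MSG_NEW, MSG_DEL, MSG_COUNT };

bool registered[MSG_COUNT];
int fail_at = -1; /* test knob: which registration should fail */

/* Pretend registration; fails for the message type chosen in fail_at. */
int fake_register(int msg)
{
	if (msg == fail_at)
		return -1;
	registered[msg] = true;
	return 0;
}

/* Undo exactly one registration. */
void fake_unregister(int msg)
{
	registered[msg] = false;
}

/* Register in order; on failure, unwind only what was registered. */
int module_init_sketch(void)
{
	int ret;

	ret = fake_register(MSG_GET);
	if (ret)
		goto out;
	ret = fake_register(MSG_NEW);
	if (ret)
		goto out_unreg_get;
	ret = fake_register(MSG_DEL);
	if (ret)
		goto out_unreg_new;

	return 0;

out_unreg_new:
	fake_unregister(MSG_NEW);
out_unreg_get:
	fake_unregister(MSG_GET);
out:
	return ret;
}
```

If the third registration fails, the first two are removed one by one and the error code is returned; registrations made by other users of the same family would be left alone.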

Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |





[PATCH] lib: Correct comment of prandom_seed

2019-05-17 Thread Philippe Mazenauer
Variable 'entropy' was wrongly documented as 'seed', changed comment to
reflect actual variable name.

../lib/random32.c:179: warning: Function parameter or member 'entropy' not described in 'prandom_seed'
../lib/random32.c:179: warning: Excess function parameter 'seed' description in 'prandom_seed'

Signed-off-by: Philippe Mazenauer 
---
 lib/random32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/random32.c b/lib/random32.c
index 4aaa76404d56..763b920a6206 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -171,9 +171,9 @@ static void prandom_seed_early(struct rnd_state *state, u32 seed,
 
 /**
  * prandom_seed - add entropy to pseudo random number generator
- * @seed: seed value
+ * @entropy: entropy value
  *
- * Add some additional seeding to the prandom pool.
+ * Add some additional entropy to the prandom pool.
  */
 void prandom_seed(u32 entropy)
 {
-- 
2.17.1



Re: [PATCH] lib: Correct comment of prandom_seed

2019-05-17 Thread Lee Jones
On Fri, 17 May 2019, Philippe Mazenauer wrote:

> Variable 'entropy' was wrongly documented as 'seed', changed comment to
> reflect actual variable name.
> 
> ../lib/random32.c:179: warning: Function parameter or member 'entropy' not described in 'prandom_seed'
> ../lib/random32.c:179: warning: Excess function parameter 'seed' description in 'prandom_seed'
> 
> Signed-off-by: Philippe Mazenauer 
> ---
>  lib/random32.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Looks reasonable:

Acked-by: Lee Jones 

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs


Re: [PATCH v2] RDMA: Directly cast the sockaddr union to sockaddr

2019-05-17 Thread Simon Horman
On Thu, May 16, 2019 at 12:21:48PM -0300, Jason Gunthorpe wrote:
> On Thu, May 16, 2019 at 02:44:28PM +0200, Simon Horman wrote:
> > On Mon, May 13, 2019 at 09:55:21PM -0300, Jason Gunthorpe wrote:
> > > gcc 9 now does allocation size tracking and thinks that passing the member
> > > of a union and then accessing beyond that member's bounds is an overflow.
> > > 
> > > Instead of using the union member, use the entire union with a cast to
> > > get to the sockaddr. gcc will now know that the memory extends the full
> > > size of the union.
> > > 
> > > Signed-off-by: Jason Gunthorpe 
> > >  drivers/infiniband/core/addr.c   | 16 
> > >  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  5 ++---
> > >  drivers/infiniband/hw/ocrdma/ocrdma_hw.c |  5 ++---
> > >  3 files changed, 12 insertions(+), 14 deletions(-)
> > > 
> > > I missed the ocrdma files in the v1
> > > 
> > > We can revisit what to do with that repetitive union after the merge
> > > window, but this simple patch will eliminate the warnings for now.
> > > 
> > > Linus, I'll send this as a PR tomorrow - there is also a bug fix for
> > > the rdma-netlink changes posted that should go too.
> > 
> > <2c>
> > I would be very happy to see this revisited in such a way
> > that some use is made of the C type system (instead of casts).
> > 
> 
> Well, I was thinking of swapping the union to sockaddr_storage ..
> 
> Do you propose to add a union to the kernel's sockaddr storage?

I understand you have been down that rabbit hole before but,
yes, in an ideal world that would be my preference.
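
The approach described in the quoted commit message can be illustrated in user space. This is a sketch only, not the RDMA code: the union name addr_any and the helpers are hypothetical. The callee receives a pointer derived from the whole union rather than from its smallest member, so the compiler can see that the storage behind the pointer is as large as the union.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical union, similar in shape to the one discussed above. */
union addr_any {
	struct sockaddr     sa;
	struct sockaddr_in  sin;
	struct sockaddr_in6 sin6;
};

/* Read the port from a generic sockaddr, dispatching on the family. */
unsigned short addr_port(const struct sockaddr *sa)
{
	if (sa->sa_family == AF_INET6)
		return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
	if (sa->sa_family == AF_INET)
		return ntohs(((const struct sockaddr_in *)sa)->sin_port);
	return 0;
}

/* Fill in an IPv6 address and pass the entire union, cast to sockaddr. */
unsigned short demo_port(void)
{
	union addr_any a;

	memset(&a, 0, sizeof(a));
	a.sin6.sin6_family = AF_INET6;
	a.sin6.sin6_port = htons(4791);

	/* Cast the union itself, not &a.sa, so the full size is visible. */
	return addr_port((const struct sockaddr *)&a);
}
```

Passing &a.sa instead would hand over a pointer to a member that is smaller than struct sockaddr_in6, which is the pattern the size tracking mentioned above objects to.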


Re: [PATCH] ARM: dts: Introduce the NXP LS1021A-TSN board

2019-05-17 Thread Vladimir Oltean
On Fri, 17 May 2019 at 04:05, Shawn Guo  wrote:
>
> On Mon, May 06, 2019 at 04:08:00AM +0300, Vladimir Oltean wrote:
> > The LS1021A-TSN is a development board built by VVDN/Argonboards in
> > partnership with NXP.
> >
> > It features the LS1021A SoC and the first-generation SJA1105T Ethernet
> > switch for prototyping implementations of a subset of IEEE 802.1 TSN
> > standards.
> >
> > It has two regular Ethernet ports and four switched, TSN-capable ports.
> >
> > It also features:
> > - One Arduino header
> > - One expansion header
> > - Two USB 3.0 ports
> > - One mini PCIe slot
> > - One SATA interface
> > - Accelerometer, gyroscope, temperature sensors
> >
> > Signed-off-by: Vladimir Oltean 
> > ---
> >  arch/arm/boot/dts/Makefile|   3 +-
> >  arch/arm/boot/dts/ls1021a-tsn.dts | 238 ++
> >  2 files changed, 240 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/arm/boot/dts/ls1021a-tsn.dts
> >
> > diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
> > index f4f5aeaf3298..529f0150f6b4 100644
> > --- a/arch/arm/boot/dts/Makefile
> > +++ b/arch/arm/boot/dts/Makefile
> > @@ -593,7 +593,8 @@ dtb-$(CONFIG_SOC_IMX7ULP) += \
> >  dtb-$(CONFIG_SOC_LS1021A) += \
> >   ls1021a-moxa-uc-8410a.dtb \
> >   ls1021a-qds.dtb \
> > - ls1021a-twr.dtb
> > + ls1021a-twr.dtb \
> > + ls1021a-tsn.dtb
>
> Please keep the list alphabetically sorted.  That said, ls1021a-tsn.dtb
> should go prior to ls1021a-twr.dtb.
>
> >  dtb-$(CONFIG_SOC_VF610) += \
> >   vf500-colibri-eval-v3.dtb \
> >   vf610-bk4.dtb \
> > diff --git a/arch/arm/boot/dts/ls1021a-tsn.dts b/arch/arm/boot/dts/ls1021a-tsn.dts
> > new file mode 100644
> > index 000000000000..5269486699bd
> > --- /dev/null
> > +++ b/arch/arm/boot/dts/ls1021a-tsn.dts
> > @@ -0,0 +1,238 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright 2016-2018 NXP Semiconductors
> > + * Copyright 2019 Vladimir Oltean 
> > + */
> > +
> > +/dts-v1/;
> > +#include "ls1021a.dtsi"
> > +
> > +/ {
> > + model = "NXP LS1021A-TSN Board";
> > +
> > + sys_mclk: clock-mclk {
> > + compatible = "fixed-clock";
> > + #clock-cells = <0>;
> > + clock-frequency = <24576000>;
> > + };
> > +
> > + regulators {
> > + compatible = "simple-bus";
> > + #address-cells = <1>;
> > + #size-cells = <0>;
>
> This is the old style of organizing fixed regulators, which device tree
> maintainers have complained about.  Drop this container node and put
> the regulator nodes directly under root, using the naming scheme below.
>
> reg_xxx: regulator-xxx {
> ...
> };
>
> And thus, 'reg' property in regulator node should be dropped.
>
> > +
> > + reg_3p3v: regulator@0 {
> > + compatible = "regulator-fixed";
> > + reg = <0>;
> > + regulator-name = "3P3V";
> > + regulator-min-microvolt = <3300000>;
> > + regulator-max-microvolt = <3300000>;
> > + regulator-always-on;
> > + };
> > + reg_2p5v: regulator@1 {
> > + compatible = "regulator-fixed";
> > + reg = <1>;
> > + regulator-name = "2P5V";
> > + regulator-min-microvolt = <2500000>;
> > + regulator-max-microvolt = <2500000>;
> > + regulator-always-on;
> > + };
> > + };
> > +};
> > +
> > +&enet0 {
> > + tbi-handle = <&tbi0>;
> > + phy-handle = <&sgmii_phy2>;
> > + phy-mode = "sgmii";
> > + status = "ok";
>
> For sake of consistency, we prefer to use "okay".
>
> > +};
> > +
> > +&enet1 {
> > + tbi-handle = <&tbi1>;
> > + phy-handle = <&sgmii_phy1>;
> > + phy-mode = "sgmii";
> > + status = "ok";
> > +};
> > +
> > +/* RGMII delays added via PCB traces */
> > +&enet2 {
> > + phy-mode = "rgmii";
> > + status = "ok";
>
> Please have a newline between property list and child node.
>
> > + fixed-link {
> > + speed = <1000>;
> > + full-duplex;
> > + };
> > +};
> > +
> > +&dspi0 {
>
> Please sort these labeled nodes alphabetically.
>
> > + bus-num = <0>;
> > + status = "ok";
> > +
> > + /* ADG704BRMZ 1:4 mux/demux */
> > + tsn_switch: sja1105@1 {
>
> Use a generic node name, while label name can be specific.
>
> > + reg = <0x1>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + compatible = "nxp,sja1105t";
>
> Undocumented compatible?
>
> > + /* 12 MHz */
> > + spi-max-frequency = <12000000>;
> > + /* Sample data on trailing clock edge */
> > + spi-cpha;
> > + fsl,spi-cs-sck-delay = <1000>;
> > + fsl,spi-sck-cs-delay = <1000>;
>
> Have a newline.
>
> > + ports {
> > + 

Re: [PATCH bpf] selftests/bpf: fix bpf_get_current_task

2019-05-17 Thread Daniel Borkmann
On 05/17/2019 06:34 AM, Alexei Starovoitov wrote:
> Fix bpf_get_current_task() declaration.
> 
> Signed-off-by: Alexei Starovoitov 

Applied, thanks!


Re: [PATCH bpf] bpftool: fix BTF raw dump of FWD's fwd_kind

2019-05-17 Thread Daniel Borkmann
On 05/17/2019 08:21 AM, Andrii Nakryiko wrote:
> kflag bit determines whether FWD is for struct or union. Use that bit.
> 
> Fixes: c93cc69004df ("bpftool: add ability to dump BTF types")
> Signed-off-by: Andrii Nakryiko 

Applied, thanks!


RE: Kernel UDP behavior with missing destinations

2019-05-17 Thread David Laight
From: Willem de Bruijn
> Sent: 17 May 2019 04:23
> On Thu, May 16, 2019 at 8:27 PM Adam Urban  wrote:
> >
> > And replying to your earlier comment about TTL, yes I think a TTL on
> > arp_queues would be hugely helpful.
> >
> > In any environment where you are streaming time-sensitive UDP traffic,
> > you really want the kernel to be tuned to immediately drop the
> > outgoing packet if the destination isn't yet known/in the arp table
> > already...

I suspect we may suffer from the same problems when sending out a lot
of RTP (think of sending 1000s of UDP messages to different addresses
every 20ms).
For various reasons the sends are done from a single raw socket (rather
than 'connected' UDP sockets).

> For packets that need to be sent immediately or not at all, you
> probably do not want a TTL, but simply for the send call to fail
> immediately with EAGAIN instead of queuing the packet for ARP
> resolution at all. Which is approximated with unres_qlen 0.
> 
> The relation between unres_qlen_bytes, arp_queue and SO_SNDBUF is
> pretty straightforward in principle. Packets can be queued on the arp
> queue until the byte limit is reached. Any packets on this queue still
> have their memory counted towards their socket send budget. If queuing
> a packet would exceed the threshold, older packets are freed and
> dropped as needed. Calculating the exact numbers is not as
> straightforward, as, for instance, skb->truesize is a kernel
> implementation detail.

But 'fiddling' with the arp queue will affect all traffic.
So you'd need it to be a per-socket option so that it is a property
of the message by the time it reaches the arp code.

> The simple solution is just to overprovision the socket SO_SNDBUF. If
> there are few sockets in the system that perform this role, that seems
> perfectly fine.

That depends on how often you are sending messages compared to the
arp timeout. If you are sending 50 messages a second to each of 1000
destinations the over provisioning of SO_SNDBUF would have to be extreme.

FWIW we do sometimes see sendmsg() taking much longer than expected,
but haven't yet tracked down why.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: Kernel UDP behavior with missing destinations

2019-05-17 Thread Willem de Bruijn
On Fri, May 17, 2019 at 8:57 AM David Laight  wrote:
>
> From: Willem de Bruijn
> > Sent: 17 May 2019 04:23
> > On Thu, May 16, 2019 at 8:27 PM Adam Urban  wrote:
> > >
> > > And replying to your earlier comment about TTL, yes I think a TTL on
> > > arp_queues would be hugely helpful.
> > >
> > > In any environment where you are streaming time-sensitive UDP traffic,
> > > you really want the kernel to be tuned to immediately drop the
> > > outgoing packet if the destination isn't yet known/in the arp table
> > > already...
>
> I suspect we may suffer from the same problems when sending out a lot
> of RTP (think of sending 1000s of UDP messages to different addresses
> every 20ms).
> For various reasons the sends are done from a single raw socket (rather
> than 'connected' UDP sockets).
>
> > For packets that need to be sent immediately or not at all, you
> > probably do not want a TTL, but simply for the send call to fail
> > immediately with EAGAIN instead of queuing the packet for ARP
> > resolution at all. Which is approximated with unres_qlen 0.
> >
> > > The relation between unres_qlen_bytes, arp_queue and SO_SNDBUF is
> > > pretty straightforward in principle. Packets can be queued on the arp
> > > queue until the byte limit is reached. Any packets on this queue still
> > > have their memory counted towards their socket send budget. If queuing
> > > a packet would exceed the threshold, older packets are freed and
> > > dropped as needed. Calculating the exact numbers is not as
> > > straightforward, as, for instance, skb->truesize is a kernel
> > > implementation detail.
>
> But 'fiddling' with the arp queue will affect all traffic.
> So you'd need it to be a per-socket option so that it is a property
> of the message by the time it reaches the arp code.

A per-socket or even per-datagram do-not-queue signal would be
interesting, where any queuing would instead result in a send failure
(though this feedback signal does not work for secondary qdiscs).

The recent SCM_TXTIME cmsg has a deadline mode that might implement
this. In which case we would only have to check for it in the neighbor
layer.

>
> > The simple solution is just to overprovision the socket SO_SNDBUF. If
> > there are few sockets in the system that perform this role, that seems
> > perfectly fine.
>
> That depends on how often you are sending messages compared to the
> arp timeout. If you are sending 50 messages a second to each of 1000
> destinations the over provisioning of SO_SNDBUF would have to be extreme.
>
> FWIW we do sometimes see sendmsg() taking much longer than expected,
> but haven't yet tracked down why.

I've observed this problem with health checks under particular ARP
settings as well.


Re: [PATCH] ARM: dts: Introduce the NXP LS1021A-TSN board

2019-05-17 Thread Shawn Guo
On Fri, May 17, 2019 at 03:05:59PM +0300, Vladimir Oltean wrote:
> Hi Shawn,
> 
> Thanks for the feedback!
> Do you want a v2 now (will you merge it for 5.2) or should I send it
> after the merge window closes?

It's 5.3 material.

Shawn

> The "nxp,sja1105t" compatible is not undocumented but belongs to
> drivers/net/dsa/sja1105/ which was recently merged into mainline via
> the netdev tree (hence it's not in your tree yet).
> The situation with "ad7924" is more funny. The compatible is indeed
> undocumented but belongs to drivers/iio/adc/ad7923.c. I don't know why
> it lacks an entry in Documentation/devicetree/bindings/iio/adc/.
> However I mistook the chip and it's not an Analog Devices AD7924 ADC
> with a SPI interface, but a TI ADS7924 ADC with an I2C interface. I
> can remove it from v2 since it does not have a Linux driver as far as
> I can tell.
> 
> -Vladimir


[PATCH iproute2-next] treewide: refactor help messages

2019-05-17 Thread Matteo Croce
Every tool in the iproute2 package has one or more functions to show
a help message to the user. Some of these functions print the help
line by line with a series of printf calls, e.g. ip/xfrm_state.c does
60 fprintf calls.
If we group all the calls into a single one and just concatenate strings,
we save a lot of libc calls and thus object size. The size difference
of the compiled binaries calculated with bloat-o-meter is:

ip/ip:
add/remove: 0/0 grow/shrink: 5/15 up/down: 103/-4796 (-4693)
Total: Before=672591, After=667898, chg -0.70%
ip/rtmon:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-54 (-54)
Total: Before=48879, After=48825, chg -0.11%
tc/tc:
add/remove: 0/2 grow/shrink: 31/10 up/down: 882/-6133 (-5251)
Total: Before=351912, After=346661, chg -1.49%
bridge/bridge:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-459 (-459)
Total: Before=70502, After=70043, chg -0.65%
misc/lnstat:
add/remove: 0/1 grow/shrink: 1/0 up/down: 48/-486 (-438)
Total: Before=9960, After=9522, chg -4.40%
tipc/tipc:
add/remove: 0/0 grow/shrink: 1/1 up/down: 18/-62 (-44)
Total: Before=79182, After=79138, chg -0.06%

While at it, indent some strings which were starting at column 0,
and use tabs where possible, to have a consistent style across helps.

Signed-off-by: Matteo Croce 
---
 bridge/link.c|  35 +-
 bridge/mdb.c |   5 +-
 ip/ip.c  |  26 
 ip/ip6tunnel.c   |  35 +-
 ip/ipaddress.c   |  53 +++
 ip/ipaddrlabel.c |   5 +-
 ip/ipila.c   |  10 +--
 ip/iplink.c  | 107 +++---
 ip/iplink_bridge.c   |  66 +--
 ip/iplink_bridge_slave.c |  40 ++--
 ip/iplink_geneve.c   |  32 -
 ip/iplink_hsr.c  |  24 +++
 ip/iplink_ipoib.c|   4 +-
 ip/iplink_vlan.c |  16 ++---
 ip/iplink_vxlan.c|  54 +++
 ip/ipmaddr.c |   5 +-
 ip/ipmonitor.c   |   9 +--
 ip/ipmroute.c|  10 +--
 ip/ipneigh.c |  17 +++--
 ip/ipnetns.c |  23 +++
 ip/ipntable.c|  12 ++--
 ip/ipseg6.c  |  13 ++--
 ip/iptunnel.c|  25 +++
 ip/iptuntap.c|  13 ++--
 ip/ipvrf.c   |   9 +--
 ip/link_gre.c|  63 --
 ip/link_gre6.c   |  73 ++---
 ip/link_ip6tnl.c |  67 ---
 ip/link_iptnl.c  |  63 --
 ip/link_vti.c|  24 +++
 ip/link_vti6.c   |  24 +++
 ip/link_xfrm.c   |   7 +-
 ip/rtmon.c   |   9 +--
 ip/tcp_metrics.c |   9 +--
 ip/xfrm_monitor.c|   5 +-
 ip/xfrm_policy.c |  99 +++-
 ip/xfrm_state.c  | 138 ++-
 misc/lnstat.c|  47 ++---
 misc/nstat.c |  24 +++
 tc/e_bpf.c   |  28 
 tc/f_basic.c |  16 +++--
 tc/f_bpf.c   |  62 +-
 tc/f_flow.c  |  30 -
 tc/f_flower.c|  90 -
 tc/f_fw.c|  19 ++
 tc/f_matchall.c  |  16 +++--
 tc/f_route.c |  12 ++--
 tc/f_rsvp.c  |  19 +++---
 tc/f_tcindex.c   |   7 +-
 tc/m_action.c|  26 
 tc/m_bpf.c   |  54 +++
 tc/m_connmark.c  |   5 +-
 tc/m_estimator.c |   9 +--
 tc/m_gact.c  |   3 +-
 tc/m_ife.c   |   9 ++-
 tc/m_pedit.c |   2 +-
 tc/m_police.c|  18 ++---
 tc/m_sample.c|  19 +++---
 tc/m_simple.c|   5 +-
 tc/m_tunnel_key.c|   4 +-
 tc/q_atm.c   |   5 +-
 tc/q_cake.c  |  30 -
 tc/q_cbq.c   |  20 +++---
 tc/q_cbs.c   |   6 +-
 tc/q_choke.c |   5 +-
 tc/q_codel.c |   7 +-
 tc/q_etf.c   |  11 ++--
 tc/q_fq.c|  15 +++--
 tc/q_fq_codel.c  |  11 ++--
 tc/q_gred.c  |  11 ++--
 tc/q_hhf.c   |  13 ++--
 tc/q_mqprio.c|  13 ++--
 tc/q_netem.c |  32 -
 tc/q_pie.c   |   7 +-
 tc/q_red.c   |   7 +-
 tc/q_sfq.c   |  13 ++--
 tc/q_taprio.c|  15 +++--
 tc/q_tbf.c   |   7 +-
 tc/tc.c  |  12 ++--
 tc/tc_class.c|  17 ++---
 tc/tc_exec.c |   9 +--
 tc/tc_qdisc.c|  25 +++
 tipc/bearer.c|  44 ++---
 83 files changed, 1066 insertions(+), 1022 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index 04cfc144..074edf00 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -254,23 +254,24 @@ int print_linkinfo(struc

Re: RFC: Fixing SK_REUSEPORT from sk_lookup_* helpers

2019-05-17 Thread Lorenz Bauer
On Thu, 16 May 2019 at 21:33, Alexei Starovoitov
 wrote:
>
> On Thu, May 16, 2019 at 09:41:34AM +0100, Lorenz Bauer wrote:
> > On Wed, 15 May 2019 at 18:16, Joe Stringer  wrote:
> > >
> > > On Wed, May 15, 2019 at 8:11 AM Lorenz Bauer  wrote:
> > > >
> > > > In the BPF-based TPROXY session with Joe Stringer [1], I mentioned
> > > > that the sk_lookup_* helpers currently return inconsistent results if
> > > > SK_REUSEPORT programs are in play.
> > > >
> > > > > SK_REUSEPORT programs are a hook point in inet_lookup. They get access
> > > > > to the full packet that triggered the look up. To support this,
> > > > > inet_lookup gained a new skb argument to provide such context. If skb
> > > > > is NULL, the SK_REUSEPORT program is skipped and instead the socket is
> > > > > selected by its hash.
> > > >
> > > > The first problem is that not all callers to inet_lookup from BPF have
> > > > an skb, e.g. XDP. This means that a look up from XDP gives an
> > > > incorrect result. For now that is not a huge problem. However, once we
> > > > get sk_assign as proposed by Joe, we can end up circumventing
> > > > SK_REUSEPORT.
> > >
> > > To clarify a bit, the reason this is a problem is that a
> > > straightforward implementation may just consider passing the skb
> > > context into the sk_lookup_*() and through to the inet_lookup() so
> > > that it would run the SK_REUSEPORT BPF program for socket selection on
> > > the skb when the packet-path BPF program performs the socket lookup.
> > > However, as this paragraph describes, the skb context is not always
> > > available.
> > >
> > > > At the conference, someone suggested using a similar approach to the
> > > > work done on the flow dissector by Stanislav: create a dedicated
> > > > context sk_reuseport which can either take an skb or a plain pointer.
> > > > Patch up load_bytes to deal with both. Pass the context to
> > > > inet_lookup.
> > > >
> > > > This is when we hit the second problem: using the skb or XDP context
> > > > directly is incorrect, because it assumes that the relevant protocol
> > > > headers are at the start of the buffer. In our use case, the correct
> > > > headers are at an offset since we're inspecting encapsulated packets.
> > > >
> > > > The best solution I've come up with is to steal 17 bits from the flags
> > > > argument to sk_lookup_*, 1 bit for BPF_F_HEADERS_AT_OFFSET, 16bit for
> > > > the offset itself.
> > >
> > > FYI there's also the upper 32 bits of the netns_id parameter, another
> > > option would be to steal 16 bits from there.
> >
> > > Or len, which is only 16 bits realistically. The offset doesn't really fit
> > > into either of them very well, using flags seemed the cleanest to me.
> > Is there some best practice around this?
> >
> > >
> > > > Thoughts?
> > >
> > > Internally with skbs, we use `skb_pull()` to manage header offsets,
> > > could we do something similar with `bpf_xdp_adjust_head()` prior to
> > > the call to `bpf_sk_lookup_*()`?
> >
> > That would only work if it retained the contents of the skipped
> > buffer, and if there was a way to undo the adjustment later. We're
> > doing the sk_lookup to decide whether to accept or forward the packet,
> > so at the point of the call we might still need that data. Is that
> > feasible with skb / XDP ctx?
>
> While discussing the solution for reuseport I propose to use
> progs/test_select_reuseport_kern.c as an example of realistic program.
> It reads tcp/udp header directly via ctx->data or via bpf_skb_load_bytes()
> including payload after the header.
> It also uses bpf_skb_load_bytes_relative() to fetch IP.
> I think if we're fixing the sk_lookup from XDP the above program
> would need to work.

Agreed.

> And I think we can make it work by adding new requirement that
> 'struct bpf_sock_tuple *' argument to bpf_sk_lookup_* must be
> a pointer to the packet and not a pointer to bpf program stack.

This would break existing users, no? The sk_assign use case Joe Stringer
is working on would also break, because it's impossible to look up a tuple
that hasn't come from the network.

It occurs to me that it's impossible to reconcile this use case with
SK_REUSEPORT in general. It would be great if we could return an
error in such a case.

> Then helper can construct a fake skb and assign
> fake_skb->data = &bpf_sock_tuple_arg.sport

That isn't valid if the packet contains IP options or extension headers, because
the offset of sport is variable.

> It can check that struct bpf_sock_tuple * pointer is within 100-ish bytes
> from xdp->data and within xdp->data_end

Why the 100-byte limitation?

> This way the reuseport program's assumption that ctx->data points to tcp/udp
> will be preserved and it can access it all including payload.

How about the following:

sk_lookup(ctx, &saddr, len, netns, BPF_F_IPV4 |
BPF_F_OFFSET(offsetof(sport)))

SK_REUSEPORT can then access from saddr+offsetof(sport) to saddr+len.
The helper uses
offsetof(sport) to retrieve the tuple.

- Works with sta

Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread Michal Kubecek
On Fri, May 17, 2019 at 12:22:41PM +0200, Jason A. Donenfeld wrote:
> Hi,
> 
> I'm back now and catching up with a lot of things. A few people have
> mentioned to me that wg-quick(8), a bash script that makes a bunch of
> iproute2 invocations, appears to be broken on 5.1. I've distilled the
> behavior change down to the following.
> 
> Behavior on 5.0:
> 
> + ip link add wg0 type dummy
> + ip address add 192.168.50.2/24 dev wg0
> + ip link set mtu 1420 up dev wg0
> + ip route get 192.168.50.0/24
> broadcast 192.168.50.0 dev wg0 src 192.168.50.2 uid 0
>cache 
> 
> Behavior on 5.1:
> 
> + ip link add wg0 type dummy
> + ip address add 192.168.50.2/24 dev wg0
> + ip link set mtu 1420 up dev wg0
> + ip route get 192.168.50.0/24
> RTNETLINK answers: Invalid argument

With recent kernel and iproute2 5.1, I get

  alaris:~ # ip route get 172.17.1.2/24
  Error: ipv4: Invalid values in header for route get request.

This message comes from kernel commit a00302b60777 ("net: ipv4: route:
perform strict checks also for doit handlers") which only considers the
range valid if the prefix is /32 (a single address).

But these checks are only performed when userspace requests strict
validation which iproute2 does since (iproute2) commit aea41afcfd6d ("ip
bridge: Set NETLINK_GET_STRICT_CHK on socket"). So I would say the
change is a result of the combination of kernel (5.1) commit
a00302b60777 and iproute2 (5.0) commit aea41afcfd6d.

> Upon investigating, I'm not sure that `ip route get` was ever suitable
> for getting details on a particular route. So I'll adjust the
> wg-quick(8) code accordingly. But FYI, this is unexpected userspace
> breakage.

AFAIK the purpose of 'ip route get' always was to let the user check
the result of a route lookup, i.e. "what route would be used if I sent
a packet to an address". To be honest, I would have to check exactly how
"ip route get /" was implemented before.

Michal Kubecek


[PATCH v2] ipvs: Fix use-after-free in ip_vs_in

2019-05-17 Thread YueHaibing
BUG: KASAN: use-after-free in ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
Read of size 4 at addr 8881e9b26e2c by task sshd/5603

CPU: 0 PID: 5603 Comm: sshd Not tainted 4.19.39+ #30
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
 dump_stack+0x71/0xab
 print_address_description+0x6a/0x270
 kasan_report+0x179/0x2c0
 ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
 ip_vs_in+0xd8/0x170 [ip_vs]
 nf_hook_slow+0x5f/0xe0
 __ip_local_out+0x1d5/0x250
 ip_local_out+0x19/0x60
 __tcp_transmit_skb+0xba1/0x14f0
 tcp_write_xmit+0x41f/0x1ed0
 ? _copy_from_iter_full+0xca/0x340
 __tcp_push_pending_frames+0x52/0x140
 tcp_sendmsg_locked+0x787/0x1600
 ? tcp_sendpage+0x60/0x60
 ? inet_sk_set_state+0xb0/0xb0
 tcp_sendmsg+0x27/0x40
 sock_sendmsg+0x6d/0x80
 sock_write_iter+0x121/0x1c0
 ? sock_sendmsg+0x80/0x80
 __vfs_write+0x23e/0x370
 vfs_write+0xe7/0x230
 ksys_write+0xa1/0x120
 ? __ia32_sys_read+0x50/0x50
 ? __audit_syscall_exit+0x3ce/0x450
 do_syscall_64+0x73/0x200
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff6f6147c60
Code: 73 01 c3 48 8b 0d 28 12 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 
00 00 83 3d 5d 73 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 
c3 48 83
RSP: 002b:7ffd772ead18 EFLAGS: 0246 ORIG_RAX: 0001
RAX: ffda RBX: 0034 RCX: 7ff6f6147c60
RDX: 0034 RSI: 55df30a31270 RDI: 0003
RBP: 55df30a31270 R08:  R09: 
R10: 7ffd772ead70 R11: 0246 R12: 7ffd772ead74
R13: 7ffd772eae20 R14: 7ffd772eae24 R15: 55df2f12ddc0

Allocated by task 6052:
 kasan_kmalloc+0xa0/0xd0
 __kmalloc+0x10a/0x220
 ops_init+0x97/0x190
 register_pernet_operations+0x1ac/0x360
 register_pernet_subsys+0x24/0x40
 0xc0ea016d
 do_one_initcall+0x8b/0x253
 do_init_module+0xe3/0x335
 load_module+0x2fc0/0x3890
 __do_sys_finit_module+0x192/0x1c0
 do_syscall_64+0x73/0x200
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 6067:
 __kasan_slab_free+0x130/0x180
 kfree+0x90/0x1a0
 ops_free_list.part.7+0xa6/0xc0
 unregister_pernet_operations+0x18b/0x1f0
 unregister_pernet_subsys+0x1d/0x30
 ip_vs_cleanup+0x1d/0xd2f [ip_vs]
 __x64_sys_delete_module+0x20c/0x300
 do_syscall_64+0x73/0x200
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at 8881e9b26600 which belongs to 
the cache kmalloc-4096 of size 4096
The buggy address is located 2092 bytes inside of 4096-byte region 
[8881e9b26600, 8881e9b27600)
The buggy address belongs to the page:
page:ea0007a6c800 count:1 mapcount:0 mapping:888107c0e600 index:0x0 
compound_mapcount: 0
flags: 0x17c0008100(slab|head)
raw: 0017c0008100 dead0100 dead0200 888107c0e600
raw:  80070007 0001 
page dumped because: kasan: bad access detected

While the ipvs module is being unregistered, ops_free_list calls
__ip_vs_cleanup, which calls nf_unregister_net_hooks to remove the
nf hook entries. That removal needs an RCU grace period to finish,
but net->ipvs is set to NULL immediately, so a packet that is hooked
and handled by ip_vs_in in the meantime dereferences a NULL
net->ipvs.

In the other scenario, ops_free_list calls ops_free to free the
net_generic data right after __ip_vs_cleanup has finished, and a
subsequent call to ip_vs_in then triggers the use-after-free.

This patch moves nf_unregister_net_hooks from __ip_vs_cleanup()
to __ip_vs_dev_cleanup(), where the rcu_barrier() called by
unregister_pernet_device -> unregister_pernet_operations
provides the needed grace period.

Reported-by: Hulk Robot 
Fixes: efe41606184e ("ipvs: convert to use pernet nf_hook api")
Suggested-by: Julian Anastasov 
Signed-off-by: YueHaibing 
---
v2: fix by moving nf_unregister_net_hooks from __ip_vs_cleanup() to 
__ip_vs_dev_cleanup()
---
 net/netfilter/ipvs/ip_vs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 14457551bcb4..8ebf21149ec3 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2312,7 +2312,6 @@ static void __net_exit __ip_vs_cleanup(struct net *net)
 {
struct netns_ipvs *ipvs = net_ipvs(net);
 
-   nf_unregister_net_hooks(net, ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
ip_vs_service_net_cleanup(ipvs);/* ip_vs_flush() with locks */
ip_vs_conn_net_cleanup(ipvs);
ip_vs_app_net_cleanup(ipvs);
@@ -2327,6 +2326,7 @@ static void __net_exit __ip_vs_dev_cleanup(struct net *net)
 {
struct netns_ipvs *ipvs = net_ipvs(net);
EnterFunction(2);
+   nf_unregister_net_hooks(net, ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
ipvs->enable = 0;   /* Disable packet reception */
smp_wmb();
ip_vs_sync_net_cleanup(ipvs);
-- 
2.20.1




Re: [PATCH] net: sysctl: cleanup net_sysctl_init error exit paths

2019-05-17 Thread George G. Davis
Hello David,

On Thu, May 16, 2019 at 02:27:44PM -0700, David Miller wrote:
> From: "George G. Davis" 
> Date: Thu, 16 May 2019 11:23:08 -0400
> 
> > Unwind net_sysctl_init error exit goto spaghetti code
> > 
> > Suggested-by: Joshua Frkuska 
> > Signed-off-by: George G. Davis 
> 
> Cleanups are not appropriate until the net-next tree opens back up.
> 
> So please resubmit at that time.

I fear that I may be distracted by other shiny objects by then but
I'll make a reminder and try to resubmit during the next merge window.

Thanks!

> 
> Thank you.

-- 
Regards,
George


Re: [PATCH iproute2-next] treewide: refactor help messages

2019-05-17 Thread Stephen Hemminger
On Fri, 17 May 2019 15:38:28 +0200
Matteo Croce  wrote:

> Every tool in the iproute2 package has one or more functions to show
> a help message to the user. Some of these functions print the help
> line by line with a series of printf calls, e.g. ip/xfrm_state.c does
> 60 fprintf calls.
> If we group all the calls into a single one and just concatenate strings,
> we save a lot of libc calls and thus object size. The size difference
> of the compiled binaries calculated with bloat-o-meter is:
> 
> ip/ip:
> add/remove: 0/0 grow/shrink: 5/15 up/down: 103/-4796 (-4693)
> Total: Before=672591, After=667898, chg -0.70%
> ip/rtmon:
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-54 (-54)
> Total: Before=48879, After=48825, chg -0.11%
> tc/tc:
> add/remove: 0/2 grow/shrink: 31/10 up/down: 882/-6133 (-5251)
> Total: Before=351912, After=346661, chg -1.49%
> bridge/bridge:
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-459 (-459)
> Total: Before=70502, After=70043, chg -0.65%
> misc/lnstat:
> add/remove: 0/1 grow/shrink: 1/0 up/down: 48/-486 (-438)
> Total: Before=9960, After=9522, chg -4.40%
> tipc/tipc:
> add/remove: 0/0 grow/shrink: 1/1 up/down: 18/-62 (-44)
> Total: Before=79182, After=79138, chg -0.06%
> 
> While at it, indent some strings which were starting at column 0,
> and use tabs where possible, to have a consistent style across helps.
> 
> Signed-off-by: Matteo Croce 

Looks good, thanks. I had been doing this bit by bit over time.


Re: [PATCH] Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.

2019-05-17 Thread Felipe Gasper


> On May 17, 2019, at 12:59 AM, Andy Lutomirski  wrote:
> 
>> On May 16, 2019, at 8:25 PM, Felipe  wrote:
>> 
>> Author: Felipe Gasper 
>> Date:   Thu May 16 12:16:53 2019 -0500
>> 
>>   Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
>> 
>>   This adds the ability for Netlink to report a socket’s UID along with the
>>   other UNIX socket diagnostic information that is already available. This
>>   will allow diagnostic tools greater insight into which users control which
>>   socket.
>> 
>>   Signed-off-by: Felipe Gasper 
>> 
>> diff --git a/include/uapi/linux/unix_diag.h b/include/uapi/linux/unix_diag.h
>> index 5c502fd..a198857 100644
>> --- a/include/uapi/linux/unix_diag.h
>> +++ b/include/uapi/linux/unix_diag.h
>> @@ -20,6 +20,7 @@ struct unix_diag_req {
>> #define UDIAG_SHOW_ICONS    0x00000008  /* show pending connections */
>> #define UDIAG_SHOW_RQLEN    0x00000010  /* show skb receive queue len */
>> #define UDIAG_SHOW_MEMINFO  0x00000020  /* show memory info of a socket */
>> +#define UDIAG_SHOW_UID     0x00000040  /* show socket's UID */
>> 
>> struct unix_diag_msg {
>>   __u8 udiag_family;
>> @@ -40,6 +41,7 @@ enum {
>>   UNIX_DIAG_RQLEN,
>>   UNIX_DIAG_MEMINFO,
>>   UNIX_DIAG_SHUTDOWN,
>> +UNIX_DIAG_UID,
>> 
>>   __UNIX_DIAG_MAX,
>> };
>> diff --git a/net/unix/diag.c b/net/unix/diag.c
>> index 3183d9b..011f56c 100644
>> --- a/net/unix/diag.c
>> +++ b/net/unix/diag.c
>> @@ -110,6 +110,11 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
>>   return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql);
>> }
>> 
>> +static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb)
>> +{
>> +return nla_put(nlskb, UNIX_DIAG_UID, sizeof(kuid_t), &(sk->sk_uid));
> 
> That type is called *k* uid_t because it’s internal to the kernel. You
> probably want from_kuid_munged(), which will fix it up for an
> appropriate userns.  Presumably you want sk’s netns’s userns.

Thank you for pointing this out.

Would it suffice to get the userns as: “sk_user_ns(sk)”?

Or would it be better to pass struct netlink_callback *cb from unix_diag_dump() 
to sk_diag_dump() to sk_diag_fill(), then to the new function to add the UID?

cheers,
-Felipe Gasper

Re: dsa: using multi-gbps speeds on CPU port

2019-05-17 Thread Maxime Chevallier
Hi everyone,

On Wed, 15 May 2019 09:09:26 -0700
Florian Fainelli  wrote:

>On 5/15/19 7:02 AM, Maxime Chevallier wrote:
>> Hi Andrew,
>> 
>> On Wed, 15 May 2019 15:27:01 +0200
>> Andrew Lunn  wrote:
>>   
>>> I think you are getting your terminology wrong. 'master' is eth0 in
>>> the example you gave above. CPU and DSA ports don't have netdev
>>> structures, and so any PHY used with them is not connected to a
>>> netdev.  
>> 
>> Ah yes sorry, I'm still in the process of getting familiar with the
>> internals of DSA :/
>>   
>>>> I'll be happy to help on that, but before prototyping anything, I wanted
>>>> to have your thoughts on this, and see if you had any plans.
>>>
>>> There are two different issues here.
>>>
>>> 1) Is using a fixed-link on a CPU or DSA port the right way to do this?
>>> 2) Making fixed-link support > 1G.
>>>
>>> The reason i decided to use fixed-link on CPU and DSA ports is that we
>>> already have all the code needed to configure a port, and an API to do
>>> it, the adjust_link() callback. Things have moved on since then, and
>>> we now have an additional API, .phylink_mac_config(). It might be
>>> better to directly use that. If there is a max-speed property, create
>>> a phylink_link_state structure, which has no reference to a netdev,
>>> and pass it to .phylink_mac_config().
>>>
>>> It is just an idea, but maybe you could investigate if that would
>>> work.  

I've quickly prototyped and tested this solution, and besides a few
tweaks that are needed on the mv88e6xxx driver side, it works fine.

I'll post an RFC with this shortly, so that you can see what it looks
like.

As Russell said, there wasn't anything needed on the master interface
side.

>
>Vladimir mentioned a few weeks ago that he is considering adding support
>for PHYLIB and PHYLINK to run without a net_device instance, you two
>should probably coordinate with each other and make sure both of your
>requirements (which are likely the same) get addressed.

That would help a lot in solving this issue indeed. I'll be happy to help
on that, thanks for the tip!

Maxime


-- 
Maxime Chevallier, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


[PATCH] net: Treat sock->sk_drops as an unsigned int when printing

2019-05-17 Thread Patrick Talbert
Currently, procfs socket stats format sk_drops as a signed int (%d). For large
values this will cause a negative number to be printed.

We know the drop count can never be negative, so change the format specifier to
%u.

Signed-off-by: Patrick Talbert 
---
 net/ipv4/ping.c  | 2 +-
 net/ipv4/raw.c   | 2 +-
 net/ipv4/udp.c   | 2 +-
 net/ipv6/datagram.c  | 2 +-
 net/netlink/af_netlink.c | 2 +-
 net/phonet/socket.c  | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 7ccb5f87f70b..834be7daeb32 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -1113,7 +1113,7 @@ static void ping_v4_format_sock(struct sock *sp, struct seq_file *f,
__u16 srcp = ntohs(inet->inet_sport);
 
seq_printf(f, "%5d: %08X:%04X %08X:%04X"
-   " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d",
+   " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %u",
bucket, src, srcp, dest, destp, sp->sk_state,
sk_wmem_alloc_get(sp),
sk_rmem_alloc_get(sp),
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index dc91c27bb788..0e482f07b37f 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -1076,7 +1076,7 @@ static void raw_sock_seq_show(struct seq_file *seq, struct sock *sp, int i)
  srcp  = inet->inet_num;
 
seq_printf(seq, "%4d: %08X:%04X %08X:%04X"
-   " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d\n",
+   " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %u\n",
i, src, srcp, dest, destp, sp->sk_state,
sk_wmem_alloc_get(sp),
sk_rmem_alloc_get(sp),
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3c58ba02af7d..8fb250ed53d4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2883,7 +2883,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f,
__u16 srcp= ntohs(inet->inet_sport);
 
seq_printf(f, "%5d: %08X:%04X %08X:%04X"
-   " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d",
+   " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %u",
bucket, src, srcp, dest, destp, sp->sk_state,
sk_wmem_alloc_get(sp),
udp_rqueue_get(sp),
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index ee4a4e54d016..f07fb24f4ba1 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -1034,7 +1034,7 @@ void __ip6_dgram_sock_seq_show(struct seq_file *seq, struct sock *sp,
src   = &sp->sk_v6_rcv_saddr;
seq_printf(seq,
   "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-  "%02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d\n",
+  "%02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %u\n",
   bucket,
   src->s6_addr32[0], src->s6_addr32[1],
   src->s6_addr32[2], src->s6_addr32[3], srcp,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 216ab915dd54..718a97d5f1fd 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2642,7 +2642,7 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
struct sock *s = v;
struct netlink_sock *nlk = nlk_sk(s);
 
-   seq_printf(seq, "%pK %-3d %-10u %08x %-8d %-8d %-5d %-8d %-8d %-8lu\n",
+   seq_printf(seq, "%pK %-3d %-10u %08x %-8d %-8d %-5d %-8d %-8u %-8lu\n",
   s,
   s->sk_protocol,
   nlk->portid,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 30187990257f..2567af2fbd6f 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -607,7 +607,7 @@ static int pn_sock_seq_show(struct seq_file *seq, void *v)
struct pn_sock *pn = pn_sk(sk);
 
seq_printf(seq, "%2d %04X:%04X:%02X %02X %08X:%08X %5d %lu "
-   "%d %pK %d",
+   "%d %pK %u",
sk->sk_protocol, pn->sobject, pn->dobject,
pn->resource, sk->sk_state,
sk_wmem_alloc_get(sk), sk_rmem_alloc_get(sk),
-- 
2.18.1



Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation

2019-05-17 Thread Alexander Duyck
On Thu, May 16, 2019 at 6:48 PM Florian Fainelli  wrote:
>
>
>
> On 5/16/2019 6:03 PM, Daniel Walker wrote:
> > On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
> >> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
> >>>
> >>>
> >>> On 5/16/19, 12:35 PM, "Jeff Kirsher"  wrote:
> >>>
> >>> On Wed, 2019-05-08 at 23:14 +, Nikunj Kela wrote:
> >>>>> Some of the broken NICs don't have EEPROM programmed correctly. It
> >>>>> results
> >>>>> in probe to fail. This change adds a module parameter that can be
> >>>>> used to
> >>>>> ignore nvm checksum validation.
> >>>>>
> >>>>> Cc: xe-linux-exter...@cisco.com
> >>>>> Signed-off-by: Nikunj Kela 
> >>>>> ---
> >>>>>  drivers/net/ethernet/intel/igb/igb_main.c | 28
> >>>>> ++--
> >>>>>  1 file changed, 22 insertions(+), 6 deletions(-)
> >>>
> >>> >NAK for two reasons.  First, module parameters are not desirable
> >>> >because their individual to one driver and a global solution should 
> >>> be
> >>> >found so that all networking device drivers can use the solution.  
> >>> This
> >>> >will keep the interface to change/setup/modify networking drivers
> >>> >consistent for all drivers.
> >>>
> >>>
> >>> >Second and more importantly, if your NIC is broken, fix it.  Do not 
> >>> try
> >>> >and create a software workaround so that you can continue to use a
> >>> >broken NIC.  There are methods/tools available to properly reprogram
> >>> >the EEPROM on a NIC, which is the right solution for your issue.
> >>>
> >>> I am proposing this as a debug parameter. Obviously, we need to fix 
> >>> EEPROM but this helps us continuing the development while manufacturing 
> >>> fixes NIC.
> >>
> >> Then why even bother with sending this upstream?
> >
> > It seems rather drastic to disable the entire driver because the checksum
> > doesn't match. It really should be a warning, even a big warning, to let 
> > people
> > know something is wrong, but disabling the whole driver doesn't make sense.
>
> You could generate a random Ethernet MAC address if you don't have a
> valid one, a lot of drivers do that, and that's a fairly reasonable
> behavior. At some point in your product development someone will
> certainly verify that the provisioned MAC address matches the network
> interface's MAC address.
> --
> Florian

The thing is the EEPROM contains much more than just the MAC address.
There ends up being configuration for some of the PCIe interface in
the hardware as well as PHY configuration. If that is somehow mangled
we shouldn't be bringing up the part because there are one or more
pieces of the device configuration that are likely wrong.

The checksum is being used to make sure the EEPROM is valid, without
that we would need to go through and validate each individual section
of the EEPROM before enabling the portions of the device related
to it. The concern is that this will become a slippery slope where we
eventually have to code all the configuration of the EEPROM into the
driver itself.

We need to make the checksum a hard stop. If the part is broken then
it needs to be addressed. Workarounds just end up being used and
forgotten, which makes it that much harder to support the product.
Better to mark the part as being broken, and get it fixed now, than to
have parts start shipping that require workarounds in order to
function.

- Alex
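
As a rough illustration of the kind of check under discussion, here is a
minimal sketch. It assumes the word-sum scheme commonly used for Intel NVM
images, where the 16-bit words up to and including the checksum word must
add up to a fixed constant. The constant 0xBABA, the 0x3F word offset, and
all sketch_* names are assumptions for illustration, not code taken from
the igb driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; check the driver headers for the real ones. */
#define SKETCH_NVM_CHECKSUM_WORD 0x3F   /* index of the checksum word */
#define SKETCH_NVM_SUM           0xBABA /* expected 16-bit sum */

/* Sum words [0, SKETCH_NVM_CHECKSUM_WORD] modulo 2^16. */
static uint16_t sketch_nvm_sum(const uint16_t *words)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i <= SKETCH_NVM_CHECKSUM_WORD; i++)
		sum = (uint16_t)(sum + words[i]);
	return sum;
}

/* Return 0 if the image passes, -1 if the checksum does not match. */
static int sketch_nvm_validate(const uint16_t *words)
{
	return sketch_nvm_sum(words) == SKETCH_NVM_SUM ? 0 : -1;
}

/* Recompute the checksum word so that the image validates. */
static void sketch_nvm_update(uint16_t *words)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < SKETCH_NVM_CHECKSUM_WORD; i++)
		sum = (uint16_t)(sum + words[i]);
	words[SKETCH_NVM_CHECKSUM_WORD] = (uint16_t)(SKETCH_NVM_SUM - sum);
}
```

The check is all-or-nothing by design: it says nothing about which section
of the image is wrong, only that the image no longer matches what was
written.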


Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread David Ahern
On 5/17/19 4:22 AM, Jason A. Donenfeld wrote:
> Hi,
> 
> I'm back now and catching up with a lot of things. A few people have
> mentioned to me that wg-quick(8), a bash script that makes a bunch of
> iproute2 invocations, appears to be broken on 5.1. I've distilled the
> behavior change down to the following.
> 
> Behavior on 5.0:
> 
> + ip link add wg0 type dummy
> + ip address add 192.168.50.2/24 dev wg0
> + ip link set mtu 1420 up dev wg0
> + ip route get 192.168.50.0/24
> broadcast 192.168.50.0 dev wg0 src 192.168.50.2 uid 0
>cache 
> 
> Behavior on 5.1:
> 
> + ip link add wg0 type dummy
> + ip address add 192.168.50.2/24 dev wg0
> + ip link set mtu 1420 up dev wg0
> + ip route get 192.168.50.0/24
> RTNETLINK answers: Invalid argument

This is a 5.1 change, introduced by commit a00302b607770 ("net: ipv4:
route: perform strict checks also for doit handlers").

Basically, the /24 is unexpected. I'll send a patch.

> 
> Upon investigating, I'm not sure that `ip route get` was ever suitable
> for getting details on a particular route. So I'll adjust the

'ip route get  fibmatch' will show the fib entry.



Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread David Ahern
On 5/17/19 8:17 AM, Michal Kubecek wrote:
> AFAIK the purpose of 'ip route get' always was to let the user check
> the result of a route lookup, i.e. "what route would be used if I sent
> a packet to an address". To be honest I would have to check how exactly
> was "ip route get /" implemented before.
> 

The prefixlen was always silently ignored. We are trying to clean up
this 'silent ignoring'; we are just hitting a few speed bumps.
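
To illustrate the difference between silently ignoring the prefix length
and rejecting it, here is a minimal sketch. The struct and function names
are hypothetical stand-ins, not the kernel's rtnetlink code. The sketch
assumes that a route-get request is a lookup for a single address, so a
destination prefix length other than 0 (unset) or 32 is treated as invalid.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a route-get request. */
struct sketch_route_req {
	uint32_t dst;      /* destination address, host byte order */
	uint8_t  dst_len;  /* prefix length supplied by the caller */
};

/* Old behaviour: the prefix length is accepted and silently ignored. */
static int sketch_check_lenient(const struct sketch_route_req *req)
{
	(void)req;
	return 0;
}

/* Strict behaviour: anything but a host lookup is rejected. */
static int sketch_check_strict(const struct sketch_route_req *req)
{
	if (req->dst_len != 0 && req->dst_len != 32)
		return -EINVAL;
	return 0;
}
```

With a request for 192.168.50.0/24, the lenient check succeeds while the
strict one fails with -EINVAL, which matches the "Invalid argument" answer
reported above.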


Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread Jason A. Donenfeld
On Fri, May 17, 2019 at 5:21 PM David Ahern  wrote:
>
> On 5/17/19 8:17 AM, Michal Kubecek wrote:
> > AFAIK the purpose of 'ip route get' always was to let the user check
> > the result of a route lookup, i.e. "what route would be used if I sent
> > a packet to an address". To be honest I would have to check how exactly
> > was "ip route get /" implemented before.
> >
>
> The prefixlen was always silently ignored. We are trying to clean up
> this 'silent ignoring' just hitting a few speed bumps.

Indeed what we were after has always been, `ip route show dev 
match /`, and the old positive return value from `ip
route get` wasn't always correct for what we were using it for. So
mostly the breakage exposed another bug here.


Re: [RFC PATCH v2 net-next 0/3] flow_offload: Re-add per-action statistics

2019-05-17 Thread Edward Cree
On 15/05/2019 20:39, Edward Cree wrote:
> A point for discussion: would it be better if, instead of the tcfa_index
>  (for which the driver has to know the rules about which flow_action
>  types share a namespace), we had some kind of globally unique cookie?
>  In the same way that rule->cookie is really a pointer, could we use the
>  address of the TC-internal data structure representing the action?  Do
>  rules that share an action all point to the same struct tc_action in
>  their tcf_exts, for instance?
A quick test showed that, indeed, they do; I'm now leaning towards the
 approach of adding "unsigned long cookie" to struct flow_action_entry
 and populating it with (unsigned long)act in tc_setup_flow_action().
Pablo, how do the two options interact with your netfilter offload?  I'm
 guessing it's easier for you to find a unique pointer than to generate
 a unique u32 action_index for each action.  I'm also assuming that
 netfilter doesn't have a notion of shared actions.

-Ed
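
A minimal sketch of how a driver might consume such a cookie, assuming it
is simply the address of the shared action object. The table, its size,
and every sketch_* name are hypothetical; a real driver would use the
kernel's hashtable helpers and proper locking rather than a linear scan.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_COUNTERS 16

/* Hypothetical per-action counter, keyed by an opaque cookie. */
struct sketch_counter {
	unsigned long cookie;  /* 0 means the slot is free */
	uint64_t packets;
};

static struct sketch_counter sketch_table[SKETCH_MAX_COUNTERS];

/*
 * Find the counter for 'cookie', allocating a slot on first use.
 * Rules that share an action pass the same cookie and therefore
 * get the same counter. The cookie must be non-zero, which holds
 * when it is derived from a valid pointer. Returns NULL if the
 * table is full.
 */
static struct sketch_counter *sketch_counter_get(unsigned long cookie)
{
	struct sketch_counter *free_slot = NULL;
	size_t i;

	for (i = 0; i < SKETCH_MAX_COUNTERS; i++) {
		if (sketch_table[i].cookie == cookie)
			return &sketch_table[i];
		if (!free_slot && sketch_table[i].cookie == 0)
			free_slot = &sketch_table[i];
	}
	if (free_slot)
		free_slot->cookie = cookie;
	return free_slot;
}
```

The driver never needs to know which action types share an index
namespace; pointer equality alone decides whether two rules share a
counter.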


[PATCH net-next RFC] ipv6: elide flowlabel check if no exclusive leases exist

2019-05-17 Thread Willem de Bruijn
From: Willem de Bruijn 

Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
If not set, by default an autogenerated flowlabel is selected.

Explicit flowlabels require a control operation per label plus a
datapath check on every connection (every datagram if unconnected).

This is particularly expensive on unconnected sockets with many
connections, such as QUIC.

In the common case, where no lease is exclusive, the check can be
safely elided, as both lease request and check trivially succeed.
Indeed, autoflowlabel does the same (even with exclusive leases).

Elide the check if no process has requested an exclusive lease.

This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
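
The idea can be sketched in user space as follows. This is only an
illustration under stated assumptions: a plain counter stands in for the
static key, the lease table is reduced to a single slot, and all sketch_*
names are hypothetical. The patch itself uses static_branch_unlikely() so
that the common case costs no more than a patched-out jump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the static key: number of live exclusive leases. */
static unsigned int sketch_exclusive_leases;

/* Stand-in for the per-socket lease table (a single slot). */
static uint32_t sketch_leased_label;
static bool sketch_has_lease;

static unsigned int sketch_lookups;  /* counts slow-path lookups */

/* Slow path: does this socket hold a lease on 'label'? */
static bool sketch_lookup(uint32_t label)
{
	sketch_lookups++;
	return sketch_has_lease && sketch_leased_label == label;
}

/*
 * Return true if sending with 'label' is allowed. When no exclusive
 * lease exists anywhere, the lookup is skipped entirely.
 */
static bool sketch_label_ok(uint32_t label)
{
	if (sketch_exclusive_leases == 0)
		return true;
	return sketch_lookup(label);
}
```

Once any exclusive lease exists, every sender falls back to the lookup,
which is why applications still need to handle a failure by requesting a
lease.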

This is decidedly an RFC patch:
- need to update all fl6_sock_lookup callers, not just udp
- behavior should be per-netns isolated

Other approaches considered:
- a single "get all flowlabels, non-exclusive" flowlabel get request
  if set, elide fl6_sock_lookup and fail exclusive lease requests

- sysctls (only useful if on by default, with static_branch)
  A) "non-exclusive mode", failing all exclusive lease requests:
 processes already have to be robust against lease failure
  B) just bypass check in fl6_sock_lookup, like autoflowlabel

Signed-off-by: Willem de Bruijn 
---
 include/net/ipv6.h   | 11 +++
 net/ipv6/ip6_flowlabel.c |  6 ++
 net/ipv6/udp.c   |  8 
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index daf80863d3a50..8881cee572410 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -343,7 +344,17 @@ static inline void txopt_put(struct ipv6_txoptions *opt)
kfree_rcu(opt, rcu);
 }
 
+extern struct static_key_false ipv6_flowlabel_exclusive;
 struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label);
+static inline struct ip6_flowlabel *fl6_sock_verify(struct sock *sk,
+   __be32 label)
+{
+   if (static_branch_unlikely(&ipv6_flowlabel_exclusive))
+   return fl6_sock_lookup(sk, label) ? : ERR_PTR(-ENOENT);
+
+   return NULL;
+}
+
 struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
 struct ip6_flowlabel *fl,
 struct ipv6_txoptions *fopt);
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index be5f3d7ceb966..d5f4233b04e0c 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -57,6 +57,8 @@ static DEFINE_SPINLOCK(ip6_fl_lock);
 
 static DEFINE_SPINLOCK(ip6_sk_fl_lock);
 
+DEFINE_STATIC_KEY_FALSE(ipv6_flowlabel_exclusive);
+
 #define for_each_fl_rcu(hash, fl)  \
for (fl = rcu_dereference_bh(fl_ht[(hash)]);\
 fl != NULL;\
@@ -98,6 +100,8 @@ static void fl_free_rcu(struct rcu_head *head)
 {
struct ip6_flowlabel *fl = container_of(head, struct ip6_flowlabel, rcu);
 
+   if (fl->share != IPV6_FL_S_NONE && fl->share != IPV6_FL_S_ANY)
+   static_branch_dec(&ipv6_flowlabel_exclusive);
if (fl->share == IPV6_FL_S_PROCESS)
put_pid(fl->owner.pid);
kfree(fl->opt);
@@ -423,6 +427,8 @@ fl_create(struct net *net, struct sock *sk, struct in6_flowlabel_req *freq,
}
fl->dst = freq->flr_dst;
atomic_set(&fl->users, 1);
+   if (fl->share != IPV6_FL_S_ANY)
+   static_branch_inc(&ipv6_flowlabel_exclusive);
switch (fl->share) {
case IPV6_FL_S_EXCL:
case IPV6_FL_S_ANY:
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 07fa579dfb96c..859a1cbd54906 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1331,8 +1331,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (np->sndflow) {
fl6.flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) {
-   flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-   if (!flowlabel)
+   flowlabel = fl6_sock_verify(sk, fl6.flowlabel);
+   if (IS_ERR(flowlabel))
return -EINVAL;
}
}
@@ -1383,8 +1383,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
return err;
}
if ((fl6.flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
-   flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
-   if (!flowlabel)
+   flowlabel = fl6_sock_verify(sk, fl6.flowlabel);
+

Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation

2019-05-17 Thread Daniel Walker
On Fri, May 17, 2019 at 08:16:34AM -0700, Alexander Duyck wrote:
> On Thu, May 16, 2019 at 6:48 PM Florian Fainelli  wrote:
> >
> >
> >
> > On 5/16/2019 6:03 PM, Daniel Walker wrote:
> > > On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
> > >> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
> > >>>
> > >>>
> > >>> On 5/16/19, 12:35 PM, "Jeff Kirsher"  
> > >>> wrote:
> > >>>
> > >>> On Wed, 2019-05-08 at 23:14 +, Nikunj Kela wrote:
> > >>>>> Some of the broken NICs don't have EEPROM programmed correctly. It
> > >>>>> results
> > >>>>> in probe to fail. This change adds a module parameter that can be
> > >>>>> used to
> > >>>>> ignore nvm checksum validation.
> > >>>>>
> > >>>>> Cc: xe-linux-exter...@cisco.com
> > >>>>> Signed-off-by: Nikunj Kela 
> > >>>>> ---
> > >>>>>  drivers/net/ethernet/intel/igb/igb_main.c | 28
> > >>>>> ++--
> > >>>>>  1 file changed, 22 insertions(+), 6 deletions(-)
> > >>>
> > >>> >NAK for two reasons.  First, module parameters are not desirable
> > >>> >because their individual to one driver and a global solution 
> > >>> should be
> > >>> >found so that all networking device drivers can use the solution.  
> > >>> This
> > >>> >will keep the interface to change/setup/modify networking drivers
> > >>> >consistent for all drivers.
> > >>>
> > >>>
> > >>> >Second and more importantly, if your NIC is broken, fix it.  Do 
> > >>> not try
> > >>> >and create a software workaround so that you can continue to use a
> > >>> >broken NIC.  There are methods/tools available to properly 
> > >>> reprogram
> > >>> >the EEPROM on a NIC, which is the right solution for your issue.
> > >>>
> > >>> I am proposing this as a debug parameter. Obviously, we need to fix 
> > >>> EEPROM but this helps us continuing the development while manufacturing 
> > >>> fixes NIC.
> > >>
> > >> Then why even bother with sending this upstream?
> > >
> > > It seems rather drastic to disable the entire driver because the checksum
> > > doesn't match. It really should be a warning, even a big warning, to let 
> > > people
> > > know something is wrong, but disabling the whole driver doesn't make 
> > > sense.
> >
> > You could generate a random Ethernet MAC address if you don't have a
> > valid one, a lot of drivers do that, and that's a fairly reasonable
> > behavior. At some point in your product development someone will
> > certainly verify that the provisioned MAC address matches the network
> > interface's MAC address.
> > --
> > Florian
> 
> The thing is the EEPROM contains much more than just the MAC address.
> There ends up being configuration for some of the PCIe interface in
> the hardware as well as PHY configuration. If that is somehow mangled
> we shouldn't be bringing up the part because there are one or more
> pieces of the device configuration that are likely wrong.
> 
> The checksum is being used to make sure the EEPROM is valid, without
> that we would need to go through and validate each individual section
> of the EEPROM before enabling the portions of the device related
> to it. The concern is that this will become a slippery slope where we
> eventually have to code all the configuration of the EEPROM into the
> driver itself.
 

I don't think you can say that, because the checksum is valid, all data
contained inside is also valid. You can have a valid checksum and still have
data that someone screwed up before the checksum was computed.


> We need to make the checksum a hard stop. If the part is broken then
> it needs to be addressed. Workarounds just end up being used and
> forgotten, which makes it that much harder to support the product.
> Better to mark the part as being broken, and get it fixed now, than to
> have parts start shipping that require workarounds in order to
> function.

I don't think it's realistic to define the development process for large
corporations like Cisco or, as you're doing, to define the development
process for all corporations and products which may use Intel parts. It's
better to be flexible.

Daniel


Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets

2019-05-17 Thread Alexander Duyck
On Thu, May 16, 2019 at 4:32 PM Alexander Duyck
 wrote:
>
> On Thu, May 16, 2019 at 11:37 AM Lennart Sorensen
>  wrote:
> >
> > On Thu, May 16, 2019 at 02:34:08PM -0400, Lennart Sorensen wrote:
> > > Here is what I see:
> > >
> > > i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.7-k
> > > i40e: Copyright (c) 2013 - 2014 Intel Corporation.
> > > i40e :3d:00.0: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> > > i40e :3d:00.0: The driver for the device detected a newer version of 
> > > the NVM image than expected. Please install the most recent version of 
> > > the network driver.
> > > i40e :3d:00.0: MAC address: a4:bf:01:4e:0c:87
> > > i40e :3d:00.0: flow_type: 63 input_mask:0x4000
> > > i40e :3d:00.0: flow_type: 46 input_mask:0x0007fff8
> > > i40e :3d:00.0: flow_type: 45 input_mask:0x0007fff8
> > > i40e :3d:00.0: flow_type: 44 input_mask:0x00078000
> > > i40e :3d:00.0: flow_type: 43 input_mask:0x0007fffe
> > > i40e :3d:00.0: flow_type: 42 input_mask:0x0007fffe
> > > i40e :3d:00.0: flow_type: 41 input_mask:0x0007fffe
> > > i40e :3d:00.0: flow_type: 40 input_mask:0x0007fffe
> > > i40e :3d:00.0: flow_type: 39 input_mask:0x0007fffe
> > > i40e :3d:00.0: flow_type: 36 input_mask:0x00060600
> > > i40e :3d:00.0: flow_type: 35 input_mask:0x00060600
> > > i40e :3d:00.0: flow_type: 34 input_mask:0x000606078000
> > > i40e :3d:00.0: flow_type: 33 input_mask:0x00060606
> > > i40e :3d:00.0: flow_type: 32 input_mask:0x00060606
> > > i40e :3d:00.0: flow_type: 31 input_mask:0x00060606
> > > i40e :3d:00.0: flow_type: 30 input_mask:0x00060606
> > > i40e :3d:00.0: flow_type: 29 input_mask:0x00060606
> > > i40e :3d:00.0: Features: PF-id[0] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN 
> > > Geneve VEPA
> > > i40e :3d:00.1: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> > > i40e :3d:00.1: The driver for the device detected a newer version of 
> > > the NVM image than expected. Please install the most recent version of 
> > > the network driver.
> > > i40e :3d:00.1: MAC address: a4:bf:01:4e:0c:88
> > > i40e :3d:00.1: flow_type: 63 input_mask:0x4000
> > > i40e :3d:00.1: flow_type: 46 input_mask:0x0007fff8
> > > i40e :3d:00.1: flow_type: 45 input_mask:0x0007fff8
> > > i40e :3d:00.1: flow_type: 44 input_mask:0x00078000
> > > i40e :3d:00.1: flow_type: 43 input_mask:0x0007fffe
> > > i40e :3d:00.1: flow_type: 42 input_mask:0x0007fffe
> > > i40e :3d:00.1: flow_type: 41 input_mask:0x0007fffe
> > > i40e :3d:00.1: flow_type: 40 input_mask:0x0007fffe
> > > i40e :3d:00.1: flow_type: 39 input_mask:0x0007fffe
> > > i40e :3d:00.1: flow_type: 36 input_mask:0x00060600
> > > i40e :3d:00.1: flow_type: 35 input_mask:0x00060600
> > > i40e :3d:00.1: flow_type: 34 input_mask:0x000606078000
> > > i40e :3d:00.1: flow_type: 33 input_mask:0x00060606
> > > i40e :3d:00.1: flow_type: 32 input_mask:0x00060606
> > > i40e :3d:00.1: flow_type: 31 input_mask:0x00060606
> > > i40e :3d:00.1: flow_type: 30 input_mask:0x00060606
> > > i40e :3d:00.1: flow_type: 29 input_mask:0x00060606
> > > i40e :3d:00.1: Features: PF-id[1] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN 
> > > Geneve VEPA
> > > i40e :3d:00.1 eth2: NIC Link is Up, 1000 Mbps Full Duplex, Flow 
> > > Control: None
> > > i40e_ioctl: power down: eth1
> > > i40e_ioctl: power down: eth2
> >
> > Those last two lines are something I added, so ignore those.
>
> No problem.
>
> So just looking at the data provided I am going to guess that IPv6 w/
> UDP likely works without any issues and it is just going to be IPv4
> that is the problem. When you compare the UDP setup from mine versus
> yours it looks like for some reason somebody swapped around the input
> bits for the L3 src and destination fields. I'm basing that on the
> input set masks in the i40e_txrx.h header:
> /* INPUT SET MASK for RSS, flow director, and flexible payload */
> #define I40E_L3_SRC_SHIFT       47
> #define I40E_L3_SRC_MASK        (0x3ULL << I40E_L3_SRC_SHIFT)
> #define I40E_L3_V6_SRC_SHIFT    43
> #define I40E_L3_V6_SRC_MASK     (0xFFULL << I40E_L3_V6_SRC_SHIFT)
> #define I40E_L3_DST_SHIFT       35
> #define I40E_L3_DST_MASK        (0x3ULL << I40E_L3_DST_SHIFT)
> #define I40E_L3_V6_DST_SHIFT    35
> #define I40E_L3_V6_DST_MASK     (0xFFULL << I40E_L3_V6_DST_SHIFT)
> #define I40E_L4_SRC_SHIFT       34
> #define I40E_L4_SRC_MASK        (0x1ULL << I40E_L4_SRC_SHIFT)
> #define I40E_L4_DST_SHIFT       33
> #define I40E_L4_DST_MASK        (0x1ULL << I40E_L4_DST_SHIFT)
> #define I40E_VERIFY_TAG_SHIFT 

Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation

2019-05-17 Thread Florian Fainelli
On 5/17/19 9:36 AM, Daniel Walker wrote:
> On Fri, May 17, 2019 at 08:16:34AM -0700, Alexander Duyck wrote:
>> On Thu, May 16, 2019 at 6:48 PM Florian Fainelli  
>> wrote:
>>>
>>>
>>>
>>> On 5/16/2019 6:03 PM, Daniel Walker wrote:
>>>> On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
>>>>> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
>>>>>>
>>>>>>
>>>>>> On 5/16/19, 12:35 PM, "Jeff Kirsher"  wrote:
>>>>>>
>>>>>> On Wed, 2019-05-08 at 23:14 +, Nikunj Kela wrote:
>>>>>>>> Some of the broken NICs don't have EEPROM programmed correctly. It
>>>>>>>> results
>>>>>>>> in probe to fail. This change adds a module parameter that can be
>>>>>>>> used to
>>>>>>>> ignore nvm checksum validation.
>>>>>>>>
>>>>>>>> Cc: xe-linux-exter...@cisco.com
>>>>>>>> Signed-off-by: Nikunj Kela 
>>>>>>>> ---
>>>>>>>>  drivers/net/ethernet/intel/igb/igb_main.c | 28
>>>>>>>> ++--
>>>>>>>>  1 file changed, 22 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> >NAK for two reasons.  First, module parameters are not desirable
>>>>>> >because their individual to one driver and a global solution should 
>>>>>> be
>>>>>> >found so that all networking device drivers can use the solution.  
>>>>>> This
>>>>>> >will keep the interface to change/setup/modify networking drivers
>>>>>> >consistent for all drivers.
>>>>>>
>>>>>>
>>>>>> >Second and more importantly, if your NIC is broken, fix it.  Do not 
>>>>>> try
>>>>>> >and create a software workaround so that you can continue to use a
>>>>>> >broken NIC.  There are methods/tools available to properly reprogram
>>>>>> >the EEPROM on a NIC, which is the right solution for your issue.
>>>>>>
>>>>>> I am proposing this as a debug parameter. Obviously, we need to fix 
>>>>>> EEPROM but this helps us continuing the development while manufacturing 
>>>>>> fixes NIC.
>>>>>
>>>>> Then why even bother with sending this upstream?
>>>>
>>>> It seems rather drastic to disable the entire driver because the checksum
>>>> doesn't match. It really should be a warning, even a big warning, to let 
>>>> people
>>>> know something is wrong, but disabling the whole driver doesn't make sense.
>>>
>>> You could generate a random Ethernet MAC address if you don't have a
>>> valid one, a lot of drivers do that, and that's a fairly reasonable
>>> behavior. At some point in your product development someone will
>>> certainly verify that the provisioned MAC address matches the network
>>> interface's MAC address.
>>> --
>>> Florian
>>
>> The thing is the EEPROM contains much more than just the MAC address.
>> There ends up being configuration for some of the PCIe interface in
>> the hardware as well as PHY configuration. If that is somehow mangled
>> we shouldn't be bringing up the part because there are one or more
>> pieces of the device configuration that are likely wrong.
>>
>> The checksum is being used to make sure the EEPROM is valid, without
>> that we would need to go through and validate each individual section
>> of the EEPROM before enabling the portions of the device related
>> to it. The concern is that this will become a slippery slope where we
>> eventually have to code all the configuration of the EEPROM into the
>> driver itself.
>  
> 
> I don't think you can say because the checksum is valid that all data 
> contained
> inside is also valid. You can have a valid checksum , and someone screwed up 
> the
> data prior to the checksum getting computed.
> 
> 
>> We need to make the checksum a hard stop. If the part is broken then
>> it needs to be addressed. Workarounds just end up being used and
>> forgotten, which makes it that much harder to support the product.
>> Better to mark the part as being broken, and get it fixed now, than to
>> have parts start shipping that require workarounds in order to
>> function.
> 
> I don't think it's realistic to define the development process for large
> corporations like Cisco, or like what your doing , to define the development
> process for all corporations and products which may use intel parts. It's 
> better
> to be flexible.

Nikunj indicated that "manufacturing fixes NIC" so that sounds like a
workaround for an issue that would not affect a final product, in which
case, keeping downstream changes for development boards/intermediate
revisions of a product and focusing on relevant upstreaming changes for
the actual product would make a lot more sense, no?
-- 
Florian


Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation

2019-05-17 Thread Alexander Duyck
On Fri, May 17, 2019 at 9:36 AM Daniel Walker  wrote:
>
> On Fri, May 17, 2019 at 08:16:34AM -0700, Alexander Duyck wrote:
> > On Thu, May 16, 2019 at 6:48 PM Florian Fainelli  
> > wrote:
> > >
> > >
> > >
> > > On 5/16/2019 6:03 PM, Daniel Walker wrote:
> > > > On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
> > > >> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
> > > >>>
> > > >>>
> > > >>> On 5/16/19, 12:35 PM, "Jeff Kirsher"  
> > > >>> wrote:
> > > >>>
> > > >>> On Wed, 2019-05-08 at 23:14 +, Nikunj Kela wrote:
> > > >>>>> Some of the broken NICs don't have EEPROM programmed correctly. 
> > > >>> It
> > > >>>>> results
> > > >>>>> in probe to fail. This change adds a module parameter that can 
> > > >>> be
> > > >>>>> used to
> > > >>>>> ignore nvm checksum validation.
> > > >>>>>
> > > >>>>> Cc: xe-linux-exter...@cisco.com
> > > >>>>> Signed-off-by: Nikunj Kela 
> > > >>>>> ---
> > > >>>>>  drivers/net/ethernet/intel/igb/igb_main.c | 28
> > > >>>>> ++--
> > > >>>>>  1 file changed, 22 insertions(+), 6 deletions(-)
> > > >>>
> > > >>> >NAK for two reasons.  First, module parameters are not desirable
> > > >>> >because their individual to one driver and a global solution 
> > > >>> should be
> > > >>> >found so that all networking device drivers can use the 
> > > >>> solution.  This
> > > >>> >will keep the interface to change/setup/modify networking drivers
> > > >>> >consistent for all drivers.
> > > >>>
> > > >>>
> > > >>> >Second and more importantly, if your NIC is broken, fix it.  Do 
> > > >>> not try
> > > >>> >and create a software workaround so that you can continue to use 
> > > >>> a
> > > >>> >broken NIC.  There are methods/tools available to properly 
> > > >>> reprogram
> > > >>> >the EEPROM on a NIC, which is the right solution for your issue.
> > > >>>
> > > >>> I am proposing this as a debug parameter. Obviously, we need to fix 
> > > >>> EEPROM but this helps us continuing the development while 
> > > >>> manufacturing fixes NIC.
> > > >>
> > > >> Then why even bother with sending this upstream?
> > > >
> > > > It seems rather drastic to disable the entire driver because the 
> > > > checksum
> > > > doesn't match. It really should be a warning, even a big warning, to 
> > > > let people
> > > > know something is wrong, but disabling the whole driver doesn't make 
> > > > sense.
> > >
> > > You could generate a random Ethernet MAC address if you don't have a
> > > valid one, a lot of drivers do that, and that's a fairly reasonable
> > > behavior. At some point in your product development someone will
> > > certainly verify that the provisioned MAC address matches the network
> > > interface's MAC address.
> > > --
> > > Florian
> >
> > The thing is the EEPROM contains much more than just the MAC address.
> > There ends up being configuration for some of the PCIe interface in
> > the hardware as well as PHY configuration. If that is somehow mangled
> > we shouldn't be bringing up the part because there are one or more
> > pieces of the device configuration that are likely wrong.
> >
> > The checksum is being used to make sure the EEPROM is valid, without
> > that we would need to go through and validate each individual section
> > of the EEPROM before enabling the portions of the device related
> > to it. The concern is that this will become a slippery slope where we
> > eventually have to code all the configuration of the EEPROM into the
> > driver itself.
>
>
> I don't think you can say because the checksum is valid that all data 
> contained
> inside is also valid. You can have a valid checksum , and someone screwed up 
> the
> data prior to the checksum getting computed.

If someone screwed up the data prior to writing the checksum then that
is on them. In theory we could also have a multi-bit error that could
similarly be missed. However if the checksum is not valid then the
data contained in the NVM does not match what was originally written,
so we know we have bad data. Why should we act on the data if we know
it is bad?
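
The asymmetry described here can be made concrete. A minimal sketch,
assuming a plain 16-bit additive checksum (the helper names and word values
are made up for illustration): two errors that cancel each other leave the
sum unchanged and go unnoticed, whereas a single changed word always alters
the sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain 16-bit additive checksum over n words (illustrative helper). */
static uint16_t sketch_word_sum(const uint16_t *words, size_t n)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum = (uint16_t)(sum + words[i]);
	return sum;
}

/*
 * Return 1 if 'after' differs from 'before' yet has the same sum,
 * i.e. the corruption would slip past an additive checksum.
 */
static int sketch_corruption_missed(const uint16_t *before,
				    const uint16_t *after, size_t n)
{
	int differs = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (before[i] != after[i])
			differs = 1;
	return differs &&
	       sketch_word_sum(before, n) == sketch_word_sum(after, n);
}
```

So a passing checksum is weak evidence that the data is good, but a failing
one is strong evidence that it is not.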

> > We need to make the checksum a hard stop. If the part is broken then
> > it needs to be addressed. Workarounds just end up being used and
> > forgotten, which makes it that much harder to support the product.
> > Better to mark the part as being broken, and get it fixed now, than to
> > have parts start shipping that require workarounds in order to
> > function.
>
> I don't think it's realistic to define the development process for large
> corporations like Cisco, or like what your doing , to define the development
> process for all corporations and products which may use intel parts. It's 
> better
> to be flexible.
>
> Daniel

This isn't about development. If you are doing development you can do
whatever you want with your own downstream driver. What you are
attempting to do is update the upstream driver which is 

Re: [RFC PATCH v2 net-next 0/3] flow_offload: Re-add per-action statistics

2019-05-17 Thread Edward Cree
On 17/05/2019 16:27, Edward Cree wrote:
> I'm now leaning towards the
>  approach of adding "unsigned long cookie" to struct flow_action_entry
>  and populating it with (unsigned long)act in tc_setup_flow_action().

For concreteness, here's what that looks like: patch 1 is replaced with
 the following, the other two are unchanged.
Drivers now have an easier job, as they can just use the cookie directly
 as a hashtable key, rather than worrying about which action types share
 indices.

--8<--

[RFC PATCH v2.5 net-next 1/3] flow_offload: add a cookie to flow_action_entry

Populated with the address of the struct tc_action from which it was made.
Required for support of shared counters (and possibly other shared per-
 action entities in future).

Signed-off-by: Edward Cree 
---
 include/net/flow_offload.h | 1 +
 net/sched/cls_api.c| 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 6200900434e1..fb3278a2bd41 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -137,6 +137,7 @@ enum flow_action_mangle_base {
 
 struct flow_action_entry {
enum flow_action_id id;
+   unsigned long   cookie;
union {
u32 chain_index;/* FLOW_ACTION_GOTO */
struct net_device   *dev;   /* FLOW_ACTION_REDIRECT */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d4699156974a..5411cec17af5 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3195,6 +3195,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
struct flow_action_entry *entry;
 
entry = &flow_action->entries[j];
+   entry->cookie = (unsigned long)act;
if (is_tcf_gact_ok(act)) {
entry->id = FLOW_ACTION_ACCEPT;
} else if (is_tcf_gact_shot(act)) {


Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets

2019-05-17 Thread Lennart Sorensen
On Fri, May 17, 2019 at 09:42:19AM -0700, Alexander Duyck wrote:
> So the patch below/attached should resolve the issues you are seeing
> with your system in terms of UDPv4 RSS. What you should see with this
> patch is the first function to come up will display some "update input
> mask" messages, and then the remaining functions shouldn't make any
> noise about it since the registers being updated are global to the
> device.
> 
> If you can test this and see if it resolves the UDPv4 RSS issues I
> would appreciate it.
> 
> Thanks.
> 
> - Alex
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 65c2b9d2652b..c0a7f66babd9 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -10998,6 +10998,58 @@ static int i40e_pf_config_rss(struct i40e_pf *pf)
> ((u64)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(1)) << 32);
> hena |= i40e_pf_get_default_rss_hena(pf);
> 
> +   for (ret = 64; ret--;) {
> +   u64 hash_inset_orig, hash_inset_update;
> +
> +   if (!(hena & (1ull << ret)))
> +   continue;
> +
> +   /* Read initial input set value for flow type */
> +   hash_inset_orig = i40e_read_rx_ctl(hw,
> I40E_GLQF_HASH_INSET(1, ret));
> +   hash_inset_orig <<= 32;
> +   hash_inset_orig |= i40e_read_rx_ctl(hw,
> I40E_GLQF_HASH_INSET(0, ret));
> +
> +   /* Copy value so we can compare later */
> +   hash_inset_update = hash_inset_orig;
> +
> +   /* We should be looking at either the entire IPv6 or IPv4
> +* mask being set. If only part of the IPv6 mask is set, but
> +* the IPv4 mask is not then we have a garbage mask value
> +* and need to reset it.
> +*/
> +   switch (hash_inset_orig & I40E_L3_V6_SRC_MASK) {
> +   case I40E_L3_V6_SRC_MASK:
> +   case I40E_L3_SRC_MASK:
> +   case 0:
> +   break;
> +   default:
> +   hash_inset_update &= ~I40E_L3_V6_SRC_MASK;
> +   hash_inset_update |= I40E_L3_SRC_MASK;
> +   }
> +
> +   switch (hash_inset_orig & I40E_L3_V6_DST_MASK) {
> +   case I40E_L3_V6_DST_MASK:
> +   case I40E_L3_DST_MASK:
> +   case 0:
> +   break;
> +   default:
> +   hash_inset_update &= ~I40E_L3_V6_DST_MASK;
> +   hash_inset_update |= I40E_L3_DST_MASK;
> +   }
> +
> +   if (hash_inset_update != hash_inset_orig) {
> +   dev_warn(&pf->pdev->dev,
> +"flow type: %d update input mask
> from:0x%016llx, to:0x%016llx\n",
> +ret,
> +hash_inset_orig, hash_inset_update);
> +   i40e_write_rx_ctl(hw, I40E_GLQF_HASH_INSET(0, ret),
> + (u32)hash_inset_update);
> +   hash_inset_update >>= 32;
> +   i40e_write_rx_ctl(hw, I40E_GLQF_HASH_INSET(1, ret),
> + (u32)hash_inset_update);
> +   }
> +   }
> +
> i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), (u32)hena);
> i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), (u32)(hena >> 32));
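
For anyone reading along, the per-field logic of the patch above can be
sketched in plain C as below. The mask values are made up for illustration
(the real I40E_L3_* definitions live in the driver), and the sketch assumes
the IPv4 mask is a subset of the IPv6 mask, which is what the switch
statement in the patch implies.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; not the driver's I40E_L3_* definitions. */
#define EX_V6_MASK	0xFF00ULL	/* stand-in for an IPv6 src/dst field */
#define EX_V4_MASK	0x0300ULL	/* stand-in for the IPv4 subset of it */

/* Join the two 32-bit halves read from a register pair into one u64. */
static uint64_t join_halves(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Keep the field if it is the full IPv6 mask, exactly the IPv4 mask, or
 * empty; anything else is treated as garbage and reset to the IPv4 mask.
 */
static uint64_t fix_field(uint64_t inset, uint64_t v6_mask, uint64_t v4_mask)
{
	uint64_t field = inset & v6_mask;

	if (field == v6_mask || field == v4_mask || field == 0)
		return inset;
	return (inset & ~v6_mask) | v4_mask;
}
```

A register write is only needed when the returned value differs from the
value that was read, which matches the hash_inset_update != hash_inset_orig
test in the patch.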

> i40e: Debug hash inputs
> 
> From: Alexander Duyck 
> 
> 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c |   52 
> +++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 65c2b9d2652b..c0a7f66babd9 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -10998,6 +10998,58 @@ static int i40e_pf_config_rss(struct i40e_pf *pf)
>   ((u64)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(1)) << 32);
>   hena |= i40e_pf_get_default_rss_hena(pf);
>  
> + for (ret = 64; ret--;) {
> + u64 hash_inset_orig, hash_inset_update;
> +
> + if (!(hena & (1ull << ret)))
> + continue;
> +
> + /* Read initial input set value for flow type */
> + hash_inset_orig = i40e_read_rx_ctl(hw, I40E_GLQF_HASH_INSET(1, 
> ret));
> + hash_inset_orig <<= 32;
> + hash_inset_orig |= i40e_read_rx_ctl(hw, I40E_GLQF_HASH_INSET(0, 
> ret));
> +
> + /* Copy value so we can compare later */
> + hash_inset_update = hash_inset_orig;
> +
> + /* We should be looking at either the entire IPv6 or IPv4
> +  * mask being set. If only part of the IPv6 mask is set, but
> +  * the IPv4 mask is not then we have a garbage mask value
> + 

Re: [iproute2 2/3] tc: jsonify tbf qdisc parameters

2019-05-17 Thread David Ahern
On 5/6/19 10:18 AM, Nir Weiner wrote:

>   if (prate64) {
> - fprintf(f, "peakrate %s ", sprint_rate(prate64, b1));
> + print_string(PRINT_ANY, "peakrate", "peakrate %s ", 
> sprint_rate(prate64, b1));
>   if (qopt->mtu || qopt->peakrate.mpu) {
>   mtu = tc_calc_xmitsize(prate64, qopt->mtu);
>   if (show_details) {
>   fprintf(f, "mtu %s/%u mpu %s ", sprint_size(mtu, b1),
>   1<<qopt->peakrate.cell_log, sprint_size(qopt->peakrate.mpu, b2));


The fprintf under show_details should be converted as well. This applies
to patch 1 as well.

And, please add example output to each patch.
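
To make the conversion request concrete, here is a toy model of a printer
that serves both plain-text and JSON output from one call site. It is a
hypothetical stand-in written for this mail, not iproute2's print_string()
implementation; emit_string() and the "1Mbit" value used with it are
invented, and it does no JSON escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical dual-mode printer: in JSON mode emit "key":"value",
 * otherwise apply the plain-text format string to the value.
 * Returns what snprintf() returns.
 */
static int emit_string(int json, char *out, size_t len,
		       const char *key, const char *fmt, const char *value)
{
	if (json)
		return snprintf(out, len, "\"%s\":\"%s\"", key, value);
	return snprintf(out, len, fmt, value);
}
```

A call site that still uses a bare fprintf() has no key to emit in JSON
mode, which is why every print under show_details needs converting too.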


Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread Stephen Hemminger
On Fri, 17 May 2019 09:17:51 -0600
David Ahern  wrote:

> On 5/17/19 4:22 AM, Jason A. Donenfeld wrote:
> > Hi,
> > 
> > I'm back now and catching up with a lot of things. A few people have
> > mentioned to me that wg-quick(8), a bash script that makes a bunch of
> > iproute2 invocations, appears to be broken on 5.1. I've distilled the
> > behavior change down to the following.
> > 
> > Behavior on 5.0:
> > 
> > + ip link add wg0 type dummy
> > + ip address add 192.168.50.2/24 dev wg0
> > + ip link set mtu 1420 up dev wg0
> > + ip route get 192.168.50.0/24
> > broadcast 192.168.50.0 dev wg0 src 192.168.50.2 uid 0
> >cache 
> > 
> > Behavior on 5.1:
> > 
> > + ip link add wg0 type dummy
> > + ip address add 192.168.50.2/24 dev wg0
> > + ip link set mtu 1420 up dev wg0
> > + ip route get 192.168.50.0/24
> > RTNETLINK answers: Invalid argument  
> 
> This is a 5.1 change.
> a00302b607770 ("net: ipv4: route: perform strict checks also for doit
> handlers")
> 
> Basically, the /24 is unexpected. I'll send a patch.
> 
> > 
> > Upon investigating, I'm not sure that `ip route get` was ever suitable
> > for getting details on a particular route. So I'll adjust the  
> 
> 'ip route get  fibmatch' will show the fib entry.
> 

If you want to keep the error, the kernel should send additional
extack as to reason. EINVAL is not user friendly...


RE: dsa: using multi-gbps speeds on CPU port

2019-05-17 Thread Ioana Ciornei
> Subject: Re: dsa: using multi-gbps speeds on CPU port
> 
> Hi everyone,
> 
> On Wed, 15 May 2019 09:09:26 -0700
> Florian Fainelli  wrote:
> 
> >On 5/15/19 7:02 AM, Maxime Chevallier wrote:
> >> Hi Andrew,
> >>
> >> On Wed, 15 May 2019 15:27:01 +0200
> >> Andrew Lunn  wrote:
> >>
> >>> I think you are getting your terminology wrong. 'master' is eth0 in
> >>> the example you gave above. CPU and DSA ports don't have netdev
> >>> structures, and so any PHY used with them is not corrected to a
> >>> netdev.
> >>
> >> Ah yes sorry, I'm still in the process of getting familiar with the
> >> internals of DSA :/
> >>
> >>>> I'll be happy to help on that, but before prototyping anything, I wanted
> >>>> to have your thougts on this, and see if you had any plans.
> >>>
> >>> There are two different issues here.
> >>>
> >>> 1) Is using a fixed-link on a CPU or DSA port the right way to do this?
> >>> 2) Making fixed-link support > 1G.
> >>>
> >>> The reason i decided to use fixed-link on CPU and DSA ports is that
> >>> we already have all the code needed to configure a port, and an API
> >>> to do it, the adjust_link() callback. Things have moved on since
> >>> then, and we now have an additional API, .phylink_mac_config(). It
> >>> might be better to directly use that. If there is a max-speed
> >>> property, create a phylink_link_state structure, which has no
> >>> reference to a netdev, and pass it to .phylink_mac_config().
> >>>
> >>> It is just an idea, but maybe you could investigate if that would
> >>> work.
> 
> I've quickly prototyped and tested this solution, and besides a few tweaks
> that are needed on the mv88e6xxx driver side, it works fine.
> 
> I'll post an RFC with this shortly, so that you can see what it looks like.
> 
> As Russell said, there wasn't anything needed on the master interface side.
> 
> >
> >Vladimir mentioned a few weeks ago that he is considering adding
> >support for PHYLIB and PHYLINK to run without a net_device instance,
> >you two should probably coordinate with each other and make sure both
> >of your requirements (which are likely the same) get addressed.
> 
> That would help a lot solving this issue indeed, I'll be happy to help on
> that, thanks for the tip !
> 
> Maxime
> 

Hi Maxime,

I am currently maintaining some drivers for Freescale/NXP DPAA2 Ethernet. This
architecture has a management firmware that abstracts and simplifies the
hardware configuration into a so-called object model. DPAA2 is a little too
modular and you have the concept of a network interface object (DPNI) which is
completely self-contained and separate from the hardware port itself (DPMAC).
You can connect DPNIs to DPMACs but also DPNIs to one another. The dpaa2-eth
driver conceptually handles a DPNI object. Among other things, the management
firmware presents the link state information to the DPNI object as abstractly
as possible (speed, duplex, up/down etc.). The firmware gathers this
information from whatever the DPNI is connected to. Since the firmware can't
reuse Linux PHY drivers due to incompatible licensing, we need another driver
which acts as glue logic between the PHY drivers and the firmware. This is the
out-of-tree dpmac driver that notifies the firmware of any external PHY events.
At the end of the day, the dpaa2-eth driver gets notified of these external PHY
events after the firmware itself is notified and raises an interrupt line.

To start the PHY state machine for a port, the dpmac driver must fabricate a
netdevice which it does not register with the stack. One could, of course,
suggest moving the PHY management directly into the dpaa2-eth driver. But the
firmware's ABI is already stable and besides, it is not desirable to grant MDIO
access to users of the DPNI object.

Obviously, that fake netdevice has to go before the dpmac driver sees mainline.
What you guys are proposing (the phylink/netdev decoupling) would also benefit
our scenario. I talked to Vladimir and we'll make sure that whatever works for
us also benefits the DSA cpu/cascade port. Hopefully we'll have some patches
early next week.

-Ioana


Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread David Ahern
On 5/17/19 11:35 AM, Stephen Hemminger wrote:
> On Fri, 17 May 2019 09:17:51 -0600
> David Ahern  wrote:
> 
>> On 5/17/19 4:22 AM, Jason A. Donenfeld wrote:
>>> Hi,
>>>
>>> I'm back now and catching up with a lot of things. A few people have
>>> mentioned to me that wg-quick(8), a bash script that makes a bunch of
>>> iproute2 invocations, appears to be broken on 5.1. I've distilled the
>>> behavior change down to the following.
>>>
>>> Behavior on 5.0:
>>>
>>> + ip link add wg0 type dummy
>>> + ip address add 192.168.50.2/24 dev wg0
>>> + ip link set mtu 1420 up dev wg0
>>> + ip route get 192.168.50.0/24
>>> broadcast 192.168.50.0 dev wg0 src 192.168.50.2 uid 0
>>>cache 
>>>
>>> Behavior on 5.1:
>>>
>>> + ip link add wg0 type dummy
>>> + ip address add 192.168.50.2/24 dev wg0
>>> + ip link set mtu 1420 up dev wg0
>>> + ip route get 192.168.50.0/24
>>> RTNETLINK answers: Invalid argument  
>>
>> This is a 5.1 change.
>> a00302b607770 ("net: ipv4: route: perform strict checks also for doit
>> handlers")
>>
>> Basically, the /24 is unexpected. I'll send a patch.
>>
>>>
>>> Upon investigating, I'm not sure that `ip route get` was ever suitable
>>> for getting details on a particular route. So I'll adjust the  
>>
>> 'ip route get  fibmatch' will show the fib entry.
>>
> 
> If you want to keep the error, the kernel should send additional
> extack as to reason. EINVAL is not user friendly...
> 

The kernel does set an extack for all EINVAL in this code path.

Not sure why Jason is not seeing that. Really odd that he hits the error
AND does not get a message back since it requires an updated ip command
to set the strict checking flag and that command understands extack.
Perhaps no libmnl?


Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread David Ahern
On 5/17/19 11:35 AM, Stephen Hemminger wrote:
> On Fri, 17 May 2019 09:17:51 -0600
> David Ahern  wrote:
> 
>> On 5/17/19 4:22 AM, Jason A. Donenfeld wrote:
>>> Hi,
>>>
>>> I'm back now and catching up with a lot of things. A few people have
>>> mentioned to me that wg-quick(8), a bash script that makes a bunch of
>>> iproute2 invocations, appears to be broken on 5.1. I've distilled the
>>> behavior change down to the following.
>>>
>>> Behavior on 5.0:
>>>
>>> + ip link add wg0 type dummy
>>> + ip address add 192.168.50.2/24 dev wg0
>>> + ip link set mtu 1420 up dev wg0
>>> + ip route get 192.168.50.0/24
>>> broadcast 192.168.50.0 dev wg0 src 192.168.50.2 uid 0
>>>cache 
>>>
>>> Behavior on 5.1:
>>>
>>> + ip link add wg0 type dummy
>>> + ip address add 192.168.50.2/24 dev wg0
>>> + ip link set mtu 1420 up dev wg0
>>> + ip route get 192.168.50.0/24
>>> RTNETLINK answers: Invalid argument  
>>
>> This is a 5.1 change.
>> a00302b607770 ("net: ipv4: route: perform strict checks also for doit
>> handlers")
>>
>> Basically, the /24 is unexpected. I'll send a patch.

oh, and I think changing iproute2 to ignore the /24 and always set to 32
is better than changing the kernel to allow a prefix length that is ignored.



Re: [iproute2 2/3] tc: jsonify tbf qdisc parameters

2019-05-17 Thread Stephen Hemminger
On Fri, 17 May 2019 11:35:16 -0600
David Ahern  wrote:

> On 5/6/19 10:18 AM, Nir Weiner wrote:
> 
> > if (prate64) {
> > -   fprintf(f, "peakrate %s ", sprint_rate(prate64, b1));
> > +   print_string(PRINT_ANY, "peakrate", "peakrate %s ", 
> > sprint_rate(prate64, b1));
> > if (qopt->mtu || qopt->peakrate.mpu) {
> > mtu = tc_calc_xmitsize(prate64, qopt->mtu);
> > if (show_details) {
> > fprintf(f, "mtu %s/%u mpu %s ", sprint_size(mtu, b1),
> > 1<<qopt->peakrate.cell_log, sprint_size(qopt->peakrate.mpu, b2));
> 
> 
> The fprintf under show_details should be converted as well. This applies
> to patch 1 as well.
> 
> And, please add example output to each patch.

One trick I used was scanning for all calls to fprintf(f and replacing them


[PATCH iproute2] ip route: Set rtm_dst_len to 32 for all ip route get requests

2019-05-17 Thread David Ahern
From: David Ahern 

Jason reported that ip route get with a prefix length is now
failing:
$ 192.168.50.0/24
RTNETLINK answers: Invalid argument

iproute2 now uses strict mode and strict mode in the kernel
requires rtm_dst_len to be 32. Non-strict mode ignores the
prefix length, so this allows ip to work without affecting
existing users who add a prefix length to the request.

Fixes: aea41afcfd6d6 ("ip bridge: Set NETLINK_GET_STRICT_CHK on socket")
Reported-by: Jason A. Donenfeld 
Signed-off-by: David Ahern 
---
 ip/iproute.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 2b3dcc5dbd53..d980b86ffd42 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -2035,7 +2035,11 @@ static int iproute_get(int argc, char **argv)
if (addr.bytelen)
addattr_l(&req.n, sizeof(req),
  RTA_DST, &addr.data, addr.bytelen);
-   req.r.rtm_dst_len = addr.bitlen;
+   /* kernel ignores prefix length on 'route get'
+* requests; to allow ip to work with strict mode
+* but not break existing users, just set to 32
+*/
+   req.r.rtm_dst_len = 32;
address_found = true;
}
argc--; argv++;
-- 
2.11.0
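
A simplified model of the behaviour the commit message describes, for
illustration only. model_route_get_check() is a hypothetical stand-in that
follows the wording above (strict mode wants 32, non-strict mode ignores
the length); it is not the kernel's validation code and covers IPv4 only.

```c
#include <assert.h>
#include <errno.h>

#define IPV4_HOST_BITS 32

/* Model of the described check: strict mode rejects a destination
 * length other than 32, non-strict mode ignores the length.
 */
static int model_route_get_check(int strict, unsigned int dst_len)
{
	if (strict && dst_len != IPV4_HOST_BITS)
		return -EINVAL;
	return 0;
}

/* What the patch makes ip send, whatever prefix length the user typed. */
static unsigned int request_dst_len(unsigned int user_bitlen)
{
	(void)user_bitlen;	/* deliberately ignored */
	return IPV4_HOST_BITS;
}
```

With the length forced to 32, a request typed with /24 passes the strict
check, and non-strict kernels see no change since they never looked at the
length.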



Re: dsa: using multi-gbps speeds on CPU port

2019-05-17 Thread Russell King - ARM Linux admin
On Fri, May 17, 2019 at 05:37:00PM +, Ioana Ciornei wrote:
> > Subject: Re: dsa: using multi-gbps speeds on CPU port
> > 
> > Hi everyone,
> > 
> > On Wed, 15 May 2019 09:09:26 -0700
> > Florian Fainelli  wrote:
> > 
> > >On 5/15/19 7:02 AM, Maxime Chevallier wrote:
> > >> Hi Andrew,
> > >>
> > >> On Wed, 15 May 2019 15:27:01 +0200
> > >> Andrew Lunn  wrote:
> > >>
> > >>> I think you are getting your terminology wrong. 'master' is eth0 in
> > >>> the example you gave above. CPU and DSA ports don't have netdev
> > >>> structures, and so any PHY used with them is not corrected to a
> > >>> netdev.
> > >>
> > >> Ah yes sorry, I'm still in the process of getting familiar with the
> > >> internals of DSA :/
> > >>
> > >>>> I'll be happy to help on that, but before prototyping anything, I wanted
> > >>>> to have your thougts on this, and see if you had any plans.
> > >>>
> > >>> There are two different issues here.
> > >>>
> > >>> 1) Is using a fixed-link on a CPU or DSA port the right way to do this?
> > >>> 2) Making fixed-link support > 1G.
> > >>>
> > >>> The reason i decided to use fixed-link on CPU and DSA ports is that
> > >>> we already have all the code needed to configure a port, and an API
> > >>> to do it, the adjust_link() callback. Things have moved on since
> > >>> then, and we now have an additional API, .phylink_mac_config(). It
> > >>> might be better to directly use that. If there is a max-speed
> > >>> property, create a phylink_link_state structure, which has no
> > >>> reference to a netdev, and pass it to .phylink_mac_config().
> > >>>
> > >>> It is just an idea, but maybe you could investigate if that would
> > >>> work.
> > 
> > I've quickly prototyped and tested this solution, and besides a few tweaks
> > that are needed on the mv88e6xxx driver side, it works fine.
> > 
> > I'll post an RFC with this shortly, so that you can see what it looks like.
> > 
> > As Russell said, there wasn't anything needed on the master interface side.
> > 
> > >
> > >Vladimir mentioned a few weeks ago that he is considering adding
> > >support for PHYLIB and PHYLINK to run without a net_device instance,
> > >you two should probably coordinate with each other and make sure both
> > >of your requirements (which are likely the same) get addressed.
> > 
> > That would help a lot solving this issue indeed, I'll be happy to help on
> > that, thanks for the tip !
> > 
> > Maxime
> > 
> 
> Hi Maxime,
> 
> I am currently maintaining some drivers for Freescale/NXP DPAA2 Ethernet. 
> This architecture has a management firmware that abstracts and simplifies the 
> hardware configuration into a so called object model. DPAA2 is a little too 
> modular and you have the concept of a network interface object (DPNI) which 
> is completely self-contained and separate from the hardware port itself 
> (DPMAC). You can connect DPNIs to DPMACs but also DPNIs to one another. The 
> dpaa2-eth driver conceptually handles a DPNI object. Among other things, the 
> management firmware presents the link state information to the DPNI object as 
> abstract as possible (speed, duplex, up/down etc.). The firmware gathers this 
> information from whomever the DPNI is connected to. Since the firmware can't 
> reuse Linux PHY drivers due to incompatible licensing, we need another driver 
> which acts as glue logic between the PHY drivers and the firmware. This is 
> the out-of-tree dpmac driver that notifies the firmware of any external PHY 
> events. At the end of the day, the dpaa2-eth driver gets notified of these 
> external PHY events after the firmware itself is notified and raises an 
> interrupt line. 
> 
> To start the PHY state machine for a port, the dpmac driver must fabricate a 
> netdevice which it does not register with the stack. One would, of course, 
> suggest to move the PHY management directly into the dpaa2-eth driver. But 
> the firmware's ABI is already stable and besides, it is not desirable to 
> grant MDIO access to users of the DPNI object.
> 
> Obviously, that fake netdevice has to go before the dpmac driver sees 
> mainline. What you guys are proposing (the phylink/netdev decoupling) would 
> also benefit our scenario. I talked to Vladimir and we'll make sure that 
> whatever works for us is also benefiting the DSA cpu/cascade port. Hopefully 
> we'll have some patches early next week.

For SFP, I've already removed much of the netdev bits from that layer,
but I don't see any way to really get rid of it from phylink - we need
access to the netdev state there to know what the carrier state is for
the netdev (phylink tracks that state and manages the carrier state on
behalf of the MAC driver.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


[PATCH] Fix MACsec kernel panics, oopses and bugs

2019-05-17 Thread Andreas Steinmetz
MACsec causes oopses followed by a kernel panic when attached directly or
indirectly to a bridge. It causes erroneous checksum messages when attached
to vxlan. While investigating, I found skb leaks, apparent skb mishandling
and superfluous code. The attached patch fixes all the MACsec misbehaviour I
could find. As I am not a kernel developer, somebody with sufficient kernel
networking knowledge should verify the patch and correct it where necessary.

Signed-off-by: Andreas Steinmetz 

--- linux.orig/drivers/net/macsec.c 2019-05-17 11:00:13.631121950 +0200
+++ linux/drivers/net/macsec.c  2019-05-17 18:41:41.333119772 +0200
@@ -911,6 +911,9 @@ static void macsec_decrypt_done(struct c
macsec_extra_len(macsec_skb_cb(skb)->has_sci));
macsec_reset_skb(skb, macsec->secy.netdev);
 
+   /* FIXME: any better way to prevent calls to netdev_rx_csum_fault? */
+   skb->csum_complete_sw = 1;
+
len = skb->len;
if (gro_cells_receive(&macsec->gro_cells, skb) == NET_RX_SUCCESS)
count_rx(dev, len);
@@ -938,9 +941,6 @@ static struct sk_buff *macsec_decrypt(st
u16 icv_len = secy->icv_len;
 
macsec_skb_cb(skb)->valid = false;
-   skb = skb_share_check(skb, GFP_ATOMIC);
-   if (!skb)
-   return ERR_PTR(-ENOMEM);
 
ret = skb_cow_data(skb, 0, &trailer);
if (unlikely(ret < 0)) {
@@ -972,11 +972,6 @@ static struct sk_buff *macsec_decrypt(st
 
aead_request_set_crypt(req, sg, sg, len, iv);
aead_request_set_ad(req, macsec_hdr_len(macsec_skb_cb(skb)->has_sci));
-   skb = skb_unshare(skb, GFP_ATOMIC);
-   if (!skb) {
-   aead_request_free(req);
-   return ERR_PTR(-ENOMEM);
-   }
} else {
/* integrity only: all headers + data authenticated */
aead_request_set_crypt(req, sg, sg, icv_len, iv);
@@ -1102,20 +1097,12 @@ static rx_handler_result_t macsec_handle
return RX_HANDLER_PASS;
}
 
-   skb = skb_unshare(skb, GFP_ATOMIC);
-   if (!skb) {
-   *pskb = NULL;
-   return RX_HANDLER_CONSUMED;
-   }
-
pulled_sci = pskb_may_pull(skb, macsec_extra_len(true));
if (!pulled_sci) {
if (!pskb_may_pull(skb, macsec_extra_len(false)))
goto drop_direct;
}
 
-   hdr = macsec_ethhdr(skb);
-
/* Frames with a SecTAG that has the TCI E bit set but the C
 * bit clear are discarded, as this reserved encoding is used
 * to identify frames with a SecTAG that are not to be
@@ -1130,6 +1117,12 @@ static rx_handler_result_t macsec_handle
goto drop_direct;
}
 
+   skb = skb_unshare(skb, GFP_ATOMIC);
+   if (!skb)
+   return RX_HANDLER_CONSUMED;
+
+   hdr = macsec_ethhdr(skb);
+
/* ethernet header is part of crypto processing */
skb_push(skb, ETH_HLEN);
 
@@ -1213,22 +1206,22 @@ static rx_handler_result_t macsec_handle
 
/* Disabled && !changed text => skip validation */
if (hdr->tci_an & MACSEC_TCI_C ||
-   secy->validate_frames != MACSEC_VALIDATE_DISABLED)
+   secy->validate_frames != MACSEC_VALIDATE_DISABLED) {
skb = macsec_decrypt(skb, dev, rx_sa, sci, secy);
 
-   if (IS_ERR(skb)) {
-   /* the decrypt callback needs the reference */
-   if (PTR_ERR(skb) != -EINPROGRESS) {
-   macsec_rxsa_put(rx_sa);
-   macsec_rxsc_put(rx_sc);
+   if (IS_ERR(skb)) {
+   /* the decrypt callback needs the reference */
+   if (PTR_ERR(skb) != -EINPROGRESS) {
+   macsec_rxsa_put(rx_sa);
+   macsec_rxsc_put(rx_sc);
+   }
+   rcu_read_unlock();
+   return RX_HANDLER_CONSUMED;
}
-   rcu_read_unlock();
-   *pskb = NULL;
-   return RX_HANDLER_CONSUMED;
-   }
 
-   if (!macsec_post_decrypt(skb, secy, pn))
-   goto drop;
+   if (!macsec_post_decrypt(skb, secy, pn))
+   goto drop;
+   }
 
 deliver:
macsec_finalize_skb(skb, secy->icv_len,
@@ -1239,6 +1232,9 @@ deliver:
macsec_rxsa_put(rx_sa);
macsec_rxsc_put(rx_sc);
 
+   /* FIXME: any better way to prevent calls to netdev_rx_csum_fault? */
+   skb->csum_complete_sw = 1;
+
ret = gro_cells_receive(&macsec->gro_cells, skb);
if (ret == NET_RX_SUCCESS)
count_rx(dev, skb->len);
@@ -1247,7 +1243,6 @@ deliver:
 
rcu_read_unlock();
 
-   *pskb = NULL;
return RX_HANDLER_CONSUMED;
 
 drop:
@@ -1257,7 +1252,6 @@ drop_nosa:
rcu_read_unlock();
 drop_direct:
kfree_skb(skb);
- 

Re: [PATCH iproute2] ip route: Set rtm_dst_len to 32 for all ip route get requests

2019-05-17 Thread Stephen Hemminger
On Fri, 17 May 2019 10:59:13 -0700
David Ahern  wrote:

> From: David Ahern 
> 
> Jason reported that ip route get with a prefix length is now
> failing:
> $ 192.168.50.0/24
> RTNETLINK answers: Invalid argument
> 
> iproute2 now uses strict mode and strict mode in the kernel
> requires rtm_dst_len to be 32. Non-strict mode ignores the
> prefix length, so this allows ip to work without affecting
> existing users who add a prefix length to the request.
> 
> Fixes: aea41afcfd6d6 ("ip bridge: Set NETLINK_GET_STRICT_CHK on socket")
> Reported-by: Jason A. Donenfeld 
> Signed-off-by: David Ahern 
> ---
>  ip/iproute.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/ip/iproute.c b/ip/iproute.c
> index 2b3dcc5dbd53..d980b86ffd42 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -2035,7 +2035,11 @@ static int iproute_get(int argc, char **argv)
>   if (addr.bytelen)
>   addattr_l(&req.n, sizeof(req),
> RTA_DST, &addr.data, addr.bytelen);
> - req.r.rtm_dst_len = addr.bitlen;
> + /* kernel ignores prefix length on 'route get'
> +  * requests; to allow ip to work with strict mode
> +  * but not break existing users, just set to 32
> +  */
> + req.r.rtm_dst_len = 32;
>   address_found = true;
>   }
>   argc--; argv++;

I don't like silently ignoring things. It was wrong before and it
is trapped now.

Probably better to error out in iproute2 if any prefix is given.
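
Something along these lines is what I mean, as a minimal sketch.
route_get_arg_check() is a hypothetical helper, not existing iproute2 code;
it only looks for a '/' and does not validate the address itself.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical check for the 'route get' address argument:
 * accept a bare address, reject "ADDR/LEN".
 */
static int route_get_arg_check(const char *arg)
{
	if (!arg || !*arg)
		return -EINVAL;
	if (strchr(arg, '/'))
		return -EINVAL;	/* a prefix length was given */
	return 0;
}
```

The caller would print a message saying that a prefix length is not
accepted here, so the user sees why the command was refused.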


Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation

2019-05-17 Thread Daniel Walker
On Fri, May 17, 2019 at 09:58:46AM -0700, Alexander Duyck wrote:
> > I don't think you can say because the checksum is valid that all data
> > contained inside is also valid. You can have a valid checksum, and someone
> > screwed up the data prior to the checksum getting computed.
> 
> If someone screwed up the data prior to writing the checksum then that
> is on them. In theory we could also have a multi-bit error that could
> similarly be missed. However if the checksum is not valid then the
> data contained in the NVM does not match what was originally written,
> so we know we have bad data. Why should we act on the data if we know
> it is bad?
 
It's hypothetical, but it's likely someone has screwed up the data prior to the
checksum getting computed.
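
To illustrate both sides of this, here is a small sketch of a word-sum
checksum of the kind under discussion. The layout (64 words, checksum in
the last word) and the 0xBABA target are assumptions modelled on how Intel
NVM images are commonly described; the function names are made up and this
is not the igb code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NVM_WORDS	64	/* words covered, checksum word last (assumed) */
#define EX_NVM_SUM	0xBABA	/* target 16-bit sum (assumed) */

/* Value the last word must hold so that the 16-bit sum hits the target. */
static uint16_t nvm_checksum_compute(const uint16_t *words)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < EX_NVM_WORDS - 1; i++)
		sum += words[i];
	return (uint16_t)(EX_NVM_SUM - sum);
}

/* Non-zero if the image, including its checksum word, sums to the target. */
static int nvm_checksum_valid(const uint16_t *words)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < EX_NVM_WORDS; i++)
		sum += words[i];
	return sum == EX_NVM_SUM;
}
```

Whatever data is in the image when the checksum is computed will validate,
right or wrong. What the check does catch is the image changing afterwards,
which is the case Alex is describing.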

> > > We need to make the checksum a hard stop. If the part is broken then
> > > it needs to be addressed. Workarounds just end up being used and
> > > forgotten, which makes it that much harder to support the product.
> > > Better to mark the part as being broken, and get it fixed now, than to
> > > have parts start shipping that require workarounds in order to
> > > function.o
> >
> > I don't think it's realistic to define the development process for large
> > corporations like Cisco, or like what your doing , to define the development
> > process for all corporations and products which may use intel parts. It's
> > better to be flexible.
> >
> > Daniel
> 
> This isn't about development. If you are doing development you can do
> whatever you want with your own downstream driver. What you are
> attempting to do is update the upstream driver which is used in
> production environments.
 
Cisco has this issue in development, and in production. So you're right, it's
not about development in isolation. People make mistakes.

> What concerns me is when this module parameter gets used in a
> development environment and then slips into being required for a
> production environment. At that point it defeats the whole point of
> the checksum in the first place.

I agree. Ultimately it's the choice of the OEM: if it gets into production
then it's their product and they support the product. As I was saying in a prior
email, it should be a priority of the driver to give flexibility for mistakes
people will inevitably make.

Daniel



Re: dsa: using multi-gbps speeds on CPU port

2019-05-17 Thread Florian Fainelli
On 5/17/19 11:03 AM, Russell King - ARM Linux admin wrote:
> On Fri, May 17, 2019 at 05:37:00PM +, Ioana Ciornei wrote:
>>> Subject: Re: dsa: using multi-gbps speeds on CPU port
>>>
>>> Hi everyone,
>>>
>>> On Wed, 15 May 2019 09:09:26 -0700
>>> Florian Fainelli  wrote:
>>>
>>>> On 5/15/19 7:02 AM, Maxime Chevallier wrote:
>>>>> Hi Andrew,
>>>>>
>>>>> On Wed, 15 May 2019 15:27:01 +0200
>>>>> Andrew Lunn  wrote:
>>>>>
>>>>>> I think you are getting your terminology wrong. 'master' is eth0 in
>>>>>> the example you gave above. CPU and DSA ports don't have netdev
>>>>>> structures, and so any PHY used with them is not corrected to a
>>>>>> netdev.
>>>>>
>>>>> Ah yes sorry, I'm still in the process of getting familiar with the
>>>>> internals of DSA :/
>>>>>
>>>>>>> I'll be happy to help on that, but before prototyping anything, I wanted
>>>>>>> to have your thougts on this, and see if you had any plans.
>>>>>>
>>>>>> There are two different issues here.
>>>>>>
>>>>>> 1) Is using a fixed-link on a CPU or DSA port the right way to do this?
>>>>>> 2) Making fixed-link support > 1G.
>>>>>>
>>>>>> The reason i decided to use fixed-link on CPU and DSA ports is that
>>>>>> we already have all the code needed to configure a port, and an API
>>>>>> to do it, the adjust_link() callback. Things have moved on since
>>>>>> then, and we now have an additional API, .phylink_mac_config(). It
>>>>>> might be better to directly use that. If there is a max-speed
>>>>>> property, create a phylink_link_state structure, which has no
>>>>>> reference to a netdev, and pass it to .phylink_mac_config().
>>>>>>
>>>>>> It is just an idea, but maybe you could investigate if that would
>>>>>> work.
>>>
>>> I've quickly prototyped and tested this solution, and besides a few tweaks
>>> that are needed on the mv88e6xxx driver side, it works fine.
>>>
>>> I'll post an RFC with this shortly, so that you can see what it looks like.
>>>
>>> As Russell said, there wasn't anything needed on the master interface side.
>>>

>>>> Vladimir mentioned a few weeks ago that he is considering adding
>>>> support for PHYLIB and PHYLINK to run without a net_device instance,
>>>> you two should probably coordinate with each other and make sure both
>>>> of your requirements (which are likely the same) get addressed.
>>>
>>> That would help a lot solving this issue indeed, I'll be happy to help on
>>> that, thanks for the tip !
>>>
>>> Maxime
>>>
>>
>> Hi Maxime,
>>
>> I am currently maintaining some drivers for Freescale/NXP DPAA2 Ethernet. 
>> This architecture has a management firmware that abstracts and simplifies 
>> the hardware configuration into a so called object model. DPAA2 is a little 
>> too modular and you have the concept of a network interface object (DPNI) 
>> which is completely self-contained and separate from the hardware port 
>> itself (DPMAC). You can connect DPNIs to DPMACs but also DPNIs to one 
>> another. The dpaa2-eth driver conceptually handles a DPNI object. Among 
>> other things, the management firmware presents the link state information to 
>> the DPNI object as abstract as possible (speed, duplex, up/down etc.). The 
>> firmware gathers this information from whomever the DPNI is connected to. 
>> Since the firmware can't reuse Linux PHY drivers due to incompatible 
>> licensing, we need another driver which acts as glue logic between the PHY 
>> drivers and the firmware. This is the out-of-tree dpmac driver that notifies 
>> the firmware of any external PHY events. At the end of the day, the 
>> dpaa2-eth driver gets notified of these external PHY events after the 
>> firmware itself is notified and raises an interrupt line. 
>>
>> To start the PHY state machine for a port, the dpmac driver must fabricate a 
>> netdevice which it does not register with the stack. One would, of course, 
>> suggest to move the PHY management directly into the dpaa2-eth driver. But 
>> the firmware's ABI is already stable and besides, it is not desirable to 
>> grant MDIO access to users of the DPNI object.
>>
>> Obviously, that fake netdevice has to go before the dpmac driver sees 
>> mainline. What you guys are proposing (the phylink/netdev decoupling) would 
>> also benefit our scenario. I talked to Vladimir and we'll make sure that 
>> whatever works for us is also benefiting the DSA cpu/cascade port. Hopefully 
>> we'll have some patches early next week.
> 
> For SFP, I've already removed much of the netdev bits from that layer,
> but I don't see any way to really get rid of it from phylink - we need
> access to the netdev state there to know what the carrier state is for
> the netdev (phylink tracks that state and manages the carrier state on
> behalf of the MAC driver.)

We can make that a callback that is optional in case you want to use a
PHYLINK instance without a backing net_device. If you pass a valid
net_device pointer, then we default to netif_carrier_ok(), else the
caller of phylink_create() (which would have to 
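
The fallback proposed in this reply, up to the point where the archived text breaks off, might be modelled roughly as follows. This is a userspace sketch under stated assumptions: every name in it (fake_netdev, link_ctx, get_carrier, link_carrier_ok, port_carrier) is hypothetical and none of it is phylink API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct net_device; only the carrier bit is modelled. */
struct fake_netdev {
	bool carrier_ok;
};

/* Hypothetical context, loosely playing the role of a phylink instance. */
struct link_ctx {
	struct fake_netdev *ndev;		/* may be NULL */
	bool (*get_carrier)(void *priv);	/* optional, used when ndev is NULL */
	void *priv;
};

static bool link_carrier_ok(const struct link_ctx *ctx)
{
	if (ctx->ndev)
		return ctx->ndev->carrier_ok;	/* default: ask the netdev */
	if (ctx->get_carrier)
		return ctx->get_carrier(ctx->priv);
	return false;				/* no source of carrier state */
}

/* Example callback for a port that has no netdev behind it. */
static bool port_carrier(void *priv)
{
	return *(bool *)priv;
}
```

With a valid fake_netdev the callback is never consulted; with ndev set to NULL the caller-supplied callback becomes the only source of carrier state.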

Re: [RFC bpf-next 0/7] busy poll support for AF_XDP sockets

2019-05-17 Thread Jakub Kicinski
On Thu, 16 May 2019 14:37:51 +0200, Magnus Karlsson wrote:
>                                Applications
> method    cores  irqs  txpush  rxdrop  l2fwd
> --------------------------------------------
> r-t-c     2      y     35.9    11.2    8.6
> poll      2      y     34.2     9.4    8.3
> r-t-c     1      y     18.1    N/A     6.2
> poll      1      y     14.6     8.4    5.9
> busypoll  2      y     31.9    10.5    7.9
> busypoll  1      y     21.5     8.7    6.2
> busypoll  1      n     22.0    10.3    7.3

Thanks for the numbers!  One question that keeps coming to my mind 
is how do the cases compare on zero drop performance?

When I was experimenting with AF_XDP it seemed to be slightly more
prone to dropping packets than expected.  I wonder if you're seeing
a similar thing (well drops or back pressure to the traffic generator)?
Perhaps the single core busy poll would make a difference there?


Re: dsa: using multi-gbps speeds on CPU port

2019-05-17 Thread Russell King - ARM Linux admin
On Fri, May 17, 2019 at 11:10:10AM -0700, Florian Fainelli wrote:
> On 5/17/19 11:03 AM, Russell King - ARM Linux admin wrote:
> > On Fri, May 17, 2019 at 05:37:00PM +, Ioana Ciornei wrote:
> >>> Subject: Re: dsa: using multi-gbps speeds on CPU port
> >>>
> >>> Hi everyone,
> >>>
> >>> On Wed, 15 May 2019 09:09:26 -0700
> >>> Florian Fainelli  wrote:
> >>>
>  On 5/15/19 7:02 AM, Maxime Chevallier wrote:
> > Hi Andrew,
> >
> > On Wed, 15 May 2019 15:27:01 +0200
> > Andrew Lunn  wrote:
> >
> >> I think you are getting your terminology wrong. 'master' is eth0 in
> >> the example you gave above. CPU and DSA ports don't have netdev
> >> structures, and so any PHY used with them is not connected to a
> >> netdev.
> >
> > Ah yes sorry, I'm still in the process of getting familiar with the
> > internals of DSA :/
> >
> >>> I'll be happy to help on that, but before prototyping anything, I
> >>> wanted to have your thoughts on this, and see if you had any plans.
> >>
> >> There are two different issues here.
> >>
> >> 1) Is using a fixed-link on a CPU or DSA port the right way to do this?
> >> 2) Making fixed-link support > 1G.
> >>
> >> The reason i decided to use fixed-link on CPU and DSA ports is that
> >> we already have all the code needed to configure a port, and an API
> >> to do it, the adjust_link() callback. Things have moved on since
> >> then, and we now have an additional API, .phylink_mac_config(). It
> >> might be better to directly use that. If there is a max-speed
> >> property, create a phylink_link_state structure, which has no
> >> reference to a netdev, and pass it to .phylink_mac_config().
> >>
> >> It is just an idea, but maybe you could investigate if that would
> >> work.
> >>>
> >>> I've quickly prototyped and tested this solution, and besides a few 
> >>> tweaks that
> >>> are needed on the mv88e6xxx driver side, it works fine.
> >>>
> >>> I'll post an RFC with this shortly, so that you can see what it looks 
> >>> like.
> >>>
> >>> As Russell said, there wasn't anything needed on the master interface 
> >>> side.
> >>>
> 
>  Vladimir mentioned a few weeks ago that he is considering adding
>  support for PHYLIB and PHYLINK to run without a net_device instance,
>  you two should probably coordinate with each other and make sure both
>  of your requirements (which are likely the same) get addressed.
> >>>
> >>> That would help a lot solving this issue indeed, I'll be happy to help on 
> >>> that,
> >>> thanks for the tip !
> >>>
> >>> Maxime
> >>>
> >>
> >> Hi Maxime,
> >>
> >> I am currently maintaining some drivers for Freescale/NXP DPAA2 Ethernet. 
> >> This architecture has a management firmware that abstracts and simplifies 
> >> the hardware configuration into a so called object model. DPAA2 is a 
> >> little too modular and you have the concept of a network interface object 
> >> (DPNI) which is completely self-contained and separate from the hardware 
> >> port itself (DPMAC). You can connect DPNIs to DPMACs but also DPNIs to one 
> >> another. The dpaa2-eth driver conceptually handles a DPNI object. Among 
> >> other things, the management firmware presents the link state information 
> >> to the DPNI object as abstract as possible (speed, duplex, up/down etc.). 
> >> The firmware gathers this information from whomever the DPNI is connected 
> >> to. Since the firmware can't reuse Linux PHY drivers due to incompatible 
> >> licensing, we need another driver which acts as glue logic between the PHY 
> >> drivers and the firmware. This is the out-of-tree dpmac driver that 
> >> notifies the firmware of any external PHY events. At the end of the day, 
> >> the dpaa2-eth driver gets notified of these external PHY events after the 
> >> firmware itself is notified and raises an interrupt line. 
> >>
> >> To start the PHY state machine for a port, the dpmac driver must fabricate 
> >> a netdevice which it does not register with the stack. One would, of 
> >> course, suggest to move the PHY management directly into the dpaa2-eth 
> >> driver. But the firmware's ABI is already stable and besides, it is not 
> >> desirable to grant MDIO access to users of the DPNI object.
> >>
> >> Obviously, that fake netdevice has to go before the dpmac driver sees 
> >> mainline. What you guys are proposing (the phylink/netdev decoupling) 
> >> would also benefit our scenario. I talked to Vladimir and we'll make sure 
> >> that whatever works for us is also benefiting the DSA cpu/cascade port. 
> >> Hopefully we'll have some patches early next week.
> > 
> > For SFP, I've already removed much of the netdev bits from that layer,
> > but I don't see any way to really get rid of it from phylink - we need
> > access to the netdev state there to know what the carrier state is for
> > the netdev (phylink tracks that state and manages the carrier state 

Re: [PATCH] net: caif: fix the value of size argument of snprintf

2019-05-17 Thread David Miller
From: Weikang shi 
Date: Fri, 17 May 2019 15:59:22 +0800

> From: swkhack 
> 
> The snprintf() function writes at most size bytes (including the
> terminating null byte), so the size argument does not need to be
> reduced by one.
> 
> Signed-off-by: swkhack 

Applied.
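
The size convention the commit message relies on can be checked in userspace with the C library's snprintf(), which uses the same convention. A minimal sketch, not the caif code itself; format_name and the buffer size are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the caif driver: format a name
 * into dst, passing the full buffer size to snprintf().
 * snprintf() writes at most `size` bytes including the terminating NUL,
 * so there is no need to pass size - 1.
 * Returns the length the untruncated string would have had.
 */
static int format_name(char *dst, size_t size, const char *prefix, int id)
{
	return snprintf(dst, size, "%s%d", prefix, id);
}
```

Called with an 8-byte buffer, the prefix "caif" and the id 1234567, the buffer ends up holding the 7 characters "caif123" plus the NUL, and the return value 11 signals that truncation happened.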


KASAN: use-after-free Read in tls_push_sg

2019-05-17 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:35c99ffa Merge tag 'for_linus' of git://git.kernel.org/pub..
git tree:   net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10ff3322a0
kernel config:  https://syzkaller.appspot.com/x/.config?x=82f0809e8f0a8c87
dashboard link: https://syzkaller.appspot.com/bug?extid=66fbe4719f6ef22754ee
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+66fbe4719f6ef2275...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in tls_push_sg+0x5e2/0x680 net/tls/tls_main.c:139
Read of size 4 at addr 888066f4d584 by task syz-executor.1/28368

CPU: 0 PID: 28368 Comm: syz-executor.1 Not tainted 5.1.0+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
 tls_push_sg+0x5e2/0x680 net/tls/tls_main.c:139
 tls_push_partial_record net/tls/tls_main.c:208 [inline]
 tls_complete_pending_work include/net/tls.h:382 [inline]
 tls_sk_proto_close+0x4a8/0x780 net/tls/tls_main.c:282
 inet_release+0x105/0x1f0 net/ipv4/af_inet.c:432
 inet6_release+0x53/0x80 net/ipv6/af_inet6.c:474
 __sock_release+0xd3/0x2b0 net/socket.c:607
 sock_close+0x1b/0x30 net/socket.c:1279
 __fput+0x302/0x890 fs/file_table.c:279
 fput+0x16/0x20 fs/file_table.c:312
 task_work_run+0x14a/0x1c0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168
 prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
 do_syscall_64+0x594/0x680 arch/x86/entry/common.c:304
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x412b61
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 e4 1a 00 00 c3 48  
83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01

RSP: 002b:7fff8622b0f0 EFLAGS: 0293 ORIG_RAX: 0003
RAX:  RBX: 0005 RCX: 00412b61
RDX:  RSI:  RDI: 0004
RBP: 0001 R08: bb5eab81 R09: bb5eab85
R10: 7fff8622b1d0 R11: 0293 R12: 0073c900
R13: 0073c900 R14: 002190c3 R15: 0073bfac

Allocated by task 28369:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 __do_kmalloc mm/slab.c:3690 [inline]
 __kmalloc+0x15c/0x740 mm/slab.c:3699
 kmalloc include/linux/slab.h:552 [inline]
 kzalloc include/linux/slab.h:742 [inline]
 tls_get_rec+0x104/0x590 net/tls/tls_sw.c:322
 tls_sw_sendmsg+0xda5/0x17b0 net/tls/tls_sw.c:928
 inet_sendmsg+0x147/0x5e0 net/ipv4/af_inet.c:802
 sock_sendmsg_nosec net/socket.c:660 [inline]
 sock_sendmsg+0xf2/0x170 net/socket.c:671
 __sys_sendto+0x262/0x380 net/socket.c:1964
 __do_sys_sendto net/socket.c:1976 [inline]
 __se_sys_sendto net/socket.c:1972 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1972
 do_syscall_64+0x103/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4964:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3462 [inline]
 kfree+0xcf/0x230 mm/slab.c:3785
 tls_tx_records+0x4c6/0x760 net/tls/tls_sw.c:408
 tx_work_handler+0xba/0xf0 net/tls/tls_sw.c:2150
 process_one_work+0x98e/0x1790 kernel/workqueue.c:2268
 worker_thread+0x98/0xe40 kernel/workqueue.c:2414
 kthread+0x357/0x430 kernel/kthread.c:253
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at 888066f4d280
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 772 bytes inside of
 2048-byte region [888066f4d280, 888066f4da80)
The buggy address belongs to the page:
page:ea00019bd300 count:1 mapcount:0 mapping:8880aa400c40 index:0x0  
compound_mapcount: 0

flags: 0x1fffc010200(slab|head)
raw: 01fffc010200 ea0002413588 ea0001aa2a08 8880aa400c40
raw:  888066f4c180 00010003 
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 888066f4d480

Re: [PATCH] lib: Correct comment of prandom_seed

2019-05-17 Thread David Miller
From: Philippe Mazenauer 
Date: Fri, 17 May 2019 10:44:44 +

> Variable 'entropy' was wrongly documented as 'seed', changed comment to
> reflect actual variable name.
> 
> ../lib/random32.c:179: warning: Function parameter or member 'entropy' not described in 'prandom_seed'
> ../lib/random32.c:179: warning: Excess function parameter 'seed' description in 'prandom_seed'
> 
> Signed-off-by: Philippe Mazenauer 

Applied.


ethtool 5.1 released

2019-05-17 Thread John W. Linville
ethtool version 5.1 has been released.

Home page: https://www.kernel.org/pub/software/network/ethtool/
Download link:
https://www.kernel.org/pub/software/network/ethtool/ethtool-5.1.tar.xz

Release notes:

* Feature: Add support for 200Gbps (50Gbps per lane) link mode
* Feature: simplify handling of PHY tunable downshift
* Feature: add support for PHY tunable Fast Link Down
* Feature: add PHY Fast Link Down tunable to man page
* Feature: Add a 'start N' option when specifying the Rx flow hash indirection table.
* Feature: Add bash-completion script
* Feature: add 1baseR_FEC link mode name
* Fix: qsfp: fix special value comparison
* Feature: move option parsing related code into function
* Feature: move cmdline_coalesce out of do_scoalesce
* Feature: introduce new ioctl for per-queue settings
* Feature: support per-queue sub command --show-coalesce
* Feature: support per-queue sub command --coalesce
* Fix: fix up dump_coalesce output to match actual option names
* Feature: fec: add pretty dump

John
-- 
John W. Linville        Someday the world will need a hero, and you
linvi...@tuxdriver.com  might be all we have.  Be ready.



ARCNET Contemporary Controls PCI20EX PCIe Card

2019-05-17 Thread Kenton, Stephen M.
I've got some old optical (as in eyeglasses) equipment that only talks 
over ARCNET that I want to get up and running. The PEX PCIe-to-PCI 
bridge is on the card with the SMC COM20022 and lspci sees them

02:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8111 PCI
Express-to-PCI Bridge [10b5:8111] (rev 21)
03:04.0 Network controller [0280]: Contemporary Controls Device
[1571:a0e4] (rev aa)
      Subsystem: Contemporary Controls Device [1571:a0e4]

I just pulled the current kernel source and 1571:a0e4 does not seem to be 
supported by the driver.

Before I start trying to invent wheels, is/has anyone else looking in this area?

Thanks,

Steve Kenton



Re: ARCNET Contemporary Controls PCI20EX PCIe Card

2019-05-17 Thread Michael Grzeschik
On Fri, May 17, 2019 at 07:03:19PM +, Kenton, Stephen M. wrote:
> I've got some old optical (as in eyeglasses) equipment that only talks 
> over ARCNET that I want to get up and running. The PEX PCIe-to-PCI 
> bridge is on the card with the SMC COM20022 and lspci sees them
> 
> 02:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8111 PCI
> Express-to-PCI Bridge [10b5:8111] (rev 21)
> 03:04.0 Network controller [0280]: Contemporary Controls Device
> [1571:a0e4] (rev aa)
>       Subsystem: Contemporary Controls Device [1571:a0e4]
> 
> I just pulled the current kernel source and 1571:a0e4 does not seem to be 
> supported by the driver.
> 
> Before I start trying to invent wheels, is/has anyone else looking in this 
> area?

Hi,

you should probably add a new entry to com20020pci_id_table
with the mentioned ID 1571:a0e4 in drivers/net/arcnet/com20020-pci.c
and see how far you get.
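
As a rough illustration of what such a table entry does, here is a userspace model of ID matching. It is only a sketch: fake_pci_id, id_table and match_id are hypothetical stand-ins, the first table entry is a placeholder rather than a copy from the driver, and the real struct pci_device_id also carries subvendor, subdevice, class and driver_data fields that are left out here. Which driver data suits the PCI20EX is not known from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct pci_device_id (vendor/device only). */
struct fake_pci_id {
	uint16_t vendor;
	uint16_t device;
};

/*
 * Hypothetical ID table. 0x1571:0xa0e4 is the ID from the lspci output;
 * the other entry is a placeholder, not copied from the driver.
 */
static const struct fake_pci_id id_table[] = {
	{ 0x1571, 0x0001 },	/* placeholder for an existing entry */
	{ 0x1571, 0xa0e4 },	/* new entry for the PCI20EX */
	{ 0, 0 },		/* terminator, as in kernel ID tables */
};

/* Return the matching entry, or NULL if the device is not listed. */
static const struct fake_pci_id *match_id(uint16_t vendor, uint16_t device)
{
	const struct fake_pci_id *id;

	for (id = id_table; id->vendor; id++)
		if (id->vendor == vendor && id->device == device)
			return id;
	return NULL;
}
```

Without the new entry, a lookup for 1571:a0e4 finds nothing and the driver never binds to the card, which matches the behaviour described in the report.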

Regards,
Michael

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |




Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread Jason A. Donenfeld
On Fri, May 17, 2019 at 7:39 PM David Ahern  wrote:
> Not sure why Jason is not seeing that. Really odd that he hits the error
> AND does not get a message back since it requires an updated ip command
> to set the strict checking flag and that command understands extack.
> Perhaps no libmnl?

Right, no libmnl. This is coming out of the iproute2 compiled for the
tests at https://www.wireguard.com/build-status/ which are pretty
minimal. Extack support would be kind of useful for diagnostics, and
wg(8) already uses it, so I can probably put that in my build system.


[net 03/11] net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index

2019-05-17 Thread Saeed Mahameed
From: Parav Pandit 

To avoid any ambiguity between vport index and vport number,
rename functions that had vport, to vport_num or vport_index appropriately.

vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return
type to u16.

vport_index is an int in vport array. Hence change input type of vport
index in mlx5_eswitch_index_to_vport_num() to int.

Correct multiple eswitch representor interfaces use type u16 of
rep->vport as type int vport_index.

Send vport FW commands with the correct eswitch u16 vport_num instead of
the host int vport_index.
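
To see why the two types should not be mixed, consider a simplified model of the index/number mapping. This is a sketch under assumptions, not the driver's code: the constants, the array size and the rule that the uplink occupies the last array slot are all placeholders chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values chosen for illustration, not read from the driver. */
#define FAKE_VPORT_UPLINK	0xFFFFu	/* assumed special vport number */
#define FAKE_TOTAL_VPORTS	4	/* size of the hypothetical vport array */

/* Array index -> vport number. The index is an int; the number is 16 bits. */
static uint16_t index_to_vport_num(int index)
{
	if (index == FAKE_TOTAL_VPORTS - 1)
		return FAKE_VPORT_UPLINK;	/* assumed: uplink sits in the last slot */
	return (uint16_t)index;
}

/* Vport number -> array index, the inverse of the mapping above. */
static int vport_num_to_index(uint16_t vport_num)
{
	if (vport_num == FAKE_VPORT_UPLINK)
		return FAKE_TOTAL_VPORTS - 1;
	return vport_num;
}
```

In this model, using the number 0xFFFF directly as an array index would run far past a 4-entry array, and sending index 3 to firmware would address the wrong vport. Distinct names and types make that kind of mix-up easier to spot.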

Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Parav Pandit 
Signed-off-by: Vu Pham 
Reviewed-by: Bodong Wang 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/ib_rep.c   | 13 ++-
 drivers/infiniband/hw/mlx5/ib_rep.h   | 12 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 20 -
 .../net/ethernet/mellanox/mlx5/core/eswitch.h | 22 +--
 .../mellanox/mlx5/core/eswitch_offloads.c | 11 +-
 include/linux/mlx5/eswitch.h  |  6 ++---
 6 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/ib_rep.c b/drivers/infiniband/hw/mlx5/ib_rep.c
index cbcc40d776b9..269b24a3baa1 100644
--- a/drivers/infiniband/hw/mlx5/ib_rep.c
+++ b/drivers/infiniband/hw/mlx5/ib_rep.c
@@ -109,15 +109,15 @@ u8 mlx5_ib_eswitch_mode(struct mlx5_eswitch *esw)
 }
 
 struct mlx5_ib_dev *mlx5_ib_get_rep_ibdev(struct mlx5_eswitch *esw,
- int vport_index)
+ u16 vport_num)
 {
-   return mlx5_eswitch_get_proto_dev(esw, vport_index, REP_IB);
+   return mlx5_eswitch_get_proto_dev(esw, vport_num, REP_IB);
 }
 
 struct net_device *mlx5_ib_get_rep_netdev(struct mlx5_eswitch *esw,
- int vport_index)
+ u16 vport_num)
 {
-   return mlx5_eswitch_get_proto_dev(esw, vport_index, REP_ETH);
+   return mlx5_eswitch_get_proto_dev(esw, vport_num, REP_ETH);
 }
 
 struct mlx5_ib_dev *mlx5_ib_get_uplink_ibdev(struct mlx5_eswitch *esw)
@@ -125,9 +125,10 @@ struct mlx5_ib_dev *mlx5_ib_get_uplink_ibdev(struct mlx5_eswitch *esw)
return mlx5_eswitch_uplink_get_proto_dev(esw, REP_IB);
 }
 
-struct mlx5_eswitch_rep *mlx5_ib_vport_rep(struct mlx5_eswitch *esw, int vport)
+struct mlx5_eswitch_rep *mlx5_ib_vport_rep(struct mlx5_eswitch *esw,
+  u16 vport_num)
 {
-   return mlx5_eswitch_vport_rep(esw, vport);
+   return mlx5_eswitch_vport_rep(esw, vport_num);
 }
 
 struct mlx5_flow_handle *create_flow_rule_vport_sq(struct mlx5_ib_dev *dev,
diff --git a/drivers/infiniband/hw/mlx5/ib_rep.h b/drivers/infiniband/hw/mlx5/ib_rep.h
index 1d9778da8a50..8336e0517a5c 100644
--- a/drivers/infiniband/hw/mlx5/ib_rep.h
+++ b/drivers/infiniband/hw/mlx5/ib_rep.h
@@ -14,17 +14,17 @@ extern const struct mlx5_ib_profile uplink_rep_profile;
 
 u8 mlx5_ib_eswitch_mode(struct mlx5_eswitch *esw);
 struct mlx5_ib_dev *mlx5_ib_get_rep_ibdev(struct mlx5_eswitch *esw,
- int vport_index);
+ u16 vport_num);
 struct mlx5_ib_dev *mlx5_ib_get_uplink_ibdev(struct mlx5_eswitch *esw);
 struct mlx5_eswitch_rep *mlx5_ib_vport_rep(struct mlx5_eswitch *esw,
-  int vport_index);
+  u16 vport_num);
 void mlx5_ib_register_vport_reps(struct mlx5_core_dev *mdev);
 void mlx5_ib_unregister_vport_reps(struct mlx5_core_dev *mdev);
 struct mlx5_flow_handle *create_flow_rule_vport_sq(struct mlx5_ib_dev *dev,
   struct mlx5_ib_sq *sq,
   u16 port);
 struct net_device *mlx5_ib_get_rep_netdev(struct mlx5_eswitch *esw,
- int vport_index);
+ u16 vport_num);
 #else /* CONFIG_MLX5_ESWITCH */
 static inline u8 mlx5_ib_eswitch_mode(struct mlx5_eswitch *esw)
 {
@@ -33,7 +33,7 @@ static inline u8 mlx5_ib_eswitch_mode(struct mlx5_eswitch *esw)
 
 static inline
 struct mlx5_ib_dev *mlx5_ib_get_rep_ibdev(struct mlx5_eswitch *esw,
- int vport_index)
+ u16 vport_num)
 {
return NULL;
 }
@@ -46,7 +46,7 @@ struct mlx5_ib_dev *mlx5_ib_get_uplink_ibdev(struct mlx5_eswitch *esw)
 
 static inline
 struct mlx5_eswitch_rep *mlx5_ib_vport_rep(struct mlx5_eswitch *esw,
-  int vport_index)
+  u16 vport_num)
 {
return NULL;
 }
@@ -63,7 +63,7 @@ struct mlx5_flow_handle *create_flow_rule_vport_sq(struct mlx5_ib_dev *dev,
 
 static inline
 struct net_device *mlx5_ib_get_rep_ne

[net 04/11] net/mlx5: Fix peer pf disable hca command

2019-05-17 Thread Saeed Mahameed
From: Bodong Wang 

The command was mistakenly using enable_hca in embedded CPU field.

Fixes: 22e939a91dcb ("net/mlx5: Update enable HCA dependency")
Signed-off-by: Bodong Wang 
Reported-by: Alex Rosenbaum 
Signed-off-by: Alex Rosenbaum 
Reviewed-by: Daniel Jurgens 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/ecpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
index 4746f2d28fb6..0ccd6d40baf7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
@@ -26,7 +26,7 @@ static int mlx5_peer_pf_disable_hca(struct mlx5_core_dev *dev)
 
MLX5_SET(disable_hca_in, in, opcode, MLX5_CMD_OP_DISABLE_HCA);
MLX5_SET(disable_hca_in, in, function_id, 0);
-   MLX5_SET(enable_hca_in, in, embedded_cpu_function, 0);
+   MLX5_SET(disable_hca_in, in, embedded_cpu_function, 0);
return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
-- 
2.21.0



[net 11/11] net/mlx5e: Fix possible modify header actions memory leak

2019-05-17 Thread Saeed Mahameed
From: Eli Britstein 

The cited commit could disable the modify header flag, but did not free
the allocated memory for the modify header actions. Fix it.

Fixes: 27c11b6b844cd ("net/mlx5e: Do not rewrite fields with the same match")
Signed-off-by: Eli Britstein 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4722ac70f0a9..31cd02f11499 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2567,8 +2567,10 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
/* in case all pedit actions are skipped, remove the MOD_HDR
 * flag.
 */
-   if (parse_attr->num_mod_hdr_actions == 0)
+   if (parse_attr->num_mod_hdr_actions == 0) {
action &= ~MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+   kfree(parse_attr->mod_hdr_actions);
+   }
}
 
attr->action = action;
@@ -3005,6 +3007,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 */
if (parse_attr->num_mod_hdr_actions == 0) {
action &= ~MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+   kfree(parse_attr->mod_hdr_actions);
if (!((action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP) ||
  (action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH)))
attr->split_count = 0;
-- 
2.21.0
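
The pattern in the two hunks above (clear the flag, then release the buffer that the flag no longer accounts for) can be tried out in userspace. A minimal sketch with made-up names: fake_parse_attr, finish_mod_hdr and FAKE_ACTION_MOD_HDR are hypothetical, free() stands in for kfree(), and resetting the pointer to NULL is an extra precaution of this sketch that is not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define FAKE_ACTION_MOD_HDR 0x40u	/* placeholder flag bit, not the mlx5 value */

/* Hypothetical stand-in for the parse attributes used in the patch. */
struct fake_parse_attr {
	int num_mod_hdr_actions;
	void *mod_hdr_actions;	/* heap buffer holding the actions */
};

/*
 * If every pedit action was skipped, drop the MOD_HDR flag and release
 * the buffer so it is not leaked. Otherwise leave both untouched.
 */
static unsigned int finish_mod_hdr(struct fake_parse_attr *attr,
				   unsigned int action)
{
	if (attr->num_mod_hdr_actions == 0) {
		action &= ~FAKE_ACTION_MOD_HDR;
		free(attr->mod_hdr_actions);
		attr->mod_hdr_actions = NULL;	/* sketch-only precaution */
	}
	return action;
}
```

Once the flag is cleared, later code paths that free the actions only when the flag is set would skip the buffer, which is the leak the commit message describes.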



[net 02/11] net/mlx5: Add meaningful return codes to status_to_err function

2019-05-17 Thread Saeed Mahameed
From: Valentine Fatiev 

The current version of status_to_err() returns -1 for any non-zero
status returned by mlx5_cmd_invoke(). When the status is
MLX5_DRIVER_STATUS_ABORTED we should return 0 to the caller, as we
assume the command completed successfully in FW; if an error is returned
we get confusing messages in dmesg. In addition, the currently returned
value -1 is easily confused with -EPERM.

The new implementation fixes the original commit: it returns meaningful
codes for command delivery status and prints a message in case of failure.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Valentine Fatiev 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 22 ++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 937ba4bcb056..d2ab8cd8ad9f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1604,7 +1604,27 @@ void mlx5_cmd_flush(struct mlx5_core_dev *dev)
 
 static int status_to_err(u8 status)
 {
-   return status ? -1 : 0; /* TBD more meaningful codes */
+   switch (status) {
+   case MLX5_CMD_DELIVERY_STAT_OK:
+   case MLX5_DRIVER_STATUS_ABORTED:
+   return 0;
+   case MLX5_CMD_DELIVERY_STAT_SIGNAT_ERR:
+   case MLX5_CMD_DELIVERY_STAT_TOK_ERR:
+   return -EBADR;
+   case MLX5_CMD_DELIVERY_STAT_BAD_BLK_NUM_ERR:
+   case MLX5_CMD_DELIVERY_STAT_OUT_PTR_ALIGN_ERR:
+   case MLX5_CMD_DELIVERY_STAT_IN_PTR_ALIGN_ERR:
+   return -EFAULT; /* Bad address */
+   case MLX5_CMD_DELIVERY_STAT_IN_LENGTH_ERR:
+   case MLX5_CMD_DELIVERY_STAT_OUT_LENGTH_ERR:
+   case MLX5_CMD_DELIVERY_STAT_CMD_DESCR_ERR:
+   case MLX5_CMD_DELIVERY_STAT_RES_FLD_NOT_CLR_ERR:
+   return -ENOMSG;
+   case MLX5_CMD_DELIVERY_STAT_FW_ERR:
+   return -EIO;
+   default:
+   return -EINVAL;
+   }
 }
 
 static struct mlx5_cmd_msg *alloc_msg(struct mlx5_core_dev *dev, int in_size,
-- 
2.21.0
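
The mapping introduced by the hunk above can be exercised outside the kernel. The following is a userspace sketch, not the driver code: the enum names are shortened from the patch and their numeric values are placeholders rather than the values in the mlx5 headers. It assumes a Linux-style errno.h and supplies a fallback for EBADR where the libc lacks it.

```c
#include <assert.h>
#include <errno.h>

#ifndef EBADR
#define EBADR 53	/* Linux value; fallback for libcs that lack it */
#endif

/* Placeholder status codes; values are illustrative only. */
enum fake_status {
	STAT_OK, STAT_SIGNAT_ERR, STAT_TOK_ERR, STAT_BAD_BLK_NUM_ERR,
	STAT_OUT_PTR_ALIGN_ERR, STAT_IN_PTR_ALIGN_ERR, STAT_FW_ERR,
	STAT_IN_LENGTH_ERR, STAT_OUT_LENGTH_ERR, STAT_RES_FLD_NOT_CLR_ERR,
	STAT_CMD_DESCR_ERR,
	STAT_DRIVER_ABORTED = 0xfe,
};

/* Same grouping of cases as in the patch, using the placeholder codes. */
static int status_to_err(unsigned char status)
{
	switch (status) {
	case STAT_OK:
	case STAT_DRIVER_ABORTED:
		return 0;
	case STAT_SIGNAT_ERR:
	case STAT_TOK_ERR:
		return -EBADR;
	case STAT_BAD_BLK_NUM_ERR:
	case STAT_OUT_PTR_ALIGN_ERR:
	case STAT_IN_PTR_ALIGN_ERR:
		return -EFAULT;	/* Bad address */
	case STAT_IN_LENGTH_ERR:
	case STAT_OUT_LENGTH_ERR:
	case STAT_CMD_DESCR_ERR:
	case STAT_RES_FLD_NOT_CLR_ERR:
		return -ENOMSG;
	case STAT_FW_ERR:
		return -EIO;
	default:
		return -EINVAL;
	}
}
```

None of the returned codes equals -1, so a caller can no longer mistake a delivery failure for -EPERM, and the aborted status maps to success as the commit message intends.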



[net 01/11] net/mlx5: Imply MLXFW in mlx5_core

2019-05-17 Thread Saeed Mahameed
mlxfw can be compiled as external module while mlx5_core can be
builtin, in such case mlx5 will act like mlxfw is disabled.

Since mlxfw is just a service library for mlx* drivers,
imply it in mlx5_core to make it always reachable if it was enabled.

Fixes: 3ffaabecd1a1 ("net/mlx5e: Support the flash device ethtool callback")
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 9aca8086ee01..88ccfcfcd128 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -8,6 +8,7 @@ config MLX5_CORE
select NET_DEVLINK
imply PTP_1588_CLOCK
imply VXLAN
+   imply MLXFW
default n
---help---
  Core driver for low level functionality of the ConnectX-4 and
-- 
2.21.0



[pull request][net 00/11] Mellanox, mlx5 fixes 2019-05-17

2019-05-17 Thread Saeed Mahameed
Hi Dave,

This series introduces some fixes to mlx5 driver.
For more information please see tag log below.

Please pull and let me know if there is any problem.

For -stable v4.19
  net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
  net/mlx5: Imply MLXFW in mlx5_core

For -stable v5.0
  net/mlx5e: Add missing ethtool driver info for representors
  net/mlx5e: Additional check for flow destination comparison

For -stable v5.1
  net/mlx5: Fix peer pf disable hca command

Thanks,
Saeed.

---
The following changes since commit 5593530e56943182ebb6d81eca8a3be6db6dbba4:

  Revert "tipc: fix modprobe tipc failed after switch order of device registration" (2019-05-17 12:15:05 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2019-05-17

for you to fetch changes up to e7739a60712a041516f74c8917a0b3e5f1e4f01e:

  net/mlx5e: Fix possible modify header actions memory leak (2019-05-17 13:16:49 -0700)


mlx5-fixes-2019-05-17


Bodong Wang (1):
  net/mlx5: Fix peer pf disable hca command

Dmytro Linkin (2):
  net/mlx5e: Add missing ethtool driver info for representors
  net/mlx5e: Additional check for flow destination comparison

Eli Britstein (3):
  net/mlx5e: Fix number of vports for ingress ACL configuration
  net/mlx5e: Fix no rewrite fields with the same match
  net/mlx5e: Fix possible modify header actions memory leak

Parav Pandit (1):
  net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index

Saeed Mahameed (2):
  net/mlx5: Imply MLXFW in mlx5_core
  net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled

Tariq Toukan (1):
  net/mlx5e: Fix wrong xmit_more application

Valentine Fatiev (1):
  net/mlx5: Add meaningful return codes to status_to_err function

 drivers/infiniband/hw/mlx5/ib_rep.c| 13 ++-
 drivers/infiniband/hw/mlx5/ib_rep.h| 12 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|  1 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 22 +-
 drivers/net/ethernet/mellanox/mlx5/core/ecpf.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 18 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 19 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 27 --
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|  9 
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 20 
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  | 22 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 20 
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  2 ++
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h  |  3 ++-
 include/linux/mlx5/eswitch.h   |  6 ++---
 16 files changed, 136 insertions(+), 62 deletions(-)


[net 10/11] net/mlx5e: Fix no rewrite fields with the same match

2019-05-17 Thread Saeed Mahameed
From: Eli Britstein 

With commit 27c11b6b844c ("net/mlx5e: Do not rewrite fields with the
same match") there are no rewrites if the rewrite value is the same as
the matched value. However, if the field is not matched, the rewrite is
also wrongly skipped. Fix it.

Fixes: 27c11b6b844c ("net/mlx5e: Do not rewrite fields with the same match")
Signed-off-by: Eli Britstein 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 22 ++-
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 542354b5eb4d..4722ac70f0a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1916,6 +1916,19 @@ struct mlx5_fields {
 offsetof(struct pedit_headers, field) + (off), \
 MLX5_BYTE_OFF(fte_match_set_lyr_2_4, match_field)}
 
+/* masked values are the same and there are no rewrites that do not have a
+ * match.
+ */
+#define SAME_VAL_MASK(type, valp, maskp, matchvalp, matchmaskp) ({ \
+   type matchmaskx = *(type *)(matchmaskp); \
+   type matchvalx = *(type *)(matchvalp); \
+   type maskx = *(type *)(maskp); \
+   type valx = *(type *)(valp); \
+   \
+   (valx & maskx) == (matchvalx & matchmaskx) && !(maskx & (maskx ^ \
+matchmaskx)); \
+})
+
 static bool cmp_val_mask(void *valp, void *maskp, void *matchvalp,
 void *matchmaskp, int size)
 {
@@ -1923,16 +1936,13 @@ static bool cmp_val_mask(void *valp, void *maskp, void *matchvalp,
 
switch (size) {
case sizeof(u8):
-   same = ((*(u8 *)valp) & (*(u8 *)maskp)) ==
-  ((*(u8 *)matchvalp) & (*(u8 *)matchmaskp));
+   same = SAME_VAL_MASK(u8, valp, maskp, matchvalp, matchmaskp);
break;
case sizeof(u16):
-   same = ((*(u16 *)valp) & (*(u16 *)maskp)) ==
-  ((*(u16 *)matchvalp) & (*(u16 *)matchmaskp));
+   same = SAME_VAL_MASK(u16, valp, maskp, matchvalp, matchmaskp);
break;
case sizeof(u32):
-   same = ((*(u32 *)valp) & (*(u32 *)maskp)) ==
-  ((*(u32 *)matchvalp) & (*(u32 *)matchmaskp));
+   same = SAME_VAL_MASK(u32, valp, maskp, matchvalp, matchmaskp);
break;
}
 
-- 
2.21.0
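
The difference between the old and new comparison is easiest to see on a field that is rewritten but not matched. A small userspace sketch for the 32-bit case; same_old and same_new are names invented here, and the second one follows the expression in SAME_VAL_MASK from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pre-patch comparison: only the masked values are compared. */
static bool same_old(uint32_t val, uint32_t mask,
		     uint32_t matchval, uint32_t matchmask)
{
	return (val & mask) == (matchval & matchmask);
}

/*
 * Post-patch comparison, following SAME_VAL_MASK: the masked values must
 * agree and every rewritten bit must also be covered by the match mask.
 */
static bool same_new(uint32_t val, uint32_t mask,
		     uint32_t matchval, uint32_t matchmask)
{
	return (val & mask) == (matchval & matchmask) &&
	       !(mask & (mask ^ matchmask));
}
```

Take a rewrite of a field to 0 with mask 0xff while the flow does not match on that field at all (match value and match mask both 0). The old check sees 0 == 0 and reports "same", so the rewrite is skipped. The new check also requires the rewrite mask to be covered by the match mask, which fails here, so the rewrite is kept.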



[net 05/11] net/mlx5e: Fix wrong xmit_more application

2019-05-17 Thread Saeed Mahameed
From: Tariq Toukan 

Cited patch refactored the xmit_more indication while not preserving
its functionality. Fix it.

Fixes: 3c31ff22b25f ("drivers: mellanox: use netdev_xmit_more() helper")
Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   | 9 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h | 3 ++-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 7b61126fcec9..195a7d903cec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -361,7 +361,7 @@ netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct 
sk_buff *skb,
}
 
stats->bytes += num_bytes;
-   stats->xmit_more += netdev_xmit_more();
+   stats->xmit_more += xmit_more;
 
headlen = skb->len - ihs - skb->data_len;
ds_cnt += !!headlen;
@@ -624,7 +624,8 @@ mlx5i_txwqe_build_datagram(struct mlx5_av *av, u32 dqpn, 
u32 dqkey,
 }
 
 netdev_tx_t mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
- struct mlx5_av *av, u32 dqpn, u32 dqkey)
+ struct mlx5_av *av, u32 dqpn, u32 dqkey,
+ bool xmit_more)
 {
struct mlx5_wq_cyc *wq = &sq->wq;
struct mlx5i_tx_wqe *wqe;
@@ -660,7 +661,7 @@ netdev_tx_t mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct 
sk_buff *skb,
}
 
stats->bytes += num_bytes;
-   stats->xmit_more += netdev_xmit_more();
+   stats->xmit_more += xmit_more;
 
headlen = skb->len - ihs - skb->data_len;
ds_cnt += !!headlen;
@@ -705,7 +706,7 @@ netdev_tx_t mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct 
sk_buff *skb,
goto err_drop;
 
mlx5e_txwqe_complete(sq, skb, opcode, ds_cnt, num_wqebbs, num_bytes,
-num_dma, wi, cseg, false);
+num_dma, wi, cseg, xmit_more);
 
return NETDEV_TX_OK;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index ada1b7c0e0b8..9ca492b430d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -619,7 +619,7 @@ static int mlx5i_xmit(struct net_device *dev, struct 
sk_buff *skb,
struct mlx5_ib_ah *mah   = to_mah(address);
struct mlx5i_priv *ipriv = epriv->ppriv;
 
-   return mlx5i_sq_xmit(sq, skb, &mah->av, dqpn, ipriv->qkey);
+   return mlx5i_sq_xmit(sq, skb, &mah->av, dqpn, ipriv->qkey, 
netdev_xmit_more());
 }
 
 static void mlx5i_set_pkey_index(struct net_device *netdev, int id)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
index 9165ca567047..e19ba3fcd1b7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
@@ -119,7 +119,8 @@ static inline void mlx5i_sq_fetch_wqe(struct mlx5e_txqsq 
*sq,
 }
 
 netdev_tx_t mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
- struct mlx5_av *av, u32 dqpn, u32 dqkey);
+ struct mlx5_av *av, u32 dqpn, u32 dqkey,
+ bool xmit_more);
 void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 void mlx5i_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats);
 
-- 
2.21.0



[net 07/11] net/mlx5e: Fix number of vports for ingress ACL configuration

2019-05-17 Thread Saeed Mahameed
From: Eli Britstein 

With the cited commit, ACLs are configured for the VF ports. The loop
over the ports used the total vport count instead of the number of VF
vports. Fix it.

Fixes: 184867373d8c ("net/mlx5e: ACLs for priority tag mode")
Signed-off-by: Eli Britstein 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c   | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 2060456ddcd0..47b446d30f71 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1732,13 +1732,14 @@ static void esw_prio_tag_acls_cleanup(struct 
mlx5_eswitch *esw)
struct mlx5_vport *vport;
int i;
 
-   mlx5_esw_for_each_vf_vport(esw, i, vport, esw->nvports) {
+   mlx5_esw_for_each_vf_vport(esw, i, vport, esw->dev->priv.sriov.num_vfs) 
{
esw_vport_disable_egress_acl(esw, vport);
esw_vport_disable_ingress_acl(esw, vport);
}
 }
 
-static int esw_offloads_steering_init(struct mlx5_eswitch *esw, int nvports)
+static int esw_offloads_steering_init(struct mlx5_eswitch *esw, int vf_nvports,
+ int nvports)
 {
int err;
 
@@ -1746,7 +1747,7 @@ static int esw_offloads_steering_init(struct mlx5_eswitch 
*esw, int nvports)
mutex_init(&esw->fdb_table.offloads.fdb_prio_lock);
 
if (MLX5_CAP_GEN(esw->dev, prio_tag_required)) {
-   err = esw_prio_tag_acls_config(esw, nvports);
+   err = esw_prio_tag_acls_config(esw, vf_nvports);
if (err)
return err;
}
@@ -1839,7 +1840,7 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int 
vf_nvports,
 {
int err;
 
-   err = esw_offloads_steering_init(esw, total_nvports);
+   err = esw_offloads_steering_init(esw, vf_nvports, total_nvports);
if (err)
return err;
 
-- 
2.21.0



[net 08/11] net/mlx5e: Add missing ethtool driver info for representors

2019-05-17 Thread Saeed Mahameed
From: Dmytro Linkin 

Add firmware version info to the ethtool driver info of all
representors.
Add PCI bus info for the uplink representor only, because it is the
only one tied to the PCI device sysfs.

Fixes: ff9b85de5d5d ("net/mlx5e: Add some ethtool port control entries to the 
uplink rep netdev")
Signed-off-by: Dmytro Linkin 
Reviewed-by: Gavi Teitz 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 91e24f1cead8..5283e16c69e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -65,9 +65,26 @@ static void mlx5e_rep_indr_unregister_block(struct 
mlx5e_rep_priv *rpriv,
 static void mlx5e_rep_get_drvinfo(struct net_device *dev,
  struct ethtool_drvinfo *drvinfo)
 {
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+
strlcpy(drvinfo->driver, mlx5e_rep_driver_name,
sizeof(drvinfo->driver));
strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
+   snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
+"%d.%d.%04d (%.16s)",
+fw_rev_maj(mdev), fw_rev_min(mdev),
+fw_rev_sub(mdev), mdev->board_id);
+}
+
+static void mlx5e_uplink_rep_get_drvinfo(struct net_device *dev,
+struct ethtool_drvinfo *drvinfo)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+
+   mlx5e_rep_get_drvinfo(dev, drvinfo);
+   strlcpy(drvinfo->bus_info, pci_name(priv->mdev->pdev),
+   sizeof(drvinfo->bus_info));
 }
 
 static const struct counter_desc sw_rep_stats_desc[] = {
@@ -363,7 +380,7 @@ static const struct ethtool_ops mlx5e_vf_rep_ethtool_ops = {
 };
 
 static const struct ethtool_ops mlx5e_uplink_rep_ethtool_ops = {
-   .get_drvinfo   = mlx5e_rep_get_drvinfo,
+   .get_drvinfo   = mlx5e_uplink_rep_get_drvinfo,
.get_link  = ethtool_op_get_link,
.get_strings   = mlx5e_rep_get_strings,
.get_sset_count= mlx5e_rep_get_sset_count,
-- 
2.21.0



[net 09/11] net/mlx5e: Additional check for flow destination comparison

2019-05-17 Thread Saeed Mahameed
From: Dmytro Linkin 

Flow destination comparison is inaccurate: the code sees no
difference between identical VF ports that belong to different PFs.

Example: when pinging from VF0 (PF1) to VF1 (PF1) while mirroring
all traffic to VF0 (PF2), the ICMP reply to VF0 (PF1) and the mirrored
flow to VF0 (PF2) are treated as the same destination. This leads
to creating a flow handler with rule nodes that are not added to the
node tree. When the driver later tries to delete these flow rules, the
kernel crashes.

Add comparison of vhca_id field to avoid this.

Fixes: 1228e912c934 ("net/mlx5: Consider encapsulation properties when 
comparing destinations")
Signed-off-by: Dmytro Linkin 
Reviewed-by: Roi Dayan 
Reviewed-by: Vlad Buslov 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index fb5b61727ee7..d7ca7e82a832 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1380,6 +1380,8 @@ static bool mlx5_flow_dests_cmp(struct 
mlx5_flow_destination *d1,
if ((d1->type == MLX5_FLOW_DESTINATION_TYPE_VPORT &&
 d1->vport.num == d2->vport.num &&
 d1->vport.flags == d2->vport.flags &&
+((d1->vport.flags & MLX5_FLOW_DEST_VPORT_VHCA_ID) ?
+ (d1->vport.vhca_id == d2->vport.vhca_id) : true) &&
 ((d1->vport.flags & MLX5_FLOW_DEST_VPORT_REFORMAT_ID) ?
  (d1->vport.reformat_id == d2->vport.reformat_id) : true)) 
||
(d1->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE &&
-- 
2.21.0
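
The effect of the added vhca_id comparison can be sketched in user space. This is a simplified illustration only: struct sketch_vport_dest, SKETCH_DEST_VPORT_VHCA_ID and its value are hypothetical stand-ins, not the driver's definitions, and only the vport fields touched by the patch are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag value; the real MLX5_FLOW_DEST_VPORT_VHCA_ID may differ. */
#define SKETCH_DEST_VPORT_VHCA_ID 0x1u

/* Hypothetical, reduced view of a vport flow destination. */
struct sketch_vport_dest {
	uint16_t num;		/* vport number, e.g. VF0 */
	uint32_t flags;		/* which optional fields are valid */
	uint16_t vhca_id;	/* identifies the owning PF when flagged */
};

/* Two vport destinations are equal only if the vport number and flags
 * match and, when the vhca_id flag is set, the vhca_id matches too.
 */
static bool sketch_vport_dests_equal(const struct sketch_vport_dest *d1,
				     const struct sketch_vport_dest *d2)
{
	return d1->num == d2->num &&
	       d1->flags == d2->flags &&
	       ((d1->flags & SKETCH_DEST_VPORT_VHCA_ID) ?
		(d1->vhca_id == d2->vhca_id) : true);
}

/* VF0 on two different PFs must not compare equal. */
static void sketch_vport_dests_demo(void)
{
	struct sketch_vport_dest vf0_pf1 = { 1, SKETCH_DEST_VPORT_VHCA_ID, 10 };
	struct sketch_vport_dest vf0_pf2 = { 1, SKETCH_DEST_VPORT_VHCA_ID, 20 };

	assert(!sketch_vport_dests_equal(&vf0_pf1, &vf0_pf2));
	assert(sketch_vport_dests_equal(&vf0_pf1, &vf0_pf1));
}
```

Without the vhca_id term, the two destinations in the demo would compare equal, which corresponds to the scenario described in the commit message.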



[net 06/11] net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled

2019-05-17 Thread Saeed Mahameed
ethtool user space needs to know the ring count via ETHTOOL_GRXRINGS when
executing "ethtool -x". The ring count is retrieved via the ethtool
get_rxnfc callback, which mlx5 disables when CONFIG_MLX5_EN_RXNFC=n.

This patch allows only the ETHTOOL_GRXRINGS command in mlx5e_get_rxnfc()
when CONFIG_MLX5_EN_RXNFC is disabled, so ethtool -x keeps working.

Fixes: fe6d86b3c316 ("net/mlx5e: Add CONFIG_MLX5_EN_RXNFC for ethtool rx nfc")
Signed-off-by: Saeed Mahameed 
---
 .../ethernet/mellanox/mlx5/core/en_ethtool.c   | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 7efaa58ae034..dd764e0471f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1901,6 +1901,22 @@ static int mlx5e_flash_device(struct net_device *dev,
return mlx5e_ethtool_flash_device(priv, flash);
 }
 
+#ifndef CONFIG_MLX5_EN_RXNFC
+/* When CONFIG_MLX5_EN_RXNFC=n we only support ETHTOOL_GRXRINGS
+ * otherwise this function will be defined from en_fs_ethtool.c
+ */
+static int mlx5e_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info, 
u32 *rule_locs)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+
+   if (info->cmd != ETHTOOL_GRXRINGS)
+   return -EOPNOTSUPP;
+   /* ring_count is needed by ethtool -x */
+   info->data = priv->channels.params.num_channels;
+   return 0;
+}
+#endif
+
 const struct ethtool_ops mlx5e_ethtool_ops = {
.get_drvinfo   = mlx5e_get_drvinfo,
.get_link  = ethtool_op_get_link,
@@ -1919,8 +1935,8 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.get_rxfh_indir_size = mlx5e_get_rxfh_indir_size,
.get_rxfh  = mlx5e_get_rxfh,
.set_rxfh  = mlx5e_set_rxfh,
-#ifdef CONFIG_MLX5_EN_RXNFC
.get_rxnfc = mlx5e_get_rxnfc,
+#ifdef CONFIG_MLX5_EN_RXNFC
.set_rxnfc = mlx5e_set_rxnfc,
 #endif
.flash_device  = mlx5e_flash_device,
-- 
2.21.0



[PATCH 2/3] net: 8390: switch X-Surf 100 driver to use ax88796b PHY

2019-05-17 Thread Michael Schmitz
The asix.c driver name causes a module name conflict with a driver
of the same name in drivers/net/usb. Select the new ax88796b PHY
driver when X-SURF 100 support is configured.

Signed-off-by: Michael Schmitz 
Fixes: 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver")
---
 drivers/net/ethernet/8390/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/8390/Kconfig 
b/drivers/net/ethernet/8390/Kconfig
index f2f0264..443b34e 100644
--- a/drivers/net/ethernet/8390/Kconfig
+++ b/drivers/net/ethernet/8390/Kconfig
@@ -49,7 +49,7 @@ config XSURF100
tristate "Amiga XSurf 100 AX88796/NE2000 clone support"
depends on ZORRO
select AX88796
-   select ASIX_PHY
+   select AX88796B_PHY
help
  This driver is for the Individual Computers X-Surf 100 Ethernet
  card (based on the Asix AX88796 chip). If you have such a card,
-- 
1.9.1



[PATCH 3/3] net: phy: remove old Asix Electronics PHY driver

2019-05-17 Thread Michael Schmitz
The asix.c driver name causes a module name conflict with a driver
of the same name in drivers/net/usb. Now that a new ax88796b.c driver
has been added, remove drivers/net/phy/asix.c.

Signed-off-by: Michael Schmitz 
Fixes: 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver")
---
 drivers/net/phy/Kconfig  |  6 -
 drivers/net/phy/Makefile |  1 -
 drivers/net/phy/asix.c   | 57 
 3 files changed, 64 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 1647473..5496e5c 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -253,12 +253,6 @@ config AQUANTIA_PHY
---help---
  Currently supports the Aquantia AQ1202, AQ2104, AQR105, AQR405
 
-config ASIX_PHY
-   tristate "Asix PHYs"
-   help
- Currently supports the Asix Electronics PHY found in the X-Surf 100
- AX88796B package.
-
 config AX88796B_PHY
tristate "Asix PHYs"
help
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index cc5758a..5b5c866 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -52,7 +52,6 @@ ifdef CONFIG_HWMON
 aquantia-objs  += aquantia_hwmon.o
 endif
 obj-$(CONFIG_AQUANTIA_PHY) += aquantia.o
-obj-$(CONFIG_ASIX_PHY) += asix.o
 obj-$(CONFIG_AX88796B_PHY) += ax88796b.o
 obj-$(CONFIG_AT803X_PHY)   += at803x.o
 obj-$(CONFIG_BCM63XX_PHY)  += bcm63xx.o
diff --git a/drivers/net/phy/asix.c b/drivers/net/phy/asix.c
deleted file mode 100644
index 79bf7ef..000
--- a/drivers/net/phy/asix.c
+++ /dev/null
@@ -1,57 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0+
-/* Driver for Asix PHYs
- *
- * Author: Michael Schmitz 
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define PHY_ID_ASIX_AX88796B   0x003b1841
-
-MODULE_DESCRIPTION("Asix PHY driver");
-MODULE_AUTHOR("Michael Schmitz ");
-MODULE_LICENSE("GPL");
-
-/**
- * asix_soft_reset - software reset the PHY via BMCR_RESET bit
- * @phydev: target phy_device struct
- *
- * Description: Perform a software PHY reset using the standard
- * BMCR_RESET bit and poll for the reset bit to be cleared.
- * Toggle BMCR_RESET bit off to accommodate broken AX8796B PHY implementation
- * such as used on the Individual Computers' X-Surf 100 Zorro card.
- *
- * Returns: 0 on success, < 0 on failure
- */
-static int asix_soft_reset(struct phy_device *phydev)
-{
-   int ret;
-
-   /* Asix PHY won't reset unless reset bit toggles */
-   ret = phy_write(phydev, MII_BMCR, 0);
-   if (ret < 0)
-   return ret;
-
-   return genphy_soft_reset(phydev);
-}
-
-static struct phy_driver asix_driver[] = { {
-   .phy_id = PHY_ID_ASIX_AX88796B,
-   .name   = "Asix Electronics AX88796B",
-   .phy_id_mask= 0xfff0,
-   /* PHY_BASIC_FEATURES */
-   .soft_reset = asix_soft_reset,
-} };
-
-module_phy_driver(asix_driver);
-
-static struct mdio_device_id __maybe_unused asix_tbl[] = {
-   { PHY_ID_ASIX_AX88796B, 0xfff0 },
-   { }
-};
-
-MODULE_DEVICE_TABLE(mdio, asix_tbl);
-- 
1.9.1



[PATCH 1/3] net: phy: new ax88796b.c Asix Electronics PHY driver

2019-05-17 Thread Michael Schmitz
The asix.c driver name causes a module name conflict with a driver
of the same name in drivers/net/usb. Add new ax88796b.c driver to
prepare for removal of drivers/net/phy/asix.c later.

Signed-off-by: Michael Schmitz 
Fixes: 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver")
---
 drivers/net/phy/Kconfig|  6 +
 drivers/net/phy/Makefile   |  1 +
 drivers/net/phy/ax88796b.c | 57 ++
 3 files changed, 64 insertions(+)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index d629971..1647473 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -259,6 +259,12 @@ config ASIX_PHY
  Currently supports the Asix Electronics PHY found in the X-Surf 100
  AX88796B package.
 
+config AX88796B_PHY
+   tristate "Asix PHYs"
+   help
+ Currently supports the Asix Electronics PHY found in the X-Surf 100
+ AX88796B package.
+
 config AT803X_PHY
tristate "AT803X PHYs"
---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 27d7f9f..cc5758a 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -53,6 +53,7 @@ aquantia-objs += aquantia_hwmon.o
 endif
 obj-$(CONFIG_AQUANTIA_PHY) += aquantia.o
 obj-$(CONFIG_ASIX_PHY) += asix.o
+obj-$(CONFIG_AX88796B_PHY) += ax88796b.o
 obj-$(CONFIG_AT803X_PHY)   += at803x.o
 obj-$(CONFIG_BCM63XX_PHY)  += bcm63xx.o
 obj-$(CONFIG_BCM7XXX_PHY)  += bcm7xxx.o
diff --git a/drivers/net/phy/ax88796b.c b/drivers/net/phy/ax88796b.c
new file mode 100644
index 000..79bf7ef
--- /dev/null
+++ b/drivers/net/phy/ax88796b.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Driver for Asix PHYs
+ *
+ * Author: Michael Schmitz 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PHY_ID_ASIX_AX88796B   0x003b1841
+
+MODULE_DESCRIPTION("Asix PHY driver");
+MODULE_AUTHOR("Michael Schmitz ");
+MODULE_LICENSE("GPL");
+
+/**
+ * asix_soft_reset - software reset the PHY via BMCR_RESET bit
+ * @phydev: target phy_device struct
+ *
+ * Description: Perform a software PHY reset using the standard
+ * BMCR_RESET bit and poll for the reset bit to be cleared.
+ * Toggle BMCR_RESET bit off to accommodate broken AX8796B PHY implementation
+ * such as used on the Individual Computers' X-Surf 100 Zorro card.
+ *
+ * Returns: 0 on success, < 0 on failure
+ */
+static int asix_soft_reset(struct phy_device *phydev)
+{
+   int ret;
+
+   /* Asix PHY won't reset unless reset bit toggles */
+   ret = phy_write(phydev, MII_BMCR, 0);
+   if (ret < 0)
+   return ret;
+
+   return genphy_soft_reset(phydev);
+}
+
+static struct phy_driver asix_driver[] = { {
+   .phy_id = PHY_ID_ASIX_AX88796B,
+   .name   = "Asix Electronics AX88796B",
+   .phy_id_mask= 0xfff0,
+   /* PHY_BASIC_FEATURES */
+   .soft_reset = asix_soft_reset,
+} };
+
+module_phy_driver(asix_driver);
+
+static struct mdio_device_id __maybe_unused asix_tbl[] = {
+   { PHY_ID_ASIX_AX88796B, 0xfff0 },
+   { }
+};
+
+MODULE_DEVICE_TABLE(mdio, asix_tbl);
-- 
1.9.1



[PATCH 0/3] resolve module name conflict for asix PHY and USB modules

2019-05-17 Thread Michael Schmitz
Haven't heard back in a while, so here goes: 

Commit 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver")
introduced a new PHY driver drivers/net/phy/asix.c that causes a module
name conflict with a pre-existing driver (drivers/net/usb/asix.c).

The PHY driver is used by the X-Surf 100 ethernet card driver, and loaded
by that driver via its PHY ID. A rename of the driver looks unproblematic.
 
Rename PHY driver to ax88796b.c in order to resolve name conflict. 

Fixes: 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver")

Michael Schmitz (3):
  net: phy: new ax88796b.c Asix Electronics PHY driver
  net: 8390: switch X-Surf 100 driver to use ax88796b PHY
  net: phy: remove old Asix Electronics PHY driver

 drivers/net/ethernet/8390/Kconfig |  2 +-
 drivers/net/phy/Kconfig   |  2 +-
 drivers/net/phy/Makefile  |  2 +-
 drivers/net/phy/asix.c| 57 ---
 drivers/net/phy/ax88796b.c| 57 +++
 5 files changed, 60 insertions(+), 60 deletions(-)
 delete mode 100644 drivers/net/phy/asix.c
 create mode 100644 drivers/net/phy/ax88796b.c

-- 
1.9.1



Re: 5.1 `ip route get addr/cidr` regression

2019-05-17 Thread Jason A. Donenfeld
On Fri, May 17, 2019 at 10:19 PM Jason A. Donenfeld  wrote:
>
> On Fri, May 17, 2019 at 7:39 PM David Ahern  wrote:
> > Not sure why Jason is not seeing that. Really odd that he hits the error
> > AND does not get a message back since it requires an updated ip command
> > to set the strict checking flag and that command understands extack.
> > Perhaps no libmnl?
>
> Right, no libmnl. This is coming out of the iproute2 compiled for the
> tests at https://www.wireguard.com/build-status/ which are pretty
> minimal. Extack support would be kind of useful for diagnostics, and
> wg(8) already uses it, so I can probably put that in my build system.

Voila, extack:

+ ip link add wg0 type dummy
+ ip addr add 192.168.4.2/24 dev wg0
+ ip link set wg0 up
+ ip route get 192.168.4.0/24
Error: ipv4: Invalid values in header for route


Re: [PATCH] net/mlx5e: restrict the real_dev of vlan device is the same as uplink device

2019-05-17 Thread Saeed Mahameed
On Wed, 2019-05-15 at 17:25 +0800, we...@ucloud.cn wrote:
> From: wenxu 
> 
> When register indr block for vlan device, it should check the
> real_dev
> of vlan device is same as uplink device. Or it will set offload rule
> to mlx5e which will never hit.
> 

I would improve the commit message; it is not really clear to me what
is going on here.

Anyway, Roi and team, can you please provide feedback?

> Signed-off-by: wenxu 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 91e24f1..a39fdac 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -796,7 +796,7 @@ static int mlx5e_nic_rep_netdevice_event(struct
> notifier_block *nb,
>   struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
>  
>   if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
> - !is_vlan_dev(netdev))
> + !(is_vlan_dev(netdev) && vlan_dev_real_dev(netdev) ==
> rpriv->netdev))
>   return NOTIFY_OK;
>  
>   switch (event) {


Re: [PATCH net-next RFC] ipv6: elide flowlabel check if no exclusive leases exist

2019-05-17 Thread Eric Dumazet



On 5/17/19 8:56 AM, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
> If not set, by default an autogenerated flowlabel is selected.
> 
> Explicit flowlabels require a control operation per label plus a
> datapath check on every connection (every datagram if unconnected).
> 
> This is particularly expensive on unconnected sockets with many
> connections, such as QUIC.
> 
> In the common case, where no lease is exclusive, the check can be
> safely elided, as both lease request and check trivially succeed.
> Indeed, autoflowlabel does the same (even with exclusive leases).
> 
> Elide the check if no process has requested an exclusive lease.
> 
> This is an optimization. Robust applications still have to revert to
> requesting leases if the fast path fails due to an exclusive lease.
> 
> This is decidedly an RFC patch:
> - need to update all fl6_sock_lookup callers, not just udp
> - behavior should be per-netns isolated
> 
> Other approaches considered:
> - a single "get all flowlabels, non-exclusive" flowlabel get request
>   if set, elide fl6_sock_lookup and fail exclusive lease requests
> 
> - sysctls (only useful if on by default, with static_branch)
>   A) "non-exclusive mode", failing all exclusive lease requests:
>  processes already have to be robust against lease failure
>   B) just bypass check in fl6_sock_lookup, like autoflowlabel
> 
> Signed-off-by: Willem de Bruijn 
> ---
>  include/net/ipv6.h   | 11 +++
>  net/ipv6/ip6_flowlabel.c |  6 ++
>  net/ipv6/udp.c   |  8 
>  3 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index daf80863d3a50..8881cee572410 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -343,7 +344,17 @@ static inline void txopt_put(struct ipv6_txoptions *opt)
>   kfree_rcu(opt, rcu);
>  }
>  
> +extern struct static_key_false ipv6_flowlabel_exclusive;
>  struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label);
> +static inline struct ip6_flowlabel *fl6_sock_verify(struct sock *sk,
> + __be32 label)
> +{
> + if (static_branch_unlikely(&ipv6_flowlabel_exclusive))
> + return fl6_sock_lookup(sk, label) ? : ERR_PTR(-ENOENT);
> +
> + return NULL;
> +}
> +
>  struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
>struct ip6_flowlabel *fl,
>struct ipv6_txoptions *fopt);
> diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> index be5f3d7ceb966..d5f4233b04e0c 100644
> --- a/net/ipv6/ip6_flowlabel.c
> +++ b/net/ipv6/ip6_flowlabel.c
> @@ -57,6 +57,8 @@ static DEFINE_SPINLOCK(ip6_fl_lock);
>  
>  static DEFINE_SPINLOCK(ip6_sk_fl_lock);
>  
> +DEFINE_STATIC_KEY_FALSE(ipv6_flowlabel_exclusive);
> +
>  #define for_each_fl_rcu(hash, fl)\
>   for (fl = rcu_dereference_bh(fl_ht[(hash)]);\
>fl != NULL;\
> @@ -98,6 +100,8 @@ static void fl_free_rcu(struct rcu_head *head)
>  {
>   struct ip6_flowlabel *fl = container_of(head, struct ip6_flowlabel, 
> rcu);
>  
> + if (fl->share != IPV6_FL_S_NONE && fl->share != IPV6_FL_S_ANY)
> + static_branch_dec(&ipv6_flowlabel_exclusive);

static_branch_dec() cannot be invoked from an RCU callback.

>   if (fl->share == IPV6_FL_S_PROCESS)
>   put_pid(fl->owner.pid);
>   kfree(fl->opt);
> @@ -423,6 +427,8 @@ fl_create(struct net *net, struct sock *sk, struct 
> in6_flowlabel_req *freq,
>   }
>   fl->dst = freq->flr_dst;
>   atomic_set(&fl->users, 1);
> + if (fl->share != IPV6_FL_S_ANY)
> + static_branch_inc(&ipv6_flowlabel_exclusive);


Can this be used by unprivileged users?

If yes, then you want to use static_key_false_deferred instead.




Re: [PATCH v2] net/mlx5e: Add bonding device for indr block to offload the packet received from bonding device

2019-05-17 Thread Saeed Mahameed
On Fri, 2019-05-17 at 17:17 +0800, we...@ucloud.cn wrote:
> From: wenxu 
> 
> The mlx5e support the lag mode. When add mlx_p0 and mlx_p1 to bond0.
> packet received from mlx_p0 or mlx_p1 and in the ingress tc flower
> forward to vf0. The tc rule can't be offloaded because there is
> no indr_register_block for the bonding device.
> 
> Signed-off-by: wenxu 

Hi Wenxu, thanks for the patch.

I would like to wait for some feedback from Roi and his team.
Guys, can you please provide feedback?

Thanks,
Saeed

> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 91e24f1..134fa0b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -796,6 +796,7 @@ static int mlx5e_nic_rep_netdevice_event(struct
> notifier_block *nb,
>   struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
>  
>   if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
> + !netif_is_bond_master(netdev) &&
>   !is_vlan_dev(netdev))
>   return NOTIFY_OK;
>  


[PATCH 0/1] Fix for VM_FLUSH_RESET_PERMS on sparc

2019-05-17 Thread Rick Edgecombe
Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on the sparc
architecture. When freeing many BPF JITs simultaneously, the vfree flush
operations can become stuck waiting as they each try to vm_unmap_aliases().

It also came up that this flag is not needed on architectures like sparc, where
normal kernel memory is already executable. This patch fixes the usage of the
flag on sparc anyway, in case the root cause also affects other architectures.
Separately, we can disable usage of VM_FLUSH_RESET_PERMS for these
architectures if desired.

Rick Edgecombe (1):
  vmalloc: Fix issues with flush flag

 mm/vmalloc.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

-- 
2.17.1



[PATCH 1/1] vmalloc: Fix issues with flush flag

2019-05-17 Thread Rick Edgecombe
Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on the
sparc architecture.

When freeing many BPF JITs at once, the free operations can become stuck
waiting for locks as they each try to vm_unmap_aliases(). Calls to this
function happen frequently on some archs, but in vmalloc itself the lazy
purge operations happen more rarely, so only in extreme cases could
multiple purges be happening at once. Since this is cross-platform code, we
shouldn't do this here where it could happen concurrently in a burst, and
should instead just flush the TLB. Also, add a little logic to skip calls to
page_address() when possible to further speed this up, since they may take
locks on some archs.

Lastly, it appears that the calculation of the address range to flush
was broken at some point, so fix that as well.

Fixes: 868b104d7379 ("mm/vmalloc: Add flag for freeing of special permsissions")
Reported-by: Meelis Roos 
Cc: Meelis Roos 
Cc: Peter Zijlstra 
Cc: "David S. Miller" 
Cc: Dave Hansen 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Ingo Molnar 
Cc: Nadav Amit 
Signed-off-by: Rick Edgecombe 
---
 mm/vmalloc.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 67bbb8d2a0a8..5daa7ec8950f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1531,9 +1531,10 @@ static inline void set_area_direct_map(const struct 
vm_struct *area,
 /* Handle removing and resetting vm mappings related to the vm_struct. */
 static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 {
+   const bool has_set_direct = IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP);
+   const bool flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
unsigned long addr = (unsigned long)area->addr;
-   unsigned long start = ULONG_MAX, end = 0;
-   int flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
+   unsigned long start = addr, end = addr + get_vm_area_size(area);
int i;
 
/*
@@ -1542,7 +1543,7 @@ static void vm_remove_mappings(struct vm_struct *area, 
int deallocate_pages)
 * This is concerned with resetting the direct map any an vm alias with
 * execute permissions, without leaving a RW+X window.
 */
-   if (flush_reset && !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
+   if (flush_reset && !has_set_direct) {
set_memory_nx(addr, area->nr_pages);
set_memory_rw(addr, area->nr_pages);
}
@@ -1555,22 +1556,24 @@ static void vm_remove_mappings(struct vm_struct *area, 
int deallocate_pages)
 
/*
 * If not deallocating pages, just do the flush of the VM area and
-* return.
+* return. If the arch doesn't have set_direct_map_(), also skip the
+* below work.
 */
-   if (!deallocate_pages) {
-   vm_unmap_aliases();
+   if (!deallocate_pages || !has_set_direct) {
+   flush_tlb_kernel_range(addr, get_vm_area_size(area));
return;
}
 
/*
 * If execution gets here, flush the vm mapping and reset the direct
 * map. Find the start and end range of the direct mappings to make sure
-* the vm_unmap_aliases() flush includes the direct map.
+* the flush_tlb_kernel_range() includes the direct map.
 */
for (i = 0; i < area->nr_pages; i++) {
-   if (page_address(area->pages[i])) {
+   addr = (unsigned long)page_address(area->pages[i]);
+   if (addr) {
start = min(addr, start);
-   end = max(addr, end);
+   end = max(addr + PAGE_SIZE, end);
}
}
 
@@ -1580,7 +1583,7 @@ static void vm_remove_mappings(struct vm_struct *area, 
int deallocate_pages)
 * reset the direct map permissions to the default.
 */
set_area_direct_map(area, set_direct_map_invalid_noflush);
-   _vm_unmap_aliases(start, end, 1);
+   flush_tlb_kernel_range(start, end);
set_area_direct_map(area, set_direct_map_default_noflush);
 }
 
-- 
2.17.1
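
The corrected range calculation can be illustrated in user space. This is a sketch only, assuming 4 KiB pages: struct sketch_range and sketch_flush_range() are hypothetical names, and the page_addrs array stands in for the page_address() result of each page, with 0 meaning the page has no direct-map address.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Half-open address range [start, end) to flush. */
struct sketch_range {
	unsigned long start;
	unsigned long end;
};

/* Start from the vm area itself, then widen the range so that it also
 * covers every page's direct-map alias, including the page's last byte.
 */
static struct sketch_range sketch_flush_range(unsigned long vm_addr,
					      unsigned long vm_size,
					      const unsigned long *page_addrs,
					      size_t nr_pages)
{
	struct sketch_range r = { vm_addr, vm_addr + vm_size };
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		unsigned long addr = page_addrs[i];

		if (!addr)	/* no direct-map address for this page */
			continue;
		if (addr < r.start)
			r.start = addr;
		if (addr + SKETCH_PAGE_SIZE > r.end)
			r.end = addr + SKETCH_PAGE_SIZE;
	}
	return r;
}

/* A two-page vm area whose pages sit below it in the direct map. */
static void sketch_flush_range_demo(void)
{
	const unsigned long pages[] = { 0x5000, 0, 0x9000 };
	struct sketch_range r = sketch_flush_range(0x100000, 0x2000, pages, 3);

	assert(r.start == 0x5000);
	assert(r.end == 0x102000);
}
```

Compared with the removed lines in the diff, the loop here uses each page's own address rather than the vm area address, and the end of the range includes the full last page.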



Re: [PATCH 0/3] resolve module name conflict for asix PHY and USB modules

2019-05-17 Thread Andrew Lunn
On Sat, May 18, 2019 at 08:25:15AM +1200, Michael Schmitz wrote:
> Haven't heard back in a while, so here goes: 
> 
> Commit 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver")
> introduced a new PHY driver drivers/net/phy/asix.c that causes a module
> name conflict with a pre-existing driver (drivers/net/usb/asix.c). 
> 
> The PHY driver is used by the X-Surf 100 ethernet card driver, and loaded
> by that driver via its PHY ID. A rename of the driver looks unproblematic.
>  
> Rename PHY driver to ax88796b.c in order to resolve name conflict. 

Hi Michael

Please just use git mv and do it all in one patch. That makes it clear
you have not changed the driver, just renamed it.

Thanks
Andrew


[PATCH bpf] bpf: Check sk_fullsock() before returning from bpf_sk_lookup()

2019-05-17 Thread Martin KaFai Lau
The BPF_FUNC_sk_lookup_xxx helpers return RET_PTR_TO_SOCKET_OR_NULL.
Meaning a fullsock ptr and its fullsock's fields in bpf_sock can be
accessed, e.g. type, protocol, mark and priority.
Some new helpers, like bpf_sk_storage_get(), also expect
ARG_PTR_TO_SOCKET to be a fullsock.

bpf_sk_lookup() currently calls sk_to_full_sk() before returning.
However, the ptr returned from sk_to_full_sk() is not guaranteed
to be a fullsock.  For example, it cannot get a fullsock if sk
is in TCP_TIME_WAIT.

This patch checks for sk_fullsock() before returning. If it is not
a fullsock, sock_gen_put() is called if needed and then returns NULL.

Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
Cc: Joe Stringer 
Signed-off-by: Martin KaFai Lau 
---
 net/core/filter.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 55bfc941d17a..85def5a20aaf 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5337,8 +5337,14 @@ __bpf_sk_lookup(struct sk_buff *skb, struct 
bpf_sock_tuple *tuple, u32 len,
struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
   ifindex, proto, netns_id, flags);
 
-   if (sk)
+   if (sk) {
sk = sk_to_full_sk(sk);
+   if (!sk_fullsock(sk)) {
+   if (!sock_flag(sk, SOCK_RCU_FREE))
+   sock_gen_put(sk);
+   return NULL;
+   }
+   }
 
return sk;
 }
@@ -5369,8 +5375,14 @@ bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple 
*tuple, u32 len,
struct sock *sk = bpf_skc_lookup(skb, tuple, len, proto, netns_id,
 flags);
 
-   if (sk)
+   if (sk) {
sk = sk_to_full_sk(sk);
+   if (!sk_fullsock(sk)) {
+   if (!sock_flag(sk, SOCK_RCU_FREE))
+   sock_gen_put(sk);
+   return NULL;
+   }
+   }
 
return sk;
 }
-- 
2.17.1
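
The control flow of the fix can be modelled in userspace as below. This is only a sketch: `struct sketch_sock`, `sketch_fullsock()` and `sketch_sk_lookup()` are hypothetical stand-ins, the fullsock test is reduced to "not TIME_WAIT", and the sk_to_full_sk() conversion is left out. The point it shows is that a non-fullsock result must drop the reference the lookup took (unless the socket is RCU-freed and no reference was taken) before NULL is returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_state { SKETCH_ESTABLISHED, SKETCH_TIME_WAIT };

struct sketch_sock {
	enum sketch_state state;
	bool rcu_free;	/* models SOCK_RCU_FREE: lookup takes no reference */
	int refcnt;
};

/* Simplified stand-in for sk_fullsock(): TIME_WAIT is not a full socket. */
static bool sketch_fullsock(const struct sketch_sock *sk)
{
	return sk->state != SKETCH_TIME_WAIT;
}

/*
 * Hypothetical model of the patched lookup: 'found' is whatever the
 * underlying lookup matched (may be NULL).
 */
static struct sketch_sock *sketch_sk_lookup(struct sketch_sock *found)
{
	if (!found)
		return NULL;

	/* the underlying lookup holds a reference unless RCU-freed */
	if (!found->rcu_free)
		found->refcnt++;

	if (!sketch_fullsock(found)) {
		/* give the reference back before reporting "no socket" */
		if (!found->rcu_free)
			found->refcnt--;
		return NULL;
	}
	return found;
}
```

A caller that gets NULL has nothing to release; a caller that gets a pointer owns one reference on a socket whose fullsock fields are safe to read.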



Re: [PATCH bpf-next 0/2] Move bpf_printk to bpf_helpers.h

2019-05-17 Thread Michal Rostecki
On Thu, May 16, 2019 at 11:43:03AM -0700, Alexei Starovoitov wrote:
> On Thu, May 16, 2019 at 4:21 AM Michal Rostecki  
> wrote:
> >
> > This series of patches moves the commonly used bpf_printk macro to
> > bpf_helpers.h which is already included in all BPF programs which
> > defined that macro on their own.
> 
> makes sense, but it needs to wait until bpf-next reopens.

Sorry for that! Please apply the v2 patch when bpf-next reopens.


Re: [PATCH bpf] bpf: Check sk_fullsock() before returning from bpf_sk_lookup()

2019-05-17 Thread Eric Dumazet



On 5/17/19 2:21 PM, Martin KaFai Lau wrote:
> The BPF_FUNC_sk_lookup_xxx helpers return RET_PTR_TO_SOCKET_OR_NULL.
> Meaning a fullsock ptr and its fullsock's fields in bpf_sock can be
> accessed, e.g. type, protocol, mark and priority.
> Some new helpers, like bpf_sk_storage_get(), also expect
> ARG_PTR_TO_SOCKET to be a fullsock.
> 
> bpf_sk_lookup() currently calls sk_to_full_sk() before returning.
> However, the ptr returned from sk_to_full_sk() is not guaranteed
> to be a fullsock.  For example, it cannot get a fullsock if sk
> is in TCP_TIME_WAIT.
> 
> This patch checks for sk_fullsock() before returning. If it is not
> a fullsock, sock_gen_put() is called if needed and then returns NULL.
> 
> Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
> Cc: Joe Stringer 
> Signed-off-by: Martin KaFai Lau 
> ---
>  net/core/filter.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 55bfc941d17a..85def5a20aaf 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5337,8 +5337,14 @@ __bpf_sk_lookup(struct sk_buff *skb, struct 
> bpf_sock_tuple *tuple, u32 len,
>   struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
>  ifindex, proto, netns_id, flags);
>  
> - if (sk)
> + if (sk) {
>   sk = sk_to_full_sk(sk);
> + if (!sk_fullsock(sk)) {
> + if (!sock_flag(sk, SOCK_RCU_FREE))
> + sock_gen_put(sk);

This looks a bit convoluted/weird.

What about telling/asking __bpf_skc_lookup() to not return a non fullsock 
instead ?

> + return NULL;
> + }
> + }
>  
>   return sk;
>  }
> @@ -5369,8 +5375,14 @@ bpf_sk_lookup(struct sk_buff *skb, struct 
> bpf_sock_tuple *tuple, u32 len,
>   struct sock *sk = bpf_skc_lookup(skb, tuple, len, proto, netns_id,
>flags);
>  
> - if (sk)
> + if (sk) {
>   sk = sk_to_full_sk(sk);
> + if (!sk_fullsock(sk)) {
> + if (!sock_flag(sk, SOCK_RCU_FREE))
> + sock_gen_put(sk);
> + return NULL;
> + }
> + }
>  
>   return sk;
>  }
> 


Re: [PATCH net-next RFC] ipv6: elide flowlabel check if no exclusive leases exist

2019-05-17 Thread Willem de Bruijn
On Fri, May 17, 2019 at 4:32 PM Eric Dumazet  wrote:
>
>
>
> On 5/17/19 8:56 AM, Willem de Bruijn wrote:
> > From: Willem de Bruijn 
> >
> > Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
> > If not set, by default an autogenerated flowlabel is selected.
> >
> > Explicit flowlabels require a control operation per label plus a
> > datapath check on every connection (every datagram if unconnected).
> >
> > This is particularly expensive on unconnected sockets with many
> > connections, such as QUIC.
> >
> > In the common case, where no lease is exclusive, the check can be
> > safely elided, as both lease request and check trivially succeed.
> > Indeed, autoflowlabel does the same (even with exclusive leases).
> >
> > Elide the check if no process has requested an exclusive lease.
> >
> > This is an optimization. Robust applications still have to revert to
> > requesting leases if the fast path fails due to an exclusive lease.
> >
> > This is decidedly an RFC patch:
> > - need to update all fl6_sock_lookup callers, not just udp
> > - behavior should be per-netns isolated
> >
> > Other approaches considered:
> > - a single "get all flowlabels, non-exclusive" flowlabel get request
> >   if set, elide fl6_sock_lookup and fail exclusive lease requests
> >
> > - sysctls (only useful if on by default, with static_branch)
> >   A) "non-exclusive mode", failing all exclusive lease requests:
> >  processes already have to be robust against lease failure
> >   B) just bypass check in fl6_sock_lookup, like autoflowlabel
> >
> > Signed-off-by: Willem de Bruijn 
> > ---
> >  include/net/ipv6.h   | 11 +++
> >  net/ipv6/ip6_flowlabel.c |  6 ++
> >  net/ipv6/udp.c   |  8 
> >  3 files changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> > index daf80863d3a50..8881cee572410 100644
> > --- a/include/net/ipv6.h
> > +++ b/include/net/ipv6.h
> > @@ -17,6 +17,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -343,7 +344,17 @@ static inline void txopt_put(struct ipv6_txoptions 
> > *opt)
> >   kfree_rcu(opt, rcu);
> >  }
> >
> > +extern struct static_key_false ipv6_flowlabel_exclusive;
> >  struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label);
> > +static inline struct ip6_flowlabel *fl6_sock_verify(struct sock *sk,
> > + __be32 label)
> > +{
> > + if (static_branch_unlikely(&ipv6_flowlabel_exclusive))
> > + return fl6_sock_lookup(sk, label) ? : ERR_PTR(-ENOENT);
> > +
> > + return NULL;
> > +}
> > +
> >  struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
> >struct ip6_flowlabel *fl,
> >struct ipv6_txoptions *fopt);
> > diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> > index be5f3d7ceb966..d5f4233b04e0c 100644
> > --- a/net/ipv6/ip6_flowlabel.c
> > +++ b/net/ipv6/ip6_flowlabel.c
> > @@ -57,6 +57,8 @@ static DEFINE_SPINLOCK(ip6_fl_lock);
> >
> >  static DEFINE_SPINLOCK(ip6_sk_fl_lock);
> >
> > +DEFINE_STATIC_KEY_FALSE(ipv6_flowlabel_exclusive);
> > +
> >  #define for_each_fl_rcu(hash, fl)\
> >   for (fl = rcu_dereference_bh(fl_ht[(hash)]);\
> >fl != NULL;\
> > @@ -98,6 +100,8 @@ static void fl_free_rcu(struct rcu_head *head)
> >  {
> >   struct ip6_flowlabel *fl = container_of(head, struct ip6_flowlabel, 
> > rcu);
> >
> > + if (fl->share != IPV6_FL_S_NONE && fl->share != IPV6_FL_S_ANY)
> > + static_branch_dec(&ipv6_flowlabel_exclusive);
>
> static_branch_dec() can not be invoked from a rcu call back.
>
> >   if (fl->share == IPV6_FL_S_PROCESS)
> >   put_pid(fl->owner.pid);
> >   kfree(fl->opt);
> > @@ -423,6 +427,8 @@ fl_create(struct net *net, struct sock *sk, struct 
> > in6_flowlabel_req *freq,
> >   }
> >   fl->dst = freq->flr_dst;
> >   atomic_set(&fl->users, 1);
> > + if (fl->share != IPV6_FL_S_ANY)
> > + static_branch_inc(&ipv6_flowlabel_exclusive);
>
>
> Can this be used by unpriv users ?
>
> If yes, then you want to use static_key_false_deferred instead

Ah of course. Yes, any user can exercise this API. Thanks, Eric. I'll
take a look at both points.
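
To make the idea under discussion concrete, here is a small userspace model of "skip the per-packet check while no exclusive lease exists". It is a sketch only: a plain C11 atomic counter stands in for the static key, and `lease_create()`, `lease_release()`, `sketch_lookup()` and `label_verify()` are hypothetical names, not kernel API. It does not address the two review points above (decrementing from an RCU callback, and rate-limiting toggles by unprivileged users).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int exclusive_leases;	/* stand-in for the static key */
static int slow_lookups;		/* counts slow-path invocations */

static void lease_create(bool exclusive)
{
	if (exclusive)
		atomic_fetch_add(&exclusive_leases, 1);
}

static void lease_release(bool exclusive)
{
	if (exclusive)
		atomic_fetch_sub(&exclusive_leases, 1);
}

/* Hypothetical slow path: pretend only label 42 is leased by this socket. */
static bool sketch_lookup(unsigned int label)
{
	slow_lookups++;
	return label == 42;
}

/* Fast path: the lookup is elided while no exclusive lease exists. */
static bool label_verify(unsigned int label)
{
	if (atomic_load(&exclusive_leases) > 0)
		return sketch_lookup(label);
	return true;
}
```

As in the RFC, an application that relied on the fast path has to fall back to requesting a lease once verification starts failing.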


Re: [PATCH bpf] bpf: Check sk_fullsock() before returning from bpf_sk_lookup()

2019-05-17 Thread Martin Lau
On Fri, May 17, 2019 at 02:51:48PM -0700, Eric Dumazet wrote:
> 
> 
> On 5/17/19 2:21 PM, Martin KaFai Lau wrote:
> > The BPF_FUNC_sk_lookup_xxx helpers return RET_PTR_TO_SOCKET_OR_NULL.
> > Meaning a fullsock ptr and its fullsock's fields in bpf_sock can be
> > accessed, e.g. type, protocol, mark and priority.
> > Some new helpers, like bpf_sk_storage_get(), also expect
> > ARG_PTR_TO_SOCKET to be a fullsock.
> > 
> > bpf_sk_lookup() currently calls sk_to_full_sk() before returning.
> > However, the ptr returned from sk_to_full_sk() is not guaranteed
> > to be a fullsock.  For example, it cannot get a fullsock if sk
> > is in TCP_TIME_WAIT.
> > 
> > This patch checks for sk_fullsock() before returning. If it is not
> > a fullsock, sock_gen_put() is called if needed and then returns NULL.
> > 
> > Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
> > Cc: Joe Stringer 
> > Signed-off-by: Martin KaFai Lau 
> > ---
> >  net/core/filter.c | 16 ++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 55bfc941d17a..85def5a20aaf 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5337,8 +5337,14 @@ __bpf_sk_lookup(struct sk_buff *skb, struct 
> > bpf_sock_tuple *tuple, u32 len,
> > struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> >ifindex, proto, netns_id, flags);
> >  
> > -   if (sk)
> > +   if (sk) {
> > sk = sk_to_full_sk(sk);
> > +   if (!sk_fullsock(sk)) {
> > +   if (!sock_flag(sk, SOCK_RCU_FREE))
> > +   sock_gen_put(sk);
> 
> This looks a bit convoluted/weird.
> 
> What about telling/asking __bpf_skc_lookup() to not return a non fullsock 
> instead ?
It is because some other helpers, like BPF_FUNC_skc_lookup_tcp,
can return non fullsock.

> 
> > +   return NULL;
> > +   }
> > +   }
> >  
> > return sk;
> >  }
> > @@ -5369,8 +5375,14 @@ bpf_sk_lookup(struct sk_buff *skb, struct 
> > bpf_sock_tuple *tuple, u32 len,
> > struct sock *sk = bpf_skc_lookup(skb, tuple, len, proto, netns_id,
> >  flags);
> >  
> > -   if (sk)
> > +   if (sk) {
> > sk = sk_to_full_sk(sk);
> > +   if (!sk_fullsock(sk)) {
> > +   if (!sock_flag(sk, SOCK_RCU_FREE))
> > +   sock_gen_put(sk);
> > +   return NULL;
> > +   }
> > +   }
> >  
> > return sk;
> >  }
> > 


Re: [PATCH v2] net/mlx5e: Add bonding device for indr block to offload the packet received from bonding device

2019-05-17 Thread Mark Bloch


On 5/17/19 2:17 AM, we...@ucloud.cn wrote:
> From: wenxu 
> 
> The mlx5e driver supports LAG mode. When mlx_p0 and mlx_p1 are added
> to bond0, packets received from mlx_p0 or mlx_p1 are forwarded to vf0
> by an ingress tc flower rule. That tc rule can't be offloaded because
> there is no indr_register_block for the bonding device.
> 
> Signed-off-by: wenxu 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 91e24f1..134fa0b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -796,6 +796,7 @@ static int mlx5e_nic_rep_netdevice_event(struct 
> notifier_block *nb,
>   struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
>  
>   if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
> + !netif_is_bond_master(netdev) &&

I'm not that familiar with this code path, but shouldn't you check the mlx5e
netdevices are slaves of the bond device (what if you have multiple
bond devices in the system?)

>   !is_vlan_dev(netdev))
>   return NOTIFY_OK;
>  
> 

Mark


Re: [pull request][net 00/11] Mellanox, mlx5 fixes 2019-05-17

2019-05-17 Thread David Miller
From: Saeed Mahameed 
Date: Fri, 17 May 2019 20:19:29 +

> This series introduces some fixes to mlx5 driver.
> For more information please see tag log below.
> 
> Please pull and let me know if there is any problem.

Pulled.

> For -stable v4.19
>   net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
>   net/mlx5: Imply MLXFW in mlx5_core
> 
> For -stable v5.0
>   net/mlx5e: Add missing ethtool driver info for representors
>   net/mlx5e: Additional check for flow destination comparison
> 
> For -stable v5.1
>   net/mlx5: Fix peer pf disable hca command

And queued up, thanks.


Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets

2019-05-17 Thread Alexander Duyck
On Fri, May 17, 2019 at 10:23 AM Lennart Sorensen
 wrote:
> OK I applied that and see this:
>
> i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.7-k
> i40e: Copyright (c) 2013 - 2014 Intel Corporation.
> i40e :3d:00.0: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> i40e :3d:00.0: The driver for the device detected a newer version of the 
> NVM image than expected. Please install the most recent version of the 
> network driver.
> i40e :3d:00.0: MAC address: a4:bf:01:4e:0c:87
> i40e :3d:00.0: flow type: 36 update input mask from:0x00060600, 
> to:0x00018018
> i40e :3d:00.0: flow type: 35 update input mask from:0x00060600, 
> to:0x00018018
> i40e :3d:00.0: flow type: 34 update input mask from:0x000606078000, 
> to:0x0001801f8000
> i40e :3d:00.0: flow type: 33 update input mask from:0x00060606, 
> to:0x0001801e
> i40e :3d:00.0: flow type: 32 update input mask from:0x00060606, 
> to:0x0001801e
> i40e :3d:00.0: flow type: 31 update input mask from:0x00060606, 
> to:0x0001801e
> i40e :3d:00.0: flow type: 30 update input mask from:0x00060606, 
> to:0x0001801e
> i40e :3d:00.0: flow type: 29 update input mask from:0x00060606, 
> to:0x0001801e
> i40e :3d:00.0: Features: PF-id[0] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN 
> Geneve VEPA
> i40e :3d:00.1: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> i40e :3d:00.1: The driver for the device detected a newer version of the 
> NVM image than expected. Please install the most recent version of the 
> network driver.
> i40e :3d:00.1: MAC address: a4:bf:01:4e:0c:88
> i40e :3d:00.1: Features: PF-id[1] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN 
> Geneve VEPA
> i40e :3d:00.1 eth2: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: 
> None
>
> Unfortunately (much to my disappointment, I hoped it would work) I see
> no change in behaviour.
>
> --
> Len Sorensen

I was hoping it would work too. It seemed like it should have been the
answer since it definitely didn't seem right. Now it has me wondering
about some of the other code in the driver.

By any chance have you run anything like DPDK on any of the X722
interfaces on this system recently? I ask because it occurs to me that
if you had and it loaded something like a custom parsing profile it
could cause issues similar to this.

A debugging step you might try would be to revert back to my earlier
patch that only displayed the input mask instead of changing it. Once
you have done that you could look at doing a full power cycle on the
system by either physically disconnecting the power, or using the
power switch on the power supply itself if one is available. It is
necessary to disconnect the motherboard/NIC from power in order to
fully clear the global state stored in the device as it is retained
when the system is in standby.

What I want to verify is whether the input mask that we have run into is
the natural power-on input mask or if that is something that was
overridden by something else. The mask change I made should be reset
if the system loses power, and then it will either default back to the
value with the 6's if that is its natural state, or it will match
what I had if it was not.

Other than that I really can't think up too much else. I suppose there
is the possibility of the NVM either setting up a DCB setting or
HREGION register causing an override that is limiting the queues to 1.
However, the likelihood of that should be really low.

- Alex


RE: [Intel-wired-lan] [PATCH][next] i40e: mark expected switch fall-through

2019-05-17 Thread Bowers, AndrewX
> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Gustavo A. R. Silva
> Sent: Wednesday, May 1, 2019 1:56 PM
> To: Kirsher, Jeffrey T ; David S. Miller
> 
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org; Kees Cook 
> Subject: [Intel-wired-lan] [PATCH][next] i40e: mark expected switch fall-
> through
> 
> In preparation to enabling -Wimplicit-fallthrough, mark switch cases where
> we are expecting to fall through.
> 
> This patch fixes the following warning:
> 
> drivers/net/ethernet/intel/i40e/i40e_xsk.c: In function ‘i40e_run_xdp_zc’:
> drivers/net/ethernet/intel/i40e/i40e_xsk.c:217:3: warning: this statement
> may fall through [-Wimplicit-fallthrough=]
>bpf_warn_invalid_xdp_action(act);
>^~~~
> drivers/net/ethernet/intel/i40e/i40e_xsk.c:218:2: note: here
>   case XDP_ABORTED:
>   ^~~~
> 
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 1 +
>  1 file changed, 1 insertion(+)

Tested-by: Andrew Bowers 




[PATCH 1/3] brcmfmac: re-enable command decode in sdio_aos for BRCM 4354

2019-05-17 Thread Douglas Anderson
In commit 29f6589140a1 ("brcmfmac: disable command decode in
sdio_aos") we disabled something called "command decode in sdio_aos"
for a whole bunch of Broadcom SDIO WiFi parts.

After that patch landed I find that my kernel log on
rk3288-veyron-minnie and rk3288-veyron-speedy is filled with:
  brcmfmac: brcmf_sdio_bus_sleep: error while changing bus sleep state -110

This seems to happen every time the Broadcom WiFi transitions out of
sleep mode.  Reverting the part of the commit that affects the WiFi on
my boards fixes the problem for me, so that's what this patch does.

Note that, in general, the justification in the original commit seemed
a little weak.  It looked like someone was testing on a SD card
controller that would sometimes die if there were CRC errors on the
bus.  This used to happen back in early days of dw_mmc (the controller
on my boards), but we fixed it.  Disabling a feature on all boards
just because one SD card controller is broken seems bad.  ...so
instead of just this patch possibly the right thing to do is to fully
revert the original commit.

Fixes: 29f6589140a1 ("brcmfmac: disable command decode in sdio_aos")
Signed-off-by: Douglas Anderson 
---

 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 22b73da42822..3fd2d58a3c88 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -3378,8 +3378,7 @@ static bool brcmf_sdio_aos_no_decode(struct brcmf_sdio 
*bus)
if (bus->ci->chip == CY_CC_43012_CHIP_ID ||
bus->ci->chip == CY_CC_4373_CHIP_ID ||
bus->ci->chip == BRCM_CC_4339_CHIP_ID ||
-   bus->ci->chip == BRCM_CC_4345_CHIP_ID ||
-   bus->ci->chip == BRCM_CC_4354_CHIP_ID)
+   bus->ci->chip == BRCM_CC_4345_CHIP_ID)
return true;
else
return false;
-- 
2.21.0.1020.gf2820cf01a-goog



[PATCH 0/3] brcmfmac: sdio: Deal better w/ transmission errors waking from sleep

2019-05-17 Thread Douglas Anderson
This series attempts to deal better with the expected transmission
errors that we get when waking up the SDIO-based WiFi on
rk3288-veyron-minnie, rk3288-veyron-speedy, and rk3288-veyron-mickey.

Some details about those errors can be found in
, but to summarize it here: if we try to
send the wakeup command to the WiFi card at the same time it has
decided to wake up itself then it will behave badly on the SDIO bus.
This can cause timeouts or CRC errors.

When I tested on 4.19 and 4.20 these CRC errors can be seen to cause
re-tuning.  Since I am currently developing on 4.19 this was the
original problem I attempted to solve.

On mainline it turns out that you don't see the retuning errors but
you see tons of spam about timeouts trying to wakeup from sleep.  I
tracked down the commit that was causing that and have partially
reverted it here.  I have no real knowledge about Broadcom WiFi, but
the commit that was causing problems sounds (from the description) to
be a hack commit penalizing all Broadcom WiFi users because of a bug
in a Cypress SD controller.  I will let others comment if this is
truly the case and, if so, what the right solution should be.


Douglas Anderson (3):
  brcmfmac: re-enable command decode in sdio_aos for BRCM 4354
  mmc: core: API for temporarily disabling auto-retuning due to errors
  brcmfmac: sdio: Disable auto-tuning around commands expected to fail

 drivers/mmc/core/core.c   | 27 +--
 .../broadcom/brcm80211/brcmfmac/sdio.c|  6 +++--
 include/linux/mmc/core.h  |  2 ++
 include/linux/mmc/host.h  |  1 +
 4 files changed, 32 insertions(+), 4 deletions(-)

-- 
2.21.0.1020.gf2820cf01a-goog



[PATCH 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail

2019-05-17 Thread Douglas Anderson
There are certain cases, notably when transitioning between sleep and
active state, when Broadcom SDIO WiFi cards will produce errors on the
SDIO bus.  This is evident from the source code where you can see that
we try commands in a loop until we either get success or we've tried
too many times.  The comment in the code reinforces this by saying
"just one write attempt may fail".

Unfortunately these failures sometimes end up causing an "-EILSEQ"
back to the core which triggers a retuning of the SDIO card and that
blocks all traffic to the card until it's done.

Let's disable retuning around the commands we expect might fail.

Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors")
Signed-off-by: Douglas Anderson 
---

 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 3fd2d58a3c88..c09bb8965487 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -708,6 +709,7 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
bmask = SBSDIO_FUNC1_SLEEPCSR_KSO_MASK;
}
 
+   mmc_expect_errors_begin(bus->sdiodev->func1->card->host);
do {
/* reliable KSO bit set/clr:
 * the sdiod sleep write access is synced to PMU 32khz clk
@@ -730,6 +732,7 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
   &err);
 
} while (try_cnt++ < MAX_KSO_ATTEMPTS);
+   mmc_expect_errors_end(bus->sdiodev->func1->card->host);
 
if (try_cnt > 2)
brcmf_dbg(SDIO, "try_cnt=%d rd_val=0x%x err=%d\n", try_cnt,
-- 
2.21.0.1020.gf2820cf01a-goog
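
The bracketing pattern used around the KSO retry loop can be sketched in userspace as follows. Everything here is hypothetical: `struct sketch_host` and the `sketch_*()` functions only model a counter that suppresses the "retune needed" flag while errors are expected, and say nothing about how the actual mmc core API from patch 2/3 is implemented.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_host {
	int expect_errors;	/* nesting depth of "errors expected" sections */
	bool need_retune;	/* set when an unexpected CRC error is seen */
};

static void sketch_expect_errors_begin(struct sketch_host *host)
{
	host->expect_errors++;
}

static void sketch_expect_errors_end(struct sketch_host *host)
{
	host->expect_errors--;
}

/* A CRC error only requests retuning when no errors are expected. */
static void sketch_report_crc_error(struct sketch_host *host)
{
	if (!host->expect_errors)
		host->need_retune = true;
}

/*
 * Model of the retry loop: the first 'fail_first' attempts hit a CRC
 * error. Returns the attempt number that succeeded, or -1 if none did.
 */
static int sketch_kso_write(struct sketch_host *host, int fail_first,
			    int max_attempts)
{
	int attempt, ret = -1;

	sketch_expect_errors_begin(host);
	for (attempt = 1; attempt <= max_attempts; attempt++) {
		if (attempt > fail_first) {
			ret = attempt;
			break;
		}
		sketch_report_crc_error(host);
	}
	sketch_expect_errors_end(host);

	return ret;
}
```

Using a depth counter rather than a boolean keeps nested begin/end pairs safe; whether the real API nests is not stated in this patch.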



pull-request: bpf 2019-05-18

2019-05-17 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix bpftool's raw BTF dump in relation to forward declarations of union/
   structs, and another fix to unexport logging helpers, from Andrii.

2) Fix inode permission check for retrieving bpf programs, from Chenbo.

3) Fix bpftool to raise rlimit earlier as otherwise libbpf's feature probing
   can fail and subsequently it refuses to load an object, from Yonghong.

4) Fix declaration of bpf_get_current_task() in kselftests, from Alexei.

5) Fix up BPF kselftest .gitignore to add generated files, from Stanislav.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 2407a88a13a2d03ea9b8c86bbdedb3eff80c4b9e:

  Merge branch 'rhashtable-Fix-sparse-warnings' (2019-05-16 09:45:20 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 9c3ddee1246411a3c9c39bfa5457e49579027f0c:

  bpftool: fix BTF raw dump of FWD's fwd_kind (2019-05-17 14:21:29 +0200)


Alexei Starovoitov (1):
  selftests/bpf: fix bpf_get_current_task

Andrii Nakryiko (2):
  libbpf: move logging helpers into libbpf_internal.h
  bpftool: fix BTF raw dump of FWD's fwd_kind

Chenbo Feng (1):
  bpf: relax inode permission check for retrieving bpf program

Stanislav Fomichev (1):
  selftests/bpf: add test_sysctl and map_tests/tests.h to .gitignore

Yonghong Song (1):
  tools/bpftool: move set_max_rlimit() before __bpf_object__open_xattr()

 kernel/bpf/inode.c   |  2 +-
 tools/bpf/bpftool/btf.c  |  4 ++--
 tools/bpf/bpftool/prog.c |  4 ++--
 tools/lib/bpf/btf.c  |  2 +-
 tools/lib/bpf/libbpf.c   |  1 -
 tools/lib/bpf/libbpf_internal.h  | 13 +
 tools/lib/bpf/libbpf_util.h  | 13 -
 tools/lib/bpf/xsk.c  |  2 +-
 tools/testing/selftests/bpf/.gitignore   |  1 +
 tools/testing/selftests/bpf/bpf_helpers.h|  2 +-
 tools/testing/selftests/bpf/map_tests/.gitignore |  1 +
 11 files changed, 23 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/map_tests/.gitignore


Re: pull-request: bpf 2019-05-18

2019-05-17 Thread David Miller
From: Daniel Borkmann 
Date: Sat, 18 May 2019 01:14:00 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Fix bpftool's raw BTF dump in relation to forward declarations of union/
>structs, and another fix to unexport logging helpers, from Andrii.
> 
> 2) Fix inode permission check for retrieving bpf programs, from Chenbo.
> 
> 3) Fix bpftool to raise rlimit earlier as otherwise libbpf's feature probing
>can fail and subsequently it refuses to load an object, from Yonghong.
> 
> 4) Fix declaration of bpf_get_current_task() in kselftests, from Alexei.
> 
> 5) Fix up BPF kselftest .gitignore to add generated files, from Stanislav.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.


[PATCH V5] net: phy: tja11xx: Add TJA11xx PHY driver

2019-05-17 Thread Marek Vasut
Add driver for the NXP TJA1100 and TJA1101 PHYs. These PHYs are special
BroadRReach 100BaseT1 PHYs used in automotive.

Signed-off-by: Marek Vasut 
Cc: Andrew Lunn 
Cc: Florian Fainelli 
Cc: Guenter Roeck 
Cc: Heiner Kallweit 
Cc: Jean Delvare 
Cc: linux-hw...@vger.kernel.org
---
V2: - Use phy_modify(), phy_{set,clear}_bits()
- Drop enable argument of tja11xx_enable_link_control()
- Use PHY_BASIC_T1_FEATURES and dont modify supported/advertised
  features in config_init callback
- Use genphy_soft_reset() instead of opencoding the reset sequence.
- Drop the aneg parts, since the PHY datasheet claims it does not
  support aneg
V3: - Replace clr with mask
- Add hwmon support
- Check commstat in tja11xx_read_status() only if link is up
- Use PHY_ID_MATCH_MODEL()
V4: - Use correct bit in tja11xx_hwmon_read() hwmon_temp_crit_alarm
- Use ENOMEM if devm_kstrdup() fails
- Check $type in tja11xx_hwmon_read() in addition to $attr
V5: - Drop assignment of phydev->irq,pause,asym_pause
---
 drivers/net/phy/Kconfig   |   6 +
 drivers/net/phy/Makefile  |   1 +
 drivers/net/phy/nxp-tja11xx.c | 423 ++
 3 files changed, 430 insertions(+)
 create mode 100644 drivers/net/phy/nxp-tja11xx.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index d6299710d634..31478d8b3c0c 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -415,6 +415,12 @@ config NATIONAL_PHY
---help---
  Currently supports the DP83865 PHY.
 
+config NXP_TJA11XX_PHY
+   tristate "NXP TJA11xx PHYs support"
+   depends on HWMON
+   ---help---
+ Currently supports the NXP TJA1100 and TJA1101 PHY.
+
 config QSEMI_PHY
tristate "Quality Semiconductor PHYs"
---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 27d7f9f3b0de..bac339e09042 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_MICROCHIP_PHY)   += microchip.o
 obj-$(CONFIG_MICROCHIP_T1_PHY) += microchip_t1.o
 obj-$(CONFIG_MICROSEMI_PHY)    += mscc.o
 obj-$(CONFIG_NATIONAL_PHY)     += national.o
+obj-$(CONFIG_NXP_TJA11XX_PHY)  += nxp-tja11xx.o
 obj-$(CONFIG_QSEMI_PHY)        += qsemi.o
 obj-$(CONFIG_REALTEK_PHY)  += realtek.o
 obj-$(CONFIG_RENESAS_PHY)  += uPD60620.o
diff --git a/drivers/net/phy/nxp-tja11xx.c b/drivers/net/phy/nxp-tja11xx.c
new file mode 100644
index ..11b8701e78fd
--- /dev/null
+++ b/drivers/net/phy/nxp-tja11xx.c
@@ -0,0 +1,423 @@
+// SPDX-License-Identifier: GPL-2.0
+/* NXP TJA1100 BroadRReach PHY driver
+ *
+ * Copyright (C) 2018 Marek Vasut 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PHY_ID_MASK                    0xfff0
+#define PHY_ID_TJA1100                 0x0180dc40
+#define PHY_ID_TJA1101                 0x0180dd00
+
+#define MII_ECTRL                      17
+#define MII_ECTRL_LINK_CONTROL         BIT(15)
+#define MII_ECTRL_POWER_MODE_MASK      GENMASK(14, 11)
+#define MII_ECTRL_POWER_MODE_NO_CHANGE (0x0 << 11)
+#define MII_ECTRL_POWER_MODE_NORMAL    (0x3 << 11)
+#define MII_ECTRL_POWER_MODE_STANDBY   (0xc << 11)
+#define MII_ECTRL_CONFIG_EN            BIT(2)
+#define MII_ECTRL_WAKE_REQUEST         BIT(0)
+
+#define MII_CFG1                       18
+#define MII_CFG1_AUTO_OP               BIT(14)
+#define MII_CFG1_SLEEP_CONFIRM         BIT(6)
+#define MII_CFG1_LED_MODE_MASK         GENMASK(5, 4)
+#define MII_CFG1_LED_MODE_LINKUP       0
+#define MII_CFG1_LED_ENABLE            BIT(3)
+
+#define MII_CFG2                       19
+#define MII_CFG2_SLEEP_REQUEST_TO      GENMASK(1, 0)
+#define MII_CFG2_SLEEP_REQUEST_TO_16MS 0x3
+
+#define MII_INTSRC                     21
+#define MII_INTSRC_TEMP_ERR            BIT(1)
+#define MII_INTSRC_UV_ERR              BIT(3)
+
+#define MII_COMMSTAT                   23
+#define MII_COMMSTAT_LINK_UP           BIT(15)
+
+#define MII_GENSTAT                    24
+#define MII_GENSTAT_PLL_LOCKED         BIT(14)
+
+#define MII_COMMCFG                    27
+#define MII_COMMCFG_AUTO_OP            BIT(15)
+
+struct tja11xx_priv {
+   char            *hwmon_name;
+   struct device   *hwmon_dev;
+};
+
+struct tja11xx_phy_stats {
+   const char  *string;
+   u8  reg;
+   u8  off;
+   u16 mask;
+};
+
+static struct tja11xx_phy_stats tja11xx_hw_stats[] = {
+   { "phy_symbol_error_count", 20, 0, GENMASK(15, 0) },
+   { "phy_polarity_detect", 25, 6, BIT(6) },
+   { "phy_open_detect", 25, 7, BIT(7) },
+   { "phy_short_detect", 25, 8, BIT(8) },
+   { "phy_rem_rcvr_count", 26, 0, GENMASK(7, 0) },
+   { "phy_loc_rcvr_count", 26, 8, GENMASK(15, 8) },
+};
+
+static int tja11xx_check(struct phy_device *phydev, u8 reg, u16 mask, u16 set)
+{
+   int i, ret;
+
+   for (i = 0; i < 200; i++) {
+   ret = phy_read(phydev, reg);
+ 

[PATCH 2/5] libbpf: add missing typedef

2019-05-17 Thread Matteo Croce
Sync tools/include/linux/types.h with the UAPI one to fix this build error:

make -C samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= 
srctree=samples/bpf/../../ O=
  HOSTCC  samples/bpf/sock_example
In file included from samples/bpf/sock_example.c:27:
/usr/include/linux/ip.h:102:2: error: unknown type name ‘__sum16’
  102 |  __sum16 check;
  |  ^~~
make[2]: *** [scripts/Makefile.host:92: samples/bpf/sock_example] Error 1
make[1]: *** [Makefile:1763: samples/bpf/] Error 2

Signed-off-by: Matteo Croce 
---
 tools/include/linux/types.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/include/linux/types.h b/tools/include/linux/types.h
index 154eb4e3ca7c..5266dbfee945 100644
--- a/tools/include/linux/types.h
+++ b/tools/include/linux/types.h
@@ -58,6 +58,9 @@ typedef __u32 __bitwise __be32;
 typedef __u64 __bitwise __le64;
 typedef __u64 __bitwise __be64;
 
+typedef __u16 __bitwise __sum16;
+typedef __u32 __bitwise __wsum;
+
 typedef struct {
 	int counter;
 } atomic_t;
-- 
2.21.0
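
For illustration only (not part of the patch): outside of sparse, __bitwise expands to nothing, so the two new typedefs are plain 16- and 32-bit integers whose names only document intent. A minimal sketch of how the pair is typically used, with a fold step modelled on the kernel's generic csum_fold(). The demo_* names are hypothetical stand-ins so the snippet does not clash with real headers.

```c
#include <assert.h>
#include <stdint.h>

/* Outside of sparse, the annotation expands to nothing. */
#define demo_bitwise

/* Stand-ins for the __sum16 and __wsum typedefs added by the patch. */
typedef uint16_t demo_bitwise demo_sum16;
typedef uint32_t demo_bitwise demo_wsum;

/* The folded checksum is expected to be exactly 16 bits wide. */
static_assert(sizeof(demo_sum16) == 2, "demo_sum16 must be 16 bits");

/* A header field like the 'check' member the build error complains about. */
struct demo_hdr {
	demo_sum16 check;
};

/*
 * Fold a 32-bit running sum into 16 bits (adding the carries back in)
 * and return its one's complement.
 */
static demo_sum16 demo_csum_fold(demo_wsum csum)
{
	uint32_t sum = csum;

	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (demo_sum16)~sum;
}
```

The distinct type names let sparse flag code that mixes an unfolded 32-bit sum with a folded 16-bit checksum; a regular compiler sees ordinary integers.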



[PATCH 3/5] samples/bpf: fix xdpsock_user build error

2019-05-17 Thread Matteo Croce
Remove duplicate typedef, and use PRIu64 to be both 32 and 64 bit aware.
Fix the following error:

samples/bpf/xdpsock_user.c:52:15: error: conflicting types for ‘u64’
   52 | typedef __u64 u64;
  |   ^~~
In file included from ./tools/include/linux/compiler.h:87,
 from ./tools/include/asm/barrier.h:2,
 from samples/bpf/xdpsock_user.c:4:
./tools/include/linux/types.h:30:18: note: previous declaration of ‘u64’ was here
   30 | typedef uint64_t u64;
  |  ^~~
make[2]: *** [scripts/Makefile.host:109: samples/bpf/xdpsock_user.o] Error 1
make[1]: *** [Makefile:1763: samples/bpf/] Error 2

Signed-off-by: Matteo Croce 
---
 samples/bpf/xdpsock_user.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index d08ee1ab7bb4..a4cd42c2f0b0 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -49,9 +50,6 @@
 #define DEBUG_HEXDUMP 0
 #define MAX_SOCKS 8
 
-typedef __u64 u64;
-typedef __u32 u32;
-
 static unsigned long prev_time;
 
 enum benchmark_type {
@@ -243,7 +241,7 @@ static void hex_dump(void *pkt, size_t length, u64 addr)
 	if (!DEBUG_HEXDUMP)
 		return;
 
-	sprintf(buf, "addr=%llu", addr);
+	sprintf(buf, "addr=%" PRIu64, addr);
 	printf("length = %zu\n", length);
 	printf("%s | ", buf);
 	while (length-- > 0) {
-- 
2.21.0
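
For illustration only (not part of the patch): once u64 is uint64_t, "%llu" is only correct where uint64_t happens to be unsigned long long, while PRIu64 from <inttypes.h> expands to the right conversion on both 32- and 64-bit builds. A minimal sketch of the same formatting call; format_addr() is a hypothetical helper, and it uses snprintf() rather than the sprintf() in the sample so the buffer size is bounded.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the call in hex_dump(): PRIu64 is a
 * string literal, so it is concatenated with "addr=%" at compile time.
 * Returns the number of characters that the full output requires.
 */
static int format_addr(char *buf, size_t len, uint64_t addr)
{
	assert(buf != NULL);
	return snprintf(buf, len, "addr=%" PRIu64, addr);
}
```

Calling format_addr(buf, sizeof(buf), 4096) writes "addr=4096" into buf and returns 9.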


