Re: [PATCH net-next 1/2] can-isotp: implement cleanups / improvements from review
On 10/10/20 10:49 PM, Oliver Hartkopp wrote: > As pointed out by Jakub Kicinski here: > https://marc.info/?l=linux-can&m=160229286216008 > this patch addresses the remarked issues: > > - remove empty lines in comment > - remove default=y for CAN_ISOTP in Kconfig > - make use of pr_notice_once() > - use GFP_KERNEL instead of gfp_any() in soft hrtimer context > > The version strings in the CAN subsystem are removed by a separate patch. > > Signed-off-by: Oliver Hartkopp > --- > include/uapi/linux/can/isotp.h | 4 +--- > net/can/Kconfig| 3 ++- > net/can/isotp.c| 14 +++--- > 3 files changed, 10 insertions(+), 11 deletions(-) > > diff --git a/include/uapi/linux/can/isotp.h b/include/uapi/linux/can/isotp.h > index 553006509f4e..accf0efa46f4 100644 > --- a/include/uapi/linux/can/isotp.h > +++ b/include/uapi/linux/can/isotp.h > @@ -151,8 +151,7 @@ struct can_isotp_ll_options { > #define CAN_ISOTP_DEFAULT_LL_TX_DL CAN_MAX_DLEN > #define CAN_ISOTP_DEFAULT_LL_TX_FLAGS0 > > -/* > - * Remark on CAN_ISOTP_DEFAULT_RECV_* values: > +/* Remark on CAN_ISOTP_DEFAULT_RECV_* values: I think for uapi headers we use the default commenting style, not the netdev one. > * > * We can strongly assume, that the Linux Kernel implementation of > * CAN_ISOTP is capable to run with BS=0, STmin=0 and WFTmax=0. > @@ -160,7 +159,6 @@ struct can_isotp_ll_options { > * these default settings can be changed via sockopts. > * For that reason the STmin value is intentionally _not_ checked for > * consistency and copied directly into the flow control (FC) frame. > - * > */ > > #endif /* !_UAPI_CAN_ISOTP_H */ > diff --git a/net/can/Kconfig b/net/can/Kconfig > index 021fe03a8ed6..224e5e0283a9 100644 > --- a/net/can/Kconfig > +++ b/net/can/Kconfig > @@ -57,7 +57,6 @@ source "net/can/j1939/Kconfig" > > config CAN_ISOTP > tristate "ISO 15765-2:2016 CAN transport protocol" > - default y > help > CAN Transport Protocols offer support for segmented Point-to-Point > communication between CAN nodes via two defined CAN Identifiers. > @@ -67,6 +66,8 @@ config CAN_ISOTP > vehicle diagnosis (UDS, ISO 14229) or IP-over-CAN traffic. > This protocol driver implements data transfers according to > ISO 15765-2:2016 for 'classic' CAN and CAN FD frame types. > + If you want to perform automotive vehicle diagnostic services (UDS), > + say 'y'. > > source "drivers/net/can/Kconfig" > > diff --git a/net/can/isotp.c b/net/can/isotp.c > index e6ff032b5426..bc3a722c200b 100644 > --- a/net/can/isotp.c > +++ b/net/can/isotp.c > @@ -222,8 +222,8 @@ static int isotp_send_fc(struct sock *sk, int ae, u8 > flowstatus) > > can_send_ret = can_send(nskb, 1); > if (can_send_ret) > - printk_once(KERN_NOTICE "can-isotp: %s: can_send_ret %d\n", > - __func__, can_send_ret); > + pr_notice_once("can-isotp: %s: can_send_ret %d\n", > +__func__, can_send_ret); please define a pr_fmt for the "can-isotp: " prefix. > > dev_put(dev); > > @@ -769,7 +769,7 @@ static enum hrtimer_restart isotp_tx_timer_handler(struct > hrtimer *hrtimer) > > isotp_tx_burst: > skb = alloc_skb(so->ll.mtu + sizeof(struct can_skb_priv), > - gfp_any()); > + GFP_KERNEL); > if (!skb) { > dev_put(dev); > break; > @@ -798,8 +798,8 @@ static enum hrtimer_restart isotp_tx_timer_handler(struct > hrtimer *hrtimer) > > can_send_ret = can_send(skb, 1); > if (can_send_ret) > - printk_once(KERN_NOTICE "can-isotp: %s: can_send_ret > %d\n", > - __func__, can_send_ret); > + pr_notice_once("can-isotp: %s: can_send_ret %d\n", > +__func__, can_send_ret); > > if (so->tx.idx >= so->tx.len) { > /* we are done */ > @@ -942,8 +942,8 @@ static int isotp_sendmsg(struct socket *sock, struct > msghdr *msg, size_t size) > err = can_send(skb, 1); > dev_put(dev); > if (err) { > - printk_once(KERN_NOTICE "can-isotp: %s: can_send_ret %d\n", > - __func__, err); > + pr_notice_once("can-isotp: %s: can_send_ret %d\n", > +__func__, err); > return err; > } > > Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | signature.asc Description: OpenPGP digital signature
Re: pull-request: can 2020-10-08
On 10/10/20 7:32 PM, Jakub Kicinski wrote: > On Thu, 8 Oct 2020 23:40:19 +0200 Marc Kleine-Budde wrote: >> The first patch is part of my pull request >> "linux-can-fixes-for-5.9-20201006", >> so consider that one obsolete and take this instead. >> >> The first patch is by Lucas Stach and fixes m_can driver by removing an >> erroneous call to m_can_class_suspend() in runtime suspend. Which causes the >> pinctrl state to get stuck on the "sleep" state, which breaks all CAN >> functionality on SoCs where this state is defined. >> >> The last two patches target the j1939 protocol: Cong Wang fixes a syzbot >> finding of an uninitialized variable in the j1939 transport protocol. I >> contribute a patch, that fixes the initialization of a same uninitialized >> variable in a different function. > > Pulled, thanks! > > Since we missed 5.9 would you like me to queue these up for stable? Yes. tnx, Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | signature.asc Description: OpenPGP digital signature
Re: [PATCH net-next 2/2] can: remove obsolete version strings
On 10/10/20 10:49 PM, Oliver Hartkopp wrote: > As pointed out by Jakub Kicinski here: > https://marc.info/?l=linux-can&m=160229286216008 Better use https://lore.kernel.org for this. The URL is constructed by adding "/r/$MESSAGE_ID": http://lore.kernel.org/r/20201009175751.5c540...@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com > this patch removes the obsolete version information of the different > CAN protocols and the AF_CAN core module. > > Signed-off-by: Oliver Hartkopp Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | signature.asc Description: OpenPGP digital signature
KASAN: use-after-free Read in sco_chan_del
Hello, syzbot found the following issue on: HEAD commit:a804ab08 Add linux-next specific files for 20201006 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=1073270b90 kernel config: https://syzkaller.appspot.com/x/.config?x=26c1b4cc4a62ccb dashboard link: https://syzkaller.appspot.com/bug?extid=1df6a63e69a359c8b517 compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+1df6a63e69a359c8b...@syzkaller.appspotmail.com == BUG: KASAN: use-after-free in hci_conn_drop include/net/bluetooth/hci_core.h:1145 [inline] BUG: KASAN: use-after-free in hci_conn_drop include/net/bluetooth/hci_core.h:1115 [inline] BUG: KASAN: use-after-free in sco_chan_del+0x400/0x430 net/bluetooth/sco.c:149 Read of size 8 at addr 88804d29c918 by task syz-executor.2/27575 CPU: 0 PID: 27575 Comm: syz-executor.2 Not tainted 5.9.0-rc8-next-20201006-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fb lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562 hci_conn_drop include/net/bluetooth/hci_core.h:1145 [inline] hci_conn_drop include/net/bluetooth/hci_core.h:1115 [inline] sco_chan_del+0x400/0x430 net/bluetooth/sco.c:149 __sco_sock_close+0x16e/0x5b0 net/bluetooth/sco.c:434 sco_sock_close net/bluetooth/sco.c:448 [inline] sco_sock_release+0x69/0x290 net/bluetooth/sco.c:1059 __sock_release+0xcd/0x280 net/socket.c:596 sock_close+0x18/0x20 net/socket.c:1277 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 get_signal+0xd89/0x1f00 kernel/signal.c:2561 arch_do_signal+0x82/0x2470 arch/x86/kernel/signal.c:811 exit_to_user_mode_loop kernel/entry/common.c:161 [inline] exit_to_user_mode_prepare+0x194/0x1f0 kernel/entry/common.c:192 syscall_exit_to_user_mode+0x7a/0x2c0 kernel/entry/common.c:267 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45de29 Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7fabeda51c78 EFLAGS: 0246 ORIG_RAX: 002a RAX: fffc RBX: 2200 RCX: 0045de29 RDX: 0008 RSI: 2080 RDI: 0006 RBP: 0118c158 R08: R09: R10: R11: 0246 R12: 0118c124 R13: 7ffdb3c9529f R14: 7fabeda529c0 R15: 0118c124 Allocated by task 27575: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmem_cache_alloc_trace+0x1a0/0x480 mm/slab.c:3552 kmalloc include/linux/slab.h:554 [inline] kzalloc include/linux/slab.h:666 [inline] hci_conn_add+0x53/0x1330 net/bluetooth/hci_conn.c:525 hci_connect_sco+0x356/0x860 net/bluetooth/hci_conn.c:1283 sco_connect net/bluetooth/sco.c:241 [inline] sco_sock_connect+0x308/0x980 net/bluetooth/sco.c:588 __sys_connect_file+0x155/0x1a0 net/socket.c:1852 __sys_connect+0x161/0x190 net/socket.c:1869 __do_sys_connect net/socket.c:1879 [inline] __se_sys_connect net/socket.c:1876 [inline] __x64_sys_connect+0x6f/0xb0 net/socket.c:1876 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 26665: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3420 [inline] kfree+0x10e/0x2a0 mm/slab.c:3758 device_release+0x9f/0x240 drivers/base/core.c:1808 kobject_cleanup lib/kobject.c:705 [inline] kobject_release lib/kobject.c:736 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x171/0x270 lib/kobject.c:753 put_device+0x1b/0x30 drivers/base/core.c:3037 hci_conn_del+0x27e/0x6a0 net/bluetooth/hci_conn.c:645 hci_conn_hash_flush+0x189/0x220 net/bluetooth/hci_conn.c:1558 hci_dev_do_close+0x5c6/0x1080 net/bluetooth/hci_core.c:1770 hci_unregister_dev+0x214/0xe90 net/bluetooth/hci_core.c:3827 vhci_release+0x70/0xe0 drivers/bluetooth/hci_vhci.c:340 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 exit_task_work include/linux/task_work.h:25 [inline] do_exit+0xb23/0x2930 kernel/exit.c:806 do_group_exit+0x125/0x310 kernel/exit.c:903 get_signal+0x428/0x1f00 kernel/signal.c:2757 arch_do_signal+0x82/0x2470 arch/x86/kernel/signal.c:811 exit
Re: [PATCH nf-next v3 3/3] netfilter: Introduce egress hook
On Tue, Sep 08, 2020 at 02:55:36PM +0200, Daniel Borkmann wrote: > I would strongly prefer something where nf integrates into existing > tc hook, not only due to the hook reuse which would be better, > but also to allow for a more flexible interaction between tc/BPF > use cases [...] > one option to move forward [...] overall rework of ingress/egress > side to be a more flexible pipeline (think of cont/ok actions > as with tc filters or stackable LSMs to process & delegate). Interaction between netfilter and tc is facilitated by skb->mark. Both netfilter and tc are able to set and match by way of the mark. E.g. a netfilter hook may set the mark and tc may later perform an action if a matching mark is found. Because the placement of netfilter and tc hooks in the data path has been unchanged for decades, we must assume that users depend on their order for setting and matching the mark. Thus, reworking the data path in the way you suggest (a flexible pipeline) must not change the order of the hooks. It would have to be a fixed pipeline. But what's the benefit then compared to separate netfilter and tc hooks which are patched in at runtime and become NOPs if not used? (Which is what the present series is aiming for.) > to name one example... consider two different entities in the system > setting up the two, that is, one adding rules for nf ingress/egress > on the phys device for host fw and the other one for routing traffic > into/from containers at the tc layer, then traffic going into host ns > will hit nf ingress and on egress side the nf egress part; however, > traffic going to containers via existing tc redirect will not see the > nf ingress as expected but would on reverse path incorrectly > hit the nf egress one which is /not/ the case for dev_queue_xmit() today. Using tc to bounce ingress traffic into a container -- is that actually a thing or is it a hypothetical example? I think at least Docker uses plain vanilla routing and bridging to move packets in and out of containers. However you're right that if tc *is* used to redirect ingress packets to a container veth, then the data path would look like: host tc -> container tc -> container nft Whereas the egress data path would look like: container nft -> container tc -> host nft -> host tc But I argue that the egress data path is actually correct because the host must be able to firewall packets coming out of the container in case the container has been compromised. > And if you check a typical DHCP client that is present on major > modern distros like systemd-networkd's DHCP client then they > already implement filtering of malicious packets via BPF at > socket layer including checking for cookies in the DHCP header > that are set by the application itself to prevent spoofing [0]. > > [0] > https://github.com/systemd/systemd/blob/master/src/libsystemd-network/dhcp-network.c#L28 That's an *ingress* filter so that user space only receives DHCP packets and nothing else. We're talking about the ability to filter *egress* DHCP packets (among others) at the kernel level to guard against unwanted packets coming from user space. Thanks, Lukas
Re: [PATCH net-next 1/2] dt-bindings: net: dsa: b53: Add YAML bindings
On Sat Oct 10 2020, Florian Fainelli wrote: > On 10/10/2020 9:46 AM, Kurt Kanzenbach wrote: >> Convert the b53 DSA device tree bindings to YAML in order to allow >> for automatic checking and such. >> >> Suggested-by: Florian Fainelli >> Signed-off-by: Kurt Kanzenbach > > Thanks for making this change, there are quite a few warnings that are > going to show up because the binding was defined in a way that it would > define chip compatible strings, which not all DTS files are using. Oh, I didn't know there is a second make command for doing the actual check against the dtbs. I've just used `make dt_binding_check'. So, it seems like a lot of the errors are caused by the include files such as [linux]/arch/arm/boot/dts/bcm5301x.dtsi srab: srab@18007000 { compatible = "brcm,bcm5301x-srab"; reg = <0x18007000 0x1000>; status = "disabled"; /* ports are defined in board DTS */ }; The nodename should be "switch" not "srab" as enforced by dsa.yaml. Furthermore, some DTS files are not adding the chip specific compatible strings and the ports leading to more errors. There are also some minor errors regarding the reg-names and such for specific instances. How should we proceed? Adding the missing compatible strings and ports to the DTS files? Or adjusting the include files? > I don't know if Rob would be comfortable with taking this until we > resolve all warnings first. Probably not. We should fix the existing device trees first. Thanks, Kurt signature.asc Description: PGP signature
Re: [PATCH nf-next v3 3/3] netfilter: Introduce egress hook
On Tue, Sep 15, 2020 at 12:02:03AM +0200, Daniel Borkmann wrote: > today it is possible and > perfectly fine to e.g. redirect to a host-facing veth from tc ingress which > then goes into container. Only traffic that goes up the host stack is seen > by nf ingress hook in that case. Likewise, reply traffic can be redirected > from host-facing veth to phys dev for xmit w/o any netfilter interference. > This means netfilter in host ns really only sees traffic to/from host as > intended. This is fine today, however, if 3rd party entities (e.g. distro > side) start pushing down rules on the two nf hooks, then these use cases will > break on the egress one due to this asymmetric layering violation. Hence my > ask that this needs to be configurable from a control plane perspective so > that both use cases can live next to each other w/o breakage. Most trivial > one I can think of is (aside from the fact to refactor the hooks and improve > their performance) a flag e.g. for skb that can be set from tc/BPF layer to > bypass the nf hooks. Basically a flexible opt-in so that existing use-cases > can be retained w/o breakage. I argue that being able to filter traffic coming out of the container is desirable because why should the host trust the software running in the container to never send malicious packets. As for the flag you're asking for, it already exists in the form of skb->mark. Just let tc set the mark when the packet exits the container and add a netfilter rule to accept packets carrying that mark. Or do not set any netfilter egress rules at all to disable the egress filtering and avoid the performance impact it would imply. Thanks, Lukas
[PATCH] Add documentation of fiter syntax to ss manpage
Since the documentation currently referenced in the manpage no longer exists. --- man/man8/ss.8 | 75 ++- 1 file changed, 74 insertions(+), 1 deletion(-) diff --git a/man/man8/ss.8 b/man/man8/ss.8 index 3b2559ff..f9e629f6 100644 --- a/man/man8/ss.8 +++ b/man/man8/ss.8 @@ -401,7 +401,7 @@ Read filter information from FILE. Each line of FILE is interpreted like single command line option. If FILE is - stdin is used. .TP .B FILTER := [ state STATE-FILTER ] [ EXPRESSION ] -Please take a look at the official documentation for details regarding filters. +See below an explanation of STATE-FILTER and EXPRESSION. .SH STATE-FILTER @@ -437,6 +437,79 @@ states except for - opposite to .B bucket +.SH EXPRESSION + +The following simple expressions are supported: + +.TP +.RB { \ src \ | \ dst \ } \ = \ \fR[\fIFAMILY\fB:\fR][\fIADDRESS\fR][\fB:\fIPORT\fR] +Matches if the source or destination matches the host condition. +Providing FAMILY is equivalent to passing the family with the -f option. +ADDRESS and PORT are the family specific address (or hostname) and port (or +service name) to match against. At least one of ADDRESS and PORT should be +provided. Additionally, "*" may be used as a wildcard for either ADDRESS or +PORT. Note that for some families, PORT is meaningless. + +For inet and inet6 addresses, if the address is numeric (not a hostname) a +bitmask can be provided in CIDR notation (ex. 127.0.0.0/16) to match a range of +addresses. If the address is provided as a hostname, all addresses returned by +DNS for that hostname will match. The inet or inet6 address may be enclosed in +"[" and "]". +.TP +.RB { \ sport \ | \ dport \ } "\fI OP \fR[\fIFAMILY\fB:\fR][\fB:\fR]\fIPORT" +Matches if the source or destination port matches the comparison with the +supplied port. FAMILY and PORT are the same as above. OP can be any of "=", +"!=", "<", ">", "<=", or ">=". +.TP +.BR dev \ { \ = \ | \ != \ } \fI\ DEV +Matches if it is on the specified device (or not). The device can be specified +either by name or by index. +.TP +.BR fwmark \ { \ = \ | \ != \ } \ \fIMARK-MASK +Matches if the firewall mark matches the supplied mask (or not). The mask should +be specified as an integer value optionally followed by a "/" and an integer +mask. The integer may be hex-encoded if it begins with "0x" or "0X". +.TP +.BR cgroup \ { \ = \ | \ != \ } \ \fICGROUP +Matches if it is part of the cgroup (or not). +.B CGROUP +should be the path for the desired cgroup. +.TP +.B autobound +Matches if the local port is automatically bound (randomly assigned). + +.PP +Each operator has equivalent aliases as follows: +.IP +"=" can be replaced with "==" or "eq" +.IP +"!=" can be replaced with "ne" or "neq" +.IP +">" can be replaced with "gt" +.IP +"<" can be replaced with "lt" +.IP +">=" can be replaced with "ge" or "geq" +.IP +"<=" can be replaced with "le" or "leq" + +Subexpressions can be combined into more complex expressions in the following +ways: +.TP +.BI not \ EXPRESSION +Negate the EXPRESSION. "!" can be used in place of of "not". +.TP +\fI EXPRESSION EXPRESSION \fR| \fIEXPRESSION \fBand \fIEXPRESSION +Match only if both expressions match. "&" or "&&" can be used in place of "and". +.TP +.IB EXPRESSION \ or \ EXPRESSION +Match if either expression matches. "|" or "||" can be used in place of "or". +.TP +.BI ( \ EXPRESSION \ ) +Group EXPRESSION to change precedence of the above operators. The default +precedence is "not", "and", "or". + + .SH USAGE EXAMPLES .TP .B ss -t -a -- 2.28.0
[PATCH net-next 1/3] macb: add RM9200's interrupt flag TBRE
Transmit Buffer Register Empty replaces TXERR on RM9200 and signals the sender may try to send again becase the last queued frame is no longer in queue (being transmitted or already transmitted). Cc: Nicolas Ferre Cc: Claudiu Beznea Cc: Daniel Palmer Signed-off-by: Willy Tarreau --- drivers/net/ethernet/cadence/macb.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index 4f1b41569260..49d347429de8 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -365,6 +365,8 @@ #define MACB_ISR_RLE_SIZE 1 #define MACB_TXERR_OFFSET 6 /* EN TX frame corrupt from error interrupt */ #define MACB_TXERR_SIZE1 +#define MACB_RM9200_TBRE_OFFSET6 /* EN may send new frame interrupt (RM9200) */ +#define MACB_RM9200_TBRE_SIZE 1 #define MACB_TCOMP_OFFSET 7 /* Enable transmit complete interrupt */ #define MACB_TCOMP_SIZE1 #define MACB_ISR_LINK_OFFSET 9 /* Enable link change interrupt */ -- 2.28.0
[PATCH net-next 3/3] macb: support the two tx descriptors on at91rm9200
The at91rm9200 variant used by a few chips including the MSC313 supports two Tx descriptors (one frame being serialized and another one queued). However the driver only implemented a single one, which adds a dead time after each transfer to receive and process the interrupt and wake the queue up, preventing from reaching line rate. This patch implements a very basic 2-deep queue to address this limitation. The tests run on a Breadbee board equipped with an MSC313E show that at 1 GHz, HTTP traffic on medium-sized objects (45kB) was limited to exactly 50 Mbps before this patch, and jumped to 76 Mbps with this patch. And tests on a single TCP stream with an MTU of 576 jump from 10kpps to 15kpps. With 1500 byte packets it's now possible to reach line rate versus 75 Mbps before. Cc: Nicolas Ferre Cc: Claudiu Beznea Cc: Daniel Palmer Signed-off-by: Willy Tarreau --- drivers/net/ethernet/cadence/macb.h | 2 ++ drivers/net/ethernet/cadence/macb_main.c | 46 +++- 2 files changed, 40 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index e87db95fb0f6..f8133003981f 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -1208,6 +1208,8 @@ struct macb { /* AT91RM9200 transmit queue (1 on wire + 1 queued) */ struct macb_tx_skb rm9200_txq[2]; + unsigned intrm9200_tx_tail; + unsigned intrm9200_tx_len; unsigned intmax_tx_length; u64 ethtool_stats[GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES]; diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index ca6e5456906a..6ff8e4b0b95d 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -3909,6 +3909,7 @@ static int at91ether_start(struct macb *lp) MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE) | MACB_BIT(TCOMP)| +MACB_BIT(RM9200_TBRE) | MACB_BIT(ISR_ROVR) | MACB_BIT(HRESP)); @@ -3925,6 +3926,7 @@ static void at91ether_stop(struct macb *lp) MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE) | MACB_BIT(TCOMP)| +MACB_BIT(RM9200_TBRE) | MACB_BIT(ISR_ROVR) | MACB_BIT(HRESP)); @@ -3994,11 +3996,10 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct macb *lp = netdev_priv(dev); + unsigned long flags; - if (macb_readl(lp, TSR) & MACB_BIT(RM9200_BNQ)) { - int desc = 0; - - netif_stop_queue(dev); + if (lp->rm9200_tx_len < 2) { + int desc = lp->rm9200_tx_tail; /* Store packet information (to free when Tx completed) */ lp->rm9200_txq[desc].skb = skb; @@ -4012,6 +4013,15 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb, return NETDEV_TX_OK; } + spin_lock_irqsave(&lp->lock, flags); + + lp->rm9200_tx_tail = (desc + 1) & 1; + lp->rm9200_tx_len++; + if (lp->rm9200_tx_len > 1) + netif_stop_queue(dev); + + spin_unlock_irqrestore(&lp->lock, flags); + /* Set address of the data in the Transmit Address register */ macb_writel(lp, TAR, lp->rm9200_txq[desc].mapping); /* Set length of the packet in the Transmit Control register */ @@ -4077,6 +4087,8 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id) struct macb *lp = netdev_priv(dev); u32 intstatus, ctl; unsigned int desc; + unsigned int qlen; + u32 tsr; /* MAC Interrupt Status register indicates what interrupts are pending. * It is automatically cleared once read. @@ -4088,21 +4100,39 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id) at91ether_rx(dev); /* Transmit complete */ - if (intstatus & MACB_BIT(TCOMP)) { + if (intstatus & (MACB_BIT(TCOMP) | MACB_BIT(RM9200_TBRE))) { /* The TCOM bit is set even if the transmission failed */ if (intstatus & (MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE))) dev->stats.tx_errors++; - desc = 0; - if (lp->rm9200_txq[desc].skb) { + spin_lock(&lp->lock); + + tsr = macb_readl(lp, TSR); + + /* we have three possibilities here: +* - all pending packets transmitted (TGO, implies BNQ) +
[PATCH net-next 0/3] macb: support the 2-deep Tx queue on at91
Hi, while running some tests on my Breadbee board, I noticed poor network Tx performance. I had a look at the driver (macb, at91ether variant) and noticed that at91ether_start_xmit() immediately stops the queue after sending a frame and waits for the interrupt to restart the queue, causing a dead time after each packet is sent. The AT91RM9200 datasheet states that the controller supports two frames, one being sent and the other one being queued, so I performed minimal changes to support this. The transmit performance on my board has increased by 50% on medium-sized packets (HTTP traffic), and with large packets I can now reach line rate. Since this driver is shared by various platforms, I tried my best to isolate and limit the changes as much as possible and I think it's pretty reasonable as-is. I've run extensive tests and couldn't meet any unexpected situation (no stall, overflow nor lockup). There are 3 patches in this series. The first one adds the missing interrupt flag for RM9200 (TBRE, indicating the tx buffer is willing to take a new packet). The second one replaces the single skb with a 2-array and uses only index 0. It does no other change, this is just to prepare the code for the third one. The third one implements the queue. Packets are added at the tail of the queue, the queue is stopped at 2 packets and the interrupt releases 0, 1 or 2 depending on what the transmit status register reports. Thanks, Willy Willy Tarreau (3): macb: add RM9200's interrupt flag TBRE macb: prepare at91 to use a 2-frame TX queue macb: support the two tx descriptors on at91rm9200 drivers/net/ethernet/cadence/macb.h | 10 ++-- drivers/net/ethernet/cadence/macb_main.c | 66 ++-- 2 files changed, 56 insertions(+), 20 deletions(-) -- 2.28.0
[PATCH net-next 2/3] macb: prepare at91 to use a 2-frame TX queue
The RM9200 supports one frame being sent while another one is waiting in queue. This avoids the dead time that follows the emission of a frame and which prevents one from reaching line speed. Right now the driver supports only a single skb, so we'll first replace the rm9200-specific skb info with an array of two macb_tx_skb (already used by other drivers). This patch only moves the skb_length to txq[0].size and skb_physaddr to skb[0].mapping but doesn't perform any other change. It already uses [desc] in order to minimize future changes. Cc: Nicolas Ferre Cc: Claudiu Beznea Cc: Daniel Palmer Signed-off-by: Willy Tarreau --- drivers/net/ethernet/cadence/macb.h | 6 ++--- drivers/net/ethernet/cadence/macb_main.c | 28 ++-- 2 files changed, 18 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index 49d347429de8..e87db95fb0f6 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -1206,10 +1206,8 @@ struct macb { phy_interface_t phy_interface; - /* AT91RM9200 transmit */ - struct sk_buff *skb;/* holds skb until xmit interrupt completes */ - dma_addr_t skb_physaddr;/* phys addr from pci_map_single */ - int skb_length; /* saved skb length for pci_unmap_single */ + /* AT91RM9200 transmit queue (1 on wire + 1 queued) */ + struct macb_tx_skb rm9200_txq[2]; unsigned intmax_tx_length; u64 ethtool_stats[GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES]; diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 9179f7b0b900..ca6e5456906a 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -3996,14 +3996,16 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb, struct macb *lp = netdev_priv(dev); if (macb_readl(lp, TSR) & MACB_BIT(RM9200_BNQ)) { + int desc = 0; + netif_stop_queue(dev); /* Store packet information (to free when Tx completed) */ - lp->skb = skb; - lp->skb_length = skb->len; - lp->skb_physaddr = dma_map_single(&lp->pdev->dev, skb->data, - skb->len, DMA_TO_DEVICE); - if (dma_mapping_error(&lp->pdev->dev, lp->skb_physaddr)) { + lp->rm9200_txq[desc].skb = skb; + lp->rm9200_txq[desc].size = skb->len; + lp->rm9200_txq[desc].mapping = dma_map_single(&lp->pdev->dev, skb->data, + skb->len, DMA_TO_DEVICE); + if (dma_mapping_error(&lp->pdev->dev, lp->rm9200_txq[desc].mapping)) { dev_kfree_skb_any(skb); dev->stats.tx_dropped++; netdev_err(dev, "%s: DMA mapping error\n", __func__); @@ -4011,7 +4013,7 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb, } /* Set address of the data in the Transmit Address register */ - macb_writel(lp, TAR, lp->skb_physaddr); + macb_writel(lp, TAR, lp->rm9200_txq[desc].mapping); /* Set length of the packet in the Transmit Control register */ macb_writel(lp, TCR, skb->len); @@ -4074,6 +4076,7 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id) struct net_device *dev = dev_id; struct macb *lp = netdev_priv(dev); u32 intstatus, ctl; + unsigned int desc; /* MAC Interrupt Status register indicates what interrupts are pending. * It is automatically cleared once read. @@ -4090,13 +4093,14 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id) if (intstatus & (MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE))) dev->stats.tx_errors++; - if (lp->skb) { - dev_consume_skb_irq(lp->skb); - lp->skb = NULL; - dma_unmap_single(&lp->pdev->dev, lp->skb_physaddr, -lp->skb_length, DMA_TO_DEVICE); + desc = 0; + if (lp->rm9200_txq[desc].skb) { + dev_consume_skb_irq(lp->rm9200_txq[desc].skb); + lp->rm9200_txq[desc].skb = NULL; + dma_unmap_single(&lp->pdev->dev, lp->rm9200_txq[desc].mapping, +lp->rm9200_txq[desc].size, DMA_TO_DEVICE); dev->stats.tx_packets++; - dev->stats.tx_bytes += lp->skb_length; + dev->stats.tx_bytes += lp->rm9200_txq[desc].size; } netif_wake_queue(dev);
[PATCH net-next] net: mscc: ocelot: remove duplicate ocelot_port_dev_check
A helper for checking whether a net_device belongs to mscc_ocelot already existed and did not need to be rewritten. Use it. Fixes: 319e4dd11a20 ("net: mscc: ocelot: introduce conversion helpers between port and netdev") Signed-off-by: Vladimir Oltean --- drivers/net/ethernet/mscc/ocelot_net.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c index d3c03942546d..b34da11acf65 100644 --- a/drivers/net/ethernet/mscc/ocelot_net.c +++ b/drivers/net/ethernet/mscc/ocelot_net.c @@ -669,7 +669,8 @@ struct net_device *ocelot_port_to_netdev(struct ocelot *ocelot, int port) return priv->dev; } -static bool ocelot_port_dev_check(const struct net_device *dev) +/* Checks if the net_device instance given to us originates from our driver */ +static bool ocelot_netdevice_dev_check(const struct net_device *dev) { return dev->netdev_ops == &ocelot_port_netdev_ops; } @@ -678,7 +679,7 @@ int ocelot_netdev_to_port(struct net_device *dev) { struct ocelot_port_private *priv; - if (!dev || !ocelot_port_dev_check(dev)) + if (!dev || !ocelot_netdevice_dev_check(dev)) return -EINVAL; priv = netdev_priv(dev); @@ -907,12 +908,6 @@ static int ocelot_port_obj_del(struct net_device *dev, return ret; } -/* Checks if the net_device instance given to us originate from our driver. */ -static bool ocelot_netdevice_dev_check(const struct net_device *dev) -{ - return dev->netdev_ops == &ocelot_port_netdev_ops; -} - static int ocelot_netdevice_port_event(struct net_device *dev, unsigned long event, struct netdev_notifier_changeupper_info *info) -- 2.25.1
Re: [PATCH bpf-next v6 2/6] bpf: add redirect_peer helper
On Sun, 11 Oct 2020 01:40:02 +0200 Daniel Borkmann wrote: > Add an efficient ingress to ingress netns switch that can be used out of tc > BPF > programs in order to redirect traffic from host ns ingress into a container > veth device ingress without having to go via CPU backlog queue [0]. For local > containers this can also be utilized and path via CPU backlog queue only needs > to be taken once, not twice. On a high level this borrows from ipvlan which > does > similar switch in __netif_receive_skb_core() and then iterates via > another_round. > This helps to reduce latency for mentioned use cases. > > Pod to remote pod with redirect(), TCP_RR [1]: > > # percpu_netperf 10.217.1.33 > RT_LATENCY: 122.450 (per CPU: 122.666 > 122.401 122.333 122.401 ) > MEAN_LATENCY: 121.210 (per CPU: 121.100 > 121.260 121.320 121.160 ) > STDDEV_LATENCY: 120.040 (per CPU: 119.420 > 119.910 125.460 115.370 ) > MIN_LATENCY: 46.500 (per CPU: 47.000 >47.000 47.000 45.000 ) > P50_LATENCY: 118.500 (per CPU: 118.000 > 119.000 118.000 119.000 ) > P90_LATENCY: 127.500 (per CPU: 127.000 > 128.000 127.000 128.000 ) > P99_LATENCY: 130.750 (per CPU: 131.000 > 131.000 129.000 132.000 ) > > TRANSACTION_RATE: 32666.400 (per CPU:8152.200 > 8169.8428174.4398169.897 ) > > Pod to remote pod with redirect_peer(), TCP_RR: > > # percpu_netperf 10.217.1.33 > RT_LATENCY: 44.449 (per CPU: 43.767 >43.127 45.279 45.622 ) > MEAN_LATENCY: 45.065 (per CPU: 44.030 >45.530 45.190 45.510 ) > STDDEV_LATENCY: 84.823 (per CPU: 66.770 >97.290 84.380 90.850 ) > MIN_LATENCY: 33.500 (per CPU: 33.000 >33.000 34.000 34.000 ) > P50_LATENCY: 43.250 (per CPU: 43.000 >43.000 43.000 44.000 ) > P90_LATENCY: 46.750 (per CPU: 46.000 >47.000 47.000 47.000 ) > P99_LATENCY: 52.750 (per CPU: 51.000 >54.000 53.000 53.000 ) > > TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 > 23187.089 22085.077 21919.130 ) This is awesome results and great work Daniel! :-) I wonder if we can also support this from XDP, which can also native redirect into veth. Originally I though we could add the peer netdev in the devmap, but AFAIK Toke showed me that this was not possible. > [0] > https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf > [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf > > Signed-off-by: Daniel Borkmann > --- > drivers/net/veth.c | 9 ++ > include/linux/netdevice.h | 4 +++ > include/uapi/linux/bpf.h | 17 +++ > net/core/dev.c | 15 -- > net/core/filter.c | 54 +- > tools/include/uapi/linux/bpf.h | 17 +++ > 6 files changed, 106 insertions(+), 10 deletions(-) > [...] > diff --git a/net/core/dev.c b/net/core/dev.c > index 9d55bf5d1a65..7dd015823593 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4930,7 +4930,7 @@ EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook); > > static inline struct sk_buff * > sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int > *ret, > -struct net_device *orig_dev) > +struct net_device *orig_dev, bool *another) > { > #ifdef CONFIG_NET_CLS_ACT > struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress); > @@ -4974,7 +4974,11 @@ sch_handle_ingress(struct sk_buff *skb, struct > packet_type **pt_prev, int *ret, >* redirecting to another netdev >*/ > __skb_push(skb, skb->mac_len); > - skb_do_redirect(skb); > + if (skb_do_redirect(skb) == -EAGAIN) { > + __skb_pull(skb, skb->mac_len); > + *another = true; > + break; > + } > return NULL; > case TC_ACT_CONSUMED: > return NULL; > @@ -5163,7 +5167,12 @@ static int __netif_receive_skb_core(struct sk_buff > **pskb, bool pfmemalloc, > skip_taps: > #ifdef CONFIG_NET_INGRESS > if (static_branch_unlikely
Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled
On 10.10.2020 17:22, Jakub Kicinski wrote: > On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote: >> On 09.10.2020 18:06, Heiner Kallweit wrote: >>> On 09.10.2020 17:58, Jakub Kicinski wrote: On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote: > I'm thinking about a __napi_schedule version that disables hard irq's > conditionally, based on variable force_irqthreads, exported by the irq > subsystem. This would allow to behave correctly with threadirqs set, > whilst not loosing the _irqoff benefit with threadirqs unset. > Let me come up with a proposal. I think you'd need to make napi_schedule_irqoff() behave like that, right? Are there any uses of napi_schedule_irqoff() that are disabling irqs and not just running from an irq handler? >>> Right, the best approach depends on the answer to the latter question. >>> I didn't check this yet, therefore I described the least intrusive approach. >>> >> >> With some help from coccinelle I identified the following functions that >> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run >> from an irq handler (at least not at the first glance). >> >> dpaa2_caam_fqdan_cb >> qede_simd_fp_handler >> mlx4_en_rx_irq >> mlx4_en_tx_irq > > Don't know the others but FWIW the mlx4 ones run from an IRQ handler, > AFAICT: > > static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) > static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr) > mlx4_eq_int() > mlx4_cq_completion > cq->comp() > >> qeth_qdio_poll >> netvsc_channel_cb >> napi_watchdog > > This one runs from a hrtimer, which I believe will be a hard irq > context on anything but RT. I could be wrong. > A similar discussion can be found e.g. here: https://lore.kernel.org/netdev/20191126222013.1904785-1-bige...@linutronix.de/ However I don't see any actual outcome.
[PATCH net-next v2 2/2] can: remove obsolete version strings
As pointed out by Jakub Kicinski here: http://lore.kernel.org/r/20201009175751.5c540...@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com this patch removes the obsolete version information of the different CAN protocols and the AF_CAN core module. Signed-off-by: Oliver Hartkopp --- include/linux/can/core.h | 7 --- include/net/netns/can.h | 1 - net/can/af_can.c | 2 +- net/can/bcm.c| 4 +--- net/can/gw.c | 4 +--- net/can/isotp.c | 4 +--- net/can/proc.c | 12 net/can/raw.c| 4 +--- 8 files changed, 5 insertions(+), 33 deletions(-) diff --git a/include/linux/can/core.h b/include/linux/can/core.h index 7da9f1f82e8e..5fb8d0e3f9c1 100644 --- a/include/linux/can/core.h +++ b/include/linux/can/core.h @@ -18,13 +18,6 @@ #include #include -#define CAN_VERSION "20170425" - -/* increment this number each time you change some user-space interface */ -#define CAN_ABI_VERSION "9" - -#define CAN_VERSION_STRING "rev " CAN_VERSION " abi " CAN_ABI_VERSION - #define DNAME(dev) ((dev) ? (dev)->name : "any") /** diff --git a/include/net/netns/can.h b/include/net/netns/can.h index b6ab7d1530d7..52fbd8291a96 100644 --- a/include/net/netns/can.h +++ b/include/net/netns/can.h @@ -15,7 +15,6 @@ struct can_rcv_lists_stats; struct netns_can { #if IS_ENABLED(CONFIG_PROC_FS) struct proc_dir_entry *proc_dir; - struct proc_dir_entry *pde_version; struct proc_dir_entry *pde_stats; struct proc_dir_entry *pde_reset_stats; struct proc_dir_entry *pde_rcvlist_all; diff --git a/net/can/af_can.c b/net/can/af_can.c index b7d0f6500893..6373ab9c5507 100644 --- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -875,7 +875,7 @@ static __init int can_init(void) offsetof(struct can_frame, data) != offsetof(struct canfd_frame, data)); - pr_info("can: controller area network core (" CAN_VERSION_STRING ")\n"); + pr_info("can: controller area network core\n"); rcv_cache = kmem_cache_create("can_receiver", sizeof(struct receiver), 0, 0, NULL); diff --git a/net/can/bcm.c b/net/can/bcm.c index 4253915800e6..0e5c37be4a2b 100644 --- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -81,8 +81,6 @@ (CAN_EFF_MASK | CAN_EFF_FLAG | CAN_RTR_FLAG) : \ (CAN_SFF_MASK | CAN_EFF_FLAG | CAN_RTR_FLAG)) -#define CAN_BCM_VERSION "20170425" - MODULE_DESCRIPTION("PF_CAN broadcast manager protocol"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Oliver Hartkopp "); @@ -1696,7 +1694,7 @@ static int __init bcm_module_init(void) { int err; - pr_info("can: broadcast manager protocol (rev " CAN_BCM_VERSION " t)\n"); + pr_info("can: broadcast manager protocol\n"); err = can_proto_register(&bcm_can_proto); if (err < 0) { diff --git a/net/can/gw.c b/net/can/gw.c index 49b4e3d91ad6..6b790b6ff8d2 100644 --- a/net/can/gw.c +++ b/net/can/gw.c @@ -59,7 +59,6 @@ #include #include -#define CAN_GW_VERSION "20190810" #define CAN_GW_NAME "can-gw" MODULE_DESCRIPTION("PF_CAN netlink gateway"); @@ -1194,8 +1193,7 @@ static __init int cgw_module_init(void) /* sanitize given module parameter */ max_hops = clamp_t(unsigned int, max_hops, CGW_MIN_HOPS, CGW_MAX_HOPS); - pr_info("can: netlink gateway (rev " CAN_GW_VERSION ") max_hops=%d\n", - max_hops); + pr_info("can: netlink gateway - max_hops=%d\n", max_hops); ret = register_pernet_subsys(&cangw_pernet_ops); if (ret) diff --git a/net/can/isotp.c b/net/can/isotp.c index 22187669c5c9..132ca6c471d6 100644 --- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -72,8 +72,6 @@ #include #include -#define CAN_ISOTP_VERSION "20200928" - MODULE_DESCRIPTION("PF_CAN isotp 15765-2:2016 protocol"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Oliver Hartkopp "); @@ -1409,7 +1407,7 @@ static __init int isotp_module_init(void) { int err; - pr_info("isotp protocol (rev " CAN_ISOTP_VERSION ")\n"); + pr_info("isotp protocol\n"); err = can_proto_register(&isotp_can_proto); if (err < 0) diff --git a/net/can/proc.c b/net/can/proc.c index a4eb06c9eb70..550928b8b8a2 100644 --- a/net/can/proc.c +++ b/net/can/proc.c @@ -54,7 +54,6 @@ * proc filenames for the PF_CAN core */ -#define CAN_PROC_VERSION "version" #define CAN_PROC_STATS "stats" #define CAN_PROC_RESET_STATS "reset_stats" #define CAN_PROC_RCVLIST_ALL "rcvlist_all" @@ -293,12 +292,6 @@ static int can_reset_stats_proc_show(struct seq_file *m, void *v) return 0; } -static int can_version_proc_show(struct seq_file *m, void *v) -{ - seq_printf(m, "%s\n", CAN_VERSION_STRING); - return 0; -} - static inline void can_rcvlist_proc_show_one(struct seq_file *m, int idx, struct net_device *dev,
[PATCH net-next v2 1/2] can-isotp: implement cleanups / improvements from review
As pointed out by Jakub Kicinski here: http://lore.kernel.org/r/20201009175751.5c540...@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com this patch addresses the remarked issues: - remove empty line in comment - remove default=y for CAN_ISOTP in Kconfig - make use of pr_notice_once() - use GFP_KERNEL instead of gfp_any() in soft hrtimer context - make use of pr_fmt() [suggested my Marc Kleine-Budde] The version strings in the CAN subsystem are removed by a separate patch. Signed-off-by: Oliver Hartkopp --- include/uapi/linux/can/isotp.h | 1 - net/can/Kconfig| 3 ++- net/can/isotp.c| 19 ++- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/include/uapi/linux/can/isotp.h b/include/uapi/linux/can/isotp.h index 553006509f4e..7793b26aa154 100644 --- a/include/uapi/linux/can/isotp.h +++ b/include/uapi/linux/can/isotp.h @@ -160,7 +160,6 @@ struct can_isotp_ll_options { * these default settings can be changed via sockopts. * For that reason the STmin value is intentionally _not_ checked for * consistency and copied directly into the flow control (FC) frame. - * */ #endif /* !_UAPI_CAN_ISOTP_H */ diff --git a/net/can/Kconfig b/net/can/Kconfig index 021fe03a8ed6..224e5e0283a9 100644 --- a/net/can/Kconfig +++ b/net/can/Kconfig @@ -57,7 +57,6 @@ source "net/can/j1939/Kconfig" config CAN_ISOTP tristate "ISO 15765-2:2016 CAN transport protocol" - default y help CAN Transport Protocols offer support for segmented Point-to-Point communication between CAN nodes via two defined CAN Identifiers. @@ -67,6 +66,8 @@ config CAN_ISOTP vehicle diagnosis (UDS, ISO 14229) or IP-over-CAN traffic. This protocol driver implements data transfers according to ISO 15765-2:2016 for 'classic' CAN and CAN FD frame types. + If you want to perform automotive vehicle diagnostic services (UDS), + say 'y'. source "drivers/net/can/Kconfig" diff --git a/net/can/isotp.c b/net/can/isotp.c index e6ff032b5426..22187669c5c9 100644 --- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -79,6 +79,8 @@ MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Oliver Hartkopp "); MODULE_ALIAS("can-proto-6"); +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #define SINGLE_MASK(id) (((id) & CAN_EFF_FLAG) ? \ (CAN_EFF_MASK | CAN_EFF_FLAG | CAN_RTR_FLAG) : \ (CAN_SFF_MASK | CAN_EFF_FLAG | CAN_RTR_FLAG)) @@ -222,8 +224,8 @@ static int isotp_send_fc(struct sock *sk, int ae, u8 flowstatus) can_send_ret = can_send(nskb, 1); if (can_send_ret) - printk_once(KERN_NOTICE "can-isotp: %s: can_send_ret %d\n", - __func__, can_send_ret); + pr_notice_once("%s: can_send_ret %d\n", + __func__, can_send_ret); dev_put(dev); @@ -769,7 +771,7 @@ static enum hrtimer_restart isotp_tx_timer_handler(struct hrtimer *hrtimer) isotp_tx_burst: skb = alloc_skb(so->ll.mtu + sizeof(struct can_skb_priv), - gfp_any()); + GFP_KERNEL); if (!skb) { dev_put(dev); break; @@ -798,8 +800,8 @@ static enum hrtimer_restart isotp_tx_timer_handler(struct hrtimer *hrtimer) can_send_ret = can_send(skb, 1); if (can_send_ret) - printk_once(KERN_NOTICE "can-isotp: %s: can_send_ret %d\n", - __func__, can_send_ret); + pr_notice_once("%s: can_send_ret %d\n", + __func__, can_send_ret); if (so->tx.idx >= so->tx.len) { /* we are done */ @@ -942,8 +944,7 @@ static int isotp_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) err = can_send(skb, 1); dev_put(dev); if (err) { - printk_once(KERN_NOTICE "can-isotp: %s: can_send_ret %d\n", - __func__, err); + pr_notice_once("%s: can_send_ret %d\n", __func__, err); return err; } @@ -1408,11 +1409,11 @@ static __init int isotp_module_init(void) { int err; - pr_info("can: isotp protocol (rev " CAN_ISOTP_VERSION ")\n"); + pr_info("isotp protocol (rev " CAN_ISOTP_VERSION ")\n"); err = can_proto_register(&isotp_can_proto); if (err < 0) - pr_err("can: registration of isotp protocol failed\n"); + pr_err("registration of isotp protocol failed\n"); return err; } -- 2.28.0
[PATCH net-next 0/9] bnxt_en: Updates for net-next.
This series contains these main changes: 1. Change of default message level to enable more logging. 2. Some cleanups related to processing async events from firmware. 3. Allow online ethtool selftest on multi-function PFs. 4. Return stored firmware version information to devlink. Michael Chan (4): bnxt_en: Set driver default message level. bnxt_en: Simplify bnxt_async_event_process(). bnxt_en: Log event_data1 and event_data2 when handling RESET_NOTIFY event. bnxt_en: Log unknown link speed appropriately. Vasundhara Volam (5): bnxt_en: Return -EROFS to user space, if NVM writes are not permitted. bnxt_en: Enable online self tests for multi-host/NPAR mode. bnxt_en: Add bnxt_hwrm_nvm_get_dev_info() to query NVM info. bnxt_en: Refactor bnxt_dl_info_get(). bnxt_en: Add stored FW version info to devlink info_get cb. drivers/net/ethernet/broadcom/bnxt/bnxt.c | 37 +++-- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 + .../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 153 -- .../net/ethernet/broadcom/bnxt/bnxt_devlink.h | 6 + .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 24 ++- .../net/ethernet/broadcom/bnxt/bnxt_ethtool.h | 2 + 6 files changed, 162 insertions(+), 61 deletions(-) -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net-next 1/9] bnxt_en: Return -EROFS to user space, if NVM writes are not permitted.
From: Vasundhara Volam If NVRAM resources are locked, NVM writes are not permitted. In such scenarios, firmware returns HWRM_ERR_CODE_RESOURCE_LOCKED error to firmware commands. Reviewed-by: Edwin Peer Signed-off-by: Vasundhara Volam Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 5e4b7fbeef06..d4402a2cd07f 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -4325,6 +4325,8 @@ static int bnxt_hwrm_to_stderr(u32 hwrm_err) switch (hwrm_err) { case HWRM_ERR_CODE_SUCCESS: return 0; + case HWRM_ERR_CODE_RESOURCE_LOCKED: + return -EROFS; case HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED: return -EACCES; case HWRM_ERR_CODE_RESOURCE_ALLOC_ERROR: -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net-next 9/9] bnxt_en: Add stored FW version info to devlink info_get cb.
From: Vasundhara Volam This patch adds FW versions stored in the flash to devlink info_get callback. Return the correct fw.psid running version using the newly added bp->nvm_cfg_ver. Reviewed-by: Andy Gospodarek Signed-off-by: Vasundhara Volam Signed-off-by: Michael Chan --- .../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 45 ++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c index 65e8da1f7a01..7631c2d4c19e 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c @@ -408,6 +408,7 @@ static int bnxt_dl_info_put(struct bnxt *bp, struct devlink_info_req *req, static int bnxt_dl_info_get(struct devlink *dl, struct devlink_info_req *req, struct netlink_ext_ack *extack) { + struct hwrm_nvm_get_dev_info_output nvm_dev_info; struct bnxt *bp = bnxt_get_bp_from_dl(dl); union devlink_param_value nvm_cfg_ver; struct hwrm_ver_get_output *ver_resp; @@ -455,6 +456,12 @@ static int bnxt_dl_info_get(struct devlink *dl, struct devlink_info_req *req, if (rc) return rc; + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_RUNNING, + DEVLINK_INFO_VERSION_GENERIC_FW_PSID, + bp->nvm_cfg_ver); + if (rc) + return rc; + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_RUNNING, DEVLINK_INFO_VERSION_GENERIC_FW, ver_resp->active_pkg_name); @@ -466,7 +473,7 @@ static int bnxt_dl_info_get(struct devlink *dl, struct devlink_info_req *req, sprintf(buf, "%X.%X.%X", (ver >> 16) & 0xF, (ver >> 8) & 0xF, ver & 0xF); - rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_RUNNING, + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_STORED, DEVLINK_INFO_VERSION_GENERIC_FW_PSID, buf); if (rc) @@ -516,7 +523,43 @@ static int bnxt_dl_info_get(struct devlink *dl, struct devlink_info_req *req, rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_RUNNING, DEVLINK_INFO_VERSION_GENERIC_FW_ROCE, roce_ver); + if (rc) + return rc; + + rc = bnxt_hwrm_nvm_get_dev_info(bp, &nvm_dev_info); + if (rc) + return rc; + + if (!(nvm_dev_info.flags & NVM_GET_DEV_INFO_RESP_FLAGS_FW_VER_VALID)) + return 0; + + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_STORED, + DEVLINK_INFO_VERSION_GENERIC_FW, + nvm_dev_info.pkg_name); + if (rc) + return rc; + + snprintf(mgmt_ver, FW_VER_STR_LEN, "%d.%d.%d.%d", +nvm_dev_info.hwrm_fw_major, nvm_dev_info.hwrm_fw_minor, +nvm_dev_info.hwrm_fw_build, nvm_dev_info.hwrm_fw_patch); + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_STORED, + DEVLINK_INFO_VERSION_GENERIC_FW_MGMT, mgmt_ver); + if (rc) + return rc; + + snprintf(ncsi_ver, FW_VER_STR_LEN, "%d.%d.%d.%d", +nvm_dev_info.mgmt_fw_major, nvm_dev_info.mgmt_fw_minor, +nvm_dev_info.mgmt_fw_build, nvm_dev_info.mgmt_fw_patch); + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_STORED, + DEVLINK_INFO_VERSION_GENERIC_FW_NCSI, ncsi_ver); + if (rc) + return rc; + snprintf(roce_ver, FW_VER_STR_LEN, "%d.%d.%d.%d", +nvm_dev_info.roce_fw_major, nvm_dev_info.roce_fw_minor, +nvm_dev_info.roce_fw_build, nvm_dev_info.roce_fw_patch); + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_STORED, + DEVLINK_INFO_VERSION_GENERIC_FW_ROCE, roce_ver); return rc; } -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net-next 6/9] bnxt_en: Log unknown link speed appropriately.
If the VF virtual link is set to always enabled, the speed may be unknown when the physical link is down. The driver currently logs the link speed as 4294967295 Mbps which is SPEED_UNKNOWN. Modify the link up log message as "speed unknown" which makes more sense. Reviewed-by: Vasundhara Volam Reviewed-by: Edwin Peer Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index ca35ec06abf0..2d6ecf1adbc5 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -8902,6 +8902,11 @@ static void bnxt_report_link(struct bnxt *bp) u16 fec; netif_carrier_on(bp->dev); + speed = bnxt_fw_to_ethtool_speed(bp->link_info.link_speed); + if (speed == SPEED_UNKNOWN) { + netdev_info(bp->dev, "NIC Link is Up, speed unknown\n"); + return; + } if (bp->link_info.duplex == BNXT_LINK_DUPLEX_FULL) duplex = "full"; else @@ -8914,7 +8919,6 @@ static void bnxt_report_link(struct bnxt *bp) flow_ctrl = "ON - receive"; else flow_ctrl = "none"; - speed = bnxt_fw_to_ethtool_speed(bp->link_info.link_speed); netdev_info(bp->dev, "NIC Link is Up, %u Mbps %s duplex, Flow control: %s\n", speed, duplex, flow_ctrl); if (bp->flags & BNXT_FLAG_EEE_CAP) -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net-next 7/9] bnxt_en: Add bnxt_hwrm_nvm_get_dev_info() to query NVM info.
From: Vasundhara Volam Add a new bnxt_hwrm_nvm_get_dev_info() to query firmware version information via NVM_GET_DEV_INFO firmware command. Use it to get the running version of the NVM configuration information. This new function will also be used in subsequent patches to get the stored firmware versions. Reviewed-by: Andy Gospodarek Signed-off-by: Vasundhara Volam Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c| 12 drivers/net/ethernet/broadcom/bnxt/bnxt.h| 1 + .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c| 16 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.h| 2 ++ 4 files changed, 31 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 2d6ecf1adbc5..55d96e3a69da 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -7562,6 +7562,16 @@ static int bnxt_hwrm_func_reset(struct bnxt *bp) return hwrm_send_message(bp, &req, sizeof(req), HWRM_RESET_TIMEOUT); } +static void bnxt_nvm_cfg_ver_get(struct bnxt *bp) +{ + struct hwrm_nvm_get_dev_info_output nvm_info; + + if (!bnxt_hwrm_nvm_get_dev_info(bp, &nvm_info)) + snprintf(bp->nvm_cfg_ver, FW_VER_STR_LEN, "%d.%d.%d", +nvm_info.nvm_cfg_ver_maj, nvm_info.nvm_cfg_ver_min, +nvm_info.nvm_cfg_ver_upd); +} + static int bnxt_hwrm_queue_qportcfg(struct bnxt *bp) { int rc = 0; @@ -11223,6 +11233,8 @@ static int bnxt_fw_init_one_p1(struct bnxt *bp) if (rc) return rc; } + bnxt_nvm_cfg_ver_get(bp); + rc = bnxt_hwrm_func_reset(bp); if (rc) return -ENODEV; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index b208ff7c5d14..21ef1c21f602 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -1856,6 +1856,7 @@ struct bnxt { #define PHY_VER_STR_LEN (FW_VER_STR_LEN - BC_HWRM_STR_LEN) charfw_ver_str[FW_VER_STR_LEN]; charhwrm_ver_supp[FW_VER_STR_LEN]; + charnvm_cfg_ver[FW_VER_STR_LEN]; u64 fw_ver_code; #define BNXT_FW_VER_CODE(maj, min, bld, rsv) \ ((u64)(maj) << 48 | (u64)(min) << 32 | (u64)(bld) << 16 | (rsv)) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index dcbb7b70d60a..53687bc7fcf5 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -2072,6 +2072,22 @@ static u32 bnxt_get_link(struct net_device *dev) return bp->link_info.link_up; } +int bnxt_hwrm_nvm_get_dev_info(struct bnxt *bp, + struct hwrm_nvm_get_dev_info_output *nvm_dev_info) +{ + struct hwrm_nvm_get_dev_info_output *resp = bp->hwrm_cmd_resp_addr; + struct hwrm_nvm_get_dev_info_input req = {0}; + int rc; + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_GET_DEV_INFO, -1, -1); + mutex_lock(&bp->hwrm_cmd_lock); + rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); + if (!rc) + memcpy(nvm_dev_info, resp, sizeof(*resp)); + mutex_unlock(&bp->hwrm_cmd_lock); + return rc; +} + static void bnxt_print_admin_err(struct bnxt *bp) { netdev_info(bp->dev, "PF does not have admin privileges to flash or reset the device\n"); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h index 34f44ddfad79..fa6fbde52bea 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h @@ -92,6 +92,8 @@ u32 bnxt_get_rxfh_indir_size(struct net_device *dev); u32 _bnxt_fw_to_ethtool_adv_spds(u16, u8); u32 bnxt_fw_to_ethtool_speed(u16); u16 bnxt_get_fw_auto_link_speeds(u32); +int bnxt_hwrm_nvm_get_dev_info(struct bnxt *bp, + struct hwrm_nvm_get_dev_info_output *nvm_dev_info); int bnxt_flash_package_from_file(struct net_device *dev, const char *filename, u32 install_type); void bnxt_ethtool_init(struct bnxt *bp); -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net-next 2/9] bnxt_en: Enable online self tests for multi-host/NPAR mode.
From: Vasundhara Volam Online self tests are not disruptive and can be run in NPAR mode and in multi-host NIC as well. Reviewed-by: Edwin Peer Signed-off-by: Vasundhara Volam Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 52b4ea6ef8c9..dcbb7b70d60a 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -3298,7 +3298,7 @@ static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest, u8 test_mask = 0; int rc = 0, i; - if (!bp->num_tests || !BNXT_SINGLE_PF(bp)) + if (!bp->num_tests || !BNXT_PF(bp)) return; memset(buf, 0, sizeof(u64) * bp->num_tests); if (!netif_running(dev)) { @@ -3311,9 +3311,9 @@ static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest, do_ext_lpbk = true; if (etest->flags & ETH_TEST_FL_OFFLINE) { - if (bp->pf.active_vfs) { + if (bp->pf.active_vfs || !BNXT_SINGLE_PF(bp)) { etest->flags |= ETH_TEST_FL_FAILED; - netdev_warn(dev, "Offline tests cannot be run with active VFs\n"); + netdev_warn(dev, "Offline tests cannot be run with active VFs or on shared PF\n"); return; } offline = true; @@ -3829,7 +3829,7 @@ void bnxt_ethtool_init(struct bnxt *bp) bnxt_get_pkgver(dev); bp->num_tests = 0; - if (bp->hwrm_spec_code < 0x10704 || !BNXT_SINGLE_PF(bp)) + if (bp->hwrm_spec_code < 0x10704 || !BNXT_PF(bp)) return; bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_SELFTEST_QLIST, -1, -1); -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net-next 8/9] bnxt_en: Refactor bnxt_dl_info_get().
From: Vasundhara Volam Add a new function bnxt_dl_info_put() to simplify the code, as there are more stored firmware version fields to be added in the next patch. Also, rename fw_ver variable name to ncsi_ver for better naming while copying to devlink info_get cb. Reviewed-by: Pavan Chebbi Reviewed-by: Andy Gospodarek Signed-off-by: Vasundhara Volam Signed-off-by: Michael Chan --- .../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 112 ++ .../net/ethernet/broadcom/bnxt/bnxt_devlink.h | 6 + 2 files changed, 70 insertions(+), 48 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c index d436134bdc40..65e8da1f7a01 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c @@ -382,6 +382,29 @@ static int bnxt_hwrm_get_nvm_cfg_ver(struct bnxt *bp, return rc; } +static int bnxt_dl_info_put(struct bnxt *bp, struct devlink_info_req *req, + enum bnxt_dl_version_type type, const char *key, + char *buf) +{ + if (!strlen(buf)) + return 0; + + if ((bp->flags & BNXT_FLAG_CHIP_P5) && + (!strcmp(key, DEVLINK_INFO_VERSION_GENERIC_FW_NCSI) || +!strcmp(key, DEVLINK_INFO_VERSION_GENERIC_FW_ROCE))) + return 0; + + switch (type) { + case BNXT_VERSION_FIXED: + return devlink_info_version_fixed_put(req, key, buf); + case BNXT_VERSION_RUNNING: + return devlink_info_version_running_put(req, key, buf); + case BNXT_VERSION_STORED: + return devlink_info_version_stored_put(req, key, buf); + } + return 0; +} + static int bnxt_dl_info_get(struct devlink *dl, struct devlink_info_req *req, struct netlink_ext_ack *extack) { @@ -390,7 +413,7 @@ static int bnxt_dl_info_get(struct devlink *dl, struct devlink_info_req *req, struct hwrm_ver_get_output *ver_resp; char mgmt_ver[FW_VER_STR_LEN]; char roce_ver[FW_VER_STR_LEN]; - char fw_ver[FW_VER_STR_LEN]; + char ncsi_ver[FW_VER_STR_LEN]; char buf[32]; int rc; @@ -398,10 +421,11 @@ static int bnxt_dl_info_get(struct devlink *dl, struct devlink_info_req *req, if (rc) return rc; - if (strlen(bp->board_partno)) { - rc = devlink_info_version_fixed_put(req, - DEVLINK_INFO_VERSION_GENERIC_BOARD_ID, - bp->board_partno); + if (BNXT_PF(bp) && (bp->flags & BNXT_FLAG_DSN_VALID)) { + sprintf(buf, "%02X-%02X-%02X-%02X-%02X-%02X-%02X-%02X", + bp->dsn[7], bp->dsn[6], bp->dsn[5], bp->dsn[4], + bp->dsn[3], bp->dsn[2], bp->dsn[1], bp->dsn[0]); + rc = devlink_info_serial_number_put(req, buf); if (rc) return rc; } @@ -412,54 +436,49 @@ static int bnxt_dl_info_get(struct devlink *dl, struct devlink_info_req *req, return rc; } + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_FIXED, + DEVLINK_INFO_VERSION_GENERIC_BOARD_ID, + bp->board_partno); + if (rc) + return rc; + sprintf(buf, "%X", bp->chip_num); - rc = devlink_info_version_fixed_put(req, - DEVLINK_INFO_VERSION_GENERIC_ASIC_ID, buf); + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_FIXED, + DEVLINK_INFO_VERSION_GENERIC_ASIC_ID, buf); if (rc) return rc; ver_resp = &bp->ver_resp; sprintf(buf, "%X", ver_resp->chip_rev); - rc = devlink_info_version_fixed_put(req, - DEVLINK_INFO_VERSION_GENERIC_ASIC_REV, buf); + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_FIXED, + DEVLINK_INFO_VERSION_GENERIC_ASIC_REV, buf); if (rc) return rc; - if (BNXT_PF(bp)) { - sprintf(buf, "%02X-%02X-%02X-%02X-%02X-%02X-%02X-%02X", - bp->dsn[7], bp->dsn[6], bp->dsn[5], bp->dsn[4], - bp->dsn[3], bp->dsn[2], bp->dsn[1], bp->dsn[0]); - rc = devlink_info_serial_number_put(req, buf); - if (rc) - return rc; - } - - if (strlen(ver_resp->active_pkg_name)) { - rc = - devlink_info_version_running_put(req, - DEVLINK_INFO_VERSION_GENERIC_FW, - ver_resp->active_pkg_name); - if (rc) - return rc; - } + rc = bnxt_dl_info_put(bp, req, BNXT_VERSION_RUNNING, + DEVLINK_INFO_VERSION_GENERIC_FW, + ver_resp->active_pkg_name); +
[PATCH net-next 5/9] bnxt_en: Log event_data1 and event_data2 when handling RESET_NOTIFY event.
Log these values that contain useful firmware state information. Reviewed-by: Edwin Peer Reviewed-by: Vasundhara Volam Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index f093cdd61426..ca35ec06abf0 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -2032,6 +2032,9 @@ static int bnxt_async_event_process(struct bnxt *bp, set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event); break; case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY: + if (netif_msg_hw(bp)) + netdev_warn(bp->dev, "Received RESET_NOTIFY event, data1: 0x%x, data2: 0x%x\n", + data1, data2); if (!bp->fw_health) goto async_event_process_exit; -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net-next 4/9] bnxt_en: Simplify bnxt_async_event_process().
event_data1 and event_data2 are used when processing most events. Store these in local variables at the beginning of the function to simplify many of the case statements. Reviewed-by: Edwin Peer Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 0da6a1138eee..f093cdd61426 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1980,11 +1980,12 @@ static int bnxt_async_event_process(struct bnxt *bp, struct hwrm_async_event_cmpl *cmpl) { u16 event_id = le16_to_cpu(cmpl->event_id); + u32 data1 = le32_to_cpu(cmpl->event_data1); + u32 data2 = le32_to_cpu(cmpl->event_data2); /* TODO CHIMP_FW: Define event id's for link change, error etc */ switch (event_id) { case ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE: { - u32 data1 = le32_to_cpu(cmpl->event_data1); struct bnxt_link_info *link_info = &bp->link_info; if (BNXT_VF(bp)) @@ -2014,7 +2015,6 @@ static int bnxt_async_event_process(struct bnxt *bp, set_bit(BNXT_HWRM_PF_UNLOAD_SP_EVENT, &bp->sp_event); break; case ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED: { - u32 data1 = le32_to_cpu(cmpl->event_data1); u16 port_id = BNXT_GET_EVENT_PORT(data1); if (BNXT_VF(bp)) @@ -2031,9 +2031,7 @@ static int bnxt_async_event_process(struct bnxt *bp, goto async_event_process_exit; set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event); break; - case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY: { - u32 data1 = le32_to_cpu(cmpl->event_data1); - + case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY: if (!bp->fw_health) goto async_event_process_exit; @@ -2053,10 +2051,8 @@ static int bnxt_async_event_process(struct bnxt *bp, } set_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event); break; - } case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: { struct bnxt_fw_health *fw_health = bp->fw_health; - u32 data1 = le32_to_cpu(cmpl->event_data1); if (!fw_health) goto async_event_process_exit; @@ -2084,8 +2080,6 @@ static int bnxt_async_event_process(struct bnxt *bp, goto async_event_process_exit; } case ASYNC_EVENT_CMPL_EVENT_ID_RING_MONITOR_MSG: { - u32 data1 = le32_to_cpu(cmpl->event_data1); - u32 data2 = le32_to_cpu(cmpl->event_data2); struct bnxt_rx_ring_info *rxr; u16 grp_idx; -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net-next 3/9] bnxt_en: Set driver default message level.
Currently, bp->msg_enable has default value of 0. It is more useful to have the commonly used NETIF_MSG_DRV and NETIF_MSG_HW enabled by default. Reviewed-by: Edwin Peer Reviewed-by: Vasundhara Volam Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index d4402a2cd07f..0da6a1138eee 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -69,6 +69,7 @@ #include "bnxt_debugfs.h" #define BNXT_TX_TIMEOUT(5 * HZ) +#define BNXT_DEF_MSG_ENABLE(NETIF_MSG_DRV | NETIF_MSG_HW) MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Broadcom BCM573xx network driver"); @@ -12510,6 +12511,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) return -ENOMEM; bp = netdev_priv(dev); + bp->msg_enable = BNXT_DEF_MSG_ENABLE; bnxt_set_max_func_irqs(bp, max_irqs); if (bnxt_vf_pciid(ent->driver_data)) -- 2.18.1 smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH v2 0/3] [PATCH v2 0/3] [PATCH v2 0/3] net, mac80211, kernel: enable KCOV remote coverage collection for 802.11 frame handling
On 11 October 2020 12:37:29 CEST, Andrey Konovalov wrote: >I initially hesitated to do that, as it would multiply the number of >kcov callbacks. But perhaps you're right and a clean API look >outweighs the rest. I will do this in v3. Yeah, OK, dunno. You can always make it an inline calling the "full" API so after compiling it's equivalent. But if course that still has the two APIs. It just seemed to the common case wouldn't worry really (have to) about these things, especially if you plan on changing it again later. johannes -- Sent from my phone.
[PATCH 5/5] net/tls: use semicolons rather than commas to separate statements
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // @@ expression e1,e2; @@ e1 -, +; e2 ... when any // Signed-off-by: Julia Lawall --- net/tls/tls_main.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 002b0859fed5..8d93cea99f2c 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -869,7 +869,7 @@ static int __init tls_register(void) tls_sw_proto_ops = inet_stream_ops; tls_sw_proto_ops.splice_read = tls_sw_splice_read; - tls_sw_proto_ops.sendpage_locked = tls_sw_sendpage_locked, + tls_sw_proto_ops.sendpage_locked = tls_sw_sendpage_locked; tls_device_init(); tcp_register_ulp(&tcp_tls_ulp_ops);
[PATCH 0/5] net: use semicolons rather than commas to separate statements
These patches replace commas by semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. This was done using the Coccinelle semantic patch (http://coccinelle.lip6.fr/) shown below. This semantic patch ensures that commas inside for loop headers will not be transformed. It also doesn't touch macro definitions. Coccinelle ensures that braces are added as needed when a single-statement branch turns into a multi-statement one. This semantic patch has a few false positives, for variable delcarations such as: LIST_HEAD(x), *y; The semantic patch could be improved to avoid these, but for the moment they have been removed manually (2 occurrences). // @initialize:ocaml@ @@ let infunction p = (* avoid macros *) (List.hd p).current_element <> "something_else" let combined p1 p2 = (List.hd p1).line_end = (List.hd p2).line || (((List.hd p1).line_end < (List.hd p2).line) && ((List.hd p1).col < (List.hd p2).col)) @bad@ statement S; declaration d; position p; @@ S@p d // special cases where newlines are needed (hope for no more than 5) @@ expression e1,e2; statement S; position p != bad.p; position p1; position p2 : script:ocaml(p1) { infunction p1 && combined p1 p2 }; @@ - e1@p1,@S@p e2@p2; + e1; e2; @@ expression e1,e2; statement S; position p != bad.p; position p1; position p2 : script:ocaml(p1) { infunction p1 && combined p1 p2 }; @@ - e1@p1,@S@p e2@p2; + e1; e2; @@ expression e1,e2; statement S; position p != bad.p; position p1; position p2 : script:ocaml(p1) { infunction p1 && combined p1 p2 }; @@ - e1@p1,@S@p e2@p2; + e1; e2; @@ expression e1,e2; statement S; position p != bad.p; position p1; position p2 : script:ocaml(p1) { infunction p1 && combined p1 p2 }; @@ - e1@p1,@S@p e2@p2; + e1; e2; @@ expression e1,e2; statement S; position p != bad.p; position p1; position p2 : script:ocaml(p1) { infunction p1 && combined p1 p2 }; @@ - e1@p1,@S@p e2@p2; + e1; e2; @r@ expression e1,e2; statement S; position p != bad.p; @@ e1 ,@S@p e2; @@ expression e1,e2; position p1; position p2 : script:ocaml(p1) { infunction p1 && not(combined p1 p2) }; statement S; position r.p; @@ e1@p1 -,@S@p +; e2@p2 ... when any // --- net/ipv4/tcp_input.c |3 ++- net/ipv4/tcp_vegas.c |8 net/ipv6/calipso.c |2 +- net/mac80211/debugfs_sta.c |2 +- net/rxrpc/recvmsg.c|2 +- net/tls/tls_main.c |2 +- 6 files changed, 10 insertions(+), 9 deletions(-)
[PATCH 1/5] rxrpc: use semicolons rather than commas to separate statements
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // @@ expression e1,e2; @@ e1 -, +; e2 ... when any // Signed-off-by: Julia Lawall --- net/rxrpc/recvmsg.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index c4684dde1f16..acc4660cfa5b 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -69,7 +69,7 @@ bool __rxrpc_set_call_completion(struct rxrpc_call *call, if (call->state < RXRPC_CALL_COMPLETE) { call->abort_code = abort_code; call->error = error; - call->completion = compl, + call->completion = compl; call->state = RXRPC_CALL_COMPLETE; trace_rxrpc_call_complete(call); wake_up(&call->waitq);
[PATCH 2/5] mac80211: use semicolons rather than commas to separate statements
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // @@ expression e1,e2; @@ e1 -, +; e2 ... when any // Signed-off-by: Julia Lawall --- net/mac80211/debugfs_sta.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c index 829dcad69c2c..6a51b8b58f9e 100644 --- a/net/mac80211/debugfs_sta.c +++ b/net/mac80211/debugfs_sta.c @@ -274,7 +274,7 @@ static ssize_t sta_aql_read(struct file *file, char __user *userbuf, "Q limit[low/high]: VO: %u/%u VI: %u/%u BE: %u/%u BK: %u/%u\n", q_depth[0], q_depth[1], q_depth[2], q_depth[3], q_limit_l[0], q_limit_h[0], q_limit_l[1], q_limit_h[1], - q_limit_l[2], q_limit_h[2], q_limit_l[3], q_limit_h[3]), + q_limit_l[2], q_limit_h[2], q_limit_l[3], q_limit_h[3]); rv = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); kfree(buf);
[PATCH 3/5] tcp: use semicolons rather than commas to separate statements
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // @@ expression e1,e2; @@ e1 -, +; e2 ... when any // Signed-off-by: Julia Lawall --- net/ipv4/tcp_input.c |3 ++- net/ipv4/tcp_vegas.c |8 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 02d0e2fb77c0..7ff9be52c061 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4418,7 +4418,8 @@ static void tcp_sack_maybe_coalesce(struct tcp_sock *tp) sp[i] = sp[i + 1]; continue; } - this_sack++, swalk++; + this_sack++; + swalk++; } } diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c index 3f51e781562a..c8003c8aad2c 100644 --- a/net/ipv4/tcp_vegas.c +++ b/net/ipv4/tcp_vegas.c @@ -293,10 +293,10 @@ size_t tcp_vegas_get_info(struct sock *sk, u32 ext, int *attr, const struct vegas *ca = inet_csk_ca(sk); if (ext & (1 << (INET_DIAG_VEGASINFO - 1))) { - info->vegas.tcpv_enabled = ca->doing_vegas_now, - info->vegas.tcpv_rttcnt = ca->cntRTT, - info->vegas.tcpv_rtt = ca->baseRTT, - info->vegas.tcpv_minrtt = ca->minRTT, + info->vegas.tcpv_enabled = ca->doing_vegas_now; + info->vegas.tcpv_rttcnt = ca->cntRTT; + info->vegas.tcpv_rtt = ca->baseRTT; + info->vegas.tcpv_minrtt = ca->minRTT; *attr = INET_DIAG_VEGASINFO; return sizeof(struct tcpvegas_info);
[PATCH 4/5] net/ipv6: use semicolons rather than commas to separate statements
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // @@ expression e1,e2; @@ e1 -, +; e2 ... when any // Signed-off-by: Julia Lawall --- net/ipv6/calipso.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c index 8d3f66c310db..78f766019b7e 100644 --- a/net/ipv6/calipso.c +++ b/net/ipv6/calipso.c @@ -761,7 +761,7 @@ static int calipso_genopt(unsigned char *buf, u32 start, u32 buf_len, calipso[1] = len - 2; *(__be32 *)(calipso + 2) = htonl(doi_def->doi); calipso[6] = (len - CALIPSO_HDR_LEN) / 4; - calipso[7] = secattr->attr.mls.lvl, + calipso[7] = secattr->attr.mls.lvl; crc = ~crc_ccitt(0x, calipso, len); calipso[8] = crc & 0xff; calipso[9] = (crc >> 8) & 0xff;
[PATCH] net: korina: free array used for rx/tx descriptors
Memory was not freed when driver is unloaded from the kernel. Signed-off-by: Valentin Vidic --- drivers/net/ethernet/korina.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c index 03e034918d14..99146145f020 100644 --- a/drivers/net/ethernet/korina.c +++ b/drivers/net/ethernet/korina.c @@ -1133,6 +1133,7 @@ static int korina_remove(struct platform_device *pdev) iounmap(lp->eth_regs); iounmap(lp->rx_dma_regs); iounmap(lp->tx_dma_regs); + kfree(lp->td_ring); unregister_netdev(bif->dev); free_netdev(bif->dev); -- 2.20.1
Re: [PATCH net-next v6 2/7] net: dsa: Add DSA driver for Hirschmann Hellcreek switches
On Tue Oct 06 2020, Vladimir Oltean wrote: > On Tue, Oct 06, 2020 at 03:23:36PM +0200, Kurt Kanzenbach wrote: >> So you're saying private VLANs can be used but the user or the other >> kernel modules shouldn't be allowed to use them to simplify the >> implementation? Makes sense to me. > > It would be interesting to see if you could simply turn off VLAN > awareness in standalone mode, and still use unique pvids per port. That doesn't work, just tested. When VLAN awareness is disabled, everything is switched regardless of VLAN tags and table. Therefore, the implementation could look like this: * bridge without filtering: * vlan_awareness=0 * drop private vlans * bridge with vlan filtering: * vlan_awareness=1 * drop private vlans * standalone: * vlan_awareness=1 * use private vlans * forbid other users to use private vlans to allow configure_vlans_while_not_filtering behavior in .vlan_prepare() * forbid use of lan0. and lan1. in .port_prechangeupper() So, this should work, or? Thanks, Kurt signature.asc Description: PGP signature
Re: [PATCH net-next v2 1/2] Makefile.extrawarn: Add symbol for W=1 warnings for today
On Tue, Oct 6, 2020 at 6:08 AM Andrew Lunn wrote: > > > It depends a lot on what portion of the kernel gets enabled for W=1. > > > > As long as it's only drivers that are actively maintained, and they > > make up a fairly small portion of all code, it should not be a problem > > to find someone to fix useful warnings. > > Well, drivers/net/ethernet is around 1.5M LOC. The tree as a whole is > just short of 23M LOC. So i guess that is a small portion of all the > code. > > Andrew I am not a big fan of KBUILD_CFLAGS_W1_ since it is ugly. I'd like to start with adding individual flags like drivers/gpu/drm/i915/Makefile, and see how difficult it would be to maintain it. One drawback of your approach is that you cannot set KBUILD_CFLAGS_W1_20200930 until you eliminate all the warnings in the sub-directory in interest. (i.e. all or nothing approach) At best, you can only work out from 'old -> new' order because KBUILD_CFLAGS_W1_20200326 is a suer-set of KBUILD_CFLAGS_W1_20190907, which is a suer-set of KBUILD_CFLAGS_W1_20190617 ... If you add flags individually, you can start with low-hanging fruits, or ones with higher priority as Arnd mentions about -Wmissing-{declaration,prototypes}. For example, you might be able to set 'subdir-ccflags-y += -Wmissing-declarations' to drivers/net/Makefile, while 'subdir-ccflags-y += -Wunused-but-set-variable' stays in drivers/net/ethernet/Makefile. -- Best Regards Masahiro Yamada
Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled
On 10.10.2020 17:22, Jakub Kicinski wrote: > On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote: >> On 09.10.2020 18:06, Heiner Kallweit wrote: >>> On 09.10.2020 17:58, Jakub Kicinski wrote: On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote: > I'm thinking about a __napi_schedule version that disables hard irq's > conditionally, based on variable force_irqthreads, exported by the irq > subsystem. This would allow to behave correctly with threadirqs set, > whilst not loosing the _irqoff benefit with threadirqs unset. > Let me come up with a proposal. I think you'd need to make napi_schedule_irqoff() behave like that, right? Are there any uses of napi_schedule_irqoff() that are disabling irqs and not just running from an irq handler? >>> Right, the best approach depends on the answer to the latter question. >>> I didn't check this yet, therefore I described the least intrusive approach. >>> >> >> With some help from coccinelle I identified the following functions that >> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run >> from an irq handler (at least not at the first glance). >> >> dpaa2_caam_fqdan_cb >> qede_simd_fp_handler >> mlx4_en_rx_irq >> mlx4_en_tx_irq > > Don't know the others but FWIW the mlx4 ones run from an IRQ handler, > AFAICT: > > static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) > static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr) > mlx4_eq_int() > mlx4_cq_completion > cq->comp() > >> qeth_qdio_poll >> netvsc_channel_cb >> napi_watchdog > > This one runs from a hrtimer, which I believe will be a hard irq > context on anything but RT. I could be wrong. > Typically forced irq threading will not be enabled, therefore going back to use napi_schedule() in drivers in most cases will cause losing the benefit of the irqoff version. Something like the following should be better. Only small drawback I see is that in case of forced irq threading hrtimers will still run in hardirq context and we lose the benefit of the irqoff version in napi_watchdog(). diff --git a/net/core/dev.c b/net/core/dev.c index a146bac84..7d18560b2 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6393,7 +6393,11 @@ EXPORT_SYMBOL(napi_schedule_prep); */ void __napi_schedule_irqoff(struct napi_struct *n) { - napi_schedule(this_cpu_ptr(&softnet_data), n); + /* hard irqs may not be masked in case of forced irq threading */ + if (force_irqthreads) + __napi_schedule(n); + else + napi_schedule(this_cpu_ptr(&softnet_data), n); } EXPORT_SYMBOL(__napi_schedule_irqoff); -- 2.28.0
Re: [PATCH] mm: proc: add Sock to /proc/meminfo
On Sat, Oct 10, 2020 at 06:38:54PM +0800, Muchun Song wrote: > The amount of memory allocated to sockets buffer can become significant. > However, we do not display the amount of memory consumed by sockets > buffer. In this case, knowing where the memory is consumed by the kernel > is very difficult. On our server with 500GB RAM, sometimes we can see > 25GB disappear through /proc/meminfo. After our analysis, we found the > following memory allocation path which consumes the memory with page_owner > enabled. I have a high lelel question. There is accounting of the socket memory for memcg that gets called from the networking layer. Did you check if the same call sites can be used for the system-wide accounting as well? > 849698 times: > Page allocated via order 3, mask > 0x4052c0(GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP) >__alloc_pages_nodemask+0x11d/0x290 >skb_page_frag_refill+0x68/0xf0 >sk_page_frag_refill+0x19/0x70 >tcp_sendmsg_locked+0x2f4/0xd10 >tcp_sendmsg+0x29/0xa0 >sock_sendmsg+0x30/0x40 >sock_write_iter+0x8f/0x100 >__vfs_write+0x10b/0x190 >vfs_write+0xb0/0x190 >ksys_write+0x5a/0xd0 >do_syscall_64+0x5d/0x110 >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > Signed-off-by: Muchun Song > --- > drivers/base/node.c | 2 ++ > drivers/net/virtio_net.c | 3 +-- Is virtio-net the only dirver that requred an update? > fs/proc/meminfo.c| 1 + > include/linux/mmzone.h | 1 + > include/linux/skbuff.h | 43 ++-- > kernel/exit.c| 3 +-- > mm/page_alloc.c | 7 +-- > mm/vmstat.c | 1 + > net/core/sock.c | 8 > net/ipv4/tcp.c | 3 +-- > net/xfrm/xfrm_state.c| 3 +-- > 11 files changed, 59 insertions(+), 16 deletions(-) >
Re: [Patch net] ip_gre: set dev->hard_header_len properly
On Sat, Oct 10, 2020 at 11:58 AM Cong Wang wrote: > > Well, AF_PACKET RAW socket is supposed to construct L2 headers > from the user buffer, and for a tunnel device these headers are indeed its > L2. If users do not want to do this, they can switch to DGRAM anyway. > > I know how inconvenient it is to construct a GRE tunnel header, I guess > this is why all other tunnel devices do not provide a header_ops::create. > GRE tunnel is not a special case, this is why I agree on removing its > ->create() although it does look like all tunnel devices should let users > construct headers by definition of RAW. > > Of course, it may be too late to change this behavior even if we really > want, users may already assume there is always no tunnel header needed > to construct. Sorry. I was re-thinking about whether we should expose certain headers in header_ops or not. Now my thoughts are a little bit different, so I want to re-reply to your email. 1. The first reason why I think we should not expose the GRE header in header_ops, is that in this way we can keep the un-exposed headers of GRE devices consistent with those of GRETAP devices. (This is just like the consistency between "universal TUN devices" and "universal TAP devices".) This is similar to what I wrote in my previous replies. However, after thinking I think this only applies to GRE/GRETAP but not to some other tunnel devices. For example, for PPP, we can't actually transmit Ethernet frames directly over PPP (but need to encapsulate them in BCP over PPP), so there's no way to keep the un-exposed headers of PPP devices consistent with those of "PPP TAP" (PPP bridging) devices anyway. 2. Second, I think the headers before the GRE header (the outer IP header and the "encap" header) should actually belong to the underlying network being tunneled through, and should not be viewed as L2 headers of the tunneling network. As for the GRE header, it might be better viewed as a "bridge" between the tunneled network and the tunneling network, and belongs neither of the two, so it should not be viewed as the L2 header of the tunneling network either. However, for PPP tunnels, I think the PPP header clearly belongs to the tunneling network's L2 header (because PPP can run directly over a hardware link, and even if it is used for a tunnel, it needs a specific underlying network's protocol, such as SSH or PPPoE, to transport). So I think it is better to expose the PPP header in header_ops.
Dear Friend
Hello Friend.I am Mrs.CHANTAL I am sending this brief letter to solicit your partnership to transfer $7.2 Million US Dollars.I shall send you more information and procedures when I receive positive response From you. Please send me a message in My private email address is ( mrschant...@gmail.com ) Best Regards Mrs.Chantal
Re: [PATCH net-next v6 2/7] net: dsa: Add DSA driver for Hirschmann Hellcreek switches
On Sun, Oct 11, 2020 at 02:29:08PM +0200, Kurt Kanzenbach wrote: > On Tue Oct 06 2020, Vladimir Oltean wrote: > > It would be interesting to see if you could simply turn off VLAN > > awareness in standalone mode, and still use unique pvids per port. > > That doesn't work, just tested. When VLAN awareness is disabled, > everything is switched regardless of VLAN tags and table. That's strange, do you happen to know where things are going wrong? I would expect: - port VLAN awareness is disabled, so any packet is classified to the port-based VLAN - the port-based VLAN is a private VLAN whose membership includes only that port, plus the CPU port - the switch does not forward packets towards a port that is not member of the packets' classified VLAN When VLAN awareness is disabled, are you able to cause packet drops by deleting the pvid of the ingress port? Therefore, can you confirm that lan1 is not a member of lan0's pvid, but the switch still forwards the packets to it? > Therefore, the implementation could look like this: > > * bridge without filtering: >* vlan_awareness=0 >* drop private vlans > * bridge with vlan filtering: >* vlan_awareness=1 >* drop private vlans > * standalone: >* vlan_awareness=1 >* use private vlans >* forbid other users to use private vlans to allow > configure_vlans_while_not_filtering behavior in .vlan_prepare() >* forbid use of lan0. and lan1. in .port_prechangeupper() > > So, this should work, or? Yes, this is an alternative that could work.
[PATCH] net/af_unix: Remove unused old_pid variable
Commit 109f6e39fa07c48f5801 ("af_unix: Allow SO_PEERCRED to work across namespaces.") introduced the old_pid variable in unix_listen, but it's never used. Remove the declaration and the call to put_pid. Signed-off-by: Or Cohen --- net/unix/af_unix.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 3385a7a0b231..26d3bf81186f 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -613,7 +613,6 @@ static int unix_listen(struct socket *sock, int backlog) int err; struct sock *sk = sock->sk; struct unix_sock *u = unix_sk(sk); - struct pid *old_pid = NULL; err = -EOPNOTSUPP; if (sock->type != SOCK_STREAM && sock->type != SOCK_SEQPACKET) @@ -634,7 +633,6 @@ static int unix_listen(struct socket *sock, int backlog) out_unlock: unix_state_unlock(sk); - put_pid(old_pid); out: return err; } -- 2.17.1
Re: [PATCH net-next v2 1/2] can-isotp: implement cleanups / improvements from review
On Sun, 11 Oct 2020 11:24:07 +0200 Oliver Hartkopp wrote: > diff --git a/net/can/isotp.c b/net/can/isotp.c > index e6ff032b5426..22187669c5c9 100644 > --- a/net/can/isotp.c > +++ b/net/can/isotp.c > @@ -79,6 +79,8 @@ MODULE_LICENSE("Dual BSD/GPL"); > MODULE_AUTHOR("Oliver Hartkopp "); > MODULE_ALIAS("can-proto-6"); > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt You need to move this before the includes: net/can/isotp.c:82: warning: "pr_fmt" redefined 82 | #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt | In file included from ../include/linux/kernel.h:15, from ../include/linux/list.h:9, from ../include/linux/module.h:12, from ../net/can/isotp.c:56: include/linux/printk.h:297: note: this is the location of the previous definition 297 | #define pr_fmt(fmt) fmt | net/can/isotp.c:82:9: warning: preprocessor token pr_fmt redefined net/can/isotp.c: note: in included file (through ../include/linux/kernel.h, ../include/linux/list.h, ../include/linux/module.h): include/linux/printk.h:297:9: this was the original definition
Re: [PATCH net-next v2 1/2] can-isotp: implement cleanups / improvements from review
On Sun, 11 Oct 2020 11:24:07 +0200 Oliver Hartkopp wrote: > @@ -769,7 +771,7 @@ static enum hrtimer_restart isotp_tx_timer_handler(struct > hrtimer *hrtimer) > > isotp_tx_burst: > skb = alloc_skb(so->ll.mtu + sizeof(struct can_skb_priv), > - gfp_any()); > + GFP_KERNEL); hrtimer will need GFP_ATOMIC
Re: [PATCH net-next 1/2] dt-bindings: net: dsa: b53: Add YAML bindings
On 10/11/2020 1:32 AM, Kurt Kanzenbach wrote: On Sat Oct 10 2020, Florian Fainelli wrote: On 10/10/2020 9:46 AM, Kurt Kanzenbach wrote: Convert the b53 DSA device tree bindings to YAML in order to allow for automatic checking and such. Suggested-by: Florian Fainelli Signed-off-by: Kurt Kanzenbach Thanks for making this change, there are quite a few warnings that are going to show up because the binding was defined in a way that it would define chip compatible strings, which not all DTS files are using. Oh, I didn't know there is a second make command for doing the actual check against the dtbs. I've just used `make dt_binding_check'. So, it seems like a lot of the errors are caused by the include files such as [linux]/arch/arm/boot/dts/bcm5301x.dtsi srab: srab@18007000 { compatible = "brcm,bcm5301x-srab"; reg = <0x18007000 0x1000>; status = "disabled"; /* ports are defined in board DTS */ }; The nodename should be "switch" not "srab" as enforced by dsa.yaml. Furthermore, some DTS files are not adding the chip specific compatible strings and the ports leading to more errors. There are also some minor errors regarding the reg-names and such for specific instances. How should we proceed? Adding the missing compatible strings and ports to the DTS files? Or adjusting the include files? The include is correct as it provides the fallback family string which is what the driver will be looking for unless we do not provide a chip compatible. The various DTS should be updated to contain both the chip compatible and the fallback family (brcm,bcm5301x-srab) string, I will update the various DTS and submit these for review later next week. Then we could imagine me taking this YAML change through the Broadcom ARM SoC pull requests that way no new regressions are introduced. Sounds good? -- Florian
[PATCH] rtw88: fix fw_fifo_addr check
From: Tom Rix The clang build reports this warning fw.c:1485:21: warning: address of array 'rtwdev->chip->fw_fifo_addr' will always evaluate to 'true' if (!rtwdev->chip->fw_fifo_addr) { fw_fifo_addr is an array in rtw_chip_info so it is always nonzero. A better check is if the first element of the array is nonzero. In the cases where fw_fifo_addr is initialized by rtw88b and rtw88c, the first array element is 0x780. Signed-off-by: Tom Rix --- drivers/net/wireless/realtek/rtw88/fw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtw88/fw.c b/drivers/net/wireless/realtek/rtw88/fw.c index 042015bc8055..b2fd87834f23 100644 --- a/drivers/net/wireless/realtek/rtw88/fw.c +++ b/drivers/net/wireless/realtek/rtw88/fw.c @@ -1482,7 +1482,7 @@ static bool rtw_fw_dump_check_size(struct rtw_dev *rtwdev, int rtw_fw_dump_fifo(struct rtw_dev *rtwdev, u8 fifo_sel, u32 addr, u32 size, u32 *buffer) { - if (!rtwdev->chip->fw_fifo_addr) { + if (!rtwdev->chip->fw_fifo_addr[0]) { rtw_dbg(rtwdev, RTW_DBG_FW, "chip not support dump fw fifo\n"); return -ENOTSUPP; } -- 2.18.1
Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled
On Sun, 11 Oct 2020 15:42:24 +0200 Heiner Kallweit wrote: > On 10.10.2020 17:22, Jakub Kicinski wrote: > > On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote: > >> On 09.10.2020 18:06, Heiner Kallweit wrote: > >>> On 09.10.2020 17:58, Jakub Kicinski wrote: > On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote: > > I'm thinking about a __napi_schedule version that disables hard irq's > > conditionally, based on variable force_irqthreads, exported by the irq > > subsystem. This would allow to behave correctly with threadirqs set, > > whilst not loosing the _irqoff benefit with threadirqs unset. > > Let me come up with a proposal. > > I think you'd need to make napi_schedule_irqoff() behave like that, > right? Are there any uses of napi_schedule_irqoff() that are disabling > irqs and not just running from an irq handler? > > >>> Right, the best approach depends on the answer to the latter question. > >>> I didn't check this yet, therefore I described the least intrusive > >>> approach. > >>> > >> > >> With some help from coccinelle I identified the following functions that > >> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run > >> from an irq handler (at least not at the first glance). > >> > >> dpaa2_caam_fqdan_cb > >> qede_simd_fp_handler > >> mlx4_en_rx_irq > >> mlx4_en_tx_irq > > > > Don't know the others but FWIW the mlx4 ones run from an IRQ handler, > > AFAICT: > > > > static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) > > static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr) > > mlx4_eq_int() > > mlx4_cq_completion > > cq->comp() > > > >> qeth_qdio_poll > >> netvsc_channel_cb > >> napi_watchdog > > > > This one runs from a hrtimer, which I believe will be a hard irq > > context on anything but RT. I could be wrong. > > > > Typically forced irq threading will not be enabled, therefore going > back to use napi_schedule() in drivers in most cases will cause > losing the benefit of the irqoff version. Something like the following > should be better. Only small drawback I see is that in case of forced > irq threading hrtimers will still run in hardirq context and we lose > the benefit of the irqoff version in napi_watchdog(). > > diff --git a/net/core/dev.c b/net/core/dev.c > index a146bac84..7d18560b2 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6393,7 +6393,11 @@ EXPORT_SYMBOL(napi_schedule_prep); > */ > void __napi_schedule_irqoff(struct napi_struct *n) > { > - napi_schedule(this_cpu_ptr(&softnet_data), n); > + /* hard irqs may not be masked in case of forced irq threading */ > + if (force_irqthreads) > + __napi_schedule(n); > + else > + napi_schedule(this_cpu_ptr(&softnet_data), n); > } > EXPORT_SYMBOL(__napi_schedule_irqoff); Does if (force_irqthreads) local_irq_save(flags); napi_schedule(this_cpu_ptr(&softnet_data), n); if (force_irqthreads) local_irq_restore(flags); not produce more concise assembly?
Re: [External] Re: [PATCH] mm: proc: add Sock to /proc/meminfo
On Sun, Oct 11, 2020 at 9:53 PM Mike Rapoport wrote: > > On Sat, Oct 10, 2020 at 06:38:54PM +0800, Muchun Song wrote: > > The amount of memory allocated to sockets buffer can become significant. > > However, we do not display the amount of memory consumed by sockets > > buffer. In this case, knowing where the memory is consumed by the kernel > > is very difficult. On our server with 500GB RAM, sometimes we can see > > 25GB disappear through /proc/meminfo. After our analysis, we found the > > following memory allocation path which consumes the memory with page_owner > > enabled. > > I have a high lelel question. > There is accounting of the socket memory for memcg that gets called from > the networking layer. Did you check if the same call sites can be used > for the system-wide accounting as well? I also think about this. But we did not pass the `struct page` parameter to the sock accounting memcg API. So we did not know the NUMA node which allocated the socket buffer memory and cannot do node-level statistics. In addition, there is another problem. If the user sends a 4096-byte message, we only charge one page to the memcg but the system allocates 8 pages. So if we reuse the same call sites for the system-wide accounting, the statistical count we get is always smaller than the actual situation. > > > 849698 times: > > Page allocated via order 3, mask > > 0x4052c0(GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP) > >__alloc_pages_nodemask+0x11d/0x290 > >skb_page_frag_refill+0x68/0xf0 > >sk_page_frag_refill+0x19/0x70 > >tcp_sendmsg_locked+0x2f4/0xd10 > >tcp_sendmsg+0x29/0xa0 > >sock_sendmsg+0x30/0x40 > >sock_write_iter+0x8f/0x100 > >__vfs_write+0x10b/0x190 > >vfs_write+0xb0/0x190 > >ksys_write+0x5a/0xd0 > >do_syscall_64+0x5d/0x110 > >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > Signed-off-by: Muchun Song > > --- > > drivers/base/node.c | 2 ++ > > drivers/net/virtio_net.c | 3 +-- > > Is virtio-net the only dirver that requred an update? Yeah, only virtio-net needs an update. Because only it uses the skb_page_frag_refill() API. > > > fs/proc/meminfo.c| 1 + > > include/linux/mmzone.h | 1 + > > include/linux/skbuff.h | 43 ++-- > > kernel/exit.c| 3 +-- > > mm/page_alloc.c | 7 +-- > > mm/vmstat.c | 1 + > > net/core/sock.c | 8 > > net/ipv4/tcp.c | 3 +-- > > net/xfrm/xfrm_state.c| 3 +-- > > 11 files changed, 59 insertions(+), 16 deletions(-) > > -- Yours, Muchun
Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled
On Sun, 11 Oct 2020 11:24:41 +0200 Heiner Kallweit wrote: > >> qeth_qdio_poll > >> netvsc_channel_cb > >> napi_watchdog > > > > This one runs from a hrtimer, which I believe will be a hard irq > > context on anything but RT. I could be wrong. > > > > A similar discussion can be found e.g. here: > https://lore.kernel.org/netdev/20191126222013.1904785-1-bige...@linutronix.de/ > However I don't see any actual outcome. Interesting, hopefully Eric will chime in. I think the hrtimer issue was solved. But I'm not actually seeing a lockdep_assert_irqs_disabled() in __raise_softirq_irqoff() in net, so IDK what that's for? In any case if NAPI thinks it has irqs off while they're not, and interacts with other parts of the kernel we may be in for a game of whack-a-mole. Perhaps a way around touching force_irqthreads directly in net/ would be some form of a helper like "theaded_local_irq_save" or such that'd disable IRQs only if force_irqthreads == 1? Is that cheating? :)
Re: [PATCH net-next v2] ne2k: Enable RW-Bugfix
On Sat, 10 Oct 2020 18:17:11 +0200 Armin Wolf wrote: > Correct a typo in ne.c and ne2k-pci.c which > prevented activation of the RW-Bugfix. > Also enable the RW-Bugfix by default since > not doing so could (according to the Datasheet) > cause the system to lock up with some chips. > > Signed-off-by: Armin Wolf > --- > v2 changes: > - change NE8390_RW_BUGFIX to NE_RW_BUGFIX I'm not applying this, sorry. It's been this way since Linux 1.2 and you're giving no information about why the change is needed in practice.
Re: [Linux-kernel-mentees] [PATCH net] ethtool: strset: Fix out of bound read in strset_parse_request()
On Sun, 11 Oct 2020 02:39:29 +0530 Anmol Karn wrote: > Flag ``ETHTOOL_A_STRSET_COUNTS_ONLY`` tells the kernel to only return the > string > counts of the sets, but, when req_info->counts_only tries to read the > tb[ETHTOOL_A_STRSET_COUNTS_ONLY] it gets out of bound. > > - net/ethtool/strset.c > The bug seems to trigger in this line: > > req_info->counts_only = tb[ETHTOOL_A_STRSET_COUNTS_ONLY]; > > Fix it by NULL checking for req_info->counts_only while > reading from tb[ETHTOOL_A_STRSET_COUNTS_ONLY]. > > Reported-by: syzbot+9d1389df89299fa36...@syzkaller.appspotmail.com > Link: > https://syzkaller.appspot.com/bug?id=730deff8fe9954a5e317924d9acff98d9c64a770 > Signed-off-by: Anmol Karn I think the correct fix for this was already applied to net-next as: commit db972e532518 ("ethtool: strset: allow ETHTOOL_A_STRSET_COUNTS_ONLY attr")
Re: [PATCH bpf-next v6 2/6] bpf: add redirect_peer helper
On 10/11/20 11:22 AM, Jesper Dangaard Brouer wrote: On Sun, 11 Oct 2020 01:40:02 +0200 Daniel Borkmann wrote: Add an efficient ingress to ingress netns switch that can be used out of tc BPF programs in order to redirect traffic from host ns ingress into a container veth device ingress without having to go via CPU backlog queue [0]. For local containers this can also be utilized and path via CPU backlog queue only needs to be taken once, not twice. On a high level this borrows from ipvlan which does similar switch in __netif_receive_skb_core() and then iterates via another_round. This helps to reduce latency for mentioned use cases. Pod to remote pod with redirect(), TCP_RR [1]: # percpu_netperf 10.217.1.33 RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 ) MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 ) STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 ) MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 ) P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 ) P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 ) P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 ) TRANSACTION_RATE: 32666.400 (per CPU:8152.200 8169.8428174.4398169.897 ) Pod to remote pod with redirect_peer(), TCP_RR: # percpu_netperf 10.217.1.33 RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 ) MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 ) STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 ) MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 ) P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 ) P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 ) P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 ) TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 ) This is awesome results and great work Daniel! :-) I wonder if we can also support this from XDP, which can also native redirect into veth. Originally I though we could add the peer netdev in the devmap, but AFAIK Toke showed me that this was not possible. I think it should be possible with similar principle. What was the limitation that you ran into with devmap for XDP? [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf Signed-off-by: Daniel Borkmann --- drivers/net/veth.c | 9 ++ include/linux/netdevice.h | 4 +++ include/uapi/linux/bpf.h | 17 +++ net/core/dev.c | 15 -- net/core/filter.c | 54 +- tools/include/uapi/linux/bpf.h | 17 +++ 6 files changed, 106 insertions(+), 10 deletions(-) [...] diff --git a/net/core/dev.c b/net/core/dev.c index 9d55bf5d1a65..7dd015823593 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4930,7 +4930,7 @@ EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook); static inline struct sk_buff * sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, - struct net_device *orig_dev) + struct net_device *orig_dev, bool *another) { #ifdef CONFIG_NET_CLS_ACT struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress); @@ -4974,7 +4974,11 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, * redirecting to another netdev */ __skb_push(skb, skb->mac_len); - skb_do_redirect(skb); + if (skb_do_redirect(skb) == -EAGAIN) { + __skb_pull(skb, skb->mac_len); + *another = true; + break; + } return NULL; case TC_ACT_CONSUMED: return NULL; @@ -5163,7 +5167,12 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc, skip_taps: #ifdef CONFIG_NE
[PATCH v2] net: usb: rtl8150: don't incorrectly assign random MAC addresses
In set_ethernet_addr(), if get_registers() succeeds, the ethernet address that was read must be copied over. Otherwise, a random ethernet address must be assigned. get_registers() returns 0 if successful, and negative error number otherwise. However, in set_ethernet_addr(), this return value is incorrectly checked. Since this return value will never be equal to sizeof(node_id), a random MAC address will always be generated and assigned to the device; even in cases when get_registers() is successful. Correctly modifying the condition that checks if get_registers() was successful or not fixes this problem, and copies the ethernet address appropriately. Fixes: f45a4248ea4c ("net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails") Signed-off-by: Anant Thazhemadam --- Changes in v2: * Fixed the format of the Fixes tag * Modified the commit message to better describe the issue being fixed +CCing Stephen and linux-next, since the commit fixed isn't in the networking tree, but is present in linux-next. drivers/net/usb/rtl8150.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/rtl8150.c b/drivers/net/usb/rtl8150.c index f020401adf04..bf8a60533f3e 100644 --- a/drivers/net/usb/rtl8150.c +++ b/drivers/net/usb/rtl8150.c @@ -261,7 +261,7 @@ static void set_ethernet_addr(rtl8150_t *dev) ret = get_registers(dev, IDR, sizeof(node_id), node_id); - if (ret == sizeof(node_id)) { + if (!ret) { ether_addr_copy(dev->netdev->dev_addr, node_id); } else { eth_hw_addr_random(dev->netdev); -- 2.25.1
Re: [PATCH bpf-next v6 0/6] Follow-up BPF helper improvements
Hello: This series was applied to bpf/bpf-next.git (refs/heads/master): On Sun, 11 Oct 2020 01:40:00 +0200 you wrote: > This series addresses most of the feedback [0] that was to be followed > up from the last series, that is, UAPI helper comment improvements and > getting rid of the ifindex obj file hacks in the selftest by using a > BPF map instead. The __sk_buff data/data_end pointer work, I'm planning > to do in a later round as well as the mem*() BPF improvements we have > in Cilium for libbpf. Next, the series adds two features, i) a helper > called redirect_peer() to improve latency on netns switch, and ii) to > allow map in map with dynamic inner array map sizes. Selftests for each > are added as well. For details, please check individual patches, thanks! > > [...] Here is the summary with links: - [bpf-next,v6,1/6] bpf: improve bpf_redirect_neigh helper description https://git.kernel.org/bpf/bpf-next/c/dd2ce6a5373c - [bpf-next,v6,2/6] bpf: add redirect_peer helper https://git.kernel.org/bpf/bpf-next/c/9aa1206e8f48 - [bpf-next,v6,3/6] bpf: allow for map-in-map with dynamic inner array map entries https://git.kernel.org/bpf/bpf-next/c/4a8f87e60f6d - [bpf-next,v6,4/6] bpf, selftests: add test for different array inner map size https://git.kernel.org/bpf/bpf-next/c/6775dab73bdc - [bpf-next,v6,5/6] bpf, selftests: make redirect_neigh test more extensible https://git.kernel.org/bpf/bpf-next/c/57a73fe7c198 - [bpf-next,v6,6/6] bpf, selftests: add redirect_peer selftest https://git.kernel.org/bpf/bpf-next/c/9f4c53ca23a2 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH v2] net: usb: rtl8150: don't incorrectly assign random MAC addresses
On Sun, 11 Oct 2020 23:00:30 +0530 Anant Thazhemadam wrote: > In set_ethernet_addr(), if get_registers() succeeds, the ethernet address > that was read must be copied over. Otherwise, a random ethernet address > must be assigned. > > get_registers() returns 0 if successful, and negative error number > otherwise. However, in set_ethernet_addr(), this return value is > incorrectly checked. > > Since this return value will never be equal to sizeof(node_id), a > random MAC address will always be generated and assigned to the > device; even in cases when get_registers() is successful. > > Correctly modifying the condition that checks if get_registers() was > successful or not fixes this problem, and copies the ethernet address > appropriately. > > Fixes: f45a4248ea4c ("net: usb: rtl8150: set random MAC address when > set_ethernet_addr() fails") > Signed-off-by: Anant Thazhemadam This patch is a fix to a conflict resolution in linux-next. linux-next is not a "real" tree, it's an integration tree used to figure out conflicts early. We had let Stephen know about the problem already. Please wait one week, and if the problem is still present resend this. Thank you.
Re: [PATCH net-next v2 1/2] can-isotp: implement cleanups / improvements from review
Hi Oliver, I love your patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Oliver-Hartkopp/can-isotp-implement-cleanups-improvements-from-review/20201012-55 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git df41c19abbeaae6e24c8e31cff5fdb48632be380 config: xtensa-allyesconfig (attached as .config) compiler: xtensa-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/693c9b4e8c5710e7b536ec9d6a969a7d7343a100 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Oliver-Hartkopp/can-isotp-implement-cleanups-improvements-from-review/20201012-55 git checkout 693c9b4e8c5710e7b536ec9d6a969a7d7343a100 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=xtensa If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): >> net/can/isotp.c:82: warning: "pr_fmt" redefined 82 | #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt | In file included from include/linux/kernel.h:15, from include/linux/list.h:9, from include/linux/module.h:12, from net/can/isotp.c:56: include/linux/printk.h:297: note: this is the location of the previous definition 297 | #define pr_fmt(fmt) fmt | vim +/pr_fmt +82 net/can/isotp.c 81 > 82 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 83 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [PATCH net-next v2 0/4] enetc: Migrate to PHYLINK and PCS_LYNX
On Wed, 7 Oct 2020 12:48:19 +0300 Claudiu Manoil wrote: > Transitioning the enetc driver from phylib to phylink. > Offloading the serdes configuration to the PCS_LYNX > module is a mandatory part of this transition. Aiming > for a cleaner, more maintainable design, and better > code reuse. > The first 2 patches are clean up prerequisites. > > Tested on a p1028rdb board. > > v2: validate() explicitly rejects now all interface modes not > supported by the driver instead of relying on the device tree > to provide only supported interfaces, and dropped redundant > activation of pcs_poll (addressing Ioana's findings) Applied, thank you!
Re: [PATCH net-next 0/3] Offload tc-vlan mangle to mscc_ocelot switch
On Thu, 8 Oct 2020 14:56:57 +0300 Vladimir Oltean wrote: > This series offloads one more action to the VCAP IS1 ingress TCAM, which > is to change the classified VLAN for packets, according to the VCAP IS1 > keys (VLAN, source MAC, source IP, EtherType, etc). Applied, thanks!
Re: [ PATCH v1 2/2] ibmveth: Identify ingress large send packets.
On Sat, 10 Oct 2020 12:51:30 -0400 Willem de Bruijn wrote: > > > @@ -1385,7 +1386,17 @@ static int ibmveth_poll(struct napi_struct *napi, > > > int budget) > > > skb_put(skb, length); > > > skb->protocol = eth_type_trans(skb, netdev); > > > > > > - if (length > netdev->mtu + ETH_HLEN) { > > > + /* PHYP without PLSO support places a -1 in the ip > > > +* checksum for large send frames. > > > +*/ > > > + if (be16_to_cpu(skb->protocol) == ETH_P_IP) { You can byteswap the constant, so its done at compilation time. > > > + struct iphdr *iph = (struct iphdr > > > *)skb->data; > > > + > > > + iph_check = iph->check; > > > > Check against truncated/bad packets. > > .. unless I missed context. Other code in this driver seems to peek in > the network and transport layer headers without additional geometry > and integrity checks, too. Good catch, even if we trust the hypervisor to only forward valid frames this needs to be at least mentioned in the commit message. Also please add Fixes tags to both patches.
Re: [PATCH v2] net: usb: rtl8150: don't incorrectly assign random MAC addresses
On Sun, 2020-10-11 at 10:59 -0700, Jakub Kicinski wrote: > On Sun, 11 Oct 2020 23:00:30 +0530 Anant Thazhemadam wrote: > > In set_ethernet_addr(), if get_registers() succeeds, the ethernet address > > that was read must be copied over. Otherwise, a random ethernet address > > must be assigned. > > > > get_registers() returns 0 if successful, and negative error number > > otherwise. However, in set_ethernet_addr(), this return value is > > incorrectly checked. > > > > Since this return value will never be equal to sizeof(node_id), a > > random MAC address will always be generated and assigned to the > > device; even in cases when get_registers() is successful. > > > > Correctly modifying the condition that checks if get_registers() was > > successful or not fixes this problem, and copies the ethernet address > > appropriately. There are many unchecked uses of set_registers and get_registers in this file. If failures are really expected, then it might be better to fix them up too. $ git grep -w '[gs]et_registers' drivers/net/usb/rtl8150.c drivers/net/usb/rtl8150.c:static int get_registers(rtl8150_t * dev, u16 indx, u16 size, void *data) drivers/net/usb/rtl8150.c:static int set_registers(rtl8150_t * dev, u16 indx, u16 size, const void *data) drivers/net/usb/rtl8150.c: set_registers(dev, PHYADD, sizeof(data), data); drivers/net/usb/rtl8150.c: set_registers(dev, PHYCNT, 1, &tmp); drivers/net/usb/rtl8150.c: get_registers(dev, PHYCNT, 1, data); drivers/net/usb/rtl8150.c: get_registers(dev, PHYDAT, 2, data); drivers/net/usb/rtl8150.c: set_registers(dev, PHYADD, sizeof(data), data); drivers/net/usb/rtl8150.c: set_registers(dev, PHYCNT, 1, &tmp); drivers/net/usb/rtl8150.c: get_registers(dev, PHYCNT, 1, data); drivers/net/usb/rtl8150.c: ret = get_registers(dev, IDR, sizeof(node_id), node_id); drivers/net/usb/rtl8150.c: set_registers(dev, IDR, netdev->addr_len, netdev->dev_addr); drivers/net/usb/rtl8150.c: get_registers(dev, CR, 1, &cr); drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &cr); drivers/net/usb/rtl8150.c: set_registers(dev, IDR_EEPROM + (i * 2), 2, drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &cr); drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &data); drivers/net/usb/rtl8150.c: get_registers(dev, CR, 1, &data); drivers/net/usb/rtl8150.c: set_registers(dev, RCR, 1, &rcr); drivers/net/usb/rtl8150.c: set_registers(dev, TCR, 1, &tcr); drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &cr); drivers/net/usb/rtl8150.c: get_registers(dev, MSR, 1, &msr); drivers/net/usb/rtl8150.c: get_registers(dev, CR, 1, &cr); drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &cr); drivers/net/usb/rtl8150.c: get_registers(dev, CSCR, 2, &tmp); drivers/net/usb/rtl8150.c: set_registers(dev, IDR, 6, netdev->dev_addr); drivers/net/usb/rtl8150.c: get_registers(dev, BMCR, 2, &bmcr); drivers/net/usb/rtl8150.c: get_registers(dev, ANLP, 2, &lpa);
Re: [PATCH net-next] mlx4: handle non-napi callers to napi_poll
On Thu, 8 Oct 2020 11:45:26 -0700 Jonathan Lemon wrote: > From: Jonathan Lemon > > netcons calls napi_poll with a budget of 0 to transmit packets. > Handle this by: > - skipping RX processing > - do not try to recycle TX packets to the RX cache > > Signed-off-by: Jonathan Lemon Tariq, Saeed - how does this look to you?
Re: [PATCH] net: korina: free array used for rx/tx descriptors
On Sun, Oct 11, 2020 at 7:46 AM Valentin Vidic wrote: > > Memory was not freed when driver is unloaded from the kernel. > > Signed-off-by: Valentin Vidic Makes sense. Fixes: ef11291bcd5f ("Add support the Korina (IDT RC32434) Ethernet MAC") Slightly off-topic, but I don't fully fathom what goes on with this pointer straight after the initial kmalloc. lp->td_ring = (struct dma_desc *)KSEG1ADDR(lp->td_ring); > --- > drivers/net/ethernet/korina.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c > index 03e034918d14..99146145f020 100644 > --- a/drivers/net/ethernet/korina.c > +++ b/drivers/net/ethernet/korina.c > @@ -1133,6 +1133,7 @@ static int korina_remove(struct platform_device *pdev) > iounmap(lp->eth_regs); > iounmap(lp->rx_dma_regs); > iounmap(lp->tx_dma_regs); > + kfree(lp->td_ring); > > unregister_netdev(bif->dev); > free_netdev(bif->dev); In general it is nice to release in reverse of acquire. But the driver already does not follow this practice.
Re: [PATCH] mm: proc: add Sock to /proc/meminfo
On Sat, Oct 10, 2020 at 3:39 AM Muchun Song wrote: > > The amount of memory allocated to sockets buffer can become significant. > However, we do not display the amount of memory consumed by sockets > buffer. In this case, knowing where the memory is consumed by the kernel We do it via `ss -m`. Is it not sufficient? And if not, why not adding it there rather than /proc/meminfo? > static inline void __skb_frag_unref(skb_frag_t *frag) > { > - put_page(skb_frag_page(frag)); > + struct page *page = skb_frag_page(frag); > + > + if (put_page_testzero(page)) { > + dec_sock_node_page_state(page); > + __put_page(page); > + } > } You mix socket page frag with skb frag at least, not sure this is exactly what you want, because clearly skb page frags are frequently used by network drivers rather than sockets. Also, which one matches this dec_sock_node_page_state()? Clearly not skb_fill_page_desc() or __skb_frag_ref(). Thanks.
Re: [PATCH v2 0/3] [PATCH v2 0/3] [PATCH v2 0/3] net, mac80211, kernel: enable KCOV remote coverage collection for 802.11 frame handling
On Fri, 2020-10-09 at 17:01 +, Aleksandr Nogikh wrote: > From: Aleksandr Nogikh > > This patch series enables remote KCOV coverage collection during > 802.11 frames processing. These changes make it possible to perform > coverage-guided fuzzing in search of remotely triggerable bugs. Btw, it occurred to me that I don't know at all - is this related to syzkaller? Or is there some other fuzzing you're working on? Can we get the bug reports from it if it's different? :) Also, unrelated to that (but I see Dmitry CC'ed), I started wondering if it'd be helpful to have an easier raw 802.11 inject path on top of say hwsim0; I noticed some syzbot reports where it created raw sockets, but that only gets you into the *data* plane of the wifi stack, not into the *management* plane. Theoretically you could add a monitor interface, but right now the wifi setup (according to the current docs on github) is using two IBSS interfaces. Perhaps an inject path on the mac80211-hwsim "hwsim0" interface would be something to consider? Or simply adding a third radio that's in "monitor" mode, so that a raw socket bound to *that* interface can inject with a radiotap header followed by an 802.11 frame, getting to arbitrary frame handling code, not just data frames. Any thoughts? johannes
Re: [PATCH] net: Add mhi-net driver
On Fri, 9 Oct 2020 22:33:31 +0200 Loic Poulain wrote: > This patch adds a new network driver implementing MHI transport for > network packets. Packets can be in any format, though QMAP (rmnet) > is the usual protocol (flow control + PDN mux). > > It support two MHI devices, IP_HW0 which is, the path to the IPA > (IP accelerator) on qcom modem, And IP_SW0 which is the software > driven IP path (to modem CPU). > > Signed-off-by: Loic Poulain > +static int mhi_ndo_xmit(struct sk_buff *skb, struct net_device *ndev) > +{ > + struct mhi_net_dev *mhi_netdev = netdev_priv(ndev); > + struct mhi_device *mdev = mhi_netdev->mdev; > + int err; > + > + /* Only support for single buffer transfer for now */ > + err = skb_linearize(skb); Since you don't advertise NETIF_F_SG you shouldn't have to call this, no? > + if (unlikely(err)) { > + kfree_skb(skb); > + mhi_netdev->stats.tx_dropped++; > + return NETDEV_TX_OK; > + } > + > + skb_tx_timestamp(skb); > + > + /* mhi_queue_skb is not thread-safe, but xmit is serialized by the > + * network core. Once MHI core will be thread save, migrate to > + * NETIF_F_LLTX support. > + */ > + err = mhi_queue_skb(mdev, DMA_TO_DEVICE, skb, skb->len, MHI_EOT); > + if (err) { > + netdev_err(ndev, "mhi_queue_skb err %d\n", err); This needs to be at least rate limited. > + netif_stop_queue(ndev); What's going to start the queue if it's a transient errors and not NETDEV_TX_BUSY? > + return (err == -ENOMEM) ? NETDEV_TX_BUSY : err; You should drop the packet if it's not NETDEV_TX_BUSY, and return NETDEV_TX_OK. Don't return negative errors from ndo_xmit. > + } > + > + return NETDEV_TX_OK; > +} > + > +static void mhi_ndo_get_stats64(struct net_device *ndev, > + struct rtnl_link_stats64 *stats) > +{ > + struct mhi_net_dev *mhi_netdev = netdev_priv(ndev); > + > + stats->rx_packets = mhi_netdev->stats.rx_packets; > + stats->rx_bytes = mhi_netdev->stats.rx_bytes; > + stats->rx_errors = mhi_netdev->stats.rx_errors; > + stats->rx_dropped = mhi_netdev->stats.rx_dropped; > + stats->tx_packets = mhi_netdev->stats.tx_packets; > + stats->tx_bytes = mhi_netdev->stats.tx_bytes; > + stats->tx_errors = mhi_netdev->stats.tx_errors; > + stats->tx_dropped = mhi_netdev->stats.tx_dropped; > +} Can you use > +static void mhi_net_dl_callback(struct mhi_device *mhi_dev, > + struct mhi_result *mhi_res) > +{ > + struct mhi_net_dev *mhi_netdev = dev_get_drvdata(&mhi_dev->dev); > + struct sk_buff *skb = mhi_res->buf_addr; > + > + atomic_dec(&mhi_netdev->stats.rx_queued); > + > + if (mhi_res->transaction_status) { > + mhi_netdev->stats.rx_errors++; > + kfree_skb(skb); > + } else { > + mhi_netdev->stats.rx_packets++; > + mhi_netdev->stats.rx_bytes += mhi_res->bytes_xferd; > + > + skb->protocol = htons(ETH_P_MAP); > + skb_put(skb, mhi_res->bytes_xferd); > + netif_rx(skb); > + } > + > + schedule_delayed_work(&mhi_netdev->rx_refill, 0); Scheduling a work to replace every single RX buffer looks quite inefficient. Any chance you can do batching? I assume mhi_queue_skb() has to be able to sleep? > +static void mhi_net_rx_refill_work(struct work_struct *work) > +{ > + struct mhi_net_dev *mhi_netdev = container_of(work, struct mhi_net_dev, > + rx_refill.work); > + struct net_device *ndev = mhi_netdev->ndev; > + struct mhi_device *mdev = mhi_netdev->mdev; > + struct sk_buff *skb; > + int err; > + > + if (!netif_running(ndev)) > + return; How can this happen? You cancel the work from ndo_stop. > + do { > + skb = netdev_alloc_skb(ndev, READ_ONCE(ndev->mtu)); > + if (unlikely(!skb)) { > + /* If we are starved of RX buffers, retry later */ > + if (!atomic_read(&mhi_netdev->stats.rx_queued)) > + schedule_delayed_work(&mhi_netdev->rx_refill, > HZ / 2); > + break; > + } > + > + err = mhi_queue_skb(mdev, DMA_FROM_DEVICE, skb, ndev->mtu, > + MHI_EOT); > + if (err) { > + atomic_dec(&mhi_netdev->stats.rx_queued); This can never fail with an empty ring? No need to potentially reschedule the work here? > + kfree_skb(skb); > + break; > + } > + > + atomic_inc(&mhi_netdev->stats.rx_queued); > + > + } while (1); > +} > + > +static int mhi_net_probe(struct mhi_device *mhi_dev, > + const struct mhi_device_id *id) > +{ > + const char *netname = (char *)id->driver_data; > + struct mhi_net_dev *mhi_netdev; > + st
Re: [Linux-kernel-mentees] [PATCH net] ethtool: strset: Fix out of bound read in strset_parse_request()
Hello sir, On Sun, Oct 11, 2020 at 10:24 PM Jakub Kicinski wrote: > > On Sun, 11 Oct 2020 02:39:29 +0530 Anmol Karn wrote: > > Flag ``ETHTOOL_A_STRSET_COUNTS_ONLY`` tells the kernel to only return the > > string > > counts of the sets, but, when req_info->counts_only tries to read the > > tb[ETHTOOL_A_STRSET_COUNTS_ONLY] it gets out of bound. > > > > - net/ethtool/strset.c > > The bug seems to trigger in this line: > > > > req_info->counts_only = tb[ETHTOOL_A_STRSET_COUNTS_ONLY]; > > > > Fix it by NULL checking for req_info->counts_only while > > reading from tb[ETHTOOL_A_STRSET_COUNTS_ONLY]. > > > > Reported-by: syzbot+9d1389df89299fa36...@syzkaller.appspotmail.com > > Link: > > https://syzkaller.appspot.com/bug?id=730deff8fe9954a5e317924d9acff98d9c64a770 > > Signed-off-by: Anmol Karn > > I think the correct fix for this was already applied to net-next as: > > commit db972e532518 ("ethtool: strset: allow ETHTOOL_A_STRSET_COUNTS_ONLY > attr") I am glad that it's fixed now. Thanks, Anmol
[Patch net v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly
GRE tunnel has its own header_ops, ipgre_header_ops, and sets it conditionally. When it is set, it assumes the outer IP header is already created before ipgre_xmit(). This is not true when we send packets through a raw packet socket, where L2 headers are supposed to be constructed by user. Packet socket calls dev_validate_header() to validate the header. But GRE tunnel does not set dev->hard_header_len, so that check can be simply bypassed, therefore uninit memory could be passed down to ipgre_xmit(). Similar for dev->needed_headroom. dev->hard_header_len is supposed to be the length of the header created by dev->header_ops->create(), so it should be used whenever header_ops is set, and dev->needed_headroom should be used when it is not set. Reported-and-tested-by: syzbot+4a2c52677a8a1aa28...@syzkaller.appspotmail.com Cc: Xie He Cc: William Tu Cc: Willem de Bruijn Signed-off-by: Cong Wang --- Note, there are some other suspicious use of dev->hard_header_len in the same file, but let's leave them to a separate patch if really needed. net/ipv4/ip_gre.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 4e31f23e4117..82fee0010353 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -626,8 +626,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb, if (dev->header_ops) { /* Need space for new headers */ - if (skb_cow_head(skb, dev->needed_headroom - - (tunnel->hlen + sizeof(struct iphdr + if (skb_cow_head(skb, dev->hard_header_len)) goto free_skb; tnl_params = (const struct iphdr *)skb->data; @@ -748,7 +747,11 @@ static void ipgre_link_update(struct net_device *dev, bool set_mtu) len = tunnel->tun_hlen - len; tunnel->hlen = tunnel->hlen + len; - dev->needed_headroom = dev->needed_headroom + len; + if (dev->header_ops) + dev->hard_header_len += len; + else + dev->needed_headroom += len; + if (set_mtu) dev->mtu = max_t(int, dev->mtu - len, 68); @@ -944,6 +947,7 @@ static void __gre_tunnel_init(struct net_device *dev) tunnel->parms.iph.protocol = IPPROTO_GRE; tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen; + dev->needed_headroom = tunnel->hlen + sizeof(tunnel->parms.iph); dev->features |= GRE_FEATURES; dev->hw_features|= GRE_FEATURES; @@ -987,10 +991,14 @@ static int ipgre_tunnel_init(struct net_device *dev) return -EINVAL; dev->flags = IFF_BROADCAST; dev->header_ops = &ipgre_header_ops; + dev->hard_header_len = tunnel->hlen + sizeof(*iph); + dev->needed_headroom = 0; } #endif } else if (!tunnel->collect_md) { dev->header_ops = &ipgre_header_ops; + dev->hard_header_len = tunnel->hlen + sizeof(*iph); + dev->needed_headroom = 0; } return ip_tunnel_init(dev); -- 2.28.0
Re: BUG: unable to handle kernel paging request in tcf_action_dump_terse
#syz fix: net_sched: check error pointer in tcf_dump_walker()
Re: [PATCH] net: Avoid allocing memory on memoryless numa node
On Sun, 11 Oct 2020 12:11:40 +0800 Xianting Tian wrote: > In architecture like powerpc, we can have cpus without any local memory > attached to it. In such cases the node does not have real memory. > > Use local_memory_node(), which is guaranteed to have memory. > local_memory_node is a noop in other architectures that does not support > memoryless nodes. > > Signed-off-by: Xianting Tian > --- > net/core/dev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 266073e30..dcb4533ef 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2590,7 +2590,7 @@ static struct xps_map *expand_xps_map(struct xps_map > *map, int attr_index, > new_map = kzalloc(XPS_MAP_SIZE(alloc_len), GFP_KERNEL); > else > new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL, > -cpu_to_node(attr_index)); > + > local_memory_node(cpu_to_node(attr_index))); > if (!new_map) > return NULL; > Are we going to patch all kmalloc_node() callers now to apply local_memory_node()? Can't the allocator take care of this?
Re: [PATCH net-next] cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
On Thu, 8 Oct 2020 19:30:15 +0530 Ayush Sawal wrote: > @@ -140,8 +141,8 @@ static int ch_ipsec_uld_state_change(void *handle, enum > cxgb4_state new_state) > return 0; > } > > -static inline int chcr_ipsec_setauthsize(struct xfrm_state *x, > - struct ipsec_sa_entry *sa_entry) > +static inline int ch_ipsec_setauthsize(struct xfrm_state *x, > +struct ipsec_sa_entry *sa_entry) > { > int hmac_ctrl; > int authsize = x->aead->alg_icv_len / 8; > @@ -164,8 +165,8 @@ static inline int chcr_ipsec_setauthsize(struct > xfrm_state *x, > return hmac_ctrl; > } > > -static inline int chcr_ipsec_setkey(struct xfrm_state *x, > - struct ipsec_sa_entry *sa_entry) > +static inline int ch_ipsec_setkey(struct xfrm_state *x, > + struct ipsec_sa_entry *sa_entry) Please remove the inline keywords while at it, and let the compiler decide what to inline. > { > int keylen = (x->aead->alg_key_len + 7) / 8; > unsigned char *key = x->aead->alg_key; > if (x->props.aalgo != SADB_AALG_NONE) { > - pr_debug("CHCR: Cannot offload authenticated xfrm states\n"); > + pr_debug("CH_IPSEC: Cannot offload authenticated xfrm > states\n"); > return -EINVAL; > } > if (x->props.calgo != SADB_X_CALG_NONE) { > - pr_debug("CHCR: Cannot offload compressed xfrm states\n"); > + pr_debug("CH_IPSEC: Cannot offload compressed xfrm states\n"); > return -EINVAL; > } > if (x->props.family != AF_INET && > x->props.family != AF_INET6) { > - pr_debug("CHCR: Only IPv4/6 xfrm state offloaded\n"); > + pr_debug("CH_IPSEC: Only IPv4/6 xfrm state offloaded\n"); > return -EINVAL; > } > if (x->props.mode != XFRM_MODE_TRANSPORT && > x->props.mode != XFRM_MODE_TUNNEL) { > - pr_debug("CHCR: Only transport and tunnel xfrm offload\n"); > + pr_debug("CH_IPSEC: Only transport and tunnel xfrm offload\n"); > return -EINVAL; > } > if (x->id.proto != IPPROTO_ESP) { > - pr_debug("CHCR: Only ESP xfrm state offloaded\n"); > + pr_debug("CH_IPSEC: Only ESP xfrm state offloaded\n"); > return -EINVAL; > } > if (x->encap) { > - pr_debug("CHCR: Encapsulated xfrm state not offloaded\n"); > + pr_debug("CH_IPSEC: Encapsulated xfrm state not offloaded\n"); > return -EINVAL; > } > if (!x->aead) { > - pr_debug("CHCR: Cannot offload xfrm states without aead\n"); > + pr_debug("CH_IPSEC: Cannot offload xfrm states without aead\n"); Why is this printing the "CH_IPSEC: " prefix if you already have: +#define pr_fmt(fmt) "ch_ipsec: " fmt ? > return -EINVAL; > } > if (x->aead->alg_icv_len != 128 && > x->aead->alg_icv_len != 96) { > - pr_debug("CHCR: Cannot offload xfrm states with AEAD ICV length > other than 96b & 128b\n"); > + pr_debug("CH_IPSEC: Cannot offload xfrm states with AEAD ICV > length other than 96b & 128b\n"); > return -EINVAL; > } > if ((x->aead->alg_key_len != 128 + 32) && > (x->aead->alg_key_len != 256 + 32)) { > - pr_debug("CHCR: Cannot offload xfrm states with AEAD key length > other than 128/256 bit\n"); > + pr_debug("CH_IPSEC: Cannot offload xfrm states with AEAD key > length other than 128/256 bit\n"); > return -EINVAL; > } > if (x->tfcpad) { > - pr_debug("CHCR: Cannot offload xfrm states with tfc padding\n"); > + pr_debug("CH_IPSEC: Cannot offload xfrm states with tfc > padding\n"); > return -EINVAL; > } > if (!x->geniv) { > - pr_debug("CHCR: Cannot offload xfrm states without geniv\n"); > + pr_debug("CH_IPSEC: Cannot offload xfrm states without > geniv\n"); > return -EINVAL; > } > if (strcmp(x->geniv, "seqiv")) { > - pr_debug("CHCR: Cannot offload xfrm states with geniv other > than seqiv\n"); > + pr_debug("CH_IPSEC: Cannot offload xfrm states with geniv other > than seqiv\n"); > return -EINVAL; > } > @@ -763,7 +764,7 @@ out_free: dev_kfree_skb_any(skb); > before = (u64 *)pos; > end = (u64 *)pos + flits; > /* Setup IPSec CPL */ > - pos = (void *)chcr_crypto_wreq(skb, dev, (void *)pos, > + pos = (void *)ch_ipsec_crypto_wreq(skb, dev, (void *)pos, > credits, sa_entry); The continuation line needs to be adjusted to match the position of opening parenthesis. > if (before > (u64 *)pos) { > left = (u8 *)end - (u8 *)q->q.stat;
Re: [PATCH v2] net: usb: rtl8150: don't incorrectly assign random MAC addresses
On 20-10-11 11:33:00, Joe Perches wrote: > On Sun, 2020-10-11 at 10:59 -0700, Jakub Kicinski wrote: > > On Sun, 11 Oct 2020 23:00:30 +0530 Anant Thazhemadam wrote: > > > In set_ethernet_addr(), if get_registers() succeeds, the ethernet address > > > that was read must be copied over. Otherwise, a random ethernet address > > > must be assigned. > > > > > > get_registers() returns 0 if successful, and negative error number > > > otherwise. However, in set_ethernet_addr(), this return value is > > > incorrectly checked. > > > > > > Since this return value will never be equal to sizeof(node_id), a > > > random MAC address will always be generated and assigned to the > > > device; even in cases when get_registers() is successful. > > > > > > Correctly modifying the condition that checks if get_registers() was > > > successful or not fixes this problem, and copies the ethernet address > > > appropriately. > > There are many unchecked uses of set_registers and get_registers > in this file. > > If failures are really expected, then it might be better to fix > them up too. Checking the return value of each get/set_registers() is going to be a PITA and not very helpful. Doing so when setting the MAC address _does_ make sense as in that case it is not a hard error. In almost all other occasions if usb_control_msg_send/recv() return an error i'd rather dump an error message (from within get/set_registers()) and let the user decide whether to get rid of this adapter or start debugging it. cheers, Petko > $ git grep -w '[gs]et_registers' drivers/net/usb/rtl8150.c > drivers/net/usb/rtl8150.c:static int get_registers(rtl8150_t * dev, u16 indx, > u16 size, void *data) > drivers/net/usb/rtl8150.c:static int set_registers(rtl8150_t * dev, u16 indx, > u16 size, const void *data) > drivers/net/usb/rtl8150.c: set_registers(dev, PHYADD, sizeof(data), > data); > drivers/net/usb/rtl8150.c: set_registers(dev, PHYCNT, 1, &tmp); > drivers/net/usb/rtl8150.c: get_registers(dev, PHYCNT, 1, data); > drivers/net/usb/rtl8150.c: get_registers(dev, PHYDAT, 2, data); > drivers/net/usb/rtl8150.c: set_registers(dev, PHYADD, sizeof(data), > data); > drivers/net/usb/rtl8150.c: set_registers(dev, PHYCNT, 1, &tmp); > drivers/net/usb/rtl8150.c: get_registers(dev, PHYCNT, 1, data); > drivers/net/usb/rtl8150.c: ret = get_registers(dev, IDR, > sizeof(node_id), node_id); > drivers/net/usb/rtl8150.c: set_registers(dev, IDR, netdev->addr_len, > netdev->dev_addr); > drivers/net/usb/rtl8150.c: get_registers(dev, CR, 1, &cr); > drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &cr); > drivers/net/usb/rtl8150.c: set_registers(dev, IDR_EEPROM + (i * > 2), 2, > drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &cr); > drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &data); > drivers/net/usb/rtl8150.c: get_registers(dev, CR, 1, &data); > drivers/net/usb/rtl8150.c: set_registers(dev, RCR, 1, &rcr); > drivers/net/usb/rtl8150.c: set_registers(dev, TCR, 1, &tcr); > drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &cr); > drivers/net/usb/rtl8150.c: get_registers(dev, MSR, 1, &msr); > drivers/net/usb/rtl8150.c: get_registers(dev, CR, 1, &cr); > drivers/net/usb/rtl8150.c: set_registers(dev, CR, 1, &cr); > drivers/net/usb/rtl8150.c: get_registers(dev, CSCR, 2, &tmp); > drivers/net/usb/rtl8150.c: set_registers(dev, IDR, 6, netdev->dev_addr); > drivers/net/usb/rtl8150.c: get_registers(dev, BMCR, 2, &bmcr); > drivers/net/usb/rtl8150.c: get_registers(dev, ANLP, 2, &lpa); > > >
Re: [PATCH net-next 3/9] bnxt_en: Set driver default message level.
On Sun, 11 Oct 2020 06:22:55 -0400 Michael Chan wrote: > Currently, bp->msg_enable has default value of 0. It is more useful > to have the commonly used NETIF_MSG_DRV and NETIF_MSG_HW enabled by > default. > > Reviewed-by: Edwin Peer > Reviewed-by: Vasundhara Volam > Signed-off-by: Michael Chan This will add whole bunch of output for "RX buffer error 4000[45]", no? That one needs to switch to a silent reset first, I'd think.
[PATCH net-next 00/12] net: add and use function dev_fetch_sw_netstats for fetching pcpu_sw_netstats
In several places the same code is used to populate rtnl_link_stats64 fields with data from pcpu_sw_netstats. Therefore factor out this code to a new function dev_fetch_sw_netstats(). Heiner Kallweit (12): net: core: add function dev_fetch_sw_netstats for fetching pcpu_sw_netstats IB/hfi1: use new function dev_fetch_sw_netstats net: macsec: use new function dev_fetch_sw_netstats net: usb: qmi_wwan: use new function dev_fetch_sw_netstats net: usbnet: use new function dev_fetch_sw_netstats qtnfmac: use new function dev_fetch_sw_netstats net: bridge: use new function dev_fetch_sw_netstats net: dsa: use new function dev_fetch_sw_netstats iptunnel: use new function dev_fetch_sw_netstats mac80211: use new function dev_fetch_sw_netstats net: openvswitch: use new function dev_fetch_sw_netstats xfrm: use new function dev_fetch_sw_netstats drivers/infiniband/hw/hfi1/ipoib_main.c | 34 +- drivers/net/macsec.c | 25 + drivers/net/usb/qmi_wwan.c| 24 + drivers/net/usb/usbnet.c | 24 + drivers/net/wireless/quantenna/qtnfmac/core.c | 27 +- include/linux/netdevice.h | 2 ++ net/bridge/br_device.c| 21 +-- net/core/dev.c| 36 +++ net/dsa/slave.c | 21 +-- net/ipv4/ip_tunnel_core.c | 23 +--- net/mac80211/iface.c | 23 +--- net/openvswitch/vport-internal_dev.c | 20 +-- net/xfrm/xfrm_interface.c | 22 +--- 13 files changed, 49 insertions(+), 253 deletions(-) -- 2.28.0
[PATCH net-next 01/12] net: core: add function dev_fetch_sw_netstats for fetching pcpu_sw_netstats
In several places the same code is used to populate rtnl_link_stats64 fields with data from pcpu_sw_netstats. Therefore factor out this code to a new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- include/linux/netdevice.h | 2 ++ net/core/dev.c| 36 2 files changed, 38 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index a0df43b13..ca4736349 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4495,6 +4495,8 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev, struct rtnl_link_stats64 *storage); void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64, const struct net_device_stats *netdev_stats); +void dev_fetch_sw_netstats(struct rtnl_link_stats64 *s, + struct pcpu_sw_netstats __percpu *netstats); extern int netdev_max_backlog; extern int netdev_tstamp_prequeue; diff --git a/net/core/dev.c b/net/core/dev.c index 7d18560b2..ba91bf16b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10323,6 +10323,42 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev, } EXPORT_SYMBOL(dev_get_stats); +/** + * dev_fetch_sw_netstats - get per-cpu network device statistics + * @s: place to store stats + * @netstats: per-cpu network stats to read from + * + * Read per-cpu network statistics and populate the related fields in s. + */ +void dev_fetch_sw_netstats(struct rtnl_link_stats64 *s, + struct pcpu_sw_netstats __percpu *netstats) +{ + int cpu; + + if (IS_ERR_OR_NULL(netstats)) + return; + + for_each_possible_cpu(cpu) { + struct pcpu_sw_netstats *stats, tmp; + unsigned int start; + + stats = per_cpu_ptr(netstats, cpu); + do { + start = u64_stats_fetch_begin_irq(&stats->syncp); + tmp.rx_packets = stats->rx_packets; + tmp.rx_bytes = stats->rx_bytes; + tmp.tx_packets = stats->tx_packets; + tmp.tx_bytes = stats->tx_bytes; + } while (u64_stats_fetch_retry_irq(&stats->syncp, start)); + + s->rx_packets += tmp.rx_packets; + s->rx_bytes += tmp.rx_bytes; + s->tx_packets += tmp.tx_packets; + s->tx_bytes += tmp.tx_bytes; + } +} +EXPORT_SYMBOL(dev_fetch_sw_netstats); + struct netdev_queue *dev_ingress_queue_create(struct net_device *dev) { struct netdev_queue *queue = dev_ingress_queue(dev); -- 2.28.0
[PATCH net-next 11/12] net: openvswitch: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- net/openvswitch/vport-internal_dev.c | 20 +--- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c index d8fe66eea..1e30d8df3 100644 --- a/net/openvswitch/vport-internal_dev.c +++ b/net/openvswitch/vport-internal_dev.c @@ -86,31 +86,13 @@ static void internal_dev_destructor(struct net_device *dev) static void internal_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) { - int i; - memset(stats, 0, sizeof(*stats)); stats->rx_errors = dev->stats.rx_errors; stats->tx_errors = dev->stats.tx_errors; stats->tx_dropped = dev->stats.tx_dropped; stats->rx_dropped = dev->stats.rx_dropped; - for_each_possible_cpu(i) { - const struct pcpu_sw_netstats *percpu_stats; - struct pcpu_sw_netstats local_stats; - unsigned int start; - - percpu_stats = per_cpu_ptr(dev->tstats, i); - - do { - start = u64_stats_fetch_begin_irq(&percpu_stats->syncp); - local_stats = *percpu_stats; - } while (u64_stats_fetch_retry_irq(&percpu_stats->syncp, start)); - - stats->rx_bytes += local_stats.rx_bytes; - stats->rx_packets += local_stats.rx_packets; - stats->tx_bytes += local_stats.tx_bytes; - stats->tx_packets += local_stats.tx_packets; - } + dev_fetch_sw_netstats(stats, dev->tstats); } static const struct net_device_ops internal_dev_netdev_ops = { -- 2.28.0
[PATCH net-next 08/12] net: dsa: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- net/dsa/slave.c | 21 + 1 file changed, 1 insertion(+), 20 deletions(-) diff --git a/net/dsa/slave.c b/net/dsa/slave.c index e7c1d62fd..3bc5ca40c 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -1221,28 +1221,9 @@ static void dsa_slave_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct dsa_slave_priv *p = netdev_priv(dev); - struct pcpu_sw_netstats *s; - unsigned int start; - int i; netdev_stats_to_stats64(stats, &dev->stats); - for_each_possible_cpu(i) { - u64 tx_packets, tx_bytes, rx_packets, rx_bytes; - - s = per_cpu_ptr(p->stats64, i); - do { - start = u64_stats_fetch_begin_irq(&s->syncp); - tx_packets = s->tx_packets; - tx_bytes = s->tx_bytes; - rx_packets = s->rx_packets; - rx_bytes = s->rx_bytes; - } while (u64_stats_fetch_retry_irq(&s->syncp, start)); - - stats->tx_packets += tx_packets; - stats->tx_bytes += tx_bytes; - stats->rx_packets += rx_packets; - stats->rx_bytes += rx_bytes; - } + dev_fetch_sw_netstats(stats, p->stats64); } static int dsa_slave_get_rxnfc(struct net_device *dev, -- 2.28.0
[PATCH net-next 07/12] net: bridge: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- net/bridge/br_device.c | 21 + 1 file changed, 1 insertion(+), 20 deletions(-) diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c index 9a2fb4aa1..6f742fee8 100644 --- a/net/bridge/br_device.c +++ b/net/bridge/br_device.c @@ -206,27 +206,8 @@ static void br_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct net_bridge *br = netdev_priv(dev); - struct pcpu_sw_netstats tmp, sum = { 0 }; - unsigned int cpu; - - for_each_possible_cpu(cpu) { - unsigned int start; - const struct pcpu_sw_netstats *bstats - = per_cpu_ptr(br->stats, cpu); - do { - start = u64_stats_fetch_begin_irq(&bstats->syncp); - memcpy(&tmp, bstats, sizeof(tmp)); - } while (u64_stats_fetch_retry_irq(&bstats->syncp, start)); - sum.tx_bytes += tmp.tx_bytes; - sum.tx_packets += tmp.tx_packets; - sum.rx_bytes += tmp.rx_bytes; - sum.rx_packets += tmp.rx_packets; - } - stats->tx_bytes = sum.tx_bytes; - stats->tx_packets = sum.tx_packets; - stats->rx_bytes = sum.rx_bytes; - stats->rx_packets = sum.rx_packets; + dev_fetch_sw_netstats(stats, br->stats); } static int br_change_mtu(struct net_device *dev, int new_mtu) -- 2.28.0
[PATCH net-next 04/12] net: usb: qmi_wwan: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- drivers/net/usb/qmi_wwan.c | 24 +--- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 5ca1356b8..a322f5187 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -126,31 +126,9 @@ static void qmimux_get_stats64(struct net_device *net, struct rtnl_link_stats64 *stats) { struct qmimux_priv *priv = netdev_priv(net); - unsigned int start; - int cpu; netdev_stats_to_stats64(stats, &net->stats); - - for_each_possible_cpu(cpu) { - struct pcpu_sw_netstats *stats64; - u64 rx_packets, rx_bytes; - u64 tx_packets, tx_bytes; - - stats64 = per_cpu_ptr(priv->stats64, cpu); - - do { - start = u64_stats_fetch_begin_irq(&stats64->syncp); - rx_packets = stats64->rx_packets; - rx_bytes = stats64->rx_bytes; - tx_packets = stats64->tx_packets; - tx_bytes = stats64->tx_bytes; - } while (u64_stats_fetch_retry_irq(&stats64->syncp, start)); - - stats->rx_packets += rx_packets; - stats->rx_bytes += rx_bytes; - stats->tx_packets += tx_packets; - stats->tx_bytes += tx_bytes; - } + dev_fetch_sw_netstats(stats, priv->stats64); } static const struct net_device_ops qmimux_netdev_ops = { -- 2.28.0
[PATCH net-next 12/12] xfrm: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- net/xfrm/xfrm_interface.c | 22 +- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c index 5b120936d..aa4cdcf69 100644 --- a/net/xfrm/xfrm_interface.c +++ b/net/xfrm/xfrm_interface.c @@ -541,27 +541,7 @@ static int xfrmi_update(struct xfrm_if *xi, struct xfrm_if_parms *p) static void xfrmi_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *s) { - int cpu; - - for_each_possible_cpu(cpu) { - struct pcpu_sw_netstats *stats; - struct pcpu_sw_netstats tmp; - int start; - - stats = per_cpu_ptr(dev->tstats, cpu); - do { - start = u64_stats_fetch_begin_irq(&stats->syncp); - tmp.rx_packets = stats->rx_packets; - tmp.rx_bytes = stats->rx_bytes; - tmp.tx_packets = stats->tx_packets; - tmp.tx_bytes = stats->tx_bytes; - } while (u64_stats_fetch_retry_irq(&stats->syncp, start)); - - s->rx_packets += tmp.rx_packets; - s->rx_bytes += tmp.rx_bytes; - s->tx_packets += tmp.tx_packets; - s->tx_bytes += tmp.tx_bytes; - } + dev_fetch_sw_netstats(s, dev->tstats); s->rx_dropped = dev->stats.rx_dropped; s->tx_dropped = dev->stats.tx_dropped; -- 2.28.0
[PATCH net-next 05/12] net: usbnet: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- drivers/net/usb/usbnet.c | 24 +--- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 963d260d1..6062dc278 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -983,31 +983,9 @@ EXPORT_SYMBOL_GPL(usbnet_set_link_ksettings); void usbnet_get_stats64(struct net_device *net, struct rtnl_link_stats64 *stats) { struct usbnet *dev = netdev_priv(net); - unsigned int start; - int cpu; netdev_stats_to_stats64(stats, &net->stats); - - for_each_possible_cpu(cpu) { - struct pcpu_sw_netstats *stats64; - u64 rx_packets, rx_bytes; - u64 tx_packets, tx_bytes; - - stats64 = per_cpu_ptr(dev->stats64, cpu); - - do { - start = u64_stats_fetch_begin_irq(&stats64->syncp); - rx_packets = stats64->rx_packets; - rx_bytes = stats64->rx_bytes; - tx_packets = stats64->tx_packets; - tx_bytes = stats64->tx_bytes; - } while (u64_stats_fetch_retry_irq(&stats64->syncp, start)); - - stats->rx_packets += rx_packets; - stats->rx_bytes += rx_bytes; - stats->tx_packets += tx_packets; - stats->tx_bytes += tx_bytes; - } + dev_fetch_sw_netstats(stats, dev->stats64); } EXPORT_SYMBOL_GPL(usbnet_get_stats64); -- 2.28.0
[PATCH net-next 10/12] mac80211: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- net/mac80211/iface.c | 23 +-- 1 file changed, 1 insertion(+), 22 deletions(-) diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index 240862a74..1be775979 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -709,28 +709,7 @@ static u16 ieee80211_netdev_select_queue(struct net_device *dev, static void ieee80211_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { - int i; - - for_each_possible_cpu(i) { - const struct pcpu_sw_netstats *tstats; - u64 rx_packets, rx_bytes, tx_packets, tx_bytes; - unsigned int start; - - tstats = per_cpu_ptr(dev->tstats, i); - - do { - start = u64_stats_fetch_begin_irq(&tstats->syncp); - rx_packets = tstats->rx_packets; - tx_packets = tstats->tx_packets; - rx_bytes = tstats->rx_bytes; - tx_bytes = tstats->tx_bytes; - } while (u64_stats_fetch_retry_irq(&tstats->syncp, start)); - - stats->rx_packets += rx_packets; - stats->tx_packets += tx_packets; - stats->rx_bytes += rx_bytes; - stats->tx_bytes += tx_bytes; - } + dev_fetch_sw_netstats(stats, dev->tstats); } static const struct net_device_ops ieee80211_dataif_ops = { -- 2.28.0
[PATCH net-next 06/12] qtnfmac: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- drivers/net/wireless/quantenna/qtnfmac/core.c | 27 +-- 1 file changed, 1 insertion(+), 26 deletions(-) diff --git a/drivers/net/wireless/quantenna/qtnfmac/core.c b/drivers/net/wireless/quantenna/qtnfmac/core.c index 374074dc7..7f66cf608 100644 --- a/drivers/net/wireless/quantenna/qtnfmac/core.c +++ b/drivers/net/wireless/quantenna/qtnfmac/core.c @@ -139,34 +139,9 @@ static void qtnf_netdev_get_stats64(struct net_device *ndev, struct rtnl_link_stats64 *stats) { struct qtnf_vif *vif = qtnf_netdev_get_priv(ndev); - unsigned int start; - int cpu; netdev_stats_to_stats64(stats, &ndev->stats); - - if (!vif->stats64) - return; - - for_each_possible_cpu(cpu) { - struct pcpu_sw_netstats *stats64; - u64 rx_packets, rx_bytes; - u64 tx_packets, tx_bytes; - - stats64 = per_cpu_ptr(vif->stats64, cpu); - - do { - start = u64_stats_fetch_begin_irq(&stats64->syncp); - rx_packets = stats64->rx_packets; - rx_bytes = stats64->rx_bytes; - tx_packets = stats64->tx_packets; - tx_bytes = stats64->tx_bytes; - } while (u64_stats_fetch_retry_irq(&stats64->syncp, start)); - - stats->rx_packets += rx_packets; - stats->rx_bytes += rx_bytes; - stats->tx_packets += tx_packets; - stats->tx_bytes += tx_bytes; - } + dev_fetch_sw_netstats(stats, vif->stats64); } /* Netdev handler for transmission timeout. -- 2.28.0
[PATCH net-next 03/12] net: macsec: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- drivers/net/macsec.c | 25 + 1 file changed, 1 insertion(+), 24 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 2b0c8f01d..e74483279 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -3647,30 +3647,7 @@ static int macsec_change_mtu(struct net_device *dev, int new_mtu) static void macsec_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *s) { - int cpu; - - if (!dev->tstats) - return; - - for_each_possible_cpu(cpu) { - struct pcpu_sw_netstats *stats; - struct pcpu_sw_netstats tmp; - int start; - - stats = per_cpu_ptr(dev->tstats, cpu); - do { - start = u64_stats_fetch_begin_irq(&stats->syncp); - tmp.rx_packets = stats->rx_packets; - tmp.rx_bytes = stats->rx_bytes; - tmp.tx_packets = stats->tx_packets; - tmp.tx_bytes = stats->tx_bytes; - } while (u64_stats_fetch_retry_irq(&stats->syncp, start)); - - s->rx_packets += tmp.rx_packets; - s->rx_bytes += tmp.rx_bytes; - s->tx_packets += tmp.tx_packets; - s->tx_bytes += tmp.tx_bytes; - } + dev_fetch_sw_netstats(s, dev->tstats); s->rx_dropped = dev->stats.rx_dropped; s->tx_dropped = dev->stats.tx_dropped; -- 2.28.0
[PATCH net-next 02/12] IB/hfi1: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- drivers/infiniband/hw/hfi1/ipoib_main.c | 34 + 1 file changed, 1 insertion(+), 33 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/ipoib_main.c b/drivers/infiniband/hw/hfi1/ipoib_main.c index 014351ebb..9f71b9d70 100644 --- a/drivers/infiniband/hw/hfi1/ipoib_main.c +++ b/drivers/infiniband/hw/hfi1/ipoib_main.c @@ -97,41 +97,9 @@ static void hfi1_ipoib_dev_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage) { struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev); - u64 rx_packets = 0ull; - u64 rx_bytes = 0ull; - u64 tx_packets = 0ull; - u64 tx_bytes = 0ull; - int i; netdev_stats_to_stats64(storage, &dev->stats); - - for_each_possible_cpu(i) { - const struct pcpu_sw_netstats *stats; - unsigned int start; - u64 trx_packets; - u64 trx_bytes; - u64 ttx_packets; - u64 ttx_bytes; - - stats = per_cpu_ptr(priv->netstats, i); - do { - start = u64_stats_fetch_begin_irq(&stats->syncp); - trx_packets = stats->rx_packets; - trx_bytes = stats->rx_bytes; - ttx_packets = stats->tx_packets; - ttx_bytes = stats->tx_bytes; - } while (u64_stats_fetch_retry_irq(&stats->syncp, start)); - - rx_packets += trx_packets; - rx_bytes += trx_bytes; - tx_packets += ttx_packets; - tx_bytes += ttx_bytes; - } - - storage->rx_packets += rx_packets; - storage->rx_bytes += rx_bytes; - storage->tx_packets += tx_packets; - storage->tx_bytes += tx_bytes; + dev_fetch_sw_netstats(storage, priv->netstats); } static const struct net_device_ops hfi1_ipoib_netdev_ops = { -- 2.28.0
[PATCH net-next 09/12] iptunnel: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit --- net/ipv4/ip_tunnel_core.c | 23 +-- 1 file changed, 1 insertion(+), 22 deletions(-) diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index b2ea1a8c5..25f1caf5a 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -433,29 +433,8 @@ EXPORT_SYMBOL(skb_tunnel_check_pmtu); void ip_tunnel_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *tot) { - int i; - netdev_stats_to_stats64(tot, &dev->stats); - - for_each_possible_cpu(i) { - const struct pcpu_sw_netstats *tstats = - per_cpu_ptr(dev->tstats, i); - u64 rx_packets, rx_bytes, tx_packets, tx_bytes; - unsigned int start; - - do { - start = u64_stats_fetch_begin_irq(&tstats->syncp); - rx_packets = tstats->rx_packets; - tx_packets = tstats->tx_packets; - rx_bytes = tstats->rx_bytes; - tx_bytes = tstats->tx_bytes; - } while (u64_stats_fetch_retry_irq(&tstats->syncp, start)); - - tot->rx_packets += rx_packets; - tot->tx_packets += tx_packets; - tot->rx_bytes += rx_bytes; - tot->tx_bytes += tx_bytes; - } + dev_fetch_sw_netstats(tot, dev->tstats); } EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64); -- 2.28.0
Re: [PATCH net-next 01/12] net: core: add function dev_fetch_sw_netstats for fetching pcpu_sw_netstats
On Sun, 11 Oct 2020 21:36:43 +0200 Heiner Kallweit wrote: > +void dev_fetch_sw_netstats(struct rtnl_link_stats64 *s, > +struct pcpu_sw_netstats __percpu *netstats) netstats is unmodified, should it be const? > +{ > + int cpu; > + > + if (IS_ERR_OR_NULL(netstats)) > + return; Any code calling this with a null pointer is broken/buggy, please don't ignore that.
[PATCH bpf-next] bpf: Migrate from patchwork.ozlabs.org to patchwork.kernel.org.
From: Alexei Starovoitov Move the bpf/bpf-next patch processing queue to patchwork.kernel.org. Signed-off-by: Alexei Starovoitov --- Hi All BPF developers, we've migrated bpf patchwork to kernel.org. Please monitor the bpf queue at https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173 The new home provides: - automatic marking of patches as 'accepted' as soon as they land in bpf/bpf-next trees. - automatic email notification when patches are pushed into bpf/bpf-next trees. - automatic marking as 'superseded' when patch series are respun. - automatic marking with bpf or netdev delegate based on [PATCH bpf|bpf-next|net|net-next] subject and via file-based pattern matching. - Fast UI with low latency for those in US and Europe. - Patches sent to netdev@vger and/or bpf@vger appear in one place. Thank you, Konstantin, for making it happen. The patches will still appear at ozlabs.org until it stops receiving emails. All bpf patches at ozlabs will be marked as 'not applicable'. The ozlabs instance is deprecated. It will be used as an archive. --- Documentation/bpf/bpf_devel_QA.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst index 75a0dca5f295..5b613d2a5f1a 100644 --- a/Documentation/bpf/bpf_devel_QA.rst +++ b/Documentation/bpf/bpf_devel_QA.rst @@ -60,13 +60,13 @@ Q: Where can I find patches currently under discussion for BPF subsystem? A: All patches that are Cc'ed to netdev are queued for review under netdev patchwork project: - http://patchwork.ozlabs.org/project/netdev/list/ + https://patchwork.kernel.org/project/netdevbpf/list/ Those patches which target BPF, are assigned to a 'bpf' delegate for further processing from BPF maintainers. The current queue with patches under review can be found at: - https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147 + https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173 Once the patches have been reviewed by the BPF community as a whole and approved by the BPF maintainers, their status in patchwork will be -- 2.23.0
Re: [PATCH bpf-next v2] bpf_fib_lookup: optionally skip neighbour lookup
On 10/10/20 3:42 PM, Toke Høiland-Jørgensen wrote: Daniel Borkmann writes: On 10/9/20 11:28 PM, David Ahern wrote: On 10/9/20 11:42 AM, Toke Høiland-Jørgensen wrote: David Ahern writes: On 10/9/20 3:13 AM, Toke Høiland-Jørgensen wrote: The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work. For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex, and also add a flag that lets the caller turn off the neighbour lookup entirely in bpf_fib_lookup(). seems really odd to do the fib lookup only to skip the neighbor lookup and defer to a second helper to do a second fib lookup and send out. The better back-to-back calls is to return the ifindex and gateway on successful fib lookup regardless of valid neighbor. If the call to bpf_redirect_neigh is needed, it can have a flag to skip the fib lookup and just redirect to the given nexthop address + ifindex. ie., bpf_redirect_neigh only does neighbor handling in this case. Hmm, yeah, I guess it would make sense to cache and reuse the lookup - maybe stick it in bpf_redirect_info()? However, given the imminent That is not needed. opening of the merge window, I don't see this landing before then. So I'm going to respin this patch with just the original change to always return the ifindex, then we can revisit the flags/reuse of the fib lookup later. What I am suggesting is a change in API to bpf_redirect_neigh which should be done now, before the merge window, before it comes a locked API. Right now, bpf_redirect_neigh does a lookup to get the nexthop. It should take the gateway as an input argument. If set, then the lookup is not done - only the neighbor redirect. Sounds like a reasonable extension, agree. API freeze is not merge win, but final v5.10 tag in this case as it always has been. In case it's not in time, we can simply just move flags to arg3 and add a reserved param as arg2 which must be zero (and thus indicate to perform the lookup as-is). Later we could extend to pass params similar as in fib_lookup helper for the gw. Right, I can take a look at this next week. Feel free to merge (v3 of) this patch now, that change will be needed in any case I think... Ok, sounds reasonable, done. Lets fix the remaining one as David suggested until rc1, at latest rc2 time frame. I'll be mostly offline next week during the day, but happy to help till that deadline as well. Thanks, Daniel
Re: [PATCH 4/5] net/ipv6: use semicolons rather than commas to separate statements
On Sun, Oct 11, 2020 at 7:18 AM Julia Lawall wrote: > > Replace commas with semicolons. Commas introduce unnecessary > variability in the code structure and are hard to see. What is done > is essentially described by the following Coccinelle semantic patch > (http://coccinelle.lip6.fr/): > > // > @@ expression e1,e2; @@ > e1 > -, > +; > e2 > ... when any > // > > Signed-off-by: Julia Lawall > > --- > net/ipv6/calipso.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks Julia. Acked-by: Paul Moore > diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c > index 8d3f66c310db..78f766019b7e 100644 > --- a/net/ipv6/calipso.c > +++ b/net/ipv6/calipso.c > @@ -761,7 +761,7 @@ static int calipso_genopt(unsigned char *buf, u32 start, > u32 buf_len, > calipso[1] = len - 2; > *(__be32 *)(calipso + 2) = htonl(doi_def->doi); > calipso[6] = (len - CALIPSO_HDR_LEN) / 4; > - calipso[7] = secattr->attr.mls.lvl, > + calipso[7] = secattr->attr.mls.lvl; > crc = ~crc_ccitt(0x, calipso, len); > calipso[8] = crc & 0xff; > calipso[9] = (crc >> 8) & 0xff; -- paul moore www.paul-moore.com
Re: [PATCH bpf-next] bpf: Migrate from patchwork.ozlabs.org to patchwork.kernel.org.
Hello: This patch was applied to bpf/bpf-next.git (refs/heads/master): On Sun, 11 Oct 2020 13:01:49 -0700 you wrote: > From: Alexei Starovoitov > > Move the bpf/bpf-next patch processing queue to patchwork.kernel.org. > > Signed-off-by: Alexei Starovoitov > --- > > [...] Here is the summary with links: - [bpf-next] bpf: Migrate from patchwork.ozlabs.org to patchwork.kernel.org. https://git.kernel.org/bpf/bpf-next/c/ebb034b15bfa You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
[PATCH] ptp: ptp_clockmatrix: initialize variables
From: Tom Rix Clang static analysis reports this representative problem ptp_clockmatrix.c:1852:2: warning: 5th function call argument is an uninitialized value snprintf(idtcm->version, sizeof(idtcm->version), "%u.%u.%u", ^~~~ idtcm_display_version_info() calls several idtcm_read_*'s without checking a return status. Initialize the read variables so if a read fails, a garbage value is not displayed. Signed-off-by: Tom Rix --- drivers/ptp/ptp_clockmatrix.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/ptp/ptp_clockmatrix.c b/drivers/ptp/ptp_clockmatrix.c index e020faff7da5..47e5e62da5d2 100644 --- a/drivers/ptp/ptp_clockmatrix.c +++ b/drivers/ptp/ptp_clockmatrix.c @@ -1832,12 +1832,12 @@ static int idtcm_enable_tod(struct idtcm_channel *channel) static void idtcm_display_version_info(struct idtcm *idtcm) { - u8 major; - u8 minor; - u8 hotfix; - u16 product_id; - u8 hw_rev_id; - u8 config_select; + u8 major = 0; + u8 minor = 0; + u8 hotfix = 0; + u16 product_id = 0; + u8 hw_rev_id = 0; + u8 config_select = 0; char *fmt = "%d.%d.%d, Id: 0x%04x HW Rev: %d OTP Config Select: %d\n"; idtcm_read_major_release(idtcm, &major); -- 2.18.1
Re: [PATCH v2] net: usb: rtl8150: don't incorrectly assign random MAC addresses
On Sun, 2020-10-11 at 22:31 +0300, Petko Manolov wrote: > On 20-10-11 11:33:00, Joe Perches wrote: > > On Sun, 2020-10-11 at 10:59 -0700, Jakub Kicinski wrote: > > > On Sun, 11 Oct 2020 23:00:30 +0530 Anant Thazhemadam wrote: > > > > In set_ethernet_addr(), if get_registers() succeeds, the ethernet > > > > address > > > > that was read must be copied over. Otherwise, a random ethernet address > > > > must be assigned. > > > > > > > > get_registers() returns 0 if successful, and negative error number > > > > otherwise. However, in set_ethernet_addr(), this return value is > > > > incorrectly checked. > > > > > > > > Since this return value will never be equal to sizeof(node_id), a > > > > random MAC address will always be generated and assigned to the > > > > device; even in cases when get_registers() is successful. > > > > > > > > Correctly modifying the condition that checks if get_registers() was > > > > successful or not fixes this problem, and copies the ethernet address > > > > appropriately. > > > > There are many unchecked uses of set_registers and get_registers > > in this file. > > > > If failures are really expected, then it might be better to fix > > them up too. > > Checking the return value of each get/set_registers() is going to be a PITA > and > not very helpful. Doing so when setting the MAC address _does_ make sense as > in > that case it is not a hard error. > > In almost all other occasions if usb_control_msg_send/recv() return an error > i'd > rather dump an error message (from within get/set_registers()) and let the > user > decide whether to get rid of this adapter or start debugging it. Your code, your choices... Consider using _once or _ratelimited output too.
Re: [PATCH net-next 08/12] net: dsa: use new function dev_fetch_sw_netstats
On Sun, Oct 11, 2020 at 09:41:27PM +0200, Heiner Kallweit wrote: > Simplify the code by using new function dev_fetch_sw_netstats(). > > Signed-off-by: Heiner Kallweit > --- Tested-by: Vladimir Oltean
Re: [PATCH net-next 01/12] net: core: add function dev_fetch_sw_netstats for fetching pcpu_sw_netstats
On 11.10.2020 21:54, Stephen Hemminger wrote: > On Sun, 11 Oct 2020 21:36:43 +0200 > Heiner Kallweit wrote: > >> +void dev_fetch_sw_netstats(struct rtnl_link_stats64 *s, >> + struct pcpu_sw_netstats __percpu *netstats) > > netstats is unmodified, should it be const? > >> +{ >> +int cpu; >> + >> +if (IS_ERR_OR_NULL(netstats)) >> +return; > > Any code calling this with a null pointer is broken/buggy, please don't > ignore that. > Thanks, I'll consider both points in a v2.
Re: [Patch net v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly
On Sun, Oct 11, 2020 at 12:11 PM Cong Wang wrote: > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > conditionally. When it is set, it assumes the outer IP header is > already created before ipgre_xmit(). > > This is not true when we send packets through a raw packet socket, > where L2 headers are supposed to be constructed by user. Packet > socket calls dev_validate_header() to validate the header. But > GRE tunnel does not set dev->hard_header_len, so that check can > be simply bypassed, therefore uninit memory could be passed down > to ipgre_xmit(). Similar for dev->needed_headroom. > > dev->hard_header_len is supposed to be the length of the header > created by dev->header_ops->create(), so it should be used whenever > header_ops is set, and dev->needed_headroom should be used when it > is not set. Hi, thanks for attempting to fix this tunnel. Are we still considering removing header_ops->create? As said in my email sent previously today, I want to remove header_ops->create because 1) this keeps the un-exposed headers of GRE devices consistent with those of GRETAP devices, and 2) I think the GRE header (and the headers before the GRE header) is not actually the L2 header of the tunnel (the Wikipedia page for "Generic Routing Encapsulation" doesn't consider this protocol to be at L2 either). I'm not sure if you still agree to remove header_ops->create. Do you still agree but think it'd be better to do that in a separate patch? Removing header_ops->create would simplify the fixing of the issue you are trying to fix, too, because that way we would no longer need to use header_ops or hard_header_len. Also, I'm worried that changing hard_header_len (or needed_headroom) in ipgre_link_update would have racing issues. If we remove header_ops, we no longer need to use hard_header_len and we can just set needed_headroom to the maximum value, so that we no longer need to update them in ipgre_link_update.
Re: [Patch net v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly
On Sun, Oct 11, 2020 at 3:11 PM Cong Wang wrote: > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > conditionally. When it is set, it assumes the outer IP header is > already created before ipgre_xmit(). > > This is not true when we send packets through a raw packet socket, > where L2 headers are supposed to be constructed by user. Packet > socket calls dev_validate_header() to validate the header. But > GRE tunnel does not set dev->hard_header_len, so that check can > be simply bypassed, therefore uninit memory could be passed down > to ipgre_xmit(). Similar for dev->needed_headroom. > > dev->hard_header_len is supposed to be the length of the header > created by dev->header_ops->create(), so it should be used whenever > header_ops is set, and dev->needed_headroom should be used when it > is not set. > > Reported-and-tested-by: syzbot+4a2c52677a8a1aa28...@syzkaller.appspotmail.com > Cc: Xie He > Cc: William Tu > Cc: Willem de Bruijn > Signed-off-by: Cong Wang > --- > Note, there are some other suspicious use of dev->hard_header_len in > the same file, but let's leave them to a separate patch if really > needed. There is agreement that hard_header_len should be the length of link layer headers visible to the upper layers, needed_headroom the additional room required for headers that are not exposed, i.e., those pushed inside ndo_start_xmit. The link layer header length also has to agree with the interface hardware type (ARPHRD_..). Tunnel devices have not always been consistent in this, but today "bare" ip tunnel devices without additional headers (ipip, sit, ..) do match this and advertise 0 byte hard_header_len. Bareudp, vxlan and geneve also conform to this. Known exception that probably needs to be addressed is sit, which still advertises LL_MAX_HEADER and so has exposed quite a few syzkaller issues. Side note, it is not entirely clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and why they are needed. GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER. The second makes sense, as it appears as an Ethernet device. The first should match "bare" ip tunnel devices, if following the above logic. Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE header around in collect medata mode") implements. It changes dev->type to ARPHRD_NONE in collect_md mode. Some of the inconsistency comes from the various modes of the GRE driver. Which brings us to ipgre_header_ops. It is set only in two special cases. Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address") added ipgre_header_ops.parse to be able to receive the inner ip source address with PF_PACKET recvfrom. And apparently relies on ipgre_header_ops.create to be able to set an address, which implies SOCK_DGRAM. The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its implementation starts with the beautiful comment "/* Nice toy. Unfortunately, useless in real life :-)". From the rest of that detailed comment, it is not clear to me why it would need to expose the headers. The example does not use packet sockets. A packet socket cannot know devices details such as which configurable mode a device may be in. And different modes conflict with the basic rule that for a given well defined link layer type, i.e., dev->type, header length can be expected to be consistent. In an ideal world these exceptions would not exist, therefore. Unfortunately, this is legacy behavior that will have to continue to be supported. I agree that then swapping dev->needed_headroom for dev->hard_header_len at init is then a good approach. Underlying functions like ip_tunnel_xmit can modify needed_headroom at runtime, not sure how that affects runtime changes in ipgre_link_update: " max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr) + rt->dst.header_len + ip_encap_hlen(&tunnel->encap); if (max_headroom > dev->needed_headroom) dev->needed_headroom = max_headroom; " > net/ipv4/ip_gre.c | 14 +++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index 4e31f23e4117..82fee0010353 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -626,8 +626,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb, > > if (dev->header_ops) { > /* Need space for new headers */ > - if (skb_cow_head(skb, dev->needed_headroom - > - (tunnel->hlen + sizeof(struct iphdr > + if (skb_cow_head(skb, dev->hard_header_len)) > goto free_skb; > > tnl_params = (const struct iphdr *)skb->data; > @@ -748,7 +747,11 @@ static void ipgre_link_update(struct net_device *dev, > bool set_mtu) > len = tunnel->tun_hlen - len; > tunnel->hlen = tunnel->hlen + len; > > - dev->needed_headroom = dev->needed_headroom + len; > +
Re: [Patch net v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly
On Sun, Oct 11, 2020 at 4:42 PM Xie He wrote: > > On Sun, Oct 11, 2020 at 12:11 PM Cong Wang wrote: > > > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > > conditionally. When it is set, it assumes the outer IP header is > > already created before ipgre_xmit(). > > > > This is not true when we send packets through a raw packet socket, > > where L2 headers are supposed to be constructed by user. Packet > > socket calls dev_validate_header() to validate the header. But > > GRE tunnel does not set dev->hard_header_len, so that check can > > be simply bypassed, therefore uninit memory could be passed down > > to ipgre_xmit(). Similar for dev->needed_headroom. > > > > dev->hard_header_len is supposed to be the length of the header > > created by dev->header_ops->create(), so it should be used whenever > > header_ops is set, and dev->needed_headroom should be used when it > > is not set. > > Hi, thanks for attempting to fix this tunnel. Are we still considering > removing header_ops->create? > > As said in my email sent previously today, I want to remove > header_ops->create because 1) this keeps the un-exposed headers of GRE > devices consistent with those of GRETAP devices, and 2) I think the > GRE header (and the headers before the GRE header) is not actually the > L2 header of the tunnel (the Wikipedia page for "Generic Routing > Encapsulation" doesn't consider this protocol to be at L2 either). > > I'm not sure if you still agree to remove header_ops->create. Do you > still agree but think it'd be better to do that in a separate patch? > > Removing header_ops->create would simplify the fixing of the issue you > are trying to fix, too, because that way we would no longer need to > use header_ops or hard_header_len. Also, I'm worried that changing > hard_header_len (or needed_headroom) in ipgre_link_update would have > racing issues. If we remove header_ops, we no longer need to use > hard_header_len and we can just set needed_headroom to the maximum > value, so that we no longer need to update them in ipgre_link_update. Our messages crossed. It seems there are legacy expectations that sendto/recvfrom packet sockets allow writing/reading the outer IP address, as of commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address"). That is the express purpose of that commit. The behavior is inconsistent with other tunnels, as you also point out, and probably only rarely used if at all. I would love to get rid of it, but given that we cannot be certain that it is unused, I'm afraid that we have to continue to support this special case.
Re: [PATCH net-next 3/9] bnxt_en: Set driver default message level.
On Sun, Oct 11, 2020 at 12:34 PM Jakub Kicinski wrote: > > On Sun, 11 Oct 2020 06:22:55 -0400 Michael Chan wrote: > > Currently, bp->msg_enable has default value of 0. It is more useful > > to have the commonly used NETIF_MSG_DRV and NETIF_MSG_HW enabled by > > default. > > > > Reviewed-by: Edwin Peer > > Reviewed-by: Vasundhara Volam > > Signed-off-by: Michael Chan > > This will add whole bunch of output for "RX buffer error 4000[45]", no? > That one needs to switch to a silent reset first, I'd think. The last round of net-next patches have made changes to reduce this noise already. If the firmware supports the new RING_MONITOR scheme, There will be no more messages. The reset will be counted in a new ethtool -S counter only. If the firmware is older and does not support the new scheme, we've changed the logging to warn_once() in addition to the ethtool -S counter. There is no point in warning more than once. smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH] net: korina: free array used for rx/tx descriptors
On Sun, Oct 11, 2020 at 02:37:33PM -0400, Willem de Bruijn wrote: > Slightly off-topic, but I don't fully fathom what goes on with this > pointer straight after the initial kmalloc. > > lp->td_ring = (struct dma_desc *)KSEG1ADDR(lp->td_ring); KSEG1ADDR should rewrite the memory address into the uncached region for memory mapped I/O. Not sure if this would case problems for kfree since there is another kfree on the fail path: probe_err_register: kfree(lp->td_ring); -- Valentin