Cadence/macb ethernet driver bug on nonlinear skb buffers

Klaus Doth Fri, 08 Mar 2019 08:55:22 -0800

Hi,


I think I found a bug in the Cadence/macb ethernet driver.

It seems the macb_pad_and_fcs function in macb_main.c does not handle
fragmented/paged sk-buffers correctly: in some cases it does a memmove
followed by skb_put_u8 on a fragmented buffer. skb_put_u8 then fails,
because it checks whether the buffer is nonlinear.


My setup is a Xilinx ZynqMP using two macb ethernet ports, which are
combined in a bridge interface. As long as only those two interfaces are
bridged, everything works fine. As soon as I add a wireless AP interface
to the bridge and then connect to the wireless interface using any
WiFi-enabled device, the kernel panics with the message appended at the
bottom of this email. I am currently running kernel 5.0.0-rc8, so this
issue is in the current mainline kernel and, as far as I can see, also in
the stable branch.


I did some debugging and traced the issue to the macb_pad_and_fcs
function, and it only occurs for fragmented sk-buffers.

If I understand the code correctly, a fragmented buffer must not be
shifted with memmove so that the freed tailroom can be used for the FCS.
Instead the buffer should be copied, which combines the fragmented buffer
into one non-fragmented one, as skb_put_u8 does not work on fragmented
buffers. However, as I am not too deep into kernel network drivers, there
may be a better solution, or I could have missed something important.


Currently my system is running after I changed the first line of
static int macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev)
from

bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb);

to

bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb) ||
              skb_is_nonlinear(*skb);


I.e. any nonlinear buffer is handled as if it were cloned, which forces
the function to copy the buffer in order to increase its size.
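
To illustrate the idea, here is a minimal userspace sketch. It is not
kernel code: struct toy_skb and all toy_* functions are hypothetical
stand-ins that model only the three properties that matter here (a linear
area, an optional fragment, and a cloned flag). The assert in toy_put_u8
plays the role of the linearity check that trips in the trace below, and
the little-endian byte order of the FCS is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for struct sk_buff; NOT the kernel structure. */
struct toy_skb {
	unsigned char buf[64];     /* linear data area */
	size_t len;                /* bytes used in the linear area */
	const unsigned char *frag; /* paged data, NULL if none */
	size_t frag_len;           /* bytes in the fragment, 0 = linear */
	bool cloned;               /* buffer is shared with someone else */
};

static bool toy_is_nonlinear(const struct toy_skb *skb)
{
	return skb->frag_len != 0;
}

/* Appending to the tail is only legal on a linear buffer. */
static void toy_put_u8(struct toy_skb *skb, unsigned char val)
{
	assert(!toy_is_nonlinear(skb));      /* models the failing check */
	assert(skb->len < sizeof(skb->buf)); /* models running out of tailroom */
	skb->buf[skb->len++] = val;
}

/* Proposed condition: treat a nonlinear buffer like a cloned one. */
static bool toy_needs_copy(const struct toy_skb *skb)
{
	return skb->cloned || toy_is_nonlinear(skb);
}

/* "Copy" path: pull the fragment into the linear area. */
static void toy_linearize(struct toy_skb *skb)
{
	assert(skb->len + skb->frag_len <= sizeof(skb->buf));
	if (skb->frag_len)
		memcpy(skb->buf + skb->len, skb->frag, skb->frag_len);
	skb->len += skb->frag_len;
	skb->frag = NULL;
	skb->frag_len = 0;
	skb->cloned = false;
}

/* Append a 4-byte FCS, copying first whenever the buffer needs it. */
static void toy_append_fcs(struct toy_skb *skb, uint32_t fcs)
{
	if (toy_needs_copy(skb))
		toy_linearize(skb);

	toy_put_u8(skb, fcs & 0xff);
	toy_put_u8(skb, (fcs >> 8) & 0xff);
	toy_put_u8(skb, (fcs >> 16) & 0xff);
	toy_put_u8(skb, (fcs >> 24) & 0xff);
}
```

Without the toy_is_nonlinear() term in toy_needs_copy(), a fragmented
buffer would reach toy_put_u8 unchanged and hit the first assert, which is
the same kind of failure as the BUG shown in the trace.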


Before the change, the kernel panicked after a few seconds of running
data over the network bridge, and this could be reproduced every time the
connection was attempted. After the change it has been running
continuously for over a day without any issues or visible data loss.


If I can help in any way, let me know.


Best regards,

Klaus.



[ 1123.082887] ------------[ cut here ]------------
[ 1123.087491] kernel BUG at net/core/skbuff.c:1703!
[ 1123.092178] Internal error: Oops - BUG: 0 [#1] SMP
[ 1123.096951] Modules linked in: iwlmvm iwlwifi
[ 1123.101302] CPU: 3 PID: 3171 Comm: irq/53-iwlwifi Not tainted 5.0.0-rc8 #13
[ 1123.108252] Hardware name: xlnx,zynqmp (DT)
[ 1123.112420] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 1123.117200] pc : skb_put+0x48/0x60
[ 1123.120589] lr : macb_start_xmit+0x160/0xac0
[ 1123.124839] sp : ffffff801568b2d0
[ 1123.128138] x29: ffffff801568b2d0 x28: 00000000fffffedf
[ 1123.133433] x27: 0000000000000000 x26: ffffff801568b464
[ 1123.138727] x25: ffffffc02acfc100 x24: ffffff8010ed6000
[ 1123.144022] x23: ffffffc02dccd540 x22: ffffff8010ee0298
[ 1123.149317] x21: ffffffc02df18000 x20: 00000000ae3ec97f
[ 1123.154612] x19: ffffffc02df18000 x18: 0000000000000000
[ 1123.159907] x17: 0000000000000000 x16: 0000000000000000
[ 1123.165202] x15: 0000000000000400 x14: 0000000000000000
[ 1123.170496] x13: 0000000000000000 x12: 0000000000000000
[ 1123.175791] x11: 0000000000000000 x10: 000000d700000070
[ 1123.181086] x9 : ffffffbf0095f588 x8 : 00000000518e3072
[ 1123.186381] x7 : 0000000000000001 x6 : 00000000000000d7
[ 1123.191676] x5 : ffffffc02d3aa921 x4 : 0000000000000121
[ 1123.196970] x3 : 0000000000000000 x2 : 0000000000000000
[ 1123.202265] x1 : 0000000000000001 x0 : ffffffc02acfc100
[ 1123.207562] Process irq/53-iwlwifi (pid: 3171, stack limit = 0x000000002f10bec7)
[ 1123.214938] Call trace:
[ 1123.217371]  skb_put+0x48/0x60
[ 1123.220409]  macb_start_xmit+0x160/0xac0
[ 1123.224316]  dev_hard_start_xmit+0x94/0x128
[ 1123.228482]  sch_direct_xmit+0x144/0x348
[ 1123.232387]  __qdisc_run+0x118/0x520
[ 1123.235947]  __dev_queue_xmit+0x3ac/0x738
[ 1123.239939]  dev_queue_xmit+0x10/0x18
[ 1123.243586]  br_dev_queue_push_xmit+0xac/0x178
[ 1123.248011]  br_forward_finish+0xb0/0xb8
[ 1123.251917]  __br_forward.isra.0+0x128/0x158
[ 1123.256170]  br_forward+0x9c/0xa0
[ 1123.259469]  br_handle_frame_finish+0x2d8/0x3e8
[ 1123.263982]  br_handle_frame+0x1d8/0x2d8
[ 1123.267889]  __netif_receive_skb_core+0x25c/0x8d8
[ 1123.272576]  __netif_receive_skb_one_core+0x38/0x80
[ 1123.277437]  __netif_receive_skb+0x28/0x70
[ 1123.281517]  netif_receive_skb_internal+0x7c/0x128
[ 1123.286291]  napi_gro_receive+0xa4/0xc8
[ 1123.290112]  ieee80211_deliver_skb+0xc8/0x1f0
[ 1123.294459]  ieee80211_rx_handlers+0x9f4/0x1ff8
[ 1123.298973]  ieee80211_prepare_and_rx_handle+0x370/0x1028
[ 1123.304354]  ieee80211_rx_napi+0x6f0/0x968
[ 1123.308449]  iwl_mvm_rx_rx_mpdu+0x470/0xb18 [iwlmvm]
[ 1123.313408]  iwl_mvm_rx+0x54/0x88 [iwlmvm]
[ 1123.317495]  iwl_pcie_rx_handle+0x4cc/0x858 [iwlwifi]
[ 1123.322535]  iwl_pcie_irq_handler+0x188/0x710 [iwlwifi]
[ 1123.327749]  irq_thread_fn+0x28/0x78
[ 1123.331314]  irq_thread+0x124/0x1e8
[ 1123.334787]  kthread+0x128/0x130
[ 1123.337999]  ret_from_fork+0x10/0x18
[ 1123.341558] Code: 540000a8 aa0503e0 a8c17bfd d65f03c0 (d4210000)
[ 1123.347633] ---[ end trace 893b8184596cd876 ]---
[ 1123.352233] Kernel panic - not syncing: Fatal exception in interrupt
[ 1123.358571] SMP: stopping secondary CPUs
[ 1123.362480] Kernel Offset: disabled
[ 1123.365956] CPU features: 0x002,20002004
[ 1123.369861] Memory Limit: none
[ 1123.372902] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
