Hi Klaus, Thanks for reporting this.
On 09.03.2019 07:50, Harini Katakam wrote: > Hi Klaus, > >> -----Original Message----- >> From: Klaus Doth [mailto:k...@doth.eu] >> Sent: Friday, March 8, 2019 10:19 PM >> To: netdev@vger.kernel.org >> Cc: da...@davemloft.net; claudiu.bez...@microchip.com; Harini Katakam >> <hari...@xilinx.com>; Michal Simek <mich...@xilinx.com>; Nicolas Ferre >> <nicolas.fe...@microchip.com> >> Subject: Cadence/macb ethernet driver bug on nonlinear skb buffers >> >> Hi, >> >> >> I think I found a bug in the cadence / macb ethernet driver. >> >> It seems the macb_pad_and_fcs function in macb_main.c does not handle cases >> of fragmented/paged sk-buffers correctly, as sometimes a memmove and >> afterwards skb_put_u8 is done on fragmented buffers. I will also take a look at this. I though I have also tested theses scenario with my pktgen tests. Sorry for the issue. >> skb_put_u8 then fails as >> it checks if the buffer is nonlinear. >> >> >> My setup is a Xilinx ZynqMP using two macb ethernet ports, which are combined >> in a bridge interface. As long as only those two interfaces are bridged, >> everything works fine, but as soon as I add a wireless AP interface to it, >> and then >> connect to the wireless interface using any WiFi enabled device, the kernel >> panics with the message appended at the bottom of this email. I am currently >> running Kernel 5.0.0-rc8, so this issue is in the current mainline kernel, >> and as far >> as I can see also in the stable branch. >> >> >> I did some debugging and traced the issue to the macb_pad_and_fcs function, >> and it only occurs for fragmented sk-buffers. >> >> If I understand the code correctly, the buffer should not be moved by using >> memmove and afterwards the free tailroom be used for FCS if the buffer is >> fragmented. Instead the buffer should be copied, and thus combining the >> fragmented buffer into one non-fragmented one, as skb_put_u8 does not work >> on fragmented buffers. However as I am not too deep into kernel network >> drivers, there may be a better solution, or I could have missed something >> important. >> >> >> Currently my system is running, after I changed the first line of static int >> macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev) from >> >> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb); >> >> to >> >> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb) || >> skb_is_nonlinear(*skb); >> >> >> I.e. handle any nonlinear buffer as if it was cloned. Thus force the >> function into >> copying the buffer for increasing its size. >> >> >> Before the change, the kernel panicked after a few seconds of running data >> over >> the network bridge, which could be reproduced every time this connection is >> attempted. After the change it is running for over a day now continuously >> without any issues, or any visible data loss. >> > > Thanks for the debug. > Yes, this is a bug - we recently noticed this on ZynqMP and temporarily > disabled > this functionality for fragmented packets before finding a clean solution. > I noticed this error in one of the functional tests using pktgen. > > Regards, > Harini >