Hi Klaus, > -----Original Message----- > From: Klaus Doth [mailto:k...@doth.eu] > Sent: Friday, March 8, 2019 10:19 PM > To: netdev@vger.kernel.org > Cc: da...@davemloft.net; claudiu.bez...@microchip.com; Harini Katakam > <hari...@xilinx.com>; Michal Simek <mich...@xilinx.com>; Nicolas Ferre > <nicolas.fe...@microchip.com> > Subject: Cadence/macb ethernet driver bug on nonlinear skb buffers > > Hi, > > > I think I found a bug in the cadence / macb ethernet driver. > > It seems the macb_pad_and_fcs function in macb_main.c does not handle cases > of fragmented/paged sk-buffers correctly, as sometimes a memmove and > afterwards skb_put_u8 is done on fragmented buffers. skb_put_u8 then fails as > it checks if the buffer is nonlinear. > > > My setup is a Xilinx ZynqMP using two macb ethernet ports, which are combined > in a bridge interface. As long as only those two interfaces are bridged, > everything works fine, but as soon as I add a wireless AP interface to it, > and then > connect to the wireless interface using any WiFi enabled device, the kernel > panics with the message appended at the bottom of this email. I am currently > running Kernel 5.0.0-rc8, so this issue is in the current mainline kernel, > and as far > as I can see also in the stable branch. > > > I did some debugging and traced the issue to the macb_pad_and_fcs function, > and it only occurs for fragmented sk-buffers. > > If I understand the code correctly, the buffer should not be moved by using > memmove and afterwards the free tailroom be used for FCS if the buffer is > fragmented. Instead the buffer should be copied, and thus combining the > fragmented buffer into one non-fragmented one, as skb_put_u8 does not work > on fragmented buffers. However as I am not too deep into kernel network > drivers, there may be a better solution, or I could have missed something > important. > > > Currently my system is running, after I changed the first line of static int > macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev) from > > bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb); > > to > > bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb) || > skb_is_nonlinear(*skb); > > > I.e. handle any nonlinear buffer as if it was cloned. Thus force the function > into > copying the buffer for increasing its size. > > > Before the change, the kernel panicked after a few seconds of running data > over > the network bridge, which could be reproduced every time this connection is > attempted. After the change it is running for over a day now continuously > without any issues, or any visible data loss. >
Thanks for the debug. Yes, this is a bug - we recently noticed this on ZynqMP and temporarily disabled this functionality for fragmented packets before finding a clean solution. I noticed this error in one of the functional tests using pktgen. Regards, Harini