Hi Klaus,

> -----Original Message-----
> From: Klaus Doth [mailto:k...@doth.eu]
> Sent: Friday, March 8, 2019 10:19 PM
> To: netdev@vger.kernel.org
> Cc: da...@davemloft.net; claudiu.bez...@microchip.com; Harini Katakam
> <hari...@xilinx.com>; Michal Simek <mich...@xilinx.com>; Nicolas Ferre
> <nicolas.fe...@microchip.com>
> Subject: Cadence/macb ethernet driver bug on nonlinear skb buffers
> 
> Hi,
> 
> 
> I think I found a bug in the cadence / macb ethernet driver.
> 
> It seems the macb_pad_and_fcs function in macb_main.c does not handle cases
> of fragmented/paged sk-buffers correctly, as sometimes a memmove and
> afterwards skb_put_u8 is done on fragmented buffers. skb_put_u8 then fails as
> it checks if the buffer is nonlinear.
> 
> 
> My setup is a Xilinx ZynqMP using two macb ethernet ports, which are combined
> in a bridge interface. As long as only those two interfaces are bridged,
> everything works fine, but as soon as I add a wireless AP interface to it, 
> and then
> connect to the wireless interface using any WiFi enabled device, the kernel
> panics with the message appended at the bottom of this email. I am currently
> running Kernel 5.0.0-rc8, so this issue is in the current mainline kernel, 
> and as far
> as I can see also in the stable branch.
> 
> 
> I did some debugging and traced the issue to the macb_pad_and_fcs function,
> and it only occurs for fragmented sk-buffers.
> 
> If I understand the code correctly, the buffer should not be moved by using
> memmove and afterwards the free tailroom be used for FCS if the buffer is
> fragmented. Instead the buffer should be copied, and thus combining the
> fragmented buffer into one non-fragmented one, as skb_put_u8 does not work
> on fragmented buffers. However as I am not too deep into kernel network
> drivers, there may be a better solution, or I could have missed something
> important.
> 
> 
> Currently my system is running, after I changed the first line of static int
> macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev) from
> 
> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb);
> 
> to
> 
> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb) ||
> skb_is_nonlinear(*skb);
> 
> 
> I.e. handle any nonlinear buffer as if it was cloned. Thus force the function 
> into
> copying the buffer for increasing its size.
> 
> 
> Before the change, the kernel panicked after a few seconds of running data 
> over
> the network bridge, which could be reproduced every time this connection is
> attempted. After the change it is running for over a day now continuously
> without any issues, or any visible data loss.
> 

Thanks for the debug.
Yes, this is a bug - we recently noticed this on ZynqMP and temporarily disabled
this functionality for fragmented packets before finding a clean solution.
I noticed this error in one of the functional tests using pktgen.

Regards,
Harini

Reply via email to