On Thu, Nov 30, 2017 at 04:50:38PM +0000, Jose Abreu wrote: > Hi Niklas, > > Thanks for the detailed explanation! > > On 30-11-2017 03:51, Niklas Cassel wrote: > > > > Could you try to see if the following patch > > solves your problem: > > (still don't see why it should be needed, > > considering stmmac_tx_clean() should set all non-NULL > > entries to NULL) > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index 1763e48c84e2..1d30b20b3740 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -2830,6 +2830,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff > > *skb, struct net_device *dev) > > > > tx_q->tx_skbuff_dma[first_entry].buf = des; > > tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb); > > + tx_q->tx_skbuff[first_entry] = NULL; > > > > first->des0 = cpu_to_le32(des); > > > > @@ -3003,6 +3004,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, > > struct net_device *dev) > > > > first = desc; > > > > + tx_q->tx_skbuff[first_entry] = NULL; > > + > > enh_desc = priv->plat->enh_desc; > > /* To program the descriptors according to the size of the frame */ > > if (enh_desc) > > I confirm that applying 05cf0d1bf4 ("net: stmmac: free an skb > first when there are no longer any descriptors using it") and > this patch makes my setup work perfectly in multi-queue > configuration. Can you send an official patch to -net? > > You can add my: > Tested-by: Jose Abreu <joab...@synopsys.com> > > Also, maybe we should try to understand why stmmac_tx_clean() is > not setting the entry to NULL?
Hello Jose, It is not easy to decode the databook, however, after reading the databook several times more, looking at: 3.3.2.2 Tx DMA Operation: OSP Mode "The DMA writes the status, with a cleared Own bit, to the corresponding TDES3, thus closing the descriptor." So closing the descriptor means clearing the own bit. Now looking at the mode we are interested in: 3.3.2.1 Tx DMA Operation: Default (Non-OSP) Mode 8. If an Ethernet packet is stored over data buffers in multiple descriptors, the DMA closes the intermediate descriptor and fetches the next descriptor. Steps 3 through 7 are repeated until the end-of-Ethernet-packet data is transferred to the MTL. So I think that my initial understanding was correct after all, i.e., the commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") is needed to make sure that the DMA is not using skbs that might have been freed. How certain are you that this is the offending commit? Your crash is on v4.14-rc5, I can see an interesting commit: 8d5f4b071749 ("stmmac: Don't access tx_q->dirty_tx before netif_tx_lock"), which got included first in v4.14-rc6. I've tried to understand if this could be a use-after-free, but I don't see it. An skb has a certain queue-mapping, so it shouldn't be in more than one queue. tx_q->tx_skbuff is allocated with kmalloc_array(), but is initialized in init_dma_tx_desc_rings(). Other than that, it is only used in stmmac_tso_xmit()/stmmac_xmit() and stmmac_tx_clean(). stmmac_tx_clean() will free and non-NULL skb, and set tx_q->tx_skbuff[entry] to NULL. I don't see why stmmac does: for (i = 0; i < nfrags; i++) { ... tx_q->tx_skbuff[tx_q->cur_tx] = NULL; since it is initialized to NULL in init_dma_tx_desc_rings(), and if it has been cleaned, it should be set to NULL again. If we haven't cleaned for a long time, we will not be able to queue more skbs, since stmmac_tso_xmit()/stmmac_xmit() checks current "clean" entries, see stmmac_tx_avail(). I removed the tx_q->tx_skbuff[tx_q->cur_tx] = NULL from both stmmac_tso_xmit()/stmmac_xmit() locally, and I could not see any problems with ping (different sizes) or with iperf3. Could you try to reproduce the bug with CONFIG_KASAN enabled? (And please don't cut anything from the oops message). Regards, Niklas