On Tue, 22 Oct 2019 13:32:35 +0800, Zhu Yanjun wrote: > On 2019/10/21 23:33, Jakub Kicinski wrote: > > On Mon, 21 Oct 2019 17:56:06 +0800, Zhu Yanjun wrote: > >> On 2019/10/19 6:48, Jakub Kicinski wrote: > >>> On Fri, 18 Oct 2019 06:01:25 -0400, Zhu Yanjun wrote: > >>>> This change adds support for xmit_more based on the igb commit > >>>> 6f19e12f6230 > >>>> ("igb: flush when in xmit_more mode and under descriptor pressure") and > >>>> commit 6b16f9ee89b8 ("net: move skb->xmit_more hint to softnet data") > >>>> that > >>>> were made to igb to support this feature. The function netif_xmit_stopped > >>>> is called to check if transmit queue on device is currently unable to > >>>> send > >>>> to determine if we must write the tail because we can add no further > >>>> buffers. > >>>> When normal packets and/or xmit_more packets fill up tx_desc, it is > >>>> necessary to trigger NIC tx reg. > >>> Looks broken. You gotta make sure you check the kick on _every_ return > >>> path. There are 4 return statements in each function, you only touched > >>> 2. > >> In nv_start_xmit, > >> > >> [...] > >> > >> The above are dma_mapping_error. It seems that triggering NIC HW xmit is > >> not needed. > >> > >> So when "tx_desc full" error, HW NIC xmit is triggerred. When > >> dma_mapping_error, > >> > >> NIC HW xmit is not triggerred. > >> > >> That is why only 2 "return" are touched. > > Imagine you have the following sequence of frames: > > > > skbA | xmit_more() == true > > skbB | xmit_more() == true > > skbC | xmit_more() == true > > skbD | xmit_more() == false > > > > A, B, and C got queued successfully but the driver didn't kick the > > queue because of xmit_more(). Now D gets dropped due to a DMA error. > > Queue never gets kicked. > > DMA error is a complicated problem. We will delve into this problem later. > > From the above commit log, this commit is based on the igb commit > 6f19e12f6230 > ("igb: flush when in xmit_more mode and under descriptor pressure") and > commit 6b16f9ee89b8 ("net: move skb->xmit_more hint to softnet data"). > > It seems that the 2 commits did not consider the DMA errors that you > mentioned.
Then igb is buggy, too.