On 07.12.2016 22:43, Lino Sanfilippo wrote: > Hi Pavel, > > On 07.12.2016 22:37, Pavel Machek wrote: >> On Wed 2016-12-07 21:05:38, Lino Sanfilippo wrote: >>> The driver uses a private lock for synchronization between the xmit >>> function and the xmit completion handler, but since the NETIF_F_LLTX flag >>> is not set, the xmit function is also called with the xmit_lock held. >>> >>> On the other hand the xmit completion handler first takes the private lock >>> and (in case that the tx queue has been stopped) the xmit_lock, leading to >>> a reverse locking order and the potential danger of a deadlock. >>> >>> Fix this by removing the private lock completely and synchronizing the xmit >>> function and completion handler solely by means of the xmit_lock. By doing >>> this remove also the now unnecessary double check for a stopped tx queue. >>> >> >> FYI, here's modified version. I believe _bh versions are needed, and >> I'm testing that version now. (Oh and I also ported it to net-next). >> >> It survived 30 minutes of testing so far... >> > > First off, thanks for testing. > Hmm. I dont understand why _bh would be needed. We call that function from > BH context only (napi poll and timer). > Any idea? >
Could this once again be caused by irq coalescing? When the tx queue has been stopped the cleanup handler has to wakeup the queue within a certain time span, otherwise the watchdog will complain (as it happened in your test). Could you retest this with irq coalescing disabled?