> >> > currently all the device driver call
> >> > netif_tx_start_all_queues(dev) on open to W/A this issue. which is
> >> > strange since only real_num_tx_queues are active.
> >>
> >> You could also argue that netif_tx_start_all_queues() should only
> >> enable the real_num_tx_queues.
> >> [Although that would obviously cause all drivers to reach the
> >> 'problem' you're currently fixing].
> >
> > Yep. Basically what I pointed out.
> >
> > It seems inconsistent to have loops using num_tx_queues, and others
> > using real_num_tx_queues.
> >
> > Instead of 'fixing' one of them, we should take a deeper look, even if
> > the change looks fine.
> >
> > num_tx_queues should be used in code that runs once, like
> > netdev_lockdep_set_classes(), but other loops should probably use
> > real_num_tx_queues.
> >
> > Anyway all these changes should definitely target net-next, not net
> > tree.
> >
> > But for the long term, you have a point.

We will consider a deeper fix for net-next as you suggested, and drop this
temporary fix.
I think we've actually managed to hit an issue with qede [& modified bnx2x]
due to netif_tx_start_all_queues() starting all Tx-queues - while reducing
the number of channels on an interface the driver reloads, following which
the xmit function receives an SKB with a too-high txq. Investigation seems
to indicate that some TCP traffic arrived during the reload, got enqueued
on the qdisc with the high txq, and then got transmitted as-is once Tx was
re-enabled. [Removing the modulo from bnx2x's select_queue() led to the
same issue.]