On 2020/9/2 12:41, Cong Wang wrote: > On Tue, Sep 1, 2020 at 6:42 PM Yunsheng Lin <linyunsh...@huawei.com> wrote: >> >> On 2020/9/2 2:24, Cong Wang wrote: >>> On Mon, Aug 31, 2020 at 5:59 PM Yunsheng Lin <linyunsh...@huawei.com> wrote: >>>> >>>> Currently there is concurrent reset and enqueue operation for the >>>> same lockless qdisc when there is no lock to synchronize the >>>> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in >>>> qdisc_deactivate() called by dev_deactivate_queue(), which may cause >>>> out-of-bounds access for priv->ring[] in hns3 driver if user has >>>> requested a smaller queue num when __dev_xmit_skb() still enqueue a >>>> skb with a larger queue_mapping after the corresponding qdisc is >>>> reset, and call hns3_nic_net_xmit() with that skb later. >>> >>> Can you be more specific here? Which call path requests a smaller >>> tx queue num? If you mean netif_set_real_num_tx_queues(), clearly >>> we already have a synchronize_net() there. >> >> When the netdevice is in active state, the synchronize_net() seems to >> do the correct work, as below: >> >> CPU 0: CPU1: >> __dev_queue_xmit() netif_set_real_num_tx_queues() >> rcu_read_lock_bh(); >> netdev_core_pick_tx(dev, skb, sb_dev); >> . >> . dev->real_num_tx_queues = txq; >> . . >> . . >> . synchronize_net(); >> . . >> q->enqueue() . >> . . >> rcu_read_unlock_bh() . >> qdisc_reset_all_tx_gt >> >> > > Right. > > >> but dev->real_num_tx_queues is not RCU-protected, maybe that is a problem >> too. >> >> The problem we hit is as below: >> In hns3_set_channels(), hns3_reset_notify(h, HNAE3_DOWN_CLIENT) is called >> to deactive the netdevice when user requested a smaller queue num, and >> txq->qdisc is already changed to noop_qdisc when calling >> netif_set_real_num_tx_queues(), so the synchronize_net() in the function >> netif_set_real_num_tx_queues() does not help here. > > How could qdisc still be running after deactivating the device?
qdisc could be running during the device deactivating process. The main process of changing channel number is as below: 1. dev_deactivate() 2. hns3 handware related setup 3. netif_set_real_num_tx_queues() 4. netif_tx_wake_all_queues() 5. dev_activate() During step 1, qdisc could be running while qdisc is resetting, so there could be skb left in the old qdisc(which will be restored back to txq->qdisc during dev_activate()), as below: CPU 0: CPU1: __dev_queue_xmit(): dev_deactivate_many(): rcu_read_lock_bh(); qdisc_deactivate(qdisc); q = rcu_dereference_bh(txq->qdisc); . netdev_core_pick_tx(dev, skb, sb_dev); . . . rcu_assign_pointer(dev_queue->qdisc, qdisc_default); . . . . . . . . q->enqueue() . . . rcu_read_unlock_bh() . And During step 3, txq->qdisc is pointing to noop_qdisc, so the qdisc_reset() only reset the noop_qdisc, but not the actual qdisc, which is stored in txq->qdisc_sleeping, so the actual qdisc may still have skb. When hns3_link_status_change() call step 4 and 5, it will restore all queue's qdisc using txq->qdisc_sleeping and schedule all queue with net_tx_action(). The skb enqueued in step 1 may be dequeued and run, which cause the problem. > > >> >>> >>>> >>>> Avoid the above concurrent op by calling synchronize_rcu_tasks() >>>> after assigning new qdisc to dev_queue->qdisc and before calling >>>> qdisc_deactivate() to make sure skb with larger queue_mapping >>>> enqueued to old qdisc will always be reset when qdisc_deactivate() >>>> is called. >>> >>> Like Eric said, it is not nice to call such a blocking function when >>> we have a large number of TX queues. Possibly we just need to >>> add a synchronize_net() as in netif_set_real_num_tx_queues(), >>> if it is missing. >> >> As above, the synchronize_net() in netif_set_real_num_tx_queues() seems >> to work when netdevice is in active state, but does not work when in >> deactive. > > Please explain why deactivated device still has qdisc running? > > At least before commit 379349e9bc3b4, we always test deactivate > bit before enqueueing. Are you complaining about that commit? > That commit is indeed suspicious, at least it does not precisely revert > commit ba27b4cdaaa66561aaedb21 as it claims. I am not familiar with TCQ_F_CAN_BYPASS. >From my understanding, the problem is that there is no order between qdisc enqueuing and qdisc reset. > > >> >> And we do not want skb left in the old qdisc when netdevice is deactived, >> right? > > Yes, and more importantly, qdisc should not be running after deactivation. > > Thanks. > . >