On Tue, Sep 1, 2020 at 11:35 PM Yunsheng Lin <linyunsh...@huawei.com> wrote: > > On 2020/9/2 12:41, Cong Wang wrote: > > On Tue, Sep 1, 2020 at 6:42 PM Yunsheng Lin <linyunsh...@huawei.com> wrote: > >> > >> On 2020/9/2 2:24, Cong Wang wrote: > >>> On Mon, Aug 31, 2020 at 5:59 PM Yunsheng Lin <linyunsh...@huawei.com> > >>> wrote: > >>>> > >>>> Currently there is concurrent reset and enqueue operation for the > >>>> same lockless qdisc when there is no lock to synchronize the > >>>> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in > >>>> qdisc_deactivate() called by dev_deactivate_queue(), which may cause > >>>> out-of-bounds access for priv->ring[] in hns3 driver if user has > >>>> requested a smaller queue num when __dev_xmit_skb() still enqueue a > >>>> skb with a larger queue_mapping after the corresponding qdisc is > >>>> reset, and call hns3_nic_net_xmit() with that skb later. > >>> > >>> Can you be more specific here? Which call path requests a smaller > >>> tx queue num? If you mean netif_set_real_num_tx_queues(), clearly > >>> we already have a synchronize_net() there. > >> > >> When the netdevice is in active state, the synchronize_net() seems to > >> do the correct work, as below: > >> > >> CPU 0: CPU1: > >> __dev_queue_xmit() netif_set_real_num_tx_queues() > >> rcu_read_lock_bh(); > >> netdev_core_pick_tx(dev, skb, sb_dev); > >> . > >> . dev->real_num_tx_queues = txq; > >> . . > >> . . > >> . synchronize_net(); > >> . . > >> q->enqueue() . > >> . . > >> rcu_read_unlock_bh() . > >> qdisc_reset_all_tx_gt > >> > >> > > > > Right. > > > > > >> but dev->real_num_tx_queues is not RCU-protected, maybe that is a problem > >> too. > >> > >> The problem we hit is as below: > >> In hns3_set_channels(), hns3_reset_notify(h, HNAE3_DOWN_CLIENT) is called > >> to deactive the netdevice when user requested a smaller queue num, and > >> txq->qdisc is already changed to noop_qdisc when calling > >> netif_set_real_num_tx_queues(), so the synchronize_net() in the function > >> netif_set_real_num_tx_queues() does not help here. > > > > How could qdisc still be running after deactivating the device? > > qdisc could be running during the device deactivating process. > > The main process of changing channel number is as below: > > 1. dev_deactivate() > 2. hns3 handware related setup > 3. netif_set_real_num_tx_queues() > 4. netif_tx_wake_all_queues() > 5. dev_activate() > > During step 1, qdisc could be running while qdisc is resetting, so > there could be skb left in the old qdisc(which will be restored back to > txq->qdisc during dev_activate()), as below: > > CPU 0: CPU1: > __dev_queue_xmit(): dev_deactivate_many(): > rcu_read_lock_bh(); qdisc_deactivate(qdisc); > q = rcu_dereference_bh(txq->qdisc); . > netdev_core_pick_tx(dev, skb, sb_dev); . > . > . rcu_assign_pointer(dev_queue->qdisc, > qdisc_default); > . . > . . > . . > . . > q->enqueue() .
Well, like I said, if the deactivated bit were tested before ->enqueue(), there would be no packet queued after qdisc_deactivate(). > . . > rcu_read_unlock_bh() . > > And During step 3, txq->qdisc is pointing to noop_qdisc, so the qdisc_reset() > only reset the noop_qdisc, but not the actual qdisc, which is stored in > txq->qdisc_sleeping, so the actual qdisc may still have skb. > > When hns3_link_status_change() call step 4 and 5, it will restore all queue's > qdisc using txq->qdisc_sleeping and schedule all queue with net_tx_action(). > The skb enqueued in step 1 may be dequeued and run, which cause the problem. > > > > > > >> > >>> > >>>> > >>>> Avoid the above concurrent op by calling synchronize_rcu_tasks() > >>>> after assigning new qdisc to dev_queue->qdisc and before calling > >>>> qdisc_deactivate() to make sure skb with larger queue_mapping > >>>> enqueued to old qdisc will always be reset when qdisc_deactivate() > >>>> is called. > >>> > >>> Like Eric said, it is not nice to call such a blocking function when > >>> we have a large number of TX queues. Possibly we just need to > >>> add a synchronize_net() as in netif_set_real_num_tx_queues(), > >>> if it is missing. > >> > >> As above, the synchronize_net() in netif_set_real_num_tx_queues() seems > >> to work when netdevice is in active state, but does not work when in > >> deactive. > > > > Please explain why deactivated device still has qdisc running? > > > > At least before commit 379349e9bc3b4, we always test deactivate > > bit before enqueueing. Are you complaining about that commit? > > That commit is indeed suspicious, at least it does not precisely revert > > commit ba27b4cdaaa66561aaedb21 as it claims. > > I am not familiar with TCQ_F_CAN_BYPASS. > From my understanding, the problem is that there is no order between > qdisc enqueuing and qdisc reset. It has almost nothing to do with BYPASS, my point here is all about __QDISC_STATE_DEACTIVATED bit, clearly commit 379349e9bc3b4 removed this bit test for lockless qdisc, whether intentionally or accidentally. That bit test existed prior to BYPASS test, see commit ba27b4cdaaa665. Thanks.