On Wed, Sep 2, 2020 at 6:22 PM Yunsheng Lin <linyunsh...@huawei.com> wrote: > > On 2020/9/3 8:35, Cong Wang wrote: > > On Tue, Sep 1, 2020 at 11:35 PM Yunsheng Lin <linyunsh...@huawei.com> wrote: > >> > >> On 2020/9/2 12:41, Cong Wang wrote: > >>> On Tue, Sep 1, 2020 at 6:42 PM Yunsheng Lin <linyunsh...@huawei.com> > >>> wrote: > >>>> > >>>> On 2020/9/2 2:24, Cong Wang wrote: > >>>>> On Mon, Aug 31, 2020 at 5:59 PM Yunsheng Lin <linyunsh...@huawei.com> > >>>>> wrote: > >>>>>> > >>>>>> Currently there is concurrent reset and enqueue operation for the > >>>>>> same lockless qdisc when there is no lock to synchronize the > >>>>>> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in > >>>>>> qdisc_deactivate() called by dev_deactivate_queue(), which may cause > >>>>>> out-of-bounds access for priv->ring[] in hns3 driver if user has > >>>>>> requested a smaller queue num when __dev_xmit_skb() still enqueue a > >>>>>> skb with a larger queue_mapping after the corresponding qdisc is > >>>>>> reset, and call hns3_nic_net_xmit() with that skb later. > >>>>> > >>>>> Can you be more specific here? Which call path requests a smaller > >>>>> tx queue num? If you mean netif_set_real_num_tx_queues(), clearly > >>>>> we already have a synchronize_net() there. > >>>> > >>>> When the netdevice is in active state, the synchronize_net() seems to > >>>> do the correct work, as below: > >>>> > >>>> CPU 0: CPU1: > >>>> __dev_queue_xmit() netif_set_real_num_tx_queues() > >>>> rcu_read_lock_bh(); > >>>> netdev_core_pick_tx(dev, skb, sb_dev); > >>>> . > >>>> . dev->real_num_tx_queues = txq; > >>>> . . > >>>> . . > >>>> . synchronize_net(); > >>>> . . > >>>> q->enqueue() . > >>>> . . > >>>> rcu_read_unlock_bh() . > >>>> qdisc_reset_all_tx_gt > >>>> > >>>> > >>> > >>> Right. > >>> > >>> > >>>> but dev->real_num_tx_queues is not RCU-protected, maybe that is a problem > >>>> too. > >>>> > >>>> The problem we hit is as below: > >>>> In hns3_set_channels(), hns3_reset_notify(h, HNAE3_DOWN_CLIENT) is called > >>>> to deactive the netdevice when user requested a smaller queue num, and > >>>> txq->qdisc is already changed to noop_qdisc when calling > >>>> netif_set_real_num_tx_queues(), so the synchronize_net() in the function > >>>> netif_set_real_num_tx_queues() does not help here. > >>> > >>> How could qdisc still be running after deactivating the device? > >> > >> qdisc could be running during the device deactivating process. > >> > >> The main process of changing channel number is as below: > >> > >> 1. dev_deactivate() > >> 2. hns3 handware related setup > >> 3. netif_set_real_num_tx_queues() > >> 4. netif_tx_wake_all_queues() > >> 5. dev_activate() > >> > >> During step 1, qdisc could be running while qdisc is resetting, so > >> there could be skb left in the old qdisc(which will be restored back to > >> txq->qdisc during dev_activate()), as below: > >> > >> CPU 0: CPU1: > >> __dev_queue_xmit(): dev_deactivate_many(): > >> rcu_read_lock_bh(); qdisc_deactivate(qdisc); > >> q = rcu_dereference_bh(txq->qdisc); . > >> netdev_core_pick_tx(dev, skb, sb_dev); . > >> . > >> . > >> rcu_assign_pointer(dev_queue->qdisc, qdisc_default); > >> . . > >> . . > >> . . > >> . . > >> q->enqueue() . > > > > > > Well, like I said, if the deactivated bit were tested before ->enqueue(), > > there would be no packet queued after qdisc_deactivate(). > > Only if the deactivated bit testing is also protected by qdisc->seqlock? > otherwise there is still window between setting and testing the deactivated > bit.
Can you be more specific here? Why testing or setting a bit is not atomic? AFAIU, qdisc->seqlock is an optimization to replace __QDISC_STATE_RUNNING, which has nothing to do with deactivate bit. Thanks.