On 2020/9/1 14:48, Eric Dumazet wrote: > > > On 8/31/20 5:55 PM, Yunsheng Lin wrote: >> Currently there is concurrent reset and enqueue operation for the >> same lockless qdisc when there is no lock to synchronize the >> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in >> qdisc_deactivate() called by dev_deactivate_queue(), which may cause >> out-of-bounds access for priv->ring[] in hns3 driver if user has >> requested a smaller queue num when __dev_xmit_skb() still enqueue a >> skb with a larger queue_mapping after the corresponding qdisc is >> reset, and call hns3_nic_net_xmit() with that skb later. >> >> Avoid the above concurrent op by calling synchronize_rcu_tasks() >> after assigning new qdisc to dev_queue->qdisc and before calling >> qdisc_deactivate() to make sure skb with larger queue_mapping >> enqueued to old qdisc will always be reset when qdisc_deactivate() >> is called. >> > > We request Fixes: tag for fixes in networking land.
ok. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") > >> Signed-off-by: Yunsheng Lin <linyunsh...@huawei.com> >> --- >> net/sched/sch_generic.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c >> index 265a61d..6e42237 100644 >> --- a/net/sched/sch_generic.c >> +++ b/net/sched/sch_generic.c >> @@ -1160,8 +1160,13 @@ static void dev_deactivate_queue(struct net_device >> *dev, >> >> qdisc = rtnl_dereference(dev_queue->qdisc); >> if (qdisc) { >> - qdisc_deactivate(qdisc); >> rcu_assign_pointer(dev_queue->qdisc, qdisc_default); >> + >> + /* Make sure lockless qdisc enqueuing is done with the >> + * old qdisc in __dev_xmit_skb(). >> + */ >> + synchronize_rcu_tasks(); > > This seems quite wrong, there is not a single use of synchronize_rcu_tasks() > in net/, > we probably do not want this. > > I bet that synchronize_net() is appropriate, if not please explain/comment > why we want this instead. Using synchronize_net() seems more appropriate here, thanks. > > Adding one synchronize_net() per TX queue is a killer for devices with 128 or > 256 TX queues. > > I would rather find a way of not calling qdisc_reset() from > qdisc_deactivate(). Without calling qdisc_reset(), it seems there will always be skb left in the old qdisc. Is above acceptable? How about below steps to avoid the concurrent op: 1. assign new qdisc to all queue' qdisc(which is noop_qdisc). 2. call synchronize_net(). 3. calling qdisc_reset() with all queue' qdisc_sleeping. And the synchronize_net() in dev_deactivate_many() can be reused to ensure old qdisc is not touched any more when calling qdisc_reset(). > > This lockless pfifo_fast is a mess really. > > > . >