On Fri, Jul 10, 2020 at 7:40 AM Petr Machata <pe...@mellanox.com> wrote:
>
>
> Cong Wang <xiyou.wangc...@gmail.com> writes:
>
> > On Wed, Jul 8, 2020 at 5:13 PM Petr Machata <pe...@mellanox.com> wrote:
> >>
> >>
> >> Petr Machata <pe...@mellanox.com> writes:
> >>
> >> > Cong Wang <xiyou.wangc...@gmail.com> writes:
> >> >
> >> > I'll think about it some more. For now I will at least fix the lack
> >> > of locking.
> >>
> >> I guess I could store the smp_processor_id() that acquired the lock in
> >> struct qdisc_skb_head. Do a trylock instead of a lock, and on failure
> >> check the stored value. I'll need to be careful about the race between
> >> the unsuccessful trylock and the test, and about making sure the CPU
> >> ID doesn't change after it is read. I'll probe this tomorrow.
> >
> > Like __netif_tx_lock(), right? Seems doable.
>
> Good to see it actually used, I wasn't sure if the idea made sense :)
>
> Unfortunately it is not enough.
>
> Consider two threads (A, B) and two netdevices (eth0, eth1):
>
> - "A" takes eth0's root lock and proceeds to classification
> - "B" takes eth1's root lock and proceeds to classification
> - "A" invokes mirror to eth1, waits on lock held by "B"
> - "B" invokes mirror to eth0, waits on lock held by "A"
> - Some say they are still waiting to this day.
Sure, AA or ABBA deadlock.

> So one option that I see is to just stash the mirrored packet in a queue
> instead of delivering it right away:
>
> - s/netif_receive_skb/netif_rx/ in act_mirred
>
> - Reuse the RX queue for TX packets as well, differentiating the two by
>   a bit in the SKB CB. Then process_backlog() would call either
>   __netif_receive_skb() or dev_queue_xmit().
>
> - Drop the mirred_rec_level guard.

I don't think I follow you: the root qdisc lock is on the egress side,
which has nothing to do with ingress, so I don't see how netif_rx() is
even involved here.

> This seems to work, but I might be missing something non-obvious, such
> as the CB actually being used for something already in that context. I
> would really rather not introduce a second backlog queue just for
> mirred though.
>
> Since mirred_rec_level does not kick in anymore, the same packet can
> end up being forwarded from the backlog queue, to the qdisc, and back
> to the backlog queue, forever. But that seems OK: that's what the admin
> configured, so that's what's happening.
>
> If this is not a good idea for some reason, this might work as well:
>
> - Convert the current root lock to an rwlock. Convert all current
>   lockers to the write lock (which should be safe), except for enqueue,
>   which will take the read lock. That will allow many concurrent
>   threads to enter enqueue, or one thread several times, but it will
>   exclude all other users.

Are you sure we can parallelize enqueue()? Every enqueue needs to move
an skb into some queue, which cannot be done in parallel under just a
read lock. Even the "lockless" qdisc takes a spinlock, r->producer_lock,
for enqueue().

> So this guards configuration access to the qdisc tree, and makes sure
> qdiscs don't go away from under one's feet.
>
> - Introduce another spin lock to guard the private data of the qdisc
>   tree, counters etc., things that even two concurrent enqueue
>   operations shouldn't trample on. Enqueue takes this spin lock after
>   read-locking the root lock. act_mirred drops it before injecting the
>   packet and takes it again afterwards.
>
> Any opinions y'all?

I thought about forbidding mirroring/redirecting to the same device, but
there might be legitimate use cases for that. So I don't have any other
ideas yet; perhaps there is some way to refactor dev_queue_xmit() to
avoid this deadlock.

Thanks.