On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma <make0...@gmail.com> wrote:
> Hi -
>
> We've implemented a "glue" qdisc similar to mqprio which can associate
> one qdisc to multiple txqs as the root qdisc. Reference count of the
> child qdiscs have been adjusted properly in this case so that it
> represents the number of txqs it has been attached to. However when
> sending packets we saw the skb from dequeue_skb() corrupted with the
> following call stack:
>
> [exception RIP: netif_skb_features+51]
> RIP: ffffffff815292b3 RSP: ffff8817f6987940 RFLAGS: 00010246
>
> #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
> #10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
> #11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
> #12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03
>
> It looks like the skb has already been released since its dev pointer
> field is invalid.
>
> Any clue on how this can be investigated further? My current thought
> is to add some instrumentation to the place where skb is released and
> analyze whether there is any race condition happening there. However

Either dropwatch or perf could do the work to instrument kfree_skb().

> by looking through the existing code I think the case where one root
> qdisc is associated with multiple txqs already exists (when mqprio is
> not used) so not sure why it won't work when we group txqs and assign
> each group a root qdisc. Any insight on this issue would be much
> appreciated!

How do you implement ->attach()? How does it work with netdev_pick_tx()?
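
For the kfree_skb() instrumentation mentioned above, something along these lines might be a starting point with perf. This is only a sketch: it assumes the running kernel exposes the skb:kfree_skb tracepoint, that perf was built with tracepoint support, and that it is run as root. The function name trace_kfree_skb, the DURATION variable and the kfree_skb.data file name are made up for illustration.

```shell
#!/bin/sh
# Sketch only: record a call stack for every skb:kfree_skb tracepoint hit,
# on all CPUs, for DURATION seconds, then dump the samples as text.
# DURATION, trace_kfree_skb and kfree_skb.data are hypothetical names.
DURATION="${DURATION:-10}"

trace_kfree_skb() {
    # -e selects the tracepoint, -g records call graphs, -a covers all CPUs,
    # -o names the output file; "sleep" only bounds the recording window.
    perf record -e skb:kfree_skb -g -a -o kfree_skb.data -- sleep "$DURATION" &&
        # Print each recorded event together with its stack trace.
        perf script -i kfree_skb.data
}
```

Calling trace_kfree_skb while reproducing the crash should show which code paths free skbs during that window. The stacks can then be compared against the dequeue path in the trace above to see whether a free races with the transmit. skb:consume_skb can be recorded the same way if normally consumed skbs are also of interest.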