On Tue, Apr 25, 2017 at 9:42 PM, Michael Ma <make0...@gmail.com> wrote:
> 2017-04-18 21:46 GMT-07:00 Michael Ma <make0...@gmail.com>:
>> 2017-04-18 16:12 GMT-07:00 Cong Wang <xiyou.wangc...@gmail.com>:
>>> On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma <make0...@gmail.com> wrote:
>>>> Hi -
>>>>
>>>> We've implemented a "glue" qdisc similar to mqprio which can associate
>>>> one qdisc to multiple txqs as the root qdisc. Reference count of the
>>>> child qdiscs have been adjusted properly in this case so that it
>>>> represents the number of txqs it has been attached to. However when
>>>> sending packets we saw the skb from dequeue_skb() corrupted with the
>>>> following call stack:
>>>>
>>>> [exception RIP: netif_skb_features+51]
>>>> RIP: ffffffff815292b3 RSP: ffff8817f6987940 RFLAGS: 00010246
>>>>
>>>> #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
>>>> #10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
>>>> #11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
>>>> #12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03
>>>>
>>>> It looks like the skb has already been released since its dev pointer
>>>> field is invalid.
>>>>
>>>> Any clue on how this can be investigated further? My current thought
>>>> is to add some instrumentation to the place where skb is released and
>>>> analyze whether there is any race condition happening there. However
>>>
>>> Either dropwatch or perf could do the work to instrument kfree_skb().
>>
>> Thanks - will try it out.
>
> I'm using perf to collect the callstack for kfree_skb and trying to
> correlate that with the corrupted SKB address however when system
> crashes the perf.data file is also corrupted - how can I view this
> file in case the system crashes before perf exits?

Hmm, KASAN is pretty good at detecting use-after-free. Its report nicely
shows where the object was allocated, where it was freed, and where the
use-after-free happened:

https://01.org/linuxgraphics/gfx-docs/drm/dev-tools/kasan.html
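For reference, a minimal sketch of the kernel config needed to try this,
assuming an x86_64 4.x kernel built with SLUB (the exact option set differs
between kernel versions, so check the KASAN documentation for the kernel
you are running):

```
# Enable the Kernel Address Sanitizer (compile-time instrumentation).
CONFIG_KASAN=y
# Inline instrumentation runs faster but produces a larger kernel image;
# CONFIG_KASAN_OUTLINE=y is the smaller, slower alternative.
CONFIG_KASAN_INLINE=y
```

Because the report goes to the kernel log at the point of the bad access,
it does not depend on perf.data surviving the crash; a serial console or
kdump can be used to capture it.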