On Jan 6, 2008 9:27 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote: > On Sat, Jan 05, 2008 at 03:52:32PM +0100, Torsten Kaiser wrote: > ... > > So my personal conclusion would be, that someone is writing to memory > > that he no longer owns. Most probably 0-bytes. (the complete_routine > > got NULLed and the warning about dst->__refcnt being 0). > > > > Use-after-free or something else? > > I agree: your conclusion seems to be the most probable explanation for > this. Then it could be really hard to solve this without bisection or > something similar. But there is some probabability this something could > try kfree later too, but simply this list debugging triggers earlier.
As for example in the case when it dies in ieee1394-thread the list is so corrupted that it will die anyway. But I might try this anyway, as I don't really have a better idee. > > > > If you think some other slub_debug might catch it, I would try this... > > You can try to add "U" to these other slub_debug options. As a matter > of fact, if your above diagnose is right, it seems you risk to damage > your system or even the box with these tests, so if you want to > continue, you should probably turn any possible debugging on (not in > mm only). I did not add U, because I thought that would only needed to trace memory leaks. And I hoped that using P (poison) would catch any later use (after free). > BTW, you've written that some debugging options seem to delay the bug. > Since they often change sizes of some structures than such wrong > writes could have some 'safer' offsets. So, this could really delay > e.g. these list's bugs, but maybe this could also let to stay 'alive' > to such wrong kfree? I think this bug is highly timing dependent. Its not always the same package that dies and as this is a SMP system I would guess two CPUs using the same data will trigger this. And using the poison-option will definitily slow the system down and mess up the timings. What also speaks against the 'safer' offsets is, that after adding my notfreed-byte to skbuff the bug still triggered in the same way. I'm currently looking at http://www.mail-archive.com/[EMAIL PROTECTED]/msg12702.html ,trying to understand if this is relevant for me on x86_64. Torsten -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html