Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-20 Thread Jarek Poplawski
On Fri, Feb 16, 2007 at 08:06:25AM -0800, Ben Greear wrote: ... > Well, I had lockdep and all of the locking debugging I could find > enabled, but > it did not catch this problem. I had to use sysctl -t and manually dig > through the backtraces > to find the deadlock > > It may be that lockd

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Ben Greear
Stephen Hemminger wrote: On Thu, 15 Feb 2007 23:40:32 -0800 Ben Greear <[EMAIL PROTECTED]> wrote: Maybe there should be something like an ASSERT_NOT_RTNL() in the flush_scheduled_work() method? If it's performance critical, #ifdef it out if we're not debugging locks? You can't safely ad

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Stephen Hemminger
On Thu, 15 Feb 2007 23:40:32 -0800 Ben Greear <[EMAIL PROTECTED]> wrote: > Jarek Poplawski wrote: > > On 14-02-2007 22:27, Stephen Hemminger wrote: > > > >> Ben found this but the problem seems pretty widespread. > >> > >> The following places are subject to deadlock between flush_scheduled_wor

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Ben Greear
Jarek Poplawski wrote: On Fri, Feb 16, 2007 at 10:04:25AM +0100, Jarek Poplawski wrote: On Fri, Feb 16, 2007 at 12:23:05AM -0800, Ben Greear wrote: ... I don't see how asserting it in the rtnl_lock would help anything, because at that point we are about to deadlock anyway... (and t

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Jarek Poplawski
On Fri, Feb 16, 2007 at 10:04:25AM +0100, Jarek Poplawski wrote: > On Fri, Feb 16, 2007 at 12:23:05AM -0800, Ben Greear wrote: ... > > I don't see how asserting it in the rtnl_lock would help anything, > > because at that > > point we are about to deadlock anyway... (and this is probably very >

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Jarek Poplawski
On Fri, Feb 16, 2007 at 12:23:05AM -0800, Ben Greear wrote: > Jarek Poplawski wrote: > >On Thu, Feb 15, 2007 at 11:40:32PM -0800, Ben Greear wrote: > >... > > > >>Maybe there should be something like an ASSERT_NOT_RTNL() in the > >>flush_scheduled_work() > >>method? If it's performance criticia

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Ben Greear
Jarek Poplawski wrote: On Thu, Feb 15, 2007 at 11:40:32PM -0800, Ben Greear wrote: ... Maybe there should be something like an ASSERT_NOT_RTNL() in the flush_scheduled_work() method? If it's performance critical, #ifdef it out if we're not debugging locks? Yes! I thought about the s

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Jarek Poplawski
On Thu, Feb 15, 2007 at 11:40:32PM -0800, Ben Greear wrote: ... > Maybe there should be something like an ASSERT_NOT_RTNL() in the > flush_scheduled_work() > method? If it's performance critical, #ifdef it out if we're not > debugging locks? Yes! I thought about the same (at first). But in my

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-15 Thread Ben Greear
Jarek Poplawski wrote: On 14-02-2007 22:27, Stephen Hemminger wrote: Ben found this but the problem seems pretty widespread. The following places are subject to deadlock between flush_scheduled_work and the RTNL mutex. What can happen is that a work queue routine (like bridge port_carrier_ch

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-15 Thread Jarek Poplawski
On 14-02-2007 22:27, Stephen Hemminger wrote: > Ben found this but the problem seems pretty widespread. > > The following places are subject to deadlock between flush_scheduled_work > and the RTNL mutex. What can happen is that a work queue routine (like > bridge port_carrier_check) is waiting for

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-15 Thread Ben Greear
Francois Romieu wrote: Ben Greear <[EMAIL PROTECTED]> : [...] I seem to be able to trigger this within about 1 minute on a particular 2.6.18.2 system with some 8139too devices, so if someone has a patch that could be tested, I'll gladly test it. For whatever reason, I haven't hit this problem o

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-14 Thread Francois Romieu
Ben Greear <[EMAIL PROTECTED]> : [...] > I seem to be able to trigger this within about 1 minute on a > particular 2.6.18.2 system with some 8139too devices, so if someone > has a patch that could be tested, I'll gladly test it. For > whatever reason, I haven't hit this problem on 2.6.20 yet, but

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-14 Thread Ben Greear
Stephen Hemminger wrote: Ben found this but the problem seems pretty widespread. The following places are subject to deadlock between flush_scheduled_work and the RTNL mutex. What can happen is that a work queue routine (like bridge port_carrier_check) is waiting forever for RTNL, and the driver

[BUG] RTNL and flush_scheduled_work deadlocks

2007-02-14 Thread Stephen Hemminger
Ben found this but the problem seems pretty widespread. The following places are subject to deadlock between flush_scheduled_work and the RTNL mutex. What can happen is that a work queue routine (like bridge port_carrier_check) is waiting forever for RTNL, and the driver routine has called flush_s
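
For readers skimming the thread, the lock ordering described in the original report can be modelled in userspace. The sketch below is only an illustration in Python, not kernel code: `rtnl`, `work_q`, `carrier_check` and the timeout-taking `flush_scheduled_work` are hypothetical stand-ins (the real flush_scheduled_work() takes no timeout and would simply block forever). It assumes a single worker thread that runs queued items in order.

```python
import queue
import threading

rtnl = threading.Lock()   # hypothetical stand-in for the RTNL mutex
work_q = queue.Queue()    # hypothetical stand-in for the shared work queue


def worker():
    """Run queued work items in order until a None sentinel arrives."""
    while True:
        fn = work_q.get()
        if fn is None:
            break
        fn()


def carrier_check():
    """A work item that needs RTNL, like the bridge routine in the report."""
    with rtnl:
        pass


def flush_scheduled_work(timeout):
    """Queue a marker behind all pending work and wait for it to run.

    The timeout is an assumption added for this demo so that the deadlock
    shows up as a False return value instead of a hang.
    """
    done = threading.Event()
    work_q.put(done.set)
    return done.wait(timeout)


def demo():
    t = threading.Thread(target=worker, daemon=True)
    t.start()

    # Problematic ordering: flush while holding RTNL. The worker blocks in
    # carrier_check() waiting for RTNL, so the marker behind it never runs.
    with rtnl:
        work_q.put(carrier_check)
        flushed_while_locked = flush_scheduled_work(timeout=0.5)

    # Safe ordering: RTNL is not held, so the pending work can finish.
    work_q.put(carrier_check)
    flushed_unlocked = flush_scheduled_work(timeout=5.0)

    work_q.put(None)
    t.join(timeout=5.0)
    return flushed_while_locked, flushed_unlocked


if __name__ == "__main__":
    print(demo())
```

In this model the first flush cannot complete until the caller drops the lock, which is the same shape as the reported deadlock; it says nothing about which kernel call sites are affected.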