On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > On Thu, 6 Jun 2013, shuox....@intel.com wrote:
> > > > > > From: Zhang Yanmin <yanmin.zh...@intel.com>
> > > > > > 
> > > > > > synchronize_irq waits pending IRQ handlers to be finished. If using 
> > > > > > this
> > > > > > function while holding a resource, the IRQ handler may cause 
> > > > > > deadlock.
> > > > > > 
> > > > > > Here we add a new function irq_in_progress which doesn't wait for 
> > > > > > the handlers
> > > > > > to be finished.
> > > > > > 
> > > > > > A typical use case at suspend-to-ram:
> > > > > > 
> > > > > > device driver's irq handler is complicated and might hold a mutex 
> > > > > > at rare cases.
> > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > irq_in_progress. if
> > > > > > handler is running, abort suspend.
> > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > suspended, irq handler
> > > > > > either ignores the interrupt, or wakes up the whole system, and the 
> > > > > > driver's
> > > > > > resume function could deal with the delayed interrupt handling.
> > > > > 
> > > > > This is as wrong as it can be. Fix the driver instead of hacking racy
> > > > > functions into the core code.
> > > > > 
> > > > > So your problem looks like this:
> > > > > 
> > > > > CPU 0                         CPU 1
> > > > > irq_handler_thread()          suspend()
> > > > >    .....                      mutex_lock(&m);
> > > > >    mutex_lock(&m);            synchronize_irq();
> > > > > 
> > > > > So why needs the mutex to be taken before synchronize_irq()? Why not
> > > > > doing the obvious?
> > > > > 
> > > > > suspend()
> > > > >   disable_irq(); (Implies synchronize_irq)
> > > > >   mutex_lock(&m);
> > > > >   ....
> > > > >   mutex_unlock(&m);
> > > > >   enable_irq();
> > > > Thanks for the kind comment.
> > > > 
> > > > We do consider your solution before and it works well indeed with some 
> > > > specific
> > > > simple drivers. For example, some drives use GPIO pin as interrupt 
> > > > source.
> > > > 
> > > > On one specific platform, disable_irq would really disable the irq at 
> > > > RTE entry,
> > > > which means we lose the wakeup capability of this device.
> > > > synchronize_irq can be another solution. But we did hit 'DPM device 
> > > > timeout' issue
> > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > 
> > > > The driver is complicated. We couldn't change too many functions to 
> > > > optimize it.
> > > > In addition, we have to use the driver instead of throwing it away.
> > > 
> > > What is preventing you from rewriting it to work properly?
> > It's complicated.
> 
> That sounds like your issue, not ours, so please don't push the problem
> onto someone else.  Take ownership of the driver, fix it up, and all
> will be good.
> 
> 
> > > > With irq_in_progress, we can resolve this issue and it does work, 
> > > > although it
> > > > looks like ugly.
> > > 
> > > Don't paper over driver bugs in the core kernel, fix the driver.
> > It's hard to say it's a driver bug. Could generic kernel provide some
> > flexibility for such complicated drivers?
> 
> Please post the code for the driver, and then we will be glad to
> continue the dicussion.
Greg,

Thanks for your enthusiasm. It's a private driver for products.

Let's stop here and I wouldn't push the patch to upstream.

Thanks all.
Yanmin

> 
> As for "complicated", just make it "uncomplicated", it's just code :)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to