On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > > On Thu, 6 Jun 2013, shuox....@intel.com wrote: > > > > From: Zhang Yanmin <yanmin.zh...@intel.com> > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If using this > > > > function while holding a resource, the IRQ handler may cause deadlock. > > > > > > > > Here we add a new function irq_in_progress which doesn't wait for the > > > > handlers > > > > to be finished. > > > > > > > > A typical use case at suspend-to-ram: > > > > > > > > device driver's irq handler is complicated and might hold a mutex at > > > > rare cases. > > > > Its suspend function is called and a suspended flag is set. > > > > In case its IRQ handler is running, suspend function calls > > > > irq_in_progress. if > > > > handler is running, abort suspend. > > > > The irq handler checks the suspended flag. If the device is suspended, > > > > irq handler > > > > either ignores the interrupt, or wakes up the whole system, and the > > > > driver's > > > > resume function could deal with the delayed interrupt handling. > > > > > > This is as wrong as it can be. Fix the driver instead of hacking racy > > > functions into the core code. > > > > > > So your problem looks like this: > > > > > > CPU 0 CPU 1 > > > irq_handler_thread() suspend() > > > ..... mutex_lock(&m); > > > mutex_lock(&m); synchronize_irq(); > > > > > > So why needs the mutex to be taken before synchronize_irq()? Why not > > > doing the obvious? > > > > > > suspend() > > > disable_irq(); (Implies synchronize_irq) > > > mutex_lock(&m); > > > .... > > > mutex_unlock(&m); > > > enable_irq(); > > Thanks for the kind comment. > > > > We do consider your solution before and it works well indeed with some > > specific > > simple drivers. For example, some drives use GPIO pin as interrupt source. > > > > On one specific platform, disable_irq would really disable the irq at RTE > > entry, > > which means we lose the wakeup capability of this device. > > synchronize_irq can be another solution. But we did hit 'DPM device > > timeout' issue > > reported by dpm_wd_handler at suspend-to-ram. > > > > The driver is complicated. We couldn't change too many functions to > > optimize it. > > In addition, we have to use the driver instead of throwing it away. > > What is preventing you from rewriting it to work properly? It's complicated.
> > > With irq_in_progress, we can resolve this issue and it does work, although > > it > > looks like ugly. > > Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? For example, any driver's suspend can return error and the whole suspend-to-ram aborts. Can we say the driver has a bug? If so, why not to change all suspend/resume callbacks to return VOID? Current kernel already allows drivers to abort suspend-to-ram at some rare corner cases. Of course, if the abort happens frequently, it's a bug and we need fix it in driver. If it happens rarely, can we provide some flexibility for the driver? Thanks, Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/