On Fri, 2013-06-07 at 08:08 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
> > > > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> > > > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > > > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > > > > > On Thu, 6 Jun 2013, shuox....@intel.com wrote:
> > > > > > > > > > From: Zhang Yanmin <yanmin.zh...@intel.com>
> > > > > > > > > > 
> > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. 
> > > > > > > > > > If using this
> > > > > > > > > > function while holding a resource, the IRQ handler may 
> > > > > > > > > > cause deadlock.
> > > > > > > > > > 
> > > > > > > > > > Here we add a new function irq_in_progress which doesn't 
> > > > > > > > > > wait for the handlers
> > > > > > > > > > to be finished.
> > > > > > > > > > 
> > > > > > > > > > A typical use case at suspend-to-ram:
> > > > > > > > > > 
> > > > > > > > > > device driver's irq handler is complicated and might hold a 
> > > > > > > > > > mutex at rare cases.
> > > > > > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > > > > > irq_in_progress. if
> > > > > > > > > > handler is running, abort suspend.
> > > > > > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > > > > > suspended, irq handler
> > > > > > > > > > either ignores the interrupt, or wakes up the whole system, 
> > > > > > > > > > and the driver's
> > > > > > > > > > resume function could deal with the delayed interrupt 
> > > > > > > > > > handling.
> > > > > > > > > 
> > > > > > > > > This is as wrong as it can be. Fix the driver instead of 
> > > > > > > > > hacking racy
> > > > > > > > > functions into the core code.
> > > > > > > > > 
> > > > > > > > > So your problem looks like this:
> > > > > > > > > 
> > > > > > > > > CPU 0                         CPU 1
> > > > > > > > > irq_handler_thread()          suspend()
> > > > > > > > >    .....                      mutex_lock(&m);
> > > > > > > > >    mutex_lock(&m);            synchronize_irq();
> > > > > > > > > 
> > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? 
> > > > > > > > > Why not
> > > > > > > > > doing the obvious?
> > > > > > > > > 
> > > > > > > > > suspend()
> > > > > > > > >   disable_irq(); (Implies synchronize_irq)
> > > > > > > > >   mutex_lock(&m);
> > > > > > > > >   ....
> > > > > > > > >   mutex_unlock(&m);
> > > > > > > > >   enable_irq();
> > > > > > > > Thanks for the kind comment.
> > > > > > > > 
> > > > > > > > We do consider your solution before and it works well indeed 
> > > > > > > > with some specific
> > > > > > > > simple drivers. For example, some drives use GPIO pin as 
> > > > > > > > interrupt source.
> > > > > > > > 
> > > > > > > > On one specific platform, disable_irq would really disable the 
> > > > > > > > irq at RTE entry,
> > > > > > > > which means we lose the wakeup capability of this device.
> > > > > > > > synchronize_irq can be another solution. But we did hit 'DPM 
> > > > > > > > device timeout' issue
> > > > > > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > > > > > 
> > > > > > > > The driver is complicated. We couldn't change too many 
> > > > > > > > functions to optimize it.
> > > > > > > > In addition, we have to use the driver instead of throwing it 
> > > > > > > > away.
> > > > > > > 
> > > > > > > What is preventing you from rewriting it to work properly?
> > > > > > It's complicated.
> > > > > 
> > > > > That sounds like your issue, not ours, so please don't push the 
> > > > > problem
> > > > > onto someone else.  Take ownership of the driver, fix it up, and all
> > > > > will be good.
> > > > > 
> > > > > 
> > > > > > > > With irq_in_progress, we can resolve this issue and it does 
> > > > > > > > work, although it
> > > > > > > > looks like ugly.
> > > > > > > 
> > > > > > > Don't paper over driver bugs in the core kernel, fix the driver.
> > > > > > It's hard to say it's a driver bug. Could generic kernel provide 
> > > > > > some
> > > > > > flexibility for such complicated drivers?
> > > > > 
> > > > > Please post the code for the driver, and then we will be glad to
> > > > > continue the dicussion.
> > > > Greg,
> > > > 
> > > > Thanks for your enthusiasm. It's a private driver for products.
> > > 
> > > What do you mean "private driver"?  All drivers need to be merged into
> > > the mainline kernel, it saves you time and money, and we will fix your
> > > bugs for you.
> > > 
> > > You know that, your bosses know that, your management knows that, so why
> > > are you trying to hide things?
> > > 
> > > totally confused,
> > They are embedded device drivers, highly depending on specific devices 
> > which runs
> > its own firmware in devices. Here the kernel drivers run at AP side.
> 
> That's no different from loads of drivers that we have in the kernel
> today, no need to keep them from being merged, please submit them.
It need managers' approval and is beyond my capability.

> 
> > One example is Graphics driver, which is very big and coding is not 
> > friendly. Kernel
> > experts can raise tons of questions against the driver, but we have to make 
> > the driver
> > work well on real products.
> 
> So you are saying that "kernel experts" don't ask questions that
> actually make drivers "work well on real products"?
Obviously, I didn't say that. Pls. also remove the ", as I really
think I'm talking to kernel experts indeed.

>   If that's how you
> feel about the community, then why are you asking the community for help
> in the first place?
1) If upstream merges the patch, we wouldn't always port it to new kernel when
we upgrade our kernel to the latest.
2) Benefit others;

> 
> And do you somehow think that we don't know how to review/write/fix
> graphics drivers?
You know that.

>   Who do you think made Linux in the first place?
All guys who work in linux community proactively.

> 
> > Another reason is drivers need workaround many hardware issues. That makes 
> > it
> > hard to implement kernel drivers in good shape sometimes.
> 
> That's hogwash, we deal with that every single day, in almost every
> single driver we write.  That's why we have a kernel in the first place,
> to fix hardware problems and provide a unified interface to userspace so
> that it does NOT have to deal with hardware problems and differences.
Right. Linux kernel is good, but not perfect.
It's a trade-off between perfect and real utilization.

> 
> > We need support all cases.
> 
> What "cases"?
> 
> > We fixed lots of hard issues on embedded products and think if kernel could 
> > be more
> > flexible to support complicated cases.
> 
> Do you think that Linux is not "flexible"?  It runs on more processors,
> and more system configurations than any other operating system ever has
> in the history of computing.  How is that not "complicated"?
> 
> No one is forcing you to use Linux, so if you don't want to participate
> by providing your drivers and accepting feedback, don't expect us to
> change the core for drivers we have never seen and feel are broken.  It
> doesn't work that way.
> 
> I think you need to spend some time with your company's Linux community
> development training programs, it's pretty obvious that you don't
> understand how the Linux kernel development process works at all.
It seems I could only say that: Linux kernel is perfect. It has no fault.

We just submitted a patch to provide a method to support a use case. Then,
you pour out so many things which are far away from the original discussion. :)

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to