On Sat, 22 Jan 2011 19:39:24 +0000 Steven Chamberlain wrote: > Usually I see the stream of the "control queue full" errors first, but > this time apcupsd hung very soon after boot: > INFO: task apcupsd:21604 blocked for more than 120 seconds. > [<ffffffffa01bdeb9>] ? ohci_urb_dequeue+0xd9/0xe9 [ohci_hcd] > [<ffffffffa003f10c>] ? usb_kill_urb+0x9d/0xbb [usbcore] > [<ffffffff8106681a>] ? autoremove_wake_function+0x0/0x2e > [<ffffffffa0209feb>] ? usbhid_init_reports+0x8c/0xee > [<ffffffffa020b396>] ? hiddev_ioctl+0x2bf/0x632 [usbhid] > This was despite raising apcupsd's POLLTIME from 10 to 60 seconds.
My usbmon dump showed that requests were being made that the UPS wasn't returning or failing, sounds like a task for a bus analyzer and a read through the USB standard. usbhid_init_report is called from usbhid_start and from hid.c "called on probe to start the device", and as such wouldn't be called each time apcupsd polls the UPS. So increasing the POLLTIME wouldn't have helped. I don't know how many reports it is requesting, but the timeout in usbhid_wait_io is 10 seconds, so 12 could add up to 120 seconds. Even so I would expect it to eventually timeout. Now if it was sitting in ohci_urb_dequeue for 120 seconds that is a problem. On Wed, Apr 04, 2012 at 10:43:42PM +0100, Steven Chamberlain wrote: > found 409349 linux-2.6/2.6.32-41 > thanks > > On 04/04/12 22:23, Jonathan Nieder wrote: > > Ping. Is nobody trying to use an APC UPS connected by USB any more? :) I have three. I only had problems with one OHCI SiS rev 07 PCI 1039:7001 controller, and once I added the patch for "introduce timeout for stuck ctrl/out URBs" I've not had the issue anymore. > Yep! This week on 2.6.32-41 I noticed apcupsd hang after about a day, > followed by the 'control queue full' errors an hour later. That was > with the patches suggested in #631287 applied, just in case they made > any difference to this. My guess is the control queue message full problem just enlarges the window of opportunity fro 'BUG during access to hiddev' bug report #631287. I think it would help avoid crashing when it gets into this situation, but from the patch titles I don't think it would help avoid it. The queue is 256 long, and with apcupsd querying every 10 seconds, 256*10/60=42 minutes once an entry gets blocked. With the timeout set to 5 seconds in the "introduce timeout" patch and apcups polling every 10 seconds it no matter if every request failed to complete it shouldn't be able to fill up. But that's only if apcupsd is only submitting one control request every 10 seconds, but looking at my old logs that isn't the case. For my UPS it seemed to only ignore a transfer rarely, maybe there are some that are frequent enough to be a problem, in which case the timeout patch wouldn't be able to keep up. It would be nice to have a usbmon dump of it happening to see if that is the case or not. > > What kernels work and don't work? > > I already tagged affected versions in the BTS; the first report of this > was on 2.6.17 and I myself recall seeing it as far back as 2.6.26. > > > I hadn't been able to test the UPS on a 3.y kernel until this week. > Right now I have the UPS hooked up to a new server running 3.2.0-2-amd64 > (Debian 3.2.12-1) to see if it recurs. What UPS do you have that's the problem? Mine was, BackUPS Pro 500 firmware 16.3.D USB FW:4 > Then in a few weeks, I'll be taking the original hardware (on which the > issue was easily reproducible) out of production use, so I can try 3.y > kernels on that too. (Until now, I couldn't try 3.y kernels on that box > as it needs to run OpenVZ containers). What kind of timeframe is easily reproducible? Can you run information gathering patches on the box, if so I'll put something together. Then again this has been going on for years, if you prefer to wait a few weeks until it is not in production use that's fine. Back when I was first debugging this I added some printk messages for when the queue length increased. If you'll give them a try I'll look them up and also printout that jiffies value as well. -- David Fries <da...@fries.net> PGP pub CB1EE8F0 http://fries.net/~david/ -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org