Pegasos i8042 broken again

2010-10-09 Thread pacman
Pegasos has no keyboard again. I blame commit 540c6c392f01887dcc96bef0a41e63e6c1334f01, which tries to find i8042 IRQs in the device-tree but doesn't fall back to the old hardcoded 1 and 12 in all failure cases. Specifically, the case where the device-tree contains nothing matching pnpPNP,303 or p
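The fallback rule described above is short. This sketch illustrates it and is not the kernel's i8042 or powerpc setup code. `struct dt_node`, `dt_find_irq` and `i8042_resolve_irqs` are hypothetical names, and the device tree is reduced to a flat array of (compatible, IRQ) pairs. Only `pnpPNP,303` comes from the report. The aux compatible string is left to the caller because the snippet is cut off there. The point is that every failure path, including "no matching node at all", ends at the legacy IRQs 1 and 12.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of device-tree nodes: a compatible string and
 * the interrupt number found in that node (0 = no usable "interrupts"). */
struct dt_node {
    const char *compatible;
    unsigned int irq;
};

struct i8042_irqs {
    unsigned int kbd;
    unsigned int aux;
};

#define I8042_LEGACY_KBD_IRQ 1u
#define I8042_LEGACY_AUX_IRQ 12u

/* Return the IRQ of the first node whose compatible matches, or 0. */
unsigned int dt_find_irq(const struct dt_node *nodes, size_t n,
                         const char *compatible)
{
    for (size_t i = 0; i < n; i++)
        if (nodes[i].compatible != NULL &&
            strcmp(nodes[i].compatible, compatible) == 0)
            return nodes[i].irq;
    return 0;
}

/* Prefer device-tree IRQs, but fall back to the legacy 1/12 in EVERY
 * failure case: node missing, or node present without a usable IRQ. */
struct i8042_irqs i8042_resolve_irqs(const struct dt_node *nodes, size_t n,
                                     const char *kbd_compat,
                                     const char *aux_compat)
{
    struct i8042_irqs r;

    r.kbd = dt_find_irq(nodes, n, kbd_compat);
    r.aux = dt_find_irq(nodes, n, aux_compat);
    if (r.kbd == 0)
        r.kbd = I8042_LEGACY_KBD_IRQ;
    if (r.aux == 0)
        r.aux = I8042_LEGACY_AUX_IRQ;
    return r;
}
```

The fallback is applied after both lookups rather than inside each branch. That way, no combination of found and missing nodes can leave an IRQ at 0.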

Re: Pegasos i8042 broken again

2010-10-10 Thread pacman
Benjamin Herrenschmidt writes: > > Those things really suck. They absolutely refuse to fix their FW for > reasons I never quite managed to figure out. The last time around, they did release a firmware patch (pegasos-dts-20071018) to fix up the device tree enough to satisfy the kernel. Now that th

Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55

2010-10-13 Thread pacman
Mel Gorman writes: > > On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote: > > > > It's corruption of user memory, which is unusual. I'd be wondering if > > there was a pre-existing bug which 6dda9d55bf545013597 has exposed - > > previously the corruption was hitting something harmles

Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55

2010-10-18 Thread pacman
Mel Gorman writes: > > A bit but I still don't know why it would cause corruption. Maybe this is > still > a caching issue but the difference in timing between list_add and > list_add_tail > is enough to hide the bug. It's also possible there are some registers > ioremapped after the memmap arra
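The two helpers Mel contrasts behave as follows in the kernel's circular doubly linked lists. `list_add` inserts right after the head and `list_add_tail` inserts right before it. As a result, a different entry is taken from the front next. The sketch below is a hypothetical userspace re-implementation for illustration, not `include/linux/list.h` itself. `list_first` is an assumed helper that stands in for "the allocator takes the next free entry".

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head
 * semantics (illustration only, not the kernel header). */
struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link 'item' between two adjacent entries. */
static void list_insert_between(struct list_head *item,
                                struct list_head *prev,
                                struct list_head *next)
{
    next->prev = item;
    item->next = next;
    item->prev = prev;
    prev->next = item;
}

/* Insert right after head: the newest entry is found first (LIFO). */
void list_add(struct list_head *item, struct list_head *head)
{
    list_insert_between(item, head, head->next);
}

/* Insert right before head, i.e. at the end: the newest entry is found last. */
void list_add_tail(struct list_head *item, struct list_head *head)
{
    list_insert_between(item, head->prev, head);
}

/* Hypothetical "take the next entry" step: front of the list, or NULL. */
struct list_head *list_first(struct list_head *head)
{
    return head->next == head ? NULL : head->next;
}
```

With `list_add`, a just-freed entry is reused almost immediately. With `list_add_tail`, it can sit unused for a long time. This changes which memory is in use when a stray write lands.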

Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55

2010-10-18 Thread pacman
Benjamin Herrenschmidt writes: > > You can do something fun... like a timer interrupt that peeks at those > physical addresses from the linear mapping for example, and try to find > out "when" they get set to the wrong value (you should observe the load > from disk, then the corruption, unless the
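Ben's suggestion amounts to sampling a watched word periodically and logging when it changes. Below is a minimal userspace simulation of that idea. It assumes a hypothetical `struct word_watch` whose `watch_poll` would be called from a periodic timer. On real hardware, the pointer would refer to the suspect physical address through the linear mapping, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANGES 16

/* Hypothetical watcher: records each new value seen at one address and the
 * tick at which it was first observed. */
struct word_watch {
    const volatile uint16_t *addr;
    uint16_t last;
    size_t nchanges;
    unsigned long tick_of_change[MAX_CHANGES];
    uint16_t value_of_change[MAX_CHANGES];
};

void watch_init(struct word_watch *w, const volatile uint16_t *addr)
{
    w->addr = addr;
    w->last = *addr;
    w->nchanges = 0;
}

/* Intended to run from a periodic timer callback; 'tick' is the current
 * tick count. Changes beyond MAX_CHANGES are not recorded. */
void watch_poll(struct word_watch *w, unsigned long tick)
{
    uint16_t now = *w->addr;

    if (now != w->last) {
        if (w->nchanges < MAX_CHANGES) {
            w->tick_of_change[w->nchanges] = tick;
            w->value_of_change[w->nchanges] = now;
            w->nchanges++;
        }
        w->last = now;
    }
}
```

A value that changes again on almost every tick after the data is loaded from disk points to a periodic writer, such as a device doing DMA, rather than a one-off kernel bug.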

Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55

2010-10-19 Thread pacman
Benjamin Herrenschmidt writes: > > > > I thought of that, but as far as I can tell, this CPU doesn't have DABR. > > AFAIK, the 7447 is just a derivative of the 7450 design which -does- > have a DABR ... Unless it's broken :-) Hmm. gdb resorts to single-stepping when I set a watchpoint while debu

Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55

2010-10-19 Thread pacman
Benjamin Herrenschmidt writes: > > On Tue, 2010-10-19 at 22:47 +0200, Segher Boessenkool wrote: > > > > It looks like it is the frame counter in an USB OHCI HCCA. > > 16-bit, 1kHz update, offset x'80 in a page. > > > > So either the kernel forgot to call quiesce on it, or the firmware > > doesn'
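Segher's identification matches the HCCA layout in the OpenHCI specification. There, 32 four-byte interrupt-table entries fill offsets 0x00 to 0x7F, so the 16-bit frame number lands at 0x80. The struct below sketches that layout. Its field names were chosen here and are not those of the Linux `struct ohci_hcca`. Compile-time checks cover the offset and the 256-byte size. A hypothetical helper shows how a 1 kHz 16-bit counter wraps.

```c
#include <stddef.h>
#include <stdint.h>

/* Host Controller Communications Area, per the OpenHCI specification.
 * The controller updates frame_number by DMA once per 1 ms frame. */
struct ohci_hcca {
    uint32_t interrupt_table[32]; /* 0x00: interrupt endpoint list heads */
    uint16_t frame_number;        /* 0x80: current 16-bit frame number */
    uint16_t pad1;                /* 0x82: zeroed when frame_number is updated */
    uint32_t done_head;           /* 0x84: done queue head */
    uint8_t  reserved[120];       /* 0x88..0xFF: reserved */
};

_Static_assert(offsetof(struct ohci_hcca, frame_number) == 0x80,
               "frame number sits at offset 0x80");
_Static_assert(sizeof(struct ohci_hcca) == 256, "HCCA is 256 bytes");

/* Frame number seen 'ms' milliseconds after 'start' (wraps modulo 65536). */
uint16_t hcca_frame_after(uint16_t start, unsigned long ms)
{
    return (uint16_t)(start + ms);
}
```

The wraparound means the overwritten word cycles through all 65536 values roughly every 65.5 seconds. This fits a corruption pattern that looks like a counter.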

Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55

2010-10-20 Thread pacman
Benjamin Herrenschmidt writes: > > On Tue, 2010-10-19 at 22:23 -0500, pac...@kosh.dhis.org wrote: > > The diff fragment above applied inside prom_close_stdin, but there are > > some > > prom_printf calls after prom_close_stdin. Calling prom_printf after > > closing > > stdout sounds like it could

Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55

2010-10-22 Thread pacman
Benjamin Herrenschmidt writes: > > On Wed, 2010-10-20 at 13:33 -0500, pac...@kosh.dhis.org wrote: > > > Just try :-) "quiesce" is something that afaik only apple ever > > > implemented anyways. It uses hooks inside their OF to shut down all > > > drivers that do bus master (among other HW sanitiza

Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-10-27 Thread pacman
Benjamin Herrenschmidt writes: > > Ok so you'll have to make up a "workaround" in prom_init that looks for > OHCI's in the device-tree and disable them. > > Check if the OHCI node has some existing f-code words you can use for > that with "dev /path-to-ohci words" in OF for example. If not, you m

Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-10-27 Thread pacman
Olaf Hering writes: > > On Wed, Oct 27, pac...@kosh.dhis.org wrote: > > > |1. How do I locate all usb nodes in the device tree? > > | > > |2. How do I know if a particular usb node is OHCI? > > In the installed system, run 'lspci | grep -i usb', this gives the pci > bus numbers. Then run 'find

Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-10-27 Thread pacman
Segher Boessenkool writes: > > >> > |1. How do I locate all usb nodes in the device tree? > >> > | > >> > |2. How do I know if a particular usb node is OHCI? > > You look for compatible "usb-ohci". There is no "compatible" there. I can probably use class-code since the parent is a PCI bus. > >
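Using class-code works because a PCI class code packs the base class, subclass and programming interface into one 24-bit value. USB host controllers differ only in the last byte: UHCI is 0x0C0300, OHCI is 0x0C0310 and EHCI is 0x0C0320. The sketch below assumes the node's `class-code` property has already been read into a `uint32_t`. `classify_usb_hc` and `enum usb_hc_kind` are hypothetical names.

```c
#include <stdint.h>

/* PCI class code layout: base class (bits 23..16), subclass (15..8),
 * programming interface (7..0). */
#define PCI_BASE_SERIAL_BUS 0x0Cu
#define PCI_SUB_USB         0x03u

enum usb_hc_kind { USB_HC_NONE, USB_HC_UHCI, USB_HC_OHCI, USB_HC_EHCI, USB_HC_OTHER };

/* Classify a node by its class-code value alone (no "compatible" needed). */
enum usb_hc_kind classify_usb_hc(uint32_t class_code)
{
    uint32_t base = (class_code >> 16) & 0xFFu;
    uint32_t sub  = (class_code >> 8) & 0xFFu;
    uint32_t prog = class_code & 0xFFu;

    if (base != PCI_BASE_SERIAL_BUS || sub != PCI_SUB_USB)
        return USB_HC_NONE;
    switch (prog) {
    case 0x00: return USB_HC_UHCI;
    case 0x10: return USB_HC_OHCI;
    case 0x20: return USB_HC_EHCI;
    default:   return USB_HC_OTHER;
    }
}
```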

Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-10-27 Thread pacman
Segher Boessenkool writes: > > >> 1) Figure out what exactly is going on; > > > > I thought we were past that. > > We are not. > > > The startup sequence leaves the device in a > > bad > > state (writing 1000 times per second to memory that the kernel believes is > > not in use), so it needs to

Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-10-28 Thread pacman
Segher Boessenkool writes: > > > So is it wrong to leave the host controller enabled when the OS is booted? > > Yes. Or, rather, there should be some way for the client to turn off > all dma and interrupt activity; if the client closes the ihandles in > "/chosen", and perhaps calls "quiesce", th

Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-11-04 Thread pacman
Segher Boessenkool writes: > > > Now I'm just trying to find the more correct way of doing it, without > > hardcoded addresses. That'll be something like this: > > > > search the device tree for OHCI nodes > > for each OHCI node > > get assigned-addresses > > map-in > > set HCR > >
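The plan quoted above can be simulated in C. Everything structural here is hypothetical. Real code in prom_init would use the Open Firmware client interface for the device-tree walk, `assigned-addresses` and `map-in`. In this sketch, `struct fake_pci_node` simply pairs a class-code with an already-mapped register block. The register details follow the OHCI specification: HcCommandStatus sits at offset 0x08, and HostControllerReset is bit 0.

```c
#include <stddef.h>
#include <stdint.h>

#define OHCI_CLASS_CODE        0x0C0310u  /* serial bus / USB / OHCI */
#define OHCI_HC_COMMAND_STATUS 0x08u      /* HcCommandStatus register offset */
#define OHCI_HCR               (1u << 0)  /* HostControllerReset bit */

/* Hypothetical node record: class-code plus the register block that
 * "assigned-addresses" + "map-in" would yield on real firmware. */
struct fake_pci_node {
    uint32_t class_code;
    volatile uint32_t *regs;  /* simulated OHCI operational registers */
};

/* For every OHCI node, request a host controller reset so the controller
 * stops its DMA activity. Returns how many controllers were reset. */
size_t reset_all_ohci(struct fake_pci_node *nodes, size_t n)
{
    size_t resets = 0;

    for (size_t i = 0; i < n; i++) {
        if ((nodes[i].class_code & 0xFFFFFFu) != OHCI_CLASS_CODE)
            continue;
        if (nodes[i].regs == NULL)  /* mapping failed: skip this node */
            continue;
        /* HcCommandStatus bits written as 1 are set; bits written as 0
         * are left unchanged, so a plain write of HCR is enough. */
        nodes[i].regs[OHCI_HC_COMMAND_STATUS / 4] = OHCI_HCR;
        resets++;
    }
    return resets;
}
```

On the big-endian Pegasos, a real version would also need little-endian accesses to the PCI registers. This sketch ignores that.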

Re: Pegasos i8042 broken again

2011-04-04 Thread pacman
Gabriel Paubert writes: > > Ok, I got fed up about it. The patch referred above is obviously wrong since > it leaves interrupts at 0 when a device_type or name of 8042 is found, > so what about the following? Looks like the workaround I was using for a while. In the original report I said I was

"event-scan failed" logflood

2010-05-12 Thread pacman
I upgraded the kernel on my Pegasos from 2.6.32 to 2.6.33 and now it sends the message "event-scan failed" to the kernel log about 60 times per second as long as it's running. The message comes from arch/powerpc/kernel/rtasd.c but I don't know what's going on in there so I can't say much more abou

Re: "event-scan failed" logflood

2010-05-13 Thread pacman
Benjamin Herrenschmidt writes: > > Well, first it should be called once per second, not 60 times per > second, so something is wrong there... Actually I think it was happening a lot more than 60 times per second, and klogd was losing most of the messages because they came too fast. When running t