> Since then, the silence has been deafening.
>
> My assumption now is that this is not ever getting fixed. I'm certainly not
> able to fix it. I'm not even a kernel programmer! I got far enough to
> diagnose the cause just with the "add more printk's and boot it again"
> technique. Hundreds of r
On Wed, Oct 27, pac...@kosh.dhis.org wrote:
> |1. How do I locate all usb nodes in the device tree?
> |
> |2. How do I know if a particular usb node is OHCI?
In the installed system, run 'lspci | grep -i usb', this gives the pci
bus numbers. Then run 'find /sys -name devspec', and look for the bu
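
For question 2, one way to tell an OHCI controller from other USB host controllers is its PCI class code: class 0x0C, subclass 0x03, programming interface 0x10. The sketch below is illustrative and not taken from the thread. It assumes the usual Linux sysfs layout under /sys/bus/pci/devices, and that a `devspec` file (the device-tree path) sits next to `class` on device-tree platforms. The function names are hypothetical.

```python
import os

# Programming-interface byte for PCI class 0x0C (serial bus), subclass 0x03 (USB).
USB_PROG_IF = {0x00: "UHCI", 0x10: "OHCI", 0x20: "EHCI", 0x30: "xHCI"}


def usb_controller_type(class_code):
    """Map a 24-bit PCI class code to a USB host controller type, or None."""
    if (class_code >> 8) != 0x0C03:
        return None  # not a USB host controller
    return USB_PROG_IF.get(class_code & 0xFF, "unknown")


def scan_usb_controllers(root="/sys/bus/pci/devices"):
    """Return (pci_address, type, devspec) for each USB controller under root."""
    found = []
    if not os.path.isdir(root):
        return found
    for addr in sorted(os.listdir(root)):
        try:
            with open(os.path.join(root, addr, "class")) as f:
                kind = usb_controller_type(int(f.read().strip(), 16))
        except (OSError, ValueError):
            continue
        if kind is None:
            continue
        devspec = None  # assumption: only present on device-tree platforms
        try:
            with open(os.path.join(root, addr, "devspec")) as f:
                devspec = f.read().strip()
        except OSError:
            pass
        found.append((addr, kind, devspec))
    return found


if __name__ == "__main__":
    for addr, kind, devspec in scan_usb_controllers():
        print(addr, kind, devspec or "(no devspec)")
```

Any entry reported as OHCI with a non-empty devspec would give the device-tree path of a candidate node.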
Benjamin Herrenschmidt writes:
>
> Ok so you'll have to make up a "workaround" in prom_init that looks for
> OHCI's in the device-tree and disable them.
>
> Check if the OHCI node has some existing f-code words you can use for
> that with "dev /path-to-ohci words" in OF for example. If not, you m
Benjamin Herrenschmidt writes:
>
> On Wed, 2010-10-20 at 13:33 -0500, pac...@kosh.dhis.org wrote:
> > > Just try :-) "quiesce" is something that afaik only apple ever
> > > implemented anyways. It uses hooks inside their OF to shut down all
> > > drivers that do bus master (among other HW sanitiza
On Wed, 2010-10-20 at 13:33 -0500, pac...@kosh.dhis.org wrote:
> > Just try :-) "quiesce" is something that afaik only apple ever
> > implemented anyways. It uses hooks inside their OF to shut down all
> > drivers that do bus master (among other HW sanitization tasks).
>
> I booted a version with
Benjamin Herrenschmidt writes:
>
> On Tue, 2010-10-19 at 22:23 -0500, pac...@kosh.dhis.org wrote:
> > The diff fragment above applied inside prom_close_stdin, but there are some
> > prom_printf calls after prom_close_stdin. Calling prom_printf after closing
> > stdout sounds like it could
On Tue, 2010-10-19 at 22:23 -0500, pac...@kosh.dhis.org wrote:
> The diff fragment above applied inside prom_close_stdin, but there are some
> prom_printf calls after prom_close_stdin. Calling prom_printf after closing
> stdout sounds like it could be bad. If I moved it down below all the
> pro
Benjamin Herrenschmidt writes:
>
> On Tue, 2010-10-19 at 22:47 +0200, Segher Boessenkool wrote:
> >
> > It looks like it is the frame counter in a USB OHCI HCCA.
> > 16-bit, 1kHz update, offset x'80 in a page.
> >
> > So either the kernel forgot to call quiesce on it, or the firmware
> > doesn'
On Tue, 2010-10-19 at 22:47 +0200, Segher Boessenkool wrote:
>
> It looks like it is the frame counter in a USB OHCI HCCA.
> 16-bit, 1kHz update, offset x'80 in a page.
>
> So either the kernel forgot to call quiesce on it, or the firmware
> doesn't implement that, or the firmware messed up some
On Tue, 2010-10-19 at 13:10 -0500, pac...@kosh.dhis.org wrote:
>
> So what type of driver, firmware, or hardware bug puts a 16-bit 1000Hz timer
> in memory, and does it in little-endian instead of the CPU's native byte
> order? And why does it stop doing it some time during the early init
> sc
> I made a new discovery.
And this nails it :-)
> So then I ran
> dd if=/dev/mem bs=4 count=1 skip=$((0xfc5c080/4)) | od -t x4
> a few times very fast, plucking the first affected word directly out of
> memory by its physical address. The result:
>
> The low 16 bits are always zero as before. T
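
To make the observation above concrete: on a big-endian CPU, `od -t x4` prints the four bytes at that address as one native-order word, while the OHCI frame number is a little-endian 16-bit field followed by a 16-bit pad. That is consistent with the low 16 bits of the printed word always reading zero. The sketch below is illustrative only. The sample word is made up, not a value from the thread, and the helper names are hypothetical.

```python
import struct

PHYS_ADDR = 0xFC5C080  # address used in the dd command above
PAGE_SIZE = 0x1000


def decode_hcca_word(word_be):
    """Split a word printed by a big-endian host into (frame_number, pad)."""
    raw = struct.pack(">I", word_be)        # bytes as they sit in memory
    frame, pad = struct.unpack("<HH", raw)  # OHCI fields are little-endian
    return frame, pad


def frames_elapsed(first, second):
    """Frames between two samples, allowing for 16-bit wrap-around."""
    return (second - first) & 0xFFFF


if __name__ == "__main__":
    # The word sits at offset 0x80 within its page, as noted in the thread.
    print(hex(PHYS_ADDR % PAGE_SIZE))
    # dd with bs=4 needs the address expressed in 4-byte blocks.
    print(hex(PHYS_ADDR // 4))
    # Hypothetical sample: memory bytes 34 12 00 00 read as one big-endian word.
    frame, pad = decode_hcca_word(0x34120000)
    print(hex(frame), pad)
```

At a 1 kHz update rate, `frames_elapsed` between two samples is roughly the number of milliseconds between the two reads, provided fewer than 65536 frames have passed.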
Benjamin Herrenschmidt writes:
> >
> > I thought of that, but as far as I can tell, this CPU doesn't have DABR.
>
> AFAIK, the 7447 is just a derivative of the 7450 design which -does-
> have a DABR ... Unless it's broken :-)
Hmm. gdb resorts to single-stepping when I set a watchpoint while debu
On Mon, Oct 18, 2010 at 11:55:44PM +0200, Thomas Gleixner wrote:
> I might be completely one off as usual, but this thing reminds me of a
> bug I stared at yesterday night:
This problem is completely unrelated. My problem was caused by using
binutils-gold.
Helmut
On Tue, 19 Oct 2010, Helmut Grohne wrote:
> On Mon, Oct 18, 2010 at 11:55:44PM +0200, Thomas Gleixner wrote:
> > I might be completely one off as usual, but this thing reminds me of a
> > bug I stared at yesterday night:
>
> This problem is completely unrelated. My problem was caused by using
> b
> > From there, you might be able to close onto the culprit a bit more, for
> > example, try using the DABR register to set data access breakpoints
> > shortly before the corruption spot. AFAIK, on those old 32-bit CPUs, you
> > can set whether you want it to break on a real or a virtual address.
On Mon, 18 Oct 2010, Andrew Morton wrote:
> On Mon, 18 Oct 2010 12:33:31 +0100
> Mel Gorman wrote:
>
> > A bit but I still don't know why it would cause corruption. Maybe this is still
> > a caching issue but the difference in timing between list_add and list_add_tail
> > is enough to
Benjamin Herrenschmidt writes:
>
> You can do something fun... like a timer interrupt that peeks at those
> physical addresses from the linear mapping for example, and try to find
> out "when" they get set to the wrong value (you should observe the load
> from disk, then the corruption, unless the
On Mon, 2010-10-18 at 14:10 -0500, pac...@kosh.dhis.org wrote:
> I've been flailing around quite a bit. Here's my latest result:
>
> Since I can view the corruption with md5sum /sbin/e2fsck, I know it's in a
> clean cached page. So I made an extra copy of /sbin/e2fsck, which won't be
> loaded int
On Mon, 2010-10-18 at 12:37 -0700, Andrew Morton wrote:
> Well, you've spotted a bug so I'd say we fix it asap.
>
> It's a bit of a shame that we lose the only known way of reproducing a
> different bug, but presumably that will come back and bite someone else
> one day, and we'll fix it then :(
On Wed, 2010-10-13 at 15:40 +0100, Mel Gorman wrote:
>
> This is somewhat contrived but I can see how it might happen even on one
> CPU particularly if the L1 cache is virtual and is loose about checking
> physical tags.
>
> > How sensitive/vulnerable is PPC32 to such things?
> >
>
> I can not
On Mon, 18 Oct 2010 12:33:31 +0100
Mel Gorman wrote:
> A bit but I still don't know why it would cause corruption. Maybe this is still
> a caching issue but the difference in timing between list_add and list_add_tail
> is enough to hide the bug. It's also possible there are some registers
>
Mel Gorman writes:
>
> A bit but I still don't know why it would cause corruption. Maybe this is still
> a caching issue but the difference in timing between list_add and list_add_tail
> is enough to hide the bug. It's also possible there are some registers
> ioremapped after the memmap arra
On Wed, Oct 13, 2010 at 12:52:05PM -0500, pac...@kosh.dhis.org wrote:
> Mel Gorman writes:
> >
> > On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
> > >
> > > It's corruption of user memory, which is unusual. I'd be wondering if
> > > there was a pre-existing bug which 6dda9d55bf5
Mel Gorman writes:
>
> On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
> >
> > It's corruption of user memory, which is unusual. I'd be wondering if
> > there was a pre-existing bug which 6dda9d55bf545013597 has exposed -
> > previously the corruption was hitting something harmles
On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
> (cc linuxppc-dev@lists.ozlabs.org)
>
> On Mon, 11 Oct 2010 15:30:22 +0100
> Mel Gorman wrote:
>
> > On Sat, Oct 09, 2010 at 04:57:18AM -0500, pac...@kosh.dhis.org wrote:
> > > (What a big Cc: list... scripts/get_maintainer.pl made
(cc linuxppc-dev@lists.ozlabs.org)
On Mon, 11 Oct 2010 15:30:22 +0100
Mel Gorman wrote:
> On Sat, Oct 09, 2010 at 04:57:18AM -0500, pac...@kosh.dhis.org wrote:
> > (What a big Cc: list... scripts/get_maintainer.pl made me do it.)
> >
> > This will be a long story with a weak conclusion, sorry a