Russell King wrote:
> On Thu, Mar 01, 2007 at 01:33:28PM +, Jose Goncalves wrote:
>
>> I've also done your suggestion and I've inserted "msleep(10);" just
>> before the "And clear the interrupt registers again for luck." and my
>> application is now running without problems for more than 24
On Thu, Mar 01, 2007 at 01:33:28PM +, Jose Goncalves wrote:
> I've also done your suggestion and I've inserted "msleep(10);" just
> before the "And clear the interrupt registers again for luck." and my
> application is now running without problems for more than 24H! So,
> inserting a delay in
Hi again Russell,
I'm back, after some more testing. Here goes my report.
I've switched to another SBC and the kernel still oopses, so it is not a
one-off fault on the hardware.
I've also run memtest86+ on this board for the maximum period that I
reach an Oops with my application (24 H) and it not det
Russell, thanks again for offering to look at this; the more oopses
and soft lockups I see on this board, the more I think you're right
and we have an IRQ handling race.
Here's the struct irqchip setup:
/* mask irq, refer to section 2.6 of the chip 8618 document */
static void mv88w8xx8_mask_irq(un
On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote:
What I find really hard to understand is why a hardware fault always
happens at the same software instruction! I would expect a hardware fault
to hit randomly...
I've experienced just such a hardware fault.
The Infineon DSCC4 serial
Quoting Russell King <[EMAIL PROTECTED]>:
On Thu, Feb 22, 2007 at 03:07:18PM +, Jose Goncalves wrote:
Russell King wrote:
> On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote:
>
>> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things
>> to us, at least on an
Quoting Russell King <[EMAIL PROTECTED]>:
On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote:
It could be a silly question (bear with me as I'm not familiar with
such low level programming), but couldn't it be possible for an interrupt
to hit in the middle of the serial_in() calls
On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote:
> It could be a silly question (bear with me as I'm not familiar with
> such low level programming), but couldn't it be possible for an interrupt
> to hit in the middle of the serial_in() calls and mess with %ebx?
I'm no expert on x8
On Thu, Feb 22, 2007 at 03:07:18PM +, Jose Goncalves wrote:
> Russell King wrote:
> > On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote:
> >
> >> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things
> >> to us, at least on an ARM target ...
> >>
> >
> >
Russell King wrote:
> On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote:
>
>> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things
>> to us, at least on an ARM target ...
>>
>
> That's ruled out. Please think about it for a moment - serial_in()
> managed t
Russell King wrote:
> On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote:
>
>> <1>[18840.304048] Unable to handle kernel NULL pointer dereference at
>> virtual address 0012
>> <1>[18840.313046] printing eip:
>> <4>[18840.321687] c01bfa7a
>> <1>[18840.321714] *pde =
>>
On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote:
> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things
> to us, at least on an ARM target ...
That's ruled out. Please think about it for a moment - serial_in()
managed to work correctly most of the time, and the
On Wed, Feb 21, 2007 at 09:57:50PM -0800, H. Peter Anvin wrote:
> Russell King wrote:
>
> >
> >Plainly, %ebx changed across the call to serial_in() at c01c0f7b.
> >First thing to notice is this violates the C code - "up" can not
> >change.
> >
> >Now let's look at serial_in:
> >
> >c01bfa70:
On Wed, Feb 21, 2007 at 09:57:50PM -0800, H. Peter Anvin wrote:
> Russell King wrote:
>
> >Plainly, %ebx changed across the call to serial_in() at c01c0f7b.
> >First thing to notice is this violates the C code - "up" can not
> >change.
> >Now let's look at serial_in:
> >c01bfa70: 55
Russell King wrote:
Plainly, %ebx changed across the call to serial_in() at c01c0f7b.
First thing to notice is this violates the C code - "up" can not
change.
Now let's look at serial_in:
c01bfa70: 55 push %ebp
c01bfa71: 89 e5 mov %esp,%
Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things
to us, at least on an ARM target ...
On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote:
> <1>[18840.304048] Unable to handle kernel NULL pointer dereference at virtual
> address 0012
> <1>[18840.313046] printing eip:
> <4>[18840.321687] c01bfa7a
> <1>[18840.321714] *pde =
> <0>[18840.331287] Oops: [#1]
On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote:
> New developments.
> I have upgraded to 2.6.16.41, applied a patch sent by Frederik that
> removed the change made in http://lkml.org/lkml/2005/6/23/266 and
> activated some more kernel debug, i.e., CONFIG_KALLSYMS_ALL,
> CONFIG_DEBUG
Jose Goncalves wrote:
> New developments.
> I have upgraded to 2.6.16.41, applied a patch sent by Frederik that
> removed the change made in http://lkml.org/lkml/2005/6/23/266 and
> activated some more kernel debug, i.e., CONFIG_KALLSYMS_ALL,
> CONFIG_DEBUG_KERNEL, CONFIG_DETECT_SOFTLOCKUP, CONFIG_
New developments.
I have upgraded to 2.6.16.41, applied a patch sent by Frederik that
removed the change made in http://lkml.org/lkml/2005/6/23/266 and
activated some more kernel debug, i.e., CONFIG_KALLSYMS_ALL,
CONFIG_DEBUG_KERNEL, CONFIG_DETECT_SOFTLOCKUP, CONFIG_DEBUG_SLAB,
CONFIG_DEBUG_MUTEXES
Michael K. Edwards wrote:
Of course not. But dealing with a stuck IRQ line by locking up isn't
very practical either. IRQ sharing is stupid yet universal, and it
And we don't, that's why we have that "nobody cared" logic that disables
the interrupt line if no driver services the interrupt. T
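The "nobody cared" idea mentioned in the snippet above can be sketched roughly as follows. This is only an illustrative model, not the kernel's code: the struct, the function name note_irq_sketch() and both thresholds are hypothetical, picked to keep the example small. The line is shut off once nearly every interrupt in a window went unclaimed by any handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thresholds chosen for this sketch, not the kernel's values. */
#define SKETCH_WINDOW          1000u
#define SKETCH_UNHANDLED_LIMIT  990u

/* Per-line bookkeeping (hypothetical layout). */
struct irq_line_sketch {
    unsigned int count;      /* interrupts seen in the current window */
    unsigned int unhandled;  /* of those, how many no handler claimed */
    bool disabled;           /* set once the line has been shut off   */
};

/*
 * Record one interrupt on the line. 'handled' is true if some driver
 * claimed it. Returns true while the line is still enabled.
 */
static bool note_irq_sketch(struct irq_line_sketch *line, bool handled)
{
    if (line->disabled)
        return false;

    line->count++;
    if (!handled)
        line->unhandled++;

    /* Only judge the line once a full window has been observed. */
    if (line->count < SKETCH_WINDOW)
        return true;

    /* Almost nothing was claimed: treat the line as stuck. */
    if (line->unhandled > SKETCH_UNHANDLED_LIMIT)
        line->disabled = true;

    /* Start a fresh window either way. */
    line->count = 0;
    line->unhandled = 0;
    return !line->disabled;
}
```

A line shared with a busy but well-behaved device stays enabled under this scheme, because enough of its interrupts are claimed in every window.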
On 2/19/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
How do you propose to do this? Drivers can get loaded and unloaded at any
time. If you have a device generating spurious interrupts on a shared IRQ
line, there's no way you can use any device on that line until that interrupt
is shut off. Requ
Michael K. Edwards wrote:
Still open, though it's a pity you're more interested in my flawed
understanding than in the possibility that the kernel could be
systematically made more robust against hardware bugs and coding
errors by the simple expedient of putting all the ISRs in before
turning on
On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote:
This can't happen because when __do_irq unmasks the interrupt source,
the CPU mask is set, thereby preventing any further interrupt exceptions
being taken. This is done precisely to prevent this situation happening.
If you are seeing recursion
On Mon, Feb 19, 2007 at 04:04:26PM -0800, Michael K. Edwards wrote:
> On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote:
> >The second interrupt comes in, and when you go to disable that
> >source, you inadvertently re-enable the UART interrupt, despite it
> >still being serviced.
>
> Incorrect.
On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote:
I think something else is going on here. I think you're getting
an interrupt for the UART, and another interrupt is also pending.
Correct. An interrupt for the other UART on the same IRQ.
When the UART interrupt is handled, it is masked at
On Mon, Feb 19, 2007 at 02:16:41PM -0800, Michael K. Edwards wrote:
> Right. But as soon as you turn the source back on, in the postamble
> of the interrupt dispatch handler, it fires again. At least on ARM,
> that gives you recursive hits to __irq_svc and a couple of nested
> calls within it.
I
On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote:
> setup_irq() is where things go wrong, at least for us, at least on
> 2.6.16.x. Interrupts are not disabled at the point in request_irq()
> when the interrupt controller is poked to enable the IRQ source. If
> you're lucky, and you're on an a
On Mon, Feb 19, 2007 at 01:24:17PM -0800, Michael K. Edwards wrote:
> On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote:
> >On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote:
> >> What we've seen on our embedded ARM is that enabling an interrupt that
> >> is shared between multipl
On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote:
On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote:
> What we've seen on our embedded ARM is that enabling an interrupt that
> is shared between multiple UARTs, at a stage when you have not set up
> all the data structures touche
On Mon, Feb 19, 2007 at 05:54:52PM +, Jose Goncalves wrote:
> Russell King wrote:
> Result is attached.
Right... in depth analysis follows.
[15423.650518] [] uart_startup+0x63/0xf4 equates to 0xc01ba49a, which
is indeed the instruction after the call to port->ops->startup.
The important code
On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote:
> What we've seen on our embedded ARM is that enabling an interrupt that
> is shared between multiple UARTs, at a stage when you have not set up
> all the data structures touched by the ISR and softirq, can have
> horrible conseque
What we've seen on our embedded ARM is that enabling an interrupt that
is shared between multiple UARTs, at a stage when you have not set up
all the data structures touched by the ISR and softirq, can have
horrible consequences, including soft lockups and fandangos on core.
You will be vulnerable
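The ordering hazard described in the snippet above can be modelled in a few lines. This is a toy simulation under stated assumptions, not driver code: all names here (port_sketch, shared_irq_sketch, enable_shared_irq_sketch) are hypothetical, and a "fault" counter stands in for what would be a bad dereference in a real ISR. A pending interrupt is assumed to fire as soon as the shared source is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One UART's driver state (hypothetical). 'ready' means fully set up. */
struct port_sketch {
    bool ready;
};

/* A single IRQ source shared by two ports (hypothetical). */
struct shared_irq_sketch {
    struct port_sketch *ports[2];
    bool enabled;
    bool pending;   /* a device is already asserting the line */
    int faults;     /* ISR touched a port that was not set up */
    int serviced;   /* ISR serviced a properly set-up port    */
};

/* The shared ISR walks every registered port, ready or not. */
static void shared_isr_sketch(struct shared_irq_sketch *irq)
{
    for (size_t i = 0; i < 2; i++) {
        struct port_sketch *p = irq->ports[i];

        if (p == NULL)
            continue;
        if (!p->ready)
            irq->faults++;    /* real code would use bad state here */
        else
            irq->serviced++;
    }
}

/* Enabling the source delivers any pending interrupt immediately. */
static void enable_shared_irq_sketch(struct shared_irq_sketch *irq)
{
    irq->enabled = true;
    if (irq->pending) {
        irq->pending = false;
        shared_isr_sketch(irq);
    }
}
```

In this model, enabling the source before the port is marked ready records a fault, while finishing the setup first lets the same pending interrupt be serviced cleanly.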
Russell King wrote:
> On Mon, Feb 19, 2007 at 04:29:39PM +, Jose Goncalves wrote:
>
>> Russell King wrote:
>>
>>> On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote:
>>>
>>>
(trimmed tie-fei.zang from the CC, added by mistake)
On Mon, Feb 19, 2007 at 0
On Mon, Feb 19, 2007 at 04:29:39PM +, Jose Goncalves wrote:
> Russell King wrote:
> > On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote:
> >
> >> (trimmed tie-fei.zang from the CC, added by mistake)
> >> On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote:
> >>
Russell King wrote:
> On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote:
>
>> (trimmed tie-fei.zang from the CC, added by mistake)
>> On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote:
>>
Neither did I, but introducing printk's through the function, we narrow
On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote:
> (trimmed tie-fei.zang from the CC, added by mistake)
> On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote:
> > > Neither did I, but introducing printk's through the function, we narrowed
> > > the problem to this part of
(trimmed tie-fei.zang from the CC, added by mistake)
On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote:
> > Neither did I, but introducing printk's through the function, we narrowed
> > the problem to this part of the code. And removing it makes the problem
> > go away. We inserted 37 pr
On Tue, Feb 20, 2007 at 02:24:42PM +, Frederik Deweerdt wrote:
> On Mon, Feb 19, 2007 at 01:45:39PM +, Russell King wrote:
> > On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote:
> > > (Sorry for the resend, I forgot to cc the list)
> > > Hi Russell,
> > >
> > > It seems tha
On Mon, Feb 19, 2007 at 01:45:39PM +, Russell King wrote:
> On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote:
> > (Sorry for the resend, I forgot to cc the list)
> > Hi Russell,
> >
> > It seems that the following change in drivers/serial/8250.c
> >
> > +
> > + /*
> > +
On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote:
> (Sorry for the resend, I forgot to cc the list)
> Hi Russell,
>
> It seems that the following change in drivers/serial/8250.c
>
> +
> + /*
> + * Do a quick test to see if we receive an
> + * interrupt when we enabl
(Sorry for the resend, I forgot to cc the list)
Hi Russell,
It seems that the following change in drivers/serial/8250.c
+
+ /*
+ * Do a quick test to see if we receive an
+ * interrupt when we enable the TX irq.
+ */
+ serial_outp(up, UART_IER, UART_IER_THRI);
+
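The quoted hunk is cut off after the first register write, so the rest of the real test is not shown here. As a rough illustration of the idea in its comment (enable the TX interrupt, then see whether one is reported), here is a self-contained model. Everything in it is an assumption made for the sketch: the uart_model struct, the helper names, and the irq_on_enable flag that distinguishes a well-behaved UART from a quirky one. The bit values follow the usual 16550 register convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values following the common 16550 layout (assumed for this model). */
#define MODEL_IER_THRI   0x02  /* enable "transmit holding register empty" irq */
#define MODEL_IIR_NO_INT 0x01  /* set when no interrupt is pending */
#define MODEL_IIR_THRI   0x02  /* pending interrupt is the TX-empty one */

/* Toy UART: not real hardware access, just enough state for the test. */
struct uart_model {
    uint8_t ier;          /* interrupt enable register */
    bool tx_empty;        /* transmitter currently idle */
    bool irq_on_enable;   /* hypothetical: raises THRI as soon as enabled */
};

static void model_write_ier(struct uart_model *u, uint8_t value)
{
    u->ier = value;
}

static uint8_t model_read_iir(const struct uart_model *u)
{
    if ((u->ier & MODEL_IER_THRI) && u->tx_empty && u->irq_on_enable)
        return MODEL_IIR_THRI;
    return MODEL_IIR_NO_INT;
}

/*
 * Enable the TX interrupt, check whether one is reported, then
 * disable it again. Returns true if an interrupt was seen.
 */
static bool tx_irq_seen_on_enable(struct uart_model *u)
{
    uint8_t iir;

    model_write_ier(u, MODEL_IER_THRI);
    iir = model_read_iir(u);
    model_write_ier(u, 0);

    return !(iir & MODEL_IIR_NO_INT);
}
```

A driver could use the result of such a probe to decide whether it must kick the transmitter by hand; whether the actual patch does that is not visible in the truncated hunk.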
42 matches