subject:"Serial related oops"

Re: Serial related oops

2007-03-01 Thread Jose Goncalves

Russell King wrote: > On Thu, Mar 01, 2007 at 01:33:28PM +, Jose Goncalves wrote: > >> I've also done your suggestion and I've inserted "msleep(10);" just >> before the "And clear the interrupt registers again for luck." and my >> application is now running without problems for more than 24

Re: Serial related oops

2007-03-01 Thread Russell King

On Thu, Mar 01, 2007 at 01:33:28PM +, Jose Goncalves wrote: > I've also done your suggestion and I've inserted "msleep(10);" just > before the "And clear the interrupt registers again for luck." and my > application is now running without problems for more than 24H! So, > inserting a delay in

Re: Serial related oops

2007-03-01 Thread Jose Goncalves

Hi again Russell, I'm back, after some more testing. Here goes my report. I've switched to another SBC and the kernel still oopses, so it is not a one-off fault on the hardware. I've also run memtest86+ on this board for the maximum period that I reach an Oops with my application (24 H) and it not det

Re: Serial related oops

2007-02-23 Thread Michael K. Edwards

Russell, thanks again for offering to look at this; the more oopses and soft lockups I see on this board, the more I think you're right and we have an IRQ handling race. Here's the struct irqchip setup: /* mask irq, refer to section 2.6 under chip 8618 document */ static void mv88w8xx8_mask_irq(un

Re: Serial related oops

2007-02-22 Thread Paul Fulghum

On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote: What I find real hard to understand is why a hardware fault happens always in the same software instruction! I would expect a hardware fault to hit randomly... I've experienced just such a hardware fault. The Infineon DSCC4 serial

Re: Serial related oops

2007-02-22 Thread jose . goncalves

Quoting Russell King <[EMAIL PROTECTED]>: On Thu, Feb 22, 2007 at 03:07:18PM +, Jose Goncalves wrote: Russell King wrote: > On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote: > >> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things >> to us, at least on an

Re: Serial related oops

2007-02-22 Thread jose . goncalves

Quoting Russell King <[EMAIL PROTECTED]>: On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote: It could be a silly question (tamper with me as I'm not familiar with such low level programming), but couldn't it be possible for an interrupt to hit in the middle of the serial_in() calls

Re: Serial related oops

2007-02-22 Thread Russell King

On Thu, Feb 22, 2007 at 03:02:46PM +, Jose Goncalves wrote: > It could be a silly question (tamper with me as I'm not familiar with > such low level programming), but couldn't it be possible for an interrupt > to hit in the middle of the serial_in() calls and mess with %ebx? I'm no expert on x8

Re: Serial related oops

2007-02-22 Thread Russell King

On Thu, Feb 22, 2007 at 03:07:18PM +, Jose Goncalves wrote: > Russell King wrote: > > On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote: > > > >> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things > >> to us, at least on an ARM target ... > >> > > > >

Re: Serial related oops

2007-02-22 Thread Jose Goncalves

Russell King wrote: > On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote: > >> Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things >> to us, at least on an ARM target ... >> > > That's ruled out. Please think about it for a moment - serial_in() > managed t

Re: Serial related oops

2007-02-22 Thread Jose Goncalves

Russell King wrote: > On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote: > >> <1>[18840.304048] Unable to handle kernel NULL pointer dereference at >> virtual address 0012 >> <1>[18840.313046] printing eip: >> <4>[18840.321687] c01bfa7a >> <1>[18840.321714] *pde = >>

Re: Serial related oops

2007-02-22 Thread Russell King

On Wed, Feb 21, 2007 at 04:34:15PM -0800, Michael K. Edwards wrote: > Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things > to us, at least on an ARM target ... That's ruled out. Please think about it for a moment - serial_in() managed to work correctly most of the time, and the

Re: Serial related oops

2007-02-22 Thread Russell King

On Wed, Feb 21, 2007 at 09:57:50PM -0800, H. Peter Anvin wrote: > Russell King wrote: > > > > >Plainly, %ebx changed across the call to serial_in() at c01c0f7b. > >First thing to notice is this violates the C code - "up" can not > >change. > > > >Now let's look at serial_in: > > > >c01bfa70:

Re: Serial related oops

2007-02-21 Thread Frederik Deweerdt

On Wed, Feb 21, 2007 at 09:57:50PM -0800, H. Peter Anvin wrote: > Russell King wrote: > > >Plainly, %ebx changed across the call to serial_in() at c01c0f7b. > >First thing to notice is this violates the C code - "up" can not > >change. > >Now let's look at serial_in: > >c01bfa70: 55

Re: Serial related oops

2007-02-21 Thread H. Peter Anvin

Russell King wrote: Plainly, %ebx changed across the call to serial_in() at c01c0f7b. First thing to notice is this violates the C code - "up" can not change. Now let's look at serial_in: c01bfa70: 55 push %ebp c01bfa71: 89 e5 mov%esp,%

Re: Serial related oops

2007-02-21 Thread Michael K. Edwards

Are you using an unpatched gcc 4.1.1? Its optimizer did nasty things to us, at least on an ARM target ...

Re: Serial related oops

2007-02-21 Thread Russell King

On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote: > <1>[18840.304048] Unable to handle kernel NULL pointer dereference at virtual > address 0012 > <1>[18840.313046] printing eip: > <4>[18840.321687] c01bfa7a > <1>[18840.321714] *pde = > <0>[18840.331287] Oops: [#1]

Re: Serial related oops

2007-02-21 Thread Frederik Deweerdt

On Wed, Feb 21, 2007 at 02:13:15PM +, Jose Goncalves wrote: > New developments. > I have upgraded to 2.6.16.41, applied a patch sent by Frederik that > removed the change made in http://lkml.org/lkml/2005/6/23/266 and > activated some more kernel debug, i.e., CONFIG_KALLSYMS_ALL, > CONFIG_DEBUG

Re: Serial related oops

2007-02-21 Thread Jose Goncalves

Jose Goncalves wrote: > New developments. > I have upgraded to 2.6.16.41, applied a patch sent by Frederik that > removed the change made in http://lkml.org/lkml/2005/6/23/266 and > activated some more kernel debug, i.e., CONFIG_KALLSYMS_ALL, > CONFIG_DEBUG_KERNEL, CONFIG_DETECT_SOFTLOCKUP, CONFIG_

Re: Serial related oops

2007-02-21 Thread Jose Goncalves

New developments. I have upgraded to 2.6.16.41, applied a patch sent by Frederik that removed the change made in http://lkml.org/lkml/2005/6/23/266 and activated some more kernel debug, i.e., CONFIG_KALLSYMS_ALL, CONFIG_DEBUG_KERNEL, CONFIG_DETECT_SOFTLOCKUP, CONFIG_DEBUG_SLAB, CONFIG_DEBUG_MUTEXES

Re: Serial related oops

2007-02-19 Thread Robert Hancock

Michael K. Edwards wrote: Of course not. But dealing with a stuck IRQ line by locking up isn't very practical either. IRQ sharing is stupid yet universal, and it And we don't, that's why we have that "nobody cared" logic that disables the interrupt line if no driver services the interrupt. T

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards

On 2/19/07, Robert Hancock <[EMAIL PROTECTED]> wrote: How do you propose to do this? Drivers can get loaded and unloaded at any time. If you have a device generating spurious interrupts on a shared IRQ line, there's no way you can use any device on that line until that interrupt is shut off. Requ

Re: Serial related oops

2007-02-19 Thread Robert Hancock

Michael K. Edwards wrote: Still open, though it's a pity you're more interested in my flawed understanding than in the possibility that the kernel could be systematically made more robust against hardware bugs and coding errors by the simple expedient of putting all the ISRs in before turning on

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards

On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: This can't happen because when __do_irq unmasks the interrupt source, the CPU mask is set, thereby preventing any further interrupt exceptions being taken. This is done precisely to prevent this situation happening. If you are seeing recursion

Re: Serial related oops

2007-02-19 Thread Russell King

On Mon, Feb 19, 2007 at 04:04:26PM -0800, Michael K. Edwards wrote: > On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: > >The second interrupt comes in, and when you go to disable that > >source, you inadvertently re-enable the UART interrupt, despite it > >still being serviced. > > Incorrect.

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards

On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: I think something else is going on here. I think you're getting an interrupt for the UART, and another interrupt is also pending. Correct. An interrupt for the other UART on the same IRQ. When the UART interrupt is handled, it is masked at

Re: Serial related oops

2007-02-19 Thread Russell King

On Mon, Feb 19, 2007 at 02:16:41PM -0800, Michael K. Edwards wrote: > Right. But as soon as you turn the source back on, in the postamble > of the interrupt dispatch handler, it fires again. At least on ARM, > that gives you recursive hits to __irq_svc and a couple of nested > calls within it. I

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards

On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: > setup_irq() is where things go wrong, at least for us, at least on > 2.6.16.x. Interrupts are not disabled at the point in request_irq() > when the interrupt controller is poked to enable the IRQ source. If > you're lucky, and you're on an a

Re: Serial related oops

2007-02-19 Thread Russell King

On Mon, Feb 19, 2007 at 01:24:17PM -0800, Michael K. Edwards wrote: > On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: > >On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote: > >> What we've seen on our embedded ARM is that enabling an interrupt that > >> is shared between multipl

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards

On 2/19/07, Russell King <[EMAIL PROTECTED]> wrote: On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote: > What we've seen on our embedded ARM is that enabling an interrupt that > is shared between multiple UARTs, at a stage when you have not set up > all the data structures touche

Re: Serial related oops

2007-02-19 Thread Russell King

On Mon, Feb 19, 2007 at 05:54:52PM +, Jose Goncalves wrote: > Russell King wrote: > Result is attached. Right... in depth analysis follows. [15423.650518] [] uart_startup+0x63/0xf4 equates to 0xc01ba49a, which is indeed the instruction after the call to port->ops->startup. The important code

Re: Serial related oops

2007-02-19 Thread Russell King

On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote: > What we've seen on our embedded ARM is that enabling an interrupt that > is shared between multiple UARTs, at a stage when you have not set up > all the data structures touched by the ISR and softirq, can have > horrible conseque

Re: Serial related oops

2007-02-19 Thread Michael K. Edwards

What we've seen on our embedded ARM is that enabling an interrupt that is shared between multiple UARTs, at a stage when you have not set up all the data structures touched by the ISR and softirq, can have horrible consequences, including soft lockups and fandangos on core. You will be vulnerable

Re: Serial related oops

2007-02-19 Thread Jose Goncalves

Russell King wrote: > On Mon, Feb 19, 2007 at 04:29:39PM +, Jose Goncalves wrote: > >> Russell King wrote: >> >>> On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote: >>> >>> (trimmed tie-fei.zang from the CC, added by mistake) On Mon, Feb 19, 2007 at 0

Re: Serial related oops

2007-02-19 Thread Russell King

On Mon, Feb 19, 2007 at 04:29:39PM +, Jose Goncalves wrote: > Russell King wrote: > > On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote: > > > >> (trimmed tie-fei.zang from the CC, added by mistake) > >> On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote: > >>

Re: Serial related oops

2007-02-19 Thread Jose Goncalves

Russell King wrote: > On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote: > >> (trimmed tie-fei.zang from the CC, added by mistake) >> On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote: >> Neither did I, but introducing printk's through the function, we narrow

Re: Serial related oops

2007-02-19 Thread Russell King

On Tue, Feb 20, 2007 at 02:48:14PM +, Frederik Deweerdt wrote: > (trimmed tie-fei.zang from the CC, added by mistake) > On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote: > > > Neither did I, but introducing printk's through the function, we narrowed > > > the problem to this part of

Re: Serial related oops

2007-02-19 Thread Frederik Deweerdt

(trimmed tie-fei.zang from the CC, added by mistake) On Mon, Feb 19, 2007 at 02:35:20PM +, Russell King wrote: > > Neither did I, but introducing printk's through the function, we narrowed > > the problem to this part of the code. And removing it makes the problem > > go away. We inserted 37 pr

Re: Serial related oops

2007-02-19 Thread Russell King

On Tue, Feb 20, 2007 at 02:24:42PM +, Frederik Deweerdt wrote: > On Mon, Feb 19, 2007 at 01:45:39PM +, Russell King wrote: > > On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote: > > > (Sorry for the resend, I forgot to cc the list) > > > Hi Russell, > > > > > > It seems tha

Re: Serial related oops

2007-02-19 Thread Frederik Deweerdt

On Mon, Feb 19, 2007 at 01:45:39PM +, Russell King wrote: > On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote: > > (Sorry for the resend, I forgot to cc the list) > > Hi Russell, > > > > It seems that the following change in drivers/serial/8250.c > > > > + > > + /* > > +

Re: Serial related oops

2007-02-19 Thread Russell King

On Tue, Feb 20, 2007 at 01:29:09PM +, Frederik Deweerdt wrote: > (Sorry for the resend, I forgot to cc the list) > Hi Russell, > > It seems that the following change in drivers/serial/8250.c > > + > + /* > + * Do a quick test to see if we receive an > + * interrupt when we enabl

Serial related oops

2007-02-19 Thread Frederik Deweerdt

(Sorry for the resend, I forgot to cc the list) Hi Russell, It seems that the following change in drivers/serial/8250.c + + /* +* Do a quick test to see if we receive an +* interrupt when we enable the TX irq. +*/ + serial_outp(up, UART_IER, UART_IER_THRI); +

42 matches
