On Fri, 12 Jan 2001, Frank de Lange wrote:
[I've cut syslog junk away for clarity -- you could just do `dmesg -s 32768'.]
> before network hang
> ===
[...]
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[...]
> 13 0FF 0F 0 1 0 1 0 1 1 99
[...]
> printing
Linus Torvalds wrote:
>
> On Sat, 13 Jan 2001, Frank de Lange wrote:
>
> IDE is not my favourite example of a "known stable driver". Also, in many
> cases IDE is for historical reasons connected to an EDGE io-apic pin (ie
> it's still considered an ISA interrupt). Which probably wouldn't show th
On Fri, Jan 12, 2001 at 12:04:21PM -0800, Linus Torvalds wrote:
> Ok, so it's tentatively the IOAPIC disable/enable code. But it could
> obviously be something that just interacts with it, including just a
> timing issue (ie the _real_ bug might just be bad behaviour when
> changing IO-APIC state
On Sun, Jan 14, 2001 at 12:13:58AM +, Roeland Th. Jansen wrote:
> On Fri, Jan 12, 2001 at 09:03:49PM +0100, Ingo Molnar wrote:
> > well, some time ago i had an ne2k card in an SMP system as well, and found
> > this very problem. Disabling/enabling focus-cpu appeared to make a
> > difference, b
On Fri, Jan 12, 2001 at 09:03:49PM +0100, Ingo Molnar wrote:
> well, some time ago i had an ne2k card in an SMP system as well, and found
> this very problem. Disabling/enabling focus-cpu appeared to make a
> difference, but later on i made experiments that show that in both cases
> the hang happe
Andrew Morton writes:
> Linus Torvalds wrote:
> > I'm also nervous about the complete lack of locking in vortex_timer():
> > disabling interrupts doesn't mean that transmits couldn't be
> > pending. But maybe the hardware is ok with changing status concurrently.
>
> disable_irq() is very useful i
Linus Torvalds wrote:
>
> I'm also nervous about the complete lack of locking in vortex_timer():
> disabling interrupts doesn't mean that transmits couldn't be
> pending. But maybe the hardware is ok with changing status concurrently.
>
mm.. It's a little racy wrt vortex_ioctl(), but otherwise
On Sat, 13 Jan 2001, Andrew Morton wrote:
>
> 3c59x calls disable_irq() once per minute, and seems to be
> one of the most-affected drivers.
The ne2k thing seems to be the _most_ affected one, as far as I can tell.
However, it could easily be a matter of timing - for example, if the
driver do
On Sat, Jan 13, 2001 at 02:51:54AM +0100, Manfred Spraul wrote:
> Frank de Lange wrote:
> >
> > It could be that people using those cards are not the ones who tend
> > to go for the (somewhat tricky) BP6 board...
> >
>
> I doubt that it's BP6 specific: I have the problem with a Gigabyte BXD
> b
Linus Torvalds wrote:
>
> On Sat, 13 Jan 2001, Frank de Lange wrote:
>
> > On Fri, Jan 12, 2001 at 04:36:33PM -0800, Linus Torvalds wrote:
> > > It may well not be disable_irq() that is buggy. In fact, there's good
> > > reason to believe that it's a hardware problem.
> >
> > I am inclined to be
Alan Cox wrote:
>
> > Could you disable both bandaids? I disabled them, no problems so far.
> > Now back to the disable_irq_nosync().
>
> Ok so it looks like the disable_irq code is buggy. Unfortunately it's not
> just used for these drivers; they are just the heaviest users.
>
> Given that we ca
Frank de Lange wrote:
>
> It could be that people using those cards are not the ones who tend
> to go for the (somewhat tricky) BP6 board...
>
I doubt that it's BP6 specific: I have the problem with a Gigabyte BXD
board and I doubt that Ingo used a BP6. Perhaps 82093AA specific (the
IO APIC ch
On Fri, Jan 12 2001, Linus Torvalds wrote:
> [...] With disks it is very hard
> to get the same kind of irq load - Linux will merge the requests and do at
> least 1kB worth of transfer per interrupt etc. On a ne2k 100Mbps PCI card,
Actually, without mult count you will do only 512b of I/O per int
Linus Torvalds wrote:
>
> It may well not be disable_irq() that is buggy. In fact, there's good
> reason to believe that it's a hardware problem.
>
Perhaps a problem with the 82093AA external IO APIC used for 440BX
board? I haven't seen any reports from newer Intel boards (the ICH2
includes an I
On Fri, Jan 12, 2001 at 04:56:24PM -0800, Linus Torvalds wrote:
> IDE is not my favourite example of a "known stable driver". Also, in many
> cases IDE is for historical reasons connected to an EDGE io-apic pin (ie
> it's still considered an ISA interrupt). Which probably wouldn't show this
> prob
On Sat, 13 Jan 2001, Frank de Lange wrote:
> On Fri, Jan 12, 2001 at 04:36:33PM -0800, Linus Torvalds wrote:
> > It may well not be disable_irq() that is buggy. In fact, there's good
> > reason to believe that it's a hardware problem.
>
> I am inclined to believe it IS a hardware problem... If
On Sat, 13 Jan 2001, Alan Cox wrote:
> > interrupt_handler()
> > {
> > status = readl(dev->status);
> > if (status & MY_IRQ_DISABLE)
> > return;
>
> Unfortunately on the 8390 the IRQ status register is on page 0. The code
> on the other CPU m
On Fri, Jan 12, 2001 at 04:36:33PM -0800, Linus Torvalds wrote:
> It may well not be disable_irq() that is buggy. In fact, there's good
> reason to believe that it's a hardware problem.
I am inclined to believe it IS a hardware problem... If disable_irq were buggy,
wouldn't the problem occur more
> interrupt_handler()
> {
> status = readl(dev->status);
> if (status & MY_IRQ_DISABLE)
> return;
Unfortunately on the 8390 the IRQ status register is on page 0. The code
on the other CPU might not be on page 0. That means we can't eve
On Fri, 12 Jan 2001, Alan Cox wrote:
> > Could you disable both bandaids? I disabled them, no problems so far.
> > Now back to the disable_irq_nosync().
>
> Ok so it looks like the disable_irq code is buggy. Unfortunately it's not
> just used for these drivers; they are just the heaviest users.
On Fri, 12 Jan 2001, Alan Cox wrote:
> > interrupt controllers (io-apic definitely included). Drivers would
> > generally be better off if they disabled their own chip from sending
> > interrupts, rather than disabling the interrupt line the chip is on.
>
> That doesn't work very well becaus
Alan Cox wrote:
>
> > Could you disable both bandaids? I disabled them, no problems so far.
> > Now back to the disable_irq_nosync().
>
> Ok so it looks like the disable_irq code is buggy. Unfortunately it's not
> just used for these drivers; they are just the heaviest users.
>
> Given that we ca
> Could you disable both bandaids? I disabled them, no problems so far.
> Now back to the disable_irq_nosync().
Ok so it looks like the disable_irq code is buggy. Unfortunately it's not
just used for these drivers; they are just the heaviest users.
Given that we can see the IRQ is still set on the
> interrupt controllers (io-apic definitely included). Drivers would
> generally be better off if they disabled their own chip from sending
> interrupts, rather than disabling the interrupt line the chip is on.
That doesn't work very well because the device irq can arrive a measurable
number of
> Remind me: what polarity are your io-apic irq's? Level, edge, sideways?
> Anything else that might be relevant?
Well, sideways of course! :-)
here's a cat /proc/interrupts from the (BP6) box:
CPU0 CPU1
0: 104936 105433 IO-APIC-edge timer
1:
On Fri, Jan 12, 2001 at 11:59:25AM -0800, Linus Torvalds wrote:
> > Could this really be the solution?
>
> I'd like to know _which_ of the two makes a difference (or does it only
> trigger with both of them enabled)? And even then I'm not sure that it is
> "the" solution - both changes to io-apic
In article <[EMAIL PROTECTED]>,
Frank de Lange <[EMAIL PROTECTED]> wrote:
>As per Linus' suggestion, I removed the disable_irq/enable_irq statements from
>the 8390 core driver, and replaced the spinlocks with irq-safe versions. This
>seems to solve the network hangs, as I am currently running a he
On Fri, Jan 12, 2001 at 08:33:15PM +0100, Manfred Spraul wrote:
> Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:
>
> * From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
> * From myself: TARGET_CPU = cpu_online_mask, was 0xFF.
>
> Could you disable both bandaids? I di
Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> > I removed the disable_irq lines from 8390.c, and that fixed the problem:
> > no hang within 2 minutes - the test is still running.
> >
> > Frank, could you double check it?
>
> I'm currently running my
On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
>
> Frank, could you double check it?
I'm currently running my own patched version, which uses
spin_l
On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> Linus wrote:
> > Does this seem to happen mainly with drivers that use "disable_irq()"
> > and "enable_irq()"? I know the ne drivers do (through the 8390 module),
> > and some others do too (3c59x).
>
> I removed the disable_irq
As per Linus' suggestion, I removed the disable_irq/enable_irq statements from
the 8390 core driver, and replaced the spinlocks with irq-safe versions. This
seems to solve the network hangs, as I am currently running a heavy network
load (which would have killed a non-patched driver within seconds)
Linus wrote:
> Does this seem to happen mainly with drivers that use "disable_irq()"
> and "enable_irq()"? I know the ne drivers do (through the 8390 module),
> and some others do too (3c59x).
I removed the disable_irq lines from 8390.c, and that fixed the problem:
no hang within 2 minutes - t
On Fri, 12 Jan 2001, Manfred Spraul wrote:
> 2.4 spreads the vectors for the external (hardware, from io apic)
> interrupts, but 5 ipi vectors have the same priority: reschedule, call
> function, tlb invalidate, apic error, spurious interrupt.
my reading of the errata is that the lost APIC time
Ingo Molnar wrote:
>
> we *already* reorder vector numbers and spread them out as much as
> possible. We do this in 2.2 as well. We did this almost from day 1 of
> IO-APIC support. If any manually allocated IRQ vector creates a '3 vectors
> in the same 16-vector region' situation then that's a bug
In article <[EMAIL PROTECTED]>,
Manfred Spraul <[EMAIL PROTECTED]> wrote:
>The processor's local APIC includes an in-service entry and a holding
>entry for each priority level. To avoid losing interrupts, software
>should allocate no more than 2 interrupt vectors per priority.
>
>
>Ok, we
On Fri, Jan 12, 2001 at 06:51:36PM +0100, Manfred Spraul wrote:
> Frank, I've attached a proposed kick_IOAPIC pin. Could you try it?
> I'm rebooting with that patch right now.
I added the patch, and tried it out. When the network hangs, I am able to revive it
with ALT-SYSRQ-Q. The debug log show
On Fri, 12 Jan 2001, Manfred Spraul wrote:
> The PPro local apic documentation says:
> <<<
> The processor's local APIC includes an in-service entry and a holding
> entry for each priority level. To avoid losing interrupts, software
> should allocate no more than 2 interrupt vectors per prio
Alan Cox wrote:
>
> > Frank, could you try what happens with the NMI oopser disabled?
> >
> > The second major difference I'm immediately aware of is the number of
> > the reschedule/tlb flush/etc interrupt: 2.2 uses the lowest priority,
> > 2.4 the highest priority.
>
> I'm trying to remember wh
Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 06:16:36PM +0100, Manfred Spraul wrote:
> > I would first concentrate on the differences between 2.2 and 2.4:
> >
> > Frank, could you try what happens with the NMI oopser disabled?
>
> Here's the results with nmi_watchdog=0
>
>
> After network
> Frank, could you try what happens with the NMI oopser disabled?
>
> The second major difference I'm immediately aware of is the number of
> the reschedule/tlb flush/etc interrupt: 2.2 uses the lowest priority,
> 2.4 the highest priority.
I'm trying to remember what they were, but some APIC vers
On Fri, Jan 12, 2001 at 06:16:36PM +0100, Manfred Spraul wrote:
> I would first concentrate on the differences between 2.2 and 2.4:
>
> Frank, could you try what happens with the NMI oopser disabled?
Here's the results with nmi_watchdog=0
Before network hang (nmi_watchdog=0)
===
>
> [EMAIL PROTECTED] said:
> > IRR for interrupt 19 is set, that means the IO APIC has sent the
> > interrupt to a cpu but not yet received the corresponding EOI.
>
> OK, but couldn't we reset it by sending an extra EOI when the drivers
> decide that they've missed interrupts?
How?
You se
[EMAIL PROTECTED] said:
> IRR for interrupt 19 is set, that means the IO APIC has sent the
> interrupt to a cpu but not yet received the corresponding EOI.
OK, but couldn't we reset it by sending an extra EOI when the drivers
decide that they've missed interrupts?
--
dwmw2
Let's decode it:
> IO APIC #2..
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> 12 0FF 0F 0 1 0 1 0 1 1 91
> 13 0FF 0F 0 1 1 1 0 1 1 99
IRR for interrupt 19 is set, that means the IO APIC has sent the
interrupt to a cpu but not yet received the corresponding EOI.
That bit is read
On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote:
> Here is a debugging patch. Could you please apply this,
> rebuild and:
>
> 1: Type ALT-SYSRQ-A when everything is good
> 2: Type ALT-SYSRQ-A when everything is bad
> 3: send the resulting logs.
And, for completeness' sake, here's
On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote:
> Here is a debugging patch. Could you please apply this,
> rebuild and:
>
> 1: Type ALT-SYSRQ-A when everything is good
> 2: Type ALT-SYSRQ-A when everything is bad
> 3: send the resulting logs.
OK, here's the results I get...
Bef
On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote:
> Frank de Lange wrote:
> >
> > Quick and dirty conclusion: as soon as the apic comes in to play, things get
> > messy...
> Here is a debugging patch. Could you please apply this,
> rebuild and:
>
> 1: Type ALT-SYSRQ-A when everythi
[EMAIL PROTECTED] said:
> No, I'm judging based on the fact that I found reports from people
> using NE2K-PCI with several cards as well as tulip-based cards
> (different driver) on abit BP6 as well as Gigabyte motherboards,
> mostly on 2.3.x/2.4.x kernels. I found some postings with these
> pro
Frank de Lange wrote:
>
> Quick and dirty conclusion: as soon as the apic comes in to play, things get
> messy...
Yup.
Frank, for over a year there have been sporadic reports
of APIC's forgetting how to deliver interrupts. Not only
on BP6's. Often with 3com NICs, so I've never been 100% sure
On Thu, Jan 11, 2001 at 02:23:53PM -0500, Jeff Garzik wrote:
> Just out of curiosity, if you boot a Linux 2.4.0 kernel with the
> "noapic" command line option, does behavior improve?
For the curious, here's a summary of some tests I did:
apic, 2 cpu's, no smp affinity -> network hangs under load
On Thu, Jan 11, 2001 at 04:47:00PM -0500, Jeff Garzik wrote:
> Are you judging based on the error message? The 'netdev watchdog ...'
> message is a generic error message that could have any number of
> causes. It's just saying, well, what it says :) The kernel was unable
> to transmit a packet
Frank de Lange wrote:
>
> OK, just one last addition to what has nearly become my own thread...
>
> I now am fairly certain that the problem (network stalls on multiprocessor systems)
>is not BP6 or NE2K-PCI specific. I found several postings which relate to similar
>problems on dissimilar har
OK, just one last addition to what has nearly become my own thread...
I now am fairly certain that the problem (network stalls on multiprocessor systems) is
not BP6 or NE2K-PCI specific. I found several postings which relate to similar
problems on dissimilar hardware. Another interesting one is
Hm, the noapic option seems to help, as I'm currently beating the network to
death but it won't die... As the problem is elusive, it is hard to tell, and it
would not surprise me if the net dropped dead the moment this mail went
through, but current indication is that noapic makes the sudden net-d
Another observation wrt. behaviour with 'noapic'...
When streaming time-critical data over the network (running esound to another
server, etc), sometimes there are hiccups in the stream. These hiccups seem to
be much less frequent, if at all present, when running with 'noapic'. I'm
currently runn
Here's another posting to the list which mentions problems with NE2K and BP6:
http://web.gnu.walfield.org/mail-archive/linux-kernel/2000-August/0132.html
"...In another machine, a dual celeron abit-bp6, recent 2.3.x kernels seem to
dislike my realtek 8029 NIC. (I know, it's garbage plugged in t
> Do you get any transmit timeout messages in the logs? If
> so, send them.
In addition to my previous message, here's what I get from the debug log
facility:
Jan 10 22:56:51 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:51 behemoth kernel: eth0: Tx timed out, lost in
On Thu, Jan 11, 2001 at 10:48:23PM +1100, Andrew Morton wrote:
> Losing both NICs at the same time could be the elusive "APIC
> stops generating interrupts" problem.
Yup, that's what I thought... But the real question is, is this a
software/configuration problem or a hardware problem which can on
Frank de Lange wrote:
>
> Hi'all,
>
> Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
> clones) in my BP-6 system, I've been experiencing intermittent network hangs. A
> hang manifests itself as a total failure to communicate through either network
> card, and can only
On Wed, Jan 10, 2001 at 11:21:49PM +0100, Manfred Spraul wrote:
> > which should work, they are
> > NON-busmastering cards after all...),
> third line in w840_probe1():
>
> pci_set_master().
>
> And the documentation begins with
> W89C840F
> PCI Bus Master Fast Ethernet LAN Controlle
On Wed, Jan 10, 2001 at 11:21:49PM +0100, Manfred Spraul wrote:
> Which driver do you use? The driver in 2.4.0 contains several bugfixes.
> If that driver still hangs then I'll double check the documentation.
The NE2K PCI one... I'll try to fiddle around with the driver, who knows...
> And the d
Frank de Lange wrote:
>
> Hi'all,
>
> Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
> clones) in my BP-6 system, I've been experiencing intermittent network hangs.
>
Which driver do you use? The driver in 2.4.0 contains several bugfixes.
If that driver still hangs th
Hi'all,
Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
clones) in my BP-6 system, I've been experiencing intermittent network hangs. A
hang manifests itself as a total failure to communicate through either network
card, and can only be solved by rebooting. Removing and