Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-17 Thread Jarek Poplawski
On Fri, Apr 06, 2007 at 07:19:25PM +0100, Christian Kujau wrote:
> On Wed, 4 Apr 2007, Christian Kujau wrote:
> >>Maybe it's a real locking problem. Here are some more
> >>suggestions for testing (if you don't find anything better):
> >>- try without SMP, so: 'acpi=off lapic nosmp'
>
> We were abl

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-06 Thread Christian Kujau
On Fri, 6 Apr 2007, Christian Kujau wrote: but yes, these seem to be different problems; for the curious among you I've put details here: http://nerdbynature.de/bits/2.6.20.4/db2/ That's http://nerdbynature.de/bits/2.6.20.4/db1/2/, sorry. -- BOFH excuse #270: Someone has messed up the kerne

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-06 Thread Christian Kujau
On Wed, 4 Apr 2007, Christian Kujau wrote: Maybe it's a real locking problem. Here are some more suggestions for testing (if you don't find anything better): - try without SMP, so: 'acpi=off lapic nosmp' We were able to have our hosting provider replace the 8139too with an E100, the onboard

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-05 Thread Jarek Poplawski
On Wed, Apr 04, 2007 at 02:20:23PM +0100, Christian Kujau wrote:
> On Wed, 4 Apr 2007, Jarek Poplawski wrote:
> >So, it's a lot sooner than before. (BTW, isn't there anything
> >in debug log?)
>
> No, nothing. I've set up remote-syslogging to the other node (node1
> logging to node2 and vice versa

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Francois Romieu
Christian Kujau <[EMAIL PROTECTED]> :
[...]
> Actually I was thinking about *using* netconsole, since even setting up
> remote (userspace-)syslog left nothing on the syslog-server, when the
> machine crashed. But if it's b0rked in 8139, I will refrain from doing
> so.
Please refrain :o) No ser

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Denys
IMHO it can be a hardware issue also; I had something very similar with faulty hardware combinations.
On Wed, 4 Apr 2007 13:21:00 +0200, Jarek Poplawski wrote
> On Tue, Apr 03, 2007 at 04:19:46PM +0100, Christian Kujau wrote:
> > On Tue, 3 Apr 2007, Jarek Poplawski wrote:
> > >Did you try with 8139

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Christian Kujau
On Wed, 4 Apr 2007, Jarek Poplawski wrote: So, it's a lot sooner than before. (BTW, isn't there anything in debug log?) No, nothing. I've set up remote-syslogging to the other node (node1 logging to node2 and vice versa) - nothing :( I see both CPUs did interrupt handling again. Yes, when

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Christian Kujau
On Tue, 3 Apr 2007, Francois Romieu wrote: Christian Kujau <[EMAIL PROTECTED]> : If the apic voodoo makes no difference, you can: 1 - leave it enabled Well, we tried to boot with ACPI compiled in again, but disabled during boot: - acpi=off lapic, crashed after 1h (almost exactly) of service

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-04 Thread Jarek Poplawski
On Tue, Apr 03, 2007 at 04:19:46PM +0100, Christian Kujau wrote:
> On Tue, 3 Apr 2007, Jarek Poplawski wrote:
> >Did you try with 8139cp instead of 8139too?
>
> Tried that, 8139cp could not be loaded :(
Sorry for misleading!
> >(Maybe even try some other card to narrow the problem?)
> >You could

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Francois Romieu
Christian Kujau <[EMAIL PROTECTED]> :
[...]
> Please see http://nerdbynature.de/bits/2.6.20.4/ for details for both
> hosts and feel free to ask for more details. Although both boxes are in
> production we'll be happy to test more bootoptions/patches and the like.
If the apic voodoo makes no differ

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Francois Romieu
Christian Kujau <[EMAIL PROTECTED]> :
> On Tue, 3 Apr 2007, Jarek Poplawski wrote:
> >Did you try with 8139cp instead of 8139too?
>
> Tried that, 8139cp could not be loaded :(
It is a different beast.
-- Ueimor

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Christian Kujau
On Tue, 3 Apr 2007, Jarek Poplawski wrote: Did you try with 8139cp instead of 8139too? Tried that, 8139cp could not be loaded :( (Maybe even try some other card to narrow the problem?) You could also try to test without ehci, if it's possible. USB has been disabled completely. After booting

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Christian Kujau
On Mon, 2 Apr 2007, Chuck Ebbert wrote: Where is the info from before you changed to "noapic"? Or were the machines always using XT-PIC for all the interrupts??? We booted with 'acpi=off lapic' (with ACPI options compiled in, to be able to boot with acpi=on later on) and the box locked up agai

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-03 Thread Christian Kujau
On Tue, 3 Apr 2007, Jarek Poplawski wrote: Did you try with 8139cp instead of 8139too? I forgot about that, thanks. (Maybe even try some other card to narrow the problem?) We're trying to convince our hosting provider to replace the NIC with an e1000. You could also try to test without ehci

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Jarek Poplawski
On 02-04-2007 21:41, Christian Kujau wrote:
>
> Hi there,
>
> we have serious problems with 2 of our servers: both shiny new amd64
> dual core, with both 2GB RAM, 32bit kernel+userland (Debian/testing).
> Both servers have 2 NICs, RTL8139 (eth0, irq10) and RTL8169s
> (eth1, irq11).
Hi, Did you

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Christian Kujau
On Tue, 3 Apr 2007, Len Brown wrote: Which increased stability, disabling ACPI, or disabling the IOAPIC? To be honest, we're not sure. See below. Your box has MPS, so you should be able to use the IOAPIC in either mode. MPS - Multiprocessor Specification? SMP? Yes, it'd be good to use the

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Christian Kujau
On Mon, 2 Apr 2007, Chuck Ebbert wrote: Where is the info from before you changed to "noapic"? Or were the machines always using XT-PIC for all the interrupts??? XT-PIC is only used since we switched to noapic, before there was IO-APIC-fasteoi on both ethernet cards and interrupts were balance

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Len Brown
On Monday 02 April 2007 15:41, Christian Kujau wrote:
>
> Hi there,
>
> we have serious problems with 2 of our servers: both shiny new amd64
> dual core, with both 2GB RAM, 32bit kernel+userland (Debian/testing).
> Both servers have 2 NICs, RTL8139 (eth0, irq10) and RTL8169s
> (eth1, irq11).
>

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Christian Kujau
On Mon, 2 Apr 2007, Chuck Ebbert wrote: Please see http://nerdbynature.de/bits/2.6.20.4/ for details for both hosts and feel free to ask for more details. Although both boxes are in production we'll be happy to test more bootoptions/patches and the like. Where is the info from before you changed t

Re: 2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Chuck Ebbert
Christian Kujau wrote:
>
> Please see http://nerdbynature.de/bits/2.6.20.4/ for details for both
> hosts and feel free to ask for more details. Although both boxes are in
> production we'll be happy to test more bootoptions/patches and the like.
Where is the info from before you changed to "noapic"?

2.6.20.4: NETDEV WATCHDOG and lockups

2007-04-02 Thread Christian Kujau
Hi there, we have serious problems with 2 of our servers: both shiny new amd64 dual core, with both 2GB RAM, 32bit kernel+userland (Debian/testing). Both servers have 2 NICs, RTL8139 (eth0, irq10) and RTL8169s (eth1, irq11). Both boxes are running fine but after "a while" they lock up and ev