On Tue, Jan 31, 2006 at 02:09:05PM +0100, mickey wrote:
> On Tue, Jan 31, 2006 at 12:24:48PM +1100, Nicholas Young wrote:
> > On Mon, Jan 30, 2006 at 04:40:29PM +0100, mickey wrote:
> > > On Mon, Jan 30, 2006 at 05:46:17PM +1100, Nicholas Young wrote:
> > > > Hello
> > > re
> > > 
> > > > When booting with the standard kernel everything works fine and I
> > > > can login/use the machine, run stress without any errors.
> > > > 
> > > > When booting with the MP kernel it will get to mounting the drive and
> > > > freeze, partial boot log below of where the error occurs.
> > > > 
> > > > I have tested this with AMD64/i386 of 3.8 and the AMD64 snapshot
> > > > 24 Jan 2006. 
> > > 
> > > (i'm not sure what you mean by amd64/i386 ;)
> > 
> > I tested it with 3.8 AMD64 and then 3.8 i386.
> > 
> > > can you give a try to i386 (not amd64) fresh snap plz?
> > 
> > I loaded the i386 snapshot from 24 Jan 06 and it had the same final
> > error.
> > 
> > It also gave during the boot:
> > biomask 0 netmask 0 ttymask 0
> > ioapic0: pin 3 shares different IPL interrupts (40..50), degraded performance
> > ioapic0: pin 5 shares different IPL interrupts (40..90), degraded performance
> > pctr: user-level counter enabled
> > apm0: disconnected
> > wd0(pciide2:0:0): timeout
> >     type: ata
> >     c_bcount: 512
> >     c_skip: 0
> > wd0(pciide2:0:0): timeout
> >     type: ata
> >     c_bcount: 512
> >     c_skip: 0
> 
> can you try non-smp kernel plz?
> also if that fails try disable pcibios in ukc> .
> same plz try w/ smp kernel if disabling pcibios change anything.
> (to get to ukc do boot -c and then "disable pcibios" at UKC> prompt)
> 

Thanks for the reply.

With the non-SMP kernel there is no error: it boots to the login
prompt and I can use the machine as normal. This is with pcibios
enabled.

With the SMP kernel, disabling pcibios made no obvious difference: it
still ends with the same wd0(pciide2:0:0): timeout messages and then
freezes. It looks like it is going into ltsleep+0x6d forever.

At the moment I think a process is requesting a read of the disklabel
from the SATA drive, since that is what appears to happen at this point
when the boot works correctly.

The read either fails and times out, or it completes but for some
reason the calling process is never notified correctly; either way we
end up in wdctimeout().
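
To make those two cases concrete, here is a toy model of what I mean.
It is only a sketch of my guess, not the actual wdc code, and every
name in it (read_model, wait_for_read, the outcome values) is made up
for illustration:

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical and are not taken from
 * the OpenBSD wdc driver. */
enum outcome {
    READ_OK,            /* drive finished and the sleeper was woken */
    READ_NEVER_DONE,    /* case 1: the read did not complete in time */
    READ_LOST_WAKEUP    /* case 2: completed, but nobody woke the sleeper */
};

struct read_model {
    int  done_tick;      /* tick at which the drive finishes, -1 = never */
    bool intr_delivered; /* did the completion interrupt reach the handler? */
};

/* Walk the clock forward until either the wakeup arrives or the
 * timeout expires, then report which of the three cases we hit. */
enum outcome
wait_for_read(const struct read_model *m, int timeout_ticks)
{
    int tick;

    for (tick = 0; tick < timeout_ticks; tick++) {
        if (m->done_tick == tick && m->intr_delivered)
            return READ_OK;
    }
    /* Timeout path: in the real system this is where wdctimeout() runs. */
    if (m->done_tick >= 0 && m->done_tick < timeout_ticks)
        return READ_LOST_WAKEUP;
    return READ_NEVER_DONE;
}
```

In this model both failure cases take the same timeout path, which is
why I cannot tell them apart from the boot messages alone.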

I think this implies some problem with the SATA driver for the NF4
chipset in SMP mode. This is based on the fact that it works correctly
with an IDE drive for SMP and non-SMP kernels and with SATA drives
for non-SMP kernels (all on the same hardware).

Does this seem like a reasonable idea of where the problem could be?

Anything else I should/could try? I can get better output from ddb if
someone can give me instructions on what to do. At the moment I am
trying to understand how the wdc code works and what changes when
reading from a device under an SMP kernel.

I am not sure if this helps, but I ran the following to try to
get an idea of what is going on when the system ends up in
wdctimeout().

boot> boot bsd.mp -d
booting hd0a:bsd.mp: 5000232+873256 [52+258016+239165]=0x613718
entry point at 0x100120
                       .
[ using 497608 bytes of bsd ELF symbol table ]
Stopped at      Debugger+0x4:   leave
ddb{0}> break wdctimeout
ddb{0}> c
...
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask 0 netmask 0 ttymask 0
ioapic0: pin 3 shares different IPL interrupts (40..50), degraded performance
ioapic0: pin 5 shares different IPL interrupts (40..90), degraded performance
pctr: user-level cycle counter enabled
apm0: disconnected
Breakpoint at   wdctimeout:     pushl   %ebp
ddb{0}> boot reboot
panic: wdc_exec_command: polled command not done
Stopped at      Debugger+0x4:   leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0}> trace
Debugger(d18a55a4,d7e2f060,e9058bfc,d7e2f060,e9058c44) at Debugger+0x4
panic(d04f1340,d7e2f060,0,d05acce0,d1872100) at panic+0x63
wdc_exec_command(d18a50d4,e9058c44,e9058c6c,d016da36) at wdc_exec_command+0x10a
wd_flushcache(d18a7800,10,b0,3e,d1877890) at wd_flushcache+0x67
wd_shutdown(d18a7800,e9058d20,e9058ccc,d01f0d35) at wd_shutdown+0x10
dohooks(d05ae900,1,e9058cfc,d01f0dd1) at dohooks+0x5e
boot(4804,1,e9058d1c,0,0) at boot+0x55
db_boot_poweroff_cmd(d034a2e0,0,ffffffff,e9058d24,d05acee0) at db_boot_poweroff_cmd
db_command(d05acee0,d05acd00,e9058e2c,d01efd61,e9058e08) at db_command+0xff
db_command_loop(1,e9058ec4,e9058e6c,d03465c4,1) at db_command_loop+0x9c
db_trap(1,0,e9058e6c,d0346569,e9058ec4) at db_trap+0x86
kdb_trap(1,0,e9058ec4,d05ae928) at kdb_trap+0xe8
trap() at trap+0xb9
--- trap (number 1) ---
wdctimeout(58,0,10,10,e9057000) at wdctimeout+0x1
Bad frame pointer: 0xe9058f20
ddb{0}> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT       COMMAND
     7      0      0      0  2   0x2100604             pfpurge
     6      0      0      0  3   0x2100204  timeout    sensors
     5      0      0      0  3   0x2100204  usbevt     usb1
     4      0      0      0  3   0x2100204  usbtsk     usbtask
     3      0      0      0  3   0x2100204  usbevt     usb0
     2      0      0      0  3   0x2100204  kmalloc    kmthread
     1      0      0      0  3   0x2000004  initexec   swapper
     0     -1      0      0  3   0x2080204  wdccmd     swapper

-- 
Nich
