> ...
> The Mac seems to be the only platform with these issues; I expect the
> problem to be somewhere in the PIO transfer routine (as far as I know,
> only the Mac driver uses it; all other drivers use some form of DMA).

Is there any instrumentation in the Mac-specific code?

> I'd like someone to try a kernel with the following patch. It makes the
> Mac 'goto' slightly more intelligent: it should now 'remember' devices
> which bugger up when disconnecting.

I'm afraid it didn't help.

> If this doesn't work, perhaps enable the debug macros in NCR53C9x.h and
> see if we can figure it out from the logs.

I don't know what each flag does, but I turned these on anyway:

#define DEBUG_ESP
#define DEBUG_ESP_DISCONNECT
#define DEBUG_ESP_PHASES

Now I get this:

Linux version 2.6.8.1 ([EMAIL PROTECTED]) (gcc version 3.4.1) #3 Thu Sep 2 11:36:13 EST 2004
Detected Macintosh model: 36
VIA1 at 50f00000 is a 6522 or clone
VIA2 at 50f02000 is <6>a 6522 or clone
Apple Macintosh Quadra 650
Built 1 zonelists
Kernel command line: root=/dev/sdb4 ro init=/boot.sh debug=ser console=tty0
Killing onboard sonic... Done.
PID hash table entries: 16 (order 4: 128 bytes)
Console: colour dummy device 80x25
Linux version 2.6.8.1 ([EMAIL PROTECTED]) (gcc version 3.4.1) #3 Thu Sep 2 11:36:13 EST 2004
Detected Macintosh model: 36
VIA1 at 50f00000 is a 6522 or clone
VIA2 at 50f02000 is <6>a 6522 or clone
Apple Macintosh Quadra 650
Built 1 zonelists
Kernel command line: root=/dev/sdb4 ro init=/boot.sh debug=ser console=tty0
Killing onboard sonic... Done.
PID hash table entries: 16 (order 4: 128 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory: 38304k/40960k available (1620k kernel code, 908k data, 84k init)
Calibrating delay loop... 22.11 BogoMIPS
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
NET: Registered protocol family 16
NuBus: Scanning NuBus slots.
Slot C:
  Board resource:
    type: [cat 0x1 type 0x0 hw 0x0 sw 0x0]
    name: EtherPort IIN
    board id: 0x12a
    vendor info:
    ID: Kinetics, A Division of Excelan, Inc.
  Function 0x80:
    type: [cat 0x4 type 0x1 hw 0x103 sw 0x106]
    name: Network_EtherNet_KinEth_KinEth_IIN
    MAC address: 00:80:19:03:0b:77
    unknown resource 81, data 0xffff6c
    unknown resource 82, data 0x000093
SCSI subsystem initialized
macfb: framebuffer at 0xf9001000, mapped to 0xd0001000, size 960k
macfb: mode is 640x480x16, linelength=2048
macfb: scrolling: redraw
macfb: directcolor: size=1:5:5:5, shift=15:10:5:0
fb0: Macintosh DAFB built-in frame buffer device
devfs: 2004-01-31 Richard Gooch ([EMAIL PROTECTED])
devfs: boot_options: 0x0
fbcon_startup: No VBL detected, using timer based cursor.
mac_delete_irq: tried to remove invalid irq
Console: switching to colour frame buffer device 80x30
Generic RTC Driver v1.07
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
mac8390.c: v0.4 2001-05-15 David Huggins-Daines <[EMAIL PROTECTED]> and others
eth0: EtherPort IIN in slot C (type kinetics)
MAC 00:80:19:03:0b:77 IRQ 59, shared memory at 0xfc000000-0xfc007fff,  16-bit access.
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 50MHz system bus speed for PIO modes; override with idebus=xx
mac_esp: io base at 0x50f10000
esp: using quick version
esp: addr at 0x50f10000
SCSI ID 7 Clk 16MHz CCF=4 TOut 138 NCR53C9x(esp236)

mac_esp: 1 esp controllers found
scsi0 : ESP236 (NCR53C9x)
Using anticipatory io scheduler
N<00,00>F<00,00>N<00,00>F<00,00><5>  Vendor: QUANTUM   Model: LPS270S           Rev: 590A
  Type:   Direct-Access                      ANSI SCSI revision: 02
N<01,00>N<02,00>N<03,00>F<03,00><5>  Vendor: MATSHITA  Model: CD-ROM CR-8004    Rev: 1.0p
  Type:   CD-ROM                             ANSI SCSI revision: 02
N<04,00>N<05,00>F<05,00>N<05,00>F<05,00><5>  Vendor: CONNER    Model: CP30540  SUN0535  Rev: B0CD
  Type:   Direct-Access                      ANSI SCSI revision: 02
N<06,00><6>st: Version 20040403, fixed bufsize 32768, s/g segs 256
N<00,00>F<00,00>N<00,00>F<00,00><5>SCSI device sda: 528808 512-byte hdwr sectors (271 MB)
N<00,00>F<00,00>N<00,00>F<00,00><5>SCSI device sda: drive cache: write through
 /dev/scsi/host0/bus0/target0/lun0:N<00,00>D<00,00>R<00,00>esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<0c>]
esp0: HW reread [sreg<01> sstep<c4> ireg<10>]
esp0: current command [tgt<00> lun<00> pphase<FREEING> cphase<CLUELESS>]
esp0: disconnected
N<00,00>esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<0c>]
esp0: HW reread [sreg<01> sstep<c4> ireg<00>]
esp0: current command [tgt<00> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
esp0: Resetting scsi bus
esp0: SCSI bus reset interrupt
D<00,00>R<00,00>N<00,00>esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<0c>]
esp0: HW reread [sreg<01> sstep<c4> ireg<10>]
esp0: current command [tgt<00> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
 unable to read partition table
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
N<05,00>esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<0c>]
esp0: HW reread [sreg<01> sstep<c4> ireg<00>]
esp0: current command [tgt<00> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
N<05,00>*** ILLEGAL INSTRUCTION ***   FORMAT=0
Current process id is 0
BAD KERNEL TRAP: 00000000
Modules linked in:
PC: [<0001ec84>] cascade+0x38/0x56

SR: 2700  SP: 00203f08  a2: 00195ae4
d0: 0272f098    d1: 00000035    d2: 001df35c    d3: 00000035
d4: 01fd4680    d5: 00001000    a0: 00203f70    a1: 00000000
Process swapper (pid: 0, stackpage=00196ae4)
Stack from 00203f08:
        00000035 001df35c 00000035 01fd4680 00001000 00203f70 00000000 00195ae4
        0272f098 ffffffff 00000000 27000001 ec840010 00000000 0000000a 0001ec4c
        00002000 00203f70 0001edae 001df35c 001dfb64 00000035 00000001 001df110
        00002000 025aef50 00203f70 00203f70 00001000 0001c294 001df110 00202604
        00000040 0015abfa 0001c2de 00204000 00004050 027d2e10 00204000 00000040
        01fd4680 00001000 00004250 00195ae4 00195ae4 00000000 ffffffff 00000000
Call Trace: [<00004274>] cpu_idle+0x16/0x22
 [<000196f4>] printk+0x0/0x164
 [<00002022>] rest_init+0x1a/0x1c
 [<001eee5c>] start_kernel+0x1c0/0x1d0
 [<00002800>] inflate_codes+0x146/0x424
 [<001ed3d2>] __start+0x3d2/0xa48

Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing


Last night, before I got your patch, I found that I can avoid SCSI
problems entirely as long as the root filesystem stays mounted read-only.
I got past the boot failure by powering up the root disk a second or two
after telling the bootloader to "boot now" (I did not do that for the test
above). I always pass "ro" on the kernel command line.

The 2.6.8.1 kernel worked fine for several hours of uptime, but as soon as
I ran "mount -o remount,rw /", it was all over, and the following was
logged:

esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<08>]
esp0: HW reread [sreg<07> sstep<84> ireg<10>]
esp0: current command [tgt<05> lun<00> pphase<MSGINDONE> cphase<CLUELESS>]
esp0: disconnected
esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<08>]
esp0: HW reread [sreg<07> sstep<84> ireg<00>]
esp0: current command [tgt<05> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
esp0: Resetting scsi bus
esp0: SCSI bus reset interrupt

Hope this helps. Let me know if I should run any other tests.

-F
