> ...
> The Mac seems to be the only platform with these issues, I expect the
> problem to be somewhere in the PIO transfer routine (as far as I know
> only the Mac driver uses it, all other drivers use some form of DMA).

Is there any instrumentation in the mac-specific code?

> I'd like someone to try a kernel with the following patch, it makes the
> Mac 'goto' slightly more intelligent, it should now 'remember' devices
> which bugger up when disconnecting.

I'm afraid it didn't help.

> If this doesn't work, perhaps enable the debug macros in NCR53C9x.h and
> see if we can figure it out from the logs.

I don't know which flags do what, but I turned these on anyway:

#define DEBUG_ESP
#define DEBUG_ESP_DISCONNECT
#define DEBUG_ESP_PHASES

Now I get this:

Linux version 2.6.8.1 ([EMAIL PROTECTED]) (gcc version 3.4.1) #3 Thu Sep 2 11:36:13 EST 2004
Detected Macintosh model: 36
VIA1 at 50f00000 is a 6522 or clone
VIA2 at 50f02000 is <6>a 6522 or clone
Apple Macintosh Quadra 650
Built 1 zonelists
Kernel command line: root=/dev/sdb4 ro init=/boot.sh debug=ser console=tty0
Killing onboard sonic... Done.
PID hash table entries: 16 (order 4: 128 bytes)
Console: colour dummy device 80x25
Linux version 2.6.8.1 ([EMAIL PROTECTED]) (gcc version 3.4.1) #3 Thu Sep 2 11:36:13 EST 2004
Detected Macintosh model: 36
VIA1 at 50f00000 is a 6522 or clone
VIA2 at 50f02000 is <6>a 6522 or clone
Apple Macintosh Quadra 650
Built 1 zonelists
Kernel command line: root=/dev/sdb4 ro init=/boot.sh debug=ser console=tty0
Killing onboard sonic... Done.
PID hash table entries: 16 (order 4: 128 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory: 38304k/40960k available (1620k kernel code, 908k data, 84k init)
Calibrating delay loop... 22.11 BogoMIPS
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
NET: Registered protocol family 16
NuBus: Scanning NuBus slots.
Slot C: Board resource:
type: [cat 0x1 type 0x0 hw 0x0 sw 0x0]
name: EtherPort IIN
board id: 0x12a
vendor info:
ID: Kinetics, A Division of Excelan, Inc.
Function 0x80:
type: [cat 0x4 type 0x1 hw 0x103 sw 0x106]
name: Network_EtherNet_KinEth_KinEth_IIN
MAC address: 00:80:19:03:0b:77
unknown resource 81, data 0xffff6c
unknown resource 82, data 0x000093
SCSI subsystem initialized
macfb: framebuffer at 0xf9001000, mapped to 0xd0001000, size 960k
macfb: mode is 640x480x16, linelength=2048
macfb: scrolling: redraw
macfb: directcolor: size=1:5:5:5, shift=15:10:5:0
fb0: Macintosh DAFB built-in frame buffer device
devfs: 2004-01-31 Richard Gooch ([EMAIL PROTECTED])
devfs: boot_options: 0x0
fbcon_startup: No VBL detected, using timer based cursor.
mac_delete_irq: tried to remove invalid irq
Console: switching to colour frame buffer device 80x30
Generic RTC Driver v1.07
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
mac8390.c: v0.4 2001-05-15 David Huggins-Daines <[EMAIL PROTECTED]> and others
eth0: EtherPort IIN in slot C (type kinetics)
MAC 00:80:19:03:0b:77 IRQ 59, shared memory at 0xfc000000-0xfc007fff, 16-bit access.
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 50MHz system bus speed for PIO modes; override with idebus=xx
mac_esp: io base at 0x50f10000
esp: using quick version
esp: addr at 0x50f10000
SCSI ID 7 Clk 16MHz CCF=4 TOut 138 NCR53C9x(esp236)
mac_esp: 1 esp controllers found
scsi0 : ESP236 (NCR53C9x)
Using anticipatory io scheduler
N<00,00>F<00,00>N<00,00>F<00,00><5> Vendor: QUANTUM Model: LPS270S Rev: 590A
Type: Direct-Access ANSI SCSI revision: 02
N<01,00>N<02,00>N<03,00>F<03,00><5> Vendor: MATSHITA Model: CD-ROM CR-8004 Rev: 1.0p
Type: CD-ROM ANSI SCSI revision: 02
N<04,00>N<05,00>F<05,00>N<05,00>F<05,00><5> Vendor: CONNER Model: CP30540 SUN0535 Rev: B0CD
Type: Direct-Access ANSI SCSI revision: 02
N<06,00><6>st: Version 20040403, fixed bufsize 32768, s/g segs 256
N<00,00>F<00,00>N<00,00>F<00,00><5>SCSI device sda: 528808 512-byte hdwr sectors (271 MB)
N<00,00>F<00,00>N<00,00>F<00,00><5>SCSI device sda: drive cache: write through
/dev/scsi/host0/bus0/target0/lun0:N<00,00>D<00,00>R<00,00>esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<0c>]
esp0: HW reread [sreg<01> sstep<c4> ireg<10>]
esp0: current command [tgt<00> lun<00> pphase<FREEING> cphase<CLUELESS>]
esp0: disconnected
N<00,00>esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<0c>]
esp0: HW reread [sreg<01> sstep<c4> ireg<00>]
esp0: current command [tgt<00> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
esp0: Resetting scsi bus
esp0: SCSI bus reset interrupt
D<00,00>R<00,00>N<00,00>esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<0c>]
esp0: HW reread [sreg<01> sstep<c4> ireg<10>]
esp0: current command [tgt<00> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
unable to read partition table
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
N<05,00>esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<0c>]
esp0: HW reread [sreg<01> sstep<c4> ireg<00>]
esp0: current command [tgt<00> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
N<05,00>*** ILLEGAL INSTRUCTION *** FORMAT=0
Current process id is 0
BAD KERNEL TRAP: 00000000
Modules linked in:
PC: [<0001ec84>] cascade+0x38/0x56
SR: 2700 SP: 00203f08 a2: 00195ae4
d0: 0272f098 d1: 00000035 d2: 001df35c d3: 00000035
d4: 01fd4680 d5: 00001000 a0: 00203f70 a1: 00000000
Process swapper (pid: 0, stackpage=00196ae4)
Stack from 00203f08:
00000035 001df35c 00000035 01fd4680 00001000 00203f70 00000000 00195ae4
0272f098 ffffffff 00000000 27000001 ec840010 00000000 0000000a 0001ec4c
00002000 00203f70 0001edae 001df35c 001dfb64 00000035 00000001 001df110
00002000 025aef50 00203f70 00203f70 00001000 0001c294 001df110 00202604
00000040 0015abfa 0001c2de 00204000 00004050 027d2e10 00204000 00000040
01fd4680 00001000 00004250 00195ae4 00195ae4 00000000 ffffffff 00000000
Call Trace: [<00004274>] cpu_idle+0x16/0x22
[<000196f4>] printk+0x0/0x164
[<00002022>] rest_init+0x1a/0x1c
[<001eee5c>] start_kernel+0x1c0/0x1d0
[<00002800>] inflate_codes+0x146/0x424
[<001ed3d2>] __start+0x3d2/0xa48
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Last night, before I got your patch, I found that I can avoid any SCSI problems as long as the root filesystem is mounted read-only. I got past the boot failure by powering up the root disk a second or two after telling the bootloader to "boot now". For the test above, I did not do that. I always pass "ro" on the kernel command line.
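To make the "esp0: dumping state" lines easier to read, here is a minimal C sketch that decodes the sreg (status) and ireg (interrupt) bytes. It assumes the bit layout used by the Linux ESP driver headers (the ESP_STAT_* and ESP_INTR_* constants): the low three bits of sreg encode the SCSI bus phase, and each ireg bit is one interrupt cause. That layout is an assumption to check against NCR53C9x.h or the chip datasheet, and the helper names esp_phase_name() and esp_ireg_describe() are hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED layout, taken from the Linux ESP driver's ESP_STAT_* and
 * ESP_INTR_* constants; verify against NCR53C9x.h before relying on it. */
#define ESP_STAT_PMASK 0x07 /* low three sreg bits: SCSI bus phase */

/* Map the phase bits of the status register to a readable name. */
const char *esp_phase_name(unsigned sreg)
{
    switch (sreg & ESP_STAT_PMASK) {
    case 0x00: return "data-out";
    case 0x01: return "data-in";
    case 0x02: return "command";
    case 0x03: return "status";
    case 0x06: return "message-out";
    case 0x07: return "message-in";
    default:   return "reserved";
    }
}

/* Interrupt register flag names, indexed by bit number (bit 0 first). */
static const char *const esp_intr_names[8] = {
    "select", "select-atn", "reselected", "function-done",
    "bus-service", "disconnect", "illegal-command", "bus-reset"
};

/* Write a space-separated list of the ireg flags that are set into buf.
 * Returns the number of flags written; buf is "" when ireg == 0. */
int esp_ireg_describe(unsigned ireg, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int bit = 0; bit < 8; bit++) {
        if (!(ireg & (1u << bit)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", esp_intr_names[bit]);
        if (n < 0 || (size_t)n >= len - used)
            break; /* buffer full: stop; output stays NUL-terminated */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Under that assumed layout, the first dump's software snapshot (sreg<07> ireg<0c>) reads as message-in phase with "reselected function-done" set, while the hardware reread (sreg<01> ireg<10>) reads as data-in phase with "bus-service" pending. If that reading holds, the driver's saved state and the chip disagree about the bus phase when the command is aborted.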
The 2.6.8.1 kernel worked fine for several hours of uptime, but as soon as I did "mount -o remount,rw /", it was all over, and the following was logged:

esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<08>]
esp0: HW reread [sreg<07> sstep<84> ireg<10>]
esp0: current command [tgt<05> lun<00> pphase<MSGINDONE> cphase<CLUELESS>]
esp0: disconnected
esp0: Aborting command
esp0: dumping state
esp0: SW [sreg<07> sstep<04> ireg<08>]
esp0: HW reread [sreg<07> sstep<84> ireg<00>]
esp0: current command [tgt<05> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
esp0: Resetting scsi bus
esp0: SCSI bus reset interrupt

Hope this helps... let me know if I should run any other tests.

-F