retitle 391929 hdparm/sata_promise kernel freeze on 2.6.18.2/amd64 when setting write_cache off
thanks

The problem seems to be related to S07hdparm rather than S10checkroot, and more specifically to the write_cache=off setting in /etc/hdparm.conf for the two drives attached to the Promise/FastTrak controller (which is being used not as a hardware RAID controller, but rather as a provider of two separate SATA channels).

lspci -v (full lspci -vv attached):

00:08.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)
	Subsystem: ASUSTeK Computer Inc. K8V Deluxe/PC-DL Deluxe motherboard
	Flags: bus master, 66MHz, medium devsel, latency 96, IRQ 177
	I/O ports at 8800 [size=64]
	I/O ports at 8400 [size=16]
	I/O ports at 8000 [size=128]
	Memory at fb300000 (32-bit, non-prefetchable) [size=4K]
	Memory at fb200000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [60] Power Management version 2

If I comment out write_cache=off for sd[gh], the system boots fine. I can also set -W0 on sd[ef] without problems; those are connected to a sata_via controller (see the lspci attachment for details).

If I run hdparm -W0 on sdg or sdh, I get a panic, which this time actually mentions sata_promise:

Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
 [<ffffffff8818c642>] :sata_promise:pdc_eng_timeout+0x62/0x18d
PGD 35fb2067 PUD 3585d067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: rfcomm l2cap button ac battery ipv6 ipt_MASQUERADE iptable_nat ipt_REJECT ipt_addrtype ipt_LOG xt_limit xt_tcpudp xt_conntrack ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack nfnetlink iptable_filter ip_tables x_tables netconsole snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq snd_via82xx tsdev serio_raw snd_bt87x snd_via82xx_modem snd_ac97_codec snd_pcm_oss snd_mixer_oss evdev snd_mpu401_uart snd_pcm psmouse snd_rawmidi snd_seq_device snd_timer snd soundcore eth1394 pcspkr floppy ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid10 raid1 md_mod ide_generic ide_cd cdrom skge sd_mod hci_usb bluetooth usbhid usb_storage bt878 via82cxxx ohci1394 shpchp pci_hotplug ieee1394 sata_promise sk98lin sata_via aic7xxx scsi_transport_spi bttv video_buf firmware_class ir_common compat_ioctl32 i2c_algo_bit btcx_risc tveeprom videodev v4l1_compat v4l2_common libata scsi_mod generic ide_core uhci_hcd ehci_hcd i2c_viapro i2c_core gameport snd_ac97_bus snd_page_alloc thermal processor fan
Pid: 1129, comm: scsi_eh_4 Not tainted 2.6.18-2-amd64 #1
RIP: 0010:[<ffffffff8818c642>] [<ffffffff8818c642>] :sata_promise:pdc_eng_timeout+0x62/0x18d
RSP: 0018:ffff81003d86fe40 EFLAGS: 00010096
RAX: 00000000fafbfcfd RBX: ffff81003e080000 RCX: 000000000000acd4
RDX: 00000000ffffff01 RSI: 0000000000000046 RDI: ffff81003e3461c0
RBP: ffff81003e0804e8 R08: ffffffff804dc140 R09: 0000000000000012
R10: ffff81003d86fe08 R11: 0000000000000000 R12: 0000000000000000
R13: ffff81003e3461c0 R14: 0000000000000246 R15: 0000000000000005
FS: 00002ad83985c8c0(0000) GS:ffffffff80520000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000035c4f000 CR4: 00000000000006e0
Process scsi_eh_4 (pid: 1129, threadinfo ffff81003d86e000, task ffff810037f55770)
Stack: ffff81003e080000 ffff81003e0804e8 ffff81003df05ac8 ffff81003e080000
 ffff81003e080000 ffffffff880a4f11 0000000000000282 ffff81003e080000
 ffffffff880767ed ffff81003df05ac8 ffff81003e080000 ffff81003df05ab8
Call Trace:
 [<ffffffff880a4f11>] :libata:ata_scsi_error+0x418/0x50b
 [<ffffffff880767ed>] :scsi_mod:scsi_error_handler+0x0/0xa81
 [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
 [<ffffffff880768ac>] :scsi_mod:scsi_error_handler+0xbf/0xa81
 [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
 [<ffffffff880767ed>] :scsi_mod:scsi_error_handler+0x0/0xa81
 [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
 [<ffffffff8023055a>] kthread+0xd4/0x107
 [<ffffffff80259318>] child_rip+0xa/0x12
 [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
 [<ffffffff80230486>] kthread+0x0/0x107
 [<ffffffff8025930e>] child_rip+0x0/0x12
Code: 41 8a 44 24 28 3c 01 74 0d 3c 03 bb e8 03 00 00 0f 85 93 00
RIP [<ffffffff8818c642>] :sata_promise:pdc_eng_timeout+0x62/0x18d
 RSP <ffff81003d86fe40>
CR2: 0000000000000028

NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in: rfcomm l2cap button ac battery ipv6 ipt_MASQUERADE iptable_nat ipt_REJECT ipt_addrtype ipt_LOG xt_limit xt_tcpudp xt_conntrack ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack nfnetlink iptable_filter ip_tables x_tables netconsole snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq snd_via82xx tsdev serio_raw snd_bt87x snd_via82xx_modem snd_ac97_codec snd_pcm_oss snd_mixer_oss evdev snd_mpu401_uart snd_pcm psmouse snd_rawmidi snd_seq_device snd_timer snd soundcore eth1394 pcspkr floppy ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid10 raid1 md_mod ide_generic ide_cd cdrom skge sd_mod hci_usb bluetooth usbhid usb_storage bt878 via82cxxx ohci1394 shpchp pci_hotplug ieee1394 sata_promise sk98lin sata_via aic7xxx scsi_transport_spi bttv video_buf firmware_class ir_common compat_ioctl32 i2c_algo_bit btcx_risc tveeprom videodev v4l1_compat v4l2_common libata scsi_mod generic ide_core uhci_hcd ehci_hcd i2c_viapro i2c_core gameport snd_ac97_bus snd_page_alloc thermal processor fan
Pid: 2817, comm: md5_raid10 Not tainted 2.6.18-2-amd64 #1
RIP: 0010:[<ffffffff8025e8c6>] [<ffffffff8025e8c6>] .text.lock.spinlock+0x2/0x8a
RSP: 0018:ffffffff804bfde0 EFLAGS: 00000086
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: ffffffff804bfe98 RSI: ffff81003e3461c0 RDI: ffff81003e3461c0
RBP: ffffc20000036000 R08: ffff81003eedc000 R09: 0000000000000246
R10: 0000000000000000 R11: ffff810037ada770 R12: 0000000000000000
R13: 00000000000000b1 R14: ffff81003e3461c0 R15: ffffffff804bfe98
FS: 00002ad83985c8c0(0000) GS:ffffffff80520000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000035c4f000 CR4: 00000000000006e0
Process md5_raid10 (pid: 2817, threadinfo ffff81003eedc000, task ffff8100011fd870)
Stack: ffffffff8818c172 ffffffff80257af1 ffff81003dff2340 0000000000000000
 0000000000000000 00000000000000b1 ffffffff804bfe98 ffffffff804bfe98
 ffffffff8020f0f4 ffffffff80528d00 0000000000005880 00000000000000b1
Call Trace:
 <IRQ> [<ffffffff8818c172>] :sata_promise:pdc_interrupt+0x3b/0x1d9
 [<ffffffff80257af1>] blk_run_queue+0x28/0x72
 [<ffffffff8020f0f4>] handle_IRQ_event+0x29/0x58
 [<ffffffff802a4302>] __do_IRQ+0xa4/0x105
 [<ffffffff88077dd7>] :scsi_mod:scsi_io_completion+0x156/0x334
 [<ffffffff80263fdf>] do_IRQ+0x65/0x73
 [<ffffffff80258989>] ret_from_intr+0x0/0xa
 [<ffffffff80210376>] __do_softirq+0x53/0xd5
 [<ffffffff8026e567>] end_level_ioapic_vector+0x9/0x16
 [<ffffffff80259664>] call_softirq+0x1c/0x28
 [<ffffffff80264019>] do_softirq+0x2c/0x7d
 [<ffffffff80263fe4>] do_IRQ+0x6a/0x73
 [<ffffffff80258989>] ret_from_intr+0x0/0xa
 <EOI> [<ffffffff8020b1c4>] memcmp+0xb/0x22
 [<ffffffff882a4522>] :raid10:raid10d+0x233/0x9da
 [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
 [<ffffffff8025d504>] schedule_timeout+0x1e/0xad
 [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
 [<ffffffff8828ac2a>] :md_mod:md_thread+0xf8/0x10e
 [<ffffffff80290358>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8828ab32>] :md_mod:md_thread+0x0/0x10e
 [<ffffffff8023055a>] kthread+0xd4/0x107
 [<ffffffff80259318>] child_rip+0xa/0x12
 [<ffffffff80290195>] keventd_create_kthread+0x0/0x61
 [<ffffffff80230486>] kthread+0x0/0x107
 [<ffffffff8025930e>] child_rip+0x0/0x12
Code: 83 3f 00 7e f9 e9 6d fe ff ff e8 ff d7 ff ff e9 7d fe ff ff
console shuts up ...
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
<0>Rebooting in 60 seconds..

Curiously, the last two lines do not always appear; sometimes the system simply stays frozen indefinitely.

sdg is a Maxtor 250 GB SATA drive at UDMA6.
sdh is a Samsung 250 GB SATA drive at UDMA7.

One difference between the two is that the RAID10 array holding the swap partition spans only sd[efg] and does not touch sdh. Both drives are healthy according to smartctl.
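
For reference, the entries involved in /etc/hdparm.conf look roughly like the following. This is a sketch that assumes the stock Debian hdparm.conf block syntax and plain /dev/sdX device names, and it shows only the option in question:

    # sketch: one block per affected drive, other options omitted
    /dev/sdg {
        write_cache = off
    }

    /dev/sdh {
        write_cache = off
    }

Commenting out the write_cache line in both blocks is what lets the system boot normally.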
This is what dmesg knows about them:

sata_promise 0000:00:08.0: version 1.04
ACPI: PCI Interrupt 0000:00:08.0[A] -> GSI 18 (level, low) -> IRQ 177
ata3: SATA max UDMA/133 cmd 0xFFFFC20000036200 ctl 0xFFFFC20000036238 bmdma 0x0 irq 177
ata4: SATA max UDMA/133 cmd 0xFFFFC20000036280 ctl 0xFFFFC200000362B8 bmdma 0x0 irq 177
scsi4 : sata_promise
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7, max UDMA/133, 490234752 sectors: LBA48
ata3.00: ata3: dev 0 multi count 0
ata3.00: configured for UDMA/133
scsi5 : sata_promise
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7, max UDMA7, 488397168 sectors: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/133
  Vendor: ATA Model: Maxtor 7Y250M0 Rev: YAR5
  Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdg: 490234752 512-byte hdwr sectors (251000 MB)
sdg: Write Protect is off
sdg: Mode Sense: 00 3a 00 00
SCSI device sdg: drive cache: write back
SCSI device sdg: 490234752 512-byte hdwr sectors (251000 MB)
sdg: Write Protect is off
sdg: Mode Sense: 00 3a 00 00
SCSI device sdg: drive cache: write back
 sdg: sdg1 sdg2 sdg3 < sdg5 sdg6 sdg7 sdg8 sdg9 sdg10 >
sd 4:0:0:0: Attached scsi disk sdg
  Vendor: ATA Model: SAMSUNG SP2504C Rev: VT10
  Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdh: 488397168 512-byte hdwr sectors (250059 MB)
sdh: Write Protect is off
sdh: Mode Sense: 00 3a 00 00
SCSI device sdh: drive cache: write through
SCSI device sdh: 488397168 512-byte hdwr sectors (250059 MB)
sdh: Write Protect is off
sdh: Mode Sense: 00 3a 00 00
SCSI device sdh: drive cache: write through
 sdh: sdh1 sdh2 < sdh5 sdh6 sdh7 sdh8 sdh9 sdh10 >
sd 5:0:0:0: Attached scsi disk sdh

This bug also occurs with the vanilla 2.6.18.2 kernel.

Hope this helps.

-- 
 .''`.   martin f. krafft <[EMAIL PROTECTED]>
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems
lspci-vv.bz2
Description: Binary data