A few weeks ago I had a scare when a reboot panicked the kernel with a complaint that it could not find the root device (/dev/sde), and further reboots couldn't even see the USB keyboard. Leaving the system powered off overnight "fixed" the problem and the system has been working fine ever since.

I have since had some time to explore this and found that it is related to the kernel: 3.6.10 works fine, while 3.7.1 fails. If I reset during the 3.7.1 boot while it is spewing its error messages, but before the kernel ultimately panics, I can reboot with 3.6.10; but if 3.7.1 goes all the way to the panic, I have to power off and wait a few minutes before a 3.6.10 reboot is successful. This is repeatable, but I haven't bothered to see how long the system must be off; "a few minutes" is enough.

This is a ~amd64 system, dual Opterons, Tyan S2882, Thunder K8S Pro. The dmesg times here start around 30 seconds because it spends 15 seconds on each of two SCSI hosts probing for nonexistent drives. udev etc. are all frozen pre-systemd nonsense. Disks are two SSDs, two 4T drives, two 300G drives, and one 320G IDE/PATA drive; the main board is so old that there are only three boot options: IDE, DVD, network.

There are two error messages during the 3.7.1 boot, repeated for all SATA drives:

    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)

Google does not enlighten me. One suggestion was to change the SATA cable, but this is definitely a change from 3.6.10 to 3.7.1.

So here are some details ... You can see everything at

    https://www.dropbox.com/sh/o8j80rps3agvvcf/FBjJLcykRS

I am willing to try reasonable config changes for a new reboot attempt, but it is my main home server, not an experimental toy :-)

================ dmesg differences

I took some pictures during the boot process and transcribed the results. The 3.6.10 dmesg matches, but of course I can't get a 3.7.1 dmesg.

Both 3.6.10 and 3.7.1 appear to be the same up to this point:

    ata13.00: ATA-8: WDC WD3200AAJB-00J3A0, 01.03E01, max UDMA/133
    ata13.00: 625142448 sectors, multi 16: LBA48
    ata13.00: configured for UDMA/133
    ata1: SATA link down (SStatus 0 SControl 300)
    ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata9.00: ATA-9: M4-CT512M4SD2, 000F, max UDMA/100
    ata9.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 0/32)
    ata9.00: configured for UDMA/100
    ata2: SATA link down (SStatus 0 SControl 300)
    ata3: SATA link down (SStatus 0 SControl 300)
    ata4: SATA link down (SStatus 0 SControl 300)
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata5.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
    ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)

Around here 3.6.10 begins scrolling so fast that I could not get any pictures, so this is from the 3.6.10 dmesg, where it diverges from 3.7.1:

    ata5.00: configured for UDMA/133
    scsi 6:0:0:0: Direct-Access ATA Maxtor 6B300S0 BANC PQ: 0 ANSI: 5
    sd 6:0:0:0: [sda] 586114704 512-byte logical blocks: (300 GB/279 GiB)
    sd 6:0:0:0: [sda] Write Protect is off
    sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sda:
    sd 6:0:0:0: [sda] Attached SCSI disk
    ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata6.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
    ata6.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)
    ata6.00: configured for UDMA/133
    scsi 7:0:0:0: Direct-Access ATA Maxtor 6B300S0 BANC PQ: 0 ANSI: 5
    sd 7:0:0:0: [sdb] 586114704 512-byte logical blocks: (300 GB/279 GiB)
    sd 7:0:0:0: [sdb] Write Protect is off
    sd 7:0:0:0: [sdb] Mode Sense: 00 3a 00 00
    sd 7:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sdb: unknown partition table
    sd 7:0:0:0: [sdb] Attached SCSI disk

.... and on and on until it boots. (The unknown partition table is an LVM volume.)

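A rough sketch of how the divergence point between two boot transcripts could be located mechanically rather than by eye. It is not part of the original report: the function names (`normalize`, `first_divergence`) and the sample lists are hypothetical, and the timestamps in the sample are placeholders, not values from the real dmesg. It assumes one log message per line and strips a leading dmesg timestamp so that a photo transcript can be compared against real dmesg output.

```python
import re

# Matches a leading dmesg timestamp such as "[   30.123456] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(line):
    """Drop the timestamp and surrounding whitespace from one log line."""
    return _TIMESTAMP.sub("", line.strip()).strip()


def first_divergence(good_lines, bad_lines):
    """Return (index, good_line, bad_line) for the first mismatch, else None.

    Blank lines are ignored. If one log is a strict prefix of the other,
    the missing side is reported as None.
    """
    good = [n for n in map(normalize, good_lines) if n]
    bad = [n for n in map(normalize, bad_lines) if n]
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i, g, b
    if len(good) != len(bad):
        i = min(len(good), len(bad))
        return (i,
                good[i] if i < len(good) else None,
                bad[i] if i < len(bad) else None)
    return None


# Hypothetical sample: message text is taken from the post,
# the timestamps are placeholders only.
GOOD = [
    "[    0.000000] ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)",
    "[    0.000000] ata5.00: configured for UDMA/133",
]
BAD = [
    "ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)",
    "ata5.00: qc timeout (cmd 0x2f)",
]

if __name__ == "__main__":
    print(first_divergence(GOOD, BAD))
```

With full transcripts saved to files, the same function could be fed the result of `open(path).readlines()` for each kernel.
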
But 3.7.1 pokes along slowly enough while generating its errors that I did get some pictures to transcribe, and this is where it diverges from 3.6.10.

    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5: limiting SATA link speed to 1.5 Gbps
    ata5.00: limiting speed to UDMA/133:PIO3
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5.00: disabled
    ata5: hard resetting link
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata5: EH complete

... for all ATA drives until it eventually panics because the root device, /dev/sde, is not found.

================ 3.6.10 ---> 3.7.1 conf changes

I rebuilt the 3.7.1 kernel and logged all the new config items.

    Cputime accounting
    > 1. Simple tick based cputime accounting (TICK_CPU_ACCOUNTING) (NEW)
      2. Fine granularity task level IRQ time accounting (IRQ_TIME_ACCOUNTING)
    choice[1-2]:
    Consider userspace as in RCU extended quiescent state (RCU_USER_QS) [N/y/?] (NEW)
    Module signature verification (MODULE_SIG) [N/y/?] (NEW)
    Supervisor Mode Access Prevention (X86_SMAP) [Y/n/?] (NEW) n
    Legacy cpb sysfs knob support for AMD CPUs (X86_ACPI_CPUFREQ_CPB) [Y/n/?] (NEW)
    Enable core dump support (COREDUMP) [Y/n/?] (NEW)
    Packet: sockets monitoring interface (PACKET_DIAG) [N/m/y/?] (NEW) m
    IPv4 NAT (NF_NAT_IPV4) [N/m/?] (NEW) m
    OMAP OCP2SCP DRIVER (OMAP_OCP2SCP) [N/m/y/?] (NEW) m
    Calxeda Highbank SATA support (SATA_HIGHBANK) [N/m/y/?] (NEW) m
    Virtual eXtensible Local Area Network (VXLAN) (VXLAN) [N/m/y/?] (NEW) m
    Solarflare SFC9000-family PTP support (SFC_PTP) [Y/n/?] (NEW)
    Microchip MRF24J40 transceiver driver (IEEE802154_MRF24J40) [N/m/?] (NEW) m
    8250/16550 PNP device support (SERIAL_8250_PNP) [Y/n/?] (NEW)
    MAX310X support (SERIAL_MAX310X) [N/y/?] (NEW)
    SCCNXP serial port support (SERIAL_SCCNXP) [N/m/y/?] (NEW) m
    TPM HW Random Number Generator support (HW_RANDOM_TPM) [M/n/?] (NEW)
    TPM Interface Specification 1.2 Interface (I2C - Infineon) (TCG_TIS_I2C_INFINEON) [N/m/?] (NEW) m
    NXP SC18IS602/602B/603 I2C to SPI bridge (SPI_SC18IS602) [N/m/y/?] (NEW) m
    Dialog DA9052 GPIO (GPIO_DA9052) [N/m/y/?] (NEW) m
    TWL6040 GPO (GPIO_TWL6040) [N/m/y/?] (NEW) m
    OMAP HDQ driver (HDQ_MASTER_OMAP) [N/m/?] (NEW) m
    Marvell 88PM860x battery driver (BATTERY_88PM860X) [N/m/y/?] (NEW) m
    Dialog DA9052 Battery (BATTERY_DA9052) [N/m/y/?] (NEW) m
    Marvell 88PM860x Charger driver (CHARGER_88PM860X) [N/m/?] (NEW) m
    Analog Devices ADT7410 (SENSORS_ADT7410) [N/m/?] (NEW) m
    Maxim MAX197 and compatibles (SENSORS_MAX197) [N/m/?] (NEW) m
    generic cpu cooling support (CPU_THERMAL) [N/y/?] (NEW)
    Support for the SMSC ECE1099 series chips (MFD_SMSC) [N/y/?] (NEW)
    Dialog Semiconductor DA9055 PMIC Support (MFD_DA9055) [N/y/?] (NEW)
    Texas Instruments LP8788 Power Management Unit Driver (MFD_LP8788) [N/y/?] (NEW)
    Maxim Semiconductor MAX8907 PMIC Support (MFD_MAX8907) [N/m/y/?] (NEW) m
    Fairchild FAN53555 Regulator (REGULATOR_FAN53555) [N/m/y/?] (NEW) m
    Maxim 8907 voltage regulator (REGULATOR_MAX8907) [N/m/?] (NEW) m
    TechnoTrend USB IR Receiver (IR_TTUSBIR) [N/m/?] (NEW) m
    Media USB Adapters (MEDIA_USB_SUPPORT) [N/y/?] (NEW) y
    STK1160 USB video capture support (VIDEO_STK1160) [N/m/?] (NEW) m
    STK1160 AC97 codec support (VIDEO_STK1160_AC97) [N/y/?] (NEW) y
    Support for various USB DVB devices v2 (DVB_USB_V2) [N/m/?] (NEW) m
    Enable debug for the B2C2 FlexCop drivers (DVB_B2C2_FLEXCOP_USB_DEBUG) [N/y/?] (NEW)
    Media PCI Adapters (MEDIA_PCI_SUPPORT) [N/y/?] (NEW)
    Media test drivers (V4L_TEST_DRIVERS) [N/y] (NEW)
    ISA and parallel port devices (MEDIA_PARPORT_SUPPORT) [N/y/?] (NEW)
    Autoselect tuners and i2c modules to build (MEDIA_SUBDRV_AUTOSELECT) [Y/n/?] (NEW)
    Maximum debug level (NOUVEAU_DEBUG) [5] (NEW)
    Default debug level (NOUVEAU_DEBUG_DEFAULT) [3] (NEW)
    Backlight Driver for LM3630 (BACKLIGHT_LM3630) [N/m/y/?] (NEW) m
    Backlight Driver for LM3639 (BACKLIGHT_LM3639) [N/m/y/?] (NEW) m
    TPS65217 Backlight (BACKLIGHT_TPS65217) [N/m/?] (NEW) m
    Default time-out for HD-audio power-save mode (SND_HDA_POWER_SAVE_DEFAULT) [0] (NEW)
    CIR via RC class (HID_PICOLCD_CIR) [N/y/?] (NEW)
    Sony PS3 BD Remote Control (HID_PS3REMOTE) [N/m/?] (NEW) m
    HID Sensors framework support (HID_SENSOR_HUB) [N/m/?] (NEW) m
    ZTE USB serial driver (USB_SERIAL_ZTE) [N/m/?] (NEW) m
    OMAP USB2 PHY Driver (OMAP_USB2) [N/m/y/?] (NEW) m
    LED support for LM3642 Chip (LEDS_LM3642) [N/m/y/?] (NEW) m
    LED support for LM355x Chips, LM3554 and LM3556 (LEDS_LM355x) [N/m/y/?] (NEW) m
    LED CPU Trigger (LEDS_TRIGGER_CPU) [N/y/?] (NEW)
    Dynamic compression of swap pages and clean pagecache pages (ZCACHE2) [N/y/?] (NEW)
    Silicom devices (NET_VENDOR_SILICOM) [Y/n/?] (NEW)
    Silicom BypassCTL library support (SBYPASS) [N/m/?] (NEW) m
    Silicom BypassCTL net support (BPCTL) [N/m/?] (NEW) m
    Cambridge Electronic Design 1401 USB support (CED1401) [N/m/?] (NEW) m
    Digi Realport driver (DGRP) [N/m/y/?] (NEW) m
    STE-Modem remoteproc support (STE_MODEM_RPROC) [N/m/y/?] (NEW) m
    SMB2 network file system support (EXPERIMENTAL) (CIFS_SMB2) [N/y/?] (NEW)
    RCU debugging: preemptible RCU race provocation (PROVE_RCU_DELAY) [N/y/?] (NEW)
    Red-Black tree test (RBTREE_TEST) [N/m/?] (NEW) m
    Interval tree test (INTERVAL_TREE_TEST) [N/m/?] (NEW) m
    CAST5 (CAST-128) cipher algorithm (x86_64/AVX) (CRYPTO_CAST5_AVX_X86_64) [N/m/y/?] (NEW) m
    CAST6 (CAST-256) cipher algorithm (x86_64/AVX) (CRYPTO_CAST6_AVX_X86_64) [N/m/y/?] (NEW) m
    Asymmetric (public-key cryptographic) key type (ASYMMETRIC_KEY_TYPE) [N/m/y/?] (NEW) m
    Asymmetric public-key crypto algorithm subtype (ASYMMETRIC_PUBLIC_KEY_SUBTYPE) [N/m/?] (NEW) m
    RSA public-key algorithm (PUBLIC_KEY_ALGO_RSA) [N/m/?] (NEW) m
    X.509 certificate parser (X509_CERTIFICATE_PARSER) [N/m/?] (NEW) m

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman & rocket surgeon / fe...@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o