Summary: I had annoying resets of the SATA bus with an 8 Series/C220 Series Chipset controller and an HGST Travelstar 7K1000 drive. I recently managed to stop them and, as far as I can tell so far, I am satisfied with the result. I am writing this mail in the hope that it may be useful to anyone having similar issues. If you do not have that issue and you are not a developer interested in fixing it more permanently, you can stop reading right now.
Here are the details. The computer is a Zotac ZBox ID91 nettop with a proprietary motherboard and, as stated above, a Travelstar 7K1000 hard drive (a 7200 RPM 2.5" drive, an unusual beast). It was installed around June 2014, and I noticed the problems some time later; they probably started right away. The distribution was Debian Jessie (testing) with the packaged kernel, probably linux-image-3.14-1-amd64:amd64 at the time; the issue was not fixed by upgrades.

The possibly relevant hardware information is this:

    CPU: Intel(R) Core(TM) i3-4130T CPU @ 2.90GHz

    description: SATA controller
    product: 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
    vendor: Intel Corporation
    physical id: 1f.2
    bus info: pci@0000:00:1f.2
    version: 05
    width: 32 bits
    clock: 66MHz
    capabilities: storage msi pm ahci_1.0 bus_master cap_list
    configuration: driver=ahci latency=0
    resources: irq:42 ioport:f0b0(size=8) ioport:f0a0(size=4) ioport:f090(size=8) ioport:f080(size=4) ioport:f060(size=32) memory:f7d1a000-f7d1a7ff

    description: ATA Disk
    product: HGST HTS721010A9
    physical id: 0.0.0
    bus info: scsi@1:0.0.0
    logical name: /dev/sda
    version: A3J0
    serial: [REMOVED]
    size: 931GiB (1TB)
    capabilities: partitioned partitioned:dos
    configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 signature=d3079a6d

The resets happened a few times a day (this computer is kept on for more than a day at a time and suspend is not used), mostly when the disk was in heavy use, sometimes as early as during the boot; there were a few good days when they did not happen. They were annoying because they caused a freeze of a few seconds for anything reading from the disk; AFAIK they never resulted in data corruption.
The corresponding kernel messages look like this:

[ 337.466498] ata2: EH complete
[ 367.251032] ata2.00: exception Emask 0x10 SAct 0x80000 SErr 0x400100 action 0x6 frozen
[ 367.251041] ata2.00: irq_stat 0x08000000, interface fatal error
[ 367.251046] ata2: SError: { UnrecovData Handshk }
[ 367.251053] ata2.00: failed command: WRITE FPDMA QUEUED
[ 367.251063] ata2.00: cmd 61/08:98:68:3b:40/00:00:6b:00:00/40 tag 19 ncq 4096 out
[ 367.251063]          res 50/00:08:68:3b:40/00:00:6b:00:00/40 Emask 0x10 (ATA bus error)
[ 367.251068] ata2.00: status: { DRDY }
[ 367.251075] ata2: hard resetting link
[ 367.571128] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 367.577660] ata2.00: configured for UDMA/133
[ 367.577676] ata2: EH complete
[ 409.772730] ata2: limiting SATA link speed to 3.0 Gbps
[ 409.772735] ata2.00: exception Emask 0x10 SAct 0x3fe00 SErr 0x400100 action 0x6 frozen
[ 409.772736] ata2.00: irq_stat 0x08000000, interface fatal error
[ 409.772737] ata2: SError: { UnrecovData Handshk }
[ 409.772739] ata2.00: failed command: READ FPDMA QUEUED
[ 409.772742] ata2.00: cmd 60/08:48:78:09:41/00:00:01:00:00/40 tag 9 ncq 4096 in
[ 409.772742]          res 50/00:28:e0:a3:04/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[ 409.772743] ata2.00: status: { DRDY }
<snip seven similar "failed command...DRDY" blocks>
[ 409.772773] ata2.00: failed command: WRITE FPDMA QUEUED
[ 409.772776] ata2.00: cmd 61/28:88:e0:a3:04/00:00:02:00:00/40 tag 17 ncq 20480 out
[ 409.772776]          res 50/00:28:e0:a3:04/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[ 409.772777] ata2.00: status: { DRDY }
[ 409.772779] ata2: hard resetting link
[ 410.092732] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 410.097670] ata2.00: configured for UDMA/133

Last week, taking a hint from the penultimate line, I tried to lower the speed of the SATA link permanently, and it worked. I did this by adding "libata.force=2:3.0Gbps" to the kernel command line (configured using /etc/default/grub).
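For anyone wanting to try the same work-around, here is a minimal sketch of what the change and a follow-up check could look like. It is not taken from the original setup: the variable GRUB_CMDLINE_LINUX_DEFAULT and the "quiet" option are assumptions about a default Debian GRUB 2 install, and the helper count_link_resets is a hypothetical name made up for this example. The "2" in the option matches the "ata2" port in the messages above and has to be adapted to the port that shows the errors.

```shell
#!/bin/sh
# Sketch only. Assumed, not confirmed by the mail: the GRUB variable name
# and the "quiet" option, which are what a default Debian install uses.
#
# In /etc/default/grub:
#     GRUB_CMDLINE_LINUX_DEFAULT="quiet libata.force=2:3.0Gbps"
# then, as root:
#     update-grub    # regenerates /boot/grub/grub.cfg
#     reboot
# After the reboot, "cat /proc/cmdline" should show the option, and the
# "SATA link up" message for the port should report 3.0 Gbps.

# Hypothetical helper: count the "hard resetting link" messages for one
# port in a kernel log read from standard input.
# Usage: dmesg | count_link_resets ata2
count_link_resets() {
    port=$1
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the exit status clean in that case.
    grep -c -F -- "${port}: hard resetting link" || true
}
```

Running "dmesg | count_link_resets ata2" from time to time gives a quick way to see whether the resets have really stopped; it only covers what is still in the kernel ring buffer, so on a long-running machine the persistent logs are a better input.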
Since then, no reset has happened; I am confident that seven days without them is not a coincidence.

As I said, I consider the issue closed from my point of view. If someone wants to investigate further (for example a kernel hacker to actually fix this, or a distro developer to make an automatic work-around), I can give some more details, and possibly run a few tests if they do not take much time and are not too risky.

Hope this helps.

Regards,

-- 
  Nicolas George