https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
            Bug ID: 229745
           Summary: ahcich: CAM status: Command timeout
           Product: Base System
           Version: 11.2-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: fbsd98816...@avksrv.org

Hello!

We have several Supermicro servers based on the X11SSH-F motherboard. All of them were installed about half a year ago and ran FreeBSD 11.1. Each server has four HGST HUS722T1TALA604 HDDs, and all of them worked fine during that time, with half a year of uptime.

Recently the servers were upgraded to FreeBSD 11.2 (a self-built 11.2-STABLE r335679 with the default make.conf, src.conf and GENERIC kernel). After some time (different every time, from 2 hours to 7 days), one or more disks started timing out:

Jul 13 00:56:24 mrr32 kernel: ahcich2: Timeout on slot 17 port 0
Jul 13 00:56:24 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 00060000 rs 00060000 tfd 40 serr 00000000 cmd 0004d217
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 ca 22 23 40 06 00 00 00 00 00
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 00:58:16 srv32 kernel: ahcich2: Timeout on slot 26 port 0
Jul 13 00:58:16 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 04000000 rs 04000000 tfd 40 serr 00000000 cmd 0004da17
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e0 8a cc c6 40 18 00 00 00 00 00
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 01:01:46 srv32 kernel: ahcich2: Timeout on slot 18 port 0
Jul 13 01:01:46 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 00040000 rs 00040000 tfd 40 serr 00000000 cmd 0004d217
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 2a 2b 23 40 06 00 00 00 00 00
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 01:07:12 srv32 kernel: ahcich0: Timeout on slot 23 port 0
Jul 13 01:07:12 srv32 kernel: ahcich0: is 00000000 cs 00000000 ss 00800000 rs 00800000 tfd 40 serr 00000000 cmd 0004d717
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 18 62 f5 c6 40 18 00 00 00 00 00
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): Retrying command
Jul 13 01:07:43 srv32 kernel: ahcich0: Timeout on slot 2 port 0
Jul 13 01:07:43 srv32 kernel: ahcich0: is 00000000 cs 00000000 ss 00000004 rs 00000004 tfd 40 serr 00000000 cmd 0004c217
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 62 12 7b 40 06 00 00 00 00 00
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): Retrying command

A reboot (/sbin/shutdown -r or /sbin/reboot) does not solve the problem; the disks still time out after boot. Only a power off / power on cycle clears it, and only for a while: after some time the timeouts come back.

The servers were updated to the latest BIOS available from Supermicro. No change.
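The timeout messages pack several values into hex fields. The sketch below decodes two of them. It assumes `ss` is the port's SATA Active (PxSACT) register and `rs` the driver's running-slot bitmap, one bit per NCQ command slot. It also assumes the 12-byte `ACB:` dump follows the field order of CAM's `ata_cmd_sbuf()` (command, features, LBA low/mid/high, device, LBA exp bytes, features_exp, sector_count, sector_count_exp).

Both mappings come from reading the FreeBSD sources and should be checked against the 11.2 tree. The helper names `slots_from_mask` and `decode_acb` are hypothetical. Under these assumptions, `ss 00060000` in the first timeout means slots 17 and 18 were both outstanding when slot 17 timed out.

```python
"""Hypothetical helpers for reading ahci(4)/CAM timeout messages (a sketch)."""

# ASSUMPTION: byte order of the "ACB:" dump, taken from CAM's ata_cmd_sbuf().
ACB_FIELDS = (
    "command", "features",
    "lba_low", "lba_mid", "lba_high", "device",
    "lba_low_exp", "lba_mid_exp", "lba_high_exp",
    "features_exp", "sector_count", "sector_count_exp",
)

# ATA opcodes of the NCQ (first-party DMA queued) read/write commands.
FPDMA_QUEUED = {0x60: "READ_FPDMA_QUEUED", 0x61: "WRITE_FPDMA_QUEUED"}


def slots_from_mask(mask_hex: str) -> list[int]:
    """Return the slot numbers whose bits are set in a 32-bit mask like 'ss 00060000'."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode_acb(acb: str) -> dict:
    """Decode a 12-byte ACB dump into command name, 48-bit LBA and sector count."""
    raw = [int(b, 16) for b in acb.split()]
    if len(raw) != len(ACB_FIELDS):
        raise ValueError(f"expected {len(ACB_FIELDS)} bytes, got {len(raw)}")
    f = dict(zip(ACB_FIELDS, raw))
    # 48-bit LBA: the *_exp bytes hold bits 24..47.
    lba = (f["lba_high_exp"] << 40 | f["lba_mid_exp"] << 32 | f["lba_low_exp"] << 24
           | f["lba_high"] << 16 | f["lba_mid"] << 8 | f["lba_low"])
    if f["command"] in FPDMA_QUEUED:
        # NCQ commands carry the sector count in the FEATURES registers.
        count = f["features_exp"] << 8 | f["features"]
    else:
        count = f["sector_count_exp"] << 8 | f["sector_count"]
    return {
        "command": FPDMA_QUEUED.get(f["command"], hex(f["command"])),
        "lba": lba,
        "sectors": count,
    }


if __name__ == "__main__":
    # Values copied from the first timeout in the log excerpt.
    print(slots_from_mask("00060000"))  # [17, 18]
    info = decode_acb("61 20 ca 22 23 40 06 00 00 00 00 00")
    print(info["command"], hex(info["lba"]), info["sectors"])  # WRITE_FPDMA_QUEUED 0x62322ca 32
```

Decoded this way, every timed-out command in the excerpt is a small NCQ write (16 to 224 sectors), which may help when comparing against SMART data or ZFS activity at the same timestamps.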
ahci0: <Intel Sunrise Point AHCI SATA controller> port 0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xdf310000-0xdf311fff,0xdf31e000-0xdf31e0ff,0xdf31d000-0xdf31d7ff irq 16 at device 23.0 on pci0
ahci0: AHCI v1.31 with 8 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahcich6: <AHCI channel> at channel 6 on ahci0
ahcich7: <AHCI channel> at channel 7 on ahci0
ses0 at ahciem0 bus 0 scbus8 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 1.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <HGST HUS722T1TALA604 RAGNWA07> ACS-3 ATA SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors)

ahci0@pci0:0:23:0: class=0x010601 card=0x088415d9 chip=0xa1028086 rev=0x31 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-H SATA controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA

We use ZFS on all servers; some pools are raidz1 and some are RAID-10, with the same results.

We used to run smartd on all servers. I tried disabling smartd, but it made no apparent difference.

We have already upgraded the zpools to the new feature flags, so downgrading back to 11.1 would require removing those features first.

_______________________________________________
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs