On 5/21/2012 9:04 PM, Matthew Gamble wrote:
> We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5 Port
> Backplane port multipliers (the "backblaze storage pod"). Under intense IO
> (ZFS rebuild, presently) the system will lock up all IO for 3-4 minutes and
> the following entry appears in the dmesg:
>
> siisch11: Timeout on slot 30
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11: ... waiting for slots 25000000
> siisch11: Timeout on slot 26
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11: ... waiting for slots 21000000
> siisch11: Timeout on slot 29
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11: ... waiting for slots 01000000
> siisch11: Timeout on slot 24
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>
> The errors are on different siisch devices so its not likely to be a SATA
> cable issue unless multiple cables all went bad at the same time. On the
> advice of some other posts to the mailing list I've already tried locking the
> SATA rev to one with the following in /boot/loader.conf which didn't

If they are on different siisch devices then yes, it does not sound like
a bad cable. However, I have had similar errors to the ones above that
were fixed by using new cables. If you are using 9.0R, I would suggest
upgrading to stable. There have been a few bug fixes / improvements to
the drivers as well as various parts of the disk subsystem. I have
RELENG8 right now and it's quite stable for me on a 25TB system. RELENG8
is for the most part similar to 9.x

# zpool status
  pool: zbackup1
 state: ONLINE
 scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011
config:

        NAME        STATE     READ WRITE CKSUM
        zbackup1    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada14   ONLINE       0     0     0
            ada16   ONLINE       0     0     0
            ada13   ONLINE       0     0     0
            ada15   ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1-2  ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1-3  ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
            ada11   ONLINE       0     0     0
            ada12   ONLINE       0     0     0

errors: No known data errors

# zpool get all zbackup1
NAME      PROPERTY       VALUE                SOURCE
zbackup1  size           25.4T                -
zbackup1  capacity       68%                  -
zbackup1  altroot        -                    default
zbackup1  health         ONLINE               -
zbackup1  guid           917659042733882722   default
zbackup1  version        28                   default
zbackup1  bootfs         -                    default
zbackup1  delegation     on                   default
zbackup1  autoreplace    off                  default
zbackup1  cachefile      -                    default
zbackup1  failmode       wait                 default
zbackup1  listsnapshots  on                   local
zbackup1  autoexpand     off                  default
zbackup1  dedupditto     0                    default
zbackup1  dedupratio     1.00x                -
zbackup1  free           7.95T                -
zbackup1  allocated      17.4T                -
zbackup1  readonly       off                  -
zbackup1  comment        -                    default

This is on an Addonics adaptor.

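As a side note on reading the quoted timeout messages: the numbers after "ss" and "waiting for slots" look like per-slot bitmasks, one bit per command slot. That reading is an assumption based on the log's own arithmetic, not something checked against the driver source. A small Python sketch (the helper name slots_in_mask is made up for illustration) shows that the masks line up with the slot numbers reported in each "Timeout on slot" line:

```python
def slots_in_mask(mask):
    """Return the bit positions set in a 32-bit mask, lowest first."""
    return [bit for bit in range(32) if mask & (1 << bit)]

# Values copied from the quoted dmesg output (printed there as hex).
# Treating them as slot bitmasks is an assumption, see the note above.
slot_status = 0x65000000
timeouts = [
    (30, 0x25000000),  # "Timeout on slot 30", then "waiting for slots 25000000"
    (26, 0x21000000),  # "Timeout on slot 26", then "waiting for slots 21000000"
    (29, 0x01000000),  # "Timeout on slot 29", then "waiting for slots 01000000"
]

print("ss covers slots:", slots_in_mask(slot_status))

remaining = slot_status
for timed_out, waiting_mask in timeouts:
    # Clearing the timed-out slot should leave exactly the "waiting" mask.
    remaining &= ~(1 << timed_out)
    assert remaining == waiting_mask
    print("after slot", timed_out, "still waiting on", slots_in_mask(waiting_mask))
```

If that reading is right, all four outstanding commands on siisch11 (slots 24, 26, 29 and 30) timed out together, which would point at the whole channel stalling rather than a single command failing.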
	---Mike

> hint.siisch.0.sata_rev=1
> hint.siisch.1.sata_rev=1
> hint.siisch.2.sata_rev=1
> hint.siisch.3.sata_rev=1
> hint.siisch.4.sata_rev=1
> hint.siisch.5.sata_rev=1
> hint.siisch.6.sata_rev=1
> hint.siisch.7.sata_rev=1
> hint.siisch.8.sata_rev=1
> hint.siisch.9.sata_rev=1
> hint.siisch.10.sata_rev=1
> hint.siisch.11.sata_rev=1
>
> From time to time this is also causing one of the attached drives to go
> offline:
>
> siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts 801f2000 serr 00000000
> (ada0:siisch0:0:0:0): lost device
> (ada0:siisch0:0:0:0): removing device entry
> ada0 at siisch0 bus 0 scbus0 target 0 lun 0
> ada0: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
> ada0: Previously was known as ad4
> siisch11: Timeout on slot 30
>
> When the drive goes offline that causes the ZFS rebuild to restart, and so
> it's never finishing the rebuild of the array. Does anyone have any insight
> into what could be causing the timeouts and what we can do to resolve them?
> Right now my priority is to get the system a bit more stable so the current
> ZFS rebuild can complete – right now it's been doing the same rebuild for
> just over 6 days and the timeouts and drive drop offs are causing it to
> restart constantly.
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/