On 5/21/2012 9:04 PM, Matthew Gamble wrote:
> We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5 Port 
> Backplane port multipliers (the "backblaze storage pod").  Under intense IO 
> (ZFS rebuild, presently) the system will lock up all IO for 3-4 minutes and 
> the following entry appears in the dmesg:
> 
> siisch11: Timeout on slot 30
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11:  ... waiting for slots 25000000
> siisch11: Timeout on slot 26
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11:  ... waiting for slots 21000000
> siisch11: Timeout on slot 29
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11:  ... waiting for slots 01000000
> siisch11: Timeout on slot 24
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> 
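
The hex fields in these messages appear to be 32-bit masks with one bit per command slot: 65000000 has bits 24, 26, 29 and 30 set, and each "waiting for slots" value drops the slot that has just timed out. A minimal POSIX sh sketch for decoding such a mask, assuming bit 0 corresponds to slot 0 (decode_slots is a hypothetical helper name, not part of FreeBSD):

```sh
#!/bin/sh
# Hypothetical helper: decode_slots MASK
# Prints the bit positions set in a 32-bit hex mask such as the
# "ss", "rs" or "waiting for slots" values from the siisch messages.
# Assumption: one bit per command slot, bit 0 = slot 0.
decode_slots() {
    mask=$((0x$1))          # interpret the argument as hexadecimal
    slots=""
    i=0
    while [ "$i" -lt 32 ]; do
        if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
            # append the slot number, space-separated
            slots="${slots:+$slots }$i"
        fi
        i=$((i + 1))
    done
    echo "$slots"
}

# Masks copied from the siisch11 log above.
for m in 65000000 25000000 21000000 01000000; do
    printf '%s -> slots %s\n' "$m" "$(decode_slots "$m")"
done
```

With these masks it reports slots 24 26 29 30, then 24 26 29, 24 29 and 24, which matches the order in which slots 30, 26, 29 and 24 are reported as timed out.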
> The errors are on different siisch devices, so it's not likely to be a SATA
> cable issue unless multiple cables all went bad at the same time. On the
> advice of some other posts to the mailing list, I've already tried locking the
> SATA revision to 1 with the following in /boot/loader.conf, which didn't help

If they are on different siisch devices then yes, it does not sound like
a bad cable. However, I have seen errors similar to the ones above that
were fixed by replacing the cables. If you are using 9.0-RELEASE, I would
suggest upgrading to -STABLE. There have been a few bug fixes and
improvements to the drivers, as well as to various parts of the disk
subsystem. I am running RELENG_8 right now and it's quite stable for me
on a 25TB system, which for the most part is similar to 9.x.

# zpool status
  pool: zbackup1
 state: ONLINE
  scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011
config:

        NAME        STATE     READ WRITE CKSUM
        zbackup1    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada14   ONLINE       0     0     0
            ada16   ONLINE       0     0     0
            ada13   ONLINE       0     0     0
            ada15   ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1-2  ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1-3  ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
            ada11   ONLINE       0     0     0
            ada12   ONLINE       0     0     0

errors: No known data errors
# zpool get all zbackup1
NAME      PROPERTY       VALUE       SOURCE
zbackup1  size           25.4T       -
zbackup1  capacity       68%         -
zbackup1  altroot        -           default
zbackup1  health         ONLINE      -
zbackup1  guid           917659042733882722  default
zbackup1  version        28          default
zbackup1  bootfs         -           default
zbackup1  delegation     on          default
zbackup1  autoreplace    off         default
zbackup1  cachefile      -           default
zbackup1  failmode       wait        default
zbackup1  listsnapshots  on          local
zbackup1  autoexpand     off         default
zbackup1  dedupditto     0           default
zbackup1  dedupratio     1.00x       -
zbackup1  free           7.95T       -
zbackup1  allocated      17.4T       -
zbackup1  readonly       off         -
zbackup1  comment        -           default

This is on an Addonics adapter.

        ---Mike
> 
> hint.siisch.0.sata_rev=1
> hint.siisch.1.sata_rev=1
> hint.siisch.2.sata_rev=1
> hint.siisch.3.sata_rev=1
> hint.siisch.4.sata_rev=1
> hint.siisch.5.sata_rev=1
> hint.siisch.6.sata_rev=1
> hint.siisch.7.sata_rev=1
> hint.siisch.8.sata_rev=1
> hint.siisch.9.sata_rev=1
> hint.siisch.10.sata_rev=1
> hint.siisch.11.sata_rev=1
> 
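
The twelve hints follow a single pattern, one line per siisch channel (3 SiI3124 controllers with 4 ports each gives siisch0 through siisch11). A small sh sketch that prints the same lines; gen_sata_rev_hints is a hypothetical helper, and the channel count of 12 is taken from this particular box:

```sh
#!/bin/sh
# Hypothetical helper: gen_sata_rev_hints REV NCHAN
# Prints one hint.siisch.N.sata_rev line per channel, N = 0 .. NCHAN-1.
# sata_rev=1 limits the negotiated link to SATA 1.x (1.5 Gbps).
gen_sata_rev_hints() {
    rev=$1        # SATA revision to force, e.g. 1
    nchan=$2      # number of siisch channels, e.g. 12
    ch=0
    while [ "$ch" -lt "$nchan" ]; do
        printf 'hint.siisch.%d.sata_rev=%d\n' "$ch" "$rev"
        ch=$((ch + 1))
    done
}

gen_sata_rev_hints 1 12
```

Redirecting the output with `gen_sata_rev_hints 1 12 >> /boot/loader.conf` would append the hints; check the file for duplicate lines first, since the hints above are already there.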
> From time to time this is also causing one of the attached drives to go 
> offline:
> 
> siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts 801f2000 serr 00000000
> (ada0:siisch0:0:0:0): lost device
> (ada0:siisch0:0:0:0): removing device entry
> ada0 at siisch0 bus 0 scbus0 target 0 lun 0
> ada0: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
> ada0: Previously was known as ad4
> siisch11: Timeout on slot 30
> 
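
The reattached ada0 reports 150.000MB/s (SATA 1.x), which is consistent with the sata_rev=1 hint being in effect. A rough sh sketch for pulling that line out for every disk, assuming the probe message format shown above (link_rates is a hypothetical helper name):

```sh
#!/bin/sh
# Hypothetical helper: link_rates
# Reads kernel messages on stdin and prints "adaN RATE SATA n.x" for
# each "adaN: <rate>MB/s transfers (SATA n.x, ...)" probe line.
# Assumes the message format shown in the log above.
link_rates() {
    grep -E '^ada[0-9]+: [0-9.]+MB/s transfers \(SATA [0-9]\.x' |
        sed -E 's|^(ada[0-9]+): ([0-9.]+MB/s) transfers \((SATA [0-9]\.x).*|\1 \2 \3|'
}
```

For example `dmesg | link_rates`, or `link_rates < /var/run/dmesg.boot` for the messages from the last boot.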
> When the drive goes offline, the ZFS rebuild restarts, so the array never
> finishes rebuilding. Does anyone have any insight into what could be causing
> the timeouts and what we can do to resolve them? My priority right now is to
> get the system stable enough for the current ZFS rebuild to complete: it has
> been running the same rebuild for just over 6 days, and the timeouts and
> drive dropouts keep restarting it.
> 
> ________________________________
> 
>  This electronic message contains information from Primus Telecommunications 
> Canada Inc. ("PRIMUS") , which may be legally privileged and confidential. 
> The information is intended to be for the use of the individual(s) or entity 
> named above. If you are not the intended recipient, be aware that any 
> disclosure, copying, distribution or use of the contents of this information 
> is prohibited. If you have received this electronic message in error, please 
> notify us by telephone or e-mail (to the number or address above) 
> immediately. Any views, opinions or advice expressed in this electronic 
> message are not necessarily the views, opinions or advice of PRIMUS. It is 
> the responsibility of the recipient to ensure that any attachments are virus 
> free and PRIMUS bears no responsibility for any loss or damage arising in any 
> way from the use thereof. The term "PRIMUS" includes its affiliates.
> 
> ________________________________
>  For the French version of this message, please see
> http://www.primustel.ca/fr/legal/cs.htm
> 
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
