Antonio S. Cofiño wrote:
> [...]
> The system is a Supermicro X8DTH-6F motherboard in a 4U chassis
> (SC847E1-R1400LPB) plus an external SAS2 JBOD (SC847E16-RJBOD1).
> That gives a total of 4 backplanes (2x SAS + 2x SAS2), each of
> them connected to a different HBA (2x LSI 3081E-R (1068 chip) +
> 2x LSI SAS9200-8e (2008 chip)).
> The system has a total of 81 disks (2x SAS (SEAGATE ST3146356SS)
> + 34 SATA3 (Hitachi HDS722020ALA330) + 45 SATA6 (Hitachi HDS723020BLA642))

> The issue arises when one of the disks starts to fail and its accesses
> take a long time. After some time (minutes, but I'm not sure) all the
> disks connected to the same HBA start to report errors. This situation
> produces a general ZFS failure that makes the whole pool unavailable.
> [...]


I have been there and gave up in the end[1]. I could also reproduce it
(even though it took a bit longer) under most Linux versions (including
with the latest LSI drivers), using the LSI 3081E-R HBA.

Is it just mpt causing the errors or also mpt_sas?

In a lab environment the LSI 9200 HBA behaved better: I/O only stalled
briefly and then continued on the other disks without generating errors.

I had a lengthy Oracle case on this, but none of the proposed "workarounds"
worked for me at all. They were (some also came from other forums):

- disabling NCQ
- adding allow-bus-device-reset=0; to /kernel/drv/sd.conf
- set zfs:zfs_vdev_max_pending=1 (in /etc/system)
- set mpt:mpt_enable_msi=0 (in /etc/system)
- keeping usage below 90%
- running with no fm services, temporarily doing fmadm unload disk-transport,
  and stopping other things that access the disks (smartd?)
- changing retries-timeout via sd.conf for the disks, without any
  success, and in the end setting it via mdb
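
For reference, a sketch of how the two "set" lines from the list above would
sit in /etc/system. Only the two settings themselves come from the list; the
comment lines are added for illustration. /etc/system is read at boot, so a
reboot is needed before they apply, and none of them fixed the problem here.

```
* /etc/system fragment (lines starting with "*" are comments)
* Limit ZFS to one outstanding I/O per vdev
set zfs:zfs_vdev_max_pending=1
* Disable MSI interrupts in the mpt driver
set mpt:mpt_enable_msi=0
```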

In the end I knew the bad sector of the "bad" disk, and by simply reading
this sector once or twice with dd (output to /dev/zero) I could easily
bring down the system/pool without any other load on the disk system.
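
A minimal sketch of such a single-sector read, as a small sh function. The
function name, the device path and the sector number in the usage line are
placeholders and not the values from the case above. It assumes 512-byte
sectors unless a third argument is given, and it uses /dev/null as the sink
rather than /dev/zero.

```shell
#!/bin/sh
# read_sector DEVICE LBA [SECTOR_SIZE]
# Read exactly one sector at the given LBA and discard the data.
# DEVICE and LBA are hypothetical; substitute your own values.
read_sector() {
    dev=$1
    lba=$2
    bs=${3:-512}    # sector size in bytes, 512 assumed by default
    # skip= counts input blocks of size bs, so skip=LBA lands on that sector
    dd if="$dev" of=/dev/null bs="$bs" skip="$lba" count=1
}
```

Usage would look like "read_sector /dev/rdsk/cXtYdZs0 123456", run as root
against the raw device. On a disk that misbehaves as described, expect the
command to hang for a long time before it returns.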


General consensus from various people: don't use SATA drives on SAS
backplanes. Some SATA drives might work better, but there seems to be no
guarantee. And even for SAS-to-SAS, try to avoid SAS1 backplanes.

Markus



[1] Search for "What's wrong with LSI 3081 (1068) + expander + (bad) SATA
    disk?"
