On Oct 23, 2009, at 4:46 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal <ach...@pnimedia.com> wrote:
I don't think there was any intention on Sun's part to ignore the
problem...obviously their target market wants a performance-oriented
box and the x4540 delivers that. Each 1068E controller chip supports
8 SAS PHY channels = 1 channel per drive = no contention for
channels. The x4540 is a monster and performs like a dream with
snv_118 (we have a few ourselves).
My issue is that implementing an archival-type solution demands a
dense, simple storage platform that performs at a reasonable level,
nothing more. Our design has the same controller chip (8 SAS PHY
channels) driving 46 disks, so there is bound to be contention there, especially in high-load situations. I just need it to work and handle load gracefully, not time out and cause disk "failures"; at this point I can't even scrub the zpools to verify that the data we have on there is valid. From a hardware perspective, the 3801E card is spec'ed to handle our architecture; the OS just seems to fall over somewhere, though, and fails to throttle itself in certain IO-intensive situations.
That said, I don't know whether to point the finger at LSI's firmware or at the mpt driver/ZFS. Sun obviously has a good relationship with LSI, as the 1068E is their recommended SAS controller chip and is used in their own products. At least we've got a bug filed now,
and we can hopefully follow this through to find out where the
system breaks down.
Have you checked with LSI to verify the IOPS capability of the chip? Just because it supports having 46 drives attached to one ASIC doesn't mean it can actually service all 46 at once. You're talking (VERY conservatively) 2800 IOPS.
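(Rough arithmetic behind that figure, assuming a conservative ~60 random IOPS per 7200 RPM drive: 46 drives x ~60 IOPS/drive = ~2,760, i.e. roughly 2800 IOPS in aggregate.)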
Tim has a valid point. By default, ZFS will queue 35 commands per disk. For 46 disks that is 1,610 concurrent I/Os. Historically, it has proven to be relatively easy to crater performance or cause problems with very, very, very expensive arrays that are easily overrun by Solaris. As a result, it is not uncommon to see references to setting throttles, especially in older docs.
Fortunately, this is simple to test by reducing the number of I/Os ZFS will queue. See the Evil Tuning Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
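For example (a sketch along the lines of the guide above; 10 is just a starting point, not a recommendation), the per-vdev queue depth can be dropped from the default of 35 either on the running kernel with mdb or persistently via /etc/system:

   # show the current value
   echo zfs_vdev_max_pending/D | mdb -k

   # set it to 10 on the live system (takes effect immediately, not persistent)
   echo zfs_vdev_max_pending/W0t10 | mdb -kw

   # or add this line to /etc/system and reboot to make it stick
   set zfs:zfs_vdev_max_pending = 10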
The mpt source is not open, so the mpt driver's reaction to 1,610 concurrent I/Os can only be guessed from afar -- public LSI docs mention a figure of 511 concurrent I/Os for the SAS1068, but it is not clear to me whether that is an explicit limit. If you have success with zfs_vdev_max_pending set to 10, then the mystery might be solved. Use iostat to observe the wait and actv columns, which show the number of transactions in the queues.
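Something like this (extended statistics, per-device names, skipping idle devices, 1-second samples) will show those queues under load:

   iostat -xnz 1

With the default setting, the sum of wait + actv per device should hover near the 35-command limit when the pool is saturated; if dropping zfs_vdev_max_pending helps, you should see that sum capped near the new value and, hopefully, no more timeouts.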
NB: sometimes a driver's limit is configurable. For example, to get high performance out of a high-end array attached to a qlc card, I've set execution-throttle in /kernel/drv/qlc.conf to more than two orders of magnitude above its default of 32. /kernel/drv/mpt*.conf does not seem to have a similar throttle.
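For reference, that qlc tweak is just a property line in the conf file (the value below is illustrative; pick something your particular array can actually absorb), followed by a reboot:

   # /kernel/drv/qlc.conf
   execution-throttle=4096;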
-- richard
Even ignoring that, I know for a fact that the chip can't handle the raw throughput of 46 disks unless you've got some very severe RAID overhead. That chip is good for roughly 2GB/sec in each direction; 46 7200 RPM drives can fairly easily push 4x that amount in streaming IO loads.
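(For scale, assuming ~100-130 MB/sec of sequential throughput per 7200 RPM drive of that era: 46 drives x ~100-130 MB/sec is roughly 4.5-6 GB/sec aggregate, well over what the chip can move in one direction.)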
Long story short, it appears you're trying to fit a 50lbs load into a 5lbs bag...
--Tim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss