I don't think Sun intended to ignore the problem; their target market obviously
wants a performance-oriented box, and the x4540 delivers that. Each 1068E
controller chip supports 8 SAS PHY channels, which works out to one channel per
drive and therefore no contention for channels. The x4540 is a monster and
performs like a dream with snv_118 (we have a few ourselves).
My issue is that an archival-type solution demands a dense, simple storage
platform that performs at a reasonable level, nothing more. Our design uses the
same controller chip (8 SAS PHY channels) to drive 46 disks, so there is bound
to be contention, especially under high load. I just need it to work and handle
load gracefully, not time out and cause disk "failures"; at this point I can't
even scrub the zpools to verify that the data on them is valid. From a hardware
perspective, the 3801E card is spec'ed to handle our architecture; the OS,
though, seems to fall over somewhere and can't throttle itself in certain
IO-intensive situations.
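To put the contention difference in numbers, here is a rough sketch of the
average disks-per-PHY ratio for the two layouts described above. It assumes the
disks are spread evenly across the 8 PHYs and ignores SAS expander and wide-port
details, so it shows oversubscription only, not actual bandwidth or latency.
The function name is hypothetical.

```python
from fractions import Fraction


def disks_per_phy(disks: int, phys: int) -> Fraction:
    """Average number of disks sharing each SAS PHY channel.

    Hypothetical helper; assumes disks are spread evenly across PHYs.
    """
    return Fraction(disks, phys)


# x4540 as described: each 1068E drives 8 disks over its 8 PHYs.
x4540 = disks_per_phy(8, 8)

# Archival design as described: one 1068E-based 3801E (8 PHYs) behind 46 disks.
archive = disks_per_phy(46, 8)

print(f"x4540:   {float(x4540):.2f} disks per PHY")
print(f"archive: {float(archive):.2f} disks per PHY")
```

Under those assumptions, each PHY on the archival box is shared by almost six
disks, versus one on the x4540.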

That said, I don't know whether to point the finger at LSI's firmware or at the
mpt driver/ZFS. Sun obviously has a good relationship with LSI, since the 1068E
is their recommended SAS controller chip and is used in their own products. At
least we have a bug filed now, and hopefully we can follow it through to find
out where the system breaks down.
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
