This may have been mentioned elsewhere and, if so, I apologize for repeating. Is it possible your difficulty here is with the Marvell driver and not, strictly speaking, ZFS? The Solaris Marvell driver has had many, MANY bug fixes and continues to this day to be supported by IDR patches and other quick-fix workarounds. It is the source of many problems. Granted, ZFS handles these poorly at times (it got a lot better with ZFS v10), but it is difficult to expect the file system to deal well with underlying instability in the hardware driver, I think.

I'd be interested to hear if your experiences are the same using the LSI controllers which have a much better driver in Solaris.

Ross wrote:
Supermicro AOC-SAT2-MV8, based on the Marvell chipset.  I figured it was the 
best available at the time, since it uses the same chipset as the x4500 
Thumper servers.

Our next machine will be using LSI controllers, but I'm still not entirely 
happy with the way ZFS handles timeout-type errors.  It seems to handle 
drive-reported read or write errors fine, and checksum errors too, but it 
completely misses drive timeout errors of the kind hardware RAID controllers 
rely on.

Personally, I feel that when a pool usually responds to requests on the order 
of milliseconds, a timeout of even a tenth of a second is too long.  Several 
minutes before a pool responds is just a joke.

I'm still a big fan of ZFS, and modern hardware may have better error 
handling, but I can't help feeling this is a little short-sighted.
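For what it's worth, the multi-minute stalls described above are consistent with the Solaris sd driver defaults rather than anything ZFS-specific: sd_io_time defaults to 60 seconds per command, and with the driver's retries a single failing disk can hold up I/O for several minutes before ZFS ever sees an error. A sketch of lowering it via /etc/system (the value here is purely illustrative, not a recommendation, and a reboot is required for it to take effect):

```
* /etc/system -- reduce the sd driver's per-command timeout.
* Default sd_io_time is 60 seconds; combined with the driver's
* retry count, a dying disk can stall I/O for minutes before
* the error propagates up to ZFS.
set sd:sd_io_time = 10
```

Whether a lower value is safe depends on the drives involved; desktop-class disks without time-limited error recovery can legitimately take many seconds on a bad sector.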

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
