This may have been mentioned elsewhere and, if so, I apologize for
repeating.
Is it possible your difficulty here is with the Marvell driver and not,
strictly speaking, ZFS? The Solaris Marvell driver has had many, MANY
bug fixes and continues to this day to be supported by IDR patches and
other quick-fix workarounds. It is the source of many problems.
Granted, ZFS handles these poorly at times (it got a lot better with ZFS
v10), but I think it is difficult to expect the file system to deal well
with instability in the underlying hardware driver.
I'd be interested to hear whether your experience is the same with the LSI
controllers, which have a much better driver in Solaris.
Ross wrote:
> Supermicro AOC-SAT2-MV8, based on the Marvell chipset. I figured it was the
> best available at the time, since it uses the same chipset as the x4500
> Thumper servers.
>
> Our next machine will be using LSI controllers, but I'm still not entirely
> happy with the way ZFS handles timeout-type errors. It seems to handle
> drive-reported read and write errors fine, and checksum errors too, but it
> completely misses drive timeout errors of the kind used by hardware RAID
> controllers.
>
> Personally, I feel that when a pool usually responds to requests on the
> order of milliseconds, a timeout of even a tenth of a second is too long.
> Several minutes before a pool responds is just a joke.
>
> I'm still a big fan of ZFS, and modern hardware may have better error
> handling, but I can't help feeling this is a little short-sighted.
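For illustration, the deadline-style error detection described above (the kind
hardware RAID controllers apply) can be sketched in a few lines. This is only a
rough Python sketch with invented names and an illustrative 100 ms threshold;
none of it comes from the actual ZFS or driver code:

```python
import concurrent.futures
import time

# "A tenth of a second" deadline from the discussion above; purely
# illustrative, not a real ZFS or controller tunable.
DEADLINE_SECONDS = 0.1

def read_block(delay):
    """Stand-in for a disk read; `delay` simulates device latency."""
    time.sleep(delay)
    return b"data"

def read_with_deadline(delay, deadline=DEADLINE_SECONDS):
    """Return the data, or report a timeout fault instead of hanging.

    Instead of waiting indefinitely for a slow or hung drive, any
    request that exceeds the deadline is treated as a device fault so
    the caller can retry against another mirror/parity copy.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(read_block, delay)
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        return None  # caller marks the device faulted and reads elsewhere
    finally:
        pool.shutdown(wait=False)

print(read_with_deadline(0.01))  # healthy drive: returns the data
print(read_with_deadline(0.5))   # hung drive: reported as a fault (None)
```

The point of the sketch is simply that the file system layer can bound its wait
on a single device and fail over, rather than letting one unresponsive drive
stall the whole pool for minutes.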
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss