On Thu, Jul 31, 2008 at 16:25, Ross <[EMAIL PROTECTED]> wrote:
> The problems with zpool status hanging concern me, knowing that I can't hot 
> plug drives is an issue, and the long resilver times bug is also a potential 
> problem.  I suspect I can work around the hot plug drive bug with a big 
> warning label on the server, but knowing the pool can hang so easily makes me 
> worry about how well ZFS will handle other faults.
Other hardware failures can cause what look like big problems, too.
We have a scsi->sata enclosure here with some embedded firmware,
connected to a scsi controller on an x4150.  I swapped some disks in
the enclosure, updated the controller configuration, and rebooted the
controller... and the host box died, because ZFS decided that too many
disks were unavailable to continue, so it panicked the box.  At first
I thought this behavior was terrible (my server is down!), but on
reflection it makes sense: it's better to halt before anything else on
the filesystem is corrupted than to write garbage across a whole pool
because of a controller failure or something of that sort.

In any case, I thought you'd be interested in this property of zpools.
It's not likely to happen in general (especially with DAS and a dumb
controller, like you have), and it's better than the alternative of
potentially scribbling on a pool, but other services running on the
same box could suffer if you were incautious.
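If your build is recent enough, the panic-on-device-loss behavior is
actually tunable via the pool-wide failmode property.  A sketch (the
pool name "tank" is a placeholder):

```shell
# Inspect the current failure-mode policy for the pool.
zpool get failmode tank

# "wait" blocks I/O until the devices return, "continue" returns
# EIO to new writes instead of hanging, and "panic" crashes the
# box as described above.
zpool set failmode=continue tank
```

Whether you'd rather hang, error out, or panic depends on what else
the box is doing, so it's worth deciding deliberately rather than
taking the default.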

> On my drive home tonight I was wondering whether I'm going to have to swallow 
> my pride and order a hardware raid controller for this server, letting that 
> deal with the drive issues, and just using ZFS as a very basic filesystem.
Letting ZFS handle at least one layer of redundancy is always
recommended, if you're going to use it at all.  Otherwise it can
detect checksum errors but has no redundant copy to repair them from.
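To make the contrast concrete, here's a sketch of the two setups
(pool and device names are placeholders):

```shell
# ZFS-level redundancy: a two-way mirror lets ZFS repair a
# checksum error automatically from the healthy side.
zpool create tank mirror c0t0d0 c0t1d0

# No ZFS-level redundancy: a plain pool on a single device (or a
# single LUN exported by a hardware RAID card).  Checksum errors
# are detected and reported, but ZFS cannot fix them.
# zpool create tank c0t0d0
```

Even in front of a hardware RAID controller, you can get self-healing
back by giving ZFS two LUNs and mirroring them.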

> The question is whether I can make a server I can be confident in.  I'm now 
> planning a very basic OpenSolaris server just using ZFS as a NFS server, is 
> there anybody out there who can re-assure me that such a server can work well 
> and handle real life drive failures?
We haven't had any "real life" drive failures at work, but at home I
took some old flaky IDE drives and put them in a Pentium 3 box running
Nevada.  Several of them were known to cause errors under Linux, so I
mirrored them in pairs of approximately the same size and set up
weekly scrubs.  Two drives out of six failed entirely, and were nicely
retired, before I gave up on the idea and bought new disks.  I didn't
lose any data with this scheme, and ZFS told me every once in a while
that it had recovered from a checksum error.  Good drives are always
recommended, of course, but I saw nothing but good behavior with old
broken hardware while I was using it.
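The weekly-scrub part of that scheme is just a root crontab entry
along these lines ("tank" is a placeholder pool name):

```shell
# Scrub every Sunday at 03:00 so latent checksum errors on flaky
# disks are found and repaired while both sides of each mirror
# are still readable.
0 3 * * 0 /usr/sbin/zpool scrub tank

# Check the result afterward with:
#   zpool status -v tank
```

Regular scrubs matter most with marginal drives, since an error you
find early can be repaired from the mirror; one you find only when the
other side has also failed cannot.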

Finally, at work we're switching everything over to ZFS because it's
so convenient... but we keep tape backups nonetheless.  I strongly
recommend having up-to-date backups in any situation, but even more so
with ZFS.  It's been very reliable for me personally and at work, but
I've seen horror stories of corrupt pools from which all data is lost.
I'd rather be sitting around the campfire quaking in my boots at
story time than have a flashlight pointed at my face doing the
telling, if you catch my drift.

Will
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss