Gino wrote:

[...]

> Just a few examples:
> -We lost several zpools with S10U3 because of the "spacemap" bug,
> and -nothing- was recoverable. No fsck here :(

Yes, I criticized the lack of zpool recovery mechanisms, too, during my AVS
testing. But I don't have the know-how to judge whether there are technical
reasons for it.
> -We had tons of kernel panics because of ZFS.
> Here a "reboot" must be planned a couple of weeks in advance
> and done only on Saturday night ..

Well, I'm sorry, but if your datacenter runs into problems when a single
server isn't available, you probably have much worse problems. ZFS is a file
system; it can't compensate for hardware trouble or a misplanned
infrastructure. What would you do if you had the fsck you mentioned earlier?
Or with another file system like UFS, ext3, whatever? Boot a system into
single-user mode and fsck several terabytes, after planning it a couple of
weeks in advance?

> -Our 9900V and HP EVAs work really badly with ZFS because of the large cache.
> (echo zfs_nocacheflush/W 1 | mdb -kw) did not solve the problem. Only helped
> a bit.

Use JBODs. Or tell the cache controllers to ignore the flushing requests.
That should be possible; even the $10k low-cost StorageTek arrays support it.
(A sketch for making the zfs_nocacheflush setting persistent is appended at
the end of this mail.)

> -ZFS performs badly with a lot of small files.
> (about 20 times slower than UFS with our rsync runs over millions of files)

I have large Sybase database servers and file servers with billions of inodes
running on ZFSv3. They are attached to X4600 boxes running Solaris 10 U3 with
2x 4 GBit/s dual FibreChannel, using dumb and cheap Infortrend FC JBODs
(2 GBit/s) as storage shelves. All my benchmarks (both on the command line and
within applications) show that the FibreChannel is the bottleneck, even with
random reads. ZFS doesn't do this out of the box, but a bit of tuning helped
a lot. (An example of that kind of tuning is appended at the end of this
mail.)

> -ZFS+FC JBOD: a failed hard disk needs a reboot :(((((((((
> (frankly unbelievable in 2007!)

No. Read the thread carefully. It was mentioned that you don't have to reboot
the server; all you need to do is pull the hard disk. That shouldn't be a
problem, unless you don't want to replace the faulty disk anyway. No other
manual operations are necessary, except for the final "zpool replace". You
could also try cfgadm to get rid of ZFS pool problems; perhaps it works - I'm
not sure about this, because I had the idea *after* I solved that problem, but
I'll give it a try someday. (An example replacement sequence is appended at
the end of this mail.)

> Anyway we happily use ZFS on our new backup systems (snapshotting with ZFS is
> amazing), but to tell you the truth we are keeping 2 large zpools in sync on
> each system because we fear another zpool corruption.

May I ask how you accomplish that? And why are you doing this? You should
replicate your zpool to another host instead of mirroring locally - where's
your redundancy if both copies sit on the same system? (A send/receive sketch
is appended at the end of this mail.)

-- 
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484
Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger,
Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
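On the cache-flush issue above: a minimal sketch for making the quoted
zfs_nocacheflush setting survive a reboot, assuming the kernel already exposes
that tunable (the quoted mdb one-liner implies it does). Only do this when the
array cache is battery-backed, because it stops ZFS from asking the array to
flush its write cache after each transaction group.

  * /etc/system entry (comments in /etc/system start with an asterisk):
  * disable the cache flush requests ZFS sends to the array.
  set zfs:zfs_nocacheflush = 1

  # Live equivalent on the running kernel, no reboot required
  # (the same one-liner quoted earlier in this mail):
  echo zfs_nocacheflush/W 1 | mdb -kw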
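On the small-file/rsync point: the mail doesn't spell out which tuning was
applied, so the following is only an illustration of a dataset-level knob that
is commonly tried for metadata-heavy rsync runs; the dataset name tank/backup
is a placeholder.

  # Switch off access-time updates so that scanning millions of files
  # doesn't turn every read into an extra metadata write:
  zfs set atime=off tank/backup

  # Check the resulting property:
  zfs get atime tank/backup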
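On replacing a failed disk without a reboot: a sketch of the usual sequence on
Solaris 10, assuming the replacement disk goes into the same slot. The pool
name tank, the device c2t3d0 and the cfgadm attachment point c2::dsk/c2t3d0
are placeholders and will look different depending on HBA and enclosure.

  # Identify the faulted disk and take it offline:
  zpool status tank
  zpool offline tank c2t3d0

  # Unconfigure the device, pull it, insert the replacement, reconfigure:
  cfgadm -al
  cfgadm -c unconfigure c2::dsk/c2t3d0
  cfgadm -c configure c2::dsk/c2t3d0

  # Hand the new disk to ZFS and watch the resilver:
  zpool replace tank c2t3d0
  zpool status tank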
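On replicating to another host instead of mirroring locally: a minimal
send/receive sketch over ssh. The host name backuphost, the datasets tank/data
and backup/data, and the snapshot names are placeholders; the incremental step
assumes the previous snapshot still exists on both sides.

  # Initial full copy to the second host:
  zfs snapshot tank/data@repl-1
  zfs send tank/data@repl-1 | ssh backuphost zfs receive backup/data

  # Later runs only ship the changes since the last replicated snapshot
  # (-F rolls the target back to that snapshot in case it was touched):
  zfs snapshot tank/data@repl-2
  zfs send -i repl-1 tank/data@repl-2 | ssh backuphost zfs receive -F backup/data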