Well, here's my previous off-list summary to various Solaris folk
(regarding NFS serving via ZFS and iSCSI):

I want to use ZFS as a NAS with no bounds on the backing hardware (not
restricted to one box's capacity). That leaves two options: FC SAN or
iSCSI. In my case, multi-building considerations and 10Gb Ethernet
layer-2 interconnects make iSCSI ideal. Our typical users use the NAS
for collections ranging from many small files to large files (source
code repositories, simulations, CAD tools, VM images, rendering
meta-forms, and final results). To allow for ongoing growth and drive
replacement across multiple iSCSI targets, RAIDZ was selected over
static hardware RAID solutions. This setup is very similar to a gFiler
(iSCSI based) or otherwise a standard NetApp Filer product, and it
would appear that Sun is targeting exactly this kind of solution. I
need this setup for both Tier 1 primary NAS storage and disk-to-disk
Tier 2 backup.

In my extensive testing (not so much benchmarking, and definitely
without the time/focus to learn DTrace and the like), we have found
that ZFS can be used for a Tier 2 system but not for Tier 1, due to
pathologically poor performance via NFS against a ZFS filesystem based
on RAIDZ over non-local storage. We get extremely poor but more
acceptable performance using a non-RAIDZ configuration. Only with an
expensive FC SAN implementation would ZFS appear to be workable. If
that is the only workable setup, then ZFS has lost its benefits over
NetApp: we approach the same costs without the same maturity. Is it a
lost cause? Honestly, I need to be convinced that this is workable,
and so far the alternatives suggested have been shot down.

Evidence? The final synthetic test was to generate a directory of
6250 random 8k files, then on an NFS client (Solaris, Linux, or even
loopback on the server itself) run "cp -r SRCDIR DESTDIR", where
DESTDIR is on the NFS server (a sketch of the test appears after the
table below). Averages from memory:

FS     iSCSI backend           Rate
XFS    1.5TB single LUN        ~1-1.1MB/sec
ZFS    1.5TB single LUN        ~250-400KB/sec
ZFS    1.5TB RAIDZ (8 disks)   ~25KB/sec
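
For reference, here is a minimal sketch of that test. The script is a
reconstruction (the paths and file names are placeholders, not the
exact commands I ran):

  #!/bin/sh
  # Generate 6250 random 8K files, then time a recursive copy onto an
  # NFS mount of the ZFS filesystem under test. Paths are placeholders.
  SRCDIR=/var/tmp/smallfiles
  DESTDIR=/mnt/nfs/smallfiles

  mkdir -p "$SRCDIR"
  i=0
  while [ $i -lt 6250 ]; do
      dd if=/dev/urandom of="$SRCDIR/file.$i" bs=8k count=1 2>/dev/null
      i=`expr $i + 1`
  done

  time cp -r "$SRCDIR" "$DESTDIR"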

With mixed-size files, predominantly small files both above and below
8K, I see the XFS solution jump to an average of 2.5-3MB/sec. The ZFS
store over a single LUN stays within 200-420KB/sec, and the RAIDZ
config ranges from 16-40KB/sec.

Caching and some dynamic behaviour likely cause ZFS to get worse with
mixed sizes, whereas XFS and the like improve. Finally, by switching
to SMB instead of NFS, I can sustain rates of over 3MB/sec.

Large files over NFS get more reasonable performance (14-28MB/sec) on
any given ZFS backend, and I get 30+MB/sec, with spikes close to
100MB/sec, when writing locally. I can only maximize performance on my
ZFS backend if I use a block size (in tests with dd) of 256K or
greater. 128K gives lower overall data rates, and I believe that is
the default for cp, rsync, and other commands run locally.
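
For example, a quick way to see the block-size effect (paths and sizes
here are illustrative, not my exact runs):

  # Write 1GB to the ZFS backend at different block sizes and compare rates.
  time dd if=/dev/zero of=/tank/test/dd.128k bs=128k  count=8192
  time dd if=/dev/zero of=/tank/test/dd.256k bs=256k  count=4096
  time dd if=/dev/zero of=/tank/test/dd.1m   bs=1024k count=1024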

In summary, I can make my ZFS-based initiator an NFS client itself, or
use rsyncd, to work around the pathological NFS server performance of
this ZFS combination; I can then serve files fine. That lets us move
forward with it as a Tier 2-only solution. If _anything_ can be done
to address NFS and its interaction with ZFS and bring it close to
1MB/sec (these are gig-e interconnects, after all), then it would only
be 1/10th the performance of a NetApp in this worst-case scenario, and
similar to or better than the NetApp in other cases. The NetApp can do
around 10MB/sec in the scenario I'm describing. Currently, we are at
around 1/20th to 1/30th of that level when not using RAIDZ, and
1/200th when using RAIDZ.
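
For what it's worth, the workaround looks roughly like this (hostnames
and paths are placeholders): instead of clients writing to the ZFS box
over NFS, the ZFS box pulls the data itself with rsync, or clients
push to an rsyncd module on it.

  # ZFS box pulls from a client over ssh:
  rsync -a --delete client1:/export/projects/ /tank/backup/client1/

  # Or, with rsyncd running on the ZFS box, a client pushes to a module:
  rsync -a /export/projects/ zfsbox::backup/client1/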

I just can't quite understand how a "cp -p TESTDIR DESTDIR" of 50MB of
small files can complete locally in an instant, with the OS returning
to the prompt immediately and zpool iostat showing the writes
committed over the next 3-6 seconds (which is fine for on-disk
consistency), yet for some reason an NFS client can't commit in a
similar fashion, with Solaris saying "yes, we got it, here's
confirmation.. next" just as it does locally. The data definitely
arrives at the same speed, as my tests with remote iSCSI pools and as
an NFS client show. My naive sense is that this should be addressable
at some level without inducing corruption; I have a feeling that
something is being overly conservative here.
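
The contrast is easy to see with a side-by-side timing while watching
the pool (paths are placeholders; /tank/test is the local ZFS
filesystem and /mnt/nfs is the same filesystem mounted back over NFS):

  # In one terminal, watch write commits:
  zpool iostat tank 1

  # In another, time the same copy locally and over NFS:
  ptime cp -pr TESTDIR /tank/test/copy   # returns almost immediately;
                                         # writes trickle out afterwards
  ptime cp -pr TESTDIR /mnt/nfs/copy     # each file waits for the server
                                         # to acknowledge stable storage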

On 5/30/06, Robert Milkowski <rmilkow...@task.gda.pl> wrote:
> Hello Joe,
>
> Wednesday, May 31, 2006, 12:44:22 AM, you wrote:
>
> JL> Well, I would caution at this point against the iscsi backend if you
> JL> are planning on using NFS. We took a long winded conversation online
> JL> and have yet to return to this list, but the gist of it is that the
> JL> latency of iscsi along with the tendency for NFS to fsync 3 times per
> JL> write causes performance to drop dramatically, and it gets much worse
> JL> for a RAIDZ config. If you want to go this route, FC is a current
> JL> suggested requirement.
>
> Can you provide more info on NFS+raidz?
>
> --
> Best regards,
>  Robert                            mailto:rmilkow...@task.gda.pl
>                                        http://milek.blogspot.com
>
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

