Well, here's my previous summary, sent off-list to various Solaris folks, regarding NFS serving via ZFS and iSCSI:
I want to use ZFS as a NAS with no bound on the backing hardware (i.e. not restricted to one box's capacity). That leaves two options: FC SAN or iSCSI. In my case, multi-building considerations and 10Gb Ethernet layer-2 interconnects make iSCSI ideal. Our typical users use the NAS for collections of everything from many small files to many large files (source code repositories, simulations, CAD tools, VM images, rendering meta-forms, and final results). To allow for ongoing growth and drive replacement across multiple iSCSI targets, RAIDZ was selected over static hardware RAID. This setup is very similar to a gFiler (iSCSI-based) or otherwise a standard NetApp filer product, and it would appear that Sun is targeting exactly this solution. I need this setup both for Tier 1 primary NAS storage and for disk-to-disk Tier 2 backup.

In my extensive testing (not so much benchmarking, and definitely without the time/focus to learn dtrace and the like), we have found that ZFS can be used for a Tier 2 system but not for Tier 1, due to pathologically poor performance serving NFS from a ZFS filesystem built on RAIDZ over non-local storage. Performance without RAIDZ is still extremely poor, but more acceptable. Only with an expensive FC SAN implementation would ZFS appear to be workable, and if that is the only workable option, then ZFS has lost its advantage over NetApp: we approach the same cost without the same maturity. Is it a lost cause? Honestly, I need to be convinced that this is workable, and so far the alternative solutions proposed have been shot down.

Evidence? The final synthetic test was to generate a directory of 6250 random 8k files. On an NFS client (Solaris, Linux, or even loopback on the server itself), run "cp -r SRCDIR DESTDIR" where DESTDIR is on the NFS server. Averages from memory:

  FS    iSCSI backend            Rate
  XFS   1.5TB single LUN         ~1-1.1MB/sec
  ZFS   1.5TB single LUN         ~250-400KB/sec
  ZFS   1.5TB RAIDZ (8 disks)    ~25KB/sec

With mixed-size files, predominantly small files above and below 8K, the XFS solution jumps to an average of 2.5-3MB/sec, the ZFS store over a single LUN stays within 200-420KB/sec, and the RAIDZ config ranges from 16-40KB/sec. Likely caching and some dynamic behaviours cause ZFS to get worse with mixed sizes, whereas XFS and the like get faster. Finally, by switching to SMB instead of NFS, I can maintain rates over 3MB/sec. Large files over NFS get more reasonable performance (14-28MB/sec) on any given ZFS backend, and I get 30+MB/sec, with spikes close to 100MB/sec, when writing locally. I can only maximize performance on my ZFS backend if I use a block size (in dd tests) of 256K or greater; 128K gives lower overall data rates, and I believe that is the default when I use cp, rsync, or similar commands locally.

In summary, I can make my ZFS-based initiator an NFS client, or otherwise use rsyncd, to work around the pathological NFS server performance of this ZFS combination; I can then serve files fine. That lets us move forward, but as a Tier 2-only solution. If _anything_ can be done to address NFS and its interaction with ZFS and bring it close to 1MB/sec (these are gig-e interconnects, after all), then it would be only 1/10th the performance of a NetApp in this worst-case scenario, and similar to or better than the NetApp in other cases. The NetApp can do around 10MB/sec in the scenario I'm describing.
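For anyone who wants to reproduce the small-file test, here is roughly the procedure, as a sketch: the directory names (/var/tmp/SRCDIR, /mnt/nfs/DESTDIR), the pool path /tank/test, and the dd sizes are placeholders standing in for the 6250 x 8K files and 128K/256K block sizes described above.

  # generate 6250 random 8K files in a scratch directory (paths are placeholders)
  mkdir -p /var/tmp/SRCDIR
  i=0
  while [ $i -lt 6250 ]; do
      dd if=/dev/urandom of=/var/tmp/SRCDIR/file.$i bs=8k count=1 2>/dev/null
      i=`expr $i + 1`
  done

  # from an NFS client (or a loopback mount on the server itself),
  # time the copy into the NFS-exported ZFS filesystem
  time cp -r /var/tmp/SRCDIR /mnt/nfs/DESTDIR

  # local large-block writes against the same ZFS pool,
  # comparing 128K vs 256K block sizes (1GB each)
  time dd if=/dev/zero of=/tank/test/bigfile.128k bs=128k count=8192
  time dd if=/dev/zero of=/tank/test/bigfile.256k bs=256k count=8192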
Currently, we see around 1/20th to 1/30th of that performance level when not using RAIDZ, and about 1/200th with RAIDZ. What I can't quite understand is how a "cp -p TESTDIR DESTDIR" of 50MB of small files can complete locally in an instant, with the OS returning to the prompt and zpool iostat showing the writes committed over the next 3-6 seconds (which is fine for on-disk consistency), yet for some reason the NFS client can't be acknowledged in a similar fashion, with Solaris saying "yes, we got it, here's confirmation... next" just as it does locally. The data definitely gets to disk at the same speed, as my tests with remote iSCSI pools and as an NFS client show. My naive sense is that this should be addressable at some level without inducing corruption; I have a feeling it's somehow being overly conservative here.

On 5/30/06, Robert Milkowski <rmilkow...@task.gda.pl> wrote:
> Hello Joe,
>
> Wednesday, May 31, 2006, 12:44:22 AM, you wrote:
>
> JL> Well, I would caution at this point against the iscsi backend if you
> JL> are planning on using NFS. We took a long winded conversation online
> JL> and have yet to return to this list, but the gist of it is that the
> JL> latency of iscsi along with the tendency for NFS to fsync 3 times per
> JL> write causes performance to drop dramatically, and it gets much worse
> JL> for a RAIDZ config. If you want to go this route, FC is a current
> JL> suggested requirement.
>
> Can you provide more info on NFS+raidz?
>
> --
> Best regards,
> Robert                          mailto:rmilkow...@task.gda.pl
>                                 http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss