erik.ableson wrote:
OK - I'm at my wit's end here. I've looked everywhere for some means of tuning NFS performance with ESX on osol 2008.11 into something acceptable. I've eliminated everything but the NFS portion of the equation and am looking for some pointers in the right direction.

Any time you have NFS, ZFS as the backing store, JBOD, and a performance
concern, you need to look at the sync activity on the server. This will often
be visible as ZIL activity, which you can see clearly with zilstat:
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
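
A rough sketch of how you might run it (the script name and the 1-second/10-sample arguments are just placeholders; it needs root since it drives DTrace underneath):

  # run while the ESX host is writing to the datastore
  chmod +x zilstat.ksh
  ./zilstat.ksh 1 10

If the per-interval byte and op counts track your NFS write traffic, the ZIL is squarely in the data path and that is where the latency is going.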

The cure is not to disable the ZIL or break NFS.  The cure is lower latency
I/O for the ZIL.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_NFS_Server_Performance
-- richard

Configuration: PE2950, dual Xeon, 32 GB RAM, with an MD1000 using a zpool of 7 mirror vdevs. ESX 3.5 and 4.0. Pretty much a vanilla install across the board, no additional software other than the Adaptec StorMan to manage the disks.

local performance via dd - 463 MB/s write, 1 GB/s read (8 GB file; the sort of dd run assumed is sketched below)
iSCSI performance - 90 MB/s write, 120 MB/s read (800 MB file from a VM)
NFS performance - 1.4 MB/s write, 20 MB/s read (800 MB file from the Service Console, transfer of an 8 GB file via the datastore browser)
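
For reference, the local dd numbers above would come from something along these lines (the file path, block size, and count are assumptions, not the actual commands used):

  # 8 GB sequential write into the pool (path is hypothetical)
  dd if=/dev/zero of=/tank/ddtest bs=1024k count=8192
  # read it back; note an 8 GB file largely fits in the 32 GB ARC,
  # which flatters the local read figure
  dd if=/tank/ddtest of=/dev/null bs=1024k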

I just found the tool latencytop, which points the finger at the ZIL (tip of the hat to Lejun Zhu). Ref: <http://www.infrageeks.com/zfs/nfsd.png> & <http://www.infrageeks.com/zfs/fsflush.png>. Log file: <http://www.infrageeks.com/zfs/latencytop.log>

Now I can understand that there is a performance hit associated with this feature of ZFS for ensuring data integrity, but this drastic a difference makes no sense whatsoever. The pool should natively handle (at worst) 120 IOPS x 7 mirror vdevs, roughly 840 IOPS, and I'm not even seeing enough to saturate a USB thumb drive. This still doesn't answer why the read performance is so bad either. According to latencytop, the culprit would be genunix`cv_timedwait_sig rpcmod`svc

From my searching it appears that there's no async setting for the osol nfsd, and ESX does not offer any mount controls to force an async connection. Other than putting in an SSD as a ZIL (which still strikes me as overkill for basic NFS services), I'm looking for any information that can bring me up to at least reasonable throughput.
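
One way to confirm what ESX is doing on the wire (just the stock Solaris nfsstat, no extra tooling) is to watch the server-side NFSv3 counters during a copy; a workload dominated by stable writes and commits is exactly the pattern that lands on the ZIL:

  # server-side RPC and NFS statistics, including per-operation NFSv3 counts
  nfsstat -s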

Would a dedicated 15K SAS drive help the situation by moving the ZIL traffic off to a dedicated device? Significantly? This is the sort of thing that I don't want to do without some reasonable assurance that it will help since you can't remove a ZIL device from a pool at the moment.
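
For what it's worth, attaching a separate log device is a one-liner; the pool and device names below are made up, and as you note the log vdev can't be removed again afterwards, so it's worth trying on a scratch pool first:

  # add a dedicated log (slog) device for the ZIL -- names are hypothetical
  zpool add tank log c2t3d0

  # the device then shows up under a separate "logs" section
  zpool status tank

A 15K SAS drive takes the ZIL writes off the data vdevs, but the real win comes from something with much lower write latency, which is why the SSD/NVRAM suggestion keeps coming up.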

Hints and tips appreciated,

Erik
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss