On Aug 1, 2006, at 03:43, [EMAIL PROTECTED] wrote:


So what does this exercise leave me thinking? Is Linux 2.4.x really
screwed up in NFS-land? This Solaris NFS replaces a Linux-based NFS
server that the clients (linux and IRIX) liked just fine.


Yes; the Linux NFS server and client work together just fine, but generally
only because the Linux NFS server replies that writes are done before
they are committed to disk (async operation).

The Linux NFS client is not optimized for servers that do not do this,
and it appears to write very little before waiting for the commit replies.

Well .. Linux clients with Linux servers tend to be slightly better behaved,
since the server essentially fudges the commit and the async cluster count is
generally higher (it won't switch on every operation like Solaris will by
default).
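
For reference, that "fudging" is just the async export option on the Linux server; a rough sketch of what the /etc/exports lines look like (the share paths and network are hypothetical):

    # /etc/exports on a Linux NFS server
    # async: reply to writes before data reaches stable storage (fast, but unsafe)
    /export/data   192.168.1.0/24(rw,async,no_subtree_check)
    # sync: wait for stable storage, which is what Solaris does by default
    /export/data2  192.168.1.0/24(rw,sync,no_subtree_check)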

Additionally, there's a VM issue in the page-writeback code that seems to
affect write performance and RPC socket performance when there's a high
dirty page count. Essentially, as pages are flushed there's a higher number
of NFS commit operations, which will tend to slow down the Solaris NFS
server (and probably the txgs or ZIL as well, with the increase in synchronous
behaviour). On the Linux 2.6 VM, the number of commits has been seen to rise
dramatically when the dirty page count is between 40-90% of overall system
memory. By tuning the dirty page ratio (vm.dirty_ratio) back down to 10%,
there's typically less time spent in page-writeback and the overall async
throughput should rise. This wasn't really addressed until 2.6.15 or 2.6.16,
so you might also get better results on a later kernel.

Watching performance between a Linux client and a Linux server, the Linux
server seems to buffer the NFS commit operations. Of course the clients will
also buffer as much as they can, so you can end up with some unbelievable
performance numbers both on the filesystem layers (before you do a sync) and
on the NFS client layers as well (until you unmount/remount).
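
A minimal sketch of that tuning, assuming a 2.6 kernel with the standard vm sysctls (the values are just a starting point):

    # start writeback earlier so less time is spent in bulk page-writeback
    sysctl -w vm.dirty_ratio=10
    sysctl -w vm.dirty_background_ratio=5

    # or persist the settings in /etc/sysctl.conf:
    #   vm.dirty_ratio = 10
    #   vm.dirty_background_ratio = 5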


Overall, I find that the Linux VM suffers from many of the same sorts of
large-memory performance problems that Solaris used to face before priority
paging in 2.6 and the subsequent page coloring schemes. Based on my
unscientific Mac PowerBook performance observations, I suspect that there
could be similar issues with various iterations of the BSD or Darwin kernels,
but I haven't taken the initiative to really study any of this.

So to wrap up:

When doing Linux client / Solaris server NFS, I'll typically tune the client
for 32KB async TCP transfers (you have to dig into the kernel source to go
beyond 32KB, and it's not really worth it), tune the VM to reduce time spent
in the kludgy page-writeback (typically a sysctl setting for the dirty page
ratio, as above), and then increase nfs:nfs3_async_clusters and
nfs:nfs4_async_clusters to something higher than 1 .. say 32 x 32KB transfers
to get you to 1MB. You can also increase the number of threads and the read
ahead on the server to eke out some more performance.
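
Roughly, those settings look like this (the server name, share, and mount point are hypothetical, and the values are only a starting point .. check the Solaris Tunable Parameters manual before setting them):

    # Linux client: 32KB read/write sizes over TCP
    mount -t nfs -o rsize=32768,wsize=32768,tcp solaris-server:/export/data /mnt/data

    # Solaris: /etc/system entries (reboot to apply)
    set nfs:nfs3_async_clusters = 32
    set nfs:nfs4_async_clusters = 32

    # Solaris 10: more NFS server threads via NFSD_SERVERS in /etc/default/nfs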

I'd also look at tuning the volblocksize and recordsize, as well as the
stripe width on your array, to 32K or reasonable multiples .. but I'm not
sure how much of the issue is misaligned I/O block sizes between the various
elements vs mandatory pauses or improper behaviour incurred from
miscommunication.
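
Something along these lines, assuming a hypothetical pool called "tank" (recordsize only affects newly written files, and volblocksize can only be set when the zvol is created):

    # match the ZFS record size to the 32KB NFS transfer size
    zfs set recordsize=32K tank/export/data

    # for a zvol, the block size is fixed at creation time
    zfs create -V 10G -o volblocksize=32K tank/vol1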

---
.je