On Aug 1, 2006, at 03:43, [EMAIL PROTECTED] wrote:

> So what does this exercise leave me thinking? Is Linux 2.4.x really
> screwed up in NFS-land? This Solaris NFS replaces a Linux-based NFS
> server that the clients (linux and IRIX) liked just fine.
Yes; the Linux NFS server and client work together just fine, but
generally only because the Linux NFS server replies that writes are
done before they are committed to disk (async operation). The Linux
NFS client is not optimized for servers which do not do this, and it
appears to write little before waiting for the commit replies.
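(For context: that async behaviour is an export option on the Linux
server side .. a minimal /etc/exports sketch, with a made-up path and
client subnet:

    # 'async' lets the server reply to writes before the data reaches
    # stable storage; change it to 'sync' to get the commit-before-reply
    # behaviour described above
    /export/data  192.168.1.0/24(rw,async)

)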
Well .. Linux clients with Linux servers tend to be slightly better
behaved, since the server essentially fudges on the commit and the
async cluster count is generally higher (it won't switch on every
operation like Solaris will by default).
Additionally, there's a VM issue in the page-writeback code that seems
to affect write performance and RPC socket performance when there's a
high dirty page count. Essentially, as pages are flushed there's a
higher number of NFS commit operations, which will tend to slow down
the Solaris NFS server (and probably the txgs or ZIL as well, with the
increase in synchronous behaviour). On the Linux 2.6 VM, the number of
commits has been seen to rise dramatically when the dirty page count is
between 40-90% of overall system memory .. by tuning the dirty page
ratio back down to 10% there's typically less time spent in
page-writeback, and overall async throughput should rise. This wasn't
really addressed until 2.6.15 or 2.6.16, so you might also get better
results on a later kernel.

Watching performance between a Linux client and a Linux server, the
Linux server seems to buffer the NFS commit operations .. of course the
clients will also buffer as much as they can, so you can end up with
some unbelievable performance numbers both on the filesystem layers
(before you do a sync) and on the NFS client layers as well (until you
unmount/remount).
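If it helps, the dirty-page knob referred to above is normally the vm
sysctl on 2.6 kernels .. a minimal sketch, with the 10% figure taken
from the observation above rather than offered as a universal value:

    # flush dirty pages earlier so NFS commits are spread out
    # instead of arriving in large bursts
    sysctl -w vm.dirty_ratio=10
    sysctl -w vm.dirty_background_ratio=5

    # or make it persistent in /etc/sysctl.conf:
    #   vm.dirty_ratio = 10
    #   vm.dirty_background_ratio = 5

(The dirty_background_ratio value is just a guess for illustration, and
exact sysctl names can vary a little across 2.6 releases.)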
Overall, I find that the Linux VM suffers from many of the same sorts
of large-memory performance problems that Solaris used to face before
priority paging in 2.6 and the subsequent page coloring schemes. Based
on my unscientific Mac PowerBook performance observations, I suspect
that there could be similar issues with various iterations of the BSD
or Darwin kernels, but I haven't taken the initiative to really study
any of this.
So to wrap up:

When doing Linux client / Solaris server NFS, I'll typically tune the
client for 32KB async TCP transfers (you have to dig into the kernel
source to increase this and it's not really worth it), tune the VM to
reduce time spent in the kludgy page-writeback (typically a sysctl
setting for the dirty page ratio or some such), and then increase
nfs:nfs3_async_clusters and nfs:nfs4_async_clusters to something higher
than 1 .. say 32 x 32KB transfers to get you to 1MB. You can also
increase the number of threads and the read-ahead on the server to eke
out some more performance.
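As a rough sketch of that tuning (server name, mount point, and the
exact values are placeholders, not recommendations):

    # Linux client: 32KB transfers over TCP
    mount -t nfs -o rsize=32768,wsize=32768,tcp,hard,intr \
        sol-server:/export/data /mnt/data

    # Solaris: /etc/system entries for the async cluster counts
    # (changes to /etc/system take effect after a reboot)
    set nfs:nfs3_async_clusters = 32
    set nfs:nfs4_async_clusters = 32

    # the server-side nfsd thread count lives elsewhere, e.g.
    # NFSD_SERVERS in /etc/default/nfs on Solaris 10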
I'd also look at tuning the volblocksize and recordsize, as well as the
stripe width on your array, to 32K or reasonable multiples .. but I'm
not sure how much of the issue is in misaligned I/O blocksizes between
the various elements vs mandatory pauses or improper behaviour incurred
from miscommunication ..
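On the ZFS side that's just the dataset properties .. a quick sketch
with a made-up pool/dataset name (recordsize only affects files written
after the change, and volblocksize can only be set when the zvol is
created):

    # match the filesystem recordsize to the 32KB NFS transfer size
    zfs set recordsize=32K tank/nfsdata

    # for a zvol, the block size has to be chosen at creation time
    zfs create -V 100G -o volblocksize=32K tank/nfsvol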
---
.je
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss