Chris Csanady writes:

> On 6/26/06, Neil Perrin <[EMAIL PROTECTED]> wrote:
> >
> > Robert Milkowski wrote On 06/25/06 04:12,:
> > > Hello Neil,
> > >
> > > Saturday, June 24, 2006, 3:46:34 PM, you wrote:
> > >
> > > NP> Chris,
> > >
> > > NP> The data will be written twice on ZFS using NFS. This is because NFS,
> > > NP> on closing the file, internally uses fsync to cause the writes to be
> > > NP> committed. This causes the ZIL to immediately write the data to the
> > > NP> intent log. Later the data is also committed as part of the pool's
> > > NP> transaction group commit, at which point the intent log blocks are freed.
> > >
> > > NP> It does seem inefficient to doubly write the data. In fact, for blocks
> > > NP> larger than zfs_immediate_write_sz (it was 64K, but is now 32K after
> > > NP> 6440499 was fixed) we write the data block and also an intent log record
> > > NP> with the block pointer. During txg commit we link this block into the
> > > NP> pool tree. By experimentation we found 32K to be the (current) cutoff
> > > NP> point. As the nfsds write at most 32K, they do not benefit from this.
> > >
> > > Is 32KB easily tuned (mdb?)?
> >
> > I'm not sure. NFS folk?
>
> I think he is referring to the zfs_immediate_write_sz variable, but NFS
> will support larger block sizes as well. Unfortunately, since the maximum
> IP datagram size is 64k, after headers are taken into account the largest
> useful value is 60k. If this is to be laid out as an indirect write, will
> it be written as 32k+16k+8k+4k blocks? If so, this seems like it would be
> quite inefficient for RAID-Z, and writes would best be left at 32k.
>
> Chris
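
On the mdb question above: assuming zfs_immediate_write_sz is still an
ordinary kernel tunable (I have not checked whether current bits declare it
as a 32-bit or 64-bit quantity), something along these lines should read and
bump it on a live system. This is only a sketch, not a tested recipe:

    # read the current value (use /E instead of /D if it turns out to be 64-bit)
    echo "zfs_immediate_write_sz/D" | mdb -k

    # raise it back to 64K; 0t marks the value as decimal
    # (use /Z instead of /W if it turns out to be 64-bit)
    echo "zfs_immediate_write_sz/W0t65536" | mdb -kw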
I think the 64K issue refers to UDP. That limits the maximum block size NFS
may use, but with TCP mounts NFS is not bounded by this, so it should be
possible to adjust the NFS block size upwards. For this I think you need to
adjust nfs4_bsize on the client:

    echo "nfs4_bsize/W0t131072" | mdb -kw

It could also help to tune up the transfer size:

    echo "nfs4_max_transfer_size/W0t131072" | mdb -kw

I also wonder if general-purpose NFS exports should not have their recordsize
set to 32K in order to match the default NFS bsize, but I have not really
looked at this perf yet.

-r
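
To check that the client tunables took effect, and to try the recordsize idea
on the server side, something like the following should do; the dataset name
tank/export below is just an example, and recordsize only affects files
written after the change:

    # on the client: print the current values in decimal
    echo "nfs4_bsize/D" | mdb -k
    echo "nfs4_max_transfer_size/D" | mdb -k

    # on the server: match an exported dataset's recordsize to the 32K NFS writes
    zfs set recordsize=32k tank/export
    zfs get recordsize tank/export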