Chris Csanady writes:

> On 6/26/06, Neil Perrin <[EMAIL PROTECTED]> wrote:
> >
> > Robert Milkowski wrote On 06/25/06 04:12,:
> > > Hello Neil,
> > >
> > > Saturday, June 24, 2006, 3:46:34 PM, you wrote:
> > >
> > > NP> Chris,
> > >
> > > NP> The data will be written twice on ZFS using NFS. This is because NFS,
> > > NP> on closing the file, internally uses fsync to cause the writes to be
> > > NP> committed. This causes the ZIL to immediately write the data to the
> > > NP> intent log. Later the data is also committed as part of the pool's
> > > NP> transaction group commit, at which point the intent log blocks are freed.
> > >
> > > NP> It does seem inefficient to doubly write the data. In fact, for blocks
> > > NP> larger than zfs_immediate_write_sz (it was 64K, but is now 32K after
> > > NP> 6440499 was fixed) we write the data block and also an intent log record
> > > NP> with the block pointer. During txg commit we link this block into the
> > > NP> pool tree. By experimentation we found 32K to be the (current) cutoff
> > > NP> point. As the nfsds write at most 32K, they do not benefit from this.
> > >
> > > Is 32KB easily tuned (mdb?)?
> >
> > I'm not sure. NFS folk?
>
> I think he is referring to the zfs_immediate_write_sz variable, but NFS
> will support larger block sizes as well. Unfortunately, since the maximum
> IP datagram size is 64k, after headers are taken into account the largest
> useful value is 60k. If this is to be laid out as an indirect write, will
> it be written as 32k+16k+8k+4k blocks? If so, this seems like it would be
> quite inefficient for RAID-Z, and writes would best be left at 32k.
>
> Chris
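
On the mdb question above: assuming zfs_immediate_write_sz is still an
ordinary kernel tunable (I have not checked whether current bits declare it
as a 32-bit or 64-bit quantity), something along these lines should read and
bump it on a live system. This is only a sketch, not a tested recipe:

    # read the current value (use /E instead of /D if it turns out to be 64-bit)
    echo "zfs_immediate_write_sz/D" | mdb -k

    # raise it back to 64K; 0t marks the value as decimal
    # (use /Z instead of /W if it turns out to be 64-bit)
    echo "zfs_immediate_write_sz/W0t65536" | mdb -kw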
I think the 64K issue refers to UDP. That limits the maximum block size NFS
may use, but with TCP mounts NFS is not bounded by this, so it should be
possible to adjust the NFS block size upwards. For this I think you need to
adjust nfs4_bsize on the client:

    echo "nfs4_bsize/W0t131072" | mdb -kw

It could also help to tune up the transfer size:

    echo "nfs4_max_transfer_size/W0t131072" | mdb -kw

I also wonder if general-purpose NFS exports should not have their recordsize
set to 32K in order to match the default NFS bsize, but I have not really
looked at this perf yet.

-r
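
To check that the client tunables took effect, and to try the recordsize idea
on the server side, something like the following should do; the dataset name
tank/export below is just an example, and recordsize only affects files
written after the change:

    # on the client: print the current values in decimal
    echo "nfs4_bsize/D" | mdb -k
    echo "nfs4_max_transfer_size/D" | mdb -k

    # on the server: match an exported dataset's recordsize to the 32K NFS writes
    zfs set recordsize=32k tank/export
    zfs get recordsize tank/export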