* Andrew Gallatin <[EMAIL PROTECTED]> [000404 14:03] wrote:
>
> Currently FreeBSD issues a very large number of NFSv3 commit RPCs when
> writing a sequential file. They average out to about one every 64k or
> so. Solaris, on the other hand, issues only a handful.
>
> At least when running against a Solaris NFS server, these
> frequent commits really kill our write bandwidth.
>
> The commits are initiated out of the bufdaemon:
>
> nfs_commit(e06866c0,360000,0,10000,c8aa5e00) at nfs_commit+0x52a
> nfs_doio(d3088158,c8aa5e00,0,d3088158,40084040) at nfs_doio+0x371
> nfs_strategy(ddef1ec0) at nfs_strategy+0x68
> nfs_writebp(d3088158,1,ddee5920,ddef1ef8,c0180e42) at nfs_writebp+0xdc
> nfs_bwrite(ddef1eec,c02a15c0,e06866c0,d3088158,ddef1f28) at nfs_bwrite+0x16
> bawrite(d3088158,d30faff0,0,40084040,d30fbae8) at bawrite+0x32
> cluster_wbuild(e06866c0,2000,1b8,10,d30fc328) at cluster_wbuild+0x493
> vfs_bio_awrite(d30fc328,3f,c0181f8c,c016aef5,0) at vfs_bio_awrite+0x1a4
> flushbufqueues(0,8000,c024be00,0,b0206) at flushbufqueues+0x116
> buf_daemon(0) at buf_daemon+0x8f
> fork_trampoline() at fork_trampoline+0x8
>
> The "problem" is that flushbufqueues calls vfs_bio_awrite on the bufs
> that need committing. We then go through the overhead of clustering up
> 64k worth of data & pass it down. It eventually ends up in nfs_doio()
> which finally realizes that the bufs just need to be committed & calls
> nfs_commit() on them. This is repeated for every 64k of data.
>
> I have an idea on how to reduce these commits & a proof of concept
> implementation of it. My idea is to have nfs_doio() call a function
> (which I've called nfs_megacommit()) to consolidate all the
> B_NEEDCOMMIT bufs from a particular file into one large commit. This
> nfs_megacommit() function is basically a cut-n-paste of the top half
> of nfs_flush().
>
> I just tried it this morning & it appears to work. Over a 1Gb/s
> (Alteon, Jumbo frames) link, my write bandwidth increases from
> 5-8MB/sec to 17-18MB/sec when talking to a Solaris (2.7, i86) NFS
> server & writing a 375MB file. The server's nfsstat looks like this.
>
> Before:
>
> Version 3: (54262 calls)
> null        getattr     setattr     lookup      access      readlink
> 0 0%        0 0%        1 0%        1 0%        3 0%        0 0%
> read        write       create      mkdir       symlink     mknod
> 0 0%        48325 89%   0 0%        0 0%        0 0%        0 0%
> remove      rmdir       rename      link        readdir     readdirplus
> 0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
> fsstat      fsinfo      pathconf    commit
> 0 0%        0 0%        0 0%        5932 10%
>
>
> After:
>
> Version 3: (48078 calls)
> null        getattr     setattr     lookup      access      readlink
> 0 0%        0 0%        0 0%        1 0%        1 0%        0 0%
> read        write       create      mkdir       symlink     mknod
> 0 0%        48027 99%   1 0%        0 0%        0 0%        0 0%
> remove      rmdir       rename      link        readdir     readdirplus
> 0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
> fsstat      fsinfo      pathconf    commit
> 0 0%        0 0%        0 0%        48 0%
>
>
> Can anybody tell me if doing something like this is fundamentally
> broken? Is it worth pursuing?
http://www.freebsd.org/~alfred/nfs_supercommit_broken.diff
Only grab as many adjacent blocks as possible. You don't want to
scan the entire file's buffer list for each commit, and you also
don't want to interfere with other clients' caching by forcing
server commits on their behalf.
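
To illustrate the "adjacent blocks only" point, here is a minimal
user-space sketch in C. It is not the code from the diff above and it
is not kernel code. It models a file's blocks as an array of flags
(true = block still needs a commit) and grows a commit range outward
from the block being flushed, stopping at the first gap or at a cap.
The names adjacent_commit_range, struct commit_range, BLOCK_SIZE and
maxblocks are made up for this example; real code would presumably
check B_NEEDCOMMIT on the neighbouring bufs instead of an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8192u /* hypothetical block size, chosen for the sketch */

/* Byte range that a single commit would cover. */
struct commit_range {
    uint64_t offset; /* byte offset of the first block in the range */
    uint64_t length; /* length of the range in bytes */
};

/*
 * Starting from block `start`, extend the range backward and then forward
 * over neighbouring blocks that also need a commit. Growth stops at the
 * first block that does not need one, at the array bounds, or once the
 * range holds `maxblocks` blocks. Blocks beyond a gap are never examined,
 * so the cost is bounded by maxblocks rather than by the file size.
 */
static struct commit_range
adjacent_commit_range(const bool *needcommit, size_t nblocks,
                      size_t start, size_t maxblocks)
{
    size_t lo = start;
    size_t hi = start;
    struct commit_range r;

    assert(start < nblocks);
    assert(maxblocks >= 1);

    /* Grow toward the start of the file. */
    while (lo > 0 && needcommit[lo - 1] && (hi - lo + 1) < maxblocks)
        lo--;

    /* Grow toward the end of the file. */
    while (hi + 1 < nblocks && needcommit[hi + 1] && (hi - lo + 1) < maxblocks)
        hi++;

    r.offset = (uint64_t)lo * BLOCK_SIZE;
    r.length = (uint64_t)(hi - lo + 1) * BLOCK_SIZE;
    return r;
}
```

With flags {0,1,1,1,0,1,1,0} and start = 2, the range covers blocks
1 through 3 only; blocks 5 and 6 sit behind a gap and are left for a
later commit instead of being found by a scan of the whole list.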
--
-Alfred Perlstein - [EMAIL PROTECTED]
"I have the heart of a child; I keep it in a jar on my desk."