--- On Thu, 7/1/10, Garrett Cooper <yanef...@gmail.com> wrote:
> From: Garrett Cooper <yanef...@gmail.com>
> Subject: Re: NFS 75 second stall
> To: "alan bryan" <alan.br...@yahoo.com>
> Cc: freebsd-stable@freebsd.org
> Date: Thursday, July 1, 2010, 1:28 PM
> On Thu, Jul 1, 2010 at 1:18 PM, alan bryan <alan.br...@yahoo.com> wrote:
> >
> > --- On Thu, 7/1/10, Garrett Cooper <yanef...@gmail.com> wrote:
> >
> >> From: Garrett Cooper <yanef...@gmail.com>
> >> Subject: Re: NFS 75 second stall
> >> To: "alan bryan" <alan.br...@yahoo.com>
> >> Cc: freebsd-stable@freebsd.org
> >> Date: Thursday, July 1, 2010, 12:23 PM
> >> On Thu, Jul 1, 2010 at 11:51 AM, alan bryan <alan.br...@yahoo.com> wrote:
> >> >
> >> > --- On Thu, 7/1/10, Garrett Cooper <yanef...@gmail.com> wrote:
> >> >
> >> >> From: Garrett Cooper <yanef...@gmail.com>
> >> >> Subject: Re: NFS 75 second stall
> >> >> To: "alan bryan" <alan.br...@yahoo.com>
> >> >> Cc: freebsd-stable@freebsd.org
> >> >> Date: Thursday, July 1, 2010, 11:13 AM
> >> >> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan <alan.br...@yahoo.com> wrote:
> >> >> > Setup:
> >> >> >
> >> >> > server - FreeBSD 8-stable from today. 2 UFS dirs exported via NFS.
> >> >> > client - FreeBSD 8.0-Release. Running a test php script that
> >> >> > copies around various files to/from 2 separate NFS mounts.
> >> >> >
> >> >> > Situation:
> >> >> >
> >> >> > script is started (forked to do 20 simultaneous runs) and 20 1GB
> >> >> > files are copied to the NFS dir which works fine. When it then
> >> >> > switches to reading those files back and simultaneously writing
> >> >> > to the other NFS mount I see a hang of 75 seconds. If I do an
> >> >> > "ls -l" on the NFS mount it hangs too. After 75 seconds the
> >> >> > client has reported:
> >> >> >
> >> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >> >
> >> >> > and then things start working again. The server was originally
> >> >> > FreeBSD 8.0-Release also but was upgraded to the latest stable
> >> >> > to see if this issue could be avoided.
> >> >> >
> >> >> > # nfsstat -s -W -w 1
> >> >> >  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >> >> >       0      0      0    222    257      0      0      0
> >> >> >       0      0      0    178    135      0      0      0
> >> >> >       0      0      0     85    127      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >
> >> >> > ... for 75 rows of all zeros
> >> >> >
> >> >> >       0      0      0    272    266      0      0      0
> >> >> >       0      0      0    167    165      0      0      0
> >> >> >
> >> >> > I also tried runs with 15 simultaneous processes and 25. 15
> >> >> > processes gave only about a 5 second stall but 25 gave again the
> >> >> > same 75 second stall.
> >> >> >
> >> >> > Further, I tested with 2 mounts to the same server but from ZFS
> >> >> > filesystems with the exact same stall/timeout periods. So, it
> >> >> > doesn't appear to matter what the underlying filesystem is -
> >> >> > it's something in NFS or networking code.
> >> >> >
> >> >> > Any ideas on what's going on here? What's causing the complete
> >> >> > stall period of zero NFS activity? Any flaws with my testing
> >> >> > methods?
> >> >> >
> >> >> > Thanks for any and all help/ideas.
> >> >>
> >> >> What network driver are you using? Have you tried
> >> >> tcpdumping the packets?
> >> >> -Garrett
> >> >>
> >> >
> >> > I'm using igb currently but have also used em. I have not tried
> >> > tcpdumping the packets yet on this test. Any suggestions on things
> >> > to look out for? (I'm not that familiar with that whole process.)
> >> >
> >> > Which brings up another point - I'm using TCP connections for NFS,
> >> > not UDP.
> >>
> >> Is the net.inet.tcp.tso sysctl enabled or not? What about rxcsum
> >> and txcsum?
> >> Thanks,
> >> -Garrett
> >>
> >
> > I haven't intentionally/explicitly set any of this so it's "default":
> >
> > # sysctl net.inet.tcp.tso
> > net.inet.tcp.tso: 1
> >
> > igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> >         options=13b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4>
> >         ether 00:30:48:c3:26:94
> >         inet 192.168.10.133 netmask 0xffffff00 broadcast 192.168.10.255
> >         media: Ethernet autoselect (1000baseT <full-duplex>)
> >         status: active
>
> Devise all of the available permutations that you need to use to test
> this out; there are a total of 3 on/off variables, so 8 permutations,
> but you've already tested one, so that makes the permutation count 7.
> Example:
>
> TXCSUM=off, RXCSUM=on, TSO=on
> TXCSUM=on, RXCSUM=off, TSO=on
> TXCSUM=on, RXCSUM=off, TSO=off
>
> ...
>
> Try executing the permutations on the client first, keeping the server
> constant, then make the client constant and make the server variable,
> and finally vary both the server and the client.
>
> Be sure to take measurements for each permutation to ensure that
> things make functional sense.
>
> The reason why I'm suggesting this is that there were issues with
> em(4) [and igb(4) too I think since it uses common code], with various
> hardware offload bits on 8.0-RELEASE (IIRC disabling txcsum did the
> trick, but you may have to do more than that in order to get things to
> work).
>
> Here's a similar thread with a different driver:
> http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008264.html
> (just to illustrate the thought process used to determine the source
> of failure).
>
> Thanks,
> -Garrett
>

Thanks for the detailed test plan!

Is it also fair to then assume that if I update the NFS client machine to
the latest 8-Stable that should also fix this issue? (Both will then be
running the latest 8-stable code.) These are not in production so I can
test or upgrade with no issues.

Thanks again.
--Alan
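
For reference, a rough sketch of the offload test matrix discussed in this
thread. Three on/off settings (TXCSUM, RXCSUM, TSO) give 2^3 = 8 combinations
in total, including the all-on default that was already tested. The script is
a dry run: it only prints the ifconfig(8) commands and does not change
anything. The interface name igb0 comes from the ifconfig output quoted in the
thread; the function name print_offload_matrix is made up for illustration.
It toggles TSO per interface, which is separate from the global
net.inet.tcp.tso sysctl.

```shell
#!/bin/sh
# Dry run: print (do not execute) one ifconfig command for every
# on/off combination of TXCSUM, RXCSUM and TSO on a given interface.
# print_offload_matrix is a hypothetical helper name, not a system tool.

print_offload_matrix() {
    iface="$1"
    # A leading "-" is the ifconfig(8) syntax for turning a capability off.
    for tx in txcsum -txcsum; do
        for rx in rxcsum -rxcsum; do
            for tso in tso -tso; do
                echo "ifconfig $iface $tx $rx $tso"
            done
        done
    done
}

# igb0 is the interface shown in the thread; substitute your own.
print_offload_matrix igb0
```

One way to use this, assuming the plan described in the thread: apply one
printed command as root on the client, rerun the copy test, note whether the
stall recurs, then move to the next line. Repeat on the server with the client
held constant.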