On Sun, 27 Jun 2010, Rick C. Petty wrote:
Hmm. When I mounted the same filesystem with nfs3 from a different client,
everything started working at almost normal speed (still a little slower
though).
Now on that same host I saw a file get corrupted. On the server, I see
the following:
% hd testfile | tail -4
00677fd0 2a 24 cc 43 03 90 ad e2 9a 4a 01 d9 c4 6a f7 14 |*$.C.....J...j..|
00677fe0 3f ba 01 77 28 4f 0f 58 1a 21 67 c5 73 1e 4f 54 |?..w(O.X.!g.s.OT|
00677ff0 bf 75 59 05 52 54 07 6f db 62 d6 4a 78 e8 3e 2b |.uY.RT.o.b.Jx.>+|
00678000
But on the client I see this:
% hd testfile | tail -4
00011ff0 1e af dc 8e d6 73 67 a2 cd 93 fe cb 7e a4 dd 83 |.....sg.....~...|
00012000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00678000
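One quick way to describe damage like the client-side dump above is to find where the trailing run of zero bytes starts. The helper below is a hypothetical sketch; its names are made up and are not part of any FreeBSD tool. It assumes the damage is a zero-filled tail, and it cannot tell a file that legitimately ends in zeros from a damaged one. For the client copy shown, it should report a zero-filled region starting at offset 0x12000, because the last byte before that offset (0x83) is non-zero.

```python
import sys
from typing import Optional


def trailing_zero_start(data: bytes) -> Optional[int]:
    """Return the offset where a run of zero bytes extending to EOF begins.

    Returns None when the data is empty or its last byte is non-zero.
    """
    stripped = data.rstrip(b"\x00")
    if len(stripped) == len(data):
        return None  # no trailing zero bytes at all
    return len(stripped)


def report(path: str) -> None:
    # Reads the whole file into memory; fine for files of a few MB like this one.
    with open(path, "rb") as f:
        data = f.read()
    start = trailing_zero_start(data)
    if start is None:
        print(f"{path}: no trailing zero-filled region")
    else:
        print(f"{path}: zero-filled from {start:#010x} to {len(data):#010x}")


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 zerotail.py testfile
    for p in sys.argv[1:]:
        report(p)
```

Running it against the same file on the server and on the client gives a quick way to compare the two views.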
The only thing I could do to fix it was to copy the file on the server,
delete the original file on the client, and move the copied file back.
Not only is it affecting random file reads, but it has also started breaking
src and ports builds in random places. In one case, portmaster failed
because of a checksum mismatch on a port's distfile. It then tried to
refetch and failed with the same checksum problem. I manually deleted the
file, tried again, and it built just fine. The ports tree and distfiles are
NFSv4 mounted.
I can't explain the corruption, beyond the fact that "soft,intr" can
cause all sorts of grief. If mounts without "soft,intr" still show
corruption problems, try disabling delegations (either kill off the
nfscbd daemons on the client or set vfs.newnfs.issue_delegations=0
on the server). Delegation is disabled by default because it is the
"greenest" part of the subsystem.
The other thing that can really slow it down is a messed-up uid<->login-name
(and/or gid<->group-name) mapping, but this would normally only show up
for things like "ls -l". (Beware of having multiple password database
entries for the same uid, such as "root" and "toor".)
I use the same UIDs/GIDs on all my boxes, so that can't be it. But thanks
for the idea.
Make sure you don't have multiple entries for the same uid, such as "root"
and "toor" both for uid 0 in your /etc/passwd. (i.e., get rid of one of
them if you have both.)
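A minimal sketch for spotting that situation, assuming the standard passwd(5) colon-separated layout with the uid in the third field (as in both /etc/passwd and /etc/master.passwd on FreeBSD). The function name is hypothetical. Blank lines, comments, and entries whose uid field is not numeric (for example NIS "+" lines) are skipped.

```python
import sys
from collections import defaultdict
from typing import Dict, List


def duplicate_uids(passwd_text: str) -> Dict[int, List[str]]:
    """Map each uid that appears more than once to the login names using it."""
    names_by_uid: Dict[int, List[str]] = defaultdict(list)
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split(":")
        # Skip malformed entries and ones without a numeric uid field.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        names_by_uid[int(fields[2])].append(fields[0])
    return {uid: names for uid, names in names_by_uid.items() if len(names) > 1}


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 dupuids.py /etc/passwd
    for path in sys.argv[1:]:
        with open(path) as f:
            for uid, names in sorted(duplicate_uids(f.read()).items()):
                print(f"uid {uid}: {', '.join(names)}")
```

For a file containing both root and toor at uid 0, `duplicate_uids` returns `{0: ['root', 'toor']}`. That is the case to clean up by removing one of the two entries.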
When you did the nfs3 mount, did you specify "newnfs" or "nfs" for the
file system type? (I'm wondering whether you still saw the problem with the
regular "nfs" client against the server. Others have had good luck using
the server for NFSv3 mounts.)
I used "nfs" for FStype. So I should be using "newnfs"? This wasn't very
clear in the man pages. In fact "newnfs" wasn't mentioned in
"man mount_newnfs".
When you specify "nfs" for an NFSv3 mount, you get the regular client.
When you specify "newnfs" for an NFSv3 mount, you get the experimental
client. When you specify "nfsv4" you always get the experimental NFS
client, and it doesn't matter which FStype you've specified.
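The client-selection rules in the reply above can be restated as a small decision function. This is only a hypothetical illustration of those rules as stated in the message. The function name is made up, and it neither calls nor models any real mount code.

```python
def nfs_client_used(fstype: str, nfsv4: bool) -> str:
    """Return which NFS client handles a mount, per the rules stated above.

    fstype is the FStype passed to mount ("nfs" or "newnfs"); nfsv4 is True
    when the mount requests NFSv4. Hypothetical illustration only.
    """
    if nfsv4:
        # NFSv4 always uses the experimental client, whatever the FStype.
        return "experimental"
    if fstype == "newnfs":
        return "experimental"
    if fstype == "nfs":
        return "regular"
    raise ValueError(f"unexpected FStype: {fstype!r}")
```

The only case that gets the regular client is an NFSv3 mount with FStype "nfs", which is the combination used for the test mount from the other client.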
One other thing I noticed, unrelated to the delays or corruption (I'm not
sure whether it's a bug or expected behavior): I have the following
filesystems on the server:
/vol/a
/vol/a/b
/vol/a/c
I export all three volumes and set my NFSv4 root to "/". On the client,
I'll "mount ... server:vol /vol" and the "b" and "c" directories show up,
but when I try "ls /vol/a/b /vol/a/c", they show up empty. In dmesg I see:
kernel: nfsv4 client/server protocol prob err=10020
If you are using UFS/FFS on the server, this should work, and I don't know
why the empty directories under /vol on the client confused it. If your
server is using ZFS, everything from / down to and including /vol needs to
be exported.
This error indicates that there wasn't a valid file handle (FH) for the
server. I suspect that the mount failed. (During the mount, the kernel does
a loop of Lookups starting at "/", and it somehow got confused partway
through.)
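For reference, 10020 is NFS4ERR_NOFILEHANDLE in the NFSv4.0 specification (RFC 3530). It means an operation needed a current file handle and none was set, which fits the explanation above. The sketch below decodes the number from a kernel message. The regular expression is an assumption based only on the single dmesg line quoted in this thread, and the table lists just a few status codes, not the full set.

```python
import re
from typing import Optional

# A few NFSv4.0 status codes (RFC 3530); deliberately not the complete list.
NFS4_STATUS = {
    2: "NFS4ERR_NOENT",
    70: "NFS4ERR_STALE",
    10001: "NFS4ERR_BADHANDLE",
    10008: "NFS4ERR_DELAY",
    10013: "NFS4ERR_GRACE",
    10020: "NFS4ERR_NOFILEHANDLE",
}

# Assumed message shape, taken from the one line quoted in this thread.
_ERR_RE = re.compile(r"protocol prob err=(\d+)")


def decode_nfsv4_err(log_line: str) -> Optional[str]:
    """Return the symbolic NFSv4 status for a 'protocol prob' line, else None."""
    m = _ERR_RE.search(log_line)
    if m is None:
        return None
    code = int(m.group(1))
    return NFS4_STATUS.get(code, f"unknown NFSv4 status {code}")
```

Feeding it the dmesg line from this thread yields "NFS4ERR_NOFILEHANDLE".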
After unmounting /vol, I discovered that my client already had /vol/a/b and
/vol/a/c directories (because pre-NFSv4, I had to mount each filesystem
separately). Once I removed those empty dirs and remounted, the problem
went away. But it did drive me crazy for a few hours.
I don't know why these empty dirs would confuse it. I'll try a test
here, but I suspect the real problem was that the mount failed and
then happened to succeed after you deleted the empty dirs.
It still smells like some sort of transport/net interface/... issue
is at the bottom of this. (see response to your next post)
rick
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"