On Mon, 28 Jun 2010, Rick C. Petty wrote:
Make sure you don't have multiple entries for the same uid, such as "root"
and "toor" both for uid 0 in your /etc/passwd. (i.e., get rid of one of
them, if you have both)
Hmm, that's a strange requirement, since FreeBSD by default comes with
both. That should probably be documented in the nfsv4 man page.
Well, if the mapping from uid->name is not unique, getpwuid() will just
return one of them and it probably won't be the expected one. Having
both "root" and "toor" only causes weird behaviour when "root" tries to
use a mount point. I had thought it was in the man pages, but I now
see it isn't mentioned. I'll try and remember to add it.
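
As a rough way to check for the non-unique uid->name case described above,
something like the following sketch could be used. It is only an
illustration: the function name duplicate_uids is made up here, and it
assumes the usual colon-separated passwd(5) layout with the name in the
first field and the uid in the third.

```python
from collections import defaultdict


def duplicate_uids(passwd_text):
    """Return {uid: [names]} for every uid that has more than one name.

    Hypothetical helper; assumes passwd(5)-style lines where field 1 is
    the login name and field 3 is the numeric uid.
    """
    names_by_uid = defaultdict(list)
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue
        try:
            uid = int(fields[2])
        except ValueError:
            # Not a numeric uid; ignore the malformed entry.
            continue
        names_by_uid[uid].append(fields[0])
    # Keep only the uids that map to more than one name.
    return {uid: names for uid, names in names_by_uid.items() if len(names) > 1}


# Made-up sample entries, not taken from any real system.
sample = (
    "root:*:0:0:Charlie &:/root:/bin/csh\n"
    "toor:*:0:0:Bourne-again Superuser:/root:\n"
    "daemon:*:1:1:Owner of many system processes:/root:/usr/sbin/nologin\n"
)
for uid, names in sorted(duplicate_uids(sample).items()):
    print("uid", uid, "has multiple names:", ", ".join(names))
```

To check a real system, pass it the contents of /etc/passwd instead of the
sample string. Any uid it reports is one where getpwuid() can only return
one of the listed names.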
This error indicates that there wasn't a valid FH for the server. I
suspect that the mount failed. (It does a loop of Lookups from "/" in
the kernel during the mount and it somehow got confused part way through.)
If the mount failed, why would it allow me to "ls /vol/a" and see both "b"
and "c" directories as well as other files/directories on /vol/ ?
I don't know why these empty dirs would confuse it. I'll try a test
here, but I suspect the real problem was that the mount failed and
then happened to succeed after you deleted the empty dirs.
It doesn't seem likely. I spent an hour mounting and unmounting and each
mount looked successful in that there were files and directories besides
the two I was trying to descend into.
My theory was that, since you used "soft", one of the Lookups during
the mounting process in the kernel failed with ETIMEDOUT. It isn't
coded to handle that. There are lots of things that will break in
the NFSv4 client if "soft" or "intr" are used. (That is in the mount_nfs
man page, but right at the end, so it could get missed.)
Maybe "broken mount" would have been a better term than "failed mount".
If more recent mount attempts are without "soft", then I would expect
them to work reliably. (If you feel daring, add the empty subdirs back
and see if it fails?)
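
To make the "soft"/"intr" point concrete, here is a minimal sketch that
flags those options in a comma-separated mount option string. The function
name and the idea of checking the string this way are assumptions made for
illustration only; it also assumes NFSv4 mounts are marked with the "nfsv4"
option, and it is not part of mount_nfs.

```python
# Options the thread says will break things in the NFSv4 client.
RISKY_NFSV4_OPTIONS = {"soft", "intr"}


def risky_nfsv4_options(options):
    """Return the risky options found in an NFSv4 option string.

    Hypothetical helper: `options` is a comma-separated string such as
    the options field of an fstab entry. Non-NFSv4 mounts (no "nfsv4"
    option present) are never flagged.
    """
    opts = [o.strip() for o in options.split(",") if o.strip()]
    if "nfsv4" not in opts:
        return []
    return [o for o in opts if o in RISKY_NFSV4_OPTIONS]


# Example option string, made up for illustration.
print(risky_nfsv4_options("rw,nfsv4,soft,rsize=32768,wsize=32768"))
```

An empty result means neither option was found for an NFSv4 mount, or the
option string does not describe an NFSv4 mount at all.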
I will try a case with empty subdirs on the client, to see if there is
a problem when I do it. (It should just cover them up until umount, but
it could certainly be broken:-)
It still smells like some sort of transport/net interface/... issue
is at the bottom of this. (see response to your next post)
It's possible. I just had another NFSv4 client (with the same server) lock
up:
load: 0.00 cmd: ls 17410 [nfsv4lck] 641.87r 0.00u 0.00s 0% 1512k
and:
load: 0.00 cmd: make 87546 [wait] 37095.09r 0.01u 0.01s 0% 844k
That make has been hung for hours, and the ls(1) was executed during that
lockup. I wish there was a way I could unhang these processes and unmount
the NFS mount without panicking the kernel, but alas even this fails:
# umount -f /sw
load: 0.00 cmd: umount 17479 [nfsclumnt] 1.27r 0.00u 0.04s 0% 788k
The plan is to implement a "hard forced" umount (something like -ff)
that throws away data but gets the umount done. It hasn't been
coded yet. (For 8.2 maybe?)
A "shutdown -p now" resulted in a panic with the speaker beeping
constantly and no console output.
It's possible the NICs are all suspect, but all of this worked fine a
couple of days ago when I was only using NFSv3.
Yea, if NFSv3 worked fine with the same kernel, it seems more likely
to be an experimental NFS server issue, possibly related to scheduling the
busy CPUs. (If it was a NIC related problem, it is most likely related
to the driver, but if the NFSv3 case was using the same driver, that
doesn't seem likely.)
You are now using "rsize=32768,wsize=32768", aren't you?
(If you aren't yet using that, try it, since larger bursts of
traffic can definitely "tickle" NIC driver problems, to borrow
Jeremy's term.)
rick