Of all the gin joints in all the towns in all the world, Matthew Dillon
had to walk into mine and say:
> :Yes, we do. I've run into this problem elsewhere but a quick fix was needed
> :so it just got hacked. NT NFS clients tend to trigger it too.
> :
> :The problem is that the sanity check is a fair way away from where the problem
> :packet is generated. The bad reply is generated in the readdirplus routine,
> :gets replied (without checking) and cached. The client drops the (oversize)
> :packet, resends, and the nfsd replies from the cache and this time hits
> :the sanity check and panics.
> :
> :...
> :
> :I will have another look shortly. Anyway, the clue is that the server
> :readdirplus routine is the apparent culprit.
> :
> :Cheers,
> :-Peter
>
> This makes a lot of sense. A report of du causing the panic, and
> the good possibility that readdirplus is caching an oversized response
> packet. Tell me what you come up with! I'll take a crack at it if you
> don't find anything.
Caching doesn't enter into it. The problem is bad arithmetic.
In /sys/nfs/nfs_serv.c:nfsrv_readdirplus(), we have the following
code:
    /*
     * If either the dircount or maxcount will be
     * exceeded, get out now. Both of these lengths
     * are calculated conservatively, including all
     * XDR overheads.
     */
    len += (7 * NFSX_UNSIGNED + nlen + rem + NFSX_V3FH +
            NFSX_V3POSTOPATTR);
    dirlen += (6 * NFSX_UNSIGNED + nlen + rem);
    if (len > cnt || dirlen > fullsiz) {
            eofflag = 0;
            break;
    }
I observed that the value of "len" didn't agree with the actual amount
of data being consumed in the mbuf chain. It turns out that each
time through the loop, len is incremented by 4 bytes too little.
In other words, 7 * NFSX_UNSIGNED should really be 8 * NFSX_UNSIGNED.
When I change 7 to 8, I no longer get oversized replies and everything
adds up.
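
To make the drift concrete, here is a rough user-space sketch (not kernel
code) of how a 4-byte undercount per entry lets the loop accept one entry
too many. The function names, the 1024-byte limit and the 100 bytes of
"other" per-entry data are made up for illustration; the only thing taken
from the above is 7 words counted versus 8 words actually used.

```c
#include <assert.h>
#include <stddef.h>

#define UNIT 4  /* one XDR word; stands in for NFSX_UNSIGNED */

/*
 * Number of entries the loop accepts before its running estimate
 * exceeds cnt. Each entry is estimated at `units` XDR words plus
 * `var` bytes of everything else (name, handle, attributes).
 */
static size_t entries_accepted(size_t cnt, size_t units, size_t var)
{
    size_t len = 0, n = 0;

    assert(units > 0);          /* guarantees the loop terminates */
    for (;;) {
        len += units * UNIT + var;
        if (len > cnt)
            break;              /* same test as in the kernel loop */
        n++;
    }
    return n;
}

/* Bytes really placed in the reply, assuming 8 words per entry. */
static size_t bytes_emitted(size_t n, size_t var)
{
    return n * (8 * UNIT + var);
}
```

With cnt = 1024 and var = 100, counting 7 words per entry accepts 8
entries, which really occupy 8 * 132 = 1056 bytes. Counting 8 words
accepts 7 entries, or 924 bytes, which fits.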
This sanity code is trying to add up the amount of data consumed by
the entryplus3 that gets generated for each directory entry. The
entryplus3 is defined in nfs_prot.x like this:
    struct entryplus3 {
            fileid3         fileid;
            filename3       name;
            cookie3         cookie;
            post_op_attr    name_attributes;
            post_op_fh3     name_handle;
            entryplus3      *nextentry;
    };
Unfortunately I haven't been able to wrap my brain around how this is
being counted up for the "len" calculation. Whatever it's doing, it's
off by 4 bytes. Possibly somebody forgot that "filename3" is a string,
which in XDR format consists of the string bytes, plus padding to a
longword boundary, *plus* a longword length value. Some comments would
have been useful here. (Hint, hint.)
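
For what it's worth, here is one way to count an entryplus3 field by
field. It is a sketch under assumptions, not a statement of what the
kernel author had in mind: it assumes the RFC 1813 XDR encoding, that the
attributes and file handle are both present, that an encoded fattr3 is 84
bytes, and that NFSX_V3POSTOPATTR already includes its "attributes follow"
flag word. The names xdr_roundup, entryplus3_bytes and fhlen are invented
for the example.

```c
#include <assert.h>
#include <stddef.h>

#define XDR_UNIT     4      /* XDR encodes everything in 4-byte words */
#define FATTR3_BYTES 84     /* assumed encoded size of fattr3 (RFC 1813) */

/* Round a byte count up to the next XDR word boundary (nlen + rem). */
static size_t xdr_roundup(size_t n)
{
    return (n + XDR_UNIT - 1) / XDR_UNIT * XDR_UNIT;
}

/*
 * Encoded size of one entryplus3 with attributes and handle present.
 * nlen is the name length, fhlen the file handle length in bytes.
 */
static size_t entryplus3_bytes(size_t nlen, size_t fhlen)
{
    size_t total = 0;

    assert(fhlen <= 64);    /* NFSv3 file handles are at most 64 bytes */

    total += 2 * XDR_UNIT;                  /* fileid3: 64-bit value */
    total += XDR_UNIT + xdr_roundup(nlen);  /* filename3: length + padded bytes */
    total += 2 * XDR_UNIT;                  /* cookie3: 64-bit value */
    total += XDR_UNIT + FATTR3_BYTES;       /* post_op_attr: flag + fattr3 */
    total += 2 * XDR_UNIT + xdr_roundup(fhlen); /* post_op_fh3: flag + length + data */
    total += XDR_UNIT;                      /* nextentry: "value follows" flag */
    return total;
}
```

Setting aside the padded name, the handle data and the post_op_attr, the
leftover fixed words are 2 + 1 + 2 + 2 + 1 = 8, which matches the 8 *
NFSX_UNSIGNED that makes the numbers add up. For a 5-byte name and a
(hypothetical) 32-byte handle the total comes to 160 bytes.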
What I don't know is whether the calculation for dirlen is wrong too.
Now that I've shown everyone the light, hopefully somebody can tell me
for sure.
-Bill
--
=============================================================================
-Bill Paul (212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home: [EMAIL PROTECTED] | Columbia University, New York City
=============================================================================
"It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================