> On Sep 6, 2019, at 4:47 PM, Jason L Tibbitts III <ti...@math.uh.edu> wrote:
> 
>>>>>> "JBF" == J Bruce Fields <bfie...@fieldses.org> writes:
> 
> JBF> Those readdir changes were client-side, right?  Based on that I'd
> JBF> been assuming a client bug, but maybe it'd be worth getting a full
> JBF> packet capture of the readdir reply to make sure it's legit.
> 
> I have been working with bcodding on IRC for the past couple of days on
> this.  Fortunately I was able to come up with a way to fill up a directory
> in such a way that it will fail with certainty and as a bonus doesn't
> include any user data so I can feel OK about sharing packet captures.  I
> have a capture alongside a kernel trace of the problematic operation in
> https://www.math.uh.edu/~tibbs/nfs/.  Not that I can particularly tell
> anything useful from that, but bcodding says that it seems to point to
> some issue in sunrpc.
> 
> And because I can easily reproduce this, I was able to do a bisect:
> 
> 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d is the first bad commit
> commit 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d
> Author: Chuck Lever <chuck.le...@oracle.com>
> Date:   Mon Feb 11 11:25:41 2019 -0500
> 
>    SUNRPC: Use au_rslack when computing reply buffer size
> 
>    au_rslack is significantly smaller than (au_cslack << 2). Using
>    that value results in smaller receive buffers. In some cases this
>    eliminates an extra segment in Reply chunks (RPC/RDMA).
> 
>    Signed-off-by: Chuck Lever <chuck.le...@oracle.com>
>    Signed-off-by: Anna Schumaker <anna.schuma...@netapp.com>
> 
> :040000 040000 d4d1ce2fbe0035c5bd9df976b8c448df85dcb505 7011a792dfe72ff9cd70d66e45d353f3d7817e3e M      net
> 
> But of course, I can't say whether this is the actual bad commit or
> whether it just introduced a behavior change which alters the conditions
> under which the problem appears.

The first place I'd start looking is the XDR constants at the head of
fs/nfs/nfs4xdr.c having to do with READDIR.

The report of behavior changes with the use of krb5p also makes this
commit plausible.
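
To illustrate why a smaller auth slack could plausibly matter, here is a
minimal sketch. It is not the kernel's code. It assumes the inline part of
the receive buffer is sized as (reply header + auth slack + per-procedure
reply size), counted in 32-bit XDR words; the struct, its field names, and
the numbers used with it are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration only; this is NOT the kernel's code.
 * All *_words fields are counts of 32-bit XDR words.
 */
struct reply_sizing {
	size_t rep_hdr_words;    /* fixed RPC reply header estimate */
	size_t auth_slack_words; /* room reserved for the security flavor */
	size_t proc_reply_words; /* per-procedure maximum, e.g. for READDIR */
};

/* Bytes reserved for the inline (non-page) part of the reply. */
static size_t reply_inline_bytes(const struct reply_sizing *s)
{
	assert(s != NULL);
	/* Words to bytes: multiply by 4. */
	return (s->rep_hdr_words + s->auth_slack_words + s->proc_reply_words) << 2;
}

/* Nonzero if a reply whose inline part is actual_bytes long fits. */
static int reply_fits(const struct reply_sizing *s, size_t actual_bytes)
{
	return actual_bytes <= reply_inline_bytes(s);
}
```

With made-up values of 4, 10 and 20 words the inline area is 136 bytes;
dropping the slack to 2 words gives 104 bytes, so a 120-byte inline reply
that used to fit no longer does. If a per-procedure constant were too
small, a generous slack could have been hiding that.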


> And just to make sure that the blame doesn't lie with the old RHEL7
> kernel, I rsynced over the problematic directory to a machine running
> something slightly more modern (5.1.11, which I know I need to update,
> but it's already set up to do kerberised NFS) and the same problem
> exists, though the directory listing does fail at a different place.
> 
> - J<

--
Chuck Lever


