Re: FreeBSD NFS client goes into infinite retry loop

Steve Polyack Fri, 19 Mar 2010 19:41:19 -0700

On 3/19/2010 9:32 PM, Rick Macklem wrote:


On Fri, 19 Mar 2010, Steve Polyack wrote:

To anyone who is interested: I did some poking around with DTrace, which led me to the nfsiod client code.

In src/sys/nfsclient/nfs_nfsiod.c:
       } else {
           if (bp->b_iocmd == BIO_READ)
               (void) nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
           else
               (void) nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
       }


If you look at nfs_doio(), it decides whether or not to mark the buffer
invalid, based on the return value it gets. Some (EINTR, ETIMEDOUT, EIO)
are not considered fatal, but the others are. (When the async I/O
daemons call nfs_doio(), they are threads that couldn't care less if
the underlying I/O op succeeded. The outcome of the I/O operation
determines what nfs_doio() does with the buffer cache block.)
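
A minimal sketch of the classification described above, for illustration only. It is not the actual nfs_doio() code: the names wb_error_is_retryable(), wb_decide() and enum wb_action are hypothetical, and the only thing taken from this thread is that EINTR, ETIMEDOUT and EIO are treated as non-fatal while other errors are treated as fatal.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a FreeBSD kernel function): returns true for
 * the errors this thread says are not considered fatal.
 */
static bool
wb_error_is_retryable(int error)
{
	switch (error) {
	case EINTR:
	case ETIMEDOUT:
	case EIO:
		return (true);
	default:
		return (false);
	}
}

/* Hypothetical outcome for a buffer after a write-back attempt. */
enum wb_action {
	WB_DONE,	/* write succeeded, nothing left to do */
	WB_RETRY,	/* keep the buffer dirty and try again later */
	WB_INVALIDATE	/* fatal error: give up on the buffer */
};

/* Map the error from one write-back attempt to what happens next. */
static enum wb_action
wb_decide(int error)
{
	if (error == 0)
		return (WB_DONE);
	return (wb_error_is_retryable(error) ? WB_RETRY : WB_INVALIDATE);
}
```

Under this sketch, wb_decide(EIO) yields WB_RETRY every time, which matches the endless retries seen here, while wb_decide(ESTALE) would yield WB_INVALIDATE and end the loop.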


I was looking at this and noticed the above after my last post.

The result is that my problematic repeatable circumstance begins logging "nfssvc_iod: iod 0 nfs_doio returned errno: 5" (corresponding to NFSERR_INVAL?) for each repetition of the failed write. The only things triggering this are my failed writes. I can also see the nfsiod0 process waking up each iteration.
Nope, errno 5 is EIO and that's where the problem is. I don't know why
the server is returning EIO after the file has been deleted on the
server (I assume you did that when running your little shell script?).

Yes, while running the simple shell script I simply deleted the file on the NFS server itself.

Do we need some kind of "retry x times then abort" logic within nfsiod_iod(), or does this belong in the subsequent functions, such as nfs_doio()? I think it's best to avoid these sorts of infinite loops which have the potential to take out the system or overload the network due to dumb decisions made by unprivileged users.
Nope, people don't like data not getting written back to a server when
it is slow or temporarily network partitioned. The only thing that should
stop a client from retrying a write back to the server is a fatal error
from the server that says "this won't ever succeed".

I think we need to figure out if the server is really sending EIO
(NFS3ERR_IO in wireshark) or if the server is sending NFS3ERR_STALE and
the client is somehow munging that into EIO, causing the confusion.
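
To make the two cases concrete, here is a rough sketch of a status-to-errno translation. The numeric values NFS3ERR_IO = 5 and NFS3ERR_STALE = 70 come from the NFSv3 protocol definition; the enum name, the function nfs3_status_to_errno(), and the default case are hypothetical and do not reflect how the FreeBSD client actually does this.

```c
#include <errno.h>

/* Subset of NFSv3 status codes relevant to this thread. */
enum nfsstat3_subset {
	NFS3_OK = 0,
	NFS3ERR_IO = 5,		/* generic I/O error on the server */
	NFS3ERR_STALE = 70	/* file handle no longer refers to a file */
};

/*
 * Hypothetical translation from an NFSv3 reply status to a local errno.
 * A client that folded NFS3ERR_STALE into EIO would lose the
 * "this will never succeed" signal and keep retrying.
 */
static int
nfs3_status_to_errno(int status)
{
	switch (status) {
	case NFS3_OK:
		return (0);
	case NFS3ERR_IO:
		return (EIO);
	case NFS3ERR_STALE:
		return (ESTALE);
	default:
		/* Assumption for this sketch only: treat unknown as EIO. */
		return (EIO);
	}
}
```

If the server puts NFS3ERR_IO on the wire, even a faithful translation like this one hands EIO to the write-back path, so the retry loop would start on the server side rather than in the client.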

This makes sense. According to wireshark, the server is indeed transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE instead; it sounds more correct than marking it a general IO error. Also, the NFS server is serving its share off of a ZFS filesystem, if it makes any difference. I suppose ZFS could be talking to the NFS server threads with some mismatched language, but I doubt it.


Thanks for the informative response,
Steve
