Michael Tuexen wrote:
>> On 18. Mar 2021, at 21:55, Rick Macklem <rmack...@uoguelph.ca> wrote:
>>
>> Michael Tuexen wrote:
>>>> On 18. Mar 2021, at 13:42, Scheffenegger, Richard <richard.scheffeneg...@netapp.com> wrote:
>>>>
>>>>>> Output from the NFS Client when the issue occurs
>>>>>> # netstat -an | grep NFS.Server.IP.X
>>>>>> tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
>>>>> I'm no TCP guy. Hopefully others might know why the client would be stuck
>>>>> in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack,
>>>>> but could be wrong?)
>>>>
>>>> When the client is in Fin-Wait2 this is the state you end up when the
>>>> Client side actively close() the tcp session, and then the server also
>>>> ACKed the FIN.
>> Jason noted:
>>
>>> When the issue occurs, this is what I see on the NFS Server.
>>> tcp4 0 0 NFS.Server.IP.X.2049 NFS.Client.IP.X.51550 CLOSE_WAIT
>>>
>>> which corresponds to the state on the client side. The server received the
>>> FIN from the client and acked it.
>>> The server is waiting for a close call to happen.
>>> So the question is: Is the server also closing the connection?
>> Did you mean to say "client closing the connection here?"
>Yes.
>>
>> The server should call soclose() { it never calls soshutdown() } when
>> soreceive(with MSG_WAIT) returns 0 bytes or an error that indicates
>> the socket is broken.
Btw, I looked and the soreceive() is done with MSG_DONTWAIT, but the
EWOULDBLOCK is handled appropriately.
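For readers less familiar with the pattern, here is a userland sketch (POSIX sockets, not the kernel krpc code) of the receive-side rule described above: read with MSG_DONTWAIT, treat EWOULDBLOCK/EAGAIN as "nothing yet, wait for the next upcall", and close the socket when recv() returns 0 (the peer's FIN) or a hard error. handle_readable() and the RX_* codes are hypothetical names for illustration; the kernel uses soreceive() and soclose() instead.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not krpc names). */
enum rx_result {
    RX_IDLE,   /* nothing more to read right now; wait for the next upcall */
    RX_DIED    /* peer closed (FIN) or socket broken; descriptor was closed */
};

/*
 * Userland analogue of the server's receive-side handling, called when
 * the socket becomes readable. MSG_DONTWAIT means recv() never sleeps.
 */
static enum rx_result handle_readable(int fd, size_t *total)
{
    char buf[512];

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        if (n > 0) {
            *total += (size_t)n;   /* a real server would parse RPC records here */
            continue;
        }
        if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
            return RX_IDLE;        /* no data yet: not an error */
        if (n < 0 && errno == EINTR)
            continue;              /* interrupted: just retry */
        /*
         * n == 0 (orderly shutdown) or a hard error: this side must close,
         * otherwise the TCP connection stays in CLOSE_WAIT here.
         */
        close(fd);
        return RX_DIED;
    }
}
```

It can be exercised with socketpair(AF_UNIX, SOCK_STREAM, 0, sv): with no data queued the call returns RX_IDLE, and once the other end is closed it returns RX_DIED and closes the descriptor.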
>> --> The soreceive() call is triggered by an upcall for the rcv side of the
>> socket.
>> So, are you saying the FreeBSD NFS server did not call soclose() for this
>> case?
>Yes. If the state at the server side is CLOSE_WAIT, no close call has happened
>yet.
>The FIN from the client was received, it was ACKED, but no close() call
>(or shutdown(..., SHUT_WR) or shutdown(..., SHUT_RDWR)) was issued. Therefore,
>no FIN was sent and the client should be in the FINWAIT-2 state. This was also
>reported. So the reported states are consistent.
For a test, I commented out the soclose() call in the server side krpc and,
when I dismounted, it did leave the server socket in CLOSE_WAIT.
For the FreeBSD client, it did the dismount and the socket was in FIN_WAIT2
for a little while and then disappeared (someone mentioned a short timeout
and that seems to be the case).
I might argue that the Linux client should not get hung when this occurs,
but there does appear to be an issue on the FreeBSD end.

So it does appear you have a case where the soclose() call is not happening
on the FreeBSD NFS server. I am a little surprised, since I don't think I've
heard of this before and the code is at least 10 years old (at least the
parts related to this).

For the soclose() to not happen, the reference count on the socket
structure cannot have gone to zero (i.e. an SVC_RELEASE() was missed).
Upon code inspection, I was not able to spot a reference counting bug.
(Not too surprising, since a reference counting bug should have shown up
long ago.)

The only thing I spotted that could conceivably explain this is that the
function svc_vc_stat(), which returns the indication that the socket has
been closed at the other end, did not bother to do any locking when it
checked the status. (I am not yet sure if this could result in the status
of XPRT_DIED being missed by the call, but if so, that would result in the
soclose() call not happening.)
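A minimal sketch of the two mechanisms discussed above, assuming a pthread mutex stands in for the transport lock: the socket is only "closed" when the last reference is dropped, and the "died" flag is both set and read under the same lock. struct xprt, xprt_acquire(), xprt_release(), xprt_mark_died(), xprt_is_dead() and the closed flag are hypothetical names; this is not the krpc svc_vc.c code or the attached patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a server transport; not the krpc SVCXPRT. */
struct xprt {
    pthread_mutex_t lock;
    int refs;        /* references held by code using the transport */
    bool died;       /* set when the peer's FIN (or an error) is seen */
    bool closed;     /* stands in for "soclose() has been called" */
};

static void xprt_init(struct xprt *x)
{
    pthread_mutex_init(&x->lock, NULL);
    x->refs = 1;     /* the creator's reference */
    x->died = false;
    x->closed = false;
}

static void xprt_acquire(struct xprt *x)
{
    pthread_mutex_lock(&x->lock);
    x->refs++;
    pthread_mutex_unlock(&x->lock);
}

/*
 * Dropping the last reference is the only path that "closes" the socket,
 * so a single missed release leaves the connection open indefinitely.
 */
static void xprt_release(struct xprt *x)
{
    bool last;

    pthread_mutex_lock(&x->lock);
    assert(x->refs > 0);
    last = (--x->refs == 0);
    pthread_mutex_unlock(&x->lock);
    if (last)
        x->closed = true;    /* real code would call soclose() here */
}

/* Receive path: record that the other end has gone away. */
static void xprt_mark_died(struct xprt *x)
{
    pthread_mutex_lock(&x->lock);
    x->died = true;
    pthread_mutex_unlock(&x->lock);
}

/*
 * Status check taken under the same lock as the writer, so the "died"
 * indication is never read unsynchronized (the idea behind adding
 * locking to svc_vc_stat()).
 */
static bool xprt_is_dead(struct xprt *x)
{
    bool dead;

    pthread_mutex_lock(&x->lock);
    dead = x->died;
    pthread_mutex_unlock(&x->lock);
    return dead;
}
```

With one reference left outstanding (the equivalent of a missed SVC_RELEASE()), closed never becomes true, which on a real server corresponds to a socket left in CLOSE_WAIT.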
I have attached a small patch, which I think is safe, that adds locking to
svc_vc_stat(), which I am hoping you can try at some point.
(I realize this is difficult for a production server, but...)
I have tested it a little and will test it some more, to try and ensure
it does not break anything.

I have also cc'd mav@, since he's the guy who last worked on this code, in
case he has any insight w.r.t. how the soclose() might get missed (or any
other way the server socket gets stuck in CLOSE_WAIT).

rick
ps: I'll create a PR for this, so that it doesn't get forgotten.

Best regards
Michael
>
> rick
>
> Best regards
> Michael
>> This will last for ~2 min or so, but is asynchronous. However, the same
>> 4-tuple can not be reused during this time.
>>
>> With other words, from the socket / TCP, a properly executed active close()
>> will end up in this state. (If the other side initiated the close, a passive
>> close, will not end in this state)
>>
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
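The active/passive close discussion quoted above can be checked against a small model of the teardown part of the RFC 793 TCP state machine. This is a simplified sketch (no simultaneous close, no timers, FIN and ACK treated as separate events), and tcp_step() and the EV_* events are hypothetical names.

```c
#include <assert.h>

/* Subset of RFC 793 TCP states involved in connection teardown. */
enum tcp_state {
    ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT,
    CLOSE_WAIT, LAST_ACK, CLOSED
};

/* Events seen by one endpoint. Simultaneous close is not modeled. */
enum tcp_event {
    EV_APP_CLOSE,    /* application calls close()/shutdown(SHUT_WR): send FIN */
    EV_RCV_FIN,      /* peer's FIN arrives (and is ACKed) */
    EV_RCV_ACK       /* peer ACKs our FIN */
};

static enum tcp_state tcp_step(enum tcp_state s, enum tcp_event ev)
{
    switch (s) {
    case ESTABLISHED:
        if (ev == EV_APP_CLOSE) return FIN_WAIT_1;  /* active close */
        if (ev == EV_RCV_FIN)   return CLOSE_WAIT;  /* passive close */
        break;
    case FIN_WAIT_1:
        if (ev == EV_RCV_ACK)   return FIN_WAIT_2;
        break;
    case FIN_WAIT_2:
        if (ev == EV_RCV_FIN)   return TIME_WAIT;   /* needs the peer's FIN */
        break;
    case CLOSE_WAIT:
        if (ev == EV_APP_CLOSE) return LAST_ACK;    /* only the local app can leave */
        break;
    case LAST_ACK:
        if (ev == EV_RCV_ACK)   return CLOSED;
        break;
    default:
        break;
    }
    return s;    /* event not meaningful in this state: no change */
}
```

The asymmetry is the point: in this model CLOSE_WAIT has a single exit, the local application's close(), so a server that never calls soclose() pins itself in CLOSE_WAIT and its peer in FIN_WAIT_2. Any timer that eventually reaps the client's FIN_WAIT_2 socket (as observed above on the FreeBSD client) is deliberately not modeled.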
xprtdied.patch