Re: NFS Mount Hangs

tuexen Sat, 10 Apr 2021 05:19:19 -0700

> On 10. Apr 2021, at 11:19, Scheffenegger, Richard 
> <richard.scheffeneg...@netapp.com> wrote:
> 
> Hi Rick,
> 
>> Well, I have some good news and some bad news (the bad is mostly for 
>> Richard).
>> 
>> The only message logged is:
>> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, segment processed 
>> normally
>> 
>> But...the RST battle no longer occurs. Just one RST that works and then the 
>> SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>> 
>> So, what is different?
>> 
>> r367492 is reverted from the FreeBSD server.
>> I did the revert because I think it might be what is causing otis@'s hang.
>> (In his case, the Recv-Q grows on the socket for the stuck Linux client,
>> while others work.)
>> 
>> Why does reverting fix this?
>> My only guess is that the krpc gets the upcall right away and sees an EPIPE 
>> when it does soreceive(), which results in soshutdown(SHUT_WR).
> 
> With r367492 you don't get the upcall with the same error state? Or you don't 
> get an error on a write() call, when there should be one?
My understanding is that he needs this error indication when calling shutdown().
> 
> From what you describe, this is on writes, isn't it? (I'm asking, as the 
> original problem that was fixed with r367492 occurs in the read path: 
> draining of the so_rcv buffer in the upcall right away, which subsequently 
> influences the ACK sent by the stack.)
> 
> I only added the so_snd buffer after some discussion about whether WAKESOR 
> shouldn't have a symmetric equivalent in WAKESOW...
> 
> Thus a partial backout (leaving the WAKESOR part in, but reverting the 
> WAKESOW part) would still fix my initial problem with erroneous DSACKs 
> (which can also lead to extremely poor performance with Linux clients), but 
> possibly address this issue...
> 
> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for 
> the revert only on the so_snd upcall?
Since the release of 13.0 is almost done, can we try to fix the issue instead 
of reverting the commit?
> 
> If this doesn't help, some major surgery will be necessary to prevent NFS 
> sessions with SACK enabled from transmitting DSACKs...
My understanding is that the problem is related to getting a local error 
indication too late, or not at all, after receiving a RST segment.


Best regards
Michael
> 
> 
>> I know from a printf that this happened, but whether it caused the RST 
>> battle to not happen, I don't know.
>> 
>> I can put r367492 back in and do more testing if you'd like, but I think it 
>> probably needs to be reverted?
> 
> Please, I don't quite understand why the exact timing of the upcall would be 
> that critical here...
> 
> A comparison of the soxxx calls and errors between the "good" and the "bad" 
> case would be perfect. I don't know if this is easy to do though, as these 
> calls appear to be scattered all around the RPC / NFS source paths.
> 
>> This does not explain the original hung Linux client problem, but does shed 
>> light on the RST war I could create by doing a network partitioning.
>> 
>> rick
> 
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

