I went through all the instances where an immediate soupcall would be triggered (before r367492).
If the problem is related to a race condition where the socket is unlocked before the upcall, I can change the patch to retain the lock on the socket throughout TCP processing. Both sorwakeups happen with a locked socket (which is the critical part, I understand), while for the write upcall there is one unlocked and one locked call...

Richard Scheffenegger
Consulting Solution Architect NAS & Networking
NetApp
+43 1 3676 811 3157 Direct Phone
+43 664 8866 1857 Mobile Phone
richard.scheffeneg...@netapp.com
https://ts.la/richard49892

-----Original Message-----
From: tue...@freebsd.org <tue...@freebsd.org>
Sent: Saturday, 10 April 2021 18:13
To: Rick Macklem <rmack...@uoguelph.ca>
Cc: Scheffenegger, Richard <richard.scheffeneg...@netapp.com>; Youssef GHORBAL <youssef.ghor...@pasteur.fr>; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs

> On 10. Apr 2021, at 17:56, Rick Macklem <rmack...@uoguelph.ca> wrote:
>
> Scheffenegger, Richard <richard.scheffeneg...@netapp.com> wrote:
>>> Rick wrote:
>>> Hi Rick,
>>>
>>>> Well, I have some good news and some bad news (the bad is mostly for
>>>> Richard).
>>>>
>>>> The only message logged is:
>>>> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, segment
>>>> processed normally
>>>>
> Btw, I did get one additional message during further testing (with
> r367492 reverted):
> tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection
> attempt aborted by remote endpoint
>
> This only happened once in several test cycles. That is OK.
>
>>>> But...the RST battle no longer occurs. Just one RST that works and then
>>>> the SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>>>>
>>>> So, what is different?
>>>>
>>>> r367492 is reverted from the FreeBSD server.
>>>> I did the revert because I think it might be what otis@'s hang is being
>>>> caused by. (In his case, the Recv-Q grows on the socket for the stuck
>>>> Linux client, while others work.)
>>>>
>>>> Why does reverting fix this?
>>>> My only guess is that the krpc gets the upcall right away and sees an
>>>> EPIPE when it does soreceive() -> results in soshutdown(SHUT_WR).
> This was bogus and incorrect. The diagnostic printf() I saw was generated
> for the back channel, and that would have occurred after the socket was
> shut down.
>
>>>
>>> With r367492 you don't get the upcall with the same error state? Or you
>>> don't get an error on a write() call, when there should be one?
> If Send-Q is 0 when the network is partitioned, after healing, the krpc
> sees no activity on the socket (until it acquires/processes an RPC it
> will not do a sosend()).
> Without the 6-minute timeout, the RST battle goes on "forever" (I've
> never actually waited more than 30 minutes, which is close enough to
> "forever" for me).
> --> With the 6-minute timeout, the "battle" stops after 6 minutes, when
>     the timeout causes a soshutdown(..SHUT_WR) on the socket.
> (Since the soshutdown() patch is not yet in "main" -- I got comments, but
> no "reviewed" on it -- the 6-minute timer won't help if enabled in main.
> The soclose() won't happen for TCP connections with the back channel
> enabled, such as Linux 4.1/4.2 ones.)

I'm confused. So you are saying that if the Send-Q is empty when you
partition the network, and the peer starts to send SYNs after the healing,
FreeBSD responds with a challenge ACK, which triggers the sending of a RST
by Linux. This RST is ignored multiple times.
Is that true? Even with my patch for the bug I introduced?
What version of the kernel are you using?

Best regards
Michael

>
> If Send-Q is non-empty when the network is partitioned, the battle will
> not happen.
>
>>
>> My understanding is that he needs this error indication when calling
>> shutdown().
> There are several ways the krpc notices that a TCP connection is no
> longer functional:
> - An error return like EPIPE from either sosend() or soreceive().
> - A return of 0 from soreceive() with no data (normal EOF from the other
>   end).
> - A 6-minute timeout on the server end, when no activity has occurred on
>   the connection. This timer is currently disabled for NFSv4.1/4.2 mounts
>   in "main", but I enabled it for this testing, to stop the "RST battle
>   goes on forever" during testing. I am thinking of enabling it on
>   "main", but this crude bandaid shouldn't be thought of as a fix for the
>   RST battle.
>
>>>
>>> From what you describe, this is on writes, isn't it? (I'm asking, as
>>> the original problem that was fixed with r367492 occurs in the read
>>> path (draining of the so_rcv buffer in the upcall right away, which
>>> subsequently influences the ACK sent by the stack).)
>>>
>>> I only added the so_snd buffer after some discussion about whether the
>>> WAKESOR shouldn't have a symmetric equivalent on WAKESOW...
>>>
>>> Thus a partial backout (leaving the WAKESOR part in, but reverting the
>>> WAKESOW part) would still fix my initial problem with erroneous DSACKs
>>> (which can also lead to extremely poor performance with Linux clients),
>>> but possibly address this issue...
>>>
>>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690
>>> for the revert only on the so_snd upcall?
> Since the krpc only uses receive upcalls, I don't see how reverting the
> send side would have any effect?
>
>> Since the release of 13.0 is almost done, can we try to fix the issue
>> instead of reverting the commit?
> I think it has already shipped broken.
> I don't know if an errata is possible, or if it will be broken until 13.1.
>
> --> I am much more concerned with the otis@ stuck client problem than
>     this RST battle that only occurs after a network partitioning,
>     especially if it is 13.0 specific.
> I did this testing to try to reproduce Jason's stuck client (with the
> connection in CLOSE_WAIT) problem, which I failed to reproduce.
>
> rick
>
> Rs: agree, a good understanding of where the interaction between stack,
> socket and in-kernel tcp user breaks is needed;
>
>>
>> If this doesn't help, some major surgery will be necessary to prevent
>> NFS sessions with SACK enabled from transmitting DSACKs...
>
> My understanding is that the problem is related to getting a local error
> indication after receiving a RST segment too late or not at all.
>
> Rs: but the move of the upcall should not materially change that; I don't
> have a pc here to see if any upcall actually happens on rst...
>
> Best regards
> Michael
>>
>>
>>> I know from a printf that this happened, but whether it caused the RST
>>> battle to not happen, I don't know.
>>>
>>> I can put r367492 back in and do more testing if you'd like, but I
>>> think it probably needs to be reverted?
>>
>> Please, I don't quite understand why the exact timing of the upcall
>> would be that critical here...
>>
>> A comparison of the soxxx calls and errors between the "good" and the
>> "bad" would be perfect. I don't know if this is easy to do though, as
>> these calls appear to be scattered all around the RPC / NFS source
>> paths.
>>
>>> This does not explain the original hung Linux client problem, but does
>>> shed light on the RST war I could create by doing a network
>>> partitioning.
>>>
>>> rick

_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"