FWIW:

r367492 fixes an issue around "premature" transmission of an ACK, caused by the
incoming segment having been only partially processed at that time. It is
related to in-kernel TCP consumers which use socket upcalls.

Rick mentioned that the NFS server (one in-kernel TCP user) has stringent
requirements on the state of the socket during the upcall. Thus D29690
retains the lock on the socket buffer until TCP processing is finalized, so
the upcall can be done without any risk of transmitting outdated information
back to the other end.

However, I have no proper way to verify/validate this interaction.

My request would be to test the behavior with D29690 first; but if similar
hangs keep recurring, then revert r367492 (which will also mean more severe
surgery on the TCP processing flow).

Thanks.

Richard Scheffenegger

-----Original Message-----
From: Rick Macklem <rmack...@uoguelph.ca> 
Sent: Thursday, April 15, 2021 23:05
To: Allan Jude <allanj...@freebsd.org>; freebsd-current@freebsd.org
Cc: Richard Scheffenegger <rsch...@freebsd.org>; Juraj Lutter <o...@freebsd.org>
Subject: Re: NFS issues since upgrading to 13-RELEASE

I wrote:
[stuff snipped]
>- Alternately you can try rscheff@'s alternate proposed patch that is at
>  https://reviews.freebsd.og/D29690.
Oops, that's
    https://reviews.freebsd.org/D29690

rick

  I have not yet had time to test this one, but since I cannot reproduce the
  hang, I can only test it to confirm that it is "no worse" than reverting
  r367492 on my setup.

Please let us know which you choose and whether or not it fixes your problem.

>> Any pointers for troubleshooting this? I've been looking through vmstat, 
>> gstat, top, etc. when the problem occurs, but I haven't been able to 
>> pinpoint the issue. I can get pcap, but it would be from the hosts, because 
>> I don't have a 10G tap or managed switch.
>>
>
>run `nfsstat -d 1` and try to capture a few lines from before, during, 
>and after the stall, and that may provide some insight.
>
>Specifically, does the queue length grow, suggesting it is waiting on
>the I/O subsystem, or does it just stop getting traffic altogether?
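To line up the `nfsstat -d 1` output with the moment a stall is noticed, it may help to timestamp each line as it is logged. This is only a sketch, assuming FreeBSD's stdbuf(1) is available to keep nfsstat's output line-buffered when it is piped; the function names stamp_lines and log_nfsstat and the log path are hypothetical.

```sh
#!/bin/sh
# Prefix every line read from stdin with the current wall-clock time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Append timestamped `nfsstat -d 1` output to the file named in $1.
# stdbuf -o L requests line buffering so lines are not held back in the
# pipe (assumption: nfsstat is dynamically linked, which stdbuf relies on).
log_nfsstat() {
    stdbuf -o L nfsstat -d 1 | stamp_lines >> "$1"
}
```

Started with e.g. `log_nfsstat /var/tmp/nfsstat-d.log` before the stall is expected and stopped with Ctrl-C after it recovers, the log should make it easier to see whether the queue length climbs or the traffic simply drops off at the time of the hang.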

If the revert of r367492 does not fix the problem, monitor the TCP
connection(s) via "netstat -a" and, if possible, capture packets on the server
with "tcpdump -s 0 -w hang.pcap host <nfs-client>" or similar.

Ideally the tcpdump would be started before the "hang" occurs, but running one
while the hang is in progress (and until after it recovers) could also be useful.
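One rough way to do both at once on the server is sketched below: it starts the capture with the same tcpdump flags as above and, once a second, appends a timestamped snapshot of the TCP sockets on the NFS port, so growing Recv-Q/Send-Q values can be matched against the pcap. This is an illustration only; it assumes the standard NFS port 2049 and the FreeBSD `netstat -an -p tcp` column layout, and the function names, NFS_PORT, and the output directory are hypothetical.

```sh
#!/bin/sh
# Assumption: the server listens on the standard NFS port.
NFS_PORT=2049

# Keep only TCP socket lines whose local ($4) or foreign ($5) address ends
# in ".$NFS_PORT"; FreeBSD netstat joins host and port with a dot.
filter_nfs_conns() {
    awk -v port="$NFS_PORT" \
        '$1 ~ /^tcp/ && ($4 ~ ("\\." port "$") || $5 ~ ("\\." port "$"))'
}

# $1: NFS client host, $2: output directory (both chosen by the caller).
# Run as root; stop with Ctrl-C once the hang has recovered.
capture_hang() {
    client=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    tcpdump -s 0 -w "$outdir/hang.pcap" host "$client" &
    tcpdump_pid=$!
    # Background jobs of a non-interactive shell ignore SIGINT,
    # so stop tcpdump explicitly when the script is interrupted.
    trap 'kill "$tcpdump_pid"; exit 0' INT TERM
    while :; do
        date '+%Y-%m-%d %H:%M:%S'
        netstat -an -p tcp | filter_nfs_conns
        sleep 1
    done >> "$outdir/netstat.log"
}
```

For example, `capture_hang <nfs-client> /var/tmp/nfs-hang` would leave hang.pcap and netstat.log side by side, with timestamps in the latter to locate the stall in the capture.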

Thanks for reporting this, rick

--
Allan Jude
_______________________________________________
freebsd-current@freebsd.org mailing list 
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
