8866 1857 Mobile Phone
richard.scheffeneg...@netapp.com
https://ts.la/richard49892
-----Original Message-----
From: Rick Macklem
Sent: Monday, April 12, 2021 00:50
To: Scheffenegger, Richard; tue...@freebsd.org
Cc: Youssef GHORBAL; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs
CAUTION: This email originated from outside of the University of Guelph. Do not
click links or open attachments unless you recognize the sender and know the
content is safe. If in doubt, forward suspicious emails to ith...@uoguelph.ca
NetApp Security WARNING: This is an external email. Do not click links or open
attachments unless you recognize the sender and know the content is safe.
> On 10. Apr 2021, at 23:59, Rick Macklem wrote:
>
> tue...@freebsd.org wrote:
>> Rick wrote:
> [stuff snipped]
With r367492 you don't get the upcall with the same error state? Or you
don't get an error on a write() call, when there should be one?
tue...@freebsd.org wrote:
>Rick wrote:
[stuff snipped]
>>> With r367492 you don't get the upcall with the same error state? Or you
>>> don't get an error on a write() call, when there should be one?
> If Send-Q is 0 when the network is partitioned, after healing, the krpc sees
> no activity on
>
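Rick's observation hinges on whether the socket's Send-Q is empty when the partition heals. A minimal sketch for watching that from userland, assuming the common `netstat -an` column layout (proto, Recv-Q, Send-Q, local address, foreign address, state) and the default NFS port 2049; the helper name `nfs_sendq` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print Send-Q, state and both endpoints for every TCP
# connection touching port 2049, reading "netstat -an" output on stdin.
# Assumed columns: $1 proto, $2 Recv-Q, $3 Send-Q, $4 local, $5 foreign, $6 state.
# FreeBSD prints "addr.port" and Linux prints "addr:port", so match either.
nfs_sendq() {
    awk '$1 ~ /^tcp/ && ($4 ~ /[.:]2049$/ || $5 ~ /[.:]2049$/) {
        print $3, $6, $4, $5
    }'
}

# Usage (on either host): netstat -an | nfs_sendq
```

Per Rick's later message in this thread, it takes either a non-empty send-q or a soshutdown(..SHUT_WR) to get the FreeBSD socket out of ESTABLISHED, so a Send-Q of 0 on a connection that stays ESTABLISHED after the network heals is the case worth catching.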
-----Original Message-----
From: tue...@freebsd.org
Sent: Saturday, April 10, 2021 18:13
To: Rick Macklem
Cc: Scheffenegger, Richard; Youssef GHORBAL; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs
> On 10. Apr 2021, at 17:56, Rick Macklem wrote:
>
> Scheffenegger, Richard wrote:
>>> Rick wrote:
>>> Hi Rick,
>>>
Scheffenegger, Richard wrote:
>>Rick wrote:
>> Hi Rick,
>>
>>> Well, I have some good news and some bad news (the bad is mostly for
>>> Richard).
>>>
>>> The only message logged is:
>>> tcpflags 0x4; tcp_do_segment: Timestamp missing, segment processed
>>> normally
>>>
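The `tcpflags 0x4` in that log line is the TCP header flag byte, and 0x04 is the RST bit, which fits the RST exchange discussed in this thread. A small decoding sketch, assuming only the six classic flag bits from RFC 793 (FIN=0x01 through URG=0x20); `tcp_flag_names` is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: turn a TCP flag byte (decimal or 0x-prefixed hex)
# into a comma-separated list of flag names, using the RFC 793 bit values.
tcp_flag_names() {
    flags=$(( $1 ))
    names=""
    for pair in 0x01:FIN 0x02:SYN 0x04:RST 0x08:PSH 0x10:ACK 0x20:URG; do
        bit=${pair%%:*}     # e.g. 0x04
        name=${pair#*:}     # e.g. RST
        if [ $(( flags & $bit )) -ne 0 ]; then
            names="${names:+$names,}$name"
        fi
    done
    echo "${names:-none}"
}

tcp_flag_names 0x4     # prints RST (the value from the log message)
tcp_flag_names 0x14    # prints RST,ACK
```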
Btw, I did get one additio
ike it is for NFSv4.1 in freebsd-current.
>>>> I had forgotten to re-disable it.
>>>> So, when it does battle, it might have been the 6-minute
>>>> timeout, which would then do the soshutdown(..SHUT_WR)
>>>> which kept it from getting "stuck"
pcap for this one, started after the network was plugged
>>> back in and I noticed it was stuck for quite a while is here:
>>> fetch https://people.freebsd.org/~rmacklem/stuck.pcap
>>>
>>> In it, there is just a bunch of RST followed by SYN sent
>>> from cl
From: tue...@freebsd.org
Sent: Saturday, April 10, 2021 2:19 PM
To: Scheffenegger, Richard
Cc: Rick Macklem; Youssef GHORBAL; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs
> On 10. Apr 2021, at 11:19, Scheffenegger, Richard
> wrote:
>
> Hi Rick,
>
>> Well, I have some good news and some bad news (the bad is mostly for
>> Richard).
>>
>> The only message logged is:
>> tcpflags 0x4; tcp_do_segment: Timestamp missing, segment processed
>> normally
>>
ent->FreeBSD and FreeBSD just keeps sending
>>> acks for the old segment back.
>>> --> It looks like FreeBSD did the "RST, ACK" after the
>>> krpc did a soshutdown(..SHUT_WR) on the socket,
>>> for the one you've been looking at.
>
Hi Rick,
> Well, I have some good news and some bad news (the bad is mostly for Richard).
>
> The only message logged is:
> tcpflags 0x4; tcp_do_segment: Timestamp missing, segment processed
> normally
>
> But...the RST battle no longer occurs. Just one RST that works and then the
> SYN gets SYN
>> krpc did a soshutdown(..SHUT_WR) on the socket,
>> for the one you've been looking at.
>> I'll test some more...
>>
>>> I would like to understand why the reestablishment of the connection
>>> did not work...
>> It is looking like it takes either a non-empty send-q or a
>> soshutdown(..SHUT_WR) to get the FreeBSD socket
>> out of established, where it just ignores the RSTs and
>> SYN packets.
>>
>> Thanks for looking at it, rick
>
> Best regards
> Michael
>
> Have fun with it, rick
>
>
> ________
> From: tue...@freebsd.org
> Sent: Sunday, April 4, 2021 12:41 PM
> To: Rick Macklem
> Cc: Scheffenegger, Richard; Youssef GHORBAL; freebsd-net@freebsd.org
> Subject: Re: NFS Mount Hangs
>
> [...] TS val 2074098279 ecr 2671667056], length 48: NFS reply xid
> 697039765 reply ok 44 getattr ERROR: unk 10063
>
> This error 10063 after the partition heals is also "bad news". It indicates
> the Session (which is supposed to maintain "exactly once" RPC semantics) is
> broken. I'll admit I suspect a Linux client bug, but will be investigating
> further.
>
> So, hopefully TCP conversant folk can confirm if the above is correct
> behaviour
or if the RST should be ack'd sooner?
I could also see this becoming a "forever" TCP battle for other versions of
Linux client.
rick
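tcpdump printed the status as `unk 10063` because its NFS decoder did not recognise it. In the NFSv4.1 specification (RFC 5661, section 15.1) 10063 is NFS4ERR_SEQ_MISORDERED, a slot sequence-ID mismatch, which is consistent with Rick's reading that the session's exactly-once semantics were broken. A lookup sketch covering only a few session-related codes (values recalled from the RFC table; verify there before relying on them); `nfs41_errname` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: name a few NFSv4.1 session-related status codes
# (RFC 5661, section 15.1). Deliberately incomplete; see the RFC for the rest.
nfs41_errname() {
    case "$1" in
        10052) echo NFS4ERR_BADSESSION ;;
        10053) echo NFS4ERR_BADSLOT ;;
        10063) echo NFS4ERR_SEQ_MISORDERED ;;
        10078) echo NFS4ERR_DEADSESSION ;;
        *)     echo "unknown ($1)" ;;
    esac
}

nfs41_errname 10063    # prints NFS4ERR_SEQ_MISORDERED
```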
________
From: Scheffenegger, Richard
Sent: Sunday, April 4, 2021 7:50 AM
To: Rick Macklem; tue.
-net@freebsd.org
Subject: Re: NFS Mount Hangs
tue...@freebsd.org wrote:
>> On 2. Apr 2021, at 02:07, Rick Macklem wrote:
>>
TCP connection gets stuck in CLOSE_WAIT and that is
why I've added the soshutdown(..SHUT_WR) calls,
which can happen before the client gets around to
re-assigning the back channel.
Thanks for your help with this Michael, rick
Best regards
Michael
>
> rick
> ps: I can capture packets while doing this, if anyone has a use
> for them.
>
> From: owner-freebsd-...@freebsd.org on behalf
> of Youssef GHORBAL
> Sent: Saturday, March 27, 2021 6:57 PM
> To: Jason Breitman
> Cc: Rick Macklem; freebsd-net@freebsd.org
> Subject: Re: NFS Mount Hangs
On 27 Mar 2021, at 13:20, Jason Breitman
<jbreit...@tildenparkcapital.com> wrote:
The issue happened again so we can say that disabling TSO and LRO on the NIC
did not resolve this issue.
# ifconfig lagg0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso
# ifconfig lagg0
lagg0: flags=8943 metric 0 mtu
1500
options=8100b8
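The `options=` value is a hex bitmask of interface capabilities. Jason's earlier output in this thread showed `options=e507bb`; after the `ifconfig` above it is `options=8100b8`. A sketch that tests just the checksum, TSO and LRO bits, assuming FreeBSD's `IFCAP_*` values from `<net/if.h>` (RXCSUM=0x1, TXCSUM=0x2, TSO4=0x100, TSO6=0x200, LRO=0x400); `offload_caps` is a hypothetical helper and ignores every other bit:

```sh
#!/bin/sh
# Hypothetical helper: report which of a few offload capabilities are set in
# an ifconfig "options=" hex mask (given without the 0x prefix).
# Bit values assumed from FreeBSD's IFCAP_* constants in <net/if.h>.
offload_caps() {
    mask=$(( 0x$1 ))
    enabled=""
    for pair in 0x001:RXCSUM 0x002:TXCSUM 0x100:TSO4 0x200:TSO6 0x400:LRO; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( mask & $bit )) -ne 0 ]; then
            enabled="${enabled:+$enabled,}$name"
        fi
    done
    echo "${enabled:-none}"
}

offload_caps e507bb    # before: RXCSUM,TXCSUM,TSO4,TSO6,LRO
offload_caps 8100b8    # after:  none
```

Decoded this way, the second mask confirms the checksum, TSO and LRO offloads really were off when the hang recurred.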
We can also say that the sysctl settings d
behalf
of Jason Breitman
Sent: Monday, March 22, 2021 9:24 AM
To: Youssef GHORBAL
Cc: freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs
Agreed. I had made the changes on the FreeBSD Server side and was suggesting
that a new TCP connection needed to be established between the client and
server for the settings to take effect.
I rebooted all of my Debian clients on Sunday to achieve that goal,
establishing a new NFSv4 TCP connect
> On 21 Mar 2021, at 23:21, Rick Macklem wrote:
>
> Youssef GHORBAL wrote:
>> Hi Jason,
>>
>>> On 17 Mar 2021, at 18:17, Jason Breitman
>>> wrote:
>>>
>>> Please review the details below and let me know if there is a setting that
>>> I should apply to my FreeBSD NFS Server or if there is
> On 21 Mar 2021, at 14:41, Jason Breitman
> wrote:
>
> Thanks for sharing as this sounds exactly like my issue.
>
> I had implemented the change below on 3/8/2021 and have experienced the NFS
> hang after that.
> Do I need to reboot or umount / mount all of the clients and then I will be
>
Youssef GHORBAL wrote:
>Hi Jason,
>
>> On 17 Mar 2021, at 18:17, Jason Breitman
>> wrote:
>>
>> Please review the details below and let me know if there is a setting that I
>> should apply to my FreeBSD NFS Server or if there is a bug fix that I can
>> apply to resolve my issue.
>> I shared t
Thanks for sharing as this sounds exactly like my issue.
I had implemented the change below on 3/8/2021 and have experienced the NFS
hang after that.
Do I need to reboot or umount / mount all of the clients and then I will be ok?
I had not rebooted the clients, but would to get out of this situa
The issue did trigger again.
I ran the script below for ~15 minutes and hope this gets you what you need.
Let me know if you require the full output without grepping nfsd.
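#!/bin/sh
# Append a timestamp and the current nfsd thread list to the log, forever.
while true
do
/bin/date >> /tmp/nfs-hang.log
/bin/ps axHl | grep nfsd | grep -v grep >> /tmp/nfs-hang.log
# The rest of the original loop body was cut off; this sleep interval is an
# assumption, added so the loop does not spin and flood the log.
/bin/sleep 10
done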
Hi Jason,
> On 17 Mar 2021, at 18:17, Jason Breitman
> wrote:
>
> Please review the details below and let me know if there is a setting that I
> should apply to my FreeBSD NFS Server or if there is a bug fix that I can
> apply to resolve my issue.
> I shared this information with the linux-nf
Jason Breitman wrote:
>Thank you for your focus on the issue I am having and I look forward to seeing
>your patch ported to FreeBSD 12.X.
I'll only be committing the patch if I am convinced it actually fixes something.
I'll be looking more closely at it and seeing what mav@ thinks about it.
Sorry, I thought this was a problem on stable/13.
This is only in HEAD, stable/13 and 13.0 - never MFC'd to stable/12 or
backported to 12.1
> I did some reshuffling of socket-upcalls recently in the TCP stack, to
> prevent some race conditions with our $work in-kernel NFS server
> implementatio
be impacted by this.
>
> Richard Scheffenegger
>
>
> -----Original Message-----
> From: owner-freebsd-...@freebsd.org On Behalf Of
> Rick Macklem
> Sent: Friday, March 19, 2021 16:58
> To: tue...@freebsd.org
> Cc: Scheffenegger, Richard;
> freebsd-net@fr
Thank you for your focus on the issue I am having and I look forward to seeing
your patch ported to FreeBSD 12.X.
I also appreciate that you understand the difficulties in testing changes on a
core piece of infrastructure.
I will let the group know if the issue occurs following the change that
Michael Tuexen wrote:
>> On 18. Mar 2021, at 21:55, Rick Macklem wrote:
>>
>> Michael Tuexen wrote:
>>>> On 18. Mar 2021, at 13:42, Scheffenegger, Richard wrote:
>>>>
>>>>> Output from the NFS Client when the issue occurs
>>>>> # netstat -an | grep NFS.Server.IP.X
>>>>> tcp        0      0 NFS.Client.IP.X:46896    NFS.Server.IP.X:2049    FIN_WAIT2
>>>> I'm no TCP guy. Hopefully others might know why the client would be stuck in
>>>> FIN_WAIT2 (I vaguely recall
The laggproto is lacp and the switch is made by Extreme Networks.
Jason Breitman
On Mar 18, 2021, at 4:06 AM, Gerrit Kuehn wrote:
On Wed, 17 Mar 2021 18:17:14 -0400
Jason Breitman wrote:
> I will look into disabling the TSO and LRO options and let the group
> know how it goes. Below are the
> On 18. Mar 2021, at 13:53, Rodney W. Grimes
> wrote:
>
> Note I am NOT a TCP expert, but know enough about it to add a comment...
>
>> Alan Somers wrote:
>> [stuff snipped]
>>> Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0.
>> For the client, yes. For the server,
Note I am NOT a TCP expert, but know enough about it to add a comment...
> Alan Somers wrote:
> [stuff snipped]
> >Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0.
> For the client, yes. For the server, no.
> For the server, it is just a compile time constant NFS_SRVMAXI
On Wed, 17 Mar 2021 18:17:14 -0400
Jason Breitman wrote:
> I will look into disabling the TSO and LRO options and let the group
> know how it goes. Below are the current options on the NFS Server.
> lagg0: flags=8943
> metric 0 mtu 1500
> options=e507bb
What laggproto are you using, and what k
We are using the Intel Ethernet Network Adapter X722.
Jason Breitman
On Mar 17, 2021, at 6:48 PM, Peter Eriksson wrote:
CLOSE_WAIT on the server side usually indicates that the kernel has sent the
ACK to the client's FIN (start of a shutdown) packet but hasn't sent its own
FIN packet - something that usually happens when the server has read all data
queued up from the client and taken what actions it needs to shut
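Peter's description pairs with the client-side netstat output quoted in this thread: a client in FIN_WAIT2 has had its FIN acknowledged and is waiting for the peer's FIN, which is normally the mirror image of a server socket sitting in CLOSE_WAIT. A sketch for tallying NFS connection states on either host, assuming the usual `netstat -an` columns ($4 local, $5 foreign, $6 state) and port 2049; `nfs_tcp_states` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: count TCP states of connections touching port 2049,
# reading "netstat -an" output on stdin. Matches both FreeBSD ("addr.port")
# and Linux ("addr:port") address formats; prints "STATE COUNT" lines.
nfs_tcp_states() {
    awk '$1 ~ /^tcp/ && ($4 ~ /[.:]2049$/ || $5 ~ /[.:]2049$/) { n[$6]++ }
         END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Usage: netstat -an | nfs_tcp_states
```

A CLOSE_WAIT count on the server that keeps growing points at sockets the NFS server side has not yet closed or shut down, which is the situation the soshutdown(..SHUT_WR) calls mentioned in this thread are meant to avoid.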
Thank you for the responses.
The NFS Client does properly negotiate down to 128K for the rsize and wsize.
The client port should be changing as we are using the noresvport option.
On the NFS Client
cat /proc/mounts
nfs-server.domain.com:/data /mnt/data nfs4
rw,relatime,vers=4.1,rsize=131072,wsiz
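rsize=131072 is 131072 / 1024 = 128 KiB, the 128K size that Rick attributes to the server's compile-time constant NFS_SRVMAXIO. A sketch that pulls one option out of `/proc/mounts`-format lines on the Linux client and shows it in KiB, assuming the standard six-field layout (device, mount point, type, options, dump, pass); `mount_opt_kib` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: print one mount option (e.g. rsize) for a mount point,
# reading /proc/mounts-format lines on stdin, with the value also in KiB.
# Usage: mount_opt_kib MOUNTPOINT OPTION < /proc/mounts
mount_opt_kib() {
    awk -v mnt="$1" -v opt="$2" '
        $2 == mnt {
            n = split($4, kv, ",")
            for (i = 1; i <= n; i++) {
                # Only accept the option at the start of the field, e.g. "rsize=".
                if (index(kv[i], opt "=") == 1) {
                    v = substr(kv[i], length(opt) + 2)
                    print opt "=" v " (" v / 1024 "K)"
                }
            }
        }'
}

# On the Linux client: mount_opt_kib /mnt/data rsize < /proc/mounts
```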
Alan Somers wrote:
[stuff snipped]
>Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0.
For the client, yes. For the server, no.
For the server, it is just a compile time constant NFS_SRVMAXIO.
It's mainly related to the fact that I haven't gotten around to testing larger
s
On Wed, Mar 17, 2021 at 3:37 PM Rick Macklem wrote:
> Jason Breitman wrote:
> >Please review the details below and let me know if there is a setting
> >that I should apply to my FreeBSD NFS Server or if there is a bug fix that
> >I can apply to resolve my issue.
> >I shared this information with
Jason Breitman wrote:
>Please review the details below and let me know if there is a setting that I
>should apply to my FreeBSD NFS Server or if there is a bug fix that I can
>apply to resolve my issue.
>I shared this information with the linux-nfs mailing list and they believe the
>issue is
Please review the details below and let me know if there is a setting that I
should apply to my FreeBSD NFS Server or if there is a bug fix that I can apply
to resolve my issue.
I shared this information with the linux-nfs mailing list and they believe the
issue is on the server side.
Issue
NFS