On 12.11.2010 03:29, Lawrence Stewart wrote:
On 11/12/10 07:39, Julian Elischer wrote:
On 11/11/10 6:36 AM, Christopher Penney wrote:
Hi,

I have a curious problem I'm hoping someone can help with or at least
educate me on.

I have several large Linux clusters and for each one we hide the compute
nodes behind a head node using NAT.  Historically, this has worked
very well
for us and any time a NAT gateway (the head node) reboots everything
recovers within a minute or two of it coming back up.  This includes NFS
mounts from Linux and Solaris NFS servers, license server connections,
etc.

Recently, we added a FreeBSD based NFS server to our cluster resources
and
have had significant issues with NFS mounts hanging if the head node
reboots.  We don't have this happen much, but it does occasionally
happen.
   I've explored this and it seems the behavior of FreeBSD differs a
bit from
at least Linux and Solaris with respect to TCP recovery.  I'm curious if
someone can explain this or offer any workarounds.

Here are some specifics from a test I ran:

Before the reboot two Linux clients were mounting the FreeBSD server.
They
were both using port 903 locally.  On the head node clientA:903 was
remapped
to headnode:903 and clientB:903 was remapped to headnode:601.  There
is no
activity when the reboot occurs.  The head node takes a few minutes to
come
back up (we kept it down for several minutes).

When it comes back up clientA and clientB try to reconnect to the FreeBSD
NFS server.  They both use the same source port, but since the head
node's
conntrack table is cleared it's a race to see who gets what port and this
time clientA:903 appears as headnode:601 and clientB:903 appears as
headnode:903 (>>>   they essentially switch places as far as the FreeBSD
server would see<<<   ).

The FreeBSD NFS server, since there was no outstanding acks it was
waiting
on, thinks things are ok so when it gets a SYN from the two clients it
only
responds with an ACK.  The ACK for each that it replies with is bogus
(invalid seq number) because it's using the return path the other
client was
using before the reboot so the client sends a RST back, but it never
gets to
the FreeBSD system since the head node's NAT hasn't yet seen the full
handshake (that would allow return packets).  The end result is a
"permanent" hang (at least until it would otherwise cleanup idle TCP
connections).

This is in stark contrast to the behavior of the other systems we have.
   Other systems respond to the SYN used to reconnect with a SYN/ACK.
They
appear to implicitly tear down the return path based on getting a SYN
from a
seemingly already established connection.

I'm assuming this is one of the grey areas where there is no specific
behavior outlined in an RFC?  Is there any way to make the FreeBSD system
more reliable in this situation (like making it implicitly tear down the
return)?  Or is there a way to adjust the NAT setup to allow the RST to
return to the FreeBSD system?  Currently, NAT is setup with simply:

iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o bond0 -j SNAT --to
1.2.3.4

Where 1.2.3.4 is the intranet address and 10.1.0.0 is the cluster
network.

I just added NFS to the subject because the NFS people are thise you
need to
connect with.

Skimming Chris' problem description, I don't think I agree that this is
an NFS issue and agree with Chris that it's netstack related behaviour
as opposed to application related.

Chris, I have minimal cycles at the moment and your scenario is bending
my brain a little bit too much to give a quick response. A tcpdump
excerpt showing such an exchange would be very useful. I'll try come
back to it when I I have a sec. Andre, do you have a few cycles to
digest this in more detail?

I had very few cycles since EuroBSDCon as well but this weekend my
little son has a sleep over at my mother in law and my wife is at
work.  So I'm going to reduce my FreeBSD backlog.  There are a few
things that have queued up.  I should get enough time to take care
of this one.

--
Andre
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Reply via email to