Hi,

On Tue, 17 Apr 2018, Florian Westphal wrote:

> Dominique Martinet <asmad...@codewreck.org> wrote:
>
> [ CC Jozsef ]
>
> > Could it have something to do with the way I setup the connection?
> > I don't think the "both remotes call connect() with carefully selected
> > source/dest port" is a very common case..
> >
> > If you look at the tcpdump outputs I attached the sequence usually is
> > something like
> >  server > client SYN
> >  client > server SYN
> >  server > client SYNACK
> >  client > server ACK
> >
> > ultimately it IS a connection, but with an extra SYN packet in front of
> > it (that first SYN opens up the conntrack of the nat so that the
> > client's syn can come in, the client's conntrack will be that of a
> > normal connection since its first SYN goes in directly after the
> > server's (it didn't see the server's SYN))
> >
> > Looking at my logs again, I'm seeing the same as you:
> >
> > This looks like the actual SYN/SYN/SYNACK/ACK:
> >  - 14.364090 seq=505004283 likely SYN coming out of server
> >  - 14.661731 seq=1913287797 on next line it says receiver
> > end=505004284 so likely the matching SYN from client
> > Which this time gets a proper SYNACK from server:
> > 14.662020 seq=505004283 ack=1913287798
> > And following final dataless ACK:
> > 14.687570 seq=1913287798 ack=505004284
> >
> > Then as you point out some data ACK, where the scale poofs:
> > 14.688762 seq=1913287798 ack=505004284+(0) sack=505004284+(0) win=229
> > end=1913287819
> > 14.688793 tcp_in_window: sender end=1913287798 maxend=1913316998
> > maxwin=29312 scale=7 receiver end=505004284 maxend=505033596 maxwin=29200
> > scale=7
> > 14.688824 tcp_in_window:
> > 14.688852 seq=1913287798 ack=505004284+(0) sack=505004284+(0) win=229
> > end=1913287819
> > 14.688882 tcp_in_window: sender end=1913287819 maxend=1913287819 maxwin=229
> > scale=0 receiver end=505004284 maxend=505033596 maxwin=29200 scale=7
> >
> > As you say, only tcp_options() will clear only one side of the scales.
> > We don't have sender->td_maxwin == 0 (printed) so I see no other way
> > than we are in the last else if:
> >  - we have after(end, sender->td_end) (end=1913287819 > sender
> > end=1913287798)
> >  - I assume the tcp state machine must be confused because of the
> > SYN/SYN/SYNACK/ACK pattern and we probably enter the next check,
> > but since this is a data packet it doesn't have the tcp option for scale
> > thus scale resets.
>
> Yes, this looks correct. Jozsef, can you please have a look?
>
> Problem seems to be that conntrack believes that ACK packet
> re-initializes the connection:
>
>                 /*
>                  * RFC 793: "if a TCP is reinitialized ... then it need
>                  * not wait at all; it must only be sure to use sequence
>                  * numbers larger than those recently used."
>                  */
>                 sender->td_end =
>                 sender->td_maxend = end;
>                 sender->td_maxwin = (win == 0 ? 1 : win);
>
>                 tcp_options(skb, dataoff, tcph, sender);
>
> and last line clears the scale value (no wscale option in data packet).
>
>
> Transitions are:
>  server > client SYN          sNO -> sSS
>  client > server SYN          sSS -> sS2
>  server > client SYNACK       sS2 -> sSR /* here */
>  client > server ACK          sSR -> sES
>
> SYN/ACK was observed in original direction so we hit
> state->state == TCP_CONNTRACK_SYN_RECV && dir == IP_CT_DIR_REPLY test
> when we see the ack packet and end up in the 'TCP is reinitialized' branch.
>
> AFAICS, without this, connection would move to sES just fine,
> as the data ack is in window.

Yes, the state transition is wrong for simultaneous open, because the
tcp_conntracks table is not (cannot be) smart enough. Could you verify the
following untested patch?

diff --git a/include/uapi/linux/netfilter/nf_conntrack_tcp.h b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
index 74b9115..bcba72d 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_tcp.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
@@ -46,6 +46,9 @@ enum tcp_conntrack {
 /* Marks possibility for expected RFC5961 challenge ACK */
 #define IP_CT_EXP_CHALLENGE_ACK 		0x40
 
+/* Simultaneous open initialized */
+#define IP_CT_TCP_SIMULTANEOUS_OPEN		0x80
+
 struct nf_ct_tcp_flags {
 	__u8 flags;
 	__u8 mask;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index e97cdc1..8e67910 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -981,6 +981,17 @@ static int tcp_packet(struct nf_conn *ct,
 			return NF_ACCEPT; /* Don't change state */
 		}
 		break;
+	case TCP_CONNTRACK_SYN_SENT2:
+		/* tcp_conntracks table is not smart enough to handle
+		 * simultaneous open.
+		 */
+		ct->proto.tcp.last_flags |= IP_CT_TCP_SIMULTANEOUS_OPEN;
+		break;
+	case TCP_CONNTRACK_SYN_RECV:
+		if (dir == IP_CT_DIR_REPLY && index == TCP_ACK_SET &&
+		    ct->proto.tcp.last_flags & IP_CT_TCP_SIMULTANEOUS_OPEN)
+			new_state = TCP_CONNTRACK_ESTABLISHED;
+		break;
 	case TCP_CONNTRACK_CLOSE:
 		if (index == TCP_RST_SET
 		    && (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET)

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary
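
For anyone who wants to poke at the logic outside the kernel, here is a rough
userspace sketch of what the two new switch cases in the patch do. It is only
a toy: all toy_* names are made up for illustration, the enum values are
arbitrary, and it models neither the tcp_conntracks table nor
tcp_in_window(). Only the 0x80 bit value and the conditions are taken from
the patch above.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel enums; values are illustrative. */
enum toy_state { TOY_SYN_SENT2, TOY_SYN_RECV, TOY_ESTABLISHED };
enum toy_dir   { TOY_DIR_ORIGINAL, TOY_DIR_REPLY };
enum toy_index { TOY_SYN_SET, TOY_SYNACK_SET, TOY_ACK_SET };

/* Same bit value as IP_CT_TCP_SIMULTANEOUS_OPEN in the patch. */
#define TOY_SIMULTANEOUS_OPEN 0x80

/* Hypothetical per-connection record; only the flag byte is modelled. */
struct toy_ct {
	unsigned char last_flags;
};

/*
 * Takes the state proposed by the (unmodelled) transition table and
 * returns the possibly overridden state, mirroring the two added cases.
 */
static enum toy_state toy_fixup(struct toy_ct *ct, enum toy_state new_state,
				enum toy_dir dir, enum toy_index index)
{
	switch (new_state) {
	case TOY_SYN_SENT2:
		/* Second SYN seen: remember that this is a simultaneous open. */
		ct->last_flags |= TOY_SIMULTANEOUS_OPEN;
		break;
	case TOY_SYN_RECV:
		/* ACK in reply direction completes a simultaneous open. */
		if (dir == TOY_DIR_REPLY && index == TOY_ACK_SET &&
		    (ct->last_flags & TOY_SIMULTANEOUS_OPEN))
			new_state = TOY_ESTABLISHED;
		break;
	default:
		break;
	}
	return new_state;
}

/* Replays the SYN / SYNACK / ACK steps that follow the first SYN. */
enum toy_state toy_demo(void)
{
	struct toy_ct ct = { 0 };
	enum toy_state s;

	s = toy_fixup(&ct, TOY_SYN_SENT2, TOY_DIR_REPLY, TOY_SYN_SET);
	assert(s == TOY_SYN_SENT2);
	s = toy_fixup(&ct, TOY_SYN_RECV, TOY_DIR_ORIGINAL, TOY_SYNACK_SET);
	assert(s == TOY_SYN_RECV);
	/* Assumes the table proposed SYN_RECV for this ACK. */
	return toy_fixup(&ct, TOY_SYN_RECV, TOY_DIR_REPLY, TOY_ACK_SET);
}
```

Calling toy_demo() walks through the second SYN, the SYNACK and the final
ACK; without the flag set by the first step, the last call would leave the
state at TOY_SYN_RECV.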