On 5/2/2016 9:37 AM, Sowmini Varadhan wrote:
On (05/02/16 09:20), Santosh Shilimkar wrote:
        rds_conn_transition(conn, RDS_CONN_DOWN, RDS_CONN_CONNECTING);
+       if (rs_tcp->t_sock) {
+               /* Need to resolve a duelling SYN between peers.
+                * We have an outstanding SYN to this peer, which may
+                * potentially have transitioned to the RDS_CONN_UP state,
+                * so we must quiesce any send threads before resetting
+                * c_transport_data.
+                */
+               wait_event(conn->c_waitq,
+                          !test_bit(RDS_IN_XMIT, &conn->c_flags));
Would it be good to check the return value of rds_conn_transition()
since if CONN is already UP above will fail and then send message
might again race and we will let message through even though passive
hasn't finished its connection.

no, that was the original issue that I was running into, which needed
commit 241b2719 - prior to that commit, if the conn was already UP,
we'd end up doing a rds_conn_drop on a good connection, and both sides
would end up in a pair of infinite 3WH loops. Even if we dont do
a rds_conn_drop on the UP connection, we've just (before
rds_tcp_accept_one) sent out a syn-ack on the incoming syn, and now
need to RST that syn-ac.  The other side is going to receive the rst,
and get confused about what to clean up (since there's already an UP
connection going on).

In short, when there is a duel, it's cleanest to have a deterministic
arbitration- both sides use the numeric value of saddr and faddr to
figure out which side is active, which side is passive. (Thus the
basis on the BGP router-id based model for 241b2719)

FWIW, much of this is actually a corner case-  in practice, its not
frequent to have syns crossing each other at "almost the same time".

Sounds good. Thanks for expanding it. Patch looks good to me.

Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>

Reply via email to