On our production systems we did hit the "relay_connect: no connection in
flight" on a so regular bases that I had to make it non-fatal with the
result of leaking sockets.

After more investigation I found the problem to be a race against
connecting to the backend servers. In short:
- relay_read_http() will open a connection if following conditions are met
  cre->dir == RELAY_DIR_REQUEST && cre->toread <= 0 && cre->dst->bev == NULL

- relay_connect() does not initialize con->se_out.bev (which is also
  cre->dst->bev). Instead this is deferred to relay_connected()

- if a event happens that calls relay_read_http() while connecting to the
  backend then relay_connect() will be called again. Result is the panic
  since the count gets out of sync.

The following diff solves this issue by adding an extra flag to
ctl_relay_event to know if a relay is already connected (or the connect is
pending). relay_close() will then clean the flag when closing the session.
I decided to use a flag since the EMFILE || ENFILE case is hard to detect
otherwise.

Running with this on production with no visible issues at the moment.
I think it would make sense to restructure the http proxy code more and
introduce a proper state machine but that is a much bigger and complex
issue, so lets fix the bug first.

OK?
-- 
:wq Claudio

Index: relay.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/relay.c,v
retrieving revision 1.194
diff -u -p -r1.194 relay.c
--- relay.c     18 May 2015 16:57:20 -0000      1.194
+++ relay.c     21 May 2015 08:15:05 -0000
@@ -1496,6 +1498,12 @@ relay_connect(struct rsession *con)
        struct timeval   evtpause = { 1, 0 };
        int              bnds = -1, ret;
 
+       /* relay_connect should only be called once per relay */
+       if (con->se_out.connected) {
+               log_debug("%s: connect already called once", __func__);
+               return (0);
+       }
+
        /* Connection is already established but session not active */
        if ((rlay->rl_conf.flags & F_TLSINSPECT) && con->se_out.s != -1) {
                if (con->se_out.ssl == NULL) {
@@ -1555,6 +1565,8 @@ relay_connect(struct rsession *con)
                        event_del(&rlay->rl_ev);
                        evtimer_add(&con->se_inflightevt, &evtpause);
                        evtimer_add(&rlay->rl_evt, &evtpause);
+                       /* this connect is pending */
+                       con->se_out.connected = 1;
                        return (0);
                } else {
                        if (con->se_retry) {
@@ -1572,6 +1584,7 @@ relay_connect(struct rsession *con)
                }
        }
 
+       con->se_out.connected = 1;
        relay_inflight--;
        DPRINTF("%s: inflight decremented, now %d",__func__,
            relay_inflight);
@@ -1673,6 +1686,7 @@ relay_close(struct rsession *con, const 
                        event_add(&rlay->rl_ev, NULL);
                }
        }
+       con->se_out.connected = 0;
 
        if (con->se_out.buf != NULL)
                free(con->se_out.buf);
Index: relay_http.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/relay_http.c,v
retrieving revision 1.47
diff -u -p -r1.47 relay_http.c
--- relay_http.c        22 May 2015 01:34:13 -0000      1.47
+++ relay_http.c        22 May 2015 13:40:38 -0000
@@ -419,7 +419,7 @@ relay_read_http(struct bufferevent *bev,
                relay_reset_http(cre);
  done:
                if (cre->dir == RELAY_DIR_REQUEST && cre->toread <= 0 &&
-                   cre->dst->bev == NULL) {
+                   !cre->dst->connected) {
                        if (rlay->rl_conf.fwdmode == FWD_TRANS) {
                                relay_bindanyreq(con, 0, IPPROTO_TCP);
                                return;
Index: relayd.h
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/relayd.h,v
retrieving revision 1.209
diff -u -p -r1.209 relayd.h
--- relayd.h    2 May 2015 13:15:24 -0000       1.209
+++ relayd.h    21 May 2015 06:55:44 -0000
@@ -200,6 +200,7 @@ struct ctl_relay_event {
        int                      line;
        int                      done;
        int                      timedout;
+       int                      connected;
        enum direction           dir;
 
        u_int8_t                *buf;

Reply via email to