On Thu, 24 May 2012, Vijay Singh wrote:
Folks, I am trying to understand the need to hold the V_tcbinfo lock in part
of this function [code included below for reference].
At present this causes the socket upcall to be called with the V_tcbinfo
lock held, which I'd like to avoid. We release this lock right after this
section.
Looking at the code, it seems the lock is needed if we were in FIN_WAIT_2
and tcp_close() the connection. The lock also seems to be protecting
V_twq_2msl.
Would it be an acceptable solution if we deferred calling socantrecvmore()
till after the lock can be dropped (after the swtich statement).
tcp_twstart() can be changed to return tp if the connections survives, or
NULL if it doesn't, much like what tcp_close() does. Also a new lock could
be added to protect the V_rwq_2msl queue.
If this sounds acceptable, I can generate a patch against -CURRENT. I would
appreciate feedback.
Hi Vijay:
Sorry for responding to your query with such a delay.
A quick glance at tcp_twstart() appears to confirm your description -- I read
it as requiring the incpbinfo lock for several reasons:
1) Potential call to tcp_close().
2) Potential call to tcp_tw_2msl_scan() which may in turn call tcp_twclose(),
which requires the pcbinfo lock (see 5 below).
3) Direct use of V_twq_2msl.
4) Potential call to tcp_tw_2msl_scan(), which uses V_twq_2msl, and may also
call tcp_tw_close (see 5).
5) In the tcp_twclose() callgraph subtree, the pcbinfo lock is used for
several things, including tcp_tw_2msl_stop() and in_pcbfree(). The call to
sofree() may recurse, under some circumstances, back into pru_detach(), and
then into tcp_detach() which requires the lock.
Especially with the pcbgroup reworking so that the pcbhash rather than pcbinfo
lock is used in a number of places, there are increasing numbers of places
where this sort of optimisation could be made.
One caution to ponder: socantrecvmore() can trigger an upcall (per your
comment) in the receive path. It used to be that upcalls did worrying things,
such as call back down the stack -- I think we've largely eliminated those
(NFS UDP was one such example, now fixed?) and therefore they are probably OK,
but worth checking the so_upcall consumers to see what the implications might
be, if any.
Robert
-vijay
relevant code from tcp_do_segment():
if (thflags & TH_FIN) {
if (TCPS_HAVERCVDFIN(tp->t_state) == 0) {
socantrcvmore(so);
/*
* If connection is half-synchronized
* (ie NEEDSYN flag on) then delay ACK,
* so it may be piggybacked when SYN is sent.
* Otherwise, since we received a FIN then no
* more input can be expected, send ACK now.
*/
if (tp->t_flags & TF_NEEDSYN)
tp->t_flags |= TF_DELACK;
else
tp->t_flags |= TF_ACKNOW;
tp->rcv_nxt++;
}
switch (tp->t_state) {
/*
* In SYN_RECEIVED and ESTABLISHED STATES
* enter the CLOSE_WAIT state.
*/
case TCPS_SYN_RECEIVED:
tp->t_starttime = ticks;
/* FALLTHROUGH */
case TCPS_ESTABLISHED:
tp->t_state = TCPS_CLOSE_WAIT;
break;
/*
* If still in FIN_WAIT_1 STATE FIN has not been acked so
* enter the CLOSING state.
*/
case TCPS_FIN_WAIT_1:
tp->t_state = TCPS_CLOSING;
break;
/*
* In FIN_WAIT_2 state enter the TIME_WAIT state,
* starting the time-wait timer, turning off the other
* standard timers.
*/
case TCPS_FIN_WAIT_2:
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
KASSERT(ti_locked == TI_WLOCKED, ("%s: dodata "
"TCP_FIN_WAIT_2 ti_locked: %d", __func__,
ti_locked));
tcp_twstart(tp);
INP_INFO_WUNLOCK(&V_tcbinfo);
return;
}
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"