On 29 May 2012, at 21:09, vasanth rao naik sabavat wrote:

> I am trying to understand the socket <--> protocol layer as part of our
> project. I was trying to understand why sotoinpcb() is called before
> taking any locks. Also, I am trying to understand the scenario of a
> multi-threaded process performing socket operations simultaneously on a
> multicore CPU.
>
> I have gone through the socket life cycle comments in the code and have a
> good understanding of the socket life cycle. Thank you for the reference.

Hi Vasanth:

Historically, the so->so_pcb pointer in BSD was protected by spl's, and could 
only be followed safely while at an elevated spl (probably splnet -- details 
forgotten at this point!).

In FreeBSD 6.x, I made substantial revisions to the semantics of the
socket<->pcb relationship in order to reduce the amount of synchronisation 
required. Among other things, I made the validity of the so->so_pcb pointer
entirely defined by the protocol, and made it possible for all protocols to
safely follow so->so_pcb without holding locks, by virtue of the reference
model. This trades slightly greater memory use (inpcbs are always allocated
for sockets, even after they have closed) for reduced synchronisation overhead
and improved stability (due to reduced complexity). The socket life cycle
ensures that no access to so->so_pcb occurs before pru_attach() has returned,
and that no socket access occurs from the moment pru_detach() is called. As
pru_attach() and pru_detach() are responsible for allocating and freeing pcb
state, all other pru_method() calls in all protocols can safely dereference
so_pcb.
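To make the life-cycle guarantee concrete, here is a minimal user-space sketch. Everything in it is hypothetical: struct sock, struct pcb and the model_pru_*() functions stand in for struct socket, struct inpcb and the real pr_usrreqs methods, whose signatures differ. It only shows where so_pcb is set, where it is followed without a lock (as sotoinpcb() does), and where it is cleared; the pcb stays allocated until detach, even if the socket has been closed in the meantime.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified user-space model: these are NOT FreeBSD's
 * struct socket / struct inpcb or its pr_usrreqs methods. */
struct pcb {
    int local_port;        /* stand-in for per-protocol state */
};

struct sock {
    struct pcb *so_pcb;    /* valid from attach until detach, per the life cycle */
};

/* pru_attach()-like: the only place that sets so_pcb. */
int model_pru_attach(struct sock *so)
{
    assert(so->so_pcb == NULL);
    so->so_pcb = calloc(1, sizeof(*so->so_pcb));
    return so->so_pcb != NULL ? 0 : -1;
}

/* Any other pru_*()-like method: the life cycle guarantees that attach has
 * returned and detach has not yet been called, so so_pcb is followed
 * without a lock. Locking the pcb's contents is still up to the protocol. */
int model_pru_bind(struct sock *so, int port)
{
    struct pcb *pcb = so->so_pcb;  /* cf. sotoinpcb(so) */
    assert(pcb != NULL);           /* in the spirit of a KASSERT() */
    pcb->local_port = port;        /* a real protocol would lock the pcb here */
    return 0;
}

/* pru_detach()-like: the last protocol call for this socket. */
void model_pru_detach(struct sock *so)
{
    assert(so->so_pcb != NULL);
    free(so->so_pcb);
    so->so_pcb = NULL;
}
```

The design point is that no lock is needed to stabilise the pointer itself: its validity is a property of where in the life cycle the call happens, not of any mutex.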

Synchronisation is required to use the socket, but the nature of the 
synchronisation depends on the protocol, and different protocols use quite 
different locking strategies (e.g., netnatm vs unix domain sockets vs 
IPv4/IPv6). There are similar reference concerns in the other direction, which 
among other things allow TCP to hold a reference on the socket it represents 
until it's done with it, regardless of API-layer close operations. We
universally place protocol locks before socket-layer locks in the lock order so
that the protocol can safely call into the socket layer while holding the locks
required to stabilise pcbs. The consequence is that socket locks can't be held
over calls down the stack, which mandates a stronger reference model.
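A minimal user-space sketch of this lock order and of the reverse reference, assuming pthread mutexes in place of kernel mutexes. All names (rlock(), sock_create(), proto_input(), proto_done() and so on) are hypothetical; the rank check in rlock() is a toy stand-in for WITNESS, not its implementation, and the reference counting is only loosely in the spirit of soref()/sorele().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space model, not FreeBSD KPIs. Lock order: a protocol
 * (pcb) lock may be held while taking a socket lock, never the reverse. */
enum { RANK_PCB = 1, RANK_SOCK = 2 };
static _Thread_local unsigned held;  /* ranks held by this thread (bitmask) */
struct rmtx { pthread_mutex_t m; int rank; };

void rlock(struct rmtx *l)
{
    assert((held >> l->rank) == 0);  /* toy WITNESS: no lock of >= rank held */
    pthread_mutex_lock(&l->m);
    held |= 1u << l->rank;
}

void runlock(struct rmtx *l)
{
    held &= ~(1u << l->rank);
    pthread_mutex_unlock(&l->m);
}

struct sock {
    struct rmtx so_mtx;  /* socket-layer lock */
    int so_count;        /* references, protected by so_mtx */
    int so_rcv_cc;       /* bytes queued for the application */
};

struct pcb {
    struct rmtx pcb_mtx; /* protocol lock */
    struct sock *socket; /* counted reference held by the protocol */
};

struct sock *sock_create(struct pcb *pcb)  /* socreate() + attach, roughly */
{
    struct sock *so = calloc(1, sizeof(*so));
    if (so == NULL)
        return NULL;
    pthread_mutex_init(&so->so_mtx.m, NULL);
    so->so_mtx.rank = RANK_SOCK;
    so->so_count = 2;  /* one for the file descriptor, one for the protocol */
    pthread_mutex_init(&pcb->pcb_mtx.m, NULL);
    pcb->pcb_mtx.rank = RANK_PCB;
    pcb->socket = so;
    return so;
}

/* Called with so_mtx held; always releases it. Returns 1 if so was freed. */
int sock_rele(struct sock *so)
{
    int last = (--so->so_count == 0);
    runlock(&so->so_mtx);
    if (last) {
        pthread_mutex_destroy(&so->so_mtx.m);
        free(so);
    }
    return last;
}

int sock_close(struct sock *so)  /* API-layer close drops the fd's reference */
{
    rlock(&so->so_mtx);
    return sock_rele(so);
}

/* Downcall: so_mtx is dropped before the protocol lock is taken. */
int sock_rcvd(struct sock *so, struct pcb *pcb)
{
    rlock(&so->so_mtx);
    int cc = so->so_rcv_cc;
    so->so_rcv_cc = 0;
    runlock(&so->so_mtx);
    rlock(&pcb->pcb_mtx);  /* legal: no socket lock is held any more */
    /* ... a protocol would reopen its receive window by cc here ... */
    runlock(&pcb->pcb_mtx);
    return cc;
}

/* Upcall: pcb lock first, then the socket lock (the legal order). */
void proto_input(struct pcb *pcb, int len)
{
    rlock(&pcb->pcb_mtx);
    rlock(&pcb->socket->so_mtx);
    pcb->socket->so_rcv_cc += len;
    runlock(&pcb->socket->so_mtx);
    runlock(&pcb->pcb_mtx);
}

int proto_done(struct pcb *pcb)  /* e.g. TCP is finally done with the socket */
{
    rlock(&pcb->pcb_mtx);
    struct sock *so = pcb->socket;
    pcb->socket = NULL;
    rlock(&so->so_mtx);            /* upcall order again: pcb, then socket */
    int freed = sock_rele(so);     /* drops so_mtx; may free so */
    runlock(&pcb->pcb_mtx);
    return freed;
}
```

In this sketch, sock_close() drops the descriptor's reference while the protocol still holds its own, so the socket survives until proto_done() releases the last one. Calling rlock(&pcb->pcb_mtx) while holding so_mtx would trip the assertion, which is why sock_rcvd() has to release the socket lock before calling down.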

None of this precludes bugs, of course, but the design is fairly coherent. The 
area of greatest weakness in synchronisation in the network stack is actually 
in the socket state machine (so_state and friends), where it is unclear
whether the protocol or the socket layer drives the state machine. I've been
gradually pushing in the direction of the protocol driving state transitions,
since that provides atomicity across the layers: when the protocol calls into
the socket layer, it continues to hold its own locks while acquiring the
socket locks.
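A minimal sketch of the protocol driving a state transition, assuming pthread mutexes and hypothetical names (S_CONNECTED, sock_isconnected(), proto_handshake_done(), states_agree()). sock_isconnected() is loosely modelled on the role soisconnected() plays, not on its implementation, and the flag values are invented for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: the flag names and values below are illustrative and
 * are NOT FreeBSD's SS_* constants. */
#define S_CONNECTING 0x1
#define S_CONNECTED  0x2

enum pcb_state { PCB_SYN_SENT, PCB_ESTABLISHED };

struct sock {
    pthread_mutex_t so_mtx;
    int so_state;              /* protected by so_mtx */
};

struct pcb {
    pthread_mutex_t pcb_mtx;
    enum pcb_state state;      /* protected by pcb_mtx */
    struct sock *so;
};

/* Socket-layer helper, called by the protocol with pcb_mtx already held;
 * it takes the socket lock itself (pcb lock, then socket lock). */
static void sock_isconnected(struct sock *so)
{
    pthread_mutex_lock(&so->so_mtx);
    so->so_state &= ~S_CONNECTING;
    so->so_state |= S_CONNECTED;
    pthread_mutex_unlock(&so->so_mtx);
}

/* Protocol event, e.g. a handshake completing: the protocol and socket
 * states change inside one pcb-lock critical section, so a thread holding
 * pcb_mtx never sees one updated without the other. */
void proto_handshake_done(struct pcb *pcb)
{
    pthread_mutex_lock(&pcb->pcb_mtx);
    pcb->state = PCB_ESTABLISHED;
    sock_isconnected(pcb->so);
    pthread_mutex_unlock(&pcb->pcb_mtx);
}

/* Checks the cross-layer invariant, taking both locks in the legal order. */
int states_agree(struct pcb *pcb)
{
    pthread_mutex_lock(&pcb->pcb_mtx);
    pthread_mutex_lock(&pcb->so->so_mtx);
    int connected = (pcb->so->so_state & S_CONNECTED) != 0;
    int agree = connected == (pcb->state == PCB_ESTABLISHED);
    pthread_mutex_unlock(&pcb->so->so_mtx);
    pthread_mutex_unlock(&pcb->pcb_mtx);
    return agree;
}
```

If the socket layer changed so_state on its own, it could not also update pcb->state atomically, because it would have to drop its lock before taking pcb_mtx; in this sketch, that gap is what the protocol-driven approach closes.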

Robert
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers