Hi Navdeep
On 23/05/14 23:37, Navdeep Parhar wrote:
On 05/23/14 13:52, Julien Charbon wrote:
On 23/05/14 14:06, Julien Charbon wrote:
On 27/02/14 11:32, Julien Charbon wrote:
On 07/11/13 14:55, Julien Charbon wrote:
On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon
<jchar...@verisign.com> wrote:
I have put technical and how-to-repeat details in below
PR:
kern/183659: TCP stack lock contention with short-lived
connections
http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
We are currently working on this performance improvement
effort; it will affect only the TCP locking strategy, not
the TCP stack logic itself. We will share the patches we
made on freebsd-net for review and improvement
suggestions; in any case, this change will need plenty of
eyeballs to avoid introducing tricky race conditions into
the TCP stack.
Attached are the two cumulative patches (tcp-scale-inp-list-v1.patch
and tcp-scale-pcbinfo-rlock-v1.patch) that we discussed the most at
BSDCan 2014.
[...]
_However_ it would be a miracle if this change did not introduce
new race condition(s) (hence the 'alpha' tag in the commit message),
as most of the TCP stack locking strategy now rests on the inpcb
lock's shoulders. That said, from our tests' point of view, this
change is completely stable: no kernel/lock assertions, no
unexpected TCP behavior, stable performance results. Moreover,
before tagging this change as 'beta' we need to test these
features more thoroughly:
- VNET
- PCBGROUP/RSS/TCP timer per CPU
- TCP offloading (we need a NIC with good TCP offloading support)
I can assess the impact (and fix any fallout) on the parts of the
kernel that deal with TCP_OFFLOAD, and the TOE driver in
dev/cxgbe/tom. But I was hoping to do that only after there was
general agreement on net@ that these locking changes are sound and
should be taken into HEAD. Lack of reviews seems to be holding this
back, correct?
Correct, so far these changes have been reviewed and tested only
internally. Since we are no longer finding any issues, we are sharing
them for wider testing, comments, and reviews.
One piece of advice for reviewers: when reading the code, don't be
fooled by the remaining ipi_lock read lock (INP_INFO_RLOCK); you
should consider ipi_lock completely removed from the TCP stack. All
TCP code that was under the ipi_lock umbrella is now protected only
by the inpcb lock. Just keep that in mind.
Below, just for your information, is more detail on the context of
these changes:
o The rough consensus at BSDCan was that there is shared interest
in improving the scalability of TCP workloads with potentially high
rates of connection establishment and tear-down.
o Our requirements for this task were:
- Achieve more than 100k TCP connections per second without dropping
a single packet on the receive side
- Use a strategy that does not require changing all the network
stacks at once (TCP, UDP, RAW, etc.)
- Be able to introduce better performance progressively, leveraging
the mutex strategy already in place
- Keep the TCP stack stable (obviously)
o Solutions we tried to implement that failed:
#1 Decompose ipi_lock into finer-grained locks on ipi_hashbase's
buckets (i.e. add a mutex to struct inpcbhead). This did not work:
the induced change was quite big, and keeping the network stacks'
shared code (in in_pcb.{h, c}) clean was much more difficult than
expected.
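For readers unfamiliar with the per-bucket idea, here is a hypothetical sketch of what approach #1 aimed at. The names (INP_HASHSIZE, inpb_lock, in_pcbinshash_bucket, etc.) are illustrative, not the real in_pcb.h definitions: the point is that a mutex embedded in each hash bucket lets operations on different buckets proceed without contending on the single ipi_lock.

```c
/*
 * Hypothetical sketch of failed approach #1: per-bucket locks.  All names
 * here are illustrative, not the real in_pcb.h definitions.
 */
#include <assert.h>
#include <pthread.h>

#define INP_HASHSIZE 8	/* toy bucket count; the real size is tunable */

struct inpcbhead {			/* toy bucket: mutex plus a counter */
	pthread_mutex_t inpb_lock;	/* hypothetical per-bucket mutex */
	int inpb_count;			/* stands in for the bucket's inpcb list */
};

static struct inpcbhead ipi_hashbase[INP_HASHSIZE];

static void
ipi_hash_init(void)
{
	for (int i = 0; i < INP_HASHSIZE; i++) {
		pthread_mutex_init(&ipi_hashbase[i].inpb_lock, NULL);
		ipi_hashbase[i].inpb_count = 0;
	}
}

static unsigned int
inp_hash(unsigned int lport)
{
	return (lport % INP_HASHSIZE);
}

/* Insert into the bucket for lport while holding only that bucket's lock. */
static void
in_pcbinshash_bucket(unsigned int lport)
{
	struct inpcbhead *head = &ipi_hashbase[inp_hash(lport)];

	pthread_mutex_lock(&head->inpb_lock);
	head->inpb_count++;
	pthread_mutex_unlock(&head->inpb_lock);
}
```

The catch, as noted above, is that this bucket structure lives in code shared by every protocol, so the change leaked into all the network stacks at once.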
#2 Revert the lock ordering from:
ipi_lock > inpcb lock > ipi_hash_lock, pcbgroup locks
to:
inpcb lock > ipi_lock, ipi_hash_lock, pcbgroup locks
The change was an all-or-nothing giant patch; it did not handle the
full inp list traversal functions, and having a different lock
ordering between the TCP stack and the other network stacks was
quite a challenge to maintain.
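Why a per-stack lock ordering is so hard to maintain can be sketched with two plain pthread mutexes (illustrative code, not kernel code): once one code path takes the locks in the reversed order while another keeps the old order, the classic ABBA deadlock becomes possible.

```c
/*
 * Illustrative pthreads sketch of the hazard in approach #2: with two
 * locks, every thread must acquire them in the same global order.  If TCP
 * alone reversed the order while UDP/RAW kept "ipi_lock before inpcb
 * lock", two concurrent threads could each hold one lock and wait forever
 * for the other (ABBA deadlock).
 */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t ipi_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inp_lock = PTHREAD_MUTEX_INITIALIZER;
static int pcb_ops;

/* Old order, shared by all stacks: ipi_lock first, then the inpcb lock. */
static void
stack_op_old_order(void)
{
	pthread_mutex_lock(&ipi_lock);
	pthread_mutex_lock(&inp_lock);
	pcb_ops++;
	pthread_mutex_unlock(&inp_lock);
	pthread_mutex_unlock(&ipi_lock);
}

/*
 * Proposed TCP-only order: inpcb lock first.  Harmless while it runs
 * alone, but running concurrently with stack_op_old_order() it forms the
 * ABBA pattern the kernel's witness(4) checker would flag.
 */
static void
tcp_op_new_order(void)
{
	pthread_mutex_lock(&inp_lock);
	pthread_mutex_lock(&ipi_lock);
	pcb_ops++;
	pthread_mutex_unlock(&ipi_lock);
	pthread_mutex_unlock(&inp_lock);
}
```

Keeping both orderings correct forever, across every shared code path, is the maintenance burden referred to above.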
My 2 cents.
--
Julien
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"