Hi Navdeep
On 23/05/14 23:37, Navdeep Parhar wrote:
On 05/23/14 13:52, Julien Charbon wrote:
On 23/05/14 14:06, Julien Charbon wrote:
On 27/02/14 11:32, Julien Charbon wrote:
On 07/11/13 14:55, Julien Charbon wrote:
On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon
<jchar...@verisign.com> wrote:
I have put technical and how-to-repeat details in below
PR:
kern/183659: TCP stack lock contention with short-lived
connections
http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
We are currently working on this performance improvement
effort; it will affect only the TCP locking strategy, not
the TCP stack logic itself. We will share the patches we
made on freebsd-net for review and improvement
suggestions; in any case, this change will need plenty of
eyeballs to avoid introducing tricky race conditions into
the TCP stack.
Attached are the two cumulative patches (tcp-scale-inp-list-v1.patch
and tcp-scale-pcbinfo-rlock-v1.patch) that we discussed the most at
BSDCan 2014.
[...]
_However_ it would be a miracle if this change did not introduce
new race condition(s) (hence the 'alpha' tag in the commit message),
as most of the TCP stack locking strategy now rests on the inpcb
lock's shoulders. That said, from our tests' point of view, this
change is completely stable: no kernel/lock assertions, no
unexpected TCP behavior, stable performance results. Moreover,
before tagging this change as 'beta' we need to test these
features more thoroughly:
- VNET
- PCBGROUP/RSS/TCP timer per CPU
- TCP offloading (we need a NIC with good TCP offloading support)
I can assess the impact (and fix any fallout) on the parts of the
kernel that deal with TCP_OFFLOAD, and the TOE driver in
dev/cxgbe/tom. But I was hoping to do that only after there was
general agreement on net@ that these locking changes are sound and
should be taken into HEAD. Lack of reviews seems to be holding this
back, correct?
Correct, so far these changes have been reviewed and tested only
internally. Since we are no longer finding any issues, we are sharing
them for wider testing, comments, and reviews.
One piece of advice for reviewers: when reading the code, don't be
fooled by the remaining ipi_lock read lock (INP_INFO_RLOCK); you
should consider ipi_lock completely removed from the TCP stack. All
TCP code that was under the ipi_lock umbrella is now protected only
by the inpcb lock. Just keep that in mind.
Below, just for your information, is more detail on the context of
these changes:
o The rough consensus at BSDCan was that there is shared interest
in improving the scalability of TCP workloads with potentially high
rates of connection establishment and tear-down.
o Our requirements for this task were:
- Achieve more than 100k TCP connections per second without dropping
a single packet on the receive side
- Use a strategy that does not require changing all the network
stacks at once (TCP, UDP, RAW, etc.)
- Be able to introduce better performance progressively, leveraging
the mutex strategy already in place
- Keep the TCP stack stable (obviously)
o Solutions we tried to implement that failed:
#1 Decompose ipi_lock into finer-grained locks on ipi_hashbase's
buckets (i.e. add a mutex to struct inpcbhead). This did not work:
the induced change was quite big, and keeping the network stacks'
shared code (in in_pcb.{h, c}) clean was much more difficult than
expected.
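For readers unfamiliar with the per-bucket idea, here is a hypothetical sketch of what approach #1 aimed at. The names (INP_HASHSIZE, inpb_lock, in_pcbinshash_bucket, etc.) are illustrative, not the real in_pcb.h definitions: the point is that a mutex embedded in each hash bucket lets operations on different buckets proceed without contending on the single ipi_lock.

```c
/*
 * Hypothetical sketch of failed approach #1: per-bucket locks.  All names
 * here are illustrative, not the real in_pcb.h definitions.
 */
#include <assert.h>
#include <pthread.h>

#define INP_HASHSIZE 8	/* toy bucket count; the real size is tunable */

struct inpcbhead {			/* toy bucket: mutex plus a counter */
	pthread_mutex_t inpb_lock;	/* hypothetical per-bucket mutex */
	int inpb_count;			/* stands in for the bucket's inpcb list */
};

static struct inpcbhead ipi_hashbase[INP_HASHSIZE];

static void
ipi_hash_init(void)
{
	for (int i = 0; i < INP_HASHSIZE; i++) {
		pthread_mutex_init(&ipi_hashbase[i].inpb_lock, NULL);
		ipi_hashbase[i].inpb_count = 0;
	}
}

static unsigned int
inp_hash(unsigned int lport)
{
	return (lport % INP_HASHSIZE);
}

/* Insert into the bucket for lport while holding only that bucket's lock. */
static void
in_pcbinshash_bucket(unsigned int lport)
{
	struct inpcbhead *head = &ipi_hashbase[inp_hash(lport)];

	pthread_mutex_lock(&head->inpb_lock);
	head->inpb_count++;
	pthread_mutex_unlock(&head->inpb_lock);
}
```

The catch, as noted above, is that this bucket structure lives in code shared by every protocol, so the change leaked into all the network stacks at once.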
#2 Revert the lock ordering from:
ipi_lock > inpcb lock > ipi_hash_lock, pcbgroup locks
to:
inpcb lock > ipi_lock, ipi_hash_lock, pcbgroup locks
The change was an all-or-nothing giant patch; it did not handle the
full inp list traversal functions, and having a different lock
ordering between the TCP stack and the other network stacks was
quite a challenge to maintain.
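Why a per-stack lock ordering is so hard to maintain can be sketched with two plain pthread mutexes (illustrative code, not kernel code): once one code path takes the locks in the reversed order while another keeps the old order, the classic ABBA deadlock becomes possible.

```c
/*
 * Illustrative pthreads sketch of the hazard in approach #2: with two
 * locks, every thread must acquire them in the same global order.  If TCP
 * alone reversed the order while UDP/RAW kept "ipi_lock before inpcb
 * lock", two concurrent threads could each hold one lock and wait forever
 * for the other (ABBA deadlock).
 */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t ipi_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inp_lock = PTHREAD_MUTEX_INITIALIZER;
static int pcb_ops;

/* Old order, shared by all stacks: ipi_lock first, then the inpcb lock. */
static void
stack_op_old_order(void)
{
	pthread_mutex_lock(&ipi_lock);
	pthread_mutex_lock(&inp_lock);
	pcb_ops++;
	pthread_mutex_unlock(&inp_lock);
	pthread_mutex_unlock(&ipi_lock);
}

/*
 * Proposed TCP-only order: inpcb lock first.  Harmless while it runs
 * alone, but running concurrently with stack_op_old_order() it forms the
 * ABBA pattern the kernel's witness(4) checker would flag.
 */
static void
tcp_op_new_order(void)
{
	pthread_mutex_lock(&inp_lock);
	pthread_mutex_lock(&ipi_lock);
	pcb_ops++;
	pthread_mutex_unlock(&ipi_lock);
	pthread_mutex_unlock(&inp_lock);
}
```

Keeping both orderings correct forever, across every shared code path, is the maintenance burden referred to above.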
My 2 cents.
--
Julien
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"