I hope he does not take offence at name shortening :)

I've slightly modified the UDP receive path and run several benchmarks
for the following cases (a rough sketch of each variant follows the list):
1. pure recvfrom() using copy_to_user(), with 4k and 40k buffers.
2. recvfrom() remains the same, but instead of copy_to_user(), skb->data
   is copied into a kernel buffer which can be mapped into userspace,
   with 4k and 40k buffers.
3. recvfrom() remains the same, but no data is copied at all; only the
   iovec pointer is advanced and its size decreased.
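
To make the three cases concrete, here is a rough, hypothetical sketch of how
the copy step in a udp_recvmsg()-like path could differ between them. This is
not the actual patch; chan_buf and chan_off are made-up names for the kernel
buffer that userspace would later mmap().

/* Hypothetical illustration of the three copy variants, not the real patch. */
#include <linux/skbuff.h>
#include <linux/socket.h>
#include <linux/udp.h>
#include <linux/string.h>

/* Case 1: usual path - copy the skb payload into the user iovec
 * (copy_to_user() happens inside skb_copy_datagram_iovec()). */
static int copy_case1(struct sk_buff *skb, struct msghdr *msg, int len)
{
	return skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
				       msg->msg_iov, len);
}

/* Case 2: memcpy() into a kernel buffer that userspace maps,
 * instead of copy_to_user() into the user iovec. */
static int copy_case2(struct sk_buff *skb, void *chan_buf,
		      unsigned int *chan_off, int len)
{
	memcpy(chan_buf + *chan_off, skb->data + sizeof(struct udphdr), len);
	*chan_off += len;
	return 0;
}

/* Case 3: no data copy at all - only account for the bytes by advancing
 * the iovec pointer and shrinking its length. */
static int copy_case3(struct msghdr *msg, int len)
{
	msg->msg_iov->iov_base += len;
	msg->msg_iov->iov_len  -= len;
	return 0;
}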

The receiver is a simple userspace application with one thread
that does a blocking read from a UDP socket with default socket/stack parameters.
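
For reference, a minimal sketch of such a receiver; the port number (5000)
and the exact buffer size are assumptions, not the values from the test:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	struct sockaddr_in addr;
	static char buf[40 * 1024];	/* 40k run; use 4k for the other case */
	unsigned long long total = 0;
	int s;

	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(5000);	/* assumed port */

	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("bind");
		return 1;
	}

	for (;;) {
		/* Blocking read with default socket/stack parameters. */
		ssize_t n = recvfrom(s, buf, sizeof(buf), 0, NULL, NULL);
		if (n < 0) {
			perror("recvfrom");
			break;
		}
		total += (unsigned long long)n;
	}

	close(s);
	return 0;
}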

The receiver runs on a 2.4 GHz Xeon (HT enabled) with 1 GB of RAM and an
e1000 gigabit NIC.
The sender runs on an amd64 nvidia nforce4 box with 1 GB of RAM and an r8169 NIC.
The machines are connected through a D-Link DGS-1216T gigabit switch.

Performance graph attached.

Conclusions:
at least in the UDP case with a 1 Gbit NIC, throughput did not increase,
but that may be due either to the NICs (I do not trust the nvidia and/or
realtek hardware) or to a broken sender application.
So the only observable result here is the change in CPU usage:
it dropped by 30% when copy_to_user() was replaced with memcpy(),
with 40k buffers. 4k buffers are too small to show any difference,
since syscall overhead dominates.

Even if we translate that CPU saving directly into network speed, we still
cannot get a 6x (or even 2x) performance gain: assuming receive is CPU-bound,
removing a 30% CPU cost bounds the speedup at roughly 1/(1 - 0.3) ~= 1.4x.

Granted, TCP processing is much more costly, the e1000 interrupt handler
is too big, and there are a lot of context switches and other
cache-unfriendly and locking overheads, but I still
wonder where the claimed 6x (!) performance gain lives.

-- 
        Evgeniy Polyakov

Attachment: netchannel_speed.png