I hope he does not take offence at the name shortening :)

I've slightly modified the UDP receiving path and run several benchmarks for the following cases:

1. Pure recvfrom() using copy_to_user(), with 4k and 40k buffers.
2. recvfrom() remains the same, but instead of copy_to_user() the skb->data is copied into a kernel buffer which can be mapped into userspace, again with 4k and 40k buffers.
3. recvfrom() remains the same, but no data is copied at all; only the iovec pointer is advanced and its size decreased (see the sketch below).
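To make case 3 concrete: "no copy" means the receive path only does the iovec bookkeeping as if the data had been delivered, so the copy cost disappears from the measurement entirely. A minimal sketch of that step, assuming a helper along these lines (the name and placement are illustrative, not the actual patch):

```c
#include <linux/uio.h>

/*
 * Illustrative sketch of the case-3 "no copy" variant: instead of
 * copy_to_user()/memcpy(), the receive path only accounts for the
 * data by advancing the iovec pointer and shrinking its length.
 * Function name and context are assumptions, not the actual patch.
 */
static void skip_copy_advance_iovec(struct iovec *iov, size_t len)
{
	iov->iov_base = (char __user *)iov->iov_base + len;
	iov->iov_len -= len;
}
```

For reference, the userspace side is the same in all three cases: a single thread doing blocking recvfrom() calls. A minimal sketch, where the port number, buffer size and byte accounting are assumptions rather than the exact test program:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define RECV_PORT 5000           /* assumed port */
#define BUF_SIZE  (40 * 1024)    /* 40k case; use 4096 for the 4k case */

int main(void)
{
	char buf[BUF_SIZE];
	struct sockaddr_in addr;
	unsigned long long total = 0;
	int s;

	/* UDP socket with default socket/stack parameters. */
	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(RECV_PORT);

	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("bind");
		return 1;
	}

	for (;;) {
		/* Blocking read; throughput accounting is done externally. */
		ssize_t n = recvfrom(s, buf, sizeof(buf), 0, NULL, NULL);
		if (n < 0) {
			perror("recvfrom");
			break;
		}
		total += n;
	}

	close(s);
	return 0;
}
```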
The receiver is a simple userspace application with one thread, which does blocking reads from a UDP socket with default socket/stack parameters. The receiver runs on a 2.4 GHz Xeon (HT enabled) with 1 GB of RAM and an e1000 gigabit NIC. The sender runs on an amd64 nvidia nforce4 box with 1 GB of RAM and an r8169 NIC. The machines are connected through a D-Link DGS-1216T gigabit switch. The performance graph is attached.

Conclusions: at least in the UDP case with a 1 Gbit NIC, throughput did not increase, but that may be the result of either NIC speed (I do not trust nvidia and/or realtek) or a broken sender application. So the only observable result here is the change in CPU usage: it decreased by 30% for the copy_to_user() -> memcpy() change with 40k buffers. 4k buffers are too small to show any performance change because of syscall overhead. Even if we translate the CPU savings into network speed, we still cannot get a 6-times (or even 2-times) performance gain. Luckily, TCP processing is much more costly, the e1000 interrupt handler is too big, and there are a lot of context switches and other cache-unfriendly and locking overhead, so there is more room for improvement there, but I still wonder where the 6 (!) times performance gain lives.

--
Evgeniy Polyakov
[Attachment: netchannel_speed.png (PNG image) — the performance graph]