Hi

Karl O. Pinc wrote:
> So, I believe it's easy and cheap to add hardware
> to a OpenVPN box and create a situation where
> the kernel/userspace transition cost does matter.
It's easy and cheap if you add a second or third box. But if you are approaching tens of openvpn boxes (at various locations) and you want to maintain redundancy, it is not especially cheap, nor is it especially easy. Maintaining the system gets a bit trickier, and simply relying on ssh and grep doesn't cut it any more.

Regarding the idea of moving openvpn into the kernel, I think there are MUCH easier ways to improve the situation. As a large part of the time is spent moving data between kernel and userspace, that interface could be improved. There are two interfaces involved: udp sockets and tun devices. For both of them, experimental rx/tx ring implementations have been proposed:

http://lwn.net/Articles/276856/
vringfd() is a generic buffer ring interface, developed for lguest, and it seems to be implemented for the tun device. Not sure how much it would improve the situation, but I'm guessing quite a bit.

http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap
A packet ring tx (and rx, IIRC) interface written by Johann Baudy. Quoting that wiki: "I've reached 107Mo/S using a small embedded target (PPC405-300Mhz) where I've never exceeded 20Mo/s with normal raw socket."

Also, our own tests show that using standard (ie, not necessarily optimized for routing) Opteron servers and intel e1000 NICs, we can saturate 1G easily (with up to 300Kpps in one direction), with about 10% of a single core working in softirq (50% if conntrack is in the picture). However, with openvpn we can only go up to ~80Kpps without encryption, and in that case roughly half of the load goes into softirq and the other half into userland. Most of the drop from 300 to 80 should be wasted work. With encryption, something along the lines of 50Kpps could be maintained (this is an estimate from an earlier test). Again, roughly half of the CPU time goes into userland and the other half into kernelland.
Thinking out loud - if the machine can route much higher packet rates without breaking a sweat, and encryption takes about half of the load, then optimizing the user/kernel interaction could improve performance by up to 50%. Even if the ring buffers turn out to be less effective than I optimistically presume, a 10% performance increase wouldn't be too shabby either. It doesn't seem too hard to implement at first glance, and if someone wants to try it, compensation could be arranged for that effort.

Siim