James Yonan wrote:
David Sommerseth wrote:
M. G. wrote:
Hello,
I recently changed my VPN-tunnel from TCP to UDP for the sake of
better performance. It generally works very well but I noticed that I
can't connect to my server from some networks when using UDP, e.g. at
work. This may be an issue with the NAT/firewall configuration which
I have no influence on.
Since I am connecting from random locations where I don't know
beforehand whether a UDP connection will succeed, I'm now somewhat
worried that it will be a game of luck whether a connection succeeds
from a specific location.
So far my story, now to the question:
What is the reason that OpenVPN doesn't have an option to listen for
TCP AND UDP connections simultaneously? Is it a technical problem
that I cannot see, or am I simply the only one who thinks this would
be a nice feature?
In theory, you are quite right, this should not be a big issue. But I
believe this is related to the fact that OpenVPN is neither forked nor
threaded when handling connections. Everything runs in the same process
scope, which does its own scheduling of the connections, if I've
understood the code correctly. Making the current implementation work
concurrently with both TCP and UDP would therefore be somewhat more
difficult. For the same reason, OpenVPN does not really use the power
of multi-core hosts, as a single process can only run on one core at a
time.
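To illustrate why this is not a big issue in theory: a single-process
event loop can already multiplex a TCP listener and a UDP socket over
one descriptor set. A minimal Linux epoll sketch (illustrative only,
the helper and port numbers are made up, error handling omitted --
this is not how OpenVPN's event loop is actually written):

    /* One epoll loop watching both a TCP listener and a UDP socket in a
     * single process.  The hard part in OpenVPN is not the multiplexing
     * itself, but retrofitting it into the existing internal scheduler. */
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <unistd.h>

    static int bind_sock(int type, int port)
    {
        int fd = socket(AF_INET, type, 0);
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(port);
        bind(fd, (struct sockaddr *)&sa, sizeof(sa));
        if (type == SOCK_STREAM)
            listen(fd, 16);
        return fd;
    }

    int main(void)
    {
        int tcp = bind_sock(SOCK_STREAM, 1194);
        int udp = bind_sock(SOCK_DGRAM, 1194);
        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN };

        ev.data.fd = tcp; epoll_ctl(ep, EPOLL_CTL_ADD, tcp, &ev);
        ev.data.fd = udp; epoll_ctl(ep, EPOLL_CTL_ADD, udp, &ev);

        for (;;) {
            struct epoll_event events[8];
            int n = epoll_wait(ep, events, 8, -1);
            for (int i = 0; i < n; i++) {
                if (events[i].data.fd == tcp) {
                    int c = accept(tcp, NULL, NULL);  /* new TCP client */
                    /* hand 'c' over to the session layer ... */
                    close(c);
                } else {
                    char buf[2048];
                    recvfrom(udp, buf, sizeof(buf), 0, NULL, NULL);
                    /* hand the datagram over to the session layer ... */
                }
            }
        }
    }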
I've been looking at the whole connection handling, and I've been
considering trying to rewrite it to use proper POSIX threading. The
challenge with doing that is that it might not work so well on
non-POSIX-compatible platforms, and I'm not sure how it would work out
in the Windows world. But by moving over to a proper threading model,
I'd expect both performance and scalability to improve, and concurrent
TCP and UDP setups to work out more easily. With several threads,
OpenVPN could also make better use of all available CPU cores on the
host.
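As a loose illustration of that threading direction (a sketch only,
not a patch, and not how OpenVPN is structured today), a
thread-per-client accept loop could take this shape:

    /* Hypothetical thread-per-connection sketch: each accepted TCP
     * client gets its own detached pthread, so separate sessions can be
     * scheduled onto different CPU cores.  Build with -lpthread. */
    #include <pthread.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void *client_thread(void *arg)
    {
        int fd = (int)(intptr_t)arg;
        /* per-connection tunnel processing would live here */
        close(fd);
        return NULL;
    }

    static void serve(int listener)
    {
        for (;;) {
            int c = accept(listener, NULL, NULL);
            if (c < 0)
                continue;
            pthread_t tid;
            pthread_create(&tid, NULL, client_thread, (void *)(intptr_t)c);
            pthread_detach(tid);  /* the thread cleans itself up on exit */
        }
    }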
I know I could run two server instances on different tun devices, but
I think it'd be much nicer (and more resource-friendly) if I could just
put "proto udp tcp-server" (or whatever) into the config file and be
flexible with my connections to the server.
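For reference, the two-instance workaround amounts to two nearly
identical server configs along these lines (subnets and device names
are illustrative; certificate and key directives are omitted):

    # server-udp.conf
    proto udp
    port 1194
    dev tun0
    server 10.8.0.0 255.255.255.0

    # server-tcp.conf
    proto tcp-server
    port 1194
    dev tun1
    server 10.8.1.0 255.255.255.0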
It would sure be nicer to use the same tun device, also from a
configuration perspective (fewer instances to take care of). But I'm
not sure it is that much more resource-friendly, except perhaps for
lower memory usage. The kernel usually won't spend much extra CPU time
on sleeping processes or devices.
But on the other hand, if your OpenVPN processes are configured to run
on separate CPU cores, you might get better performance when multiple
clients connect at the same time, given the state of the current
scheduling implementation.
I worry about trying to debug multi-threaded code -- it can fail in
non-deterministic ways. How can we maintain product quality when the
code could have subtle race conditions that only show up under heavy
production usage and leave no useful information for reproducing them?
Personally, multithreading scares me, as does any design pattern that
has the potential to introduce non-deterministic bugs.
I can understand that threading can complicate debugging somewhat, but
with well-thought-out locking this is doable. There is also plenty of
code out there which does this without any issues.
I would also argue that multithreading is not a performance panacea,
because of the necessity of using locking primitives such as mutexes,
etc. that lock the global bus, and the often overlooked costs of
maintaining cache coherency among multiple processors and cores, when
multiple threads are writing to the same data structures.
But the current implementation does have reported issues with more
than 150 clients connected to one OpenVPN process. That is an
incredibly low number on today's hardware. The current implementation
also does not make use of multiple cores, which is a big limitation
for bigger enterprises.
On Linux there has also been considerable work on the futex, an
enhanced mutex, to overcome the performance loss that mutexes can
introduce.
http://en.wikipedia.org/wiki/Futex
http://linux.die.net/man/2/futex
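As a sketch of how a futex-backed lock works (simplified and
Linux-specific, following the scheme in Ulrich Drepper's paper
"Futexes Are Tricky"; error handling omitted):

    /* 0 = unlocked, 1 = locked, 2 = locked with possible waiters.
     * The uncontended lock/unlock path is a single atomic operation
     * and never enters the kernel. */
    #include <stdatomic.h>
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static atomic_int futex_word = 0;

    static long sys_futex(atomic_int *addr, int op, int val)
    {
        return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
    }

    static void flock_acquire(void)
    {
        int c = 0;
        if (atomic_compare_exchange_strong(&futex_word, &c, 1))
            return;                 /* fast path: got the lock */
        if (c != 2)
            c = atomic_exchange(&futex_word, 2);
        while (c != 0) {            /* sleep while contended */
            sys_futex(&futex_word, FUTEX_WAIT, 2);
            c = atomic_exchange(&futex_word, 2);
        }
    }

    static void flock_release(void)
    {
        if (atomic_fetch_sub(&futex_word, 1) != 1) {
            atomic_store(&futex_word, 0);
            sys_futex(&futex_word, FUTEX_WAKE, 1);  /* wake one waiter */
        }
    }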
I believe a better alternative to multithreading, in OpenVPN's case, is
to use multiple processes where each process has its own asynchronous
event reactor (e.g. Python/Twisted, libevent, Boost::Asio), and the
processes communicate via messaging, such as over a local unix domain
socket or Windows named pipe. This also has a performance advantage,
because separate processes aren't going to be fighting over the same
memory.
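A minimal sketch of that message-passing arrangement, assuming a Unix
domain socket pair between a control process and one forked worker
(the message format is made up for illustration):

    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];
        socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv);

        if (fork() == 0) {                 /* worker process */
            close(sv[0]);
            char buf[128];
            ssize_t n = recv(sv[1], buf, sizeof(buf), 0);
            printf("worker got: %.*s\n", (int)n, buf);
            send(sv[1], "done", 4, 0);     /* reply to the control process */
            return 0;
        }

        close(sv[1]);                      /* control process */
        send(sv[0], "handle-client 42", 16, 0);
        char buf[128];
        ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
        printf("control got: %.*s\n", (int)n, buf);
        wait(NULL);
        return 0;
    }

Each side would normally register its end of the pair with its own
event reactor instead of blocking in recv() as this demo does.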
Please ... don't go the Boost path ... that will probably be more of a
pain. Granted, it does have quite some advantages during development,
but the pains you will hit will not always be as light as you might
believe. (I've been looking at the Apache Qpid code, and Boost really
gives that project some challenges.)
Regarding threading vs. forking, there is one thing you overlook: the
performance loss when the kernel needs to switch between processes
rather than threads. The kernel spends considerably less time on task
switching between threads.
I wrote two test programs a long time ago to see the performance
difference between fork and pthread, and it was quite noticeable. The
test programs use a POSIX MQ for communication between two execution
contexts, which are either forked out (separate processes) or threaded.
Each run sends 10 million messages and does not use any form of locking
during the message transfers. The test was written as a simple
measurement of the performance difference between fork and pthread.
For the forked version, the load on a dual-core Intel Xeon 1.6GHz
peaked at 1.4 and averaged approximately 1.15 during the run, and the
time statistics look like this:
real 0m55.007s
user 0m16.547s
sys 1m26.520s
For the pthread version on the same box, the system load peaked at 0.8
and averaged approximately 0.7. The time statistics of that run are:
real 0m18.653s
user 0m6.952s
sys 0m28.278s
I have the test programs available if you are interested. This test
was run on an older Fedora 8 box running a 2.6.24.7 real-time kernel.
The only system tweaking I did was to increase
/proc/sys/fs/mqueue/msg_max from 10 to 1000, to avoid the queue filling
up while sending. Both programs were run with non-realtime scheduling
priority (SCHED_OTHER).
I also noticed that the pthread version peaked at a queue depth
(number of messages on the queue) of 27, while the fork version peaked
at 10.
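For illustration only, here is a rough reconstruction of what the
pthread variant of such a test could look like (not the actual test
programs; the fork variant would replace the two pthread_create()
calls with a fork() and run sender and receiver in separate
processes):

    /* POSIX MQ ping: a sender pushes N messages through a message queue
     * and a receiver drains them.  Build with -lrt -lpthread.  The
     * queue name is made up for this sketch. */
    #include <mqueue.h>
    #include <pthread.h>
    #include <fcntl.h>

    #define N_MSGS 10000000L
    #define QNAME  "/forkthread-test"

    static void *sender(void *unused)
    {
        mqd_t q = mq_open(QNAME, O_WRONLY);
        for (long i = 0; i < N_MSGS; i++)
            mq_send(q, "x", 1, 0);      /* blocks when the queue is full */
        mq_close(q);
        return NULL;
    }

    static void *receiver(void *unused)
    {
        mqd_t q = mq_open(QNAME, O_RDONLY);
        char buf[8192];                 /* >= default mq_msgsize on Linux */
        for (long i = 0; i < N_MSGS; i++)
            mq_receive(q, buf, sizeof(buf), NULL);
        mq_close(q);
        return NULL;
    }

    int main(void)
    {
        mq_unlink(QNAME);
        mq_close(mq_open(QNAME, O_CREAT | O_RDWR, 0600, NULL));

        pthread_t s, r;
        pthread_create(&s, NULL, sender, NULL);
        pthread_create(&r, NULL, receiver, NULL);
        pthread_join(s, NULL);
        pthread_join(r, NULL);
        mq_unlink(QNAME);
        return 0;
    }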
IMHO, threading is the only way to go if you want performance in any
product. Granted, you have to have clever locking mechanisms, but that
is still a more sensible model than implementing an internal scheduler
for all connections, as OpenVPN seems to do now. In addition, the
kernel switches between threads much faster than between separate
processes, which again disfavours separate processes.
And again, for performance, IPC needs to be memory based. Writing to
sockets introduces more latency and makes the kernel work even harder
on the communication between the processes. Another thing to keep in
mind, from an OpenVPN perspective, is that you should try to avoid
moving data between kernel space and user space as much as possible,
which very often happens when using system calls like read() and
write().
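As a sketch of what memory-based IPC could look like, assuming POSIX
shared memory between a parent and a forked child (the segment name is
made up; real code would need proper synchronization, e.g. a
process-shared mutex or a futex, which is elided here):

    /* Both processes map the same pages, so the payload is exchanged
     * without copying it through the kernel via read()/write().
     * Build with -lrt. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define SHM_NAME "/ovpn-shm-demo"
    #define SHM_SIZE 4096

    int main(void)
    {
        int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
        ftruncate(fd, SHM_SIZE);
        char *buf = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

        if (fork() == 0) {             /* child writes into shared pages */
            strcpy(buf, "hello from the child");
            return 0;
        }

        wait(NULL);                    /* crude synchronization, demo only */
        write(STDOUT_FILENO, buf, strlen(buf));  /* just to show the result */

        munmap(buf, SHM_SIZE);
        shm_unlink(SHM_NAME);
        return 0;
    }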
kind regards,
David Sommerseth