James Yonan wrote:
David Sommerseth wrote:
M. G. wrote:
Hello,
I recently changed my VPN tunnel from TCP to UDP for the sake of better performance. It generally works very well, but I noticed that I can't connect to my server from some networks when using UDP, e.g. at work. This may be an issue with the NAT/firewall configuration, which I have no influence on. Since I am connecting from random locations where I don't know beforehand whether a UDP connection will succeed, I'm now kind of worried that connecting will become a game of luck depending on the location. So far my story, now to the question: what is the reason that OpenVPN doesn't have an option to listen for TCP AND UDP connections simultaneously? Is there a technical problem that I cannot see, or am I simply the only one who thinks this would be a nice feature?

In theory, you are right, this should not be a big issue. But I believe it is connected to the fact that OpenVPN is neither forked nor threaded when working with connections. It's all in the same process scope, which does its own scheduling of the connections, if I've understood the code right. Making the current implementation work concurrently with both TCP and UDP would be somewhat more difficult. And because of this, OpenVPN does not really use the power of multi-core hosts, as one single process can only run on one core at a time.

I've been looking at the whole connection handling, and I've been considering trying to rewrite it to use proper POSIX threading. The challenge with doing that is that it might not work so well on non-POSIX-compatible platforms, and I'm not sure how it would work in the Windows world. But by moving over to a proper threading model, I'd expect both performance and scalability to improve, and concurrent TCP and UDP setups to work out more easily. With several threads, OpenVPN could also make better use of all available CPU cores on the host.

I know I could run two server instances on different tun devices but I think it'd be much nicer (and resource-friendly) if I could just put "proto udp tcp-server" (or whatever) into the config file and be flexible with my connections to the server.

It would sure be nicer to use the same tun device, also from a configuration perspective (fewer instances to take care of). But I'm not sure it is that much more resource-friendly, except perhaps for less memory usage. The kernel usually won't spend much extra CPU time on sleeping processes or devices.

But on the other hand, if your OpenVPN processes are configured to run on separate CPU cores, you might get better performance when multiple clients connect at the same time, given the state of the current scheduling implementation.

I worry about trying to debug multi-threaded code -- it could fail in non-deterministic ways. How can we maintain product quality when the code could have subtle race conditions that only show up under heavy production usage and leave no useful information to enable reproduction? Personally, multithreading scares me, as does any design pattern that has the potential to introduce non-deterministic bugs.

I can understand that threading can complicate debugging somewhat, but with well-thought-out locking this is doable. There is also plenty of code out there which does this without any issues.

I would also argue that multithreading is not a performance panacea, because of the necessity of using locking primitives such as mutexes, etc. that lock the global bus, and the often overlooked costs of maintaining cache coherency among multiple processors and cores, when multiple threads are writing to the same data structures.

But the current implementation does have reported issues with more than 150 clients on one OpenVPN process. That is an incredibly low number on today's hardware. The current implementation also does not make use of multiple cores, which is a big limitation for bigger enterprises.

On Linux there has also been considerable work on the futex, an enhanced mutex, to overcome the performance loss that mutexes can introduce.

http://en.wikipedia.org/wiki/Futex
http://linux.die.net/man/2/futex

I believe a better alternative to multithreading, in OpenVPN's case, is to use multiple processes where each process has its own asynchronous event reactor (e.g. Python/Twisted, libevent, Boost::Asio), and the processes communicate via messaging, such as over a local unix domain socket or Windows named pipe. This also has a performance advantage, because separate processes aren't going to be fighting over the same memory.

Please ... don't go the Boost path ... that will probably be more of a pain. Granted, it does have quite a few advantages during development, but the pains you will hit will not always be as light as you might believe. (I've been looking at the Apache Qpid code, and Boost really gives that project some challenges.)

Regarding threading vs forking, there is one thing you overlook: the performance loss when the kernel needs to swap between processes vs threads. The kernel spends considerably less time on task switching between threads.

I wrote two test programs a long time ago to see the performance difference between fork and pthread, and it was quite noticeable. These test programs use a POSIX MQ for communication between two tasks, which are either forked out (separate processes) or threaded. They send 10 million messages and do not use any form of locking during message transfers. The test was written as a simple measurement of the performance difference between fork and pthread.

For the forked version, the load on a dual-core Intel Xeon 1.6GHz peaked at 1.4 and averaged approx. 1.15 during the run, and the time statistics look like this:

real    0m55.007s
user    0m16.547s
sys     1m26.520s

For the pthread version on the same box, the system load peaked at 0.8 and averaged approx. 0.7. The time statistics of the run are:

real    0m18.653s
user    0m6.952s
sys     0m28.278s

I have the test programs available if you are interested. This test was run on an older Fedora 8 box running a 2.6.24.7 real-time kernel. The only system tweaking I did was to increase /proc/sys/fs/mqueue/msg_max from 10 to 1000, to keep the queue from filling up while sending. Both programs were run with non-realtime scheduling priority (SCHED_OTHER).

I also noticed that the pthread version peaked at a queue size (number of messages on the queue) of 27, while the fork version peaked at 10.

IMHO, threading is the only way to go if you want performance in any product. Granted, you have to have clever locking mechanisms, but that is still a more sensible model than implementing an internal scheduler for handling all connections, as OpenVPN seems to have now. In addition, the kernel swaps between threads much faster than between separate processes, which again disfavours separate processes.

And again, for performance, IPC needs to be memory-based. Writing to sockets still introduces more latency and makes the kernel work even harder on the communication between the processes. Another thing to keep in mind is that you should, from an OpenVPN perspective, try to avoid moving data between kernel space and user space as much as possible, which very often happens with system calls like read() and write().


kind regards,

David Sommerseth

