Daniel Johnson wrote:
> David Sommerseth wrote:
>> I am willing to believe that this issue might be caused by the
>> plugin, somehow. I see you posted an auth-pam.c patch, and it might
>> be you have already fixed this issue by now.
>
> To my understanding, the segfault is indeed occurring when the plugin
> could be evaluating the username and password, but until I can
> engineer another round of crashes I can't confirm it. My patch
> merely improves the substitution of "USERNAME" and "PASSWORD",
> and has been in use since my tests began around mid-August. It could
> still be the source of my occasional crashes, but even then something
> has changed to magnify the effect.

Well, what I struggled with was that I was using the srand() function.
I did not detect any memory leaks, but I got some pointers which might
have become wild. I replaced my functions to use the OpenSSL rand
functions, and since that time OpenVPN has been stable. A stray memory
pointer is all you need to make it crash.

>> One thing to do, to grab more debug info, is to run openvpn server
>> via gdb. Another approach can be to run it via valgrind (but that
>> chokes performance, noticeably). That way you will be able to get a
>> backtrace showing more concretely where things go wrong.
>
> I'll look into using gdb on Monday. If these were physical machines
> I'd think that valgrind wouldn't hurt too badly, we've only got a
> 3mbps internet connection in the first place. But since these are
> VMs....well, do you think valgrind might be an additional burden
> to the host computers?

Valgrind will not put any extra load on the servers, it will just slow
down OpenVPN ... and it easily goes down to 20-25 times slower, so with
slow, I really mean reeeeaaaaally sloooooow. In small "file" transfers,
and especially if not time critical, you might not notice it so much.
But I guarantee you that you will notice it if you try to stream media
over such a link or do bigger file transfers.

Anyway, valgrind with --leak-check=full gives a good overview. If run
with --show-reachable=yes, it will provide even more info, but that
might not reveal much in this phase. And be aware, it might show a
lot of false positives as well. Many of them seem to be connected to
OpenSSL (BIO calls, among others).

>> The main reason I am guessing it is the plugin, is that I've been
>> writing a plugin myself and struggled with a very unstable OpenVPN
>> solution until I found some NULL pointers and string functions, which
>> caused SEGV. After those things were sorted out and fixed, the
>> plugin and OpenVPN have been rock solid again.
>
>> Please confirm my suspicion or send a backtrace from gdb.
>> To do that in a good way, OpenVPN must have debug information, i.e.
>> compiled with '-g'.
>
> I'll go ahead and compile with -g on both servers, but I think I'll
> wait to upgrade Mercury to v2.1_rc15 until I look at gdb. Perhaps
> it will be helpful to have one crash in rc13 and one in rc15?

It will probably not make a big difference, if the assumption is
correct that the instability is caused by the plug-in. Btw. I'm running
v2.1_rc15 on the server side with my plug-in (http://www.eurephia.net/ -
also an authentication plug-in, with iptables integration) without any
such issues at all.

Both valgrind and gdb will give an idea of where things crash. gdb
will be more precise, but valgrind can often confirm or give a very
good indication as well. If you set ulimit -c unlimited, you should get
a core file as well, which can be used with gdb afterwards.


kind regards,

David Sommerseth
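
PS. The debugging steps mentioned above could look roughly like the shell snippet below. This is only a sketch: the binary path, the config path, the valgrind log file and the function names are all assumptions for illustration, so adjust them to your setup. It also assumes the config does not daemonize, since gdb would otherwise most likely lose track of the forked server process.

```shell
#!/bin/sh
# Sketch only. Both paths below are assumptions, not taken from your setup.
OPENVPN_BIN=/usr/sbin/openvpn
OPENVPN_CONF=/etc/openvpn/server.conf

# Allow core files for processes started from this shell.
enable_core_dumps() {
    ulimit -c unlimited
}

# Start the server under gdb. At the (gdb) prompt type "run";
# after the crash, "bt full" prints the backtrace.
debug_with_gdb() {
    gdb --args "$OPENVPN_BIN" --config "$OPENVPN_CONF"
}

# Open a core file left by an earlier crash, then use "bt full".
# $1 is the path to the core file.
debug_core() {
    gdb "$OPENVPN_BIN" "$1"
}

# Run the server under valgrind (expect it to be much slower).
# The log file location is an arbitrary choice.
debug_with_valgrind() {
    valgrind --leak-check=full --log-file=/tmp/openvpn-valgrind.log \
        "$OPENVPN_BIN" --config "$OPENVPN_CONF"
}
```

Source the file and call one function at a time, for example enable_core_dumps followed by debug_with_gdb. The backtrace is only really useful if the binary was built with -g.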