On Thu, Oct 1, 2009 at 6:07 PM, Dwight Schauer <dscha...@buchanan.com> wrote:
> We are using openssl-32bit-0.9.8a-18.26 on SLES 10.2 (x86_64).
>
> This problem is only occurring on a very small percentage of our installs,
> and is not readily repeatable, but is always results in the same back trace.
>
> Program terminated with signal 11, Segmentation fault.
> #0 0xb7bf54e3 in ssl3_read_n () from /usr/lib/libssl.so.0.9.8
> (gdb) bt
> #0 0xb7bf54e3 in ssl3_read_n () from /usr/lib/libssl.so.0.9.8
> #1 0x00000000 in ?? ()
> (gdb) quit
>
> I'm digging into this issue more and can provide more information when I get
> it as far as what is calling ssl_read_n and what is being passed in.
>
> I have two questions right now though:
>
> 1) Are there any known issues as to why ssl3_read_n would cause a
> segmentation fault?

SSL/SSL_CTX and other OpenSSL structures/'contexts' must be properly
initialized using the correct OpenSSL calls, or you'll have trouble on your
hands down the road. Failing to check the return code of even one of these
init/setup API functions can be quite fatal, because you carry on with a
still-uninitialized or, worse, partly initialized structure in your hands.

To answer #2 at the same time: OpenSSL uses callbacks at several levels, so,
say, an uninitialized or incorrectly set up SSL, SSL_CTX, BIO, or other
'context' can blow your run-time to kingdom come once it is invoked. Here,
the NULL pointer for the function in your call trace is quite probably a hint
of such a 'still null == not correctly set up' callback lurking in one (or
more) of the above-mentioned data structures.

Suspect #1 here (with such extremely limited info) would be some error
occurring before or while the SSL filter is attached to the rbio BIO chain
(a BIO_s_socket() BIO source/sink maybe?)

> 2) Is there any know reason why the call trace would be truncated? There is
> nothing past #1) and the other threads have clean call traces.

See above. I suggest compiling an OpenSSL library instance in debug mode with
all compiler optimizations disabled ('./config -d' should do it for setting
this up; double-check that with the documentation as I'm rusty on this part)
and then checking again; the debug symbols etc. should aid in diagnosing the
issue.

Meanwhile, it may be beneficial to review whether your application code checks
and handles /all/ returns (i.e. error feedback) from each and every OpenSSL
API invocation, especially the ones which 'configure' a part or the whole.
All too often I find applications fail this way because someone did not check
for an error 'which never happens anyway'.

(And since you're talking about 'installs', see whether you can distribute
'debug builds' like that to the sites that sometimes exhibit the issue.
Did you put assertions in your code to verify run-time assumptions?)

--
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web: http://www.hobbelt.com/
     http://www.hebbut.net/
mail: g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           majord...@openssl.org
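
To make the point in the reply about unchecked setup calls and 'still null' callbacks a bit more concrete, here is a minimal sketch in plain C. It does not use OpenSSL at all: struct demo_ctx, demo_ctx_setup(), demo_ctx_read() and demo_use() are hypothetical stand-ins invented for illustration, and the "1 on success, 0 on failure" return convention is an assumption chosen to resemble many OpenSSL setup calls. It only shows the pattern: check every setup result, assert the run-time assumption, and never dispatch through a callback pointer that may still be NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a library context (think SSL or BIO) that
 * dispatches I/O through a callback. None of these names are OpenSSL API. */
typedef int (*demo_read_cb)(char *buf, int len);

struct demo_ctx {
    demo_read_cb read_fn;   /* stays NULL until setup has succeeded */
};

/* Trivial callback: "reads" zero bytes and terminates the buffer. */
static int demo_read_impl(char *buf, int len)
{
    if (len > 0)
        buf[0] = '\0';
    return 0;
}

/* Setup call: returns 1 on success, 0 on failure.
 * On failure the callback is left unset, i.e. the context is half-built. */
static int demo_ctx_setup(struct demo_ctx *ctx, int simulate_failure)
{
    if (simulate_failure)
        return 0;
    ctx->read_fn = demo_read_impl;
    return 1;
}

/* Defensive read: returns -1 instead of jumping through a NULL pointer. */
static int demo_ctx_read(struct demo_ctx *ctx, char *buf, int len)
{
    if (ctx == NULL || ctx->read_fn == NULL)
        return -1;
    return ctx->read_fn(buf, len);
}

/* Caller-side pattern: check every setup result and stop on failure.
 * Returns 0 if the read ran, -1 if setup failed and ctx was not used. */
static int demo_use(int simulate_failure)
{
    struct demo_ctx ctx = { NULL };
    char buf[16];

    if (!demo_ctx_setup(&ctx, simulate_failure)) {
        fprintf(stderr, "demo_ctx_setup failed; refusing to use ctx\n");
        return -1;
    }
    assert(ctx.read_fn != NULL);   /* run-time assumption made explicit */
    return demo_ctx_read(&ctx, buf, (int)sizeof buf);
}
```

Calling demo_use(1) simulates the 'error which never happens anyway': the setup failure is caught and the context is never touched. Without that check, the call through ctx->read_fn would jump to address 0, which is the kind of crash that leaves a 0x00000000 frame in a backtrace. In real OpenSSL code the corresponding checks would be, for example, testing SSL_CTX_new() and SSL_new() for NULL and SSL_set_fd() for a return value of 1 before going any further.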