The stack trace showing a NULL sha1 transform kind of caught my attention here. I wouldn't go by the GDB call trace, because it's obviously a memory
leak and the gdb stack could have been corrupted. Many times I see 0x0
in the frames, but when you actually try to print the ctx address it turns
out to be valid. CTX is definitely valid here.

Prabhu, earlier I was assuming you were using the Linux sha1 in the kernel,
which is a loadable module, and I now realise you're just using plain OpenSSL
from userspace and linking with libcrypto. Linux sha1 has a limitation on
the sha1_tfm structure; perhaps libcrypto sha1 is the same way?
It's obvious that you have run out of sha1_tfms, which is why it helps when
you actually sleep: other threads would have released theirs.

If you don't mind sending your client code snippet, I could debug it.
My email ID would be [EMAIL PROTECTED]

Thanks
--Gayathri



Even reducing the thread stack size didn't help.
I observe that thread creation as such is not a problem. I create
about 1000 threads and delay the SSL_connect in each thread for about 10
sec.
Once the delay expires and each client makes its connection to the server,
the seg fault occurs.

You know, looking back at your original trace, it seems I may have jumped to conclusions. It's hard to be sure because I don't know what OpenSSL version you are using, so the line numbers don't tell me anything, but check this
out:

#0  SHA1_Init (c=0x0) at sha_locl.h:150
#1  0x405b2bb0 in init (ctx=0x0) at m_sha1.c:72
#2  0x405afc91 in EVP_DigestInit_ex (ctx=0x4d606230, type=0x4061f620, impl=0x0) at digest.c:207
#3  0x405ac08e in ssleay_rand_add (buf=0x0, num=0, add=2.5863007356866632e-306) at md_rand.c:263
#4  0x405ace6e in RAND_add (buf=0x8a269f8, num=144861688, entropy=0) at rand_lib.c:151
I'm guessing frame #2 is this:

        return ctx->digest->init(ctx);

Which calls this:

static int init(EVP_MD_CTX *ctx)
        { return SHA1_Init(ctx->md_data); }

Notice that 'init' was called with a NULL context. But the context cannot have been NULL in frame #2, because if it were, ctx->digest would have faulted. So it looks like the stack in frame #2 cannot have led to the stack in frame #1.
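
To make that argument concrete, here is a minimal sketch using simplified stand-ins for the OpenSSL structures (every MOCK_ and mock_ name is hypothetical, not the real definition). It shows one way a NULL could reach the SHA init without an earlier fault: ctx itself is valid but ctx->md_data is NULL. If that is what happened, the ctx=0x0 shown in frame #1 would be a debugger artifact. That is an assumption, not something the trace proves.

```c
#include <stddef.h>

/* Hypothetical stand-in for SHA_CTX; the real structure has more fields. */
typedef struct { unsigned int h0; } MOCK_SHA_CTX;

struct mock_md_ctx;

/* Hypothetical stand-in for EVP_MD: only the init hook is modelled. */
typedef struct {
    int (*init)(struct mock_md_ctx *ctx);
} MOCK_EVP_MD;

/* Hypothetical stand-in for EVP_MD_CTX. */
typedef struct mock_md_ctx {
    const MOCK_EVP_MD *digest;
    void *md_data;
} MOCK_EVP_MD_CTX;

/* Stand-in for SHA1_Init. Unlike the real function it reports a NULL
 * argument (returns 0) instead of faulting, so the case can be observed. */
static int mock_sha1_init(MOCK_SHA_CTX *c)
{
    if (c == NULL)
        return 0;
    c->h0 = 0x67452301U; /* first SHA-1 initial hash word */
    return 1;
}

/* Mirrors the shape of init() in frame #1: it dereferences ctx to read md_data. */
static int mock_init(MOCK_EVP_MD_CTX *ctx)
{
    return mock_sha1_init((MOCK_SHA_CTX *)ctx->md_data);
}

static const MOCK_EVP_MD mock_sha1_md = { mock_init };

/* Mirrors the call in frame #2: it dereferences ctx to reach digest->init. */
static int mock_digest_init(MOCK_EVP_MD_CTX *ctx)
{
    return ctx->digest->init(ctx);
}
```

Calling mock_digest_init() on a context whose md_data is NULL returns 0 here, whereas a NULL ctx would fault at ctx->digest before the SHA init is ever reached.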

This is not a memory exhaustion issue or a failure to check for NULL. It looks like stack corruption. The real puzzle is why stack corruption would only occur with a large number of threads.

I'm thinking perhaps there's some concurrency issue with ssleay_rand_add, but I've been over it twice and I don't see any issue. The md context would be unique for each thread, so it should be safe.
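
For anyone who wants to see the pattern I mean, here is a rough sketch with pthreads. It is not the OpenSSL code: pool, pool_lock, local_ctx, mix_in and run_workers are all hypothetical names, and the "digest" is a toy accumulator. The point is only the shape: each call keeps its working context on its own stack, and the shared pool is touched only while a lock is held.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_WORDS 4
#define MAX_WORKERS 64

/* Hypothetical shared pool standing in for the RAND state; guarded by pool_lock. */
static unsigned int pool[POOL_WORDS];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-call context standing in for a digest context on the stack. */
typedef struct { unsigned int acc; } local_ctx;

static void mix_in(const unsigned char *buf, size_t len)
{
    local_ctx m = { 0 }; /* private to this call, so no other thread can see it */
    size_t i;

    for (i = 0; i < len; i++)
        m.acc = m.acc * 31U + buf[i]; /* toy accumulator, not a real hash */

    /* Only the update of the shared pool needs the lock. */
    pthread_mutex_lock(&pool_lock);
    pool[m.acc % POOL_WORDS] ^= m.acc;
    pthread_mutex_unlock(&pool_lock);
}

static void *worker(void *arg)
{
    mix_in((const unsigned char *)arg, 4);
    return NULL;
}

/* Starts n threads (n <= MAX_WORKERS) over the same 4-byte input and joins
 * them. Returns 0 on success, -1 if n is too large or a thread fails to start. */
static int run_workers(int n)
{
    static unsigned char msg[4] = { 1, 2, 3, 4 };
    pthread_t tid[MAX_WORKERS];
    int i, started = 0, rc = 0;

    if (n > MAX_WORKERS)
        return -1;
    for (i = 0; i < n; i++) {
        if (pthread_create(&tid[i], NULL, worker, msg) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return rc;
}
```

If the real code follows this shape, the per-thread context by itself should not be where the threads collide, which is why I keep coming back to the shared state.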

Maybe someone will read this and it will resonate with something they know? If you can, please tell us what version of OpenSSL this was. This will allow people to understand the line numbers better and make sure they're not looking at code in which whatever bit you has already been fixed.

        DS


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           [EMAIL PROTECTED]

