You just saved my career!
Next time you're near Columbus, OH, USA, look me up - I'll buy you all the
beer you can drink.
Thanks a million, Mr. Wallace! I've been trying to solve this problem for a
month and have been getting nowhere.
Bill Rebey
***NOTE: For everyone else out there who isn't familiar with this call, it
is...um...pretty important. Here is the full extent of the code I added to
fix this problem:
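//------------------
#include <pthread.h>
#include <openssl/crypto.h>

// Thread-ID callback for OpenSSL: returns a value unique to each
// running thread. pthread_t is an integer type on Solaris, so the
// cast to unsigned long is safe there.
extern "C" unsigned long pthreads_thread_id (void)
{
    return (unsigned long) pthread_self ();
}
...
...in main (), at initialization time, before any threads use OpenSSL...
...
CRYPTO_set_id_callback (pthreads_thread_id);
//-------------------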
This code is apparently REQUIRED for multithreaded apps. You may not be
"seeing" any problems, but they are there. As far as I can tell, without
the callback OpenSSL can't tell threads apart: on Solaris it falls back to
the process ID, which every thread shares, so all the threads end up sharing
the "per-thread" error state that ERR_get_state() hands out (right where my
stack trace died). I typically got this crash only after several thousand
threads had come and gone, with 80 simultaneous threads running at all times
and new threads starting at a rate of about 1000 a second. (A pretty
stressful test.) In a more realistic scenario, with threads starting and
stopping at a much slower rate, your program may still crash mysteriously
every few hours, days, or months. Put this code in your app and you're
golden. Since I started typing this, my test app has burned through about
3.8 million threads without a peep or a memory leak. It used to crash after
just a few thousand threads and a few seconds.
-----Original Message-----
From: Wallace, William [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 07, 2000 3:41 PM
To: '[EMAIL PROTECTED]'
Subject: RE: Crash bug exemplified
Try adding something like this to the initialization section of your test
program.
CRYPTO_set_id_callback((unsigned long (*)()) pthreads_thread_id);
> -----Original Message-----
> From: Bill Rebey [mailto:[EMAIL PROTECTED]]
> Sent: Monday, August 07, 2000 2:25 PM
> To: Openssl-Dev (E-mail)
> Subject: Crash bug exemplified
>
>
> The attached program is about as small as I can make a test app that
> exemplifies the problem that my server application is having. I have
> posted about it repeatedly with no results, probably because nobody can
> (or wants to <g>) reproduce it. This little test program is only about
> 160 lines long with comments. It just tries to keep a bunch of transient
> threads going at once (the threads don't do anything - they just exit
> after sleeping for a millisecond).
>
> <<comp>> <<link>> <<tst.cpp>>
> This problem happens on SPARC Solaris. This program demonstrates the
> problem very quickly (usually within a minute) on both a SPARC Ultra-2
> with Solaris 2.6 and a SPARC Ultra-60 with Solaris 2.8. My "real" app
> doesn't crash nearly this fast, as it doesn't put nearly the stress-test
> on OpenSSL that the test app does - but it most certainly crashes every
> time I test it; it just takes hours instead of seconds.
>
> Can anyone reproduce this and fix it? I'm in a VERY bad spot here
> because I can't ship my product until I get OpenSSL to work. My company
> pretty much threw sand in RSA's face in favor of using OpenSSL, on my
> recommendation, and now I can't make OpenSSL work and we can't ship my
> product. This is hardly a great career move for me. If anyone can
> identify and fix this bug, I would greatly appreciate it. I look pretty
> stupid right now to the folks in upper management, and I feel like my
> hands are tied. I'm trying to use Purify to determine the problem, but
> I've never used it before and will probably be slow to figure out how
> to make it work and understand exactly what it's telling me.
>
> If anyone sees any obvious misuse problem, PLEASE let me know. I would
> LOVE to hear "you're doing it wrong - you forgot to make this function
> call!" and be done with it, but as far as I can tell, I'm obeying the
> OpenSSL usage laws to the letter.
>
> If you run the "comp" and "link" scripts to build this little test
> program, then run the resultant "tst" executable, it should crash after
> a short time, and if you run dbx against the resultant core, you should
> get the following stack in response to the dbx "where" command:
>
> core file header read successfully
> Reading ld.so.1
> Reading libsocket.so.1
> Reading libCrun.so.1
> Reading libm.so.1
> Reading libw.so.1
> Reading libthread.so.1
> Reading libc.so.1
> Reading libnsl.so.1
> Reading libdl.so.1
> Reading libmp.so.2
> Reading libc_psr.so.1
> detected a multithreaded program
> t@3937 (l@48) terminated by signal BUS (invalid address alignment)
> Current function is ThreadMain
> 100 int iErr = ERR_get_error ();
> (/opt/SUNWspro/bin/../WS6/bin/sparcv9/dbx) where
> current thread: t@3937
> [1] t_delete(0x9, 0xff2b6000, 0x150, 0x65300, 0x651a8, 0x150), at 0xff241798
> [2] realfree(0x9, 0xff2bc7b0, 0xff2b6000, 0x65300, 0x153, 0x65308), at 0xff241420
> [3] cleanfree(0x0, 0xff2b6000, 0xff2bc724, 0xff2bc7a4, 0xff2bc730, 0x0), at 0xff241cb4
> [4] _malloc_unlocked(0x60, 0x0, 0xff2b6000, 0x60, 0x5, 0x0), at 0xff240e20
> [5] malloc(0x60, 0x60, 0x62798, 0x150, 0x0, 0x0), at 0xff240d3c
> [6] CRYPTO_malloc(0x5a5b0, 0x470d0, 0x77, 0x5a400, 0x470d0, 0x60), at 0x17070
> [7] lh_new(0x1cba0, 0x1cbb8, 0x470d0, 0x2be, 0x1cbb8, 0x14c), at 0x34604
> [8] ERR_get_state(0x5a400, 0x0, 0x673e0, 0x430d8, 0x673e0, 0xf7509b28), at 0x1ce6c
> [9] get_error_values(0x1, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x1c4a0
> =>[10] ThreadMain(pNothing = (nil)), line 100 in "tst.cpp"
> (/opt/SUNWspro/bin/../WS6/bin/sparcv9/dbx) quit
>
> Thanks for your help,
>
> Bill Rebey
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]
User Support Mailing List [EMAIL PROTECTED]