On Wed, Feb 18, 2004, Paul L. Allen wrote:

> I have a client/server application secured by certificates on both
> ends using OpenSSL 0.9.7c on RedHat 9.  Client and server exchange
> messages consisting of lines of ASCII text using BIO_puts() and
> BIO_gets(). I include a call to BIO_flush() after each BIO_puts() in
> order to ensure that the entire message gets flushed to the other
> side.  The conversation between client and server is transaction-
> oriented in that the client makes a request to the server and the
> server sends back a response.  In some cases the server might issue
> one or more queries back to the client before giving its final response.
> In all cases, the client and the server strictly alternate in sending
> messages.
> 
> I have a version of the client that reads transactions from a file
> and issues them to the server in order.  If I re-arrange the transaction
> order, I can reliably cause the client side to either hang in a
> BIO_flush() call or run to completion.  At the point of the hang, the
> server is waiting for a line in BIO_gets().  The client has returned
> from sending a line with BIO_puts() and is wedged in BIO_flush().  If
> I arrange for NULL crypto to be used, a network trace shows that the
> client's final message has not been sent.  Interestingly, the last
> packet is a TCP ACK from the client containing no data.  I don't know
> enough about TCP or SSL to tell if this is significant.  When the
> client hangs, if I leave it alone for a few minutes it will time
> out and die with an "Alarm Clock" error.  (I'm apparently not handling
> SIGALRM.)  The server then wakes up, notices that the client has gone
> away, and listens for more client connections.  In this particular
> setup, client and server are on the same machine communicating over
> the loopback interface.
> 
> I've been working on this software for some time now, and this
> particular setup of RH9 and OpenSSL 0.9.7c has been stable since
> December.  This is the first time I've seen the software behave this
> way.  Since I've been continuously modifying the software on both
> ends, I'm reasonably certain that the problem is in my code somewhere.
> I'm darned if I can put my finger on the error, however.
> 
> If I attach a debugger to the hung client, it claims to be in
> int_malloc() in /lib/tls/libc.so.6.  This was called by malloc()
> in the same library and gdb sees nothing above this on the stack.
> Very strange.  I've strace'd the compiler and linker while building
> both client and server, and they're clearly picking up my installed
> copy of 0.9.7c in /usr/local rather than the RedHat version in /usr/lib.
> The SSL libraries are linked statically, so there's no confusion at
> runtime about which library to link.
> 
> I'm wondering if anyone can suggest what sorts of things might cause
> the behavior I'm seeing?  Is there a way to ask a chain of BIOs about
> their state of health?  I'm querying the number of bytes waiting in
> the write buffer just before calling BIO_flush().  In all cases,
> including just before the final hang, there are bytes waiting to be
> flushed.  No error is ever reported until the final hang.
> 
> Is this an interesting enough problem?  Anybody have any ideas?
> 

Firstly, I hope you are checking the return values from BIO_gets(),
BIO_puts() and BIO_flush().

Presumably you are using a buffering BIO to get the BIO_gets() functionality.
It is possible something odd is happening in there. Are the lines you are
using very long?

It might be an idea to place some debugging printfs around the low-level
socket calls in crypto/bio/bss_sock.c to see if the hang is occurring at
that level.
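
The sort of instrumentation meant here can be sketched as follows. This is
not the actual code in bss_sock.c; logged_read() and logged_write() are
hypothetical stand-ins that wrap the plain POSIX read() and write() calls.
The point is to print once before the call and once after it, so a line
that says "entering" with no matching "returned" pins the hang on the
system call itself.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: log before and after a write() on fd. */
ssize_t logged_write(int fd, const void *buf, size_t len)
{
    fprintf(stderr, "entering write(fd=%d, len=%zu)\n", fd, len);
    ssize_t n = write(fd, buf, len);
    int saved = errno;            /* fprintf may clobber errno */
    if (n < 0)
        fprintf(stderr, "write(fd=%d) returned %zd: %s\n",
                fd, n, strerror(saved));
    else
        fprintf(stderr, "write(fd=%d) returned %zd\n", fd, n);
    errno = saved;
    return n;
}

/* Hypothetical wrapper: log before and after a read() on fd. */
ssize_t logged_read(int fd, void *buf, size_t len)
{
    fprintf(stderr, "entering read(fd=%d, len=%zu)\n", fd, len);
    ssize_t n = read(fd, buf, len);
    int saved = errno;            /* fprintf may clobber errno */
    if (n < 0)
        fprintf(stderr, "read(fd=%d) returned %zd: %s\n",
                fd, n, strerror(saved));
    else
        fprintf(stderr, "read(fd=%d) returned %zd\n", fd, n);
    errno = saved;
    return n;
}
```

stderr is used because it is normally unbuffered, so the "entering" line
appears even if the process then blocks.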

Steve.
--
Dr Stephen N. Henson. Email, S/MIME and PGP keys: see homepage
OpenSSL project core developer and freelance consultant.
Funding needed! Details on homepage.
Homepage: http://www.drh-consultancy.demon.co.uk
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]