Hi David,

Thanks, and yes, these are the conundrums I'm curious about:

1) Why does the process hang in __read_nocancel() when the connection is set to non-blocking, and only under heavy congestion?
2) If the connection did turn blocking, why aren't the added timeouts working?
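
For the first question, one thing that can be checked directly is the descriptor's mode just before the read that hangs. Below is a minimal sketch, assuming a POSIX system and that the raw socket descriptor is reachable (for example through gSoap's soap->socket or OpenSSL's SSL_get_fd()). The helper names fd_is_nonblocking and fd_set_nonblocking are made up for illustration and are not part of gSoap or OpenSSL.

```c
#include <fcntl.h>

/* Hypothetical helper: report the blocking mode of a descriptor.
 * Returns 1 if O_NONBLOCK is set, 0 if it is clear, -1 on error. */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Hypothetical helper: set O_NONBLOCK while preserving the other flags.
 * Returns 0 on success, -1 on error. */
static int fd_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

Logging fd_is_nonblocking() right before each SSL_read, SSL_write and SSL_shutdown call should show whether the descriptor is still non-blocking at the moment the read hangs.
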

I'll keep looking, and any 'what to look at' pointers or confirmation would be appreciated.

Cheers,
Mark

On 5/25/09 9:18 PM, "David Schwartz" <dav...@webmaster.com> wrote:

>
>> Background: the TR-069 client uses the gSoap system that in turn
>> calls OpenSSL. The communications to the server are HTTP/SOAP
>> based using SSL or non-SSL. The problems are being experienced in
>> Linux 2.6.x systems 32-bit and 64-bit, on MIPS and AMD processors;
>> i.e. both embedded Linux systems and normal development systems.
>> WANem is configured for T1 link, 100ms delay, 10ms jitter, and
>> 40% to 50% packet loss. gSoap uses a select() call with timeout
>> prior to calling SSL_read.
>
> Why? You can't 'select' on the decrypted data stream. The SSL_read function
> reads *decrypted* data from the OpenSSL output stream, not encrypted data
> from the socket (unless that happens to be necessary). It is a serious
> mistake to call 'select' prior to calling SSL_read. For example, suppose the
> data has already been read from the socket (SSL_write can result in data
> being read from the socket), and an SSL_read would complete immediately.
> You will be calling 'select' to wait for data that has already been read.
>
>> In addition, I added code to set
>> SO_RCVTIMEO and SO_SNDTIMEO to 60 seconds on the socket.
>
> Why? You're using non-blocking operations. Fortunately, this will do nothing
> because these timeouts only affect blocking operations, but if they did take
> effect, they would destroy the integrity of the connection. (They are
> timeouts for the operations themselves, not just for the calls that initiate
> them.)
>
>> Various stack backtraces from entry into gSoap are presented
>> below, each one was captured from a core file produced from
>> kill -3'ing the hung up client. They are just representative of the
>> problem happening from several different entry points into OpenSSL.
>>
>> (gdb) bt 1
>> #0 0x0000003b8d40bf7b in __read_nocancel () from /lib64/libpthread.so.0
>> #1 0x0000003b91499091 in BIO_new_socket () from /lib64/libcrypto.so.6
>> #2 0x0000003b9149766f in BIO_read () from /lib64/libcrypto.so.6
>> #3 0x0000003f9642047d in ssl3_read_n () from /lib64/libssl.so.6
>> #4 0x0000003f964209dd in ssl3_read_bytes () from /lib64/libssl.so.6
>> #5 0x0000003f9641de64 in ssl3_shutdown () from /lib64/libssl.so.6
>> #6 0x0000000000455961 in tcp_disconnect (soap=0x596960) at
>> gsoap/stdsoap2.c:4013
>> #7 0x0000000000455c9c in soap_closesock (soap=0x596960) at
>> gsoap/stdsoap2.c:4069
>
> How is the connection made non-blocking exactly?
>
>> #0 0x0000003b8d40bf7b in __read_nocancel () from /lib64/libpthread.so.0
>> #1 0x00000000004a8e0b in sock_read ()
>> #2 0x00000000004a7f8b in BIO_read ()
>> #3 0x000000000049411d in ssl3_read_n ()
>> #4 0x0000000000494c60 in ssl3_read_bytes ()
>> #5 0x0000000000495b44 in ssl3_get_message ()
>> #6 0x000000000048fea1 in ssl3_get_server_hello ()
>> #7 0x0000000000490a66 in ssl3_connect ()
>> #8 0x00000000004947da in ssl3_write_bytes ()
>> #9 0x0000000000449194 in fsend (soap=0x6a8de0,
>> s=0x6a91c0 "POST /dps/TR069 HTTP/1.1\r\nHost:
>> 10.2.2.22:8443\r\nUser-Agent: gSOAP/2.7\r\nContent-Type: text/xml;
>> charset=utf-8\r\nContent-Length: \
>> 2393\r\nConnection: keep-alive\r\nSOAPAction: \"\"\r\n\r\n\"
>> xmlns:xsi=\"http://www.w3"..., n=174) at gsoap/stdsoap2.c:470
>> #10 0x0000000000449859 in soap_flush_raw (soap=0x6a8de0,
>> s=0x6a91c0 "POST /dps/TR069 HTTP/1.1\r\nHost:
>> 10.2.2.22:8443\r\nUser-Agent: gSOAP/2.7\r\nContent-Type: text/xml;
>> charset=utf-8\r\nContent-Length: \
>> 2393\r\nConnection: keep-alive\r\nSOAPAction: \"\"\r\n\r\n\"
>> xmlns:xsi=\"http://www.w3"..., n=174) at gsoap/stdsoap2.c:671
>> #11 0x0000000000449589 in soap_flush (soap=0x6a8de0) at
>> gsoap/stdsoap2.c:637
>> #12 0x0000000000458e8c in soap_end_send (soap=0x6a8de0) at
>> gsoap/stdsoap2.c:5399
>
> Are you sure the connection is
> non-blocking? Are you absolutely 100% sure?
> It looks like you simply have forgotten to set the connection non-blocking.
> As a result, you may block in one direction forever even though you could
> make forward progress in the other direction.
>
> DS
>
>
> ______________________________________________________________________
> OpenSSL Project http://www.openssl.org
> User Support Mailing List openssl-users@openssl.org
> Automated List Manager majord...@openssl.org
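
The point above that SO_RCVTIMEO only affects blocking operations can be checked without OpenSSL in the picture. Below is a minimal sketch, assuming a Linux/POSIX system. The helper names set_recv_timeout_ms and timed_recv_ms are made up for illustration and are not part of gSoap or OpenSSL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: set SO_RCVTIMEO on a socket, in milliseconds.
 * Returns 0 on success, -1 on error. */
static int set_recv_timeout_ms(int fd, long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: perform one recv() and return how long the call
 * took in milliseconds. *err_out receives errno if recv() failed, else 0. */
static long timed_recv_ms(int fd, int *err_out)
{
    struct timespec t0, t1;
    char buf[16];
    ssize_t n;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = recv(fd, buf, sizeof buf, 0);
    *err_out = (n < 0) ? errno : 0;   /* capture errno before other calls */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1000L
         + (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

On an idle socketpair() end with a 200 ms timeout, the blocking recv() should come back after roughly 200 ms with EAGAIN/EWOULDBLOCK, while the same call after setting O_NONBLOCK should fail immediately with the same error. By the same reasoning, if the hung descriptor really were blocking with a 60 second SO_RCVTIMEO, I would expect the read to return after about a minute rather than hang indefinitely.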