On Oct 5, 2011, at 14:46 , Jared Johnson wrote: > eh... didn't mean to send this with no subject :) > >> Our software generates a whole lot of concurrent LDAP traffic right now >> and we started running into an issue where our child processes would hang >> forever waiting around for LDAP operations that had apparently hung on the >> server end. This would happen with start_tls(), search(), and bind() >> calls alike. I noticed that Net::LDAP passes the Timeout parameter passed >> to new() into the socket object, so I presume that the intention is for >> operations other than only connecting to time out (e.g. start_SSL() >> inherits the timeout, so $ldap->start_tls() would presumably time out in >> the same time that the initial connection would have). However this was >> not happening. We dug around debugged for days (repro was difficult) and >> found a couple of changes that would make things work a bit better. >> >> First, and maybe least controversially, the root issue we found was that >> process() winds up doing a can_read() call that potentially blocks >> forever. Changing this is pretty simple: >> >> @@ -824,10 +824,11 @@ sub process { >> my $ldap = shift; >> my $what = shift; >> my $sock = $ldap->socket or return LDAP_SERVER_DOWN; >> + my $timeout = $sock->timeout;
$sock->timeout is intended to be a connect timeout. Why should read timeout be the same. >> my $sel = IO::Select->new($sock); >> my $ready; >> >> - for( $ready = 1 ; $ready ; $ready = $sel->can_read(0)) { >> + for( $ready = 1 ; $ready ; $ready = $sel->can_read( $timeout )) { This would not be sufficient to stop the client hanging. ->process is called from within ->sync which will keep calling process until it gets the message response it is looking for. Also if this message being waited on was the result of a call to, say, ->search. The callback for the search should be called with a client side error >> my $pdu; >> asn_read($sock, $pdu) >> or return _drop_conn($ldap, LDAP_OPERATIONS_ERROR, "Communications >> Error"); >> >> >> We tested that patch and it helped to mitigate this issue -- although not >> perfectly, given that multiple process() calls can happen for a given LDAP >> operation, so the effective timeout can be nudged upward if a lot of >> process() calls hang for a long time but don't quite reach the timeout. >> Still, it's better than nothing. If one were to be a bit pedantic, one >> could set some hash key for the life of an external call to indicate how >> long we've been waiting, and then in process(), do something like: >> >> ( $ldap->{waited} < $timeout ? $sel->can_read( $timeout - $ldap->{waited} >> ) : () ) >> >> That would probably require more robust testing though. >> >> Something that actually worked a bit better for us, but which it seems may >> be less portable, was to set SO_SNDTIMEO and SO_RCVTIMEO on the underlying >> socket right after establishing the connection: >> >> @@ -117,6 +117,15 @@ sub new { >> >> return undef unless $obj->{net_ldap_socket}; >> >> + if ( $arg->{timeout} and IO::Socket->can('SO_SNDTIMEO') ) { >> + my $timeval = pack 'L!L!', $ldap_opts->{timeout}, 0; >> + eval { >> + $obj->{net_ldap_socket}->sockopt( SO_SNDTIMEO, $timeval ) or die >> $!; >> + $obj->{net_ldap_socket}->sockopt( SO_RCVTIMEO, $timeval ) or die >> $!; >> + }; Using low level socket options is almost never the right thing to be doing. using select or poll would be the right solution. Graham.