On Oct 5, 2011, at 14:46 , Jared Johnson wrote:

> eh... didn't mean to send this with no subject :)
> 
>> Our software generates a whole lot of concurrent LDAP traffic right now
>> and we started running into an issue where our child processes would hang
>> forever waiting around for LDAP operations that had apparently hung on the
>> server end.  This would happen with start_tls(), search(), and bind()
>> calls alike.  I noticed that Net::LDAP passes the Timeout parameter passed
>> to new() into the socket object, so I presume that the intention is for
>> operations other than only connecting to time out (e.g. start_SSL()
>> inherits the timeout, so $ldap->start_tls() would presumably time out in
>> the same time that the initial connection would have).  However this was
>> not happening.  We dug around debugged for days (repro was difficult) and
>> found a couple of changes that would make things work a bit better.
>> 
>> First, and maybe least controversially, the root issue we found was that
>> process() winds up doing a can_read() call that potentially blocks
>> forever.  Changing this is pretty simple:
>> 
>> @@ -824,10 +824,11 @@ sub process {
>>  my $ldap = shift;
>>  my $what = shift;
>>  my $sock = $ldap->socket or return LDAP_SERVER_DOWN;
>> +  my $timeout = $sock->timeout;

$sock->timeout is intended to be a connect timeout. Why should read timeout be 
the same.

>>  my $sel = IO::Select->new($sock);
>>  my $ready;
>> 
>> -  for( $ready = 1 ; $ready ; $ready = $sel->can_read(0)) {
>> +  for( $ready = 1 ; $ready ; $ready = $sel->can_read( $timeout )) {

This would not be sufficient to stop the client hanging. ->process is called 
from within ->sync which will keep calling process until it gets the message 
response it is looking for.

Also if this message being waited on was the result of a call to, say, 
->search. The callback for the search should be called with a client side error

>>    my $pdu;
>>    asn_read($sock, $pdu)
>>      or return _drop_conn($ldap, LDAP_OPERATIONS_ERROR, "Communications
>> Error");
>> 
>> 
>> We tested that patch and it helped to mitigate this issue -- although not
>> perfectly, given that multiple process() calls can happen for a given LDAP
>> operation, so the effective timeout can be nudged upward if a lot of
>> process() calls hang for a long time but don't quite reach the timeout.
>> Still, it's better than nothing.  If one were to be a bit pedantic, one
>> could set some hash key for the life of an external call to indicate how
>> long we've been waiting, and then in process(), do something like:
>> 
>> ( $ldap->{waited} < $timeout ? $sel->can_read( $timeout - $ldap->{waited}
>> ) : () )
>> 
>> That would probably require more robust testing though.
>> 
>> Something that actually worked a bit better for us, but which it seems may
>> be less portable, was to set SO_SNDTIMEO and SO_RCVTIMEO on the underlying
>> socket right after establishing the connection:
>> 
>> @@ -117,6 +117,15 @@ sub new {
>> 
>>   return undef unless $obj->{net_ldap_socket};
>> 
>> +  if ( $arg->{timeout} and IO::Socket->can('SO_SNDTIMEO') ) {
>> +    my $timeval = pack 'L!L!', $ldap_opts->{timeout}, 0;
>> +    eval {
>> +      $obj->{net_ldap_socket}->sockopt( SO_SNDTIMEO, $timeval ) or die
>> $!;
>> +      $obj->{net_ldap_socket}->sockopt( SO_RCVTIMEO, $timeval ) or die
>> $!;
>> +    };

Using low level socket options is almost never the right thing to be doing. 
using select or poll would be the right solution.

Graham.

Reply via email to