Re: timeout in LDAP access

Corinna Vinschen Wed, 25 Jun 2014 14:15:07 -0700

On Jun 25 22:44, Denis Excoffier wrote:
> On 2014-06-25 12:15, Corinna Vinschen wrote:
> >> Stay tuned.  I'm rewriting the LDAP access code to perform all critical
> >> LDAP calls in interruptible threads.  The Windows LDAP calls don't
> >> provide any kind of synchronization, only timeouts.  I hoped to get away
> >> with short timeouts but it seems I hoped in vain.
> >> 
> >> So the next iteration of this code will not use any timeout other than
> >> the default LDAP network timeout of 2 minutes, but the calls will be
> >> interruptible by signals.
> >> 
> > 
> > No more artificial timeouts, but the LDAP calls will be interruptible by
> > a signal now.
> > 
> > Also, if an error occurs during ad enumeration, getpwent/getgrent will
> > return NULL with errno set accordingly.
> > 
> > Please test,
> I did. Again, i instrumented ldap.cc by replacing all debug_printf() calls
> with system_printf() because my /usr/bin/strace does not work. Again, i
> tested with ‘getent passwd > result’ and ‘db_enum: all’.
> 
> I got the following message:
> [ldap_init] getent 6024 cyg_ldap::connect_non_ssl: ldap_bind(xxxxxx.zzz) 0x51
> and getent stops after the 376000 users in my own domain. No timeout occurred
> but the enumeration was stopped by LDAP_SERVER_DOWN (0x51) [the xxxxxx.zzz
> domain name has been edited here but it was completely new to me, never seen
> before].


You asked for errors being propagated up the chain to the
getpwent/getgrent calls and that's exactly what happens now.  There are
a lot of LDAP error codes.  How is Cygwin supposed to handle every one
of them?  Do we need a list of ignorable and non-ignorable error codes?

Alternatively this gets reverted and Cygwin does *not* break the search
if an error occurs, but instead skips this domain and starts enumerating
the next domain, just as before?
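
To make the two options concrete, here is a minimal sketch of what a
"list of ignorable and non-ignorable error codes" could look like.  This
is not Cygwin code.  The names classify, enumerate_domains, Action and
EnumResult are hypothetical, and the choice of which codes count as
ignorable is purely an assumption for illustration.  The numeric values
are the standard LDAP result codes (0x51 is the LDAP_SERVER_DOWN from the
report above), defined locally so the sketch builds without winldap.h.

```cpp
#include <cassert>
#include <cerrno>
#include <vector>

// Standard LDAP result code values, redefined here for a standalone sketch.
constexpr unsigned long kLdapSuccess    = 0x00;
constexpr unsigned long kLdapServerDown = 0x51;  // LDAP_SERVER_DOWN
constexpr unsigned long kLdapTimeout    = 0x55;  // LDAP_TIMEOUT
constexpr unsigned long kLdapNoMemory   = 0x5a;  // LDAP_NO_MEMORY

enum class Action { Continue, SkipDomain, Abort };

// Hypothetical policy: which errors only affect the current domain
// (skip it) and which should stop the whole enumeration.
// The split below is an assumption, not a recommendation.
Action classify(unsigned long ldap_err) {
  switch (ldap_err) {
    case kLdapSuccess:
      return Action::Continue;
    case kLdapServerDown:
    case kLdapTimeout:
      return Action::SkipDomain;   // unreachable domain: try the next one
    default:
      return Action::Abort;        // e.g. kLdapNoMemory: give up
  }
}

struct EnumResult {
  int domains_enumerated;  // domains that connected successfully
  int saved_errno;         // 0, or EIO if the enumeration was aborted
};

// Walks a list of per-domain connect results, as a stand-in for the
// real enumeration loop.
EnumResult enumerate_domains(const std::vector<unsigned long>& results) {
  EnumResult r{0, 0};
  for (unsigned long err : results) {
    Action a = classify(err);
    if (a == Action::Continue) {
      ++r.domains_enumerated;
    } else if (a == Action::Abort) {
      r.saved_errno = EIO;
      return r;
    }
    // Action::SkipDomain: fall through to the next domain.
  }
  return r;
}

// Usage example: one unreachable domain is skipped, the rest are kept.
void example() {
  std::vector<unsigned long> results = {kLdapSuccess, kLdapServerDown,
                                        kLdapSuccess};
  EnumResult r = enumerate_domains(results);
  assert(r.domains_enumerated == 2);
  assert(r.saved_errno == 0);
}
```

With such a table, the "skip this domain" behaviour and the "propagate
the error" behaviour become two entries in the same policy rather than
an either/or decision.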

> Also, there was a large delay (more than 2 min, say at least 8 minutes)
> between the end of output and the end of getent. I got one single
> system_printf message (see above).

I can't observe this.  It needs debugging in your environment so I know
which part of the source is responsible for this delay under what
circumstances.

(and I still think it's a crazy idea to enumerate 500K users)

> More than that, i added system_printf("starting open in domain %W", domain)
> immediately at the beginning of cyg_ldap::open, and run ‘getent passwd’ now
> during one minute (wait 60s, then Control-C). I got 1080 ‘starting open in
> domain (null)’ messages on stderr and 1016 normal passwd entries on stdout.
> The discrepancy 1016 vs 1080 is ok because stdout was not properly flushed out.

60 seconds for 1016 user entries?  That sounds incredibly slow.

> It seems that
> - domain is printed as ‘(null)’? Strange

Not at all.  This indicates the primary domain.

> - there are as many open() calls as passwd entries in the output?

The open function is called for every account, but that doesn't mean it
really needs opening.  That's what the early return is for.  The code
starts like this:

int
cyg_ldap::open (PCWSTR domain)
{
  int ret = 0;

  /* Already open? */
  if (lh)
    return 0;

  if ((ret = connect (domain)) != NO_ERROR)
    goto err;
  [...]

Did you add the system_printf before the "/* Already open? */" comment,
by any chance?
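
The effect of where the trace call sits can be shown with a small
standalone sketch.  FakeLdap is a hypothetical stand-in for cyg_ldap, its
`connected` flag plays the role of `lh`, and the two counters stand for a
system_printf placed before and after the "Already open?" check.  None
of this is the actual ldap.cc code.

```cpp
#include <cassert>

// Hypothetical mock of the early-return pattern in cyg_ldap::open.
struct FakeLdap {
  bool connected = false;  // stands in for the `lh` handle
  int calls_traced = 0;    // trace placed BEFORE the "already open" check
  int connects_traced = 0; // trace placed AFTER the check

  int open() {
    ++calls_traced;        // fires once per account lookup
    if (connected)         // already open: nothing to do
      return 0;
    ++connects_traced;     // fires only when a connection is really made
    connected = true;
    return 0;
  }
};

// Simulates one open() call per enumerated account and reports how many
// real connection attempts happened.
int count_real_connects(int accounts) {
  assert(accounts >= 0);
  FakeLdap ldap;
  for (int i = 0; i < accounts; ++i)
    ldap.open();
  return ldap.connects_traced;
}
```

In this model, a trace before the check produces one message per
account, while a trace after the check produces a single message for the
whole enumeration of one domain.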

> Also strange
> - EIO (or equivalent) is produced for LDAP_SERVER_DOWN, it probably should be
>   better if this were not the case?

See above.

> I suppose it will need more testing, but i’m currently unavailable for tests,
> by the way until Friday 08:00 UTC.

No worries.  Thanks for pulling this through.


Corinna
-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
