On Jun 25 22:44, Denis Excoffier wrote:
> On 2014-06-25 12:15, Corinna Vinschen wrote:
> >> Stay tuned.  I'm rewriting the LDAP access code to perform all critical
> >> LDAP calls in interruptible threads.  The Windows LDAP calls don't
> >> provide any kind of synchronization, only timeouts.  I hoped to get away
> >> with short timeouts but it seems I hoped in vain.
> >>
> >> So the next iteration of this code will not use any timeout other than
> >> the default LDAP network timeout of 2 minutes, but the calls will be
> >> interruptible by signals.
> >>
> >
> > No more artificial timeouts, but the LDAP calls will be interruptible by
> > a signal now.
> >
> > Also, if an error occurs during ad enumeration, getpwent/getgrent will
> > return NULL with errno set accordingly.
> >
> > Please test,
> I did. Again, i instrumented ldap.cc by replacing all debug_printf() calls
> with system_printf() because my /usr/bin/strace does not work. Again, i
> tested with 'getent passwd > result' and 'db_enum: all'.
>
> I got the following message:
> [ldap_init] getent 6024 cyg_ldap::connect_non_ssl: ldap_bind(xxxxxx.zzz) 0x51
> and getent stops after the 376000 users in my own domain. No timeout occurred
> but the enumeration was stopped by LDAP_SERVER_DOWN (0x51) [the xxxxxx.zzz
> domain name has been edited here but it was completely new to me, never seen
> before].

You asked for errors being propagated up the chain to the
getpwent/getgrent calls and that's exactly what happens now.

There are a lot of LDAP error codes.  How is Cygwin supposed to handle
every one of them?  Do we need a list of ignorable and non-ignorable
error codes?

Alternatively this gets reverted and Cygwin does *not* break the search
if an error occurs, but instead skips this domain and starts enumerating
the next domain, just as before?

> Also, there was a large delay (more than 2 min, say at least 8 minutes)
> between the end of output and the end of getent. I got one single
> system_printf message (see above).

I can't observe this.  It needs debugging in your environment so I know
which part of the source is responsible for this delay under what
circumstances.

(and I still think it's a crazy idea to enumerate 500K users)

> More than that, i added system_printf("starting open in domain %W", domain)
> immediately at the beginning of cyg_ldap::open, and run 'getent passwd' now
> during one minute (wait 60s, then Control-C). I got 1080 'starting open in
> domain (null)' messages on stderr and 1016 normal passwd entries on stdout.
> The discrepancy 1016 vs 1080 is ok because stdout was not properly flushed
> out.

60 seconds for 1016 user entries?  That sounds incredibly slow.

> It seems that
> - domain is printed as '(null)'? Strange

Not at all.  This indicates the primary domain.

> - there are as many open() calls as passwd entries in the output?

The open function is called for every account, but that doesn't mean
it really needs opening.  That's what the early return is for.  The code
starts like this:

  int
  cyg_ldap::open (PCWSTR domain)
  {
    int ret = 0;

    /* Already open? */
    if (lh)
      return 0;

    if ((ret = connect (domain)) != NO_ERROR)
      goto err;
    [...]

Did you add the system_printf before the "/* Already open? */" comment,
by any chance?

> Also strange
> - EIO (or equivalent) is produced for LDAP_SERVER_DOWN, it probably should be
> better if this were not the case?

See above.

> I suppose it will need more testing, but i'm currently unavailable for tests,
> by the way until Friday 08:00 UTC.

No worries.  Thanks for pulling this through.


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
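
For illustration only, the two options raised in the message (a list of
ignorable versus non-ignorable error codes, or skipping a failing domain
and moving on to the next) could be combined roughly as below.  This is
a standalone sketch, not Cygwin code: classify(), enumerate(),
domain_result and the policy itself are hypothetical.  Only the value
0x51 for LDAP_SERVER_DOWN comes from this thread; the other two numeric
values are what winldap.h is believed to define and should be checked
against the header.

```cpp
#include <cerrno>
#include <string>
#include <vector>

// Stand-ins for winldap.h constants.  0x51 is quoted in the thread; the
// other values are assumptions to be verified against the real header.
constexpr unsigned SK_LDAP_SUCCESS     = 0x00;
constexpr unsigned SK_LDAP_SERVER_DOWN = 0x51;
constexpr unsigned SK_LDAP_TIMEOUT     = 0x55;
constexpr unsigned SK_LDAP_NO_MEMORY   = 0x5a;

enum class action { proceed, skip_domain, abort_enum };

// Hypothetical policy: an unreachable or slow server only affects its
// own domain, anything else unexpected stops the whole enumeration.
action classify (unsigned ldap_err)
{
  switch (ldap_err)
    {
    case SK_LDAP_SUCCESS:
      return action::proceed;
    case SK_LDAP_SERVER_DOWN:
    case SK_LDAP_TIMEOUT:
      return action::skip_domain;
    default:
      return action::abort_enum;
    }
}

// Simulated outcome of connecting to one domain.
struct domain_result
{
  std::string name;
  unsigned ldap_err;
};

// Walks the domains in order.  Returns 0 if the walk completed (possibly
// with skipped domains) or EIO if a non-ignorable error stopped it.
// 'enumerated' receives the number of domains actually enumerated.
int enumerate (const std::vector<domain_result> &domains, int &enumerated)
{
  enumerated = 0;
  for (const auto &d : domains)
    {
      switch (classify (d.ldap_err))
        {
        case action::proceed:
          ++enumerated;
          break;
        case action::skip_domain:
          continue;   // next domain, no error reported to the caller
        case action::abort_enum:
          return EIO; // caller would see getpwent/getgrent fail
        }
    }
  return 0;
}
```

With such a split, an LDAP_SERVER_DOWN from an unknown trusted domain
would no longer surface as EIO, while an out-of-memory condition still
would.  Which codes belong in which group is exactly the open question.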