On Sun, 2010-09-12 at 08:46 +0200, Vincent Danjean wrote:
> I've a ldap client that is not my dns server and that get its IP (and
> gateway and DNS server) with DHCP. When nslcd is started and a first
> request to nslcd is done before /etc/resolv.conf is correctly filled,
> then this request fails (normal) but also any future requests done
> (even after /etc/resolv.conf is correct).

First of all, it is recommended to use an IP address for your LDAP
server, or at least a name that can be resolved locally (for example
through /etc/hosts). Otherwise, if your DNS server is unavailable your
LDAP server will also be unavailable.
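
As a rough way to check which case a given setup is in, something like
the following could be used. It is only a sketch: the function names
are made up for this mail, it only recognises IPv4 literals (not
bracketed IPv6 ones), and it only looks at /etc/hosts rather than at
everything NSS might consult.

```shell
#!/bin/sh
# Print the host part of an LDAP URI such as ldap://host:389/.
ldap_uri_host() {
    rest=${1#*://}                 # strip the scheme
    rest=${rest%%/*}               # strip any path
    printf '%s\n' "${rest%%:*}"    # strip an optional :port
}

# Succeed if the argument looks like a dotted-quad IPv4 address
# (the numeric range of each field is not checked).
is_ipv4_literal() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Succeed if the name ($1) is listed in a hosts file ($2, default /etc/hosts).
in_hosts_file() {
    awk -v h="$1" '
        /^[ \t]*#/ { next }        # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break   # rest of the line is a comment
                if ($i == h) found = 1
            }
        }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}
```

For example:

    host=$(ldap_uri_host "ldap://ldap.danjean.fr/")
    is_ipv4_literal "$host" || in_hosts_file "$host" ||
        echo "warning: $host depends on DNS being available"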

> Step to reproduce on my system:
> ifdown eth0 ; sleep 2 ; (sleep 10 ; ifup eth0 ) & (sleep 5 ; id vdanjean ) & 
> nslcd -d
> 
> In this case, command "id vdanjean" gives:
> aya:~# id vdanjean
> id: vdanjean : utilisateur inexistant
> aya:~# id vdanjean
> uid=2001 gid=2001(vdanjean) 
> groupes=4294967295,4(adm),20(dialout),24(cdrom),25(floppy),29(audio),44(video),46(plugdev),100(users),122(kvm),116(libvirt),125(freevo),10000(photos),3000
> aya:~# 
> [when I got the correct answer, it is after several seconds]

When nslcd finds that the LDAP server is unavailable it first does a
number of retries (once every second). Once nslcd has determined that
the LDAP server has been unavailable for 10 seconds it will only retry
once every 10 seconds. This mechanism is in place to avoid getting the
whole system locked up while retrying connections to the LDAP server.
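
To illustrate the policy (this is not the actual nslcd code, just a
shell rendering of the description above; the function and parameter
names are invented, and the exact behaviour at the 10 second boundary
is an assumption):

```shell
#!/bin/sh
# Print how long to wait before the next connection attempt.
# $1: seconds the server has been seen as unavailable so far
# $2: delay between the early retries (default 1)
# $3: time after which retries happen only once per period (default 10)
next_retry_delay() {
    down_for=$1
    short_delay=${2:-1}
    retry_period=${3:-10}
    if [ "$down_for" -lt "$retry_period" ]; then
        echo "$short_delay"
    else
        echo "$retry_period"
    fi
}
```

So a request that comes in 5 seconds after the first failure is retried
after a second, while one that comes in after 25 seconds may have to
wait for the next 10 second slot.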

> Getting the good answer or the bad one depends on which thread/process
> (I do not know precisely how nslcd works) handles the request. If this
> is a thread launch before /etc/resolv.conf is correct, I got in the
> log:

The availability of the LDAP server is shared between the threads but
each thread (there are 5 by default) has its own connection.

> =========
> nslcd: [1bd7b7] DEBUG: ldap_initialize(ldap://ldap.danjean.fr/)
[...]
> nslcd: [1bd7b7] DEBUG: ldap_simple_bind_s(NULL,NULL) (uri="ldap://ldap.danjean.fr/")
> nslcd: [1bd7b7] failed to bind to LDAP server ldap://ldap.danjean.fr/: Can't contact LDAP server: No such file or directory
> nslcd: [1bd7b7] no available LDAP server found
> =========
> This is repeated several times.

The "No such file or directory" part is a bit weird. I can only
reproduce this if there is no /etc/resolv.conf at all. You should also
first get a couple of lines saying "no available LDAP server found,
sleeping 1 seconds".

> When I got an answer, I have the same kind of log, but I also have other
> threadsloging successful ldap requests such as:
> ==========
> nslcd: [8c895d] DEBUG: connection from pid=7998 uid=0 gid=0
> nslcd: [8c895d] DEBUG: nslcd_group_bygid(2001)
> nslcd: [8c895d] DEBUG: myldap_search(base="dc=danjean,dc=fr", 
> filter="(&(objectClass=posixGroup)(gidNumber=2001))")
> nslcd: [8c895d] DEBUG: ldap_result(): end of results
> ==========

This happens if there is already a working connection. When using a
hostname instead of an IP address you are also dependent on what nscd
returns (if you're using that). It may be that nscd also caches negative
host name lookups.

> My guess is that, when a thread fails to resolve a name with the DNS
> due to a bad /etc/resolv.conf file, something is cached and latter
> ldap_simple_bind_s still fail.

This is more or less correct. If nslcd has determined that the LDAP
server has been unavailable for more than 10 seconds it will "cache"
that state for 10 seconds.

> The correct fix for this bug would be to find where the info is cached
> and discard it in case of a failed connection.

The whole point of having that information cached is to not have the
whole system hang if the LDAP server becomes unavailable. You could
increase the reconnect_retrytime option in nslcd.conf if you think that
the period should be longer.
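
If you want to script that change, a small helper along these lines
would do. It is a sketch only: set_conf_option is a made-up name, the
value 30 below is arbitrary, and it assumes the simple "option value"
line format of nslcd.conf.

```shell
#!/bin/sh
# Set "option value" in an nslcd.conf-style file: replace the existing
# line for that option, or append one if there is none.
# $1: file, $2: option name, $3: new value
set_conf_option() {
    file=$1
    option=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Write through cat so the original file keeps its owner and mode.
    awk -v opt="$option" -v val="$value" '
        $1 == opt { if (!done) print opt, val; done = 1; next }
        { print }
        END { if (!done) print opt, val }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

As root, that would be used as

    set_conf_option /etc/nslcd.conf reconnect_retrytime 30

followed by a restart of nslcd so that it rereads its configuration.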

> However, this is perhaps to intrusive for squeeze. For squeeze, you
> should be able to, at least, put a script
> in /etc/resolvconf/update-libc.d to restart nslcd when dns changes.

Do you think /etc/resolvconf/update-libc.d is the best place? What
about /etc/network/if-up.d? That should also catch the case where the
network goes up. Then again, the init script should only have been
started after hostname lookups are available ($remote_fs, which implies
working networking, and $named, which implies working hostname lookups).

> Something as simple as:
> if [ -x /etc/init.d/nslcd ]; then
>     /etc/init.d/nslcd restart
> fi

I'm not sure it's that simple. First, you also need to check whether
nslcd is running in the first place. Also, the update-libc.d scripts
are likely run during shutdown as well (not sure about this though),
and I'm not sure you want to restart nslcd then.
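
A hook that takes both concerns into account could look roughly like
this. Treat it as an untested sketch: the pid file location is what I
would expect on a Debian system but is an assumption here, the helper
names are invented, and it relies on the sysvinit runlevel command to
detect halt (0) and reboot (6).

```shell
#!/bin/sh
# Succeed if the pid file ($1, assumed default below) names a live process.
nslcd_is_running() {
    pidfile=${1:-/var/run/nslcd/nslcd.pid}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Succeed if the runlevel output in $1 (e.g. "N 2" or "2 6") says that
# the system is halting or rebooting.
is_shutdown_runlevel() {
    case ${1##* } in
        0|6) return 0 ;;
        *) return 1 ;;
    esac
}

# Restart nslcd only if it is installed, running, and we are not
# in the middle of a shutdown.
maybe_restart_nslcd() {
    [ -x /etc/init.d/nslcd ] || return 0
    if is_shutdown_runlevel "$(runlevel 2>/dev/null)"; then
        return 0
    fi
    nslcd_is_running || return 0
    /etc/init.d/nslcd restart
}
```

The hook script itself would then only need to call
maybe_restart_nslcd. Note that a restart drops whatever connections
the threads currently have, so this is still a fairly blunt tool.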

In short, I don't see a good solution (short term or otherwise) for
this problem. There is an idea that the behaviour of nslcd should be
different during booting (e.g. immediately start testing whether the
LDAP server is available and only become available for the NSS module
when the LDAP server is determined to be up) but it has not been
implemented at this point.
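
Until something like that exists inside nslcd, the closest thing from
the outside is probably to wait for the name to resolve before starting
the daemon. The following is only an approximation of the idea (it
checks name resolution, not the LDAP server itself), and wait_until is
a helper invented for this example:

```shell
#!/bin/sh
# Run a command once per second until it succeeds or the timeout expires.
# $1: timeout in seconds; remaining arguments: the command to try
wait_until() {
    remaining=$1
    shift
    while ! "$@" >/dev/null 2>&1; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example:

    wait_until 30 getent hosts ldap.danjean.fr && /etc/init.d/nslcd start

The 30 second limit is arbitrary, and blocking the boot sequence like
this has its own drawbacks.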

Anyway, thanks for the bug report. I'll see what I can do about this but
I'm not sure if I can get this into squeeze.

-- 
-- arthur - [email protected] - http://people.debian.org/~adejong --
