I believe I tracked it down to a couple of lines in db_ldap_bind and
fixed it. dovecot-auth now reconnects to LDAP in the case where it
previously did not:
Thanks. http://hg.dovecot.org/dovecot-1.0/rev/8dcc215fbc06
Timo, I think I found another spot where it won't reconnect.
When ldap_conn_reconnect was called, it did not fully reconnect, and
the requests in conn->delayed_requests_tail were never processed. When
I changed the code to force a connection close at the start of
ldap_conn_reconnect, it reconnected successfully. This does cause auth
failures while LDAP is disconnected (which, from my limited
understanding of the code, does not appear to be the original intent),
but it lets the system recover gracefully. You might be able to come up
with a better way to handle this (my C is weak).
Here's a patch that incorporates that one small change and the previous
one as well:
--- dovecot-1.0.3/src/auth/db-ldap.c.orig 2007-12-19 22:01:46.622328000 +0000
+++ dovecot-1.0.3/src/auth/db-ldap.c 2007-12-19 22:03:08.145721000 +0000
@@ -294,7 +294,7 @@
 static void ldap_conn_reconnect(struct ldap_connection *conn)
 {
-	ldap_conn_close(conn, FALSE);
+	ldap_conn_close(conn, TRUE);
 	if (db_ldap_connect(conn) < 0) {
 		/* failed to reconnect. fail all requests. */
@@ -446,7 +446,10 @@
 	msgid = ldap_bind(conn->ld, conn->set.dn, conn->set.dnpass,
 			  LDAP_AUTH_SIMPLE);
 	if (msgid == -1) {
-		db_ldap_connect_finish(conn, ldap_get_errno(conn));
+		if (db_ldap_connect_finish(conn, ldap_get_errno(conn)) < 0) {
+			/* lost connection, close it */
+			ldap_conn_close(conn, TRUE);
+		}
 		i_free(ldap_request);
 		return -1;
 	}
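
To make the trade-off in the first hunk easier to see, here is a small toy model of what I think the flush flag changes. This is not Dovecot code: struct toy_conn and the toy_* functions are made up for illustration, and I am assuming that closing with the flag set fails every pending request right away instead of leaving it queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an LDAP connection (hypothetical, not Dovecot's
   struct ldap_connection). */
struct toy_conn {
	bool connected;
	bool server_up;	/* whether a connect attempt would succeed */
	int pending;	/* requests still waiting for a reply */
	int failed;	/* requests failed back to their callers */
};

/* Close the connection. With flush_requests set, every pending request
   is failed immediately, so nothing stays queued on a dead connection. */
void toy_conn_close(struct toy_conn *conn, bool flush_requests)
{
	assert(conn != NULL);
	if (flush_requests) {
		conn->failed += conn->pending;
		conn->pending = 0;
	}
	conn->connected = false;
}

/* Try to connect; returns 0 on success, -1 if the server is down. */
int toy_connect(struct toy_conn *conn)
{
	assert(conn != NULL);
	if (!conn->server_up)
		return -1;
	conn->connected = true;
	return 0;
}

/* Reconnect the way the patch does: flush first, then connect again. */
int toy_conn_reconnect(struct toy_conn *conn)
{
	toy_conn_close(conn, true);
	return toy_connect(conn);
}
```

In this model a reconnect with three pending requests ends with all three failed and a fresh connection, which matches what I see: some auth failures during the outage, but no requests stuck forever.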