On 13.01.2012 19:29, Mark Moseley wrote:
> On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen <t...@iki.fi> wrote:
>> On 13.1.2012, at 4.00, Mark Moseley wrote:
>>
>>> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
>>> server has gone away" errors, despite having multiple hosts defined in
>>> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
>>> the same thing with 2.0.16 on Debian Squeeze 64-bit.
>>>
>>> E.g.:
>>>
>>> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
>>> MySQL server has gone away
>>>
>>> Our mail mysql servers are busy enough that wait_timeout is set to a
>>> whopping 30 seconds. On my regular boxes, I see a good deal of these
>>> in the logs. I've been doing a lot of mucking with doveadm/dsync
>>> (working on maildir->mdbox migration finally, yay!) on test boxes
>>> (same dovecot package & version) and when I get this error, despite
>>> the log saying it's retrying, it doesn't seem to be. Instead I get:
>>>
>>> dsync(root): Error: user ...: Auth USER lookup failed
>>
>> Try with only one host in the "connect" string? My guess: Both the
>> connections have timed out, and the retrying fails as well (there is only
>> one retry). Although if the retrying lookup fails, there should be an error
>> logged about it also (you don't see one?)
>>
>> Also another idea to avoid them in the first place:
>>
>> service auth-worker {
>>   idle_kill = 20
>> }
>>
>
> With just one 'connect' host, it seems to reconnect just fine (using
> the same tests as above) and I'm not seeing the same error. It worked
> every time that I tried, with no complaints of "MySQL server has gone
> away".
>
> If there are multiple hosts, it seems like the most robust thing to do
> would be to exhaust the existing connections and if none of those
> succeed, then start a new connection to one of them. It will probably
> result in much more convoluted logic but it'd probably match better
> what people expect from a retry.
>
> Alternatively, since in all my tests, the mysql server has closed the
> connection prior to this, is the auth worker not recognizing its
> connection is already half-closed (in which case, it probably
> shouldn't even consider it a legitimate connection and just
> automatically reconnect, i.e. try #1, not the retry, which would
> happen after another failure).
>
> I'll give the idle_kill a try too. I kind of like the idea of
> idle_kill for auth processes anyway, just to free up some connections
> on the mysql server.
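
The retry order suggested above (use up the existing connections first, and only then open a new one) could look roughly like the Python sketch below. It only illustrates the idea and is not Dovecot's code: `GoneAway`, `FakeConn` and the `connect` callable are made-up stand-ins, and the sketch assumes it is acceptable to try each configured host in turn rather than just one of them.

```python
class GoneAway(Exception):
    """Hypothetical stand-in for a 'MySQL server has gone away' error."""


class FakeConn:
    """Hypothetical connection object, used only to exercise the sketch."""

    def __init__(self, host, alive):
        self.host = host
        self.alive = alive

    def query(self, sql):
        if not self.alive:
            raise GoneAway(self.host)
        return f"{self.host}: ok"


def query_with_failover(pool, hosts, connect, sql):
    """Try every pooled connection once, then fall back to new connections.

    pool    -- list of existing connections (dead ones are removed in place)
    hosts   -- host names that may be reconnected to
    connect -- callable(host) returning a new connection (assumed interface)
    """
    # Step 1: exhaust the existing connections, dropping any that are dead.
    for conn in list(pool):
        try:
            return conn.query(sql)
        except GoneAway:
            pool.remove(conn)

    # Step 2: nothing in the pool worked, so open a fresh connection,
    # moving on to the next host if one cannot serve the query.
    last_error = None
    for host in hosts:
        try:
            conn = connect(host)
            result = conn.query(sql)
        except GoneAway as exc:
            last_error = exc
            continue
        pool.append(conn)
        return result

    raise last_error or GoneAway("no hosts configured")
```

With two timed-out connections in the pool, a call such as `query_with_failover(pool, ["db1", "db2"], lambda h: FakeConn(h, True), "SELECT 1")` discards both dead connections and answers from a new connection to `db1`, so a single lookup survives every pooled connection having hit wait_timeout.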

By the way, if you use SQL for auth, have you tried auth caching?

http://wiki.dovecot.org/Authentication/Caching

For example:

# Authentication cache size (e.g. 10M). 0 means it's disabled. Note that
# bsdauth, PAM and vpopmail require cache_key to be set for caching to be used.
auth_cache_size = 10M
# Time to live for cached data. After TTL expires the cached record is no
# longer used, *except* if the main database lookup returns internal failure.
# We also try to handle password changes automatically: If user's previous
# authentication was successful, but this one wasn't, the cache isn't used.
# For now this works only with plaintext authentication.
auth_cache_ttl = 1 hour
# TTL for negative hits (user not found, password mismatch).
# 0 disables caching them completely.
auth_cache_negative_ttl = 0

--
Best Regards
Robert Schetterer

Germany/Munich/Bavaria