On Fri, Jan 13, 2012 at 11:38 AM, Robert Schetterer <rob...@schetterer.org> wrote:
> On 13.01.2012 19:29, Mark Moseley wrote:
>> On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen <t...@iki.fi> wrote:
>>> On 13.1.2012, at 4.00, Mark Moseley wrote:
>>>
>>>> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
>>>> server has gone away" errors, despite having multiple hosts defined in
>>>> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
>>>> the same thing with 2.0.16 on Debian Squeeze 64-bit.
>>>>
>>>> E.g.:
>>>>
>>>> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
>>>> MySQL server has gone away
>>>>
>>>> Our mail mysql servers are busy enough that wait_timeout is set to a
>>>> whopping 30 seconds. On my regular boxes, I see a good deal of these
>>>> in the logs. I've been doing a lot of mucking with doveadm/dsync
>>>> (working on maildir->mdbox migration finally, yay!) on test boxes
>>>> (same dovecot package & version) and when I get this error, despite
>>>> the log saying it's retrying, it doesn't seem to be. Instead I get:
>>>>
>>>> dsync(root): Error: user ...: Auth USER lookup failed
>>>
>>> Try with only one host in the "connect" string? My guess: Both the
>>> connections have timed out, and the retrying fails as well (there is only
>>> one retry). Although if the retrying lookup fails, there should be an error
>>> logged about it also (you don't see one?)
>>>
>>> Also another idea to avoid them in the first place:
>>>
>>> service auth-worker {
>>>   idle_kill = 20
>>> }
>>>
>>
>> With just one 'connect' host, it seems to reconnect just fine (using
>> the same tests as above) and I'm not seeing the same error. It worked
>> every time that I tried, with no complaints of "MySQL server has gone
>> away".
>>
>> If there are multiple hosts, it seems like the most robust thing to do
>> would be to exhaust the existing connections and if none of those
>> succeed, then start a new connection to one of them. It will probably
>> result in much more convoluted logic but it'd probably match better
>> what people expect from a retry.
>>
>> Alternatively, since in all my tests, the mysql server has closed the
>> connection prior to this, is the auth worker not recognizing its
>> connection is already half-closed (in which case, it probably
>> shouldn't even consider it a legitimate connection and just
>> automatically reconnect, i.e. try #1, not the retry, which would
>> happen after another failure).
>>
>> I'll give the idle_kill a try too. I kind of like the idea of
>> idle_kill for auth processes anyway, just to free up some connections
>> on the mysql server.
>
> by the way , if you use sql for auth have you tried auth caching ?
>
> http://wiki.dovecot.org/Authentication/Caching
>
> i.e.
>
> # Authentication cache size (e.g. 10M). 0 means it's disabled. Note that
> # bsdauth, PAM and vpopmail require cache_key to be set for caching to be used.
>
> auth_cache_size = 10M
>
> # Time to live for cached data. After TTL expires the cached record is no
> # longer used, *except* if the main database lookup returns internal failure.
> # We also try to handle password changes automatically: If user's previous
> # authentication was successful, but this one wasn't, the cache isn't used.
> # For now this works only with plaintext authentication.
>
> auth_cache_ttl = 1 hour
>
> # TTL for negative hits (user not found, password mismatch).
> # 0 disables caching them completely.
>
> auth_cache_negative_ttl = 0

Yup, we have caching turned on for our production boxes. On this particular box, I'd just shut off caching so that I could work on a script for converting from maildir->mdbox and run it repeatedly on the same mailbox. I got tired of restarting dovecot between each test :)
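
For reference, here is roughly what the test box setup looks like. This is only a sketch that combines the settings already quoted in this thread (Dovecot 2.0.x syntax assumed). The idle_kill value is Timo's suggestion; it should let idle auth workers exit, and drop their MySQL connections, before the 30-second wait_timeout closes them on the server side.

    # Test box only: disable the auth cache (0 means it's disabled)
    auth_cache_size = 0

    # Timo's suggestion: kill auth workers that have been idle for 20 seconds
    service auth-worker {
      idle_kill = 20
    }

On the production boxes, auth_cache_size stays at a non-zero value so caching remains on.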