On Fri, Jan 13, 2012 at 11:38 AM, Robert Schetterer
<rob...@schetterer.org> wrote:
> On 13.01.2012 19:29, Mark Moseley wrote:
>> On Fri, Jan 13, 2012 at 1:36 AM, Timo Sirainen <t...@iki.fi> wrote:
>>> On 13.1.2012, at 4.00, Mark Moseley wrote:
>>>
>>>> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL
>>>> server has gone away" errors, despite having multiple hosts defined in
>>>> my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing
>>>> the same thing with 2.0.16 on Debian Squeeze 64-bit.
>>>>
>>>> E.g.:
>>>>
>>>> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying:
>>>> MySQL server has gone away
>>>>
>>>> Our mail mysql servers are busy enough that wait_timeout is set to a
>>>> whopping 30 seconds. On my regular boxes, I see a good deal of these
>>>> in the logs. I've been doing a lot of mucking with doveadm/dsync
>>>> (working on maildir->mdbox migration finally, yay!) on test boxes
>>>> (same dovecot package & version) and when I get this error, despite
>>>> the log saying it's retrying, it doesn't seem to be. Instead I get:
>>>>
>>>> dsync(root): Error: user ...: Auth USER lookup failed
>>>
>>> Try with only one host in the "connect" string? My guess: Both the 
>>> connections have timed out, and the retrying fails as well (there is only 
>>> one retry). Although if the retrying lookup fails, there should be an error 
>>> logged about it also (you don't see one?)
>>>
>>> Also another idea to avoid them in the first place:
>>>
>>> service auth-worker {
>>>  idle_kill = 20
>>> }
>>>
>>
>> With just one 'connect' host, it seems to reconnect just fine (using
>> the same tests as above) and I'm not seeing the same error. It worked
>> every time that I tried, with no complaints of "MySQL server has gone
>> away".
>>
>> If there are multiple hosts, it seems like the most robust thing to do
>> would be to exhaust the existing connections and if none of those
>> succeed, then start a new connection to one of them. It will probably
>> result in much more convoluted logic but it'd probably match better
>> what people expect from a retry.
>>
>> Alternatively, since in all my tests the mysql server has closed the
>> connection prior to this, is the auth worker not recognizing that its
>> connection is already half-closed? In that case, it probably
>> shouldn't even consider it a legitimate connection and should just
>> automatically reconnect (i.e. as try #1, not as the retry, which would
>> happen after another failure).
>>
>> I'll give the idle_kill a try too. I kind of like the idea of
>> idle_kill for auth processes anyway, just to free up some connections
>> on the mysql server.
>
> By the way, if you use SQL for auth, have you tried auth caching?
>
> http://wiki.dovecot.org/Authentication/Caching
>
> i.e.
>
> # Authentication cache size (e.g. 10M). 0 means it's disabled. Note that
> # bsdauth, PAM and vpopmail require cache_key to be set for caching to
> # be used.
>
> auth_cache_size = 10M
>
> # Time to live for cached data. After TTL expires the cached record is no
> # longer used, *except* if the main database lookup returns internal
> # failure.
> # We also try to handle password changes automatically: If user's previous
> # authentication was successful, but this one wasn't, the cache isn't used.
> # For now this works only with plaintext authentication.
>
> auth_cache_ttl = 1 hour
>
> # TTL for negative hits (user not found, password mismatch).
> # 0 disables caching them completely.
>
> auth_cache_negative_ttl = 0


Yup, we have caching turned on for our production boxes. On this
particular box, I'd just shut off caching so that I could work on a
script for converting from maildir->mdbox and run it repeatedly on the
same mailbox. I got tired of restarting dovecot between each test :)
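
For reference, the retry order suggested earlier in the thread (exhaust
the existing connections first, and only then open a new connection to
one of the hosts) could be sketched roughly as below. This is a
hypothetical Python illustration, not Dovecot's actual code: the names
ConnectionGone, FakeConn and query_with_failover are made up for the
example, and FakeConn only simulates a connection that the server has
already closed.

```python
class ConnectionGone(Exception):
    """Stands in for an error such as 'MySQL server has gone away'."""


class FakeConn:
    """Test double (hypothetical): a connection that is alive or timed out."""

    def __init__(self, host, alive=True):
        self.host = host
        self.alive = alive

    def query(self, sql):
        if not self.alive:
            raise ConnectionGone(self.host)
        return f"{self.host}: ok"


def query_with_failover(connections, connect, hosts, sql):
    """Try every existing connection once, then fall back to new ones.

    connections: list of already-open connections (some may be stale)
    connect:     callable taking a host and returning a new connection
    hosts:       hosts listed in the 'connect' string
    """
    # Pass 1: exhaust the connections we already hold.
    for conn in list(connections):
        try:
            return conn.query(sql)
        except ConnectionGone:
            connections.remove(conn)  # drop the stale connection

    # Pass 2: nothing usable is left, so open fresh connections in turn.
    last_error = None
    for host in hosts:
        try:
            conn = connect(host)
            result = conn.query(sql)
        except ConnectionGone as exc:
            last_error = exc
            continue
        connections.append(conn)  # keep the working connection for reuse
        return result

    raise ConnectionGone("all hosts failed") from last_error
```

With two stale connections in the list, both are discarded in the first
pass and the lookup only fails if a freshly opened connection to every
host fails as well, which is closer to what one would expect from a
retry.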
