On 06-02-2012 22:47, Timo Sirainen wrote:
On 3.2.2012, at 16.16, Mark Zealey wrote:
I was doing some testing on sdbox yesterday. Basically I did the following
procedure:
1) Create new sdbox; deliver 2 messages into it (u.1, u.2)
2) Create a copy of the index file (no cache file created yet)
3) deliver another message to the mailbox (u.3)
4) copy back index file from stage (2)
5) deliver new mail
Then the message delivered in step 3 (u.3) gets replaced by the message
delivered in step 5, which is also named u.3.
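The procedure above can be mimicked with a toy model. This is only a sketch under a strong assumption: that the next UID is taken from the index alone and the u.N file is then written without checking whether it already exists. The `toy.index` file and the function names are made up for illustration; they are not Dovecot's actual code or file format.

```python
import os
import shutil
import tempfile

INDEX_NAME = "toy.index"  # hypothetical stand-in, not a real Dovecot file


def deliver(mailbox, body):
    """Write body to u.<uid>, trusting the index alone for the next uid."""
    index_path = os.path.join(mailbox, INDEX_NAME)
    with open(index_path) as f:
        uid = int(f.read())
    # Assumption: no check whether u.<uid> already exists on disk.
    with open(os.path.join(mailbox, f"u.{uid}"), "w") as f:
        f.write(body)
    with open(index_path, "w") as f:
        f.write(str(uid + 1))
    return uid


def reproduce():
    """Replay steps 1-5 and return (uid, content) of the last delivery."""
    mailbox = tempfile.mkdtemp()
    try:
        index_path = os.path.join(mailbox, INDEX_NAME)
        with open(index_path, "w") as f:
            f.write("1")                      # empty mailbox, next uid is 1
        deliver(mailbox, "first")             # step 1: u.1
        deliver(mailbox, "second")            # step 1: u.2
        stale_copy = index_path + ".stale"
        shutil.copy(index_path, stale_copy)   # step 2: save the index
        deliver(mailbox, "third")             # step 3: u.3
        shutil.copy(stale_copy, index_path)   # step 4: restore the old index
        uid = deliver(mailbox, "fourth")      # step 5: new mail
        with open(os.path.join(mailbox, f"u.{uid}")) as f:
            return uid, f.read()
    finally:
        shutil.rmtree(mailbox)
```

In this model `reproduce()` returns uid 3 with the body of the last delivery, i.e. the mail from step 3 has been silently replaced.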
http://hg.dovecot.org/dovecot-2.1/rev/a765e0a895a9 fixes this.
I've not actually tried this patch yet, but looking at it, it is perhaps
useful for the situation I described below when the index is corrupt. In
the case I am describing here, however, the index is NOT corrupt; it is
simply an older version (i.e. it only knows about the first 2 mails in the
directory, not the 3rd). This could happen for example when mails are
being stored on different storage than indexes; say for example you have
2 servers with remote NFS stored mails but local indexes that rsync
between the servers every hour. You manually fail over one server to the
other and you then have a copy of the correct indexes but only from an
hour ago. The mails are all there on the shared storage, but because the
indexes are out of date, a newly delivered message will silently
overwrite an existing one.
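One way to picture a guard against a stale index is to compare the index's idea of the next UID with the u.N files actually present in the mailbox directory. The sketch below is an assumption about how such a check could look, not a description of what the linked changeset does; `highest_uid_on_disk` and `safe_next_uid` are hypothetical names.

```python
import os
import re
import shutil
import tempfile

MAIL_FILE = re.compile(r"^u\.(\d+)$")


def highest_uid_on_disk(mailbox):
    """Return the largest N among files named u.N, or 0 if there are none."""
    uids = []
    for name in os.listdir(mailbox):
        match = MAIL_FILE.match(name)
        if match:
            uids.append(int(match.group(1)))
    return max(uids, default=0)


def safe_next_uid(mailbox, index_next_uid):
    """Never hand out a uid that already has a file on disk."""
    return max(index_next_uid, highest_uid_on_disk(mailbox) + 1)


def demo():
    """Three mails on disk, but a stale index that claims the next uid is 3."""
    mailbox = tempfile.mkdtemp()
    try:
        for uid in (1, 2, 3):
            with open(os.path.join(mailbox, f"u.{uid}"), "w") as f:
                f.write(f"mail {uid}")
        return safe_next_uid(mailbox, 3)
    finally:
        shutil.rmtree(mailbox)
```

Here `demo()` yields 4 rather than 3, so the existing u.3 is left alone. The obvious cost is a directory listing on delivery, which matters on NFS.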
(speaking of which, it would be great if force-resync also rebuilt the cache
files if there are valid cache files around, rather than just doing away with
them)
Well, ideally there shouldn't be so much corruption that this matters..
That's true, but in our experience we usually get corruption in batches
rather than a one-off occurrence. Our most common case is something like
this: Say for example there's an issue with the NFS server (assuming we
are storing indexes on there as well now) and so we have to killall -9
dovecot processes or similar. In that case you get a number of corrupted
indexes on the server. Rebuilding the indexes generates an IO storm (say
via lmtp or a pop3 access); then the clients log in via imap and we have
to re-read all the messages to generate the cache files which is a
second IO storm. If the caches were rebuilt at least semi-intelligently
(i.e. you could extract from the old cache files a list of what had
previously been cached), that would reduce the impact of rare
storage-level issues such as this.
Mark