Hi 

I'm looking for some insight into mdbox index file management and recovery
from corruption.


I have a two-node cluster on NFS with a proxy director in front for user
stickiness. One node (a nominated master) replicates bidirectionally to a
third node at a DR site.

We periodically get index file corruptions that result in rebuilds. However,
the user experience is poor, as messages read or deleted months or years ago
all reappear as unread.

We've seen corruption caused by NFS NTP time sync problems and by the proxy
not being sticky, but also by the DR node being offline for a while and then
triggering corruption in production when it comes back online.

Error message example (one of several):

Error: Corrupted dbox file /mailshare/.. (removed) ../home/mail/storage/m.4 
(around offset=993548): EOF reading msg header (got 0/30 bytes)

I've read all the documentation I can find
(https://wiki2.dovecot.org/MailboxFormat/dbox) and understand that "you must
not lose the dbox index files, they can't be regenerated without data loss."

Questions:
#1 Any additional tips for avoiding mdbox index corruption with dsync? Or
should I revert to the maildir format? I like the performance premise of
mdbox, but these index corruptions are a reliability issue.

#2 I'm guessing read status is one of the metadata items lost, but it seems
it can't be recovered from the dovecot.index.backup files either. Is there
any technique to preserve that item, as it's key to the user experience?

#3 If the index and transaction logs are so critical, is there some kind of
checkpoint backup I can take? Is there a native Dovecot feature, or do I need
to script something?
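
On the scripting option, what I have in mind is roughly the sketch below: a
periodic job that copies the index files into a timestamped snapshot
directory. This is only an illustration, not a Dovecot feature. The function
name and the paths are hypothetical, and I am assuming that the mdbox map
index under storage/ (dovecot.map.index*) would need to be captured along
with the per-mailbox dovecot.index* files. It also takes no locks, so a copy
made while Dovecot is writing may itself be inconsistent.

```python
import shutil
import time
from pathlib import Path

# Assumption: these two patterns cover the per-mailbox index files and the
# mdbox map index. Adjust them if your layout differs.
INDEX_PATTERNS = ("dovecot.index*", "dovecot.map.index*")


def snapshot_indexes(mail_root: Path, backup_root: Path) -> Path:
    """Copy matching index files under mail_root into a timestamped directory.

    The relative directory layout is preserved. backup_root should live
    outside mail_root so that older snapshots are not copied again.
    """
    dest_root = backup_root / time.strftime("%Y%m%d-%H%M%S")
    dest_root.mkdir(parents=True, exist_ok=True)
    for pattern in INDEX_PATTERNS:
        for src in mail_root.rglob(pattern):
            if not src.is_file():
                continue
            dest = dest_root / src.relative_to(mail_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            # copy2 also preserves modification times where possible.
            shutil.copy2(src, dest)
    return dest_root
```

It would be called per user from cron, for example
snapshot_indexes(Path("/path/to/home/mail"), Path("/path/to/index-snapshots")),
with both paths being placeholders. Whether restoring such a snapshot would
actually bring back flags like read status is exactly what I'm unsure about.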

#4 I've noticed that rebuilding the index does not work if the
dovecot.index.log file is lost (I deleted it as a hard test). The
dovecot.index.cache file can be lost, but once the log file is gone, messages
are not recovered from the storage directory automatically (or manually, as
far as I can find).

I've not seen any dovecot.index.log file corruption, but that file seems very
high risk. Is the index rebuilt only from the log file, or by a combined
process that also uses the storage directory?

Is there perhaps an option to just use the transaction log and not the index? 
Although that doesn't sound wise for performance.

#5 In addition to the UNREAD status, we also notice that messages moved to
the trash reappear. Is that expected behavior?

Thanks
Raymond

