I've been planning to add these for years; maybe it's finally about time. I 
guess they could already be added in v2.2, but enabled only by a new setting, 
because they require file format changes that older Dovecots can't read. I 
could probably also patch v2.1 so that it can at least read the new format 
without failing. In v2.3 the new format could then become the default.

And what exactly should the checksums be? Would the standard CRC32 and CRC8 be 
good enough, or are there better ones?
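For reference, both standard algorithms are short. Below is a minimal bitwise sketch in C of CRC-32 (the IEEE 802.3 variant used by zlib, gzip and PNG) and the common CRC-8 with polynomial 0x07. The function names are made up for illustration; a real implementation would more likely use a table-driven CRC-32 for speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (IEEE 802.3): reflected polynomial 0xEDB88320,
   initial value and final XOR 0xFFFFFFFF. Bitwise for clarity. */
static uint32_t crc32_ieee(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint32_t crc = 0xffffffffU;

    for (size_t i = 0; i < size; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (crc >> 1) ^ 0xedb88320U;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xffffffffU;
}

/* Common CRC-8 (sometimes called CRC-8/SMBUS): polynomial 0x07,
   MSB-first, initial value 0, no final XOR. */
static uint8_t crc8_smbus(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint8_t crc = 0;

    for (size_t i = 0; i < size; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80)
                crc = (uint8_t)((crc << 1) ^ 0x07);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}
```

Both can be sanity-checked against the usual "123456789" test vector: CRC-32 gives 0xCBF43926 and this CRC-8 gives 0xF4. If speed turns out to matter, CRC-32C (the Castagnoli polynomial) might be worth considering instead, since recent x86 and ARM CPUs have instructions for it.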

1. dovecot.index

v2.1+ only ever recreates this file in full; it never overwrites data in it. So 
the checksums only need to be written when dovecot.index is being recreated. 
There are 3 possible things to checksum:

 - header (32bit checksum)
 - all of the mail records (32bit checksum)
 - each mail record independently (8bit checksum per mail)

The header's checksum could be verified every time the index is opened. The 
full mail record checksum could be verified when something appears to be wrong, 
but it's probably a waste of time to check it in normal operation.
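A minimal sketch of the header and full-records checksums, assuming a hypothetical header layout (`struct index_hdr_sketch` and all function names here are illustrative, not the real `struct mail_index_header`). The header checksum is computed with its own field treated as zero, and the records checksum is stored in the header so that the cheap header check also protects it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header fields for illustration only; NOT the real
   struct mail_index_header. */
struct index_hdr_sketch {
    uint32_t base_header_size;
    uint32_t messages_count;
    uint32_t records_crc32;  /* CRC-32 over all mail records */
    uint32_t hdr_crc32;      /* CRC-32 over this header, with this field as 0 */
};

/* Standard bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint32_t crc = 0xffffffffU;

    for (size_t i = 0; i < size; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xedb88320U : crc >> 1;
    }
    return crc ^ 0xffffffffU;
}

static uint32_t index_hdr_checksum(const struct index_hdr_sketch *hdr)
{
    struct index_hdr_sketch tmp = *hdr;

    tmp.hdr_crc32 = 0;  /* the checksum field itself isn't covered */
    return crc32_ieee(&tmp, sizeof(tmp));
}

/* Called once while the new dovecot.index is being written. The records
   checksum is filled in first so the header checksum covers it too. */
static void index_finish_write(struct index_hdr_sketch *hdr,
                               const void *records, size_t records_size)
{
    hdr->records_crc32 = crc32_ieee(records, records_size);
    hdr->hdr_crc32 = index_hdr_checksum(hdr);
}

/* Cheap check, done every time the index is opened. */
static bool index_hdr_verify(const struct index_hdr_sketch *hdr)
{
    return hdr->hdr_crc32 == index_hdr_checksum(hdr);
}

/* Expensive check, only worth doing when something already looks wrong. */
static bool index_records_verify(const struct index_hdr_sketch *hdr,
                                 const void *records, size_t records_size)
{
    return hdr->records_crc32 == crc32_ieee(records, records_size);
}
```

Since the file is never overwritten after being written, the records checksum stays valid for the on-disk copy even though the in-memory records keep changing.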

I'm not really sure about the per-mail checksums. It would be easy to create 
them while dovecot.index is being created, but after reading the file into 
memory the records are updated in many ways in many places. It's probably not 
worth the complexity and extra slowness to verify and/or update the checksums 
in all the different places. So is it worth it to even have them? In error 
conditions when fixing up indexes it could be useful to skip over records with 
broken checksums (and check if the mail is in dovecot.index.backup with correct 
checksum). Maybe that alone is enough to justify 1 byte per message?
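A minimal sketch of how per-record 8bit checksums could be used only while fixing up indexes: valid records are kept, a broken record is replaced by a record with the same UID from dovecot.index.backup if that one has a valid checksum, and anything else is dropped. The record layout, `index_fsck_records()` and the other names are hypothetical, and the sketch assumes the UID field of a broken record is itself still intact.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte record; NOT the real struct mail_index_record. */
struct index_rec_sketch {
    uint32_t uid;
    uint8_t flags;
    uint8_t crc8;       /* CRC-8 over the record, with this field as 0 */
    uint16_t keywords;  /* stand-in for the rest of the record data */
};

/* Common CRC-8: polynomial 0x07, MSB-first, initial value 0. */
static uint8_t crc8_smbus(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint8_t crc = 0;

    for (size_t i = 0; i < size; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
    }
    return crc;
}

static uint8_t rec_checksum(const struct index_rec_sketch *rec)
{
    struct index_rec_sketch tmp = *rec;

    tmp.crc8 = 0;
    return crc8_smbus(&tmp, sizeof(tmp));
}

static bool rec_is_valid(const struct index_rec_sketch *rec)
{
    return rec->crc8 == rec_checksum(rec);
}

/* Repair pass: compacts recs in place and returns the new record count.
   Assumes the uid of a broken record is still usable for the lookup. */
static size_t index_fsck_records(struct index_rec_sketch *recs, size_t count,
                                 const struct index_rec_sketch *backup,
                                 size_t backup_count)
{
    size_t out = 0;

    for (size_t i = 0; i < count; i++) {
        if (rec_is_valid(&recs[i])) {
            recs[out++] = recs[i];
            continue;
        }
        /* broken: try to recover it from the backup index */
        for (size_t j = 0; j < backup_count; j++) {
            if (backup[j].uid == recs[i].uid && rec_is_valid(&backup[j])) {
                recs[out++] = backup[j];
                break;
            }
        }
    }
    return out;
}
```

Normal operation never touches `crc8` here, which matches the idea of not verifying or updating the checksums in every code path that modifies records.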

2. dovecot.index.log

This file is only appended to. Each committed transaction could be prefixed in 
the new format with <transaction size><transaction 32bit checksum>. With the 
new format this wouldn't actually increase the log file size much, because 
there is already some space wasted for a compatibility "boundary" record that 
could be removed now.
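A minimal in-memory sketch of the `<transaction size><transaction 32bit checksum>` prefix, assuming a hypothetical `struct log_tx_prefix` stored in host byte order. On reading, a short or mismatching transaction (for example a torn write at the end of the file) is reported as broken instead of being applied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk prefix written before each committed transaction. */
struct log_tx_prefix {
    uint32_t size;   /* bytes of transaction data following the prefix */
    uint32_t crc32;  /* CRC-32 of those bytes */
};

/* Standard bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint32_t crc = 0xffffffffU;

    for (size_t i = 0; i < size; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xedb88320U : crc >> 1;
    }
    return crc ^ 0xffffffffU;
}

/* Append a transaction at *pos (which must be <= cap).
   Returns 0 on success, -1 if it doesn't fit. */
static int log_append_tx(uint8_t *buf, size_t cap, size_t *pos,
                         const void *tx, uint32_t tx_size)
{
    struct log_tx_prefix pfx = { tx_size, crc32_ieee(tx, tx_size) };

    if (cap - *pos < sizeof(pfx) + tx_size)
        return -1;
    memcpy(buf + *pos, &pfx, sizeof(pfx));
    memcpy(buf + *pos + sizeof(pfx), tx, tx_size);
    *pos += sizeof(pfx) + tx_size;
    return 0;
}

/* Returns 1 and the transaction if valid, 0 at a clean end of the log,
   -1 if the prefix is truncated or the checksum doesn't match. */
static int log_read_tx(const uint8_t *buf, size_t used, size_t *pos,
                       const uint8_t **tx_r, uint32_t *size_r)
{
    struct log_tx_prefix pfx;

    if (*pos == used)
        return 0;
    if (used - *pos < sizeof(pfx))
        return -1;
    memcpy(&pfx, buf + *pos, sizeof(pfx));
    if (used - *pos - sizeof(pfx) < pfx.size)
        return -1;
    if (crc32_ieee(buf + *pos + sizeof(pfx), pfx.size) != pfx.crc32)
        return -1;
    *tx_r = buf + *pos + sizeof(pfx);
    *size_r = pfx.size;
    *pos += sizeof(pfx) + pfx.size;
    return 1;
}
```

The 8 bytes of prefix per transaction are the overhead the removed "boundary" record would roughly offset.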

3. dovecot.index.cache

The cache file is the most complex one. Its headers get overwritten once in a 
while. It's probably not worth the trouble to checksum the header itself, and 
there's not a lot that could be done even if a broken checksum was found. But 
each mail_cache_record could have its own checksum. An 8bit checksum could be 
added without increasing the file's size. Maybe that would be enough?
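A minimal sketch of a per-record 8bit cache checksum covering both the record header and its field data. The layout is hypothetical (not the real `struct mail_cache_record`) and simply assumes one spare byte in the header can hold the checksum. One plausible reaction to a mismatch would be to treat the record as a cache miss, since cached data can be regenerated from the mail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record header; assumes one byte is spare for the checksum. */
struct cache_rec_sketch {
    uint32_t prev_offset;
    uint16_t data_size;  /* bytes of field data following the header */
    uint8_t flags;
    uint8_t crc8;        /* CRC-8 over header (this field as 0) + data */
};

/* Incremental CRC-8 (polynomial 0x07, MSB-first); start with crc = 0. */
static uint8_t crc8_update(uint8_t crc, const void *data, size_t size)
{
    const uint8_t *p = data;

    for (size_t i = 0; i < size; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
    }
    return crc;
}

static uint8_t cache_rec_checksum(const struct cache_rec_sketch *rec,
                                  const void *data)
{
    struct cache_rec_sketch tmp = *rec;
    uint8_t crc;

    tmp.crc8 = 0;
    crc = crc8_update(0, &tmp, sizeof(tmp));
    return crc8_update(crc, data, rec->data_size);
}

/* Fill in the checksum when the record is appended to the cache file. */
static void cache_rec_seal(struct cache_rec_sketch *rec, const void *data)
{
    rec->crc8 = cache_rec_checksum(rec, data);
}

/* A record failing this could be ignored as if the data weren't cached. */
static bool cache_rec_verify(const struct cache_rec_sketch *rec,
                             const void *data)
{
    return rec->crc8 == cache_rec_checksum(rec, data);
}
```

An 8bit checksum misses roughly 1 in 256 random corruptions, which may be acceptable for data that is only a cache.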

4. dovecot.index.thread

This is a rather simple file and a 32bit checksum could be added to its header, 
and verified every time the file is read (because it's fully read anyway).
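A minimal sketch of a whole-file check for a small file that is always read fully into memory. The header layout and function names are hypothetical; the checksum covers the header (with the checksum field as zero) followed by the rest of the file, using a zlib-style incremental CRC-32 so the two parts can be fed separately.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header; NOT the real dovecot.index.thread header. */
struct thread_hdr_sketch {
    uint32_t indexid;
    uint32_t uid_validity;
    uint32_t crc32;  /* over the header (this field as 0) and the body */
};

/* Incremental CRC-32 (IEEE 802.3); start with crc = 0, feed chunks. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t size)
{
    const uint8_t *p = data;

    crc = ~crc;
    for (size_t i = 0; i < size; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xedb88320U : crc >> 1;
    }
    return ~crc;
}

/* size must be >= sizeof(struct thread_hdr_sketch). */
static uint32_t thread_file_checksum(const uint8_t *file, size_t size)
{
    struct thread_hdr_sketch hdr;
    uint32_t crc;

    memcpy(&hdr, file, sizeof(hdr));
    hdr.crc32 = 0;
    crc = crc32_update(0, &hdr, sizeof(hdr));
    return crc32_update(crc, file + sizeof(hdr), size - sizeof(hdr));
}

/* Called when writing the file; size must be >= the header size. */
static void thread_file_seal(uint8_t *file, size_t size)
{
    struct thread_hdr_sketch hdr;

    memcpy(&hdr, file, sizeof(hdr));
    hdr.crc32 = thread_file_checksum(file, size);
    memcpy(file, &hdr, sizeof(hdr));
}

/* Called after the file has been read: 0 if OK, -1 if short or corrupted. */
static int thread_file_verify(const uint8_t *file, size_t size)
{
    struct thread_hdr_sketch hdr;

    if (size < sizeof(hdr))
        return -1;
    memcpy(&hdr, file, sizeof(hdr));
    return hdr.crc32 == thread_file_checksum(file, size) ? 0 : -1;
}
```

Because the whole file is already in memory, the verification costs only one pass over data that was read anyway.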

5. dovecot.mailbox.log

This file doesn't even have a header, but each record currently has 3 unused 
bytes. One of them could be used for a new "flags" field, with the only flag 
being "checksum added". That would still leave space for an 8bit or 16bit 
checksum.
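A minimal sketch of fitting a flags byte and a 16bit checksum into the 3 unused bytes. It assumes a 24-byte record made of a type byte, the 3 unused bytes, a 16-byte mailbox GUID and a 4-byte timestamp (the last two fields are assumptions for illustration), and that older versions wrote the unused bytes as zeroes, so records without the hypothetical `MAILBOX_LOG_FLAG_CHECKSUM` flag are accepted unverified. CRC-16/CCITT-FALSE is just one common choice here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAILBOX_LOG_FLAG_CHECKSUM 0x01  /* hypothetical flag */

/* Hypothetical 24-byte record; all fields are bytes, so no padding. */
struct mailbox_log_rec_sketch {
    uint8_t type;
    uint8_t flags;         /* was unused byte 1 */
    uint8_t crc16[2];      /* was unused bytes 2-3, big-endian */
    uint8_t mailbox_guid[16];
    uint8_t stamp[4];
};

/* CRC-16/CCITT-FALSE: polynomial 0x1021, MSB-first, initial value 0xFFFF. */
static uint16_t crc16_ccitt(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint16_t crc = 0xffff;

    for (size_t i = 0; i < size; i++) {
        crc ^= (uint16_t)(p[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Checksum over the whole record with the checksum bytes as 0. */
static uint16_t rec_crc16(const struct mailbox_log_rec_sketch *rec)
{
    struct mailbox_log_rec_sketch tmp = *rec;

    tmp.crc16[0] = tmp.crc16[1] = 0;
    return crc16_ccitt(&tmp, sizeof(tmp));
}

/* Set the flag first so the checksum covers it too. */
static void mailbox_log_rec_seal(struct mailbox_log_rec_sketch *rec)
{
    uint16_t crc;

    rec->flags |= MAILBOX_LOG_FLAG_CHECKSUM;
    crc = rec_crc16(rec);
    rec->crc16[0] = (uint8_t)(crc >> 8);
    rec->crc16[1] = (uint8_t)(crc & 0xff);
}

/* Old-format records (flag not set) can't be verified, so accept them.
   Note that corruption clearing the flag bit would also skip the check. */
static bool mailbox_log_rec_ok(const struct mailbox_log_rec_sketch *rec)
{
    uint16_t stored;

    if ((rec->flags & MAILBOX_LOG_FLAG_CHECKSUM) == 0)
        return true;
    stored = (uint16_t)((rec->crc16[0] << 8) | rec->crc16[1]);
    return stored == rec_crc16(rec);
}
```

Storing the checksum as two separate bytes in a fixed byte order keeps the record free of alignment padding and independent of host endianness.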

6. Other files

There are also some text files, like dovecot-acl, subscriptions, quota usage 
and Sieve scripts. They'll probably have to stay without checksums for now.
