On 06 Mar 2015, at 21:58, Daniel Miller <dmil...@amfes.com> wrote:
> 
> On 3/6/2015 7:53 AM, Timo Sirainen wrote:
>> http://dovecot.org/releases/2.2/rc/dovecot-2.2.16.rc1.tar.gz
>> http://dovecot.org/releases/2.2/rc/dovecot-2.2.16.rc1.tar.gz.sig
>> 
>> Looks like it's been a long time since v2.2.15. There have been a ton of 
>> changes since it was released though, so here's a release candidate first to 
>> find out if somebody can find any bugs before the final v2.2.16.
>> 
>> Unfortunately I haven't had time/energy to read Dovecot mailing list for a 
>> while now. I'm hoping this will change, but I don't really expect it to 
>> happen anytime soon. On the positive side for Dovecot, it's now becoming 
>> used in more and more multi-million user installations, which brings all 
>> kinds of nice new improvements.
> 
> Great to hear both Dovecot and you are doing well.  I do need to ask you to 
> check the list for two threads:
> 
> mdbox attachment errors
> Rebuilding SIS attachment links from log
> 
> A few of us have been having SIS problems.

Unless there's a way to reproduce a bug I don't think I can do anything about 
it (I could spend hours looking at the code or trying to reproduce it and come 
up with nothing). But a while ago I did think about a SIS redesign that would 
make it much less likely to break - just need to get it actually implemented:

Currently single instance storage works by having one global directory that 
contains all the attachments. They are hashed by the attachment content, so for 
example /var/attachments/ac/7d/ac7d1274891248912489124 would be the attachment. 
Then each instance would have its own hard link to it, e.g. 
/var/attachments/ac/7d/hashes/ac7d1274891248912489124-1234567890. sdbox and 
mdbox can use these by containing the "ac7d1274891248912489124-1234567890" in 
the header metadata. When a mail is deleted, its hard link is deleted. If the 
link count was 2 before the deletion, that was the last instance, so the 
original attachment file is deleted as well. (There are of course some race 
conditions here, but in those rare situations the attachment would just be 
duplicated, which isn't too bad.)
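
To make the current scheme concrete, here is a rough Python sketch of it. This 
is only an illustration, not Dovecot's actual code: the function names, the use 
of SHA-1 as the content hash and the temp-file handling are all assumptions, 
and it ignores the race conditions mentioned above.

```python
import hashlib
import os


def sis_save(root, content, instance_id):
    """Store content once under root and hard-link one instance to it.

    Hypothetical helper; SHA-1 is an assumed choice of content hash.
    """
    digest = hashlib.sha1(content).hexdigest()
    base_dir = os.path.join(root, digest[:2], digest[2:4])
    hashes_dir = os.path.join(base_dir, "hashes")
    os.makedirs(hashes_dir, exist_ok=True)

    original = os.path.join(base_dir, digest)
    if not os.path.exists(original):
        # Write to a temp name first so a partial file is never visible.
        tmp = original + ".tmp"
        with open(tmp, "wb") as f:
            f.write(content)
        os.replace(tmp, original)

    # This name is what the mail's header metadata would refer to.
    name = f"{digest}-{instance_id}"
    os.link(original, os.path.join(hashes_dir, name))
    return name


def sis_delete(root, name):
    """Remove one instance; drop the original if it was the last one."""
    digest = name.rsplit("-", 1)[0]
    base_dir = os.path.join(root, digest[:2], digest[2:4])
    link = os.path.join(base_dir, "hashes", name)
    original = os.path.join(base_dir, digest)

    nlink = os.stat(link).st_nlink
    os.unlink(link)
    if nlink == 2:
        # Only the original and this instance existed.
        os.unlink(original)
```

Saving the same content twice leaves the original with a link count of 3, and 
it goes away only when the second instance is deleted.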

The main problem with the old design is that all the users' attachments are 
dumped into a single global directory. It's difficult to take backups and in 
general it seems too difficult to manage correctly so I haven't really 
recommended using it in any bigger installations.

So here's the new idea, which is nearly the same as the old, but with a small 
change that makes it much nicer I think:

Instead of storing the attachment hard links in a global dir, store the hard 
links under the user's mail dir. This way taking backups doesn't require 
anything complicated, just tar the user's mail dir. You can rm -rf the user 
without forever leaving the user's attachments lying around in the global dir 
(assuming there's a job that periodically cleans out attachments with link 
count=1). In general there's no easy way to accidentally break things.
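
A minimal Python sketch of how this could look, assuming a shared store 
directory plus an "attachments" subdirectory in each user's mail dir. None of 
these names or the layout exist in Dovecot; they are made up for illustration, 
as is the SHA-1 hash.

```python
import hashlib
import os


def save_attachment(store_dir, user_dir, content, instance_id):
    """Store content once in store_dir and hard-link it under user_dir.

    Hypothetical layout: <user_dir>/attachments/<hash>-<instance_id>.
    """
    digest = hashlib.sha1(content).hexdigest()
    shared_dir = os.path.join(store_dir, digest[:2], digest[2:4])
    os.makedirs(shared_dir, exist_ok=True)

    shared = os.path.join(shared_dir, digest)
    if not os.path.exists(shared):
        tmp = shared + ".tmp"
        with open(tmp, "wb") as f:
            f.write(content)
        os.replace(tmp, shared)

    user_att = os.path.join(user_dir, "attachments")
    os.makedirs(user_att, exist_ok=True)
    link = os.path.join(user_att, f"{digest}-{instance_id}")
    os.link(shared, link)
    return link


def clean_orphans(store_dir):
    """Periodic job: delete stored files that no user links to anymore."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(store_dir):
        for name in filenames:
            if name.endswith(".tmp"):
                continue  # possibly still being written
            path = os.path.join(dirpath, name)
            if os.stat(path).st_nlink == 1:
                os.unlink(path)
                removed.append(path)
    return removed
```

With this layout, removing a user's mail dir just lowers the link counts, and 
the next clean_orphans() run removes whatever nobody references anymore. A real 
cleanup job would also have to avoid deleting a file that was just written but 
not yet linked; the sketch doesn't handle that.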

The only new complication here is that if users are split across multiple 
filesystems, hard linking across them isn't going to work. So this would then 
require not only a per-user mail directory but also a per-user attachment 
directory setting (which would actually point to the attachment dir on that 
user's filesystem).
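
As a rough illustration of that constraint, the Python sketch below picks the 
attachment dir that lives on the same filesystem as the user's mail dir by 
comparing device IDs. The function names and the idea of a candidate list are 
assumptions for the example only.

```python
import os


def same_filesystem(path_a, path_b):
    """True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def pick_attachment_dir(user_dir, candidates):
    """Return the first candidate attachment dir on user_dir's filesystem.

    Returns None if no candidate can be hard-linked from user_dir.
    """
    for candidate in candidates:
        if same_filesystem(user_dir, candidate):
            return candidate
    return None
```

Comparing st_dev is only a heuristic (bind mounts of the same filesystem can 
still refuse a link), so in practice the link() call itself failing with EXDEV 
is the definitive answer.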

The SIS is implemented as a lib-fs backend wrapper, so a new one could be 
implemented easily without breaking the old one.
