On 25.6.2013, at 14.14, Charles Marcus <cmar...@media-brokers.com> wrote:

>>         + doveadm: Added "deduplicate" command to expunge message duplicates.
> 
> Hey Timo,
> 
> 2 questions on this new 'deduplicate' capability of doveadm...
> 
> Obviously this could be scripted with a cron job, but I was wondering if it 
> wouldn't make sense to do this automatically whenever messages are being 
> moved around in the mailstore?
> 
> An interesting 'feature' of gmail is that if/when you are copying lots of 
> messages from a non gmail account to a gmail account through IMAP, if the 
> folder you are copying from contains duplicate messages, gmail will silently 
> discard the duplicates after the first one is successfully copied up...
> 
> I discovered this a long time ago the first time I encountered an anomaly 
> where I copied an entire folder, but the number of messages on the gmail 
> account didn't match the number in the source folder. After comparing, I 
> discovered that there were duplicates in the source folder, which accounted 
> for the discrepancy.

There's currently no efficient way to do that automatically in Dovecot. There 
are also several open questions about what counts as a duplicate. If two 
messages have the same Message-ID: header but different bodies, is that a 
duplicate? What if the body is the same but the headers differ, e.g. in the 
Subject line (maybe it's just a [Dovecot] prefix)? What if only the Received: 
headers differ? And so on.
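
To make the ambiguity concrete, here is a minimal Python sketch that computes 
several candidate "identity" keys for one message and compares two copies of it. 
This is not how Dovecot does anything; the function name duplicate_keys, the 
choice of keys and the two sample messages are made up for illustration.

```python
import hashlib
from email import message_from_string


def _digest(text):
    """Hash a piece of text so keys are short and comparable."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def duplicate_keys(raw):
    """Return several candidate identity keys for one raw message (hypothetical helper)."""
    msg = message_from_string(raw)
    # Everything after the first blank line is treated as the body.
    _, _, body = raw.partition("\n\n")
    # Drop Received: headers, since they differ per delivery path.
    stable_headers = "\n".join(
        f"{name}: {value}"
        for name, value in msg.items()
        if name.lower() != "received"
    )
    return {
        "message_id": msg.get("Message-ID"),
        "body": _digest(body),
        "headers_no_received": _digest(stable_headers),
    }


# Two made-up copies: same Message-ID and body, different Received: and Subject.
a = "Received: from mx1.example\nMessage-ID: <1@example>\nSubject: hello\n\nSame body\n"
b = "Received: from mx2.example\nMessage-ID: <1@example>\nSubject: [Dovecot] hello\n\nSame body\n"

ka, kb = duplicate_keys(a), duplicate_keys(b)
same = {name: ka[name] == kb[name] for name in ka}
print(same)
```

The two copies match on Message-ID and on body, but not on headers once the 
Subject prefix is taken into account, so whether they are "duplicates" depends 
entirely on which key you pick.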

Anyway, copying and pasting what I just wrote in another reply about doveadm 
deduplicate:

The main idea behind it is to be able to revert some (more or less) accidental 
duplication of emails due to something that the admin did, or possibly due to 
some bug in Dovecot (e.g. dsync). There are two modes of operation. Both work 
only for duplicates within the same folder:

1) Deduplicate by message GUID. These duplicates could only have been caused by 
copying the mail (IMAP COPY, doveadm copy) or by "doveadm import" that imports 
old messages from e.g. a backup.

2) Deduplicate by Message-Id: header (-m parameter). I added this just because 
some people had asked for it previously. I'm not sure how/when it's actually 
useful.
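
The difference between the two modes can be sketched roughly like this in 
Python. It is only an illustration of the idea, not the doveadm code: the Mail 
record, the find_duplicates function and the rule of keeping the lowest UID are 
all assumptions.

```python
from collections import namedtuple

# Hypothetical record for one mail in a folder.
Mail = namedtuple("Mail", "uid guid message_id")


def find_duplicates(folder, by_message_id=False):
    """Return UIDs to expunge within one folder, keeping the first mail per key.

    by_message_id=False compares GUIDs (mode 1);
    by_message_id=True compares Message-Id: headers (mode 2, like -m).
    Keeping the lowest UID is an assumption made for this sketch.
    """
    seen = set()
    to_expunge = []
    for mail in sorted(folder, key=lambda m: m.uid):
        key = mail.message_id if by_message_id else mail.guid
        if key is None:
            continue  # nothing to compare on, so never treat it as a duplicate
        if key in seen:
            to_expunge.append(mail.uid)
        else:
            seen.add(key)
    return to_expunge


inbox = [
    Mail(1, "guid-a", "<1@example>"),
    Mail(2, "guid-a", "<1@example>"),  # copy of UID 1: same GUID
    Mail(3, "guid-b", "<1@example>"),  # stored separately: new GUID, same Message-Id
    Mail(4, "guid-c", "<2@example>"),
]

by_guid = find_duplicates(inbox)
by_msgid = find_duplicates(inbox, by_message_id=True)
print(by_guid, by_msgid)
```

With this sample folder the GUID mode only catches the copied mail (UID 2), 
while the Message-Id mode also catches UID 3, which has its own GUID but the 
same Message-Id: header.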
