Re: Remove "Duplicate" emails (and documentation update)

Joseph Tam Fri, 23 Feb 2018 15:48:07 -0800


On Fri, 23 Feb 2018, @lbutlr wrote:

$ doveadm -f table fetch -u kremels 'hdr.message-id guid uid
hdr.x-listname' mailbox "Archive" | sort| awk 'cnt[$1]++{if
(cnt[$1]==2) print prev[$1]; print} {prev[$1]=$0}' |grep -E "[0-9] +$"
|awk '{print "doveadm expunge -u kremels MAILBOX-GUID "$2" UID "$3}'


I was unaware of the "hdr.{header}" syntax -- all the reference material
I've seen mentions only "hdr", which returns the entire header block.
This is handy to know because, up to now, I've been filtering "hdr"
fetches through grep.  I tried updating the Wiki, but it's immutable,
so would someone please update the documentation:

        https://wiki.dovecot.org/Tools/Doveadm/Fetch
        (and man page in distribution)

        hdr[.{x}]
                Header {x} of message.  If missing, the
                entire header is fetched.

First, even after expunging a message and running doveadm index -u
kremels "Archive", subsequent runs still show the same duplicate
messages.


I suspect client side caching.  If you query IMAP directly, does
it report the correct number of messages?

        (Using openssl s_client, or netcat or telnet, or whatever)
        x1 LOGIN kremels yourpassword
        x2 SELECT INBOX
                ... look for "* {count} EXISTS" ...
        x3 LOGOUT

If {count} is what you expected, then dovecot has the correct information
and it's likely some client-side caching issue.
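
To make that check repeatable, the count can be pulled out of a captured
server response.  A minimal sketch, assuming the IMAP session output is
available on stdin; imap_exists_count is a hypothetical helper name, not
something dovecot ships, and the response below is canned sample data:

```shell
# Hypothetical helper: print the number from the "* <n> EXISTS" line of an
# IMAP SELECT response read on stdin.  Prints nothing if no such line exists.
imap_exists_count() {
    # IMAP lines end in CRLF, so drop the CR before matching the whole line.
    tr -d '\r' | sed -n 's/^\* \([0-9][0-9]*\) EXISTS$/\1/p'
}

# Canned response for illustration only (not real server output).
printf '* FLAGS (\\Answered \\Seen)\r\n* 42 EXISTS\r\n* 0 RECENT\r\nx2 OK Select completed.\r\n' \
    | imap_exists_count
```

Comparing that number with what the mail client shows for the same
mailbox should tell you which side is stale.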

Second, what I really want to do is run this over ALL the mailboxes,
except for Junk and Sent but if that is possible I can't find the right
syntax.


Do you mean to remove duplicates between any two mailboxes, or to remove
messages in other mailboxes that are also found in Archive?

If the latter, try

        doveadm -f table fetch -u kremels \
                hdr.message-id \
                mailbox Archive \
                | sort -b >list0

        doveadm -f table fetch -u kremels \
                'hdr.message-id mailbox-guid uid' \
                NOT mailbox Archive \
                NOT mailbox Junk \
                NOT mailbox Sent \
                | sort -b >list1

The list of duplicate message-ids, with the mailbox GUID and UID of
each, will then be ...

        join -j1 list0 list1
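
To show what the join yields, here is a small sketch on made-up data.
It assumes the second column of list1 holds the mailbox GUID that the
MAILBOX-GUID search key expects, and that any header row printed by
"-f table" has already been dropped.  dup_expunge_cmds is a hypothetical
helper name; it only prints the expunge commands, it does not run them:

```shell
# Hypothetical helper: $1 = sorted list of message-ids found in Archive,
# $2 = sorted "message-id mailbox-guid uid" rows from the other mailboxes.
# Prints one expunge command per row of $2 whose message-id is also in $1.
dup_expunge_cmds() {
    join -1 1 -2 1 "$1" "$2" \
        | awk '{print "doveadm expunge -u kremels MAILBOX-GUID " $2 " UID " $3}'
}

# Sample data only: these message-ids, GUIDs and UIDs are invented.
tmp=$(mktemp -d)
printf '%s\n' '<a@example.com>' '<c@example.com>' | sort -b >"$tmp/list0"
printf '%s\n' \
    '<a@example.com> 1111 7' \
    '<b@example.com> 1111 8' \
    '<c@example.com> 2222 3' | sort -b >"$tmp/list1"

dup_expunge_cmds "$tmp/list0" "$tmp/list1"
rm -rf "$tmp"
```

Review the printed commands before piping them to sh; a message-id that
contains whitespace would shift the columns.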

You could instead do this with a single invocation of doveadm (the 2nd
form, without the exclusion of Archive) piped through awk, but you'll
need to know the GUID of Archive beforehand.
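
A rough sketch of that single-pass variant, on made-up data.  It assumes
input rows of the form "message-id mailbox-guid uid" and that you pass in
Archive's GUID yourself (I believe "doveadm mailbox status -u kremels guid
Archive" reports it, but check on your version).  dups_of_archive is a
hypothetical helper name:

```shell
# Hypothetical helper: $1 = mailbox GUID of Archive; stdin = rows of
# "message-id mailbox-guid uid" from all mailboxes of interest.
# Prints the rows outside Archive whose message-id also occurs in Archive.
dups_of_archive() {
    awk -v ag="$1" '
        $2 == ag { in_archive[$1] = 1; next }   # Archive row: remember its message-id
        { row[++n] = $0; id[n] = $1 }           # other row: hold it until all input is read
        END {
            for (i = 1; i <= n; i++)
                if (id[i] in in_archive) print row[i]
        }'
}

# Sample data only: AAAA stands in for the GUID of Archive.
printf '%s\n' \
    '<a@example.com> AAAA 12' \
    '<a@example.com> BBBB 7' \
    '<b@example.com> BBBB 8' \
    '<c@example.com> CCCC 3' \
    '<c@example.com> AAAA 15' \
    | dups_of_archive AAAA
```

Rows are held until the end because an Archive copy may appear after the
duplicate it matches, so no sorting is needed.  Duplicates inside Archive
itself are not reported by this sketch.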

Joseph Tam <jtam.h...@gmail.com>
