On 13.Nov 2013, 14:11, Cameron Simpson wrote:
> On 13Nov2013 09:06, Chris Down <ch...@chrisdown.name> wrote:
> > On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:
> > > Today I accidentally copied my mails into the same folder where they had
> > > been stored before (evil keybinding!!!) and now I'm faced with about 1000
> > > copies within my inbox. Since those duplicates do not have a unique
> > > mail-id, it's hopeless to filter them with mutt's integrated duplicate
> > > limiting pattern. Command '<limit>~=' has no effect in my case and
> > > deleting them by hand will take me hours!
> > > 
> > > I know this question has been (unsuccessfully) asked before. Anyhow, is
> > > there a way to tag every other mail (literally every nth mail of my
> > > inbox-folder) and afterwards delete them? I know something about
> > > linux-scripting but unfortunately I have no clue where to start and not
> > > even which script-language to use.
> > 
> >     for every file:
> >         read file and put the message-id in a dict in
> >         { message-id: [file1, file2..fileN] } order
> > 
> >     for each key in that dict:
> >         delete all filename values except the first
> > 
> > It should not be very complicated to write. If nobody else comes up with
> > something, I can possibly write it for you after work.
> 
> Based on Jonas' post:
> 
>  Since those duplicates do not have a unique mail-id, it's hopeless
>  to filter them with mutt's integrated duplicate limiting pattern.
>  Command '<limit>~=' has no effect
> 
> I'd infer that the message-id fields are unique.
> 
> Jonas:
> 
> _Why_/_how_ did you get duplicate messages with distinct message-ids?
> Have you verified (by inspecting a pair of duplicate messages) that
> their Message-ID headers are different?

First of all, thank you both for offering me your solutions to my problem. That
was very quick and very detailed!

Cameron, you were right: the message-ids are the same. The fact that limiting
my inbox by ~= did not work had led me to the conclusion that their IDs were
different. Seems like I had it wrong, then.
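
Since the IDs turn out to be identical between the copies, I suppose something
along the lines of Chris's outline could work directly on the Maildir. This is
only my rough, untested attempt in Python; the two folder paths are examples
from my setup, and the extra copies get moved aside rather than deleted:

    #!/usr/bin/env python
    # Rough sketch of Chris's idea: group the Maildir files by Message-ID and
    # keep only the first file of every group.  Paths are examples.
    import os
    import shutil
    from email.parser import HeaderParser

    MAILDIR = os.path.expanduser("~/Mail/inbox")      # folder with the copies
    DUPDIR = os.path.expanduser("~/Mail/duplicates")  # moved here, not deleted

    by_msgid = {}
    for sub in ("new", "cur"):
        subdir = os.path.join(MAILDIR, sub)
        for name in os.listdir(subdir):
            path = os.path.join(subdir, name)
            with open(path, errors="replace") as f:
                headers = HeaderParser().parse(f)
            by_msgid.setdefault(headers.get("Message-ID"), []).append(path)

    os.makedirs(DUPDIR, exist_ok=True)
    for msgid, paths in by_msgid.items():
        for path in sorted(paths)[1:]:    # keep the first file of each group
            shutil.move(path, DUPDIR)

Does that look roughly sane, or am I missing something about how the Maildir
files should be handled?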

> 
> If the message-ids are unique for the duplicate messages I would:
> 
>   Move all the messages to a Maildir folder if they are not already so.
>     This lets you deal with each message as a distinct file.
> 
>   Write a script along the lines of Chris Down's suggestion, but collate
>   messages by subject line, and store a tuple of:
>     (message-file-path, Date:-header-value, Message-ID:-header-value)
> 
>   You may then want to compare messages with identical Date: values.

I have now stored my duplicates in a new Maildir for testing purposes, just to
make sure I can't do any harm to the valid mails I received before yesterday. I
loaded the new Maildir with mutt's '-f' option and collated the messages by
subject line, as you suggested.
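
Just to check that I understood the tuple layout correctly: is something like
this what you had in mind for the collation step? (The folder path is only an
example for my test Maildir, and the printing at the end is my own addition to
see which groups share an identical Date: header.)

    #!/usr/bin/env python
    # Collate the test Maildir by Subject:, storing
    # (message-file-path, Date:-value, Message-ID:-value) per group.
    import os
    from email.parser import HeaderParser

    MAILDIR = os.path.expanduser("~/Mail/dup-test")   # example test folder

    by_subject = {}
    for sub in ("new", "cur"):
        subdir = os.path.join(MAILDIR, sub)
        for name in os.listdir(subdir):
            path = os.path.join(subdir, name)
            with open(path, errors="replace") as f:
                h = HeaderParser().parse(f)
            by_subject.setdefault(h.get("Subject"), []).append(
                (path, h.get("Date"), h.get("Message-ID")))

    # Print the subjects whose messages share an identical Date: header,
    # i.e. the candidates for being true duplicates.
    for subject, entries in by_subject.items():
        dates = [date for _, date, _ in entries]
        if len(entries) > 1 and len(set(dates)) < len(entries):
            print(subject)
            for path, date, msgid in entries:
                print("   ", date, msgid, path)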

> Or, if you are truly sure that the folder contains an exact and complete
> duplicate: load all the filenames, order by Date:-header, iterate over the
> list (after ordering) and _move_ every second item into another Maildir
> folder (in case you're wrong).
> 
>   L = []
>   for each Maildir-file-in-new,cur:
>     load in the message headers and get the Date: header string
>     L.append( (date:-value, subject:-value, maildir-file-path) )
> 
>   L = sorted(L)
>   for i in range(0, len(L), 2):
>     move the file L[i][2] into another directory
> 

Now this part is not so easy for me to understand, even though I'm highly
motivated :-) How do I call these commands, and what path or environment line
do I have to prepend to the script? As mentioned before, I made a backup of my
original Maildir, so there is no danger of deleting anything important anymore.
So from my side there is room to solve this by trial and error :-)
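
To show where I currently stand, this is how far I got with turning the outline
above into an actual script. The first line and both paths are pure guesses on
my part, and I changed one detail: before moving a file it checks that the
adjacent entry really has the same Date: and Subject:, because of your remark
below about groupings that are not pairs. Please correct me where this is
wrong:

    #!/usr/bin/env python
    # Attempt at the outline above: collect (Date, Subject, path) for every
    # file in new/ and cur/, sort the list, and move every second entry into
    # another directory instead of deleting it.  Paths are guesses/examples.
    import os
    import shutil
    from email.parser import HeaderParser

    MAILDIR = os.path.expanduser("~/Mail/inbox")       # folder with the copies
    QUARANTINE = os.path.expanduser("~/Mail/removed")  # moved here, not deleted

    entries = []
    for sub in ("new", "cur"):
        subdir = os.path.join(MAILDIR, sub)
        for name in os.listdir(subdir):
            path = os.path.join(subdir, name)
            with open(path, errors="replace") as f:
                h = HeaderParser().parse(f)
            entries.append((h.get("Date", ""), h.get("Subject", ""), path))

    entries.sort()
    os.makedirs(QUARANTINE, exist_ok=True)
    for i in range(0, len(entries) - 1, 2):
        first, second = entries[i], entries[i + 1]
        # Only treat the pair as duplicates when Date: and Subject: are
        # literally identical; otherwise leave both files alone.
        if first[:2] != second[:2]:
            print("not a pair, skipping:", first[2], second[2])
            continue
        shutil.move(second[2], QUARANTINE)

If I read it right, the check can only make the script do less, never more, so
it should stay on the safe side. I would then simply run it as 'python
dedup.py' after one more backup, if that is the correct way to call it?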

> Note that you don't need to _parse_ the Date: header; if these are
> duplicated messages the literal text of the Date: header should be
> identical for the adjacent messages. HOWEVER, you probably want to
> ensure that all the identical date/subject groupings are only pairs,
> in case there are multiple distinct messages with identical dates.
> 
> Cheers,
> -- 
> Cameron Simpson <c...@zip.com.au>
> 
> If you can't annoy somebody, there's little point in writing.
>         - Kingsley Amis

-- 
"the basis of a healthy, tidy mind is a big trash basket." [Kurt Tucholsky]
