Yet another 'duplicate' thread

2013-11-12 Thread Jonas Petong
Today I accidentally copied my mails into the same folder where they had been
stored before (evil keybinding!!!) and now I'm faced with about 1000 copies
within my inbox. Since those duplicates do not have a unique mail-id, it's
hopeless to filter them with mutt's integrated duplicate limiting pattern.
Command '~=' has no effect in my case and deleting them by hand
will take me hours!

I know this question has been (unsuccessfully) asked before. Anyhow, is there
a way to tag every other mail (literally every nth mail of my inbox-folder) and
afterwards delete them? I know something about linux-scripting but unfortunately
I have no clue where to start or even which scripting language to use.

This close-to-topic approach with 'fdupes' was published some time ago
(http://consolematt.wordpress.com/tag/fdupes/) but in my view it seems way too
complicated. As far as I can tell from mutt's mailing list archive, I'm not the
only one who has had trouble with it. Therefore I appreciate any hint which
points me in the right direction and helps me solve this.

Running Mutt 1.5.21 under Ubuntu Gnome 13.10. (Linux 3.11.0-13-generic).

cheers,
jonas



Re: Yet another 'duplicate' thread

2013-11-12 Thread Ken Moffat
On Tue, Nov 12, 2013 at 07:22:24PM +0100, Jonas Petong wrote:
> Today I accidentally copied my mails into the same folder where they had been
> stored before (evil keybinding!!!) and now I'm faced with about 1000 copies
> within my inbox. Since those duplicates do not have a unique mail-id, it's
> hopeless to filter them with mutt's integrated duplicate limiting pattern.
> Command '~=' has no effect in my case and deleting them by hand
> will take me hours!
> 
> I know this question has been (unsuccessfully) asked before. Anyhow, is there
> a way to tag every other mail (literally every nth mail of my inbox-folder) and
> afterwards delete them? I know something about linux-scripting but unfortunately
> I have no clue where to start or even which scripting language to use.
> 
> This close-to-topic approach with 'fdupes' was published some time ago
> (http://consolematt.wordpress.com/tag/fdupes/) but in my view it seems way too
> complicated. As far as I can tell from mutt's mailing list archive, I'm not the
> only one who has had trouble with it. Therefore I appreciate any hint which
> points me in the right direction and helps me solve this.
> 
> Running Mutt 1.5.21 under Ubuntu Gnome 13.10. (Linux 3.11.0-13-generic).
> 
 I don't have a script, but I usually view lists without threading,
using date/time sent in sender's timezone (%d) - I expect that using
the local time zone (%D) works the same way.  On occasion I've
had to change which of my upstreams was subscribed to heavy-traffic
lists such as lkml, and at other times I've occasionally had mails
appearing twice after upstream problems.  When needed, it's just a
case of looking at the index and deleting every other mail.
Tedious, but achievable - particularly for only 1000 mails - I've
done more than that in the past ;-)

 And after marking a batch to be deleted, I can look at which are
marked (just in case I had finger trouble) and specify the message
number to go to and undelete.

 I believe the way each line of the index is displayed is governed by
index_format, while the order itself comes from the sort setting [ I
haven't looked at this stuff in ages - why break what works for me ].
Mine is:

set index_format="%4C %Z %{%b %d} %-15.15n (%?l?%4l&%4c?) %s"

 If you aren't a reckless person, turn off incoming mail and back up
the directory or mbox before you try *any* solution.

ĸen
-- 
das eine Mal als Tragödie, dieses Mal als Farce


Re: Yet another 'duplicate' thread

2013-11-12 Thread Chris Down
On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:
> Today I accidentally copied my mails into the same folder where they had been
> stored before (evil keybinding!!!) and now I'm faced with about 1000 copies
> within my inbox. Since those duplicates do not have a unique mail-id, it's
> hopeless to filter them with mutt's integrated duplicate limiting pattern.
> Command '~=' has no effect in my case and deleting them by hand
> will take me hours!
> 
> I know this question has been (unsuccessfully) asked before. Anyhow, is there
> a way to tag every other mail (literally every nth mail of my inbox-folder) and
> afterwards delete them? I know something about linux-scripting but unfortunately
> I have no clue where to start or even which scripting language to use.

for every file:
    read file and put the message-id in a dict in
    { message-id: [file1, file2..fileN] } order

for each key in that dict:
    delete all filename values except the first

It should not be very complicated to write. If nobody else comes up with
something, I can possibly write it for you after work.
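
The outline above could be sketched in Python roughly as follows. This is
only an illustration, assuming the inbox is a Maildir (one file per message
under cur/ and new/) and that the copies really do share their Message-ID
headers. The function names group_by_message_id and duplicates are made up
for this example, and the code only reports candidates; it deletes nothing.

```python
import os
from collections import defaultdict
from email.parser import BytesParser


def group_by_message_id(maildir):
    """Map each Message-ID to the files in cur/ and new/ that carry it."""
    groups = defaultdict(list)
    parser = BytesParser()
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                # Only the headers are needed, so skip parsing the body.
                headers = parser.parse(fh, headersonly=True)
            msg_id = headers.get("Message-ID")
            if msg_id is None:
                continue  # no Message-ID: leave this file alone
            groups[str(msg_id).strip()].append(path)
    return groups


def duplicates(groups):
    """Return every path except the first one seen for each Message-ID."""
    return [path for paths in groups.values() for path in paths[1:]]
```

The list returned by duplicates(group_by_message_id(path_to_maildir)) can be
reviewed first and the files then moved aside rather than removed outright.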




Re: Yet another 'duplicate' thread

2013-11-12 Thread Cameron Simpson
On 13Nov2013 09:06, Chris Down wrote:
> On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:
> > Today I accidentally copied my mails into the same folder where they had
> > been stored before (evil keybinding!!!) and now I'm faced with about 1000
> > copies within my inbox. Since those duplicates do not have a unique
> > mail-id, it's hopeless to filter them with mutt's integrated duplicate
> > limiting pattern. Command '~=' has no effect in my case and deleting them
> > by hand will take me hours!
> > 
> > I know this question has been (unsuccessfully) asked before. Anyhow, is
> > there a way to tag every other mail (literally every nth mail of my
> > inbox-folder) and afterwards delete them? I know something about
> > linux-scripting but unfortunately I have no clue where to start or even
> > which scripting language to use.
> 
> for every file:
>     read file and put the message-id in a dict in
>     { message-id: [file1, file2..fileN] } order
> 
> for each key in that dict:
>     delete all filename values except the first
> 
> It should not be very complicated to write. If nobody else comes up with
> something, I can possibly write it for you after work.

Based on Jonas' post:

 Since those duplicates do not have a unique mail-id, it's hopeless
 to filter them with mutt's integrated duplicate limiting pattern.
 Command '~=' has no effect

I'd infer that the message-id fields are unique.

Jonas:

_Why_/_how_ did you get duplicate messages with distinct message-ids?
Have you verified (by inspecting a pair of duplicate messages) that
their Message-ID headers are different?

If the message-ids are unique for the duplicate messages I would:

  Move all the messages to a Maildir folder if they are not already in one.
  This lets you deal with each message as a distinct file.

  Write a script along the lines of Chris Down's suggestion, but collate
  messages by subject line, and store a tuple of:
    (message-file-path, Date:-header-value, Message-ID:-header-value)

You may then want to compare messages with identical Date: values.
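
A rough Python sketch of that collation, offered as an illustration rather
than a finished tool. It assumes a Maildir with cur/ and new/ subdirectories;
the names collate_by_subject and same_date_groups are invented for this
example. Dates are compared as literal header text and nothing is moved or
deleted.

```python
import os
from collections import defaultdict
from email.parser import BytesParser


def collate_by_subject(maildir):
    """Map Subject -> list of (path, Date value, Message-ID value) tuples."""
    by_subject = defaultdict(list)
    parser = BytesParser()
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                hdrs = parser.parse(fh, headersonly=True)
            entry = (path,
                     str(hdrs.get("Date", "")),
                     str(hdrs.get("Message-ID", "")))
            by_subject[str(hdrs.get("Subject", ""))].append(entry)
    return by_subject


def same_date_groups(by_subject):
    """Yield lists of entries sharing both Subject and literal Date text."""
    for entries in by_subject.values():
        by_date = defaultdict(list)
        for entry in entries:
            by_date[entry[1]].append(entry)
        for group in by_date.values():
            if len(group) > 1:
                yield group
```

Each yielded group is a set of likely copies; the Message-ID in the third
tuple slot shows at a glance whether the copies really carry distinct ids.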

Or, if you are truly sure that the folder contains an exact and complete
duplicate: load all the filenames, order by Date:-header, iterate over the
list (after ordering) and _move_ every second item into another Maildir
folder (in case you're wrong).

  L = []
  for each Maildir-file-in-new,cur:
    load in the message headers and get the Date: and Subject: header strings
    L.append( (date:-value, subject:-value, maildir-file-path) )

  L = sorted(L)
  for i in range(0, len(L), 2):
    move the file L[i][2] into another directory

Note that you don't need to _parse_ the Date: header; if these are
duplicated messages the literal text of the Date: header should be
identical for the adjacent messages. HOWEVER, you probably want to
ensure that all the identical date/subject groupings are exactly
pairs, in case there are multiple distinct messages with identical
dates.
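
The sort-and-move idea, including the pairs check, might look like this in
Python. Treat it as a sketch under assumptions: the source is a Maildir with
cur/ and new/, dest_dir is a directory of your choosing (for instance the
cur/ of a spare Maildir), and load_entries and move_second_of_each_pair are
names invented here. It refuses to move anything unless every date/subject
group is exactly a pair.

```python
import os
import shutil
from email.parser import BytesParser
from itertools import groupby


def load_entries(maildir):
    """Return sorted (Date text, Subject text, path) tuples for cur/ and new/."""
    entries = []
    parser = BytesParser()
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                hdrs = parser.parse(fh, headersonly=True)
            entries.append((str(hdrs.get("Date", "")),
                            str(hdrs.get("Subject", "")),
                            path))
    return sorted(entries)


def move_second_of_each_pair(entries, dest_dir):
    """Move one file of every (Date, Subject) pair into dest_dir.

    Raises ValueError before touching any file if a group is not a pair.
    """
    groups = [list(g) for _, g in groupby(sorted(entries), key=lambda e: e[:2])]
    odd = [g for g in groups if len(g) != 2]
    if odd:
        raise ValueError("%d date/subject groups are not pairs" % len(odd))
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for _first, second in groups:
        target = os.path.join(dest_dir, os.path.basename(second[2]))
        shutil.move(second[2], target)  # move, never delete
        moved.append(target)
    return moved
```

Because the files are only moved, they can be copied back if the result
looks wrong once the folder is reopened in mutt.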

Cheers,
-- 
Cameron Simpson 

If you can't annoy somebody, there's little point in writing.
- Kingsley Amis