Yet another 'duplicate' thread
Today I accidentally copied my mails into the same folder where they had been stored before (evil keybinding!!!) and now I'm faced with about 1000 copies within my inbox. Since those duplicates do not have a unique mail-id, it's hopeless to filter them with mutt's integrated duplicate-limiting pattern. The '~=' pattern has no effect in my case, and deleting them by hand will take me hours!

I know this question has been (unsuccessfully) asked before. Anyhow, is there a way to tag every other mail (literally every nth mail of my inbox folder) and afterwards delete them? I know something about Linux scripting, but unfortunately I have no clue where to start or even which scripting language to use.

This close-to-topic approach with 'fdupes' was published some time ago (http://consolematt.wordpress.com/tag/fdupes/), but in my view it seems way too complicated. As far as I can tell from mutt's mailing list archive, I'm not the only one who has had trouble with this. Therefore I'd appreciate any hint which points me in the right direction and helps me solve this.

Running Mutt 1.5.21 under Ubuntu Gnome 13.10 (Linux 3.11.0-13-generic).

cheers, jonas
Re: Yet another 'duplicate' thread
On Tue, Nov 12, 2013 at 07:22:24PM +0100, Jonas Petong wrote:
> Today I accidentally copied my mails into the same folder where they had
> been stored before (evil keybinding!!!) and now I'm faced with about 1000
> copies within my inbox. [...]
>
> Anyhow, is there a way to tag every other mail (literally every nth mail
> of my inbox folder) and afterwards delete them? [...]

I don't have a script, but I usually view lists without threading, using the date/time sent in the sender's timezone (%d) - using the local time zone (%D) probably works the same way.

On occasion I've had to change which of my upstreams was subscribed to heavy-traffic lists such as lkml, and at other times I've occasionally had mails appearing twice after upstream problems. When needed, it's just a case of looking at the index and deleting every other mail. Tedious, but achievable - particularly for only 1000 mails - I've done more than that in the past ;-)

And after marking a batch to be deleted, I can look at which are marked (just in case I had finger trouble) and specify the message number to go to and undelete.

I believe the order in which I see mails is governed by index_format [I haven't looked at this stuff in ages - why break what works for me]. Mine is:

    set index_format="%4C %Z %{%b %d} %-15.15n (%?l?%4l&%4c?) %s"

If you aren't a reckless person, turn off incoming mail and back up the directory or mbox before you try *any* solution.

ĸen
--
the first time as tragedy, this time as farce
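[For reference: the sort order itself comes from mutt's $sort variable rather than index_format, which only controls how each index line is displayed. An unthreaded, date-sorted view like Ken describes might look like the following muttrc sketch; the sort value is an assumption, since he only quotes his index_format:

    # assumed: sort the index by date/time sent instead of by thread
    set sort=date-sent
    # Ken's index_format, as quoted above
    set index_format="%4C %Z %{%b %d} %-15.15n (%?l?%4l&%4c?) %s"
]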
Re: Yet another 'duplicate' thread
On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:
> Today I accidentally copied my mails into the same folder where they had
> been stored before (evil keybinding!!!) and now I'm faced with about 1000
> copies within my inbox. [...]
>
> I know something about Linux scripting, but unfortunately I have no clue
> where to start or even which scripting language to use.

    for every file:
        read the file and record its message-id in a dict, as
        { message-id: [file1, file2, ..., fileN] }

    for each key in that dict:
        delete all filename values except the first

It should not be very complicated to write. If nobody else comes up with something, I can possibly write it for you after work.
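[A minimal Python sketch of that approach, assuming a Maildir folder (the paths below are placeholders) and assuming the duplicate copies really do share a Message-ID. It moves the extra copies into a holding directory rather than deleting them, so a mistake is recoverable:

    #!/usr/bin/env python3
    # Sketch: group Maildir files by Message-ID and move all but the
    # first copy of each group aside. Folder paths are assumptions.
    import os
    import shutil
    from email.parser import BytesParser

    MAILDIR = os.path.expanduser('~/Mail/inbox')       # hypothetical
    DUPDIR = os.path.expanduser('~/Mail/dup-holding')  # hypothetical
    os.makedirs(DUPDIR, exist_ok=True)

    by_msgid = {}
    for sub in ('new', 'cur'):
        subdir = os.path.join(MAILDIR, sub)
        for name in sorted(os.listdir(subdir)):
            path = os.path.join(subdir, name)
            with open(path, 'rb') as f:
                # headersonly: we only need the Message-ID header
                msg = BytesParser().parse(f, headersonly=True)
            msgid = msg.get('Message-ID', '')
            by_msgid.setdefault(msgid, []).append(path)

    for msgid, paths in by_msgid.items():
        for extra in paths[1:]:   # keep the first file of each group
            shutil.move(extra, DUPDIR)
]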
Re: Yet another 'duplicate' thread
On 13Nov2013 09:06, Chris Down wrote:
> On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:
> > Today I accidentally copied my mails into the same folder where they had
> > been stored before (evil keybinding!!!) [...] Since those duplicates do
> > not have a unique mail-id, it's hopeless to filter them with mutt's
> > integrated duplicate-limiting pattern. The '~=' pattern has no effect in
> > my case [...]
>
> for every file:
>     read the file and record its message-id in a dict [...]
>
> It should not be very complicated to write.

Based on Jonas' post:

    Since those duplicates do not have a unique mail-id, it's hopeless to
    filter them with mutt's integrated duplicate-limiting pattern. The '~='
    pattern has no effect

I'd infer that the message-id fields are unique.

Jonas: _Why_/_how_ did you get duplicate messages with distinct message-ids? Have you verified (by inspecting a pair of duplicate messages) that their Message-ID headers are different?

If the message-ids are unique for the duplicate messages, I would:

  - Move all the messages to a Maildir folder if they are not already in
    one. This lets you deal with each message as a distinct file.

  - Write a script along the lines of Chris Down's suggestion, but collate
    messages by subject line, and store a tuple of:

        (message-file-path, Date:-header-value, Message-ID:-header-value)

    You may then want to compare messages with identical Date: values.

Or, if you are truly sure that the folder contains an exact and complete duplicate: load all the filenames, order by the Date: header, iterate over the list (after ordering), and _move_ every second item into another Maildir folder (in case you're wrong).

    L = []
    for each Maildir file in new/ and cur/:
        load the message headers and get the Date: header string
        L.append( (date-value, subject-value, maildir-file-path) )
    L = sorted(L)
    for i in range(0, len(L), 2):
        move the file L[i][2] into another directory

Note that you don't need to _parse_ the Date: header; if these are duplicated messages, the literal text of the Date: header should be identical for the adjacent messages.

HOWEVER, you probably want to ensure that all the identical date/subject groupings really are pairs, in case there are multiple distinct messages with identical dates.

Cheers,
--
Cameron Simpson
If you can't annoy somebody, there's little point in writing. - Kingsley Amis
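[A minimal Python rendering of Cameron's sketch, under the same assumptions: the folder holds exactly one extra copy of each message, so the list alternates original/copy once sorted, and the folder paths are placeholders. It includes the sanity check he mentions, aborting unless every date/subject grouping is exactly a pair, and it moves files aside rather than deleting them:

    #!/usr/bin/env python3
    # Sketch of the "move every second message" approach. Assumes each
    # message appears exactly twice. Folder paths are assumptions.
    import os
    import shutil
    from collections import Counter
    from email.parser import BytesParser

    MAILDIR = os.path.expanduser('~/Mail/inbox')       # hypothetical
    DUPDIR = os.path.expanduser('~/Mail/dup-holding')  # hypothetical
    os.makedirs(DUPDIR, exist_ok=True)

    entries = []
    for sub in ('new', 'cur'):
        subdir = os.path.join(MAILDIR, sub)
        for name in os.listdir(subdir):
            path = os.path.join(subdir, name)
            with open(path, 'rb') as f:
                msg = BytesParser().parse(f, headersonly=True)
            # Compare the literal header text; no date parsing needed.
            entries.append((msg.get('Date', ''), msg.get('Subject', ''), path))

    entries.sort()

    # Sanity check: every (Date, Subject) grouping must be an exact pair.
    counts = Counter((date, subj) for date, subj, _ in entries)
    bad = [key for key, n in counts.items() if n != 2]
    if bad:
        raise SystemExit('not all groupings are pairs, aborting: %r' % bad[:5])

    # Duplicates sort adjacently, so move the second file of each pair.
    for i in range(1, len(entries), 2):
        shutil.move(entries[i][2], DUPDIR)
]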