Re: Yet another 'duplicate' thread

Cameron Simpson Wed, 13 Nov 2013 15:57:08 -0800

On 13.Nov 2013, 13:01, Nathan Stratton Treadway wrote:
> (Note that as I understand this limit only works when the sort order is
> "thread".  That is, with no limit applied you should be seeing the
> duplicate messages marked with an "=" character in your mailbox index
> listing, and then those marked messages will be selected by the "~="
> filter.)

Worth restating. This is something of a mutt annoyance - silent failure.

On 13Nov2013 20:38, Jonas Petong <jonas.pet...@web.de> wrote:
> Sorry for that one!  Cameron, could you explain me anyhow how to use that
> script you proposed? Or at least which environment to set? Might be of use
> for further "stuck in nowhere" problems (even if for no reason as in my
> case). You all have a great day!

Well, the script as supplied is pseudocode (and of course untested),
but based around using Python. (If you don't know Python, it is
well worth learning.)

A fuller (but still totally untested) sketch might look like this:

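  #!/usr/bin/env python3

  import os
  import sys
  from mailbox import Maildir

  # get the maildir pathname from the command line
  mdirpath = sys.argv[1]

  # open the Maildir; create=False so that a mistyped path is an error
  # instead of silently creating a new, empty Maildir
  M = Maildir(mdirpath, create=False)

  # list holding message information
  L = []
  for key in M.keys():
    # load the message; this gives us the headers plus its subdir and flags
    msg = M.get_message(key)
    # rebuild the filename of the message: <subdir>/<key>[:<info>]
    info = msg.get_info()
    name = key + (M.colon + info if info else '')
    pathname = os.path.join(mdirpath, msg.get_subdir(), name)
    # fetch the headers as strings; a missing header becomes '' so that
    # the tuples below can still be compared when sorting
    date, subject, message_id = (
      str(msg.get(h, '')) for h in ('date', 'subject', 'message-id'))
    # make a tuple with the info we want
    L.append((date, subject, message_id, key, pathname))

  # sort the list
  # because we have date then subject in the tuple, the sort order is date
  # then subject (then message-id, then key)
  # note: dates compare as header text, not chronologically, but identical
  # copies of a message still end up next to each other
  L = sorted(L)

  # this last bit could be adapted to move every second message elsewhere
  for i in range(0, len(L), 2):
    date, subject, message_id, key, pathname = L[i]
    # ... decide what to do ...; as a placeholder, just report the message
    print(pathname, message_id, date, subject)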

The last loop iterates over the indices 0, 2, 4, ... up to the largest
even index in the list L, so it visits every second message.

Pulling every second message like this is very fragile - you need
to be totally sure that you have an exactly duplicated set of messages.
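
One way to check that before touching anything is to count the copies
of each message-id. This is only a minimal sketch: the function name
unpaired is made up for illustration, and it assumes info tuples laid
out as (date, subject, message-id, key, pathname) like the ones built
in the script above.

```python
from collections import Counter

def unpaired(infos):
    """Return the message-ids that do not occur exactly twice.

    infos is a list of (date, subject, message_id, key, pathname)
    tuples; this layout is an assumption carried over from the sketch.
    """
    counts = Counter(info[2] for info in infos)
    return sorted(mid for mid, n in counts.items() if n != 2)
```

If this returns anything other than an empty list, the mailbox is not
a clean set of pairs and the every-second-message loop will pick the
wrong messages somewhere.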

Personally, I would be inclined to make a dict instead of a list,
mapping message-ids to a list of message paths (or the info tuples).
Then you can iterate over the dict and remove or move sideways the
second and following messages for each message-id, leaving only the
original.
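
A rough sketch of that dict approach follows. The function names are
hypothetical, the tuple layout (date, subject, message-id, key,
pathname) is assumed, and treating the first copy encountered as "the
original" is an assumption too; nothing in the mailbox says which copy
came first. Messages with no message-id are left alone rather than
being lumped together as duplicates of each other.

```python
from collections import defaultdict

def group_by_message_id(infos):
    """Map each message-id to the list of info tuples that carry it."""
    groups = defaultdict(list)
    for info in infos:
        message_id = info[2]
        if not message_id:
            # no message-id header: cannot tell duplicates apart, skip
            continue
        groups[message_id].append(info)
    return groups

def extra_copies(groups):
    """Return the second and following info tuples for each message-id."""
    extras = []
    for members in groups.values():
        # members[0] is kept as the "original" (assumption: first seen)
        extras.extend(members[1:])
    return extras
```

The list returned by extra_copies is what would be moved sideways or
removed; a message-id with a single copy contributes nothing to it.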

I'd also be writing this script to print a report instead of
moving/deleting. Then I can examine the output for sanity before
hitting the button. If the report went:

  pathname message-id date subject

it would be easy to read the pathnames from a second script to do
the actual message removal. Or whatever.
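
As a sketch of that two-step workflow, assuming tab-separated fields
(my choice here, since pathnames and subjects can contain spaces) and
the same (date, subject, message-id, key, pathname) tuples as before.
Both function names are made up for illustration.

```python
def write_report(infos, out):
    """Write one 'pathname message-id date subject' line per message."""
    for date, subject, message_id, key, pathname in infos:
        # collapse whitespace in header values: folded headers can
        # contain newlines, which would break the one-line-per-message rule
        fields = [" ".join(str(v).split()) for v in (message_id, date, subject)]
        out.write("\t".join([pathname] + fields) + "\n")

def read_pathnames(report):
    """Return the first (pathname) field of every non-empty report line."""
    return [line.rstrip("\n").split("\t", 1)[0]
            for line in report if line.strip()]
```

The first function would be pointed at sys.stdout or a file; after
reading the report over, the second script feeds the result of
read_pathnames to whatever does the removal.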

Please feel free to ask whatever questions you like. I do a lot of
stuff with Maildirs and Python; I replaced procmail with my own
mail filing program a year or so ago.

Cheers,
-- 
Cameron Simpson <c...@zip.com.au>

Q: How many user support people does it take to change a light bulb?
A: We have an exact copy of the light bulb here and it seems to be
   working fine.  Can you tell me what kind of system you have?
