Jamie Paul Griffin wrote:

> Hi
> 
> Does anyone have or know of a perl or python script, or even a shell
> script, that removes the multipart/(mixed|alternative| ... ) parts of
> incoming mail and leaves or converts the message into plain text?
> Also, i wouldn't want to lose any attachments that people might send me.
> 
> Jamie.

hi,

i wrote something like that. by default, it converts to text anything
that can be converted to text and deletes everything else but you
can turn off any specific transformation. it can delete specific
mail headers. it translates (most) winmail.dat attachments. if a
transformation fails, it leaves the original in place for safety by
default. it works via procmail on individual messages or it can be
applied to an entire mbox file.

it requires the presence of various utilities (e.g. perl, antiword
or catdoc, xls2csv, lynx, pdftotext and mktemp). you'd probably just
need lynx and mktemp installed.

it is available at:

    http://raf.org/textmail/

and its help message is:

    usage: textmail [options]
    options:
      -h       - Print the help message then exit
      -m       - Print the manpage then exit
      -w       - Print the manpage in html format then exit
      -r       - Print the manpage in nroff format then exit
      -M       - Output in mailbox format
      -T       - Output in raw mail format (for smtp)
      -W       - Don't replace MS Word attachments with text
      -E       - Don't replace MS Excel attachments with csv
      -H       - Don't replace HTML attachments with text
      -R       - Don't replace RTF attachments with text
      -P       - Don't replace PDF attachments with text
      -U       - Don't translate winmail.dat attachments
      -L       - Don't reduce appledouble attachments
      -I       - Don't delete image attachments
      -A       - Don't delete audio attachments
      -V       - Don't delete video attachments
      -X       - Don't delete MS Windows executable attachments
      -B       - Don't recode text that was base64-encoded
      -S       - Don't replace spaces in filenames with underscores
      -Z       - Do translate signed content (discards signatures)
      -O       - Delete all application/octet-stream attachments
      -!       - Delete all application/* attachments
      -D hdrs  - Delete headers (list of header prefixes and filenames)
      -K types - Keep attachments (list of mimetypes and filenames)
      -f       - On translation error, keep translation, not original
      -?       - Print paths of helper applications then exit
    
    Filters a mail message or mbox, replacing MS Word, MS Excel, HTML, RTF and
    PDF attachments with the plain text contained therein. By default, the
    following attachments are also deleted: image, audio, video and MS Windows
    executables. MS winmail.dat attachments are replaced by any attachments
    contained therein which are then replaced by text or deleted in the same
    fashion. Any of these actions can be suppressed with the command line
    options. Mail headers can also be selectively deleted.

it may or may not be quite what you want. without the -H option,
it replaces multipart/alternative where the alternatives are html
and text with just the text part.
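
to illustrate the idea (this is not textmail's code, just a minimal
python sketch using the standard email package), collapsing a
multipart/alternative down to its text/plain part could look roughly
like the following. the function names and the sample message are made
up for the example, and it assumes the text/plain alternative is the
one you want to keep:

```python
from email import message_from_string

# Hypothetical sample message, only here to exercise the sketch.
RAW = """\
From: a@example.org
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="us-ascii"

hello in plain text
--b1
Content-Type: text/html; charset="us-ascii"

<p>hello in html</p>
--b1--
"""

CONTENT_HEADERS = ("Content-Type", "Content-Transfer-Encoding",
                   "Content-Disposition")


def become(msg, part):
    """Give msg the body and content headers of part.

    All other headers of msg (From, Subject, ...) are left alone.
    """
    for name in CONTENT_HEADERS:
        del msg[name]              # no error if the header is absent
        if part[name] is not None:
            msg[name] = part[name]
    # The raw (still encoded) payload is copied, so the copied
    # Content-Transfer-Encoding header stays accurate.
    msg.set_payload(part.get_payload())


def collapse_alternatives(msg):
    """Replace each multipart/alternative with its text/plain part, in place.

    Alternatives without a text/plain part are left untouched, and so are
    all other parts (attachments are not removed).
    """
    if not msg.is_multipart():
        return
    for part in msg.get_payload():
        collapse_alternatives(part)
    if msg.get_content_type() == "multipart/alternative":
        for part in msg.get_payload():
            if part.get_content_type() == "text/plain":
                become(msg, part)
                break


msg = message_from_string(RAW)
collapse_alternatives(msg)
print(msg.as_string())
```

unlike textmail, this sketch does no html-to-text conversion at all, so
a message that only has an html part passes through unchanged.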

you might want to try it with the following options in a procmail recipe:

    :0 fw
    | textmail -MWERPIAVXBS

that would only translate html and leave all other attachments as they
are except for winmail.dat attachments. if you want to leave winmail.dat
attachments untranslated as well, add the -U option to the command.

use at your own risk, obviously. :-)

cheers,
raf
