Jamie Paul Griffin wrote:
> Hi
>
> Does anyone have or know of a perl or python script, or even a shell
> script, that removes the multipart/(mixed|alternative| ... ) parts of
> incoming mail and leaves or converts the message into plain text?
> Also, I wouldn't want to lose any attachments that people might send me.
>
> Jamie.
hi,

i wrote something like that. by default, it converts to text anything
that can be converted to text and deletes everything else, but you can
turn off any specific transformation. it can delete specific mail
headers. it translates (most) winmail.dat attachments. if a
transformation fails, it leaves the original in place for safety by
default. it works via procmail on individual messages, or it can be
applied to an entire mbox file.

it requires various utilities (e.g. perl, antiword or catdoc, xls2csv,
lynx, pdftotext and mktemp). you'd probably just need lynx and mktemp
installed.

it is available at:

  http://raf.org/textmail/

and its help message is:

  usage: textmail [options]
  options:
    -h       - Print the help message then exit
    -m       - Print the manpage then exit
    -w       - Print the manpage in html format then exit
    -r       - Print the manpage in nroff format then exit
    -M       - Output in mailbox format
    -T       - Output in raw mail format (for smtp)
    -W       - Don't replace MS Word attachments with text
    -E       - Don't replace MS Excel attachments with csv
    -H       - Don't replace HTML attachments with text
    -R       - Don't replace RTF attachments with text
    -P       - Don't replace PDF attachments with text
    -U       - Don't translate winmail.dat attachments
    -L       - Don't reduce appledouble attachments
    -I       - Don't delete image attachments
    -A       - Don't delete audio attachments
    -V       - Don't delete video attachments
    -X       - Don't delete MS Windows executable attachments
    -B       - Don't recode text that was base64-encoded
    -S       - Don't replace spaces in filenames with underscores
    -Z       - Do translate signed content (discards signatures)
    -O       - Delete all application/octet-stream attachments
    -!       - Delete all application/* attachments
    -D hdrs  - Delete headers (list of header prefixes and filenames)
    -K types - Keep attachments (list of mimetypes and filenames)
    -f       - On translation error, keep translation, not original
    -?       - Print paths of helper applications then exit

  Filters a mail message or mbox, replacing MS Word, MS Excel, HTML, RTF
  and PDF attachments with the plain text contained therein. By default,
  the following attachments are also deleted: image, audio, video and MS
  Windows executables. MS winmail.dat attachments are replaced by any
  attachments contained therein which are then replaced by text or
  deleted in the same fashion. Any of these actions can be suppressed
  with the command line options. Mail headers can also be selectively
  deleted.

it may or may not be quite what you want. without the -H option, it
replaces a multipart/alternative part whose alternatives are html and
text with just the text part. you might want to try it with the
following options:

  :0 fw
  | textmail -MWERPIAVXBS

that would only translate html and leave all other attachments as they
are, except for winmail.dat attachments (and appledouble attachments,
which are still reduced because -L isn't given). if you want to leave
winmail.dat attachments untranslated as well, add the -U option to the
command.

use at your own risk, obviously. :-)

cheers,
raf
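For anyone curious how the multipart/alternative collapse described above can work, here is a minimal sketch using only Python's standard email package. It is not textmail's code (textmail is a perl script), and the function names collapse_alternatives, delete_headers and textify are hypothetical, made up for illustration. It only mirrors two behaviours mentioned in the reply: keeping the text part of an html/text alternative, and deleting headers by name prefix. In the same spirit as textmail's keep-the-original default, an alternative with no text/plain part is left untouched, and other attachments are kept as they are.

```python
"""Hypothetical sketch, not textmail: collapse multipart/alternative to text/plain."""
from email import message_from_bytes, policy
from email.message import EmailMessage


def plain_alternative(part):
    """Return the text/plain child of a multipart/alternative part, or None."""
    for child in part.iter_parts():
        if child.get_content_type() == "text/plain":
            return child
    return None


def collapse_alternatives(part):
    """Recursively replace each multipart/alternative with its text/plain child.

    An alternative without a text/plain child is returned unchanged, and
    non-alternative parts (e.g. attachments) are kept as they are.
    """
    if not part.is_multipart():
        return part
    if part.get_content_type() == "multipart/alternative":
        plain = plain_alternative(part)
        return plain if plain is not None else part
    # Rebuild this container's children with any nested alternatives collapsed.
    part.set_payload([collapse_alternatives(child) for child in part.iter_parts()])
    return part


def delete_headers(msg, prefixes):
    """Delete every header whose name starts with one of the given prefixes."""
    for name in set(msg.keys()):
        if any(name.lower().startswith(p.lower()) for p in prefixes):
            del msg[name]  # removes all occurrences of this header


def textify(raw: bytes, header_prefixes=()) -> EmailMessage:
    """Parse a raw message, drop unwanted headers and collapse alternatives."""
    msg = message_from_bytes(raw, policy=policy.default)
    delete_headers(msg, header_prefixes)
    result = collapse_alternatives(msg)
    if result is not msg:
        # The whole message was multipart/alternative: keep the top-level
        # headers (From, To, Subject, ...) but swap in the plain text body.
        text = result.get_content()
        del msg["Content-Type"]  # set_content() refuses to run on a multipart
        msg.set_content(text)
    return msg
```

To use something like this as a procmail filter, you'd wrap it in a small script that reads the message bytes from stdin, calls textify() and writes msg.as_bytes() to stdout. Everything else textmail does (Word, Excel, RTF, PDF and winmail.dat translation, deleting images and so on) is deliberately left out of this sketch.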