* On 20 Jan 2014, Jan-Herbert Damm wrote:
>
> Hello all,
>
> i would like to send an automatic answer to html-mails sent to me (because i'm
> tired of writing back that i prefer plain-text).

I don't like HTML mail either (for most constructions of "HTML mail"). However, as tired as you are of sending it, I'm sure others are tired of receiving it. So don't send it. If everyone autoreplied to me with the things they don't like about my email none of us would ever get work done, and vice versa.

Philosophically: This doesn't scale, and its usefulness therefore depends on your assurance that you're better than other people and that they need your help to improve.

http://en.wikipedia.org/wiki/Robustness_principle

That said, here's what you need to think about to actually do this. It's well beyond a matter of mutt vs. procmail, and needs some serious thought about how email works, so I think it's actually somewhat in scope for this list.

> My setup for receiving mail is: fetchmail --> procmail --> spamassasin -->
> mutt
>
> and for sending: mutt --> msmtp --> ...
>
> A hint on how to proceed will help me.

Any effort at this needs to be very precise, because an exact determination of whether you're receiving "HTML mail" is not script-simple. Not all mail containing HTML is either "HTML mail" or, really, badly done. If you do it wrong, you'll be sending autoreplies to people to complain about something that they might well be doing correctly.

If you're going to auto-reply, you don't really need to worry about the presence of HTML. Your real goal should be to detect reliably whether the incoming message has a meaningful text part that your mail reader will see and use, or has incorrect alternative encodings.

Things to be aware of/things that could go wrong:

A message might have no MIME whatsoever -- that is, it might be implicitly and nominally plain text -- but it might still contain HTML. Do you choose to detect and respond to this? What if someone is writing plain text email about HTML?

For MIME messages, you will need at least to extract each content-type header in each MIME part of the mail.
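
As a rough illustration of that extraction step, here is a minimal sketch using the `email` package from Python's standard library. The helper names (`content_types`, `looks_html_only`) and the sample message are made up for this example; the "HTML only" test is just the crude flat-list guess, not a reliable classifier.

```python
from email import message_from_string
from email.message import Message

# Hypothetical sample message, used only to exercise the helpers below.
SAMPLE = """\
From: sender@example.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/html; charset="us-ascii"

<p>Hello</p>
--XYZ
Content-Type: text/plain; charset="us-ascii"

Hello
--XYZ--
"""


def content_types(msg: Message) -> list[str]:
    """Return the content type of every MIME part as a flat list."""
    # walk() yields the message itself first, then each subpart depth-first.
    # A message without MIME headers reports the default, text/plain.
    return [part.get_content_type() for part in msg.walk()]


def looks_html_only(msg: Message) -> bool:
    """Crude guess: at least one text/html part and no text/plain part."""
    types = content_types(msg)
    return "text/html" in types and "text/plain" not in types


if __name__ == "__main__":
    parsed = message_from_string(SAMPLE)
    print(content_types(parsed))
    print(looks_html_only(parsed))
```

In a procmail setup, a script like this would read the message from standard input (for example with `email.message_from_file(sys.stdin)`) rather than from a string.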
You can perhaps make 95% reliable generalizations about a flat, linear list of content types that you find. For example:

* if you see only text/html MIME parts, then the mail is most probably HTML (but see above; some nominally HTML mail in fact contains only plain text).

* if you see multipart parts with only html text contents, likewise.

* if you see multipart containers with a mix of plain and html contents, you should be cautious. The order of parts probably tells you whether the html or the plain is primary (first), but this could be misleading.

For 100% certainty you need a full-depth parse tree such as you'd get with Python's MIME message parser, so that you know for sure which text parts belong to which container.

A message containing both HTML and plain text parts might or might not be "HTML mail".

Multipart/alternative is the preferred MIME structure for expressing alternative views of identical content. It's good for sending mail that is HTML for those who can and wish to see HTML, while sending plain to others.

A multipart/alternative message contains two or more sub-parts of different content-types. The user agent should be able to select whichever format its user prefers. If the user/agent does not express a preference, the first part should be used. The first part should be the simplest available encoding -- that is, in most cases, plain text. This ensures that the default view is the most available/accessible one.

So there are two ways that a multipart/alternative commonly fails:

1. they often (usually?) put the HTML part first, because that's how the sender would prefer for you to see the mail. This disrespects the rule that it should be the most accessible -- it favors the sender over the receiver.

2. they sometimes put in a plain part, and put it first, but its only content is to tell you that you should use an HTML-capable mail application. This breaks for anyone who expresses a preference for text in their application settings.
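
To make the "full-depth parse tree" idea concrete, here is a minimal sketch, again with Python's standard `email` package. It prints the nesting of MIME parts and reports the order of the direct children of each multipart/alternative container. The function names and the sample message are invented for illustration, and treating "text/html listed first" as suspect simply follows the reasoning above; it is a signal to inspect, not a verdict.

```python
from email import message_from_string
from email.message import Message

# Hypothetical nested message: an alternative pair plus an attachment.
NESTED = """\
From: shop@example.org
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="OUTER"

--OUTER
Content-Type: multipart/alternative; boundary="INNER"

--INNER
Content-Type: text/html

<p>Report attached.</p>
--INNER
Content-Type: text/plain

Report attached.
--INNER--
--OUTER
Content-Type: application/pdf

(binary data omitted)
--OUTER--
"""


def outline(msg: Message, depth: int = 0) -> list[str]:
    """Return one indented line per MIME part, preserving container nesting."""
    lines = ["  " * depth + msg.get_content_type()]
    if msg.is_multipart():
        # For a parsed multipart, get_payload() is the list of child parts.
        for child in msg.get_payload():
            lines.extend(outline(child, depth + 1))
    return lines


def alternative_orders(msg: Message) -> list[list[str]]:
    """List the direct children's types, in order, for each multipart/alternative."""
    found = []
    for part in msg.walk():
        if part.get_content_type() == "multipart/alternative" and part.is_multipart():
            found.append([child.get_content_type() for child in part.get_payload()])
    return found


def html_listed_first(msg: Message) -> bool:
    """True if any multipart/alternative container has text/html as its first child."""
    return any(order[:1] == ["text/html"] for order in alternative_orders(msg))


if __name__ == "__main__":
    parsed = message_from_string(NESTED)
    print("\n".join(outline(parsed)))
    print(alternative_orders(parsed))
    print(html_listed_first(parsed))
```

Unlike a flat list of content types, the outline shows which text parts sit inside the alternative container and which are siblings of it, such as attachments.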

In light of #2, you also would ideally have some statistical heuristic that tells whether the "plaintext" version is obviously not a translation of the HTML version into plain text. If the HTML is 98K and the text is one line, it's probably not the same content. If the HTML is 12 lines and the text is two, who knows? You may need to look at the co-occurrence of individual words in both parts to avoid making errors in automated analysis.

It's non-trivial, and probably not worthwhile as a one-off procmail recipe. As a standalone program that vets the plaintext compatibility of a MIME or non-MIME message, which can be incorporated into a procmail rule, it would be nice to have on several fronts.

Beyond this you should also be cognizant of the obligations of any autoreply mechanism to detect and prevent mail loops. Most of my HTML mail comes from automated retail systems and such, whose mailboxes are unattended and which will autoreply to me if I autoreply to them.

-- David Champion • d...@bikeshed.us