Re: auto reply to html-mails

David Champion Mon, 20 Jan 2014 13:30:40 -0800

* On 20 Jan 2014, Jan-Herbert Damm wrote: 
> 
> Hello all,
> 
> i would like to send an automatic answer to html-mails sent to me (because i'm
> tired of writing back that i prefer plain-text).


I don't like HTML mail either (for most constructions of "HTML mail").
However, as tired as you are of sending that complaint, I'm sure others
are tired of receiving it.  So don't send it.

If everyone autoreplied to me with the things they don't like about
my email, none of us would ever get work done, and vice versa.
Philosophically: This doesn't scale, and its usefulness therefore
depends on your assurance that you're better than other people and that
they need your help to improve.

http://en.wikipedia.org/wiki/Robustness_principle

That said, here's what you need to think about to actually do this.
It's well beyond a matter of mutt vs. procmail, and needs some serious
thought about how email works, so I think it's actually somewhat in
scope for this list.


> My setup for receiving mail is: fetchmail --> procmail --> spamassasin --> 
> mutt 
> 
> and for sending: mutt --> msmtp --> ...
> 
> A hint on how to proceed will help me.

Any effort at this needs to be very precise, because an exact
determination of whether you're receiving "HTML mail" is not
script-simple.  Not all mail containing HTML is either "HTML mail" or,
really, badly done.  If you do it wrong, you'll be sending autoreplies
to people to complain about something that they might well be doing
correctly.

If you're going to auto-reply, you don't really need to worry about the
presence of HTML.  Your real goal should be to detect reliably whether
the incoming message has a meaningful text part that your mail reader
will see and use, or has incorrect alternative encodings.
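As a starting point, here is a minimal Python sketch of that check,
assuming the standard library email package.  The function names and
the 20-character threshold are mine, purely for illustration.  It only
asks whether any non-attachment text/plain part carries a usable amount
of text; it says nothing yet about whether that text is honest.

```python
from email import message_from_string


def plain_text_parts(msg):
    """Return the decoded bodies of all inline text/plain parts in msg."""
    texts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        raw = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            texts.append(raw.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back rather than crash.
            texts.append(raw.decode("utf-8", errors="replace"))
    return texts


def has_meaningful_text(msg, min_chars=20):
    """True if some plain part holds at least min_chars of non-blank text.

    min_chars is an arbitrary illustrative threshold, not a standard.
    """
    return any(len(t.strip()) >= min_chars for t in plain_text_parts(msg))
```

Note that a message with no Content-Type header at all counts as
text/plain here, which is the MIME default, so a non-MIME message that
happens to contain HTML markup will still pass.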

Things to be aware of/things that could go wrong:

A message might have no MIME whatsoever -- that is, it might be
implicitly and nominally plain text -- but it might still contain HTML.
Do you choose to detect and respond to this?  What if someone is writing
plain text email about HTML?

For MIME messages, you will need at least to extract each content-type
header in each MIME part of the mail.  You can perhaps make 95% reliable
generalizations about a flat, linear list of content types that you
find.  For example:

* if you see only text/html MIME parts, then the mail is most probably
HTML (but see above; some nominally HTML mail in fact contains only
plain text).

* if you see multipart containers whose only text parts are html, likewise.

* if you see multipart containers with a mix of plain and html contents,
you should be cautious.  The order of parts probably tells you whether
the html or the plain is primary (first), but this could be misleading.
For 100% certainty you need a full-depth parse tree such as you'd get
with python's mime message parser, so that you know for sure which text
parts belong to which container.
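For the full-depth view, a short sketch using Python's standard email
package might look like this.  The function name and the sample message
are invented for illustration; the point is only that each content type
is reported together with its nesting depth, so you can tell which text
parts belong to which container.

```python
from email import message_from_string


def content_type_tree(msg, depth=0):
    """Return (depth, content_type) pairs for msg and every nested part."""
    rows = [(depth, msg.get_content_type())]
    if msg.is_multipart():
        for part in msg.get_payload():
            rows.extend(content_type_tree(part, depth + 1))
    return rows


# Hypothetical sample: an alternative pair plus an attachment.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

Hello
--inner
Content-Type: text/html

<p>Hello</p>
--inner--
--outer
Content-Type: application/pdf

(pdf bytes)
--outer--
"""

for depth, ctype in content_type_tree(message_from_string(SAMPLE)):
    print("  " * depth + ctype)
```

A flat list of the same message would show text/plain, text/html and
application/pdf side by side; the depth column is what tells you the
two text parts are alternatives of each other.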

A message containing both HTML and plain text parts might or might not
be "HTML mail".  Multipart/alternative is the preferred MIME structure
for expressing alternative views of identical content.  It's good for
sending mail that is HTML for those who can and wish to see HTML, while
sending plain to others.  A multipart/alternative message contains two
or more sub-parts of different content-types.  The user agent should be
able to select whichever format its user prefers.  The sender should
order the parts from simplest to richest (RFC 2046), so the first part
should be the simplest available encoding -- that is, in most cases,
plain text.  This ensures that a reader which cannot choose, or which
does not understand MIME at all, sees the most available/accessible
view first.

So there are two ways that a multipart/alternative commonly fails:

1. they often (usually?) put the HTML part first, because that's how the
sender would prefer for you to see the mail.  This disrespects the rule
that it should be the most accessible -- it favors the sender over the
receiver.

2. they sometimes put in a plain part, and put it first, but its
only content is to tell you that you should use an HTML-capable mail
application.  This breaks for anyone who expresses a preference for
text in their application settings.
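The first failure is the easier one to test for.  A rough Python
sketch, again assuming the standard email package and with function
names of my own choosing, reports the order of the direct children of
each multipart/alternative container:

```python
from email import message_from_string


def alternatives_order(msg):
    """List the child content types of each multipart/alternative, in order."""
    result = []
    for part in msg.walk():
        if part.get_content_type() == "multipart/alternative" and part.is_multipart():
            result.append([child.get_content_type() for child in part.get_payload()])
    return result


def html_listed_first(msg):
    """True if any multipart/alternative container leads with text/html."""
    return any(order and order[0] == "text/html" for order in alternatives_order(msg))
```

This only looks at direct children, so a container whose first child is
itself a multipart/related wrapper around HTML will not be flagged.
Whether that case matters depends on the mail you actually receive.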

In light of #2, you also would ideally have some statistical heuristic
that tells whether the "plaintext" version is obviously not a
translation of the HTML version into plain text.  If the HTML is 98K and
the text is one line, it's probably not the same content.  If the HTML
is 12 lines and the text is two, who knows?  You may need to look at the
co-incidence of individual words to avoid making errors in automated
analysis.
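To make the word co-incidence idea concrete, here is one possible
Python sketch.  It strips tags with the standard html.parser module and
measures what fraction of the distinct words in the HTML also appear in
the plain part.  The names and the 0.6 threshold are assumptions for
illustration only; you would have to tune them against real mail.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, ignoring anything inside script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def words(text):
    """Set of lowercased word tokens in text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def html_words(html):
    """Set of word tokens in the visible text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return words(" ".join(parser.chunks))


def plain_covers_html(plain, html, threshold=0.6):
    """True if at least `threshold` of the HTML's words occur in the plain text."""
    hw = html_words(html)
    if not hw:
        return True  # nothing to compare against
    return len(hw & words(plain)) / len(hw) >= threshold
```

The token pattern only matches ASCII letters and digits, so this is of
little use for mail in other scripts without adjustment.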

It's non-trivial, and probably not worthwhile as a one-off procmail
recipe.  As a standalone program that vets the plaintext compatibility
of a MIME or non-MIME message, which can be incorporated into a procmail
rule, it would be nice to have on several fronts.

Beyond this you should also be cognizant of the obligations of any
autoreply mechanism to detect and prevent mail loops.  Most of my HTML
mail comes from automated retail systems and such, whose mailboxes are
unattended and which will autoreply to me if I autoreply to them.
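A loop guard could start from something like the Python sketch below.
It checks the Auto-Submitted header (RFC 3834), the Precedence and
List-* headers, a null Return-Path, and an X-Loop marker of your own.
The function name, the marker value and the list of robot-looking local
parts are all hypothetical; treat this as a sketch, not a complete
defence.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical values; adjust for your own mail.
BULK_PRECEDENCE = {"bulk", "junk", "list"}
ROBOT_LOCALPARTS = {"mailer-daemon", "postmaster", "noreply", "no-reply", "donotreply"}
MY_LOOP_TAG = "html-autoreply@example.invalid"  # marker you would add to your replies


def safe_to_autoreply(msg):
    """Return False if msg looks automated, list-borne, or already looped."""
    auto = msg.get("Auto-Submitted", "no").split(";")[0].strip().lower()
    if auto != "no":
        return False
    if msg.get("Precedence", "").strip().lower() in BULK_PRECEDENCE:
        return False
    if msg.get("List-Id") or msg.get("List-Unsubscribe"):
        return False
    if msg.get("Return-Path", "").strip() == "<>":
        return False
    if any(MY_LOOP_TAG in value for value in msg.get_all("X-Loop", [])):
        return False
    localpart = parseaddr(msg.get("From", ""))[1].partition("@")[0].lower()
    if not localpart:
        return False
    return localpart not in ROBOT_LOCALPARTS
```

Even with all of this, you would still want to limit how often you
reply to any one address, since header checks alone cannot catch every
unattended mailbox.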

-- 
David Champion • d...@bikeshed.us
