Hi All,
I'm writing an application that sends out emails, for workflow-item tracking purposes.
I am using a VERP-style addressing mechanism, whereby I send each message from an uniquely generated email address, so that I can relate bounces, notification failures, etc, to the original address to which the email was sent.
Problem is differentiating between the different types of message that can come back to that address. For example, if the message was undeliverable for "permanent" reasons, e.g. user moved on, invalid email address, etc, then I get back an email to the unique address, which needs to be parsed in order to find the reason for failure. But I also get back vacation emails to the same address, e.g. "I'm out of the office at the moment, I'll read your email when I get back". The former should be recognised by the application, so that appropriate action can be taken, i.e. ask admin for a new address. But the latter should be essentially ignored, because it doesn't affect whether the recipient actually received the email.
I've been researching a little, and found the following approaches:-
1. RFC 1839, which specifies header values giving easy-to-parse reason-codes for the return mail. But I'm uncertain as to how widely supported RFC 1839 is "in the wild".
2. The mailman approach, which is essentially the linear application of algorithmic matchers, each of which is given a chance to recognise bounced mails, in order to determine the nature of the bounce. AFAICT, this method involves a significant amount of coding, as it requires writing a new matcher for each MTA/MUA in existence (surprise, surprise, they all do it slightly differently). I see that the mailman distribution comes with a couple dozen matchers, which is obviously far from complete. I'd prefer not to go down this path, since I could end up writing hundreds of matchers, as I incrementally discover the different styles that real-world M[T|U]As use. And mailman is GPL, which is a problem in this case.
I am trying to think of more robust and less costly (in coding time) approaches. Maybe some form of text-matching algorithm, such as
1. Bayesian classification? 2. Keyword recognition?
I'd be grateful for any pointers or suggestions for existing python solutions to this problem.
TIA,
-- alan kennedy ------------------------------------------------------ email alan: http://xhaus.com/contact/alan -- http://mail.python.org/mailman/listinfo/python-list