> Still this is for validation, not well-formedness. I wonder whether
> checking XML for well-formedness would provide better results.
Checking HTML email for even rudimentary format shows that several mail clients that do not have a good outlook on how email should be created tend to exchange very crappy email. This email comes from legitimate clients and legitimate servers, yet it ignores the RFCs in several places, or for some reason assumes that every email needs to look like a full-page ad in a marketing magazine.

Perhaps several XHTML-based rules could be created, run against the spam/ham corpus, and scored accordingly. Since these would generally be raw or body tests, I would expect them to be very expensive.

-- 
Michael Scheidell, CTO >|SECNAP Network Security
Winner 2008 Network Products Guide Hot Companies
FreeBSD SpamAssassin Ports maintainer