> Still this is for validation, not well-formedness.  I wonder whether
> checking XML for well-formedness would provide better results.

Checking HTML email for even rudimentary well-formedness shows that several
mail clients, the ones without a good outlook on how email should be
created, tend to send very crappy email.

This email comes from legit clients and legit servers, yet in several
places it ignores the RFCs, or for some reason assumes that every email
needs to look like a full-page ad in a marketing magazine.

I think several XHTML-based rules could be created and run against the
spam/ham corpus, with scores set accordingly.
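
To make that concrete, here is a minimal sketch in Python of what such a
rule would measure. It is not SpamAssassin code (a real rule would live in
a Perl plugin), and the function names are made up for illustration. It
only tests XML well-formedness with the standard library parser, so plain
HTML habits such as an unclosed <br> or an undeclared &nbsp; entity count
as failures too.

```python
import email
from email import policy
import xml.etree.ElementTree as ET


def html_parts(raw_message: bytes):
    """Yield the decoded text of every text/html part in a raw message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            yield part.get_content()


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except (ET.ParseError, ValueError):
        # ParseError: mismatched tags, undefined entities, empty input, etc.
        # ValueError: caught defensively for input the parser rejects outright.
        return False
    return True


def count_malformed_html_parts(raw_message: bytes) -> int:
    """Count the text/html parts that are not well-formed XML."""
    return sum(1 for html in html_parts(raw_message) if not is_well_formed(html))
```

Running count_malformed_html_parts() over every message in the ham and
spam corpora and comparing the hit rates would show whether the test
separates the two well enough to deserve a score.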

Since these would generally be raw or body tests, I would expect them to
be very expensive.


-- 
Michael Scheidell, CTO
>|SECNAP Network Security
Winner 2008 Network Products Guide Hot Companies
FreeBSD SpamAssassin Ports maintainer


