I found that "tidy -eq" gives a pretty good result. To normalize the score,
I figure it makes sense to divide the resulting line count by the byte
count of the input file.
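
A minimal sketch of that normalization in Python, assuming tidy is on the PATH and writes its diagnostics to stderr. The function names (count_complaints, normalized_score, score_file) are made up for illustration, and counting only non-blank lines is my own assumption about what "line count" should mean here.

```python
import subprocess


def count_complaints(tidy_output: str) -> int:
    """Count non-blank lines in tidy's diagnostic output (assumed: one complaint per line)."""
    return sum(1 for line in tidy_output.splitlines() if line.strip())


def normalized_score(tidy_output: str, input_bytes: int) -> float:
    """Complaint lines divided by the byte count of the input."""
    if input_bytes <= 0:
        return 0.0  # avoid dividing by zero on an empty input
    return count_complaints(tidy_output) / input_bytes


def score_file(path: str) -> float:
    """Run "tidy -eq" over one file and return its normalized score."""
    with open(path, "rb") as fh:
        data = fh.read()
    # tidy exits non-zero when it reports warnings or errors, so the
    # return code is deliberately not treated as a failure here.
    proc = subprocess.run(
        ["tidy", "-eq"],
        input=data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    return normalized_score(proc.stderr.decode("utf-8", "replace"), len(data))
```

Calling score_file("message.html") (a hypothetical file name) on the extracted HTML part of a message should give a figure comparable to the errors/character numbers below.
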
I ran some MS Outlook output through and the most frequent complaint was
the unknown tag <o:p>, but there was also a nesting issue involving <span>,
<font>, and <hr>. (I'm guessing tidy doesn't understand namespaces or how
to load the MS Office namespace needed to resolve <o:p>.)
Some known spam generated a much higher result, about 0.003
errors/character versus 0.001 for the Outlook email. But this wasn't a real
sample. For that I'd need to generate a plugin wrapper for tidy and run it
over a corpus. (I've got the beginnings of such a plugin coded, based on
the PDFassassin plugin, which in turn is based on the Ocr plugin.)