That, or someone could always write an OCR module for Perl. Wow, I'm sure
that'd be a joyride. Any bored college students out there looking for a good
comp-sci project?

It could also potentially be linked with an external program like GOCR or
another GNU OCR program. If anyone is curious about digging into this idea,
GOCR is at http://altmark.nat.uni-magdeburg.de/~jschulen/ocr/. I don't know
whether it's undergone extensive security testing, though, particularly in
how it handles data extracted from the image. (Can you imagine an image with
a buffer overflow encoded into it? I can.)
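
As a very rough sketch of the hookup, assuming a filter had already decoded
an image attachment to a temp file (gocr and its -i input option are from
GOCR's own docs; the function name and everything else here are made up):

    use strict;
    use warnings;

    # Run GOCR over a decoded image attachment and return whatever
    # text it recognizes, or undef if GOCR is missing or fails.
    sub ocr_image_file {
        my ($path) = @_;
        return undef unless -r $path;

        # List form of open avoids the shell, so a hostile filename
        # can't smuggle in extra commands.
        open(my $gocr, '-|', 'gocr', '-i', $path) or return undef;
        my $text = do { local $/; <$gocr> };
        close($gocr);
        return $text;
    }

The recognized text could then be run back through the normal body tests.
(The image decoder and GOCR itself are still attack surface, which is
exactly the buffer-overflow worry above.)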

-James

-----Original Message-----
From: Michael Moncur [mailto:[EMAIL PROTECTED]] 
Sent: Friday, April 26, 2002 1:26 AM
To: [EMAIL PROTECTED]
Subject: RE: [SAtalk] Text as images


My thought on this is that there should be an eval test that calculates the
ratio of HTML tags to actual content (the text between the tags). This would
really just be the ratio of the lengths of the raw and "cooked" versions of
the body, I think. Are both of those available to an eval test, or would it
have to scan through the entire raw body?
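
For what it's worth, here's roughly the calculation I have in mind, as a
standalone Perl function; the actual eval-test plumbing in SpamAssassin
would look different, and the tag-stripping regex is deliberately crude:

    use strict;
    use warnings;

    # Fraction of the raw body taken up by markup: 0 means no HTML
    # at all, values near 1 mean the body is almost entirely tags.
    sub html_tag_ratio {
        my ($raw_body) = @_;
        my $raw_len = length($raw_body);
        return 0 unless $raw_len;

        # Crude "cooked" version: strip anything that looks like a tag.
        (my $cooked = $raw_body) =~ s/<[^>]*>//g;
        return ($raw_len - length($cooked)) / $raw_len;
    }

Rules could then fire at a couple of thresholds (say 0.4 and 0.7), which is
the sliding-scale idea below.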

Ideally this would be on a sliding scale like LINES_OF_YELLING so we could
have 2-3 different levels. It might be worthwhile to look at specific tags,
counting <img> and <frame> as worse than <b> or <p>.
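
Extending the same sketch, per-tag weighting might look like this; the
weights themselves are completely made up:

    # Hypothetical weights: remote-content and layout tags count for
    # more than ordinary formatting tags.
    my %tag_weight = (img => 5, frame => 5, b => 1, p => 1);

    # Sum the weights of all opening tags found in the raw body,
    # treating unknown tags as weight 1.
    sub weighted_tag_count {
        my ($raw_body) = @_;
        my $count = 0;
        while ($raw_body =~ /<\s*(\w+)/g) {
            my $tag = lc $1;
            $count += exists $tag_weight{$tag} ? $tag_weight{$tag} : 1;
        }
        return $count;
    }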

I may have a go at writing this myself, but I'm no Perl wizard.

--
michael moncur   mgm at starlingtech.com   http://www.starlingtech.com/
"I have learned to use the word `impossible' with the greatest caution."
                -- Wernher von Braun


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
