That, or someone could always write an OCR module for Perl. Wow, I'm sure that'd be a joyride. Any bored college students out there looking for a good comp-sci project?
It could also potentially be linked with an external program like GOCR or another GNU OCR program (a rough sketch of such a pipeline is appended at the end of this message). If anyone is curious about digging into this idea, GOCR is at http://altmark.nat.uni-magdeburg.de/~jschulen/ocr/. I don't know whether it has undergone extensive security testing, though, particularly in how it handles data extracted from the image. (Can you imagine an image with a buffer overflow encoded into it? I can.)

-James

-----Original Message-----
From: Michael Moncur [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 26, 2002 1:26 AM
To: [EMAIL PROTECTED]
Subject: RE: [SAtalk] Text as images

My thought on this is that there should be an eval test that calculates the ratio of HTML tags to actual content (the text between the tags). This would really just be the ratio of the lengths of the raw and "cooked" versions of the body, I think. Are both of those available to an eval test, or would it have to scan through the entire raw body?

Ideally this would be on a sliding scale like LINES_OF_YELLING, so we could have 2-3 different levels. It might also be worthwhile to look at specific tags, counting <img> and <frame> as worse than <b> or <p>. (A sketch of this test is appended below as well.)

I may see if I can have a go at writing this myself, but I'm no Perl wizard.

--
michael moncur   mgm at starlingtech.com   http://www.starlingtech.com/
"I have learned to use the word `impossible' with the greatest caution."
                                              -- Wernher von Braun

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
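
A minimal sketch of the external-OCR pipeline James describes, in Perl since that's the list's language. It assumes gocr(1) and ImageMagick's convert(1) are on the PATH; the function name ocr_image() and the temp-file handling are illustrative, not part of SpamAssassin's actual API. Old GOCR releases only read PNM-family images, hence the conversion step.

    # Hypothetical helper: OCR the decoded bytes of a MIME image part by
    # shelling out to gocr(1). Assumes gocr and ImageMagick's convert(1)
    # are installed; not part of SpamAssassin's real API.
    use strict;
    use File::Temp qw(tempfile);

    sub ocr_image {
        my ($image_bytes) = @_;    # raw decoded bytes of the image part

        # Write the image to a temp file; never put attacker-controlled
        # bytes on a shell command line.
        my ($fh, $tmp) = tempfile(SUFFIX => '.img', UNLINK => 1);
        binmode($fh);
        print $fh $image_bytes;
        close($fh) or return '';

        # gocr only understands PNM-family images, so convert first.
        my $pnm = "$tmp.pnm";
        system('convert', $tmp, $pnm) == 0 or return '';

        # Capture gocr's guess at the text. Treat the result as untrusted
        # input -- this is exactly where James's hypothetical malicious
        # image would try to sneak data back in.
        open(my $ocr, '-|', 'gocr', '-i', $pnm) or return '';
        my $text = do { local $/; <$ocr> };
        close($ocr);
        unlink($pnm);
        return defined $text ? $text : '';
    }

The list forms of system() and open() bypass the shell entirely, which matters given James's buffer-overflow worry: the image bytes never touch a command line, and gocr's output can then be fed to the normal body rules as ordinary untrusted text.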
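
Michael's ratio test could look something like the sketch below, written as a standalone Perl function over the raw HTML body. Whether an eval test can see both the raw and "cooked" bodies without rescanning is exactly his open question, so nothing here is wired into SpamAssassin's eval API; the names, weights, and cutoffs are all invented for illustration.

    # Hypothetical tag-to-content ratio: weighted length of the HTML tags
    # divided by the length of the remaining text. <img> and <frame>
    # count triple, per Michael's suggestion; the weights are arbitrary.
    use strict;

    sub html_tag_ratio {
        my ($raw_body) = @_;
        my $weighted_tag_len = 0;

        # Strip tags, accumulating their (weighted) lengths as we go.
        (my $cooked = $raw_body) =~ s{</?(\w+)[^>]*>}{
            my $w = (lc($1) eq 'img' || lc($1) eq 'frame') ? 3 : 1;
            $weighted_tag_len += $w * length($&);
            ''
        }ge;

        my $content_len = length($cooked) || 1;   # avoid division by zero
        return $weighted_tag_len / $content_len;
    }

    # A sliding scale a la LINES_OF_YELLING: one rule per threshold, so
    # scores escalate as the message becomes mostly markup.
    sub html_mostly_tags   { html_tag_ratio($_[0]) > 1.0 }
    sub html_mostly_tags_2 { html_tag_ratio($_[0]) > 3.0 }
    sub html_mostly_tags_3 { html_tag_ratio($_[0]) > 9.0 }

A message that is one big <img> tag with almost no text between the tags scores a huge ratio and trips all three rules, while ordinary light markup like an occasional <b> or <p> stays well under 1.0.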