Re: word file spam

Jonas Eckerman Tue, 13 Oct 2009 12:54:48 -0700

Matus UHLAR - fantomas wrote:

> Yes, but generic plugin should be able extract images for later processing
> (FuzzyOCR or maybe even things like Bayes) too ;)


That would depend on what you mean by "generic". :-)

It's a generic text extractor plugin, with the ability to call an OCR program to get text from images. That is what I wanted, and it's what John mentioned in his post.

It's not a generic attachment parser and object extractor (though it might become one).
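To make the "generic text extractor that can call an OCR program" idea concrete, here is a minimal Python sketch of that dispatch pattern. It is not the plugin's actual code (SpamAssassin plugins are written in Perl). The part representation, the function names and the `ocr` callable are all hypothetical. The OCR step is passed in as a function, standing in for wherever a real OCR program would be invoked.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical part representation: (MIME type, raw bytes).
# SpamAssassin itself uses Mail::SpamAssassin::Message::Node objects.
Part = Tuple[str, bytes]
Extractor = Callable[[bytes], str]


def decode_text(data: bytes) -> str:
    """Extractor for plain-text parts: decode, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def make_extractors(ocr: Optional[Extractor]) -> Dict[str, Extractor]:
    """Map MIME types to extractors; image types are handled only if an OCR callable is given."""
    extractors: Dict[str, Extractor] = {"text/plain": decode_text}
    if ocr is not None:
        for mime in ("image/png", "image/jpeg", "image/gif"):
            extractors[mime] = ocr
    return extractors


def extract_text(parts: List[Part], ocr: Optional[Extractor] = None) -> List[str]:
    """Return the non-empty text extracted from every part we know how to handle.

    Parts with no matching extractor are skipped. The result is what a later
    stage (for example a Bayes tokenizer) would consume.
    """
    extractors = make_extractors(ocr)
    texts: List[str] = []
    for mime, data in parts:
        extractor = extractors.get(mime.lower())
        if extractor is None:
            continue
        text = extractor(data).strip()
        if text:
            texts.append(text)
    return texts


if __name__ == "__main__":
    def fake_ocr(data: bytes) -> str:
        # Stub standing in for a call to a real OCR program.
        return "CHEAP PILLS"

    parts = [
        ("text/plain", b"Hello"),
        ("image/png", b"\x89PNG..."),
        ("application/zip", b"PK"),
    ]
    print(extract_text(parts, ocr=fake_ocr))  # ['Hello', 'CHEAP PILLS']
    print(extract_text(parts))                # ['Hello'] (no OCR configured)
```

Keeping OCR as an optional, pluggable extractor mirrors the point above: text extraction is the core job, and OCR is just one more extractor that can be switched on for image parts.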

I do want it to be able to add stuff rendered to HTML, but Mail::SpamAssassin::Message::Node doesn't (currently) have a set_rendered variant for doing that, and I haven't had the time to work on Mail::SpamAssassin::Message::Node.

I'm not sure exactly what the correct way to add parts (such as extracted images) to the message would be. I have thought about it, and the plugin's plugin architecture does support this. I just haven't had the time to find out how to do it.

I don't know what you mean by "even things like Bayes". The plugin does make the extracted text available to Bayes (this is what I made it for), and it can call OCR programs.

Making extracted images available for FuzzyOCR is (as mentioned above) something I want to do. Since I don't do any OCR at all here, that's a pretty low priority though (unless more people start asking for it).


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/
