Jason Haar wrote:

Speaking of image/rtf/word attachment spam; is there any work going on
to standardize this so that the textual output of such attachments could
be fed back into SA?

Just as a note:

I'm currently working on a modular plugin that extracts text from attachments and adds it to SA message parts.

The plugin can use either external tools or its own simple plugin modules. How text is extracted from a part is configurable and based on MIME types and file names, so new formats can be added simply by configuring new external tools or creating a new plugin module.
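
To illustrate the idea of configuration-driven extraction, here is a minimal sketch. It is written in Python for brevity and is not the plugin's actual code or configuration format. The rule table, the function names (find_extractor, extract_text, extract_plain) and the exact command lines are assumptions made for the example; it only shows how a part's MIME type or file name could select either an external tool or an in-process module.

```python
import fnmatch
import subprocess
import tempfile


def extract_plain(data):
    """In-process extractor (hypothetical): decode the bytes as UTF-8 text."""
    return data.decode("utf-8", errors="replace")


# Hypothetical rule table: (MIME type, file name glob, extractor).
# An extractor is either an external command (argument list; the path of a
# temporary file is appended) or a Python callable that takes the raw bytes.
RULES = [
    ("application/msword", "*.doc", ["antiword"]),
    ("application/rtf", "*.rtf", ["unrtf", "--text"]),
    ("text/plain", "*.txt", extract_plain),
]


def find_extractor(mime_type, filename):
    """Return the first extractor whose MIME type or file name glob matches."""
    name = (filename or "").lower()
    for rule_mime, pattern, extractor in RULES:
        if mime_type.lower() == rule_mime or fnmatch.fnmatch(name, pattern):
            return extractor
    return None


def extract_text(mime_type, filename, data):
    """Extract text from one attachment, or return None if no rule matches."""
    extractor = find_extractor(mime_type, filename)
    if extractor is None:
        return None
    if callable(extractor):
        return extractor(data)
    # External tool: write the part to a temporary file and capture stdout.
    with tempfile.NamedTemporaryFile(suffix=".bin") as tmp:
        tmp.write(data)
        tmp.flush()
        result = subprocess.run(
            extractor + [tmp.name], capture_output=True, check=True
        )
    return result.stdout.decode("utf-8", errors="replace")
```

With a table like this, supporting a new format means adding one more rule, which is the property described above.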

My *far* from finished module currently manages to extract text from Word documents (using antiword), OpenXML text documents (using a simple plugin) and RTF (using unrtf).
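
For the OpenXML case, a simple in-process module can get quite far because a .docx file is a ZIP archive whose main text lives in word/document.xml. The following is a rough Python sketch of that approach, not the plugin's own module; the function name docx_text is made up for the example, and it ignores headers, footnotes and nested paragraphs (for instance in text boxes).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(data):
    """Return the body text of a .docx file given as bytes, one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # Concatenate all text runs (w:t elements) inside this paragraph.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return "\n".join(paragraphs)
```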

I haven't yet tested where and how the extracted text is made available to SpamAssassin (as noted, it's *far* from finished), but I am using the "set_rendered" method as in the example, so it should work. ;-)

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/
