Jason Haar wrote:
Speaking of image/rtf/word attachment spam; is there any work going on
to standardize this so that the textual output of such attachments could
be fed back into SA?
Just as a note:
I'm currently working on a modular plugin for extracting text and adding
it to SA message parts.
The plugin can use either external tools or its own simple plugin
modules. How to extract text from parts is configurable, and based on
MIME types and file names, so new formats can be added by simply
configuring new external tools or creating a new plugin module.
My *far* from finished module currently manages to extract text from
Word documents (using antiword), OpenXML text documents (using a simple
plugin) and RTF (using unrtf).
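
A minimal sketch of that dispatch idea, written in Python for brevity
rather than the Perl a real SA plugin would use. The rule table, the
function names and the exact command-line arguments are assumptions made
for illustration, not the plugin's actual code or configuration format.

```python
import fnmatch
import io
import os
import subprocess
import tempfile
import xml.etree.ElementTree as ET
import zipfile


def extract_docx(data: bytes) -> str:
    """In-process extractor: join the text nodes of word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return " ".join(t.strip() for t in root.itertext() if t.strip())


# Hypothetical rule table. Each rule matches on MIME type or file name and
# names either an external command or an in-process module.
RULES = [
    {"mime": "application/msword", "name": "*.doc",
     "command": ["antiword"]},
    {"mime": "application/rtf", "name": "*.rtf",
     "command": ["unrtf", "--text"]},
    {"mime": "application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document",
     "name": "*.docx", "module": extract_docx},
]


def find_rule(mime: str, filename: str):
    """Return the first rule whose MIME type or file-name glob matches."""
    for rule in RULES:
        if mime.lower() == rule["mime"]:
            return rule
        if fnmatch.fnmatch(filename.lower(), rule["name"]):
            return rule
    return None


def extract_text(data: bytes, mime: str, filename: str):
    """Extract text from one attachment, or return None if no rule applies."""
    rule = find_rule(mime, filename)
    if rule is None:
        return None
    if "module" in rule:
        return rule["module"](data)
    # External tool: write the part to a temporary file and pass its path
    # as the last argument, then capture whatever the tool prints.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        result = subprocess.run(rule["command"] + [path],
                                capture_output=True, check=True)
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        os.unlink(path)
```

Adding a new format in this sketch means appending one entry to RULES,
pointing either at another external command or at another small function.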
I haven't tested where and how the extracted text is available to
SpamAssassin yet (as noted, it's *far* from finished), but I am using
the "set_rendered" method as in the example, so it should work. ;-)
Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/