Matus UHLAR - fantomas wrote:

I'm currently working on a modular plugin for extracting text and adding it to SA message parts.

if possible, extract images too, so the fuzzyocr and similar plugins would
be able to look at that too.

You mean extract images and add them as parts to the message?

I guess that should be doable. I know that "unrtf" can extract images from RTF files. I'll probably add support for this to the plugin, but I probably won't implement the actual extraction right away.
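
To make "add them as parts to the message" concrete, here is a rough sketch using Python's standard email library. It is not the SpamAssassin plugin API (which is Perl) and not how the plugin is implemented. The function name attach_images, the filename, and the placeholder bytes are all made up for illustration. It assumes some extractor has already produced the raw image bytes.

```python
from email.message import EmailMessage


def attach_images(msg, images):
    """Add each (filename, subtype, data) tuple to msg as an image/* part.

    Hypothetical helper: 'images' is assumed to come from some external
    extractor; nothing here performs the extraction itself.
    """
    for filename, subtype, data in images:
        msg.add_attachment(data, maintype="image", subtype=subtype,
                           filename=filename)
    return msg


msg = EmailMessage()
msg["Subject"] = "example"
msg.set_content("Text extracted from the document would go here.")

# Placeholder bytes standing in for an image pulled out of an RTF file.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
attach_images(msg, [("extracted-1.png", "png", fake_png)])

# Other code (an OCR step, for instance) can now find the image parts.
image_parts = [part for part in msg.iter_attachments()
               if part.get_content_maintype() == "image"]
```

The point of doing it this way is that anything that already walks the message looking for image parts would see the extracted images without knowing where they came from.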

IIRC spammers even put PDFs into .doc files to make the stuff harder, but
if you manage the above, it shouldn't be hard to extract PDFs too :)

This I don't understand. Do they put PDFs inside .doc files as if the .doc was an archive?

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/
