Matus UHLAR - fantomas wrote:
I'm currently working on a modular plugin for extracting text and adding it
to SA message parts.
If possible, extract images too, so that FuzzyOcr and similar plugins would
be able to look at them too.
You mean extract images and add them as parts to the message?
I guess that should be doable. I know that "unrtf" can extract images
from RTF files. I'll probably implement support for this, but I probably
won't implement actually doing it right away.
IIRC spammers even put PDFs into .doc files to make things harder, but
if you manage the above, it shouldn't be hard to extract PDFs too :)
This I don't understand. Do they put PDFs inside .doc files as if the
.doc was an archive?
Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/