PDFassassin

Olivier Nicole Thu, 15 Nov 2012 03:31:58 -0800

Hi,

While going through old stuff, I noticed I have been using a modified
version of PDFassassin.


I did not even remember that I had ported it to my new mail server.

What it does, basically, is extract the text from the PDF attachment
and stuff it back into SA for further analysis. The only difference from
the original PDFassassin is that the text is added at the end of the
original message, so that it does not launch a second instance of SA
to check it.
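
To make the idea concrete, here is a minimal sketch in Python. It is not
the actual modified PDFassassin code, only an illustration of "extract
the text and append it to the original message". It assumes the poppler
pdftotext command is installed; the function names pdf_to_text and
append_pdf_text are made up for this example.

```python
import os
import subprocess
import tempfile
from email import message_from_bytes, policy


def pdf_to_text(pdf_bytes: bytes) -> str:
    """Extract text using the `pdftotext` CLI (assumed to be installed)."""
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(pdf_bytes)
        path = tmp.name
    try:
        # "-" as the output file makes pdftotext write to stdout.
        result = subprocess.run(
            ["pdftotext", path, "-"], capture_output=True, check=True
        )
    finally:
        os.unlink(path)
    return result.stdout.decode("utf-8", errors="replace")


def append_pdf_text(raw_message: bytes, extract=pdf_to_text) -> bytes:
    """Return the message with one extra text/plain part per PDF attachment.

    `extract` is injectable so the sketch can be tried without pdftotext.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)

    # Collect the text first; the message is only modified after the walk.
    texts = []
    for part in msg.walk():
        if part.get_content_type() == "application/pdf":
            texts.append(extract(part.get_payload(decode=True)))

    if not texts:
        return raw_message  # nothing to add, hand the message back untouched

    for text in texts:
        # Appended after the existing parts, so a single scanner pass
        # over the message also sees the extracted text.
        msg.add_attachment(text, subtype="plain", disposition="inline")
    return msg.as_bytes()
```

Called as append_pdf_text(raw_bytes) before the message is handed to SA,
this gives the scanner one message that already contains the PDF text.
How SA weighs such an extra part is not something this sketch tries to
settle.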

Similarly, images are attached at the end of the original message,
for further scanning by whatever your image scanner is.

That way, PDFassassin does not try to identify spam, but only extracts
the various parts of a PDF document, for SA to analyze.

I wonder if someone would be interested in reviewing what I have done?

In the same way, I am wondering if something similar exists for all
the (open|libre|MS)office documents?

Finally, I am wondering if fuzzyOCR is still of any interest? As
above, I'd like to see it push the strings it can identify into the body
of the message, for further analysis by SA, rather than having its
own list of spam words.

Best regards,

Olivier
