Hi folks,
Apologies if this is the wrong list.
I wanted to be able to score on text found in PDF files. I did not see any
obvious route, so I made a plugin that calls Xpdf's pdfinfo and pdftotext
to extract the text, which is then scored.
A sample local.cf could be:
pdftotext_cmd /usr/local/bin/pdftotext
pdfinfo_cmd /usr/local/bin/pdfinfo
body PDF_TO_TEXT eval:check_pdftext("^Error","sex","drugs",'Title:\s+stock_tmp.pdf:4','Creator:\s+OpenOffice.org 1.1.4:4')
Note that a trailing :4 on a regex means a match on that regex is worth 4 points.
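To illustrate the :N idea, here is a minimal sketch in Python. It is not the plugin's actual Perl code. The names parse_pattern and score_text are made up for this example, and I am assuming that a pattern without a :N suffix counts for 1 point, which the description above does not state.

```python
import re

def parse_pattern(arg, default_points=1.0):
    """Split 'regex:N' into (compiled regex, N).

    Assumption: a pattern with no trailing :N is worth default_points.
    """
    m = re.match(r'^(.*):(\d+(?:\.\d+)?)$', arg, re.S)
    if m:
        return re.compile(m.group(1)), float(m.group(2))
    return re.compile(arg), default_points

def score_text(text, args):
    """Add up the points of every pattern that matches the extracted text."""
    total = 0.0
    for arg in args:
        regex, points = parse_pattern(arg)
        if regex.search(text):
            total += points
    return total

# Hypothetical pdfinfo-style output, used only to exercise the sketch.
sample = "Title:          stock_tmp.pdf\nCreator:        OpenOffice.org 1.1.4\n"
patterns = ["^Error", "sex", "drugs",
            r"Title:\s+stock_tmp.pdf:4",
            r"Creator:\s+OpenOffice.org 1.1.4:4"]
total = score_text(sample, patterns)
```

With this sample text only the two weighted patterns match, so the total comes from their :4 suffixes.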
I really don't know if this was the right road to follow. I copied
AntiVirus.pm as a starting point and came up with this:
http://support.ednet.ns.ca/SpamAssassin/PDFText.pm
So far... it appears to work as expected and didn't take down a pretty
busy server ;).
I'd enjoy hearing any constructive criticism :).
JES