Olivier,

Thank you *ever* so much for replying.
Regards
Brent

On 2018/10/16 06:49, Olivier wrote:
Brent,

I have FuzzyOCR installed and running, but the only rule that was
triggered (22 times during the past 40 days) was FUZZY_OCR_WRONG_CTYPE,
meaning that the actual image type does not match the content-type
declared for it in the MIME headers.

That is still a valid catch, but not based on the OCR'ed text.
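
To illustrate what a content-type mismatch check of this kind amounts
to, here is a minimal sketch in Python. It is not FuzzyOCR's code; the
function names sniff_image_type and ctype_mismatch are hypothetical,
and only GIF, PNG and JPEG signatures are covered.

```python
# Well-known leading byte signatures ("magic numbers") of common image formats.
MAGIC = {
    "image/gif": (b"GIF87a", b"GIF89a"),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
}


def sniff_image_type(data):
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for mime, prefixes in MAGIC.items():
        if data.startswith(prefixes):
            return mime
    return None


def ctype_mismatch(declared, data):
    """True when the payload is a recognised image of a different type
    than the one declared in the Content-Type header (hypothetical helper)."""
    actual = sniff_image_type(data)
    # Drop parameters such as '; name="x.gif"' before comparing.
    declared = declared.split(";")[0].strip().lower()
    return actual is not None and actual != declared
```

A payload the sketch cannot identify is not reported as a mismatch, so
it errs on the side of not firing.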

One of my reservations about FuzzyOCR is that you have to provide an
independent word list, while we already have a very good tool for
analyzing text content: SpamAssassin itself. So I would much prefer
FuzzyOCR to feed the OCR'ed text back to SA for further analysis (the
way pdfAssassin works). But then we need a way to detect that the OCR
process has worked, i.e. that some more or less valid text, in a valid
language, has been extracted.
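
One crude way to approach the "has the OCR worked" question is to
measure how many of the extracted tokens look like words at all. The
sketch below is only an assumption about how such a check could look:
the names plausible_word and ocr_looks_valid and both thresholds are
made up, and the vowel test only makes sense for Latin-script
languages.

```python
import re

VOWELS = set("aeiouy")


def plausible_word(token):
    """Crude test: purely alphabetic, 2 to 15 letters, at least one vowel."""
    t = token.lower()
    return t.isalpha() and 2 <= len(t) <= 15 and any(c in VOWELS for c in t)


def ocr_looks_valid(text, min_tokens=5, min_ratio=0.6):
    """Guess whether OCR output is real text rather than noise.

    min_tokens and min_ratio are arbitrary illustrative thresholds.
    """
    tokens = re.findall(r"\S+", text)
    if len(tokens) < min_tokens:
        return False
    # Strip surrounding punctuation before judging each token.
    good = sum(plausible_word(tok.strip(".,;:!?\"'()")) for tok in tokens)
    return good / len(tokens) >= min_ratio
```

Text that passes such a gate could then be handed to SA; text that
fails it would simply be dropped.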

Another approach I like is that of Image Cerberus (dig in
http://prag.diee.unica.it/amilab), which uses metadata of the image
(size, colour histogram, etc.) to classify the image as probable spam
or probable ham, and then applies a Bayes classifier.
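
As a rough illustration of the general idea (not of Image Cerberus
itself, whose actual features and classifier are not described here),
a tiny naive Bayes classifier over bucketed image metadata could look
like this. The describe function, its thresholds and the training
values are all invented toy data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal categorical naive Bayes with Laplace smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> (name, value) -> count
        self.values = defaultdict(set)              # name -> values seen in training

    def train(self, features, label):
        self.class_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[label][(name, value)] += 1
            self.values[name].add(value)

    def log_score(self, features, label):
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for name, value in features.items():
            count = self.feature_counts[label][(name, value)]
            # Add-one smoothing, with one extra slot for unseen values.
            denom = self.class_counts[label] + len(self.values[name]) + 1
            score += math.log((count + 1) / denom)
        return score

    def classify(self, features):
        return max(self.class_counts, key=lambda lb: self.log_score(features, lb))


def describe(width, height, n_colours):
    """Bucket raw metadata into categories (thresholds are arbitrary)."""
    return {
        "size": "small" if width * height < 100_000 else "large",
        "palette": "few" if n_colours < 64 else "many",
    }


# Toy training set: invented values, not real measurements.
model = NaiveBayes()
for w, h, c in [(300, 200, 16), (300, 200, 16), (400, 100, 8)]:
    model.train(describe(w, h, c), "spam")
for w, h, c in [(1600, 1200, 5000), (1024, 768, 3000), (800, 600, 2000)]:
    model.train(describe(w, h, c), "ham")
```

With this toy data, model.classify(describe(320, 240, 12)) leans
towards "spam" and a large, many-coloured image towards "ham"; a real
deployment would need features and training data taken from actual
mail.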

As for your question about where to do the image scanning: if your MTA
has the resources to do so, why not? And even if FuzzyOCR is not yet
the ultimate OCR solution, it is still improving, so why give up a tool
that can help?

Regards,

Olivier
