Olivier,

Thank you *ever* so much for replying.

Regards,
Brent

On 2018/10/16 06:49, Olivier wrote:
Brent,

I have FuzzyOCR installed and running, but the only rule that was triggered during the past 40 days (22 times) was FUZZY_OCR_WRONG_CTYPE, meaning that the actual image type does not match the Content-Type declared in the MIME headers. That is still a valid catch, but it is not based on the OCR'ed text.

One of my reservations about FuzzyOCR is that you have to provide an independent word list, while we already have a very good tool for analyzing text content: SpamAssassin itself. So I would much prefer FuzzyOCR to feed the OCR'ed text back to SA for further analysis (the way pdfAssassin works). But then we need a way to detect that the OCR process has worked, that is, that some more or less valid text in a valid language has been extracted.

Another approach I like is that of Image Cerberus (dig in http://prag.diee.unica.it/amilab), which uses metadata of the image (size, colour histogram, etc.) to classify the image as probable spam or probable ham and then implements a Bayes classifier.

As for your question about where to do the image scanning: if your MTA has the resources to do so, why not? And even if FuzzyOCR is not yet the ultimate OCR solution, it is still improving, so why give up a tool that can help?

Regards,
Olivier
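
On the open point above (detecting that OCR has produced "more or less valid text" before handing it to SA), one possible approach is a crude plausibility check that does not need a word list. The sketch below is only an illustration under stated assumptions: the function names, the thresholds, and the "contains a vowel" test are all hypothetical, are not part of FuzzyOCR or SpamAssassin, and only make rough sense for Latin-script languages.

```python
# Hypothetical heuristic, not part of FuzzyOCR or SpamAssassin:
# decide whether OCR output looks enough like real text to be worth
# passing on for further content analysis.

PUNCTUATION = ".,;:!?\"'()"
VOWELS = set("aeiouy")


def looks_like_word(token):
    """Return True if a token is alphabetic, of plausible length,
    and contains at least one vowel (a rough Latin-script assumption)."""
    return (
        token.isalpha()
        and 2 <= len(token) <= 15
        and any(ch in VOWELS for ch in token.lower())
    )


def ocr_text_looks_valid(text, min_tokens=5, min_wordlike_ratio=0.7):
    """Return True if enough whitespace-separated tokens look like words.

    min_tokens and min_wordlike_ratio are arbitrary illustrative
    thresholds and would need tuning on real OCR output.
    """
    tokens = text.split()
    if len(tokens) < min_tokens:
        return False
    wordlike = sum(looks_like_word(t.strip(PUNCTUATION)) for t in tokens)
    return wordlike / len(tokens) >= min_wordlike_ratio
```

Text that passes such a check could then be fed back for normal content analysis, while text that fails would be treated as a failed OCR run rather than scored.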