On Tue, 16 Oct 2018 11:49:54 +0700 Olivier wrote:
One of my reservations about FuzzyOCR is that you have to provide an
independent word list, while we have a very good tool to analyze text
contents: SpamAssassin itself. So I would much prefer FuzzyOCR to feed
the OCR'ed text back to SA for further analysis (the way pdfAssassin
is working).

On 16.10.18 13:34, RW wrote:
That works as long as the OCR remains very accurate. What happened
before was that the deployment of OCR led spammers to make their text
much less readable.

I think the original reason was that the available OCR programs were not
reliable enough.

I tested gocr, ocrad and tesseract more than 10 years ago, with not very
satisfying results; gocr was the best at that time.

Since then, Google took tesseract and made it much better.

I believe that it would currently be viable to push OCR output to
SpamAssassin for processing with Bayes and other rules.
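
To illustrate the idea, here is a rough Python sketch (not an existing
plugin, and not how FuzzyOCR or pdfAssassin work internally): it OCRs each
image part with the tesseract command-line tool and appends the recognised
text to the message as an inline text/plain part, so that whatever scans
the message afterwards sees it as ordinary text. The function names
ocr_with_tesseract and append_ocr_text are made up for this example, and it
assumes a tesseract binary is on the PATH.

```python
import subprocess
import tempfile
from email import message_from_bytes, policy


def ocr_with_tesseract(image_bytes: bytes) -> str:
    """Hypothetical helper: OCR one image with the tesseract CLI.

    Assumes a `tesseract` binary is installed; returns "" on failure.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(image_bytes)
        tmp.flush()
        # "stdout" as the output base makes tesseract print the text.
        result = subprocess.run(
            ["tesseract", tmp.name, "stdout"],
            capture_output=True, text=True, check=False,
        )
    return result.stdout if result.returncode == 0 else ""


def append_ocr_text(raw_message: bytes, ocr=ocr_with_tesseract) -> bytes:
    """Return the message with OCR'ed image text added as a text part.

    `ocr` is any callable taking image bytes and returning a string, so
    a different OCR engine can be swapped in.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)

    # Collect the text first, so the message is not modified mid-walk.
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            payload = part.get_payload(decode=True)
            if payload:
                text = ocr(payload).strip()
                if text:
                    texts.append(text)

    if not texts:
        return raw_message  # nothing recognised, leave the mail untouched

    # add_attachment converts the message to multipart/mixed if needed.
    msg.add_attachment("\n\n".join(texts), disposition="inline")
    return msg.as_bytes()
```

The returned bytes could then be piped to spamassassin or spamc on stdin
like any other message. Whether the extra part should stay in the delivered
mail or only be used for scoring is a separate decision.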


As for your question about the place for image scanning, if your MTA
has the resources to do so, why not?

Because it's better if it's combined with other information.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
A day without sunshine is like, night.
