On Thu, Sep 1, 2016 at 12:27 AM, Olivier <olivier.nic...@cs.ait.ac.th> wrote:
> I am running it, and it does not do a very good job of extracting the
> text from the images. It then uses its own list of keywords to
> detect spam. To me that is the biggest problem: it should pass
> the text back to SpamAssassin and let SA rules decide what to do with it.
>
      I do agree that the OCR program should be doing the OCR'ing and
the text filtering should be left to a program that does that for a
living.

On 01.09.16 13:59, RW wrote:
It's a long time since I've used it, but IIRC the point of FuzzyOCR is
that it does fuzzy matching on a dictionary of "bad" words - similar to
the way that spelling checkers find the most likely suggestions. This
gives it a very limited ability to deal with imperfectly read words.
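
To illustrate the kind of fuzzy dictionary matching described above, here is a
minimal sketch in Python. It is not FuzzyOCR's actual code or algorithm: the
word list, the fuzzy_hits() helper and the 0.8 cutoff are all assumptions made
up for this example, and the matching is done by the standard library's
difflib.get_close_matches(), which ranks candidates by similarity ratio much
as a spelling checker ranks suggestions.

```python
import difflib

# Hypothetical keyword list for illustration only; FuzzyOCR ships its own
# word list, which is not reproduced here.
BAD_WORDS = ["viagra", "mortgage", "lottery", "pharmacy"]


def fuzzy_hits(ocr_text, words=BAD_WORDS, cutoff=0.8):
    """Return (ocr_token, matched_keyword) pairs for tokens close to a keyword.

    cutoff is the minimum difflib similarity ratio (0.0 to 1.0); the value
    0.8 is an assumption chosen for this example, not a recommended setting.
    """
    hits = []
    for token in ocr_text.lower().split():
        # Strip common punctuation that OCR output tends to leave attached.
        token = token.strip(".,;:!?\"'()")
        if not token:
            continue
        # Ask for the single best candidate at or above the cutoff.
        match = difflib.get_close_matches(token, words, n=1, cutoff=cutoff)
        if match:
            hits.append((token, match[0]))
    return hits


if __name__ == "__main__":
    # Simulated imperfect OCR output with digits read in place of letters.
    print(fuzzy_hits("Cheap v1agra and m0rtgage deals"))
```

A token with one or two misread characters can still land on the intended
keyword, while a heavily garbled token falls below the cutoff and is missed,
which is the "very limited ability" mentioned above.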

That's the same thing Olivier wrote above :-)

Putting garbled OCR text through SA body rules may be more trouble than
it's worth.

Garbled, yes. I had this discussion some years back, and tesseract now
gives much better results than it did back then.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Boost your system's speed by 500% - DEL C:\WINDOWS\*.*
