Re: How SA reacts to a bunch of garbage characters

Olivier Mon, 13 Jun 2016 23:51:14 -0700

Matus,

>>Sure the OCR results are not very precise. But could we imagine that
>>they are pushed in a part of the message that will not go through Bayes?
> where do you want to push the OCR'ed text, if not back to SA to check other
> rules like Bayes?


To a part that is checked by the regexp rules, but not by Bayes? I don't
know whether that is possible.
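
The idea can be sketched roughly as follows. This is plain Python, not SpamAssassin code, and it says nothing about whether SA can actually be configured this way; the rule name, the pattern, and the scan() function are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical rule set: the name and pattern are made up for illustration.
REGEX_RULES = {
    "OCR_PILLS": re.compile(r"\bcheap\s+pills\b", re.IGNORECASE),
}


def scan(body_text, ocr_text):
    """Return (regex rule hits, tokens that would be handed to Bayes)."""
    # The regex rules see the original body plus the OCR output.
    combined = body_text + "\n" + ocr_text
    hits = sorted(name for name, rx in REGEX_RULES.items() if rx.search(combined))
    # The Bayes tokenizer sees only the original body, so OCR noise
    # never reaches the token database.
    bayes_tokens = re.findall(r"[a-z0-9']+", body_text.lower())
    return hits, bayes_tokens
```

With such a split, garbage from an imprecise OCR run could still trigger a regexp rule, but it would not be learned or scored as Bayes tokens.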

> the PDF is technically something different: PDF (often) contains plain text
> that does not have to be OCRed, and thus it will not be misinterpreted.

But doesn't it disturb the Bayes process if we inject the mail body plus
the text extracted from the PDF? Would it not be better to submit only
the original message? I have no answer to that.

> I would skip gocr and ocrad, since tesseract behaves great now...
> (the debian fuzzyocr package requires all of them, dunno why)

I'll take your advice. I just noticed that tesseract was not enabled by
default! I use FreeBSD; could it be that it is required only at install
time, but disabled later in your FuzzyOcr configuration?

Best regards,

Olivier
