Re: Individual character variation lists

John Green Thu, 13 Mar 2014 11:28:00 -0700

That is perfect, thank you.

I think I already know the answer, but to confirm: these ambiguities are
derived from observations made over a specific set of data, correct? They
are not a priori deductions from the algorithms?


JG

On Wednesday, March 12, 2014 8:55:24 AM UTC-4, Nick White wrote:
>
> Hi John, 
>
> On Wed, Mar 12, 2014 at 04:57:38AM -0700, John Green wrote: 
> > Bottom line up front: Has anyone compiled a list of common
> > misperceptions on the part of tesseract? E.g.: e is often seen as o
> > and l can be mistaken for 1, etc.
>
> Tesseract has some basic information of that sort built into its
> training files, which it uses to help recognition.
>
> You can see the list for English by unpacking the English
> .traineddata file:
>
>   combine_tessdata -u /path/to/eng.traineddata eng. 
>
> Then look at the resulting eng.unicharambigs file. Its format is
> documented in the unicharambigs(5) man page, and it's pretty
> straightforward.
>
> Nick 
>
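
For anyone who wants to work with the unpacked file programmatically, here
is a minimal Python sketch of a reader. It assumes the five-field,
tab-separated layout that unicharambigs(5) describes (count, source
characters, count, replacement characters, and a 1/0 flag for mandatory
versus optional replacement), preceded by a version line such as "v1".
The names Ambiguity, parse_unicharambigs and SAMPLE are made up for this
example and are not part of Tesseract, and the sample lines are
illustrative only. Files in a different layout are simply skipped, so
check your own eng.unicharambigs before relying on it.

```python
from typing import List, NamedTuple, Tuple


class Ambiguity(NamedTuple):
    source: Tuple[str, ...]       # unichars that may have been misrecognised
    replacement: Tuple[str, ...]  # unichars to substitute for them
    mandatory: bool               # True when the fifth field is "1"


def parse_unicharambigs(text: str) -> List[Ambiguity]:
    """Parse lines in the assumed five-field, tab-separated layout."""
    entries = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 5:
            continue  # version line, blank line, or a layout not handled here
        _, src, _, dst, kind = (f.strip() for f in fields)
        if kind not in ("0", "1"):
            continue  # not a flag this sketch understands
        entries.append(
            Ambiguity(tuple(src.split()), tuple(dst.split()), kind == "1")
        )
    return entries


# Illustrative sample in the assumed layout (not taken from eng.traineddata).
SAMPLE = "v1\n2\t' '\t1\t\"\t1\n1\tm\t2\tr n\t0\n3\ti i i\t1\tm\t0\n"

if __name__ == "__main__":
    for amb in parse_unicharambigs(SAMPLE):
        kind = "mandatory" if amb.mandatory else "optional"
        print(" ".join(amb.source), "->", " ".join(amb.replacement), f"({kind})")
```

To run it on real data, read the unpacked file as UTF-8 text and pass its
contents to parse_unicharambigs in place of SAMPLE.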

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

