That is perfect, thank you. I'm sure I know the answer to this: these are trained on observations made over a specific set of data, correct? Not a priori deductions from the algorithms?
JG

On Wednesday, March 12, 2014 8:55:24 AM UTC-4, Nick White wrote:
> Hi John,
>
> On Wed, Mar 12, 2014 at 04:57:38AM -0700, John Green wrote:
> > Bottom line up front: Has anyone compiled a list of common misperceptions on
> > the part of tesseract? E.g.: e is often seen as o and l can be mistaken for 1,
> > etc.
>
> Tesseract has some basic information of that sort built in to its
> training files, which it uses to help recognition.
>
> You can see the list for english by unpacking the english
> .traineddata file:
>
> combine_tessdata -u /path/to/eng.traineddata eng.
>
> And then looking at the resulting eng.unicharambigs file. It's
> documented in the manpage unicharambigs.5, and it's pretty
> straightforward.
>
> Nick
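For anyone who wants to turn the unpacked eng.unicharambigs file into a readable list of confusions, here is a rough Python sketch. It assumes the tab-separated "v1" layout as I read it from unicharambigs.5: the number of characters in the misread sequence, the sequence itself (space-separated), the number of characters in the intended sequence, that sequence, and a type flag where 1 means a mandatory replacement. The names Ambig, parse_unicharambigs_v1 and SAMPLE are made up for this example, and the sample lines are illustrative, not copied from the real English file. Check the manpage against your Tesseract version before relying on it.

```python
from typing import List, NamedTuple


class Ambig(NamedTuple):
    """One ambiguity entry (hypothetical container, not a Tesseract type)."""
    source: List[str]   # characters as Tesseract may misread them
    target: List[str]   # characters they may really be
    mandatory: bool     # True when the type flag is 1


def parse_unicharambigs_v1(text: str) -> List[Ambig]:
    """Parse tab-separated ambiguity lines, assuming the v1 layout:
    <n_source> TAB <source chars> TAB <n_target> TAB <target chars> TAB <type>
    Lines that do not fit (version header, blanks) are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # e.g. the "v1" header line or an empty line
        try:
            n_source = int(fields[0])
            n_target = int(fields[2])
        except ValueError:
            continue  # counts are not integers, so not a v1 entry
        source = fields[1].split(" ")
        target = fields[3].split(" ")
        if len(source) != n_source or len(target) != n_target:
            continue  # declared counts disagree with the data
        entries.append(Ambig(source, target, fields[4].strip() == "1"))
    return entries


# Illustrative input only; real data comes from the file produced by
#   combine_tessdata -u /path/to/eng.traineddata eng.
SAMPLE = "v1\n2\t' '\t1\t\"\t1\n1\tm\t2\tr n\t0\n3\ti i i\t1\tm\t0\n"

if __name__ == "__main__":
    for entry in parse_unicharambigs_v1(SAMPLE):
        kind = "mandatory" if entry.mandatory else "optional"
        print("".join(entry.source), "->", "".join(entry.target), f"({kind})")
```

To run it on the real file, read eng.unicharambigs as UTF-8 and pass its contents to parse_unicharambigs_v1 in place of SAMPLE.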

