Re: [tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread shree
My information IS dated - I haven't followed the recent changes. Please see this thread, almost a year old, which discussed the upcoming changes to training: https://groups.google.com/forum/#!searchin/tesseract-dev/fonts/tesseract-dev/4lxGjCGLBSw/CH1cZsovPjIJ On Wednesday, July 9, 2014

Re: [tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Albrecht Hilker
> As far as I understand, the font limitation applies up to tesseract 3.02. Major changes to training are currently in the works in SVN for 3.03 The files I am talking about are downloaded from https://code.google.com/p/tesseract-ocr/downloads/list They are all declared as version 3.02. For ex

Re: [tesseract-ocr] need help removing garbage characters from my OCR

2014-07-08 Thread Nick White
Hi Alex, If you're up for some programming, you could recognise the squares yourself, and pass each one separately to tesseract with the PSM_SINGLE_CHAR segmentation type. That should help if Tesseract is not segmenting each whole square separately. If the board is always the same size, you co
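The per-square approach Nick describes could be sketched roughly as follows, assuming the board is a fixed-size square grid. The 15x15 grid, the pixel coordinates, and the helper name `tile_boxes` are illustrative assumptions, not taken from the thread. The function only computes the crop boxes; each crop would then be handed to Tesseract with the PSM_SINGLE_CHAR page segmentation mode, which is not shown here.

```python
def tile_boxes(board_left, board_top, board_size, tiles_per_side=15):
    """Return (left, top, right, bottom) pixel boxes for every square, row by row.

    board_left/board_top: pixel position of the board's top-left corner (assumed known).
    board_size: width and height of the board in pixels (assumed square).
    tiles_per_side: squares per row and column (15 is an assumption).
    """
    boxes = []
    for row in range(tiles_per_side):
        for col in range(tiles_per_side):
            # Integer arithmetic on the scaled index avoids accumulating
            # rounding error when board_size is not a multiple of the grid.
            left = board_left + col * board_size // tiles_per_side
            top = board_top + row * board_size // tiles_per_side
            right = board_left + (col + 1) * board_size // tiles_per_side
            bottom = board_top + (row + 1) * board_size // tiles_per_side
            boxes.append((left, top, right, bottom))
    return boxes
```

Each box can be used to crop the screenshot with whatever image library is at hand before recognising the crop as a single character.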

Re: [tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Shree Devi Kumar
As far as I understand, the font limitation applies up to tesseract 3.02. Major changes to training are currently in the works in SVN for 3.03 (not fully released yet - hence you see a large number of fonts for the English traineddata but not for others). The other languages' traineddata may be forthcomin

[tesseract-ocr] Re: need help removing garbage characters from my OCR

2014-07-08 Thread Paul
You will probably need a better binarization technique. See [1], [2]. [1]: https://groups.google.com/d/topic/tesseract-ocr/y-Yjxr1tRTQ/discussion [2]: https://groups.google.com/d/topic/tesseract-ocr/neyvXo2TAn0/discussion On Tuesday, 8 July 2014 07:31:39 UTC+2, Alex Ryan wrote: > > I'm trying

[tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Paul
If you have a look at intproto.h, you'll see there is a similar limitation, but it's much more complicated. Unfortunately I don't have an overview of what is possible yet, but I'm working on it. :) Just use normproto.h as a reference. On Tuesday, 8 July 2014 02:55:37 UTC+2, Albrecht Hi

[tesseract-ocr] need help removing garbage characters from my OCR

2014-07-08 Thread Alex Ryan
I'm trying to make a Words with Friends cheat for a university project. I'm obviously trying to OCR the tiles from a screenshot of the app. I have tesseract 3.03 set up and running fine, but I'm not getting usable output. I've tried various training methods but so far haven't hit upon the righ

[tesseract-ocr] Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Albrecht Hilker
The manual "Training Tesseract 3" says: > Tesseract needs to know about different shapes of the same character by having different fonts separated explicitly. > This used to be limited to 32 fonts, but the limit has been raised to 64. > It is set by the constant MAX_NUM_CONFIGS defined in intpro
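A quick way to see whether a training set runs into the limit quoted above is to count the distinct fonts it declares. The sketch below is an illustration only: it assumes a font_properties-style text with one font per line and the font name as the first whitespace-separated field, and the helper names are hypothetical. The value 64 is taken from the manual excerpt, not read from the Tesseract sources.

```python
MAX_NUM_CONFIGS = 64  # limit as quoted from the training manual (assumed current)


def count_fonts(font_properties_text):
    """Count distinct font names, taking the first field of each non-empty line."""
    names = set()
    for line in font_properties_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return len(names)


def exceeds_font_limit(font_properties_text, limit=MAX_NUM_CONFIGS):
    """Return True if the text declares more fonts than the given limit."""
    return count_fonts(font_properties_text) > limit
```

For example, `exceeds_font_limit(open("font_properties").read())` would flag a file listing 65 or more fonts, where the file name is again an assumption.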