[tesseract-ocr] Re: Extract text from bright background color image (yellow)

2019-05-22 Thread April Shar
Will it still recognize black/white text after thresholding?

On Tuesday, May 21, 2019 at 5:51:48 PM UTC+8, April Shar wrote:
> I'm using tesseract 3.0.2, and it seems it cannot read text with a very bright background (see image). Is there a way to do so? Microsoft Azure works better and can read
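A minimal sketch of what "thresholding" means here, assuming the image has already been decoded into a flat list of (R, G, B) tuples. It converts to grayscale with the ITU-R BT.601 luma weights, picks a global threshold with Otsu's method, and maps every pixel to pure black or pure white, so dark text on a bright yellow background becomes black text on white. The function names are hypothetical helpers, not part of Tesseract; in practice OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag does the same job on a real image. Whether Tesseract then reads the result better depends on the image.

```python
def to_gray(rgb_pixels):
    """Convert (R, G, B) tuples to 0-255 luma using ITU-R BT.601 weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance (Otsu).

    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_b = 0      # weighted sum of the "<= t" class
    w_b = 0        # pixel count of the "<= t" class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between_var = w_b * w_f * (mean_b - mean_f) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(gray, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [255 if v > t else 0 for v in gray]
```

For a pure yellow pixel the luma is about 226 and for black text it is 0, so the Otsu threshold falls between them and the yellow background becomes white.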

[tesseract-ocr] Can I train Tesseract to a very high accuracy if I only have 10,000-ish total words that it would ever need to recognize?

2019-05-22 Thread Rafay Kalim
Hey, just wondering whether there is a way to train a Tesseract model (I am aware of the wordlist option when training) to recognize only a list of 10,000 words, and nothing else, so that its accuracy gets close to 100%. Is this doable?
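One complementary approach, sketched here as an assumption rather than a Tesseract feature: since the vocabulary is closed, each token of Tesseract's output can be snapped to the closest word in the list afterwards. This does not restrict the recognizer itself; it only corrects its output. The sketch uses the standard library's `difflib.get_close_matches`; the helper name `snap_to_vocabulary` and the `0.75` cutoff are hypothetical and would need tuning on real data.

```python
import difflib


def snap_to_vocabulary(ocr_text, vocabulary, cutoff=0.75):
    """Replace each OCR token with its closest vocabulary word, if one is close enough.

    Tokens already in the vocabulary are kept; tokens with no match at or
    above `cutoff` (difflib similarity ratio, 0..1) are left unchanged.
    """
    vocab = list(vocabulary)
    vocab_set = set(vocab)
    corrected = []
    for token in ocr_text.split():
        if token in vocab_set:
            corrected.append(token)
            continue
        match = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)
```

For example, `snap_to_vocabulary("lnvoice totai", ["invoice", "total", "date"])` turns the typical OCR confusions back into `"invoice total"`. A cutoff that is too low will force genuinely unknown tokens onto the nearest word, so it trades recall against false corrections.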

Re: [tesseract-ocr] Facing some problem in understanding fine tuning

2019-05-22 Thread Jennil Thiyam
I think this TIFF file is generated by running tesstrain.sh. Am I right?

On Wed 22 May, 2019, 6:24 PM Shree Devi Kumar wrote:
> You have to add the character in the training text and then generate box/tiff pairs using the text and a Bengali font which supports your additional character.
>
> On We

Re: [tesseract-ocr] OCRing simple numbers unreliable

2019-05-22 Thread Lorenzo Bolzani
Hi, try these (in any combination):
- psm 6 or 7 (page segmentation mode 6 assumes a single uniform block of text, 7 treats the image as a single text line)
- remove the white border (all or most of it)
- downscale so that the font is 20/50px tall
- fine-tune a model to recognize only numbers
- threshold

Otherwise, post more details about how you are using Tesseract.

Bye
Lorenzo

On Wed, 22 May 2019 at 11:
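Two of the preprocessing tips above (trimming the white border and rescaling the glyph height) can be sketched as follows, assuming the image is a list of rows of 0-255 gray values. The helper names, the `margin` parameter, and the reading of "20/50px" as a 20 to 50 px target range are illustrative assumptions, not Tesseract options; the resulting factor would be passed to whatever resize tool is in use (for example ImageMagick's `-resize`).

```python
def trim_border(grid, margin=0, white=255):
    """Crop the uniform white border, keeping `margin` pixels on each side."""
    dark_rows = [i for i, row in enumerate(grid) if any(p != white for p in row)]
    if not dark_rows:
        return []  # blank image: nothing to keep
    width = len(grid[0])
    dark_cols = [j for j in range(width) if any(row[j] != white for row in grid)]
    top = max(dark_rows[0] - margin, 0)
    bottom = min(dark_rows[-1] + margin, len(grid) - 1)
    left = max(dark_cols[0] - margin, 0)
    right = min(dark_cols[-1] + margin, width - 1)
    return [row[left:right + 1] for row in grid[top:bottom + 1]]


def scale_factor(glyph_height, low=20, high=50):
    """Resize factor that brings glyph_height (px) into [low, high]."""
    if glyph_height > high:
        return high / glyph_height
    if glyph_height < low:
        return low / glyph_height
    return 1.0
```

After `trim_border(grid)` with no margin, the height of a single-line crop approximates the glyph height, which can be fed to `scale_factor`. Keeping a small margin ("most" rather than "all" of the border) is often gentler on segmentation.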

Re: [tesseract-ocr] Facing some problem in understanding fine tuning

2019-05-22 Thread Shree Devi Kumar
You have to add the character to the training text and then generate box/tiff pairs using that text and a Bengali font that supports your additional character.

On Wed, 22 May 2019, 18:16 Jennil Thiyam wrote:
> The layout of writing is in some manner in the ben_training.txt (I have attached th

Re: [tesseract-ocr] Facing some problem in understanding fine tuning

2019-05-22 Thread Jennil Thiyam
The text in ben_training.txt is laid out in a particular way (I have attached a screenshot). Could you please explain how I should put my character into this file?

On Wed, May 22, 2019 at 5:35 PM Jennil Thiyam wrote:
> We use the Bengali script, but with one extra character; that is what I want to add, s
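A minimal sketch of adding sample lines that contain the new character to a plain-text training file and checking how many lines now include it, assuming the file is UTF-8 with one line of text per line. The helper names are hypothetical, and `U+09F0` in the example is only a stand-in for whatever character is actually being added. How many sample lines are "enough" is not stated in this thread and is left to the reader.

```python
def load_lines(path):
    """Read a UTF-8 training text file into a list of lines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def save_lines(path, lines):
    """Write lines back as UTF-8, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


def append_samples(lines, samples, ch):
    """Append samples that contain `ch` and are not already present."""
    seen = set(lines)
    merged = list(lines)
    for s in samples:
        if ch in s and s not in seen:
            merged.append(s)
            seen.add(s)
    return merged


def lines_containing(lines, ch):
    """Count lines that contain the character `ch`."""
    return sum(1 for line in lines if ch in line)
```

Usage would be `save_lines(path, append_samples(load_lines(path), my_samples, new_char))`, then generating the box/tiff pairs from the updated text with a font that actually renders `new_char`.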

Re: [tesseract-ocr] Facing some problem in understanding fine tuning

2019-05-22 Thread Jennil Thiyam
We use the Bengali script, but with one extra character; that is what I want to add. Will it work if I put that character into ben_training.txt, as is done in the plus-minus training example?

On Wed, May 22, 2019 at 5:24 PM Shree Devi Kumar wrote:
> > I want to add only one character in the already exist

Re: [tesseract-ocr] Facing some problem in understanding fine tuning

2019-05-22 Thread Shree Devi Kumar
> I want to add only one character in the already existing ben.traineddata model.

What character do you want to add? You should be able to follow the same process as the plus-minus training example shown for English, just for one character.

On Wed, May 22, 2019 at 1:51 PM Jennil Thiyam wrote:
> I am

[tesseract-ocr] OCRing simple numbers unreliable

2019-05-22 Thread Borek Lupoměský
I am OCRing numbers from images. I do all the processing with ImageMagick so that I end up with a single isolated number (up to three digits). Tesseract does a good job most of the time, but sometimes it doesn't recognize the number correctly. For example, this image of the number 27
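Because the expected output is a single number of at most three digits, one possible safeguard (an assumption, not something Tesseract does for you) is to validate the raw text and reject anything that does not fit. The sketch below strips whitespace, maps a few letters that are commonly confused with digits, and accepts only 1 to 3 ASCII digits. The confusion table and the name `parse_number` are hypothetical and should be adjusted to the errors actually observed.

```python
import re

# Hypothetical letter-to-digit confusions; tune to the errors you actually see.
_CONFUSIONS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "|": "1", "S": "5", "B": "8"})
_NUMBER = re.compile(r"[0-9]{1,3}")


def parse_number(ocr_output):
    """Return the number as int if the OCR text is 1-3 digits, else None."""
    text = "".join(ocr_output.split()).translate(_CONFUSIONS)
    return int(text) if _NUMBER.fullmatch(text) else None
```

A `None` result signals that the image should be retried (for example with a different psm or threshold) rather than trusted.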

[tesseract-ocr] Facing some problem in understanding fine tuning

2019-05-22 Thread Jennil Thiyam
I am planning to fine-tune ben.traineddata. According to the documented procedure, "The training requires a new unicharset/recoder, optional language models, and the old traineddata file containing the old unicharset/recoder." Here I have the old traineddata, bu
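Before building the new unicharset, it can help to confirm which characters in the training text are missing from the old model. A rough sketch, assuming the old model's character set is available as a plain string or set of characters (how it is extracted is left open here): it compares Unicode code points after NFC normalization. Tesseract's own `unicharset_extractor` works on grapheme clusters rather than single code points, so for Bengali this is only an approximate check; `new_characters` is a hypothetical helper.

```python
import unicodedata


def new_characters(training_text, known_chars):
    """Code points in training_text (ignoring whitespace) absent from known_chars."""
    text = unicodedata.normalize("NFC", training_text)
    known = set(unicodedata.normalize("NFC", "".join(known_chars)))
    found = {c for c in text if not c.isspace()}
    return sorted(found - known)
```

If the result contains only the one intended character, the training text is consistent with the plan of extending the existing model by a single character.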