[tesseract-ocr] Re: Tesseract mistakes letters for numbers

2021-08-11 Thread zdenop
tesseract string.jpg - Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 558 SI312533 I use language model from here https://github.com/tesseract-ocr/tessdata and tesseract 4.1.1 leptonica-1.81.0 (May 22 2021, 16:14:25) [MSC v.1928 LIB Release x64] libgif 5.2.1 : li

[tesseract-ocr] Re: Digits recognition

2021-08-11 Thread zdenop
Use the legacy engine instead of LSTM (you will need the language model from https://github.com/tesseract-ocr/tessdata): tesseract processed-res.png - --oem 0 Estimating resolution as 878 115 Date: Thursday, 15 July 2021, time: 15:18:29 UTC+2, sender: yss...@gmail.com > Hi, > > I'm n00b for tess
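
A minimal sketch of scripting the command quoted above from Python. The helper names `legacy_ocr_command` and `run_ocr` are hypothetical and not part of Tesseract; the sketch assumes the `tesseract` binary is on PATH and that a model from the tessdata repository (the one with legacy-engine data) is installed.

```python
import shutil
import subprocess


def legacy_ocr_command(image_path, oem=0, tessdata_dir=None):
    """Build the argv list for `tesseract <image> - --oem <oem>`.

    Hypothetical helper. "-" as the output base sends the text to stdout;
    --oem 0 selects the legacy engine, as suggested in the reply above.
    """
    cmd = ["tesseract", image_path, "-", "--oem", str(oem)]
    if tessdata_dir is not None:
        # Assumption: point this at a checkout of tesseract-ocr/tessdata.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image_path, **kwargs):
    """Run Tesseract and return the recognized text (hypothetical helper)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        legacy_ocr_command(image_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_ocr("processed-res.png")` would then correspond to the command line shown in the message; diagnostics such as the resolution estimate go to stderr and are not part of the returned text.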

[tesseract-ocr] PDF Font Family.

2021-08-11 Thread Saddam Quraishi
Hi All, I have created a searchable PDF from an image, but we observed that the font of the text behind the image is glyphless by default. Can someone help set a proper font that matches the input file content? Regards, Saddam

[tesseract-ocr] Re: Digits recognition

2021-08-11 Thread Ajinkya Bobade
Hello, Tesseract 4 isn't designed for digit recognition; it identifies the relationship between digits and then predicts the word or sentence as a whole. Regards, Ajinkya Creator of AI Scanner https://imagescanner-online.com/ On Thursday, 15 July 2021 at 18:48:29 UTC+5:30 yss...@gmail.com wro

[tesseract-ocr] Re: Number recognition from images

2021-08-11 Thread Ajinkya Bobade
Hello, You first need to localize the text and then run Tesseract on the localized regions. Tesseract cannot be applied directly to images that contain a lot of non-text background. Regards, Ajinkya Creator of AI Scanner https://imagescanner-online.com/ On Friday, 16 July 2021 at 23:27:
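
The localize-then-recognize advice above can be illustrated with a rough sketch. It is not the poster's method: it uses a crude stand-in for text localization (the bounding box of all pixels darker than a threshold) on a grayscale image held as a list of rows. The function names and the threshold of 128 are assumptions made for illustration; a real pipeline would use a proper text detector.

```python
def text_bounding_box(gray, threshold=128):
    """Return (left, top, right, bottom) around all dark pixels, or None.

    `gray` is a list of rows of 0-255 intensities. Treating every pixel
    below `threshold` as text is an assumption made for this sketch.
    """
    rows = [y for y, row in enumerate(gray) if any(p < threshold for p in row)]
    if not rows:
        return None
    width = len(gray[0])
    cols = [x for x in range(width) if any(row[x] < threshold for row in gray)]
    # right and bottom are exclusive, so add 1 to the last dark index.
    return min(cols), min(rows), max(cols) + 1, max(rows) + 1


def crop(gray, box):
    """Cut the image down to the box so OCR sees little background."""
    left, top, right, bottom = box
    return [row[left:right] for row in gray[top:bottom]]
```

The cropped region, rather than the full image, is what would then be saved and passed to Tesseract.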

[tesseract-ocr] Re: Tesseract mistakes letters for numbers

2021-08-11 Thread Ajinkya Bobade
Hello, To do this you will need to retrain Tesseract on top of the model that you currently use. The current model is not trained on this specific font, so it approximates the digit. Take a few samples of the format that you need and retrain on top of the original weights. If you have

[tesseract-ocr] Re: PSM value

2021-08-11 Thread Ajinkya Bobade
Hello, Whether it needs training or not depends on your test set. For many clients I have trained Tesseract from scratch because they required highly specific output. If that is the case, you need to retrain. But if it's a generic use case, there is no need to retrain. I am making a wild guess here th