[tesseract-ocr] Deserialize header failed: while making lstmf file

2021-09-23 Thread Meet Yogi
I'm using the command tesseract tiff_file_path name_of_lstm_file lstm.train, for example: tesseract batch3.tiff batch3 lstm.train. While doing so I'm getting the following error: Tesseract Open Source OCR Engine v4.1.1 with Leptonica Page 1 Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating

[tesseract-ocr] Re: Spaces recognition

2021-09-23 Thread David Smith
Did you try setting the preserve_interword_spaces flag? On Wednesday, 22 September 2021 at 06:27:58 UTC+1 julioah...@gmail.com wrote: > Hi guys > > I am using tesseract-OCR to read text from image, the issue is that it > does not recognize the spaces between words. > > I am using C++, can anyb
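For anyone following along, here is a minimal sketch of how that flag could be passed on the command line, assuming the tesseract executable is on PATH. The helper name build_tesseract_command and the file names are made up for illustration; only the -c VAR=VALUE option and the preserve_interword_spaces variable come from Tesseract itself. From C++ the same variable can be set with TessBaseAPI::SetVariable.

```python
def build_tesseract_command(image_path, output_base, preserve_spaces=True):
    """Build the argv list for a tesseract run (hypothetical helper).

    The command itself is not executed here, so this works even on a
    machine without Tesseract installed.
    """
    cmd = ["tesseract", image_path, output_base]
    if preserve_spaces:
        # -c VAR=VALUE sets a Tesseract config variable for this run.
        cmd += ["-c", "preserve_interword_spaces=1"]
    return cmd


if __name__ == "__main__":
    # Example file names are placeholders, not taken from the thread.
    print(" ".join(build_tesseract_command("page.png", "page")))
```

The resulting list can be handed to subprocess.run as-is. Whether the flag is enough depends on the image and page segmentation mode, so treat it as a first thing to try.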

Re: [tesseract-ocr] Re: v4.1.1 - Segmentation fault on train data generation; all .lstmf files are exactly 1GB

2021-09-23 Thread Sim Tov
The reason I use v4.1.1 is that it is the version supplied with the recently released stable Debian 11. It will remain like this for the next 2 years (approx.). So my question is whether it is OK to use the .lstmf files I got so far for training, or must the process of their genera

[tesseract-ocr] Re: Deserialize header failed: while making lstmf file

2021-09-23 Thread Sim Tov
What is inside your training_text file? I had a similar issue when the lines in that file were too long. Try keeping each line to 5-7 words, ending each one with a newline. On Thursday, September 23, 2021 at 10:07:26 AM UTC+3 Meet Yogi wrote: > I'm using command > tesseract tiff_file_pa
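A rough sketch of the line-shortening suggestion, in case it helps: it splits every line of a training text into chunks of at most 7 words. The function name rewrap_training_text and the max_words default are my own choices for illustration, and splitting on whitespace assumes a script that separates words with spaces.

```python
def rewrap_training_text(text, max_words=7):
    """Break each input line into lines of at most max_words words."""
    out = []
    for line in text.splitlines():
        words = line.split()
        # Emit consecutive chunks of up to max_words words; blank lines
        # produce no chunks and are dropped.
        for i in range(0, len(words), max_words):
            out.append(" ".join(words[i:i + max_words]))
    # End the file with a newline, as text tools generally expect.
    return ("\n".join(out) + "\n") if out else ""


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    print(rewrap_training_text(sample), end="")
```

Lines that are already short pass through unchanged, so it should be safe to run over an existing training_text file before regenerating the images and .lstmf files.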