I ran find from the root directory to search for the tesstrain script. The 
only copy it found was the one I pulled from the latest tesseract git repo. 
My training script calls that specific tesstrain script using a relative 
path, so it can't be picking up an older version.
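
For anyone who wants to repeat the check, here is a rough Python sketch of 
the same kind of search. It lists every file named like the tesstrain 
scripts under a directory and reports which copy a bare command name would 
resolve to on PATH. The function names find_copies and first_on_path are 
made up for illustration and are not part of tesseract.

```python
import os
import shutil


def find_copies(name, root="/"):
    """Return a sorted list of every path under root whose file name is name."""
    matches = []
    # os.walk silently skips directories it cannot read and does not
    # follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def first_on_path(name):
    """Return the executable copy a bare command name resolves to, or None."""
    # shutil.which only reports files that are executable and on PATH.
    return shutil.which(name)
```

Calling find_copies("tesstrain.sh") and find_copies("tesstrain_utils.sh") 
from the root can take a while. If each returns a single path and 
first_on_path("tesstrain.sh") returns None or that same path, a stale copy 
on PATH is unlikely to be the cause.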

On Thursday, January 7, 2021 at 11:01:55 AM UTC-6 shree wrote:

> Old versions of tesstrain.sh used to limit training to 3 pages. Looks like 
> you may have an old version in the path somewhere.
>
> On Thu, Jan 7, 2021 at 10:17 PM Kamui 7 <qntmm...@gmail.com> wrote:
>
>> I have a script to train tesseract. I ran it on Arch Linux, Debian, and 
>> even in a Docker container, and all of them produce the same errors. I 
>> also checked that the script itself is correct. 
>>
>> Bug 1:
>> This happens when tesstrain runs text2image. The max pages parameter does 
>> not work at all. It ends up only rendering 4 pages regardless of what I 
>> pass in for the maxpages parameter. I even tried hardcoding it into the 
>> tesstrain_utils.sh file and it still does the same thing. 
>>
>> Bug 2:
>> After it finishes producing those 4 pages, I fine-tune with 
>> lstmtraining, and the resulting output is full of "Encoding of string 
>> failed!" errors.
>>
>> Bug 3:
>> Along with those encoding errors, it also outputs the following text:
>>
>> "Image too small to scale!! (2x48 vs min width of 3)
>> Line cannot be recognized!!
>> Image not trainable"
>>
>> I will upload my script along with the Dockerfile if anyone wants to take 
>> a look. 
>>
>>
>> https://drive.google.com/file/d/1FkW1q1cXwOxY6Yi1A1cMzInbtJa9L01M/view?usp=sharing
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/7a9415d6-4d0c-4333-98c0-2628720661ebn%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/7a9415d6-4d0c-4333-98c0-2628720661ebn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>
> -- 
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>

