Looks like that fixed bug #1. Now it is able to successfully create 400 
pages. Do you have any ideas as to why the other 2 errors are occurring?
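
For anyone checking the two points shree raises in the reply quoted below (how long the training text is, and whether it mixes English and Japanese), here is a rough Python sketch. The function name and the script buckets are illustrative assumptions, not part of tesstrain or text2image. It does not inspect the fonts themselves; it only reports which scripts the chosen fonts would need to cover.

```python
import unicodedata


def summarize_training_text(text):
    """Return (non-empty line count, character tally by rough script class)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    counts = {"latin": 0, "japanese": 0, "other": 0}
    for ch in text:
        if ch.isspace():
            continue
        # Unicode character names start with the script for these ranges.
        name = unicodedata.name(ch, "")
        if name.startswith(("HIRAGANA", "KATAKANA", "CJK UNIFIED IDEOGRAPH")):
            counts["japanese"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            # Digits, punctuation, and any other script end up here.
            counts["other"] += 1
    return len(lines), counts


# Small inline sample; in practice, pass the contents of the training text file.
sample = "Hello world\nこんにちは世界\n"
line_count, script_counts = summarize_training_text(sample)
print(line_count, script_counts)
```

If both the "latin" and "japanese" counts are non-zero, every font used for rendering would need glyphs for both scripts.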
On Thursday, January 7, 2021 at 11:28:12 AM UTC-6 shree wrote:

> Your training text file is only 175 lines, so the rendered image fits in 4 
> pages. You need to use a larger text if you want more pages.
>
> Also check that your fonts support both English and Japanese as the text 
> seems to have samples of both languages.
>
> On Thu, Jan 7, 2021, 22:40 Kamui 7 <qntmm...@gmail.com> wrote:
>
>> I ran a find command in the root directory and searched for the tesstrain 
>> script. It could only find the script that I pulled from the latest 
>> tesseract git repo. My training script calls that specific tesstrain script 
>> using a relative path, so it couldn't be an older version.
>>
>> On Thursday, January 7, 2021 at 11:01:55 AM UTC-6 shree wrote:
>>
>>> Old versions of tesstrain.sh used to limit training to 3 pages. Looks 
>>> like you may have an old version in the path somewhere.
>>>
>>> On Thu, Jan 7, 2021 at 10:17 PM Kamui 7 <qntmm...@gmail.com> wrote:
>>>
>>>> I have a script to train tesseract and I ran it on Arch Linux, Debian, 
>>>> and even a Docker container, and they all produce the same errors. I 
>>>> checked to make sure the script is correct as well.
>>>>
>>>> Bug 1:
>>>> This happens when tesstrain runs text2image. The max pages parameter 
>>>> does not work at all. It ends up only rendering 4 pages regardless of what 
>>>> I pass in for the maxpages parameter. I even tried hardcoding it into the 
>>>> tesstrain_utils.sh file and it still does the same thing. 
>>>>
>>>> Bug 2:
>>>> After it finishes producing those 4 pages, I fine-tune it with 
>>>> lstmtraining and the resulting output is full of "Encoding of string 
>>>> failed!" errors.
>>>>
>>>> Bug 3:
>>>> Along with those encoding errors, it also outputs the following text:
>>>>
>>>> "Image too small to scale!! (2x48 vs min width of 3)
>>>> Line cannot be recognized!!
>>>> Image not trainable"
>>>>
>>>> I will upload my script along with the Dockerfile if anyone wants to 
>>>> take a look. 
>>>>
>>>>
>>>> https://drive.google.com/file/d/1FkW1q1cXwOxY6Yi1A1cMzInbtJa9L01M/view?usp=sharing
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/7a9415d6-4d0c-4333-98c0-2628720661ebn%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/7a9415d6-4d0c-4333-98c0-2628720661ebn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>>
>

