Hi Zdenko,

At what step in the Makefile is the all-gt file created? I am still unable 
to move forward with the custom model training. 

Any help would be greatly appreciated. Thanks!
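
A minimal sketch of a pre-flight check that may help narrow this down. It assumes the layout visible in the log below, where each line image `NAME.tif` in `data/foo-ground-truth` sits next to a transcription `NAME.gt.txt`. The function name `check_ground_truth` and the wording of its messages are hypothetical and not part of tesstrain:

```python
from pathlib import Path


def check_ground_truth(gt_dir):
    """Return a list of problems found in a ground-truth directory.

    Assumption: each line image NAME.tif (or NAME.png) is paired with a
    transcription NAME.gt.txt in the same directory.
    """
    gt_dir = Path(gt_dir)
    if not gt_dir.is_dir():
        return [f"missing directory: {gt_dir}"]

    problems = []
    gt_files = sorted(gt_dir.glob("*.gt.txt"))
    if not gt_files:
        problems.append(f"no *.gt.txt files in {gt_dir}")

    for gt in gt_files:
        stem = gt.name[: -len(".gt.txt")]
        # A transcription with only whitespace carries no text.
        if not gt.read_text(encoding="utf-8").strip():
            problems.append(f"empty transcription: {gt.name}")
        # Look for an image with the same base name.
        if not any((gt_dir / f"{stem}{ext}").is_file() for ext in (".tif", ".png")):
            problems.append(f"no image for: {gt.name}")
    return problems


if __name__ == "__main__":
    report = check_ground_truth("data/foo-ground-truth")
    for line in report or ["ground truth looks consistent"]:
        print(line)
```

Run from the tesstrain checkout before `make training`; an empty list only means the text and image pairs look consistent under the assumptions above, not that training will succeed.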

On Wednesday, 26 April 2023 at 09:47:55 UTC-6 zdenop wrote:

> make training TESSDATA=./usr/local/share/tessdata
> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 
> 2 "data/foo/all-gt"
>
> Failed to read data from: data/foo/all-gt....
>
>
> This indicates you have already run a training that failed.
> Clean your training output and start again. Pay attention to why 
> "data/foo/all-gt" is not created (there will be an error message).
>
> Zdenko
>
>
> On Wed, 26 Apr 2023 at 2:07, Madhav Pandey <mad.dev...@gmail.com> wrote:
>
>> @zdenop 
>>
>> This is the entire training output:
>>
>> ```
>> make training TESSDATA=./usr/local/share/tessdata
>> unicharset_extractor --output_unicharset "data/foo/unicharset" 
>> --norm_mode 2 "data/foo/all-gt"
>> Failed to read data from: data/foo/all-gt
>> Wrote unicharset file data/foo/unicharset
>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" > 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
>> set -x; \
>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" 
>> data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif 
>> data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" > 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
>> set -x; \
>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" 
>> data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif 
>> data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" > 
>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
>> set -x; \
>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" 
>> data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif 
>> data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>> python3 shuffle.py 0 "data/foo/all-lstmf"
>> Traceback (most recent call last):
>>   File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>>     fd0 = open(sys.argv[2], 'r')
>> FileNotFoundError: [Errno 2] No such file or directory: 
>> 'data/foo/all-lstmf'
>> make: *** [data/foo/all-lstmf] Error 1
>> ```
>>
>> For this run, I have only 3 text files and 3 tif files. 
>>
>> I followed the macOS installation section of this page: 
>> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos and 
>> installed everything mentioned there. 
>>
>> Do I have to install anything else before running the training? 
>>
>> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>>
>>> Did you install all the necessary dependencies?
>>> Did you check and fix all errors (before this error) in the training output?
>>>
>>> Zdenko
>>>
>>>
>>> On Tue, 25 Apr 2023 at 8:21, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I am relatively new to Tesseract and OCR as a whole. 
>>>>
>>>> I have been trying to set up model training locally 
>>>> using the guide 
>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>>
>>>> I have copied the sample training data into the `data/foo` directory, 
>>>> but when I run `make training`, I always end up with this error:
>>>>
>>>> ```
>>>> Failed to read data from: data/foo/all-gt
>>>> Wrote unicharset file data/foo/unicharset
>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>> Traceback (most recent call last):
>>>>   File "shuffle.py", line 24, in <module>
>>>>     fd0 = open(sys.argv[2], 'r')
>>>> FileNotFoundError: [Errno 2] No such file or directory: 
>>>> 'data/foo/all-lstmf'
>>>> make: *** [data/foo/all-lstmf] Error 1
>>>> ```
>>>>
>>>> Can someone please help resolve this error?
>>>>
>>>> Thank you!
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
