I just tried running all of these commands, but I got an error:
https://prnt.sc/lLHeR27J2U65

On Tuesday, June 6, 2023 at 10:03:17 AM UTC+2 zdenop wrote:

> Do not create files manually.
> If "make training" does not work, it means:
>
>    1. you are missing a dependency, or your input data is wrong
>    2. you also missed the error message for 1.
>
> I strongly suggest you start training from the beginning 
> (including cloning tesstrain) and pay attention to all messages:
>
> git clone --depth 1 https://github.com/tesseract-ocr/tesstrain.git 
> cd tesstrain
> make tesseract-langdata
> mkdir tessdata_best
> wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata -P tessdata_best
> unzip ocrd-testset.zip -d data/ocrd-ground-truth
> make training MODEL_NAME=ocrd TESSDATA=tessdata_best MAX_ITERATIONS=10000
>
>
> Zdenko
>
>
> On Mon, 5 Jun 2023 at 4:22, Madhav Pandey <mad.dev...@gmail.com> wrote:
>
>> Hi Zdenop,
>>
>> Apologies. I got your name wrong in the thread. 
>>
>> Can you please help me resolve this issue? The make training command 
>> was not creating the all-gt file, so I created it manually and placed it 
>> in the MODEL_NAME directory. 
>>
>> I created it by copying all the single lines from the text files into 
>> the all-gt file. I am not sure if this is the right approach. Please 
>> correct me if I am wrong here. 
>>
>> Now, after doing this, I am getting this error:
>>
>> python3 shuffle.py 0 "data/Apex/all-lstmf"
>> Traceback (most recent call last):
>>   File "/Users/madpande/Code/git/tesseract_tutorial/tesstrain/shuffle.py", line 24, in <module>
>>     fd0 = open(sys.argv[2], 'r')
>> FileNotFoundError: [Errno 2] No such file or directory: 'data/Apex/all-lstmf'
>>
>>
>> I am pretty sure I am missing something here. Please help!
>>
>> Thanks!
>>
>> On Thursday, 1 June 2023 at 23:39:01 UTC-6 Madhav Pandey wrote:
>>
>>> Hi Zdenko,
>>>
>>> At what step in the Makefile is the all-gt file created? I am still 
>>> unable to move forward with the custom model training. 
>>>
>>> Any help would be greatly appreciated. Thanks!
>>>
>>> On Wednesday, 26 April 2023 at 09:47:55 UTC-6 zdenop wrote:
>>>
>>>> make training TESSDATA=./usr/local/share/tessdata
>>>> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
>>>>
>>>> Failed to read data from: data/foo/all-gt....
>>>>
>>>>
>>>> This indicates that you already ran a training that failed.
>>>> Clean your training and start it again. Pay attention to why 
>>>> "data/foo/all-gt" is not created (there will be an error message).
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Wed, 26 Apr 2023 at 2:07, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>>
>>>>> @zdenop 
>>>>>
>>>>> This is the entire training output:
>>>>>
>>>>> ```
>>>>> make training TESSDATA=./usr/local/share/tessdata
>>>>> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
>>>>> Failed to read data from: data/foo/all-gt
>>>>> Wrote unicharset file data/foo/unicharset
>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
>>>>> set -x; \
>>>>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
>>>>> set -x; \
>>>>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
>>>>> set -x; \
>>>>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>> Traceback (most recent call last):
>>>>>   File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>>>>>     fd0 = open(sys.argv[2], 'r')
>>>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>> ```
>>>>>
>>>>> For this run, I have just 3 text and tif files. 
>>>>>
>>>>> I followed the macOS installation section on this page: 
>>>>> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos and 
>>>>> installed everything mentioned there. 
>>>>>
>>>>> Do I have to install anything else before running the training? 
>>>>>
>>>>> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>>>>>
>>>>>> Did you install all the necessary dependencies?
>>>>>> Did you check and fix all errors (before this error) in the training 
>>>>>> output?
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Tue, 25 Apr 2023 at 8:21, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> I am relatively new to Tesseract and OCR as a whole. 
>>>>>>>
>>>>>>> I have been trying to set up model training locally using the guide 
>>>>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>>>>>
>>>>>>> I have copied the sample training data into the `data/foo` directory, 
>>>>>>> but when I run `make training`, I always end up with this error:
>>>>>>>
>>>>>>> ```
>>>>>>> Failed to read data from: data/foo/all-gt
>>>>>>> Wrote unicharset file data/foo/unicharset
>>>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "shuffle.py", line 24, in <module>
>>>>>>>     fd0 = open(sys.argv[2], 'r')
>>>>>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>>>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>>>> ```
>>>>>>>
>>>>>>> Can someone please help resolve this error?
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com
>>>>>>>  
>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>
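
For anyone hitting the same all-gt / all-lstmf errors, a quick check of the ground-truth directory before rerunning `make training` may help narrow things down. This is only a sketch and not part of tesstrain. It assumes the layout visible in the log quoted above: line images (`.tif`, or `.png` as a further assumption) sitting next to `.gt.txt` transcriptions in `data/<MODEL_NAME>-ground-truth`, with tesseract writing a `.lstmf` file beside each image. The function name `check_ground_truth` and the report keys are made up for illustration.

```python
from pathlib import Path

# Assumption: line images use one of these extensions.
IMAGE_SUFFIXES = (".tif", ".png")


def check_ground_truth(gt_dir):
    """Report line images that lack a transcription or a .lstmf file.

    Hypothetical helper, not part of tesstrain. Returns a dict of
    file-name lists keyed by the kind of problem found.
    """
    gt_dir = Path(gt_dir)
    report = {"missing_text": [], "empty_text": [], "missing_lstmf": []}
    for image in sorted(gt_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        base = image.parent / image.stem  # path without the image extension
        text = Path(f"{base}.gt.txt")
        if not text.is_file():
            report["missing_text"].append(image.name)
        elif not text.read_text(encoding="utf-8").strip():
            report["empty_text"].append(text.name)
        # Expected only after the "tesseract ... lstm.train" step has run.
        if not Path(f"{base}.lstmf").is_file():
            report["missing_lstmf"].append(image.name)
    return report
```

Calling `check_ground_truth("data/foo-ground-truth")` after a failed run lists the affected files. Entries under `missing_lstmf` would suggest that the tesseract step produced no output for those lines, which is where to look for the earlier error message.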
