Do you have a ground-truth?
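
If it helps, here is a minimal sketch for checking that. It assumes the layout visible in the quoted log: line images (`.tif`, possibly `.png`) in `data/MODEL_NAME-ground-truth`, each with a transcription named `<same name>.gt.txt`. The function name `check_ground_truth` is made up for this example and is not part of tesstrain.

```python
from pathlib import Path

# Assumption: line images use one of these extensions (the quoted log shows .tif).
IMAGE_SUFFIXES = (".tif", ".png")
TEXT_SUFFIX = ".gt.txt"


def check_ground_truth(gt_dir):
    """Return (images_without_text, texts_without_image, empty_texts).

    Each element is a sorted list of base names (file name without extension).
    """
    gt_dir = Path(gt_dir)
    if not gt_dir.is_dir():
        raise FileNotFoundError(f"ground-truth directory not found: {gt_dir}")

    # Base names of all line images, e.g. "alexis_ruhe01_1852_0087_027".
    images = {p.stem for p in gt_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}

    # Map base name -> transcription file for every *.gt.txt.
    texts = {p.name[: -len(TEXT_SUFFIX)]: p for p in gt_dir.glob("*" + TEXT_SUFFIX)}

    images_without_text = sorted(images - texts.keys())
    texts_without_image = sorted(texts.keys() - images)
    # A transcription containing only whitespace gives the trainer nothing to learn.
    empty_texts = sorted(
        name for name, path in texts.items()
        if not path.read_text(encoding="utf-8").strip()
    )
    return images_without_text, texts_without_image, empty_texts
```

Calling `check_ground_truth("data/foo-ground-truth")` should return three empty lists when every image has a non-empty transcription; anything else points at the files to look at first.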

On Friday, October 27, 2023 at 6:32:38 PM UTC+3 develop...@gmail.com wrote:

>
> I just tried running all of these commands, but I got an error: 
> https://prnt.sc/lLHeR27J2U65
>
> On Tuesday, June 6, 2023 at 10:03:17 AM UTC+2 zdenop wrote:
>
>> Do not create files manually.
>> If "make training" does not work, it means:
>>
>>    1. you are missing a dependency or your input data is wrong
>>    2. you also missed the error message for 1.
>>
>> I strongly suggest you start training from the beginning 
>> (including cloning tesstrain) and pay attention to all messages:
>>
>> git clone --depth 1 https://github.com/tesseract-ocr/tesstrain.git 
>> cd tesstrain
>> make tesseract-langdata
>> mkdir tessdata_best
>> wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata -P tessdata_best
>> unzip ocrd-testset.zip -d data/ocrd-ground-truth
>> make training MODEL_NAME=ocrd TESSDATA=tessdata_best MAX_ITERATIONS=10000
>>
>>
>> Zdenko
>>
>>
>> On Mon, 5 Jun 2023 at 4:22, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>
>>> Hi Zdenop,
>>>
>>> Apologies. I got your name wrong in the thread. 
>>>
>>> Can you please help me resolve this issue? The make training 
>>> command was not creating the all-gt file, so I created it manually and 
>>> placed it in the MODEL_NAME directory. 
>>>
>>> I created it by copying every single line from the text 
>>> files into the all-gt file. I am not sure if this is the right 
>>> approach. Please correct me if I am wrong here. 
>>>
>>> Now, after doing this, I am getting this error:
>>>
>>> python3 shuffle.py 0 "data/Apex/all-lstmf"
>>> Traceback (most recent call last):
>>>   File "/Users/madpande/Code/git/tesseract_tutorial/tesstrain/shuffle.py", line 24, in <module>
>>>     fd0 = open(sys.argv[2], 'r')
>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/Apex/all-lstmf'
>>>
>>>
>>> I am pretty sure I am missing something here. Please help!
>>>
>>> Thanks!
>>>
>>> On Thursday, 1 June 2023 at 23:39:01 UTC-6 Madhav Pandey wrote:
>>>
>>>> Hi Zdenko,
>>>>
>>>> At what step in the Makefile is the all-gt file created? I am still 
>>>> unable to move forward with the custom model training. 
>>>>
>>>> Any help would be greatly appreciated. Thanks!
>>>>
>>>> On Wednesday, 26 April 2023 at 09:47:55 UTC-6 zdenop wrote:
>>>>
>>>>> make training TESSDATA=./usr/local/share/tessdata
>>>>> unicharset_extractor --output_unicharset "data/foo/unicharset" 
>>>>> --norm_mode 2 "data/foo/all-gt"
>>>>>
>>>>> Failed to read data from: data/foo/all-gt....
>>>>>
>>>>>
>>>>> This indicates that you have already run a training that failed.
>>>>> Clean your training and start it once again. Pay attention to why 
>>>>> "data/foo/all-gt" is not created (there will be an error message).
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Wed, 26 Apr 2023 at 2:07, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>>>
>>>>>> @zdenop 
>>>>>>
>>>>>> This is the entire training output:
>>>>>>
>>>>>> ```
>>>>>> make training TESSDATA=./usr/local/share/tessdata
>>>>>> unicharset_extractor --output_unicharset "data/foo/unicharset" 
>>>>>> --norm_mode 2 "data/foo/all-gt"
>>>>>> Failed to read data from: data/foo/all-gt
>>>>>> Wrote unicharset file data/foo/unicharset
>>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" > 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
>>>>>> set -x; \
>>>>>>         tesseract 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" 
>>>>>> data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif 
>>>>>> data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" > 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
>>>>>> set -x; \
>>>>>>         tesseract 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" 
>>>>>> data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif 
>>>>>> data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" > 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
>>>>>> set -x; \
>>>>>>         tesseract 
>>>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" 
>>>>>> data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif 
>>>>>> data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>>> Traceback (most recent call last):
>>>>>>   File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>>>>>>     fd0 = open(sys.argv[2], 'r')
>>>>>> FileNotFoundError: [Errno 2] No such file or directory: 
>>>>>> 'data/foo/all-lstmf'
>>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>>> ```
>>>>>>
>>>>>> For this run, I just have 3 text and tif files. 
>>>>>>
>>>>>> I followed the macOS installation section on this page: 
>>>>>> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos and 
>>>>>> installed everything that is mentioned there. 
>>>>>>
>>>>>> Do I have to install anything else before running the training? 
>>>>>>
>>>>>> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>>>>>>
>>>>>>> Did you install all the necessary dependencies?
>>>>>>> Did you check and fix all errors (before this error) in the training 
>>>>>>> output?
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 25 Apr 2023 at 8:21, Madhav Pandey <mad.dev...@gmail.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Everyone,
>>>>>>>>
>>>>>>>> I am relatively new to Tesseract and OCR as a whole. 
>>>>>>>>
>>>>>>>> I have been trying to set up model training locally 
>>>>>>>> using the guide 
>>>>>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>>>>>>
>>>>>>>> I have copied the sample training data into the `data/foo` 
>>>>>>>> directory, but when I run `make training` I always end up getting 
>>>>>>>> this error:
>>>>>>>>
>>>>>>>> ```
>>>>>>>> Failed to read data from: data/foo/all-gt
>>>>>>>> Wrote unicharset file data/foo/unicharset
>>>>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "shuffle.py", line 24, in <module>
>>>>>>>>     fd0 = open(sys.argv[2], 'r')
>>>>>>>> FileNotFoundError: [Errno 2] No such file or directory: 
>>>>>>>> 'data/foo/all-lstmf'
>>>>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Can someone please help resolve this error?
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit 
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com
>>>>>>>>  
>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>
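
On the advice quoted above to look for the error message that comes before the final failure: a small sketch that scans saved `make training` output and lists the lines that look like errors, earliest first. The patterns are taken only from the output quoted in this thread, so treat them as an assumption rather than a complete list of what tesstrain can print; `find_error_lines` is a name invented for this example.

```python
import re

# Assumption: these patterns come from the output quoted in this thread;
# they are not an exhaustive list of tesstrain or tesseract messages.
ERROR_PATTERNS = [
    r"Failed to read data from",
    r"Traceback \(most recent call last\)",
    r"No such file or directory",
    r"^make: \*\*\*",
]
_ERROR_RE = re.compile("|".join(ERROR_PATTERNS))


def find_error_lines(log_text):
    """Return (line_number, line) pairs for error-looking lines, in log order."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if _ERROR_RE.search(line)
    ]
```

The first pair returned is usually the one worth reading; in the output quoted above that is the "Failed to read data from: data/foo/all-gt" line, well before the shuffle.py traceback.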

