Do not create files manually.
If "make training" does not work it means:

   1. you are missing a dependency or your input data are wrong
   2. you also missed the error message for 1.
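
One way to avoid missing the first error message is to save the complete
output of the run to a file and search it from the top. The following is only
a sketch for sh/bash: the helper name `first_error` and the file name
`training.log` are made up for illustration, and the pattern list is a guess
at typical failure wording (it covers the "Failed to read data from" message
quoted later in this thread), not an exhaustive list.

```shell
# Hypothetical helper: print the first line of a saved log that looks like
# an error, prefixed with its line number. The patterns are assumptions;
# adjust them to the messages you actually see.
first_error() {
    log="$1"
    grep -n -i -m 1 -E 'error|failed|no such file|not found' "$log"
}

# Example usage (training.log is an arbitrary file name):
#   make training MODEL_NAME=ocrd TESSDATA=tessdata_best 2>&1 | tee training.log
#   first_error training.log
```

The first match matters most: later errors (such as the missing all-lstmf
file) are usually only consequences of the earlier one.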

I strongly suggest that you start training from the beginning
(including cloning tesstrain) and pay attention to all messages:

git clone --depth 1 https://github.com/tesseract-ocr/tesstrain.git
cd tesstrain
make tesseract-langdata
mkdir tessdata_best
wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata -P tessdata_best
unzip ocrd-testset.zip -d data/ocrd-ground-truth
make training MODEL_NAME=ocrd TESSDATA=tessdata_best MAX_ITERATIONS=10000
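
Since wrong input data is one of the two usual causes, it may help to check
the ground-truth directory before running make. This is a minimal sketch for
sh/bash and not part of tesstrain: the function name `check_gt_pairs` is
hypothetical, and it assumes that every line image is a `.tif` file with a
transcription named `<same base name>.gt.txt` next to it, as in the file
names shown in the quoted output below.

```shell
# Hypothetical pre-flight check, not part of tesstrain.
# Assumption: each image NAME.tif needs a non-empty NAME.gt.txt beside it.
check_gt_pairs() {
    dir="$1"
    problems=0
    for img in "$dir"/*.tif; do
        # If the glob matched nothing, the unexpanded pattern is returned.
        if [ ! -e "$img" ]; then
            echo "no .tif images found in $dir"
            return 1
        fi
        gt="${img%.tif}.gt.txt"
        # -s is true only if the file exists and is not empty.
        if [ ! -s "$gt" ]; then
            echo "missing or empty: $gt"
            problems=$((problems + 1))
        fi
    done
    echo "$problems problem(s) in $dir"
    [ "$problems" -eq 0 ]
}

# Example usage, with the directory from the commands above:
#   check_gt_pairs data/ocrd-ground-truth
```

The function returns a non-zero status when anything is missing, so it can
be chained as `check_gt_pairs DIR && make training ...`.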


Zdenko


On Mon, 5 Jun 2023 at 4:22, Madhav Pandey <mad.develope...@gmail.com> wrote:

> Hi Zdenop,
>
> Apologies. I got your name wrong in the thread.
>
> Can you please help me resolve this issue? The make training
> command was not creating the all-gt file, so I created it manually and put
> it in the MODEL_NAME directory.
>
> I created it by copying every single line from the text files
> into the all-gt file. I am not sure if this is the right
> approach. Please correct me if I am wrong here.
>
> Now, after doing this, I am getting this error:
>
> python3 shuffle.py 0 "data/Apex/all-lstmf"
> Traceback (most recent call last):
>   File "/Users/madpande/Code/git/tesseract_tutorial/tesstrain/shuffle.py",
> line 24, in <module>
>     fd0 = open(sys.argv[2], 'r')
> FileNotFoundError: [Errno 2] No such file or directory:
> 'data/Apex/all-lstmf'
>
>
> I am pretty sure I am missing something here. Please help!
>
> Thanks!
>
> On Thursday, 1 June 2023 at 23:39:01 UTC-6 Madhav Pandey wrote:
>
>> Hi Zdenko,
>>
>> At what step in the Makefile is the all-gt file created? I am still
>> unable to move forward with the custom model training.
>>
>> Any help would be greatly appreciated. Thanks!
>>
>> On Wednesday, 26 April 2023 at 09:47:55 UTC-6 zdenop wrote:
>>
>>> make training TESSDATA=./usr/local/share/tessdata
>>> unicharset_extractor --output_unicharset "data/foo/unicharset"
>>> --norm_mode 2 "data/foo/all-gt"
>>>
>>> Failed to read data from: data/foo/all-gt....
>>>
>>>
>>> This indicates that you have already run a training that failed...
>>> Clean your training and start it once again. Pay attention to why
>>> "data/foo/all-gt" is not created (there will be an error message).
>>>
>>> Zdenko
>>>
>>>
>>> On Wed, 26 Apr 2023 at 2:07, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>
>>>> @zdenop
>>>>
>>>> This is the entire training output:
>>>>
>>>> ```
>>>> make training TESSDATA=./usr/local/share/tessdata
>>>> unicharset_extractor --output_unicharset "data/foo/unicharset"
>>>> --norm_mode 2 "data/foo/all-gt"
>>>> Failed to read data from: data/foo/all-gt
>>>> Wrote unicharset file data/foo/unicharset
>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" >
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
>>>> set -x; \
>>>>         tesseract
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif"
>>>> data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif
>>>> data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" >
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
>>>> set -x; \
>>>>         tesseract
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif"
>>>> data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif
>>>> data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" >
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
>>>> set -x; \
>>>>         tesseract
>>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif"
>>>> data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif
>>>> data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>> Traceback (most recent call last):
>>>>   File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>>>>     fd0 = open(sys.argv[2], 'r')
>>>> FileNotFoundError: [Errno 2] No such file or directory:
>>>> 'data/foo/all-lstmf'
>>>> make: *** [data/foo/all-lstmf] Error 1
>>>> ```
>>>>
>>>> For this run, I have just 3 text and tif files.
>>>>
>>>> I followed the macOS installation section on this page:
>>>> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos and
>>>> installed everything that is mentioned there.
>>>>
>>>> Do I have to install anything else before running the training?
>>>>
>>>> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>>>>
>>>>> Did you install all the necessary dependencies?
>>>>> Did you check and fix all errors (before this error) in the training
>>>>> output?
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Tue, 25 Apr 2023 at 8:21, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I am relatively new to tesseract and OCR as a whole.
>>>>>>
>>>>>> I have been trying to set up model training
>>>>>> locally using the guide
>>>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>>>>
>>>>>> I have copied the sample training data into the `data/foo` directory,
>>>>>> but when I run `make training`, I always end up getting this error:
>>>>>>
>>>>>> ```
>>>>>> Failed to read data from: data/foo/all-gt
>>>>>> Wrote unicharset file data/foo/unicharset
>>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>>> Traceback (most recent call last):
>>>>>>   File "shuffle.py", line 24, in <module>
>>>>>>     fd0 = open(sys.argv[2], 'r')
>>>>>> FileNotFoundError: [Errno 2] No such file or directory:
>>>>>> 'data/foo/all-lstmf'
>>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>>> ```
>>>>>>
>>>>>> Can someone please help resolve this error?
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>

