Hi Zdenko,

At what step in the Makefile is the all-gt file created? I am still unable to move forward with the custom model training.
Any help would be greatly appreciated. Thanks!

On Wednesday, 26 April 2023 at 09:47:55 UTC-6 zdenop wrote:

> make training TESSDATA=./usr/local/share/tessdata
> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
>
> Failed to read data from: data/foo/all-gt....
>
>
> This indicates you already ran a training that failed...
> Clean your training and start it once again. Pay attention to why
> "data/foo/all-gt" is not created (there will be an error message).
>
> Zdenko
>
>
> On Wed, 26 Apr 2023 at 2:07, Madhav Pandey <mad.dev...@gmail.com> wrote:
>
>> @zdenop
>>
>> This is the entire training output:
>>
>> ```
>> make training TESSDATA=./usr/local/share/tessdata
>> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
>> Failed to read data from: data/foo/all-gt
>> Wrote unicharset file data/foo/unicharset
>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
>> set -x; \
>> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
>> set -x; \
>> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
>> set -x; \
>> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>> python3 shuffle.py 0 "data/foo/all-lstmf"
>> Traceback (most recent call last):
>>   File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>>     fd0 = open(sys.argv[2], 'r')
>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>> make: *** [data/foo/all-lstmf] Error 1
>> ```
>>
>> For this run, I have just 3 text and tif files.
>>
>> I followed the macOS installation section on this page:
>> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos and
>> installed everything that is mentioned there.
>>
>> Do I have to install anything else before running the training?
>>
>> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>>
>>> Did you install all the necessary dependencies?
>>> Did you check and fix all errors (before this error) in the training output?
>>>
>>> Zdenko
>>>
>>>
>>> On Tue, 25 Apr 2023 at 8:21, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I am relatively new to tesseract and OCR as a whole.
>>>>
>>>> I have been trying to set up model training locally
>>>> using the guide
>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>>
>>>> I have copied the sample training data into the `data/foo` directory,
>>>> but when I run `make training`, I always end up getting this error:
>>>>
>>>> ```
>>>> Failed to read data from: data/foo/all-gt
>>>> Wrote unicharset file data/foo/unicharset
>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>> Traceback (most recent call last):
>>>>   File "shuffle.py", line 24, in <module>
>>>>     fd0 = open(sys.argv[2], 'r')
>>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>>>> make: *** [data/foo/all-lstmf] Error 1
>>>> ```
>>>>
>>>> Can someone please help resolve this error?
>>>>
>>>> Thank you!
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com
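For anyone hitting the same "Failed to read data from: data/foo/all-gt" message, here is a minimal sketch of a pre-flight check, assuming the layout visible in the log above (line images and transcripts in `data/foo-ground-truth`, generated files in `data/foo`). The helper name `check_ground_truth` and its message strings are hypothetical and not part of tesstrain. It does not reproduce what the Makefile does; it only reports whether each `.tif` has a non-empty `.gt.txt` next to it and whether `all-gt` and `all-lstmf` exist and are non-empty.

```python
from pathlib import Path


def check_ground_truth(gt_dir, out_dir):
    """Return a list of human-readable problems (empty list means none found).

    Hypothetical helper, not part of tesstrain. It assumes every line image
    NAME.tif should have a transcript NAME.gt.txt in the same directory.
    """
    gt_dir, out_dir = Path(gt_dir), Path(out_dir)

    if not gt_dir.is_dir():
        return [f"ground-truth directory not found: {gt_dir}"]

    problems = []
    images = sorted(gt_dir.glob("*.tif"))
    if not images:
        problems.append(f"no .tif images in {gt_dir}")

    # Each image needs a matching, non-empty transcript.
    for image in images:
        transcript = image.with_name(image.stem + ".gt.txt")
        if not transcript.is_file():
            problems.append(f"missing transcript: {transcript}")
        elif not transcript.read_text(encoding="utf-8").strip():
            problems.append(f"empty transcript: {transcript}")

    # Files named in the error messages from the training log.
    for name in ("all-gt", "all-lstmf"):
        generated = out_dir / name
        if not generated.is_file():
            problems.append(f"not created: {generated}")
        elif generated.stat().st_size == 0:
            problems.append(f"created but empty: {generated}")

    return problems


if __name__ == "__main__":
    # Paths taken from the log in this thread; adjust for another model name.
    for problem in check_ground_truth("data/foo-ground-truth", "data/foo"):
        print(problem)
```

Run it from the tesstrain checkout before and after `make training`. If it reports an empty or missing `all-gt` while the transcripts are present, the earlier lines of the make output are the place to look, as Zdenko suggests.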