Hi Zdenop, Apologies. I got your name wrong in the thread.
Can you please help me resolve this issue? The `make training` command was not creating the all-gt file, so I created it manually and placed it in the MODEL_NAME directory. I created it by copying all the single lines from the text files into the all-gt file. I am not sure this is the right approach, so please correct me if I am wrong. After doing this, I am getting this error:

python3 shuffle.py 0 "data/Apex/all-lstmf"
Traceback (most recent call last):
  File "/Users/madpande/Code/git/tesseract_tutorial/tesstrain/shuffle.py", line 24, in <module>
    fd0 = open(sys.argv[2], 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'data/Apex/all-lstmf'

I am pretty sure I am missing something here. Please help!

Thanks!

On Thursday, 1 June 2023 at 23:39:01 UTC-6 Madhav Pandey wrote:

> Hi Zdenko,
>
> At what step in the Makefile is the all-gt file created? I am still
> unable to move forward with the custom model training.
>
> Any help would be greatly appreciated. Thanks!
>
> On Wednesday, 26 April 2023 at 09:47:55 UTC-6 zdenop wrote:
>
>> make training TESSDATA=./usr/local/share/tessdata
>> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
>>
>> Failed to read data from: data/foo/all-gt....
>>
>>
>> This indicates you already ran a training that failed...
>> Clean your training and start it once again. Pay attention to why
>> "data/foo/all-gt" is not created (there will be an error message).
>>
>> Zdenko
>>
>>
>> On Wed, 26 Apr 2023 at 2:07 Madhav Pandey <mad.dev...@gmail.com> wrote:
>>
>>> @zdenop
>>>
>>> This is the entire training output:
>>>
>>> ```
>>> make training TESSDATA=./usr/local/share/tessdata
>>> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
>>> Failed to read data from: data/foo/all-gt
>>> Wrote unicharset file data/foo/unicharset
>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
>>> set -x; \
>>> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
>>> set -x; \
>>> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
>>> set -x; \
>>> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>> Traceback (most recent call last):
>>>   File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>>>     fd0 = open(sys.argv[2], 'r')
>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>>> make: *** [data/foo/all-lstmf] Error 1
>>> ```
>>>
>>> For this run, I have just 3 text files and 3 tif files.
>>>
>>> I followed the macOS installation section on this page:
>>> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos and
>>> installed everything that is mentioned there.
>>>
>>> Do I have to install anything else before running the training?
>>>
>>> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>>>
>>>> Did you install all the necessary dependencies?
>>>> Did you check & fix all errors (before this error) in the training output?
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Tue, 25 Apr 2023 at 8:21 Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I am relatively new to tesseract and OCR as a whole.
>>>>>
>>>>> I have been trying to set up model training locally
>>>>> using the guide
>>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>>>
>>>>> I have copied the sample training data into the `data/foo` directory,
>>>>> but when I run `make training`, I always end up getting this error:
>>>>>
>>>>> ```
>>>>> Failed to read data from: data/foo/all-gt
>>>>> Wrote unicharset file data/foo/unicharset
>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>> Traceback (most recent call last):
>>>>>   File "shuffle.py", line 24, in <module>
>>>>>     fd0 = open(sys.argv[2], 'r')
>>>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>> ```
>>>>>
>>>>> Can someone please help resolve this error?
>>>>>
>>>>> Thank you!
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com
>>>>> .
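None of the replies in this thread spell out how tesstrain builds all-gt or all-lstmf, so here is a hypothetical Python 3 sketch (not part of tesstrain; the script and function names are made up) for checking a ground-truth directory before rerunning `make training`. It assumes, from the training output quoted in the thread rather than from tesstrain's documentation, that each `<name>.tif` / `<name>.gt.txt` pair should leave a `<name>.lstmf` beside it once the `tesseract ... --psm 13 lstm.train` step has run, and that all-lstmf is derived from those files. `concat_ground_truth` reproduces the manual all-gt approach described at the top of the thread; `missing_lstmf` lists transcriptions for which no .lstmf was written, which is one thing worth ruling out when shuffle.py cannot open all-lstmf.

```python
"""Hypothetical pre-flight check for a tesstrain ground-truth directory.

Not part of tesstrain. Assumes <name>.tif / <name>.gt.txt pairs in one
directory and that the lstm.train step should write <name>.lstmf beside them.
"""
from __future__ import annotations

import sys
from pathlib import Path

GT_SUFFIX = ".gt.txt"


def concat_ground_truth(gt_dir: Path, out_file: Path) -> int:
    """Copy every non-blank line of every *.gt.txt file into out_file.

    Mirrors the manual all-gt approach described in the thread; it is not
    tesstrain's own recipe. Returns the number of lines written.
    """
    lines: list[str] = []
    for gt in sorted(gt_dir.glob("*" + GT_SUFFIX)):
        text = gt.read_text(encoding="utf-8")
        lines.extend(line for line in text.splitlines() if line.strip())
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


def missing_lstmf(gt_dir: Path) -> list[Path]:
    """Return the *.gt.txt files that have no <name>.lstmf next to them."""
    missing = []
    for gt in sorted(gt_dir.glob("*" + GT_SUFFIX)):
        stem = gt.name[: -len(GT_SUFFIX)]  # "x.gt.txt" -> "x"
        if not (gt_dir / (stem + ".lstmf")).exists():
            missing.append(gt)
    return missing


if __name__ == "__main__":
    # Default path is taken from the training log quoted in this thread.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data/foo-ground-truth")
    if not target.is_dir():
        print(f"not a directory: {target}")
    else:
        for gt in missing_lstmf(target):
            print(f"no .lstmf written for {gt}")
```

Saved as, say, `check_gt.py` (a hypothetical name), `python3 check_gt.py data/foo-ground-truth` would print one line per .gt.txt file that has no matching .lstmf. Whether a hand-made all-gt is acceptable to the Makefile is not confirmed anywhere in this thread.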