This indicates that you already ran a training that failed. Clean up your training and start it again. Pay attention to why "data/foo/all-gt" is not created (there will be an error message).

Zdenko

On Wed, 26 Apr 2023 at 2:07, Madhav Pandey <mad.develope...@gmail.com> wrote:

> @zdenop
>
> This is the entire training output:
>
> ```
> make training TESSDATA=./usr/local/share/tessdata
> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
> Failed to read data from: data/foo/all-gt
> Wrote unicharset file data/foo/unicharset
> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
> set -x; \
> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
> set -x; \
> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
> set -x; \
> tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
> python3 shuffle.py 0 "data/foo/all-lstmf"
> Traceback (most recent call last):
>   File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>     fd0 = open(sys.argv[2], 'r')
> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
> make: *** [data/foo/all-lstmf] Error 1
> ```
>
> For this run, I have just 3 text and tif files.
>
> I followed the macOS installation section of this page:
> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos
> and installed everything that is mentioned there.
>
> Do I have to install anything else before running the training?
>
> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>
>> Did you install all the necessary dependencies?
>> Did you check and fix all errors (before this error) in the training output?
>>
>> Zdenko
>>
>> On Tue, 25 Apr 2023 at 8:21, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>
>>> Hi Everyone,
>>>
>>> I am relatively new to Tesseract and OCR as a whole.
>>>
>>> I have been trying to set up model training locally using the guide
>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>
>>> I copied the sample training data into the `data/foo` directory, but
>>> when I run `make training`, I always end up with this error:
>>>
>>> ```
>>> Failed to read data from: data/foo/all-gt
>>> Wrote unicharset file data/foo/unicharset
>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>> Traceback (most recent call last):
>>>   File "shuffle.py", line 24, in <module>
>>>     fd0 = open(sys.argv[2], 'r')
>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>>> make: *** [data/foo/all-lstmf] Error 1
>>> ```
>>>
>>> Can someone please help resolve this error?
>>>
>>> Thank you!
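
The advice above (find the first error, not the last one) can be supported by a small pre-flight check before rerunning `make training`. What follows is only a sketch and not part of tesstrain: the helper names `check_ground_truth` and `missing_outputs` are hypothetical. It assumes the layout visible in the log, where each `.tif` in `data/foo-ground-truth` sits next to a `.gt.txt` transcription and a successful run leaves a `.box` and a `.lstmf` file beside them.

```python
from pathlib import Path


def check_ground_truth(gt_dir):
    """Return (pairs, problems) for a ground-truth directory.

    pairs    -- list of (image, transcription) paths that look usable
    problems -- human-readable descriptions of anything suspicious
    """
    gt_dir = Path(gt_dir)
    pairs, problems = [], []
    if not gt_dir.is_dir():
        return pairs, [f"missing directory: {gt_dir}"]
    for image in sorted(gt_dir.glob("*.tif")):
        # Assumed naming convention: foo.tif pairs with foo.gt.txt
        text = image.with_name(image.stem + ".gt.txt")
        if not text.is_file():
            problems.append(f"no transcription for {image.name}")
        elif not text.read_text(encoding="utf-8").strip():
            problems.append(f"empty transcription: {text.name}")
        else:
            pairs.append((image, text))
    if not pairs and not problems:
        problems.append(f"no .tif images found in {gt_dir}")
    return pairs, problems


def missing_outputs(pairs):
    """List the .box/.lstmf files that are absent next to each image."""
    missing = []
    for image, _ in pairs:
        for suffix in (".box", ".lstmf"):
            out = image.with_suffix(suffix)
            if not out.is_file():
                missing.append(out)
    return missing


if __name__ == "__main__":
    # Directory name taken from the log above; adjust for another model name.
    pairs, problems = check_ground_truth("data/foo-ground-truth")
    for problem in problems:
        print("PROBLEM:", problem)
    for path in missing_outputs(pairs):
        print("not generated yet:", path)
```

If the script reports missing `.lstmf` files after a run in which the `tesseract ... lstm.train` commands appeared to succeed, that step is a reasonable place to start looking for the earlier error message.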

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8yCzW0VS4ybdioMTweYTN9NVe%3DaiWZbtLV_hT4Ae-SLjA%40mail.gmail.com.