Can I train on my own custom images? I'm going to build a scanner for French receipts, 
so I need to train on all of them to increase accuracy. What do you suggest, 
Zdenop?

On Saturday, October 28, 2023 at 11:58:10 AM UTC+2 zdenop wrote:

> It does not work on Windows (directly), but it works on Linux => use WSL if 
> you really need training. 
> Or wait until somebody finds a fix for Windows (or send the fix yourself; this is 
> an open source project, so everybody should contribute ;-) )
>
> Zdenko
>
>
> On Fri, Oct 27, 2023 at 17:32, Dev Solution <develop...@gmail.com> wrote:
>
>>
>> I just tried to run all of these commands, but I got an error: 
>> https://prnt.sc/lLHeR27J2U65
>>
>> On Tuesday, June 6, 2023 at 10:03:17 AM UTC+2 zdenop wrote:
>>
>>> Do not create files manually.
>>> If "make training" does not work, it means:
>>>
>>>    1. you are missing some dependency, or the input data are wrong
>>>    2. you also missed the error message for 1.
>>>
>>> I strongly suggest you start the training from the beginning 
>>> (including cloning tesstrain) and pay attention to all messages:
>>>
>>> git clone --depth 1 https://github.com/tesseract-ocr/tesstrain.git 
>>> cd tesstrain
>>> make tesseract-langdata
>>> mkdir tessdata_best
>>> wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata -P tessdata_best
>>> unzip ocrd-testset.zip -d data/ocrd-ground-truth
>>> make training MODEL_NAME=ocrd TESSDATA=tessdata_best MAX_ITERATIONS=10000
>>>
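Before rerunning the commands above, it may help to confirm that the ground-truth folder is complete. The following is only a sketch, not part of tesstrain: `check_ground_truth` is a hypothetical helper, and the pairing rule (each `NAME.tif` sitting next to a `NAME.gt.txt`) is assumed from the file names that appear in the training output quoted further down in this thread.

```python
from pathlib import Path


def check_ground_truth(gt_dir, image_exts=(".tif",)):
    """Return a list of human-readable problems found in a ground-truth folder.

    Assumption: every line image NAME.tif needs a transcription NAME.gt.txt
    in the same folder, as in the file names shown in this thread.
    """
    gt_dir = Path(gt_dir)
    if not gt_dir.is_dir():
        return [f"missing directory: {gt_dir}"]

    problems = []
    # Collect the line images in a stable order so the report is reproducible.
    images = sorted(p for p in gt_dir.iterdir() if p.suffix.lower() in image_exts)
    if not images:
        problems.append(f"no line images found in {gt_dir}")

    for image in images:
        transcript = image.with_name(image.stem + ".gt.txt")
        if not transcript.is_file():
            problems.append(f"{image.name}: no matching {transcript.name}")
        elif not transcript.read_text(encoding="utf-8").strip():
            problems.append(f"{transcript.name}: transcription is empty")
    return problems
```

Calling `check_ground_truth("data/ocrd-ground-truth")` returns an empty list when every image has a non-empty transcription. An empty list does not guarantee that training will succeed; it only rules out one kind of input problem.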
>>>
>>> Zdenko
>>>
>>>
>>> On Mon, Jun 5, 2023 at 4:22, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>
>>>> Hi Zdenop,
>>>>
>>>> Apologies. I got your name wrong in the thread. 
>>>>
>>>> Can you please help me resolve this issue? The make training 
>>>> command was not creating the all-gt file, so I created it manually and put 
>>>> it in the MODEL_NAME directory. 
>>>>
>>>> I created it by copying all the single lines from the 
>>>> text files into the all-gt file. I am not sure if this is the 
>>>> right approach. Please correct me if I am wrong here. 
>>>>
>>>> Now, after doing this, I am getting this error:
>>>>
>>>> python3 shuffle.py 0 "data/Apex/all-lstmf"
>>>> Traceback (most recent call last):
>>>>   File "/Users/madpande/Code/git/tesseract_tutorial/tesstrain/shuffle.py", line 24, in <module>
>>>>     fd0 = open(sys.argv[2], 'r')
>>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/Apex/all-lstmf'
>>>>
>>>>
>>>> I am pretty sure I am missing something here. Please help!
>>>>
>>>> Thanks!
>>>>
>>>> On Thursday, 1 June 2023 at 23:39:01 UTC-6 Madhav Pandey wrote:
>>>>
>>>>> Hi Zdenko,
>>>>>
>>>>> At what step in the Makefile is the all-gt file created? I am still 
>>>>> unable to move forward with the custom model training. 
>>>>>
>>>>> Any help would be greatly appreciated. Thanks!
>>>>>
>>>>> On Wednesday, 26 April 2023 at 09:47:55 UTC-6 zdenop wrote:
>>>>>
>>>>>> make training TESSDATA=./usr/local/share/tessdata
>>>>>> unicharset_extractor --output_unicharset "data/foo/unicharset" 
>>>>>> --norm_mode 2 "data/foo/all-gt"
>>>>>>
>>>>>> Failed to read data from: data/foo/all-gt....
>>>>>>
>>>>>>
>>>>>> This indicates you have already run a training that failed...
>>>>>> Clean your training and start it once again. Pay attention to why 
>>>>>> "data/foo/all-gt" is not created (there will be an error message).
>>>>>>
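To spot the first error message in a long training output, one option is to save the output to a file and scan it. This is a minimal sketch: `first_error_line` and its marker list are assumptions based only on the messages visible in the logs in this thread, not something tesstrain provides.

```python
import re

# Assumed markers, taken from messages that appear in this thread's logs.
ERROR_PATTERN = re.compile(r"Failed to|Traceback|Error|No such file")


def first_error_line(log_text):
    """Return (line_number, line) for the first suspicious line, or None."""
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            return number, line.strip()
    return None
```

For example, the output could be saved with `make training TESSDATA=./usr/local/share/tessdata 2>&1 | tee training.log` and then checked with `first_error_line(open("training.log").read())`. The earliest match is usually the one worth reading first, since later errors are often just consequences of it.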
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 26, 2023 at 2:07, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>>>>
>>>>>>> @zdenop 
>>>>>>>
>>>>>>> This is the entire training output:
>>>>>>>
>>>>>>> ```
>>>>>>> make training TESSDATA=./usr/local/share/tessdata
>>>>>>> unicharset_extractor --output_unicharset "data/foo/unicharset" --norm_mode 2 "data/foo/all-gt"
>>>>>>> Failed to read data from: data/foo/all-gt
>>>>>>> Wrote unicharset file data/foo/unicharset
>>>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
>>>>>>> set -x; \
>>>>>>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
>>>>>>> set -x; \
>>>>>>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>>>>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" > "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
>>>>>>> set -x; \
>>>>>>>         tesseract "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>>>>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>>>>>>>     fd0 = open(sys.argv[2], 'r')
>>>>>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>>>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>>>> ```
>>>>>>>
>>>>>>> For this run, I have just 3 text/tif file pairs. 
>>>>>>>
>>>>>>> I followed the macOS installation section of this page: 
>>>>>>> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos and 
>>>>>>> installed everything that is mentioned there. 
>>>>>>>
>>>>>>> Do I have to install anything else before running the training? 
>>>>>>>
>>>>>>> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>>>>>>>
>>>>>>>> Did you install all the necessary dependencies?
>>>>>>>> Did you check and fix all errors (before this error) in the training 
>>>>>>>> output?
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 25, 2023 at 8:21, Madhav Pandey <mad.dev...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Everyone,
>>>>>>>>>
>>>>>>>>> I am relatively new to Tesseract and OCR as a whole. 
>>>>>>>>>
>>>>>>>>> I have been trying to set up model training 
>>>>>>>>> locally using the guide 
>>>>>>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>>>>>>>
>>>>>>>>> I have copied the sample training data into the `data/foo` 
>>>>>>>>> directory, but when I run `make training`, I always end up 
>>>>>>>>> getting this error:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> Failed to read data from: data/foo/all-gt
>>>>>>>>> Wrote unicharset file data/foo/unicharset
>>>>>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "shuffle.py", line 24, in <module>
>>>>>>>>>     fd0 = open(sys.argv[2], 'r')
>>>>>>>>> FileNotFoundError: [Errno 2] No such file or directory: 'data/foo/all-lstmf'
>>>>>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> Can someone please help resolve this error?
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com
>>>>>>>>>  
>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>

