Check this https://github.com/tesseract-ocr/tesseract/issues/1685
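A few things in the logs quoted below also look like shell issues rather than Tesseract issues, so here is a small sketch of what bash does with those arguments. This is only my reading of the quoted output, not something taken from the linked issue. `show_arg` and `to_native_path` are hypothetical helper names I made up for illustration, and the assumption that a native Windows `text2image.exe` cannot resolve Cygwin's `/tmp` is a guess based on the "Unable to open ... for writing" message.

```shell
#!/bin/sh
# Sketch only: shows how the shell rewrites arguments before tesstrain.sh sees them.

# Hypothetical helper: print one argument exactly as the shell passes it on.
show_arg() { printf '%s\n' "$1"; }

# 1) Unquoted backslashes are consumed by the shell, which matches the
#    "Creating new directory D:Testtrainneddata" line in the quoted log.
show_arg D:\Test\trainneddata
# Single quotes (or forward slashes) keep the path intact.
show_arg 'D:\Test\trainneddata'

# 2) %WINDIR% is cmd.exe syntax; the shell passes it through literally,
#    so --fonts_dir receives the text "%WINDIR%/Fonts", not a real directory.
show_arg %WINDIR%/Fonts

# 3) Hypothetical helper: turn a Cygwin path into a Windows-style path when
#    cygpath is available (-m prints the Windows form with forward slashes).
#    Outside Cygwin it returns the path unchanged.
to_native_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -m "$1"
  else
    printf '%s\n' "$1"
  fi
}
to_native_path /tmp
```

If the third point is the cause, passing Windows-style paths (for example `C:/Windows/Fonts` and a quoted `'D:/Test/trainneddata'`) and running the training in WSL or Linux, as already suggested in the thread, would avoid the path mismatch.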
On Wed, 1 Sep 2021 at 6:22 PM, Samruddhi Dhake <sam22dh...@gmail.com> wrote:
> For images.
> I have to create my own traineddata for my images. So for that I am
> following the steps mentioned in this documentation
> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
> As per the steps I have created the box file, lstmf file and unicharset file.
> The next step is to create traineddata using tesstrain.sh, followed by the
> next step, i.e. lstmtraining.exe.
> I am getting these errors at the tesstrain.sh step.
>
> On Wednesday, September 1, 2021 at 6:11:27 PM UTC+5:30 P007 wrote:
>
>> I mean, are you working with fonts only?
>> Or images?
>>
>> On Wed, 1 Sep 2021 at 6:09 PM, Samruddhi Dhake <sam22...@gmail.com>
>> wrote:
>>
>>> Yes, I am working with the eng language.
>>> I am using tessdata (C:\Program Files\Tesseract-OCR\tessdata).
>>>
>>> On Wednesday, September 1, 2021 at 5:57:24 PM UTC+5:30 P007 wrote:
>>>
>>>> Okay,
>>>>
>>>> Wait, you are working with the English language, right?
>>>> What kind of dataset did you use here?
>>>>
>>>> On Wed, 1 Sep 2021 at 5:53 PM, Samruddhi Dhake <sam22...@gmail.com>
>>>> wrote:
>>>>
>>>>> No. tesstrain.sh didn't work. I am running tesstrain.sh on Cygwin.
>>>>> Command->
>>>>> $ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang eng
>>>>> --linedata_only --noextract_font_properties --langdata_dir 'C:/Program
>>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program
>>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata --fontlist
>>>>> 'Arial'
>>>>>
>>>>> After hitting enter for tesstrain.sh, it runs text2image and
>>>>> gives the following error:
>>>>> === Starting training for language 'eng'
>>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program
>>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12
>>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I
>>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing
>>>>> Fontconfig error: Cannot load default config file
>>>>> Could not find font named 'Arial'.
>>>>> Please correct --font arg.
>>>>> ERROR: Program Program failed. Abort.
>>>>>
>>>>> As per the previous suggestions, I ran the text2image.exe command in cmd and
>>>>> it works and lists all available fonts.
>>>>>
>>>>> So why does the text2image command fail when run from tesstrain.sh?
>>>>> It does not create the temp folder under /tmp/ and I get the fonts.conf
>>>>> error.
>>>>> I expected the fonts.conf file to be written in the temp folder
>>>>> (in my case font_tmp.0doGBqWc3I) and to include the font 'Arial',
>>>>> so that the Arial font can be found.
>>>>> I don't know why it is not being created.
>>>>>
>>>>> Regards,
>>>>> Samruddhi
>>>>>
>>>>> On Wednesday, September 1, 2021 at 5:31:10 PM UTC+5:30 P007 wrote:
>>>>>
>>>>>>
>>>>>> Did tesstrain.sh work for you?
>>>>>>
>>>>>> On Wed, 1 Sep 2021 at 5:09 PM, Samruddhi Dhake <sam22...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> In text2image, there is an argument --fontconfig_tmpdir which
>>>>>>> creates the temp folder where fonts.conf gets added.
>>>>>>>
>>>>>>> I checked /tmp/; no temp folder (font_tmp.0doGBqWc3I) was created.
>>>>>>>
>>>>>>> Has anybody else seen this issue?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Samruddhi
>>>>>>>
>>>>>>> On Tuesday, August 31, 2021 at 7:24:46 PM UTC+5:30 Samruddhi Dhake
>>>>>>> wrote:
>>>>>>>
>>>>>>>> >"C:\Program Files\Tesseract-OCR\text2image.exe"
>>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp
>>>>>>>> --list_available_fonts
>>>>>>>> This worked. I got the list of available fonts, which contains Arial and
>>>>>>>> Arial Bold too.
>>>>>>>>
>>>>>>>> This time, in Cygwin Bash, I tried giving --fontlist 'Arial' to
>>>>>>>> tesstrain.sh
>>>>>>>> Command->
>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang
>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir
>>>>>>>> 'C:/Program
>>>>>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program
>>>>>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata
>>>>>>>> --fontlist
>>>>>>>> 'Arial'
>>>>>>>>
>>>>>>>> === Starting training for language 'eng'
>>>>>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program
>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12
>>>>>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I
>>>>>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing
>>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>>> Could not find font named 'Arial'.
>>>>>>>> Please correct --font arg.
>>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>>
>>>>>>>> I am still getting this fonts.conf error. Any idea how to resolve
>>>>>>>> this fonts.conf error?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Samruddhi
>>>>>>>>
>>>>>>>> On Tuesday, August 31, 2021 at 4:50:14 PM UTC+5:30 zdenop wrote:
>>>>>>>>
>>>>>>>>> Try running this:
>>>>>>>>> "C:\Program Files\Tesseract-OCR\text2image.exe"
>>>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp
>>>>>>>>> --list_available_fonts
>>>>>>>>>
>>>>>>>>> Zdenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mon, 30 Aug 2021 at 16:45, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am running command ->
>>>>>>>>>>
>>>>>>>>>> ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang
>>>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir
>>>>>>>>>> "C:/Program
>>>>>>>>>> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program
>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>>>>>>>>>
>>>>>>>>>> And after hitting enter -> (processing)
>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>> [Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program
>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/
>>>>>>>>>> --ptsize 12
>>>>>>>>>> --font=Arial Bold
>>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS
>>>>>>>>>> Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing
>>>>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>>>>> Could not find font named 'Arial Bold'.
>>>>>>>>>> Please correct --font arg.
>>>>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I will break it down to ask a few questions.
>>>>>>>>>>
>>>>>>>>>> [Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program
>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/
>>>>>>>>>> --ptsize 12
>>>>>>>>>> --font=Arial Bold
>>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS
>>>>>>>>>> Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing
>>>>>>>>>> ----> Here, I am not giving Arial Bold as input. The outputbase
>>>>>>>>>> should create the temp folder 'font_tmp.s9cdSHrzKS', but it is not
>>>>>>>>>> being created,
>>>>>>>>>> and neither is the fontconfig_tmpdir. So it is giving a write error.
>>>>>>>>>>
>>>>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>>>>> ----> To resolve this error, I added
>>>>>>>>>> FONTCONFIG_FILE=%WINDIR%\fonts.conf to the environment
>>>>>>>>>> variables (referring to
>>>>>>>>>> https://forums.wesnoth.org/viewtopic.php?t=22821)
>>>>>>>>>> But it is still not resolved.
>>>>>>>>>>
>>>>>>>>>> I was checking-> text2image.exe --list_available_fonts
>>>>>>>>>> And after hitting enter, I got -> Fontconfig warning:
>>>>>>>>>> "/tmp\fonts.conf", line 4: empty font directory name ignored
>>>>>>>>>>
>>>>>>>>>> The contents of the fonts.conf file which gets created are->
>>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>>> <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
>>>>>>>>>> <fontconfig>
>>>>>>>>>> <dir></dir>
>>>>>>>>>> <cachedir>/tmp</cachedir>
>>>>>>>>>> <config></config>
>>>>>>>>>> </fontconfig>
>>>>>>>>>>
>>>>>>>>>> Can you please help me resolve this? Or am I giving
>>>>>>>>>> the correct tesstrain.sh command and args?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Samruddhi
>>>>>>>>>> On Monday, August 30, 2021 at 5:12:21 PM UTC+5:30 zdenop wrote:
>>>>>>>>>>
>>>>>>>>>>> First of all: use quotes for multi word names, or escape
>>>>>>>>>>> space/special symbols (e.g.
>>>>>>>>>>> --font="Arial Bold")
>>>>>>>>>>> Next: fix error message: "Unable to open
>>>>>>>>>>> '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing"
>>>>>>>>>>> Next: check available font for text2image with option
>>>>>>>>>>> --list_available_fonts
>>>>>>>>>>> etc...
>>>>>>>>>>>
>>>>>>>>>>> PS: I would suggest using Linux for training instead of Windows
>>>>>>>>>>> (e.g. in WSL[1])
>>>>>>>>>>> [1] https://docs.microsoft.com/en-us/windows/wsl/install-win10
>>>>>>>>>>>
>>>>>>>>>>> Zdenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Mon, 30 Aug 2021 at 12:12, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> The text2image error is gone. Now I am getting a fontconfig error.
>>>>>>>>>>>>
>>>>>>>>>>>> SDE26@DTP-SDE26-IND /cygdrive/c/Program Files/Tesseract-OCR
>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts
>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties
>>>>>>>>>>>> --langdata_dir
>>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir
>>>>>>>>>>>> "C:/Program
>>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>>>>>>>>>>> Creating new directory D:Testtrainneddata
>>>>>>>>>>>>
>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>> [Mon Aug 30 15:34:53 IST 2021] /cygdrive/c/Program
>>>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts
>>>>>>>>>>>> --ptsize 12
>>>>>>>>>>>> --font=Arial Bold
>>>>>>>>>>>> --outputbase=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
>>>>>>>>>>>> --text=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
>>>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.hbC9F3LEQX
>>>>>>>>>>>> Unable to open '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing
>>>>>>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>>>>>>> Could not find font named 'Arial Bold'.
>>>>>>>>>>>> Please correct --font arg.
>>>>>>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have the Arial Bold font on my machine. I don't know why it cannot
>>>>>>>>>>>> be found. And in the /tmp/ folder there is no font_tmp.hbC9F3LEQX
>>>>>>>>>>>> directory, which is where fonts.conf
>>>>>>>>>>>> could not be opened for writing.
>>>>>>>>>>>> How can I resolve this?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, August 25, 2021 at 8:18:47 PM UTC+5:30 zdenop
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Honestly, I have no clue what you are doing: text2image is at
>>>>>>>>>>>>> the same location as the tesseract executable. So if you have
>>>>>>>>>>>>> tesseract in
>>>>>>>>>>>>> the path, text2image must work too.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wed, 25 Aug 2021 at 16:26, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> As you suggested, I installed Tesseract v5.0.0 on my Windows
>>>>>>>>>>>>>> machine (Index of /tesseract (uni-mannheim.de)
>>>>>>>>>>>>>> <https://digi.bib.uni-mannheim.de/tesseract/>). This
>>>>>>>>>>>>>> included training tools too.
>>>>>>>>>>>>>> I performed all the previous steps(boxfile, lstmf >>>>>>>>>>>>>> file,unicharset) >>>>>>>>>>>>>> >>>>>>>>>>>>>> But still after running tesstrain.sh command in Cygwin, I am >>>>>>>>>>>>>> getting following error, >>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts >>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties >>>>>>>>>>>>>> --langdata_dir >>>>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir >>>>>>>>>>>>>> "C:/Program >>>>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir >>>>>>>>>>>>>> D:/Bugs/1206806/folder/trainneddata >>>>>>>>>>>>>> Creating new directory D:/Bugs/1206806/folder/trainneddata >>>>>>>>>>>>>> >>>>>>>>>>>>>> === Starting training for language 'eng' >>>>>>>>>>>>>> which: no text2image in >>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/Microsoft >>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files >>>>>>>>>>>>>> (x86)/NVIDIA >>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files >>>>>>>>>>>>>> (x86)/Intel/Intel(R) >>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program >>>>>>>>>>>>>> Files/Intel/Intel(R) >>>>>>>>>>>>>> Management Engine >>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program >>>>>>>>>>>>>> Files (x86)/Microsoft SQL >>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program >>>>>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files >>>>>>>>>>>>>> (x86)/Microsoft 
ASP.NET/ASP.NET Web >>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL >>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program >>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows >>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program >>>>>>>>>>>>>> Files/Microsoft >>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files (x86)/Windows >>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files >>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program >>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL >>>>>>>>>>>>>> Server/Client >>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program >>>>>>>>>>>>>> Files (x86)/Microsoft SQL >>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>> Files >>>>>>>>>>>>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program >>>>>>>>>>>>>> Files/Microsoft >>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>> Files >>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools) >>>>>>>>>>>>>> which: no text2image in (./api) >>>>>>>>>>>>>> which: no text2image in (./training) >>>>>>>>>>>>>> ERROR: 
>>>>>>>>>>>>>> 'text2image' not found
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am I missing something? Can you please guide me?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 5:59:49 PM UTC+5:30 Samruddhi
>>>>>>>>>>>>>> Dhake wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you please provide a link to the steps for installing Tesseract
>>>>>>>>>>>>>>> and the training tools on Windows?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:42:48 PM UTC+5:30 Samruddhi
>>>>>>>>>>>>>>> Dhake wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How do I install tesseract and the training tools on Windows?
>>>>>>>>>>>>>>>> Do I have to install the Tesseract Windows exe?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:20:37 PM UTC+5:30 zdenop
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So there are only 2 possibilities:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Install tesseract and training tools
>>>>>>>>>>>>>>>>> 2. Learn how to handle & use not installed sw. This
>>>>>>>>>>>>>>>>> option is not related to tesseract.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Tue, 24 Aug 2021 at 9:17, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I haven't installed Tesseract. I have kept it in a folder
>>>>>>>>>>>>>>>>>> and I am running the exe by giving its path. I have generated
>>>>>>>>>>>>>>>>>> the training tools
>>>>>>>>>>>>>>>>>> from source code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To create the box file, command-> (I gave the absolute path of
>>>>>>>>>>>>>>>>>> tesseract.exe)
>>>>>>>>>>>>>>>>>> ..\tesseract.exe Dim4.tif Dim4 lstmbox
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To create the lstmf file, command->
>>>>>>>>>>>>>>>>>> tesseract.exe Dim4.tif Dim4 lstm.train
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To create the unicharset, command->
>>>>>>>>>>>>>>>>>> unicharset_extractor.exe --output_unicharset
>>>>>>>>>>>>>>>>>> ..\own.unicharset ..\langdata\eng\eng.training_text
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And to create the traineddata, using the tesstrain.sh command,
>>>>>>>>>>>>>>>>>> .\src\training\tesstrain.sh --fonts_dir C:\Windows\Fonts
>>>>>>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties
>>>>>>>>>>>>>>>>>> --langdata_dir
>>>>>>>>>>>>>>>>>> langdata --tessdata_dir tessdata --output_dir trainneddata
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 12:24:29 PM UTC+5:30
>>>>>>>>>>>>>>>>>> Samruddhi Dhake wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have generated the training tools from source code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Monday, August 23, 2021 at 7:09:02 PM UTC+5:30 zdenop
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How did you install tesseract? Did you also install
>>>>>>>>>>>>>>>>>>>> training tools?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Mon, 23 Aug 2021 at 15:34, Samruddhi Dhake
>>>>>>>>>>>>>>>>>>>> <sam22...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I am creating my own traineddata using tesseract
>>>>>>>>>>>>>>>>>>>>> v4.1.1 on Windows 10.
>>>>>>>>>>>>>>>>>>>>> I am referring documentation >>>>>>>>>>>>>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have successfully created .box file and .lstmf file >>>>>>>>>>>>>>>>>>>>> using lstmbox and lstm.train respectively. >>>>>>>>>>>>>>>>>>>>> So next step, I installed Cygwin to run tesstrain.sh >>>>>>>>>>>>>>>>>>>>> command to create training data. >>>>>>>>>>>>>>>>>>>>> But I am getting below error. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir >>>>>>>>>>>>>>>>>>>>> C:/Windows/Fonts --lang eng --linedata_only >>>>>>>>>>>>>>>>>>>>> --noextract_font_properties >>>>>>>>>>>>>>>>>>>>> --langdata_dir ./langdata --tessdata_dir ./tessdata >>>>>>>>>>>>>>>>>>>>> --output_dir >>>>>>>>>>>>>>>>>>>>> ./trainneddata >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> === Starting training for language 'eng' >>>>>>>>>>>>>>>>>>>>> which: no text2image in >>>>>>>>>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/Microsoft >>>>>>>>>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/NVIDIA >>>>>>>>>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/Intel/Intel(R) >>>>>>>>>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/Intel/Intel(R) >>>>>>>>>>>>>>>>>>>>> Management Engine >>>>>>>>>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> 
Server/100/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/100/DTS/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/Microsoft/Web Platform >>>>>>>>>>>>>>>>>>>>> Installer:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web >>>>>>>>>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/Windows >>>>>>>>>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/Microsoft >>>>>>>>>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/Windows >>>>>>>>>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files >>>>>>>>>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 >>>>>>>>>>>>>>>>>>>>> 6.0.20/bin:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/Client >>>>>>>>>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/150/DTS/Binn:/cygdrive/c/Program Files/Microsoft >>>>>>>>>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>> Files >>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/140/DTS/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools) >>>>>>>>>>>>>>>>>>>>> which: no text2image in (./api) >>>>>>>>>>>>>>>>>>>>> which: no text2image in (./training) >>>>>>>>>>>>>>>>>>>>> ERROR: 'text2image' not found >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I found text2image comes after running command 'make >>>>>>>>>>>>>>>>>>>>> training'. >>>>>>>>>>>>>>>>>>>>> Can you please help me how this can be done in WIndows >>>>>>>>>>>>>>>>>>>>> 10? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>> Samruddhi >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed >>>>>>>>>>>>>>>>>>>>> to the Google Groups "tesseract-ocr" group. >>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving >>>>>>>>>>>>>>>>>>>>> emails from it, send an email to >>>>>>>>>>>>>>>>>>>>> tesseract-oc...@googlegroups.com. >>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com >>>>>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>>>>>>>>>>>>>> . 