I checked this link, but it does not mention tesstrain. tesstrain.sh internally calls text2image.exe. So if I run text2image.exe directly, how do I get traineddata from it? What are the further steps to produce the traineddata file?
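For reference, here is a minimal sketch of how the text2image arguments seen in the log quoted below could be assembled from Cygwin bash. It rests on assumptions that are not confirmed anywhere in this thread: that the installed text2image.exe is a native Windows program that may not resolve Cygwin paths such as /tmp/font_tmp.XXXX, and that `cygpath` is available to convert them. `to_native_path` is a hypothetical helper, not part of Tesseract. Note also that bash expands `$WINDIR`, not `%WINDIR%`.

```shell
#!/usr/bin/env bash
# Sketch only. to_native_path is a hypothetical helper, not a Tesseract tool.
# It converts a Cygwin path to a Windows path with forward slashes when
# cygpath exists, and leaves the path unchanged on other systems.
to_native_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -m "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Assumption: WINDIR is visible inside the Cygwin shell; fall back otherwise.
fonts_dir=$(to_native_path "${WINDIR:-C:/Windows}/Fonts")

# Create the temporary directory up front so that fonts.conf has
# somewhere to be written.
work_dir=$(to_native_path "$(mktemp -d)")

# Keep the arguments in an array so that "Arial Bold" stays one argument.
args=(
  --fonts_dir="$fonts_dir"
  --fontconfig_tmpdir="$work_dir"
  --ptsize 12
  --font="Arial Bold"
)

# Print the arguments instead of running the tool. A real call would also
# need --text and --outputbase, as in the log quoted below:
# "/cygdrive/c/Program Files/Tesseract-OCR/text2image.exe" "${args[@]}"
printf '%s\n' "${args[@]}"
```

If the printed paths look like C:/... rather than /tmp/..., the same values could be tried by hand with text2image.exe before going back to tesstrain.sh.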
On Wednesday, September 1, 2021 at 6:25:13 PM UTC+5:30 P007 wrote: > > Check this > https://github.com/tesseract-ocr/tesseract/issues/1685 > > On Wed, 1 Sep 2021 at 6:22 PM, Samruddhi Dhake <sam22...@gmail.com> wrote: > >> For images. >> I have to create my own trainneddata for my images. So for that I am >> following steps mentioned in this documentation >> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html >> As per the steps I have created box file, lstm file and unicharset file. >> And next step is to create traineddata using tesstrain.sh followed by the >> next step i.e. lstmtraining.exe . >> I am getting such errors while performing at step tesstrain.sh. >> >> On Wednesday, September 1, 2021 at 6:11:27 PM UTC+5:30 P007 wrote: >> >>> I mean working with font only? >>> Or images?? >>> >>> On Wed, 1 Sep 2021 at 6:09 PM, Samruddhi Dhake <sam22...@gmail.com> >>> wrote: >>> >>>> Yes, I am working for eng language. >>>> I am using tessdata.(C:\Program Files\Tesseract-OCR\tessdata) >>>> >>>> On Wednesday, September 1, 2021 at 5:57:24 PM UTC+5:30 P007 wrote: >>>> >>>>> Okay, >>>>> >>>>> Wait you are working for English language right? >>>>> What kind of dataset you used here. >>>>> >>>>> On Wed, 1 Sep 2021 at 5:53 PM, Samruddhi Dhake <sam22...@gmail.com> >>>>> wrote: >>>>> >>>>>> No. Tessstrain.sh didn't work. I am running tesstrain.sh on cygwin. 
>>>>>> Command-> >>>>>> *$ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang eng >>>>>> --linedata_only --noextract_font_properties --langdata_dir 'C:/Program >>>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program >>>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata >>>>>> --fontlist >>>>>> 'Arial'* >>>>>> >>>>>> After hitting enter for tesstrain.sh, it is processing text2image and >>>>>> giving following error >>>>>> === Starting training for language 'eng' >>>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program >>>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12 >>>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt >>>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt >>>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I >>>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing >>>>>> Fontconfig error: Cannot load default config file >>>>>> Could not find font named 'Arial'. >>>>>> Please correct --font arg. >>>>>> ERROR: Program Program failed. Abort. >>>>>> >>>>>> As per previous suggestions, I ran text2image.exe command on cmd and >>>>>> its working and giving me all available fonts. >>>>>> >>>>>> Then after running tesstrain.sh, why text2image command is failing >>>>>> and it is not creating tempfolder under /tmp/ and I am getting >>>>>> fonts.config >>>>>> error. >>>>>> It is expected that fonts.config file which gets created in >>>>>> tempfolder(here in my case font_tmp.0doGBqWc3I) should gets written and >>>>>> it >>>>>> should include font 'Arial' and then Arial font can be found. >>>>>> Don't why it is not creating.. >>>>>> >>>>>> Regards, >>>>>> Samruddhi >>>>>> >>>>>> On Wednesday, September 1, 2021 at 5:31:10 PM UTC+5:30 P007 wrote: >>>>>> >>>>>>> >>>>>>> Tesstrain.sh work for you ? 
>>>>>>> >>>>>>> On Wed, 1 Sep 2021 at 5:09 PM, Samruddhi Dhake <sam22...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> In this text2image, there is an rgument --fontconfig_tempdir which >>>>>>>> creates temp folder where fonts.conf gets added. >>>>>>>> >>>>>>>> I checked /tmp/, no other tempfolder is created( >>>>>>>> font_tmp.0doGBqWc3I) >>>>>>>> >>>>>>>> Has anybody this issue? >>>>>>>> >>>>>>>> Regards, >>>>>>>> Samruddhi >>>>>>>> >>>>>>>> On Tuesday, August 31, 2021 at 7:24:46 PM UTC+5:30 Samruddhi Dhake >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >"C:\Program Files\Tesseract-OCR\text2image.exe" >>>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp >>>>>>>>> --list_available_fonts >>>>>>>>> This worked. I got list of available fonts which contains Arial >>>>>>>>> and Arial Bold too. >>>>>>>>> >>>>>>>>> Now this time,in Cygwin Bash, I tried giving --fontlist 'Arial' >>>>>>>>> for tesstrain.sh >>>>>>>>> Command-> >>>>>>>>> *$ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang >>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir >>>>>>>>> 'C:/Program >>>>>>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program >>>>>>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata >>>>>>>>> --fontlist >>>>>>>>> 'Arial'* >>>>>>>>> >>>>>>>>> === Starting training for language 'eng' >>>>>>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program >>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize >>>>>>>>> 12 >>>>>>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt >>>>>>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt >>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I >>>>>>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing >>>>>>>>> Fontconfig error: Cannot load default config file >>>>>>>>> Could not find font named 'Arial'. >>>>>>>>> Please correct --font arg. >>>>>>>>> ERROR: Program Program failed. Abort. 
>>>>>>>>> >>>>>>>>> Still I am getting this font.conf error. Any idea how to resolve >>>>>>>>> this font.conf error? >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Samruddhi >>>>>>>>> >>>>>>>>> On Tuesday, August 31, 2021 at 4:50:14 PM UTC+5:30 zdenop wrote: >>>>>>>>> >>>>>>>>>> try run this: >>>>>>>>>> "C:\Program Files\Tesseract-OCR\text2image.exe" >>>>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp >>>>>>>>>> --list_available_fonts >>>>>>>>>> >>>>>>>>>> Zdenko >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> po 30. 8. 2021 o 16:45 Samruddhi Dhake <sam22...@gmail.com> >>>>>>>>>> napísal(a): >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I am running command -> >>>>>>>>>>> >>>>>>>>>>> ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang >>>>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir >>>>>>>>>>> "C:/Program >>>>>>>>>>> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program >>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata >>>>>>>>>>> >>>>>>>>>>> And after hitting enter -> (processing) >>>>>>>>>>> === *Starting training for language 'eng'* >>>>>>>>>>> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program >>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ >>>>>>>>>>> --ptsize 12 >>>>>>>>>>> --font=Arial Bold >>>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt >>>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt >>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS* >>>>>>>>>>> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for >>>>>>>>>>> writing* >>>>>>>>>>> *Fontconfig error: Cannot load default config file* >>>>>>>>>>> *Could not find font named 'Arial Bold'.* >>>>>>>>>>> *Please correct --font arg.* >>>>>>>>>>> *ERROR: Program Program failed. Abort.* >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I will break it to ask few queries. 
>>>>>>>>>>> >>>>>>>>>>> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program >>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ >>>>>>>>>>> --ptsize 12 >>>>>>>>>>> --font=Arial Bold >>>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt >>>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt >>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS* >>>>>>>>>>> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for >>>>>>>>>>> writing* >>>>>>>>>>> ----> Here, I am not giving input as Arial Bold. Outputbase , >>>>>>>>>>> this should create temp folder 'font_tmp.s9cdSHrzKS' but its not >>>>>>>>>>> creating. >>>>>>>>>>> And so does fontconfig_tmpdir'. So it is giving writing error >>>>>>>>>>> >>>>>>>>>>> *Fontconfig error: Cannot load default config file* >>>>>>>>>>> ----> To resolve this error, I added >>>>>>>>>>> FONTCONFIG_FILE=%WINDIR%\fonts.conf to environment >>>>>>>>>>> variables(referring >>>>>>>>>>> https://forums.wesnoth.org/viewtopic.php?t=22821) >>>>>>>>>>> But still not resolved. >>>>>>>>>>> >>>>>>>>>>> I was checking-> *text2image.exe ----list_available_fonts* >>>>>>>>>>> And after hitting enter, I got -> Fontconfig warning: >>>>>>>>>>> "/tmp\fonts.conf", line 4: empty font directory name ignored >>>>>>>>>>> >>>>>>>>>>> The contents of the fonts.conf file which gets created are-> >>>>>>>>>>> <?xml version="1.0"?> >>>>>>>>>>> <!DOCTYPE fontconfig SYSTEM "fonts.dtd"> >>>>>>>>>>> <fontconfig> >>>>>>>>>>> <dir></dir> >>>>>>>>>>> <cachedir>/tmp</cachedir> >>>>>>>>>>> <config></config> >>>>>>>>>>> </fontconfig> >>>>>>>>>>> >>>>>>>>>>> Can you please help me how can this be resolved? Or Am I giving >>>>>>>>>>> correct tesstrain.sh command with its args? >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Samruddhi >>>>>>>>>>> On Monday, August 30, 2021 at 5:12:21 PM UTC+5:30 zdenop wrote: >>>>>>>>>>> >>>>>>>>>>>> First of all: use quotes for multi word names, or escape >>>>>>>>>>>> space/special symbols (e.g. 
--font="Arial Bold") >>>>>>>>>>>> Next: fix error message: "Unable to open >>>>>>>>>>>> '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing" >>>>>>>>>>>> Next: check available font for text2image with option >>>>>>>>>>>> --list_available_fonts >>>>>>>>>>>> etc... >>>>>>>>>>>> >>>>>>>>>>>> PS: I would suggest using linux for training instead of windows >>>>>>>>>>>> (e.g. in WSL[1]) >>>>>>>>>>>> [1] https://docs.microsoft.com/en-us/windows/wsl/install-win10 >>>>>>>>>>>> >>>>>>>>>>>> Zdenko >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> po 30. 8. 2021 o 12:12 Samruddhi Dhake <sam22...@gmail.com> >>>>>>>>>>>> napísal(a): >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> Text2Image error is gone. I am getting *font-config error*. >>>>>>>>>>>>> >>>>>>>>>>>>> SDE26@DTP-SDE26-IND /cygdrive/c/Program Files/Tesseract-OCR >>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts >>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties >>>>>>>>>>>>> --langdata_dir >>>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir >>>>>>>>>>>>> "C:/Program >>>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata >>>>>>>>>>>>> Creating new directory D:Testtrainneddata >>>>>>>>>>>>> >>>>>>>>>>>>> === Starting training for language 'eng' >>>>>>>>>>>>> [Mon Aug 30 15:34:53 IST 2021] /cygdrive/c/Program >>>>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts >>>>>>>>>>>>> --ptsize 12 >>>>>>>>>>>>> --font=Arial Bold >>>>>>>>>>>>> --outputbase=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt >>>>>>>>>>>>> --text=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt >>>>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.hbC9F3LEQX >>>>>>>>>>>>> Unable to open '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for >>>>>>>>>>>>> writing >>>>>>>>>>>>> Fontconfig error: Cannot load default config file >>>>>>>>>>>>> Could not find font named 'Arial Bold'. >>>>>>>>>>>>> Please correct --font arg. >>>>>>>>>>>>> ERROR: Program Program failed. Abort. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I have Arial Bold font on my machine. Don't know why it cannot >>>>>>>>>>>>> find. And in /tmp/ folder there is no font_tmp.hbC9F3LEQX where >>>>>>>>>>>>> fonts.conf >>>>>>>>>>>>> cannot be opened for writing. >>>>>>>>>>>>> How can I resolve this? >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Samruddhi >>>>>>>>>>>>> >>>>>>>>>>>>> On Wednesday, August 25, 2021 at 8:18:47 PM UTC+5:30 zdenop >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Honestly, I have no clue what you are doing: text2image is at >>>>>>>>>>>>>> the same location as the tesseract executable. So if you have >>>>>>>>>>>>>> tesseract in >>>>>>>>>>>>>> the path, text2image must work too. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [image: image.png] >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Zdenko >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> st 25. 8. 2021 o 16:26 Samruddhi Dhake <sam22...@gmail.com> >>>>>>>>>>>>>> napísal(a): >>>>>>>>>>>>>> >>>>>>>>>>>>>>> As you suggested, I installed Tesseract v5.0.0 on my Windows >>>>>>>>>>>>>>> machine (Index of /tesseract (uni-mannheim.de) >>>>>>>>>>>>>>> <https://digi.bib.uni-mannheim.de/tesseract/>). This >>>>>>>>>>>>>>> included training tools too. 
>>>>>>>>>>>>>>> I performed all the previous steps(boxfile, lstmf >>>>>>>>>>>>>>> file,unicharset) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But still after running tesstrain.sh command in Cygwin, I am >>>>>>>>>>>>>>> getting following error, >>>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts >>>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties >>>>>>>>>>>>>>> --langdata_dir >>>>>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir >>>>>>>>>>>>>>> "C:/Program >>>>>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir >>>>>>>>>>>>>>> D:/Bugs/1206806/folder/trainneddata >>>>>>>>>>>>>>> Creating new directory D:/Bugs/1206806/folder/trainneddata >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> === Starting training for language 'eng' >>>>>>>>>>>>>>> which: no text2image in >>>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/Microsoft >>>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files >>>>>>>>>>>>>>> (x86)/NVIDIA >>>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files >>>>>>>>>>>>>>> (x86)/Intel/Intel(R) >>>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program >>>>>>>>>>>>>>> Files/Intel/Intel(R) >>>>>>>>>>>>>>> Management Engine >>>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Files (x86)/Microsoft SQL >>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program 
>>>>>>>>>>>>>>> Files >>>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web >>>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL >>>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files >>>>>>>>>>>>>>> (x86)/Windows >>>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program >>>>>>>>>>>>>>> Files/Microsoft >>>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>> (x86)/Windows >>>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files >>>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program >>>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL >>>>>>>>>>>>>>> Server/Client >>>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Files (x86)/Microsoft SQL >>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>> Files >>>>>>>>>>>>>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>> Files/Microsoft >>>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>> Files >>>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>> Files >>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>> 
Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools) >>>>>>>>>>>>>>> which: no text2image in (./api) >>>>>>>>>>>>>>> which: no text2image in (./training) >>>>>>>>>>>>>>> ERROR: 'text2image' not found >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am I missing something? Can you please guild me? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Samruddhi >>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 5:59:49 PM UTC+5:30 Samruddhi >>>>>>>>>>>>>>> Dhake wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Can you please provide link for steps to install Tesseract >>>>>>>>>>>>>>>> and training tools on Windows? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Samruddhi >>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:42:48 PM UTC+5:30 >>>>>>>>>>>>>>>> Samruddhi Dhake wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> How to install tesseract and training tools on Windows? >>>>>>>>>>>>>>>>> Do I have to install Tesseract Windows exe? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Samruddhi >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:20:37 PM UTC+5:30 zdenop >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So there are only 2 possibilities: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 1. Install tesseract and training tools >>>>>>>>>>>>>>>>>> 2. Learn how to handle & use not installed sw. This >>>>>>>>>>>>>>>>>> option is not related to tesseract. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Zdenko >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ut 24. 8. 2021 o 9:17 Samruddhi Dhake <sam22...@gmail.com> >>>>>>>>>>>>>>>>>> napísal(a): >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I haven't installed Tesseract. I have kept in a folder >>>>>>>>>>>>>>>>>>> and I am running exe by giving its path. I have generated >>>>>>>>>>>>>>>>>>> training tools >>>>>>>>>>>>>>>>>>> through source code. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> To create box file, command->(I gave absoulute path of >>>>>>>>>>>>>>>>>>> tesseract.exe) >>>>>>>>>>>>>>>>>>> ..\tesseract.exe Dim4.tif Dim4 lstmbox >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> To create box file, command-> >>>>>>>>>>>>>>>>>>> tesseract.exe Dim4.tif Dim4 lstm.train >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> To create unicharset, command-> >>>>>>>>>>>>>>>>>>> unicharset_extractor.exe --output_unicharset >>>>>>>>>>>>>>>>>>> ..\own.unicharset ..\langdata\eng\eng.training_text >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> And to create trainned data, using tesstrain.sh command, >>>>>>>>>>>>>>>>>>> .\src\training\tesstrain.sh --fonts_dir C:\Windows\Fonts >>>>>>>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties >>>>>>>>>>>>>>>>>>> --langdata_dir >>>>>>>>>>>>>>>>>>> langdata --tessdata_dir tessdata --output_dir trainneddata >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>> Samruddhi >>>>>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 12:24:29 PM UTC+5:30 >>>>>>>>>>>>>>>>>>> Samruddhi Dhake wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I have generated training tools through source code. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Monday, August 23, 2021 at 7:09:02 PM UTC+5:30 >>>>>>>>>>>>>>>>>>>> zdenop wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> How did you install tesseract? Did you also install >>>>>>>>>>>>>>>>>>>>> training tools? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Zdenko >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> po 23. 8. 2021 o 15:34 Samruddhi Dhake < >>>>>>>>>>>>>>>>>>>>> sam22...@gmail.com> napísal(a): >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I am creating my own trainneddata using tesseract >>>>>>>>>>>>>>>>>>>>>> v4.1.1 on Windows 10. 
>>>>>>>>>>>>>>>>>>>>>> I am referring documentation >>>>>>>>>>>>>>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I have successfully created .box file and .lstmf file >>>>>>>>>>>>>>>>>>>>>> using lstmbox and lstm.train respectively. >>>>>>>>>>>>>>>>>>>>>> So next step, I installed Cygwin to run tesstrain.sh >>>>>>>>>>>>>>>>>>>>>> command to create training data. >>>>>>>>>>>>>>>>>>>>>> But I am getting below error. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir >>>>>>>>>>>>>>>>>>>>>> C:/Windows/Fonts --lang eng --linedata_only >>>>>>>>>>>>>>>>>>>>>> --noextract_font_properties >>>>>>>>>>>>>>>>>>>>>> --langdata_dir ./langdata --tessdata_dir ./tessdata >>>>>>>>>>>>>>>>>>>>>> --output_dir >>>>>>>>>>>>>>>>>>>>>> ./trainneddata >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> === Starting training for language 'eng' >>>>>>>>>>>>>>>>>>>>>> which: no text2image in >>>>>>>>>>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files/Microsoft >>>>>>>>>>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/NVIDIA >>>>>>>>>>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Intel/Intel(R) >>>>>>>>>>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files/Intel/Intel(R) >>>>>>>>>>>>>>>>>>>>>> Management Engine >>>>>>>>>>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/100/DTS/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files/Microsoft/Web Platform >>>>>>>>>>>>>>>>>>>>>> Installer:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web >>>>>>>>>>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Windows >>>>>>>>>>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files/Microsoft >>>>>>>>>>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Windows >>>>>>>>>>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 >>>>>>>>>>>>>>>>>>>>>> 6.0.20/bin:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/Client >>>>>>>>>>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program >>>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> 
Server/150/DTS/Binn:/cygdrive/c/Program Files/Microsoft >>>>>>>>>>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/Client >>>>>>>>>>>>>>>>>>>>>> SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/140/DTS/Binn:/cygdrive/c/Program Files >>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL >>>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools) >>>>>>>>>>>>>>>>>>>>>> which: no text2image in (./api) >>>>>>>>>>>>>>>>>>>>>> which: no text2image in (./training) >>>>>>>>>>>>>>>>>>>>>> ERROR: 'text2image' not found >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I found text2image comes after running command 'make >>>>>>>>>>>>>>>>>>>>>> training'. >>>>>>>>>>>>>>>>>>>>>> Can you please help me how this can be done in >>>>>>>>>>>>>>>>>>>>>> WIndows 10? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>> Samruddhi >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed >>>>>>>>>>>>>>>>>>>>>> to the Google Groups "tesseract-ocr" group. >>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving >>>>>>>>>>>>>>>>>>>>>> emails from it, send an email to >>>>>>>>>>>>>>>>>>>>>> tesseract-oc...@googlegroups.com. 
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/c8140d82-1cf1-410b-af44-747d11ffef1fn%40googlegroups.com.