First of all: quote multi-word names, or escape spaces and special characters (e.g. --font="Arial Bold").
Next: fix the error "Unable to open '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing".
Next: check which fonts are available to text2image with the --list_available_fonts option, etc.
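To illustrate the quoting advice, here is a minimal sketch of how a POSIX shell splits an unquoted multi-word font name into separate arguments. The helper count_args is hypothetical and exists only for this demonstration; it is not part of Tesseract or tesstrain.sh, and the sketch says nothing about how tesstrain.sh itself assembles the text2image command line.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
  echo "$#"
}

font="Arial Bold"

# Unquoted expansion: the shell splits the value at the space,
# so the program would see "--font=Arial" and "Bold" as two arguments.
unquoted=$(count_args --font=$font)

# Quoted expansion: the whole name stays in a single argument.
quoted=$(count_args --font="$font")

# Backslash-escaped space: also a single argument.
escaped=$(count_args --font=Arial\ Bold)

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
echo "escaped:  $escaped argument(s)"
```

Applied to text2image, the quoted form corresponds to --font="Arial Bold" as written above; the exact font names that text2image accepts can be checked with --list_available_fonts.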
PS: I would suggest using Linux for training instead of Windows (e.g. in WSL [1]).

[1] https://docs.microsoft.com/en-us/windows/wsl/install-win10

Zdenko

On Mon, 30 Aug 2021 at 12:12, Samruddhi Dhake <sam22dh...@gmail.com> wrote:

> Hi,
>
> Text2Image error is gone. I am getting *font-config error*.
>
> SDE26@DTP-SDE26-IND /cygdrive/c/Program Files/Tesseract-OCR
> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang eng
> --linedata_only --noextract_font_properties --langdata_dir "C:/Program
> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program
> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
> Creating new directory D:Testtrainneddata
>
> === Starting training for language 'eng'
> [Mon Aug 30 15:34:53 IST 2021] /cygdrive/c/Program
> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts --ptsize 12
> --font=Arial Bold --outputbase=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
> --text=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
> --fontconfig_tmpdir=/tmp/font_tmp.hbC9F3LEQX
> Unable to open '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing
> Fontconfig error: Cannot load default config file
> Could not find font named 'Arial Bold'.
> Please correct --font arg.
> ERROR: Program Program failed. Abort.
>
>
> I have the Arial Bold font on my machine. I don't know why it cannot be
> found. Also, the /tmp/ folder contains no font_tmp.hbC9F3LEQX directory,
> which is where fonts.conf could not be opened for writing.
> How can I resolve this?
>
> Regards,
> Samruddhi
>
> On Wednesday, August 25, 2021 at 8:18:47 PM UTC+5:30 zdenop wrote:
>
>> Honestly, I have no clue what you are doing: text2image is at the same
>> location as the tesseract executable. So if you have tesseract in the path,
>> text2image must work too.
>>
>>
>> Zdenko
>>
>>
>> On Wed, 25 Aug 2021 at 16:26, Samruddhi Dhake <sam22...@gmail.com> wrote:
>>
>>> As you suggested, I installed Tesseract v5.0.0 on my Windows machine (Index
>>> of /tesseract (uni-mannheim.de)
>>> <https://digi.bib.uni-mannheim.de/tesseract/>). This included the training
>>> tools too.
>>> I performed all the previous steps (box file, lstmf file, unicharset).
>>>
>>> But after running the tesstrain.sh command in Cygwin, I am still getting
>>> the following error:
>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang eng
>>> --linedata_only --noextract_font_properties --langdata_dir "C:/Program
>>> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program
>>> Files/Tesseract-OCR/tessdata" --output_dir
>>> D:/Bugs/1206806/folder/trainneddata
>>> Creating new directory D:/Bugs/1206806/folder/trainneddata
>>>
>>> === Starting training for language 'eng'
>>> which: no text2image in (/usr/local/bin:/usr/bin:/cygdrive/c/Program
>>> Files/Microsoft MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files
>>> (x86)/NVIDIA Corporation/PhysX/Common:/cygdrive/c/Program Files
>>> (x86)/Intel/Intel(R) Management Engine Components/iCLS:/cygdrive/c/Program
>>> Files/Intel/Intel(R) Management Engine
>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>> Files (x86)/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program
>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files
>>> (x86)/Microsoft ASP.NET/ASP.NET Web Pages/v1.0:/cygdrive/c/Program
>>> Files/Microsoft SQL
>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files/Microsoft
>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files (x86)/Windows
>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files
>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program
>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL Server/Client
>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files
>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>> Files (x86)/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program
>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program Files
>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program Files/Microsoft
>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files (x86)/Microsoft SQL
>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files
>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program Files
>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program Files
>>> (x86)/Microsoft SQL
>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>> which: no text2image in (./api)
>>> which: no text2image in (./training)
>>> ERROR: 'text2image' not found
>>>
>>> Am I missing something? Can you please guide me?
>>>
>>> Regards,
>>> Samruddhi
>>> On Tuesday, August 24, 2021 at 5:59:49 PM UTC+5:30 Samruddhi Dhake wrote:
>>>
>>>>
>>>> Can you please provide a link to the steps for installing Tesseract and
>>>> the training tools on Windows?
>>>>
>>>> Samruddhi
>>>> On Tuesday, August 24, 2021 at 3:42:48 PM UTC+5:30 Samruddhi Dhake
>>>> wrote:
>>>>
>>>>> How do I install tesseract and the training tools on Windows?
>>>>> Do I have to install the Tesseract Windows exe?
>>>>>
>>>>> Samruddhi
>>>>>
>>>>> On Tuesday, August 24, 2021 at 3:20:37 PM UTC+5:30 zdenop wrote:
>>>>>
>>>>>> So there are only 2 possibilities:
>>>>>>
>>>>>> 1. Install tesseract and the training tools.
>>>>>> 2. Learn how to handle and use software that is not installed. This
>>>>>> option is not related to tesseract.
>>>>>>
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Tue, 24 Aug 2021 at 9:17, Samruddhi Dhake <sam22...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I haven't installed Tesseract. I have kept it in a folder and I am
>>>>>>> running the exe by giving its path. I have generated the training
>>>>>>> tools from source code.
>>>>>>>
>>>>>>> To create the box file, command (I gave the absolute path of tesseract.exe):
>>>>>>> ..\tesseract.exe Dim4.tif Dim4 lstmbox
>>>>>>>
>>>>>>> To create the lstmf file, command:
>>>>>>> tesseract.exe Dim4.tif Dim4 lstm.train
>>>>>>>
>>>>>>> To create the unicharset, command:
>>>>>>> unicharset_extractor.exe --output_unicharset ..\own.unicharset
>>>>>>> ..\langdata\eng\eng.training_text
>>>>>>>
>>>>>>> And to create the trained data, using the tesstrain.sh command:
>>>>>>> .\src\training\tesstrain.sh --fonts_dir C:\Windows\Fonts --lang eng
>>>>>>> --linedata_only --noextract_font_properties --langdata_dir langdata
>>>>>>> --tessdata_dir tessdata --output_dir trainneddata
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Samruddhi
>>>>>>> On Tuesday, August 24, 2021 at 12:24:29 PM UTC+5:30 Samruddhi Dhake
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I have generated the training tools from source code.
>>>>>>>>
>>>>>>>> On Monday, August 23, 2021 at 7:09:02 PM UTC+5:30 zdenop wrote:
>>>>>>>>
>>>>>>>>> How did you install tesseract? Did you also install training tools?
>>>>>>>>>
>>>>>>>>> Zdenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 23 Aug 2021 at 15:34, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am creating my own traineddata using tesseract v4.1.1 on
>>>>>>>>>> Windows 10.
>>>>>>>>>> I am following the documentation at
>>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
>>>>>>>>>>
>>>>>>>>>> I have successfully created the .box file and the .lstmf file
>>>>>>>>>> using lstmbox and lstm.train respectively.
>>>>>>>>>> As the next step, I installed Cygwin to run the tesstrain.sh
>>>>>>>>>> command to create training data.
>>>>>>>>>> But I am getting the error below.
>>>>>>>>>>
>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang
>>>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir
>>>>>>>>>> ./langdata --tessdata_dir ./tessdata --output_dir ./trainneddata
>>>>>>>>>>
>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>> which: no text2image in
>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/Microsoft
>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files (x86)/NVIDIA
>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files
>>>>>>>>>> (x86)/Intel/Intel(R) Management Engine Components/iCLS:/cygdrive/c/Program
>>>>>>>>>> Files/Intel/Intel(R) Management Engine
>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>> Files (x86)/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files
>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web
>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL
>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program
>>>>>>>>>> Files/Microsoft SQL Server/130/Tools/Binn:/cygdrive/c/Program Files (x86)/Windows
>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files
>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program
>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL Server/Client
>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>> Files (x86)/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>> Files/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program Files (x86)/Microsoft SQL
>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program Files
>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>>>>>>> which: no text2image in (./api)
>>>>>>>>>> which: no text2image in (./training)
>>>>>>>>>> ERROR: 'text2image' not found
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I found that text2image is built by running the command 'make training'.
>>>>>>>>>> Can you please help me with how this can be done on Windows 10?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Samruddhi
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com
>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>> .
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8y2SGCmjpnk_w8M6HM95mBDLUOiaNgOqKFyZCVRLQOQug%40mail.gmail.com.