Try running this:
"C:\Program Files\Tesseract-OCR\text2image.exe" --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp --list_available_fonts
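The command above uses cmd.exe syntax (%WINDIR%). The error chased in this thread, "Unable to open '.../fonts.conf' for writing", means text2image could not create fonts.conf in the directory passed as --fontconfig_tmpdir. Below is a minimal bash sketch for checking a candidate directory before using it. `check_fontconfig_tmpdir` is a hypothetical name, not a Tesseract tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Tesseract): make sure DIR exists and that a
# new file can be created inside it. text2image needs this for the fonts.conf
# it writes into its --fontconfig_tmpdir.
check_fontconfig_tmpdir() {
  local dir=$1
  local probe="$dir/.write_probe.$$"
  mkdir -p -- "$dir" || return 1
  if { : > "$probe"; } 2>/dev/null; then
    rm -f -- "$probe"   # remove the probe file again
    return 0
  fi
  echo "cannot create files in $dir" >&2
  return 1
}

# Example: check a fresh scratch directory before handing it to text2image.
scratch=$(mktemp -d)
if check_fontconfig_tmpdir "$scratch"; then
  echo "OK to use --fontconfig_tmpdir=$scratch"
fi
```

One caveat, stated as an assumption rather than something confirmed in this thread: a native Windows text2image.exe does not understand Cygwin-only paths such as /tmp/... In that case, `cygpath -w "$scratch"` prints the Windows form of the path, which can be passed instead.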
Zdenko

On Mon, 30 Aug 2021 at 16:45, Samruddhi Dhake <sam22dh...@gmail.com> wrote:
> Hi,
>
> I am running this command:
>
> ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang eng
> --linedata_only --noextract_font_properties --langdata_dir "C:/Program
> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program
> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>
> After pressing Enter, it starts processing and prints:
> === *Starting training for language 'eng'*
> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program
> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ --ptsize 12
> --font=Arial Bold --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS*
> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing*
> *Fontconfig error: Cannot load default config file*
> *Could not find font named 'Arial Bold'.*
> *Please correct --font arg.*
> *ERROR: Program Program failed. Abort.*
>
>
> Let me break this output down to ask a few questions.
>
> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program
> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ --ptsize 12
> --font=Arial Bold --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS*
> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing*
> ----> I am not passing Arial Bold as input myself. For --outputbase, the
> temp folder 'font_tmp.s9cdSHrzKS' should be created, but it is not, and the
> same applies to --fontconfig_tmpdir. That is why I get the write error.
>
> *Fontconfig error: Cannot load default config file*
> ----> To resolve this error, I added FONTCONFIG_FILE=%WINDIR%\fonts.conf
> to the environment variables (referring to
> https://forums.wesnoth.org/viewtopic.php?t=22821),
> but it is still not resolved.
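About the command itself: `--output_dir D:\Test\trainneddata` is typed unquoted in Cygwin's bash, where a backslash escapes the next character and is then dropped. That matches the "Creating new directory D:Testtrainneddata" line printed by the earlier run quoted further down. The run that used D:/Bugs/1206806/folder/trainneddata created the directory as typed. A plain-bash demonstration (no Tesseract needed):

```shell
#!/usr/bin/env bash
# Unquoted: bash treats each backslash as an escape and drops it.
unquoted=$(printf '%s\n' D:\Test\trainneddata)
# Single quotes keep the backslashes literally.
quoted=$(printf '%s\n' 'D:\Test\trainneddata')
# Forward slashes need no quoting at all.
forward=$(printf '%s\n' D:/Test/trainneddata)

echo "$unquoted"   # D:Testtrainneddata
echo "$quoted"     # D:\Test\trainneddata
echo "$forward"    # D:/Test/trainneddata
```

In Cygwin, either use forward slashes or quote the Windows path.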
>
> I was checking with *text2image.exe --list_available_fonts*.
> After pressing Enter, I got: Fontconfig warning: "/tmp\fonts.conf",
> line 4: empty font directory name ignored
>
> The contents of the fonts.conf file that gets created are:
> <?xml version="1.0"?>
> <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
> <fontconfig>
> <dir></dir>
> <cachedir>/tmp</cachedir>
> <config></config>
> </fontconfig>
>
> Can you please help me resolve this? Or is something wrong with my
> tesstrain.sh command and its arguments?
>
> Regards,
> Samruddhi
> On Monday, August 30, 2021 at 5:12:21 PM UTC+5:30 zdenop wrote:
>
>> First of all: use quotes for multi-word names, or escape spaces/special
>> symbols (e.g. --font="Arial Bold").
>> Next: fix the cause of the error message "Unable to open
>> '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing".
>> Next: check which fonts are available to text2image with the option
>> --list_available_fonts.
>> etc.
>>
>> PS: I would suggest using Linux for training instead of Windows (e.g. in
>> WSL[1]).
>> [1] https://docs.microsoft.com/en-us/windows/wsl/install-win10
>>
>> Zdenko
>>
>>
>>
>> On Mon, 30 Aug 2021 at 12:12, Samruddhi Dhake <sam22...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> The text2image error is gone. Now I am getting a *fontconfig error*.
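Zdenko's first point above (quote or escape multi-word names) comes down to shell word splitting. Here is a small bash sketch. `count_args` is a hypothetical helper that only reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments the shell passed in.
count_args() { echo "$#"; }

count_args --font=Arial Bold      # 2: "--font=Arial" and "Bold"
count_args --font="Arial Bold"    # 1: quotes keep the name in one argument
count_args --font=Arial\ Bold     # 1: an escaped space has the same effect

# A name stored in a variable must be quoted when it is expanded, too.
font="Arial Bold"
count_args --font=$font           # 2: the expansion is split on the space
count_args "--font=$font"         # 1
```

In the logs in this thread, the message "Could not find font named 'Arial Bold'" shows the full name. That suggests text2image did receive it as one argument, and that the failed lookup is more likely tied to the fontconfig errors printed just before it. This is an inference, not something confirmed in the thread.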
>>>
>>> SDE26@DTP-SDE26-IND /cygdrive/c/Program Files/Tesseract-OCR
>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang eng
>>> --linedata_only --noextract_font_properties --langdata_dir "C:/Program
>>> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program
>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>> Creating new directory D:Testtrainneddata
>>>
>>> === Starting training for language 'eng'
>>> [Mon Aug 30 15:34:53 IST 2021] /cygdrive/c/Program
>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts --ptsize 12
>>> --font=Arial Bold --outputbase=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
>>> --text=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
>>> --fontconfig_tmpdir=/tmp/font_tmp.hbC9F3LEQX
>>> Unable to open '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing
>>> Fontconfig error: Cannot load default config file
>>> Could not find font named 'Arial Bold'.
>>> Please correct --font arg.
>>> ERROR: Program Program failed. Abort.
>>>
>>>
>>> I have the Arial Bold font on my machine, so I don't know why it cannot
>>> be found. Also, there is no font_tmp.hbC9F3LEQX folder in /tmp/, which is
>>> where fonts.conf cannot be opened for writing.
>>> How can I resolve this?
>>>
>>> Regards,
>>> Samruddhi
>>>
>>> On Wednesday, August 25, 2021 at 8:18:47 PM UTC+5:30 zdenop wrote:
>>>
>>>> Honestly, I have no clue what you are doing: text2image is in the same
>>>> location as the tesseract executable, so if you have tesseract in your
>>>> PATH, text2image must work too.
>>>>
>>>>
>>>> [image: image.png]
>>>>
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Wed, 25 Aug 2021 at 16:26, Samruddhi Dhake <sam22...@gmail.com> wrote:
>>>>
>>>>> As you suggested, I installed Tesseract v5.0.0 on my Windows machine
>>>>> (Index of /tesseract (uni-mannheim.de)
>>>>> <https://digi.bib.uni-mannheim.de/tesseract/>). This included the
>>>>> training tools too.
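Zdenko's reply above points out that text2image is installed in the same directory as tesseract, so once that directory is on PATH, both should be found. The sketch below checks this from Cygwin's bash before starting tesstrain.sh. The directory is the one that appears in this thread's logs, and `missing_tools` is a hypothetical helper. It uses `command -v` instead of `which`. The logs show Cygwin resolving `text2image` without the .exe suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each tool name that cannot be found on PATH and
# return non-zero if any are missing.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Install directory as seen in the logs of this thread; adjust as needed.
tess_dir="/cygdrive/c/Program Files/Tesseract-OCR"
PATH="$tess_dir:$PATH"

if ! missing_tools tesseract text2image unicharset_extractor; then
  echo "add the Tesseract install directory to PATH first" >&2
fi
```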
>>>>> I performed all the previous steps (box file, lstmf file, unicharset).
>>>>>
>>>>> But after running the tesstrain.sh command in Cygwin, I still get the
>>>>> following error:
>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang eng
>>>>> --linedata_only --noextract_font_properties --langdata_dir "C:/Program
>>>>> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program
>>>>> Files/Tesseract-OCR/tessdata" --output_dir
>>>>> D:/Bugs/1206806/folder/trainneddata
>>>>> Creating new directory D:/Bugs/1206806/folder/trainneddata
>>>>>
>>>>> === Starting training for language 'eng'
>>>>> which: no text2image in (/usr/local/bin:/usr/bin:/cygdrive/c/Program
>>>>> Files/Microsoft MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files
>>>>> (x86)/NVIDIA Corporation/PhysX/Common:/cygdrive/c/Program Files
>>>>> (x86)/Intel/Intel(R) Management Engine Components/iCLS:/cygdrive/c/Program
>>>>> Files/Intel/Intel(R) Management Engine
>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>> Files (x86)/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program
>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files
>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web Pages/v1.0:/cygdrive/c/Program
>>>>> Files/Microsoft SQL
>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files/Microsoft
>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files (x86)/Windows
>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files
>>>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program
>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL Server/Client
>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files
>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>> Files (x86)/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program
>>>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program Files
>>>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program
>>>>> Files/Microsoft
>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files (x86)/Microsoft SQL
>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files
>>>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program Files
>>>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program Files
>>>>> (x86)/Microsoft SQL
>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>> which: no text2image in (./api)
>>>>> which: no text2image in (./training)
>>>>> ERROR: 'text2image' not found
>>>>>
>>>>> Am I missing something? Can you please guide me?
>>>>>
>>>>> Regards,
>>>>> Samruddhi
>>>>> On Tuesday, August 24, 2021 at 5:59:49 PM UTC+5:30 Samruddhi Dhake
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Can you please provide a link to the steps for installing Tesseract and
>>>>>> the training tools on Windows?
>>>>>>
>>>>>> Samruddhi
>>>>>> On Tuesday, August 24, 2021 at 3:42:48 PM UTC+5:30 Samruddhi Dhake
>>>>>> wrote:
>>>>>>
>>>>>>> How do I install Tesseract and the training tools on Windows?
>>>>>>> Do I have to install the Tesseract Windows exe?
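The per-image commands listed further down have to be repeated for every training image. They run tesseract twice: once with the lstmbox config for the .box file, and once with lstm.train for the .lstmf file. Below is a sketch that loops over the images. `make_lstm_inputs` and the DRY_RUN switch are hypothetical conveniences, and the sketch assumes tesseract is on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each .tif image, run the two tesseract steps used in
# this thread (lstmbox -> .box file, lstm.train -> .lstmf file).
# With DRY_RUN set to a non-empty value, the commands are only printed.
make_lstm_inputs() {
  local tif base
  for tif in "$@"; do
    base=${tif%.tif}   # Dim4.tif -> Dim4 (output base name)
    ${DRY_RUN:+echo} tesseract "$tif" "$base" lstmbox || return 1
    ${DRY_RUN:+echo} tesseract "$tif" "$base" lstm.train || return 1
  done
}

# Preview the commands first; drop DRY_RUN=1 to actually run them.
DRY_RUN=1 make_lstm_inputs Dim4.tif
```

Without DRY_RUN, the loop should produce the same .box and .lstmf files as the manual steps, one pair per image.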
>>>>>>>
>>>>>>> Samruddhi
>>>>>>>
>>>>>>> On Tuesday, August 24, 2021 at 3:20:37 PM UTC+5:30 zdenop wrote:
>>>>>>>
>>>>>>>> So there are only 2 possibilities:
>>>>>>>>
>>>>>>>> 1. Install tesseract and the training tools.
>>>>>>>> 2. Learn how to handle and use software that is not installed. This
>>>>>>>> option is not related to tesseract.
>>>>>>>>
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 24 Aug 2021 at 9:17, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I haven't installed Tesseract. I keep it in a folder and run the
>>>>>>>>> exe by giving its path. I built the training tools from the source
>>>>>>>>> code.
>>>>>>>>>
>>>>>>>>> To create the .box file (I gave the absolute path of
>>>>>>>>> tesseract.exe):
>>>>>>>>> ..\tesseract.exe Dim4.tif Dim4 lstmbox
>>>>>>>>>
>>>>>>>>> To create the .lstmf file:
>>>>>>>>> tesseract.exe Dim4.tif Dim4 lstm.train
>>>>>>>>>
>>>>>>>>> To create the unicharset:
>>>>>>>>> unicharset_extractor.exe --output_unicharset ..\own.unicharset
>>>>>>>>> ..\langdata\eng\eng.training_text
>>>>>>>>>
>>>>>>>>> And to create the traineddata, using tesstrain.sh:
>>>>>>>>> .\src\training\tesstrain.sh --fonts_dir C:\Windows\Fonts --lang
>>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir
>>>>>>>>> langdata
>>>>>>>>> --tessdata_dir tessdata --output_dir trainneddata
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Samruddhi
>>>>>>>>> On Tuesday, August 24, 2021 at 12:24:29 PM UTC+5:30 Samruddhi
>>>>>>>>> Dhake wrote:
>>>>>>>>>
>>>>>>>>>> I built the training tools from the source code.
>>>>>>>>>>
>>>>>>>>>> On Monday, August 23, 2021 at 7:09:02 PM UTC+5:30 zdenop wrote:
>>>>>>>>>>
>>>>>>>>>>> How did you install tesseract? Did you also install the training
>>>>>>>>>>> tools?
>>>>>>>>>>>
>>>>>>>>>>> Zdenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 23 Aug 2021 at 15:34, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I am creating my own traineddata using tesseract v4.1.1 on
>>>>>>>>>>>> Windows 10.
>>>>>>>>>>>> I am following the documentation at
>>>>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
>>>>>>>>>>>>
>>>>>>>>>>>> I have successfully created the .box file and the .lstmf file
>>>>>>>>>>>> using lstmbox and lstm.train respectively.
>>>>>>>>>>>> As the next step, I installed Cygwin to run the tesstrain.sh
>>>>>>>>>>>> command to create the training data.
>>>>>>>>>>>> But I am getting the error below.
>>>>>>>>>>>>
>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts
>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties
>>>>>>>>>>>> --langdata_dir
>>>>>>>>>>>> ./langdata --tessdata_dir ./tessdata --output_dir ./trainneddata
>>>>>>>>>>>>
>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>> which: no text2image in
>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/Microsoft
>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/NVIDIA
>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/Intel/Intel(R)
>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program
>>>>>>>>>>>> Files/Intel/Intel(R)
>>>>>>>>>>>> Management Engine
>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>> Files (x86)/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web
>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL
>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program
>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files (x86)/Windows
>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program
>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL Server/Client
>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>> Files (x86)/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files (x86)/Microsoft
>>>>>>>>>>>> SQL
>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>>>>>>>>> which: no text2image in (./api)
>>>>>>>>>>>> which: no text2image in (./training)
>>>>>>>>>>>> ERROR: 'text2image' not found
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I found that text2image is built by running 'make training'.
>>>>>>>>>>>> Can you please help me with how this can be done on Windows 10?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com
>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>> .