Check this
https://github.com/tesseract-ocr/tesseract/issues/1685

On Wed, 1 Sep 2021 at 6:22 PM, Samruddhi Dhake <sam22dh...@gmail.com> wrote:

> For images.
> I have to create my own traineddata for my images. To do that, I am
> following the steps in this documentation:
> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
> As per the steps, I have created the box file, lstmf file, and unicharset
> file. The next step is to create the training data using tesstrain.sh,
> followed by lstmtraining.exe.
> I am getting these errors at the tesstrain.sh step.
>
> On Wednesday, September 1, 2021 at 6:11:27 PM UTC+5:30 P007 wrote:
>
>> I mean, are you working with fonts only,
>> or with images?
>>
>> On Wed, 1 Sep 2021 at 6:09 PM, Samruddhi Dhake <sam22...@gmail.com>
>> wrote:
>>
>>> Yes, I am working with the eng language.
>>> I am using tessdata (C:\Program Files\Tesseract-OCR\tessdata).
>>>
>>> On Wednesday, September 1, 2021 at 5:57:24 PM UTC+5:30 P007 wrote:
>>>
>>>> Okay,
>>>>
>>>> Wait, you are working with the English language, right?
>>>> What kind of dataset did you use here?
>>>>
>>>> On Wed, 1 Sep 2021 at 5:53 PM, Samruddhi Dhake <sam22...@gmail.com>
>>>> wrote:
>>>>
>>>>> No, tesstrain.sh didn't work. I am running tesstrain.sh on Cygwin.
>>>>> Command->
>>>>> *$ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang eng
>>>>> --linedata_only --noextract_font_properties --langdata_dir 'C:/Program
>>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program
>>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata --fontlist
>>>>> 'Arial'*
>>>>>
>>>>> After hitting enter for tesstrain.sh, it runs text2image and
>>>>> gives the following error:
>>>>> === Starting training for language 'eng'
>>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program
>>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12
>>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I
>>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing
>>>>> Fontconfig error: Cannot load default config file
>>>>> Could not find font named 'Arial'.
>>>>> Please correct --font arg.
>>>>> ERROR: Program Program failed. Abort.
>>>>>
>>>>> As per previous suggestions, I ran the text2image.exe command in cmd;
>>>>> it works and lists all available fonts.
>>>>>
>>>>> So why does the text2image command fail when run from tesstrain.sh? It
>>>>> does not create the temp folder under /tmp/, and I get the fonts.conf
>>>>> error.
>>>>> I expected the fonts.conf file to be written in the temp folder
>>>>> (font_tmp.0doGBqWc3I in my case) and to include the font 'Arial', so
>>>>> that Arial can be found.
>>>>> I don't know why it is not being created.
>>>>>
>>>>> Regards,
>>>>> Samruddhi
>>>>>
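A possible explanation for the missing temp folder, offered as a guess rather than a confirmed diagnosis: tesstrain.sh creates /tmp/font_tmp.XXXX inside Cygwin's file tree, but the installed text2image.exe is a native Windows program and may resolve /tmp differently (for example relative to the current drive). Cygwin ships cygpath to translate between the two views. The sketch below uses a hypothetical helper, to_native_path, which returns the path unchanged where cygpath is not installed.

```shell
# to_native_path is a hypothetical helper for this example.
to_native_path() {
    if command -v cygpath >/dev/null 2>&1; then
        # -m prints a Windows-style path with forward slashes.
        cygpath -m "$1"
    else
        # Not on Cygwin: nothing to translate.
        printf '%s\n' "$1"
    fi
}

tmp_dir=$(mktemp -d)                        # temp directory as the shell sees it
native_tmp_dir=$(to_native_path "$tmp_dir") # same directory as a Windows exe sees it

echo "POSIX view:  $tmp_dir"
echo "Native view: $native_tmp_dir"

rmdir "$tmp_dir"
```

Running text2image.exe by hand with --fontconfig_tmpdir set to the converted directory is one way to test whether path translation is the culprit.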
>>>>> On Wednesday, September 1, 2021 at 5:31:10 PM UTC+5:30 P007 wrote:
>>>>>
>>>>>>
>>>>>> Did tesstrain.sh work for you?
>>>>>>
>>>>>> On Wed, 1 Sep 2021 at 5:09 PM, Samruddhi Dhake <sam22...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> text2image has an argument, --fontconfig_tmpdir, which names the
>>>>>>> temp folder where fonts.conf gets written.
>>>>>>>
>>>>>>> I checked /tmp/; no temp folder (font_tmp.0doGBqWc3I) was created.
>>>>>>>
>>>>>>> Has anybody had this issue?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Samruddhi
>>>>>>>
>>>>>>> On Tuesday, August 31, 2021 at 7:24:46 PM UTC+5:30 Samruddhi Dhake
>>>>>>> wrote:
>>>>>>>
>>>>>>>> >"C:\Program Files\Tesseract-OCR\text2image.exe"
>>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp 
>>>>>>>> --list_available_fonts
>>>>>>>> This worked. I got the list of available fonts, which contains Arial
>>>>>>>> and Arial Bold too.
>>>>>>>>
>>>>>>>> This time, in Cygwin Bash, I tried giving --fontlist 'Arial' to
>>>>>>>> tesstrain.sh
>>>>>>>> Command->
>>>>>>>> *$ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang
>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir 
>>>>>>>> 'C:/Program
>>>>>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program
>>>>>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata 
>>>>>>>> --fontlist
>>>>>>>> 'Arial'*
>>>>>>>>
>>>>>>>> === Starting training for language 'eng'
>>>>>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program
>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12
>>>>>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I
>>>>>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing
>>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>>> Could not find font named 'Arial'.
>>>>>>>> Please correct --font arg.
>>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>>
>>>>>>>> I am still getting this fonts.conf error. Any idea how to resolve
>>>>>>>> it?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Samruddhi
>>>>>>>>
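One thing that may be worth checking in the command above: %WINDIR% is cmd.exe syntax, and a POSIX shell does not expand it, which matches the log line showing --fonts_dir=%WINDIR%/Fonts/ passed through unchanged. A minimal sketch of the difference, assuming the Windows variable is visible to the shell as WINDIR (the C:/Windows fallback is only a placeholder for this example):

```shell
# Fallback value is an assumption for illustration only.
WINDIR="${WINDIR:-C:/Windows}"

# cmd.exe syntax: the shell treats the percent signs as ordinary characters.
literal_dir=%WINDIR%/Fonts

# Shell syntax: the variable is expanded before the program sees it.
# Any backslashes in the Windows value are turned into forward slashes.
expanded_dir="$(printf '%s' "$WINDIR" | tr '\\' '/')/Fonts"

printf 'cmd-style:   %s\n' "$literal_dir"
printf 'shell-style: %s\n' "$expanded_dir"
```

Spelling the directory out, as in the earlier C:/Windows/Fonts commands, avoids the question entirely.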
>>>>>>>> On Tuesday, August 31, 2021 at 4:50:14 PM UTC+5:30 zdenop wrote:
>>>>>>>>
>>>>>>>>> Try running this:
>>>>>>>>> "C:\Program Files\Tesseract-OCR\text2image.exe"
>>>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp 
>>>>>>>>> --list_available_fonts
>>>>>>>>>
>>>>>>>>> Zdenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 30 Aug 2021 at 16:45, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am running command ->
>>>>>>>>>>
>>>>>>>>>> ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang
>>>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir 
>>>>>>>>>> "C:/Program
>>>>>>>>>> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program
>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>>>>>>>>>
>>>>>>>>>> And after hitting enter -> (processing)
>>>>>>>>>> === *Starting training for language 'eng'*
>>>>>>>>>> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program
>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ 
>>>>>>>>>> --ptsize 12
>>>>>>>>>> --font=Arial Bold 
>>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS*
>>>>>>>>>> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing*
>>>>>>>>>> *Fontconfig error: Cannot load default config file*
>>>>>>>>>> *Could not find font named 'Arial Bold'.*
>>>>>>>>>> *Please correct --font arg.*
>>>>>>>>>> *ERROR: Program Program failed. Abort.*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I will break it down to ask a few questions.
>>>>>>>>>>
>>>>>>>>>> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program
>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ 
>>>>>>>>>> --ptsize 12
>>>>>>>>>> --font=Arial Bold 
>>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS*
>>>>>>>>>> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing*
>>>>>>>>>> ----> Here, I am not giving Arial Bold as input. The outputbase
>>>>>>>>>> should create the temp folder 'font_tmp.s9cdSHrzKS', but it is not
>>>>>>>>>> created, and neither is the fontconfig_tmpdir. So it gives the
>>>>>>>>>> writing error.
>>>>>>>>>>
>>>>>>>>>> *Fontconfig error: Cannot load default config file*
>>>>>>>>>> ----> To resolve this error, I added
>>>>>>>>>> FONTCONFIG_FILE=%WINDIR%\fonts.conf to the environment variables
>>>>>>>>>> (referring to https://forums.wesnoth.org/viewtopic.php?t=22821),
>>>>>>>>>> but it is still not resolved.
>>>>>>>>>>
>>>>>>>>>> I also checked -> *text2image.exe --list_available_fonts*
>>>>>>>>>> After hitting enter, I got -> Fontconfig warning:
>>>>>>>>>> "/tmp\fonts.conf", line 4: empty font directory name ignored
>>>>>>>>>>
>>>>>>>>>> The contents of the fonts.conf file which gets created are->
>>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>>> <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
>>>>>>>>>> <fontconfig>
>>>>>>>>>> <dir></dir>
>>>>>>>>>> <cachedir>/tmp</cachedir>
>>>>>>>>>> <config></config>
>>>>>>>>>> </fontconfig>
>>>>>>>>>>
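The empty <dir></dir> element on line 4 is what the warning refers to, presumably because no --fonts_dir was given in that invocation. As a rough illustration only (the real file is generated by text2image; the helper name write_fonts_conf and the example font path are made up here), this is how the same file would look with a directory filled in:

```shell
# write_fonts_conf is a hypothetical helper. Arguments: FONTS_DIR CACHE_DIR
write_fonts_conf() {
    cat > "$2/fonts.conf" <<EOF
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
<dir>$1</dir>
<cachedir>$2</cachedir>
<config></config>
</fontconfig>
EOF
}

work_dir=$(mktemp -d)    # stand-in for the --fontconfig_tmpdir directory
write_fonts_conf "C:/Windows/Fonts" "$work_dir"
cat "$work_dir/fonts.conf"
```

Comparing a generated fonts.conf against this shape shows quickly whether the font directory reached text2image at all.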
>>>>>>>>>> Can you please help me resolve this? Or am I giving the
>>>>>>>>>> correct tesstrain.sh command and arguments?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Samruddhi
>>>>>>>>>> On Monday, August 30, 2021 at 5:12:21 PM UTC+5:30 zdenop wrote:
>>>>>>>>>>
>>>>>>>>>>> First of all: use quotes for multi-word names, or escape
>>>>>>>>>>> spaces/special symbols (e.g. --font="Arial Bold").
>>>>>>>>>>> Next: fix the error message: "Unable to open
>>>>>>>>>>> '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing"
>>>>>>>>>>> Next: check the available fonts for text2image with the option
>>>>>>>>>>> --list_available_fonts
>>>>>>>>>>> etc...
>>>>>>>>>>>
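The quoting advice above can be checked directly in the shell. In this small sketch, count_args is a hypothetical helper that only reports how many arguments it received; it is not part of Tesseract.

```shell
# count_args is a hypothetical helper: prints the number of arguments passed.
count_args() {
    echo "$#"
}

count_args --font=Arial Bold      # unquoted: split at the space, prints 2
count_args --font="Arial Bold"    # quoted: kept together, prints 1
count_args --font=Arial\ Bold     # escaped space: kept together, prints 1
```

With the unquoted form, the program receives --font=Arial and a stray Bold argument, which is why the font name has to be quoted or escaped.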
>>>>>>>>>>> PS: I would suggest using Linux for training instead of Windows
>>>>>>>>>>> (e.g. in WSL[1])
>>>>>>>>>>> [1] https://docs.microsoft.com/en-us/windows/wsl/install-win10
>>>>>>>>>>>
>>>>>>>>>>> Zdenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 30 Aug 2021 at 12:12, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> The text2image error is gone. Now I am getting a *fontconfig error*.
>>>>>>>>>>>>
>>>>>>>>>>>> SDE26@DTP-SDE26-IND /cygdrive/c/Program Files/Tesseract-OCR
>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts
>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties 
>>>>>>>>>>>> --langdata_dir
>>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir 
>>>>>>>>>>>> "C:/Program
>>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>>>>>>>>>>> Creating new directory D:Testtrainneddata
>>>>>>>>>>>>
>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>> [Mon Aug 30 15:34:53 IST 2021] /cygdrive/c/Program
>>>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts 
>>>>>>>>>>>> --ptsize 12
>>>>>>>>>>>> --font=Arial Bold 
>>>>>>>>>>>> --outputbase=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
>>>>>>>>>>>> --text=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
>>>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.hbC9F3LEQX
>>>>>>>>>>>> Unable to open '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing
>>>>>>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>>>>>>> Could not find font named 'Arial Bold'.
>>>>>>>>>>>> Please correct --font arg.
>>>>>>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have the Arial Bold font on my machine. I don't know why it
>>>>>>>>>>>> cannot be found. Also, in the /tmp/ folder there is no
>>>>>>>>>>>> font_tmp.hbC9F3LEQX, so fonts.conf cannot be opened for writing.
>>>>>>>>>>>> How can I resolve this?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>
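A side note on the log line "Creating new directory D:Testtrainneddata" above: the backslashes in D:\Test\trainneddata were consumed by the shell before tesstrain.sh saw them. A short sketch of the three spellings, using only printf:

```shell
# Unquoted: the shell treats each backslash as an escape character and drops it.
printf '%s\n' D:\Test\trainneddata

# Single-quoted: the backslashes are preserved.
printf '%s\n' 'D:\Test\trainneddata'

# Forward slashes need no quoting, as in the later commands in this thread.
printf '%s\n' D:/Test/trainneddata
```

The first line prints D:Testtrainneddata, the same mangled name that appears in the log.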
>>>>>>>>>>>> On Wednesday, August 25, 2021 at 8:18:47 PM UTC+5:30 zdenop
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Honestly, I have no clue what you are doing: text2image is at
>>>>>>>>>>>>> the same location as the tesseract executable. So if you have 
>>>>>>>>>>>>> tesseract in
>>>>>>>>>>>>> the path, text2image must work too.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>
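To act on the point above in a Cygwin shell, the install directory has to be on PATH. A minimal sketch, assuming the installer placed the tools in C:\Program Files\Tesseract-OCR (the location used elsewhere in this thread); adjust TESS_DIR if yours differs.

```shell
# Install location taken from the thread; an assumption for other machines.
TESS_DIR="/cygdrive/c/Program Files/Tesseract-OCR"

# Append the directory to PATH only if it is not already listed.
case ":$PATH:" in
    *":$TESS_DIR:"*) ;;
    *) PATH="$PATH:$TESS_DIR" ;;
esac
export PATH

# Confirm whether the shell can now resolve the tool.
if command -v text2image >/dev/null 2>&1; then
    echo "text2image found at: $(command -v text2image)"
else
    echo "text2image still not on PATH; check that $TESS_DIR contains it"
fi
```

The quotes around the directory matter because of the space in "Program Files".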
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, 25 Aug 2021 at 16:26, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> As you suggested, I installed Tesseract v5.0.0 on my Windows
>>>>>>>>>>>>>> machine (Index of /tesseract (uni-mannheim.de)
>>>>>>>>>>>>>> <https://digi.bib.uni-mannheim.de/tesseract/>). This
>>>>>>>>>>>>>> included the training tools too.
>>>>>>>>>>>>>> I performed all the previous steps (box file, lstmf
>>>>>>>>>>>>>> file, unicharset).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But after running the tesstrain.sh command in Cygwin, I am
>>>>>>>>>>>>>> still getting the following error:
>>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts
>>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties 
>>>>>>>>>>>>>> --langdata_dir
>>>>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir 
>>>>>>>>>>>>>> "C:/Program
>>>>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir
>>>>>>>>>>>>>> D:/Bugs/1206806/folder/trainneddata
>>>>>>>>>>>>>> Creating new directory D:/Bugs/1206806/folder/trainneddata
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>>>> which: no text2image in
>>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/Microsoft
>>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files 
>>>>>>>>>>>>>> (x86)/NVIDIA
>>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files 
>>>>>>>>>>>>>> (x86)/Intel/Intel(R)
>>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program 
>>>>>>>>>>>>>> Files/Intel/Intel(R)
>>>>>>>>>>>>>> Management Engine
>>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files
>>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web
>>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL
>>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
>>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program 
>>>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files (x86)/Windows
>>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files
>>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program
>>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL 
>>>>>>>>>>>>>> Server/Client
>>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>> Files
>>>>>>>>>>>>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>> Files
>>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>>>>>>>>>>> which: no text2image in (./api)
>>>>>>>>>>>>>> which: no text2image in (./training)
>>>>>>>>>>>>>> ERROR: 'text2image' not found
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am I missing something? Can you please guide me?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 5:59:49 PM UTC+5:30 Samruddhi
>>>>>>>>>>>>>> Dhake wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you please provide link for steps to install Tesseract
>>>>>>>>>>>>>>> and training tools on Windows?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:42:48 PM UTC+5:30 Samruddhi
>>>>>>>>>>>>>>> Dhake wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How to install tesseract and training tools on Windows?
>>>>>>>>>>>>>>>> Do I have to install Tesseract Windows exe?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:20:37 PM UTC+5:30 zdenop
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So there are only 2 possibilities:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    1. Install tesseract and training tools
>>>>>>>>>>>>>>>>>    2. Learn how to handle and use software that is not
>>>>>>>>>>>>>>>>>    installed. This option is not related to tesseract.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, 24 Aug 2021 at 9:17, Samruddhi Dhake <sam22...@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I haven't installed Tesseract. I have kept it in a folder
>>>>>>>>>>>>>>>>>> and I am running the exe by giving its path. I have built
>>>>>>>>>>>>>>>>>> the training tools from source code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To create the box file, command -> (I gave the absolute path
>>>>>>>>>>>>>>>>>> of tesseract.exe)
>>>>>>>>>>>>>>>>>> ..\tesseract.exe Dim4.tif Dim4 lstmbox
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To create the lstmf file, command ->
>>>>>>>>>>>>>>>>>> tesseract.exe Dim4.tif Dim4 lstm.train
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To create the unicharset, command ->
>>>>>>>>>>>>>>>>>> unicharset_extractor.exe --output_unicharset
>>>>>>>>>>>>>>>>>> ..\own.unicharset ..\langdata\eng\eng.training_text
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And to create the training data, using the tesstrain.sh command:
>>>>>>>>>>>>>>>>>> .\src\training\tesstrain.sh --fonts_dir C:\Windows\Fonts
>>>>>>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties 
>>>>>>>>>>>>>>>>>> --langdata_dir
>>>>>>>>>>>>>>>>>> langdata --tessdata_dir tessdata --output_dir trainneddata
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 12:24:29 PM UTC+5:30
>>>>>>>>>>>>>>>>>> Samruddhi Dhake wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have built the training tools from source code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Monday, August 23, 2021 at 7:09:02 PM UTC+5:30 zdenop
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How did you install tesseract? Did you also install
>>>>>>>>>>>>>>>>>>>> training tools?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, 23 Aug 2021 at 15:34, Samruddhi Dhake
>>>>>>>>>>>>>>>>>>>> <sam22...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I am creating my own traineddata using Tesseract
>>>>>>>>>>>>>>>>>>>>> v4.1.1 on Windows 10.
>>>>>>>>>>>>>>>>>>>>> I am referring to the documentation at
>>>>>>>>>>>>>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have successfully created the .box file and .lstmf file
>>>>>>>>>>>>>>>>>>>>> using lstmbox and lstm.train respectively.
>>>>>>>>>>>>>>>>>>>>> As the next step, I installed Cygwin to run the tesstrain.sh
>>>>>>>>>>>>>>>>>>>>> command to create training data.
>>>>>>>>>>>>>>>>>>>>> But I am getting the error below.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir
>>>>>>>>>>>>>>>>>>>>> C:/Windows/Fonts --lang eng --linedata_only 
>>>>>>>>>>>>>>>>>>>>> --noextract_font_properties
>>>>>>>>>>>>>>>>>>>>> --langdata_dir ./langdata --tessdata_dir ./tessdata 
>>>>>>>>>>>>>>>>>>>>> --output_dir
>>>>>>>>>>>>>>>>>>>>> ./trainneddata
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>>>>>>>>>>> which: no text2image in
>>>>>>>>>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>>> (x86)/NVIDIA
>>>>>>>>>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>>> (x86)/Intel/Intel(R)
>>>>>>>>>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>>> Files/Intel/Intel(R)
>>>>>>>>>>>>>>>>>>>>> Management Engine
>>>>>>>>>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/100/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>> Files/Microsoft/Web Platform 
>>>>>>>>>>>>>>>>>>>>> Installer:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web
>>>>>>>>>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL
>>>>>>>>>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>>> (x86)/Windows
>>>>>>>>>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>>> (x86)/Windows
>>>>>>>>>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>>> Files
>>>>>>>>>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 
>>>>>>>>>>>>>>>>>>>>> 6.0.20/bin:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/Client
>>>>>>>>>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/150/DTS/Binn:/cygdrive/c/Program Files/Microsoft
>>>>>>>>>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>>> Files
>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>>> Server/140/DTS/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>>>>>>>>>>>>>>>>>> which: no text2image in (./api)
>>>>>>>>>>>>>>>>>>>>> which: no text2image in (./training)
>>>>>>>>>>>>>>>>>>>>> ERROR: 'text2image' not found
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I found that text2image is produced by running the command
>>>>>>>>>>>>>>>>>>>>> 'make training'.
>>>>>>>>>>>>>>>>>>>>> Can you please help me do this on Windows
>>>>>>>>>>>>>>>>>>>>> 10?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed
>>>>>>>>>>>>>>>>>>>>> to the Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving
>>>>>>>>>>>>>>>>>>>>> emails from it, send an email to
>>>>>>>>>>>>>>>>>>>>> tesseract-oc...@googlegroups.com.
>>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com
>>>>>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>>>>>
