Please use the latest Windows binaries from https://github.com/UB-Mannheim/tesseract/wiki, provided by @stweil.
How do you run a bash script on Windows 10?

@stweil I have not tried training on Windows. Do you have feedback from others who have tried it?

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, May 15, 2018 at 2:41 PM, reza <reza6...@gmail.com> wrote:

> windows 10
> tesseract 4 alpha
>
> On Tuesday, May 15, 2018 at 1:12:20 PM UTC+4:30, shree wrote:
>>
>> What o/s are you running it on?
>>
>> Which version of tesseract?
>>
>> > ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable
>>
>> which version of icu library?
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Tue, May 15, 2018 at 1:00 PM, reza <reza...@gmail.com> wrote:
>>
>>> i used this attached finetune.sh file ... but that raised error. could u help me ?
>>>
>>> thanks
>>>
>>>> ###### MAKING TRAINING DATA ######
>>>>
>>>> === Starting training for language 'eng'
>>>> [Tue, May 15, 2018 11:42:36 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --font=Arial --outputbase=/tmp/font_tmp.CpgpM0lbxD/sample_text.txt --text=/tmp/font_tmp.CpgpM0lbxD/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD
>>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/font_tmp.CpgpM0lbxD/sample_text.txt.tif
>>>>
>>>> === Phase I: Generating training images ===
>>>> Rendering using Arial
>>>> Rendering using Corbel
>>>> [Tue, May 15, 2018 11:42:37 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.6m4B2TUln1/eng/eng.Arial.exp0 --max_pages=3 --font=Arial --text=./langdata/eng/eng.training_text
>>>> [Tue, May 15, 2018 11:42:37 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0 --max_pages=3 --font=Corbel --text=./langdata/eng/eng.training_text
>>>> Stripped 2 unrenderable words
>>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.tif
>>>> Stripped 1 unrenderable words
>>>> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.tif
>>>> Stripped 2 unrenderable words
>>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.tif
>>>> Stripped 1 unrenderable words
>>>> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.tif
>>>>
>>>> === Phase UP: Generating unicharset and unichar properties files ===
>>>> [Tue, May 15, 2018 11:42:39 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.6m4B2TUln1/eng/eng.unicharset --norm_mode 1 /tmp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.box /tmp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.box
>>>> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.box
>>>> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.box
>>>> ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable
>>>>
>>>> ###### MAKING EVAL DATA ######
>>>>
>>>> === Starting training for language 'eng'
>>>> [Tue, May 15, 2018 11:42:40 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --font=Calibri --outputbase=/tmp/font_tmp.n0qq4iJk4q/sample_text.txt --text=/tmp/font_tmp.n0qq4iJk4q/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.n0qq4iJk4q
>>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/font_tmp.n0qq4iJk4q/sample_text.txt.tif
>>>>
>>>> === Phase I: Generating training images ===
>>>> Rendering using Calibri
>>>> [Tue, May 15, 2018 11:42:40 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.n0qq4iJk4q --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0 --max_pages=3 --font=Calibri --text=./langdata/eng/eng.training_text
>>>> Stripped 2 unrenderable words
>>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.tif
>>>> Stripped 1 unrenderable words
>>>> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.tif
>>>>
>>>> === Phase UP: Generating unicharset and unichar properties files ===
>>>> [Tue, May 15, 2018 11:42:42 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.h0l64TAxEq/eng/eng.unicharset --norm_mode 1 /tmp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.box
>>>> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.box
>>>> ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.h0l64TAxEq/eng/eng.unicharset does not exist or is not readable
>>>>
>>>> #### combine_tessdata to extract lstm model from previous trained set ####
>>>> Extracting tessdata components from ./tessdata_best/eng.traineddata
>>>> Wrote ./trained_plus_chars/eng.lstm
>>>> Version string:4.00.00alpha:eng:synth20170629
>>>> 17:lstm:size=401636, offset=192
>>>> 18:lstm-punc-dawg:size=4322, offset=401828
>>>> 19:lstm-word-dawg:size=3694794, offset=406150
>>>> 20:lstm-number-dawg:size=4738, offset=4100944
>>>> 21:lstm-unicharset:size=6360, offset=4105682
>>>> 22:lstm-recoder:size=1012, offset=4112042
>>>> 23:version:size=30, offset=4113054
>>>>
>>>> #### training from previous optimum #####
>>>> finetune.sh: line 119: 11664 Segmentation fault lstmtraining --model_output $train_output_dir/pluschars --continue_from $train_output_dir/$Lang.lstm --old_traineddata $tessdata_dir/$Lang.traineddata --traineddata $train_output_dir/$Lang/$Lang.traineddata --max_iterations $MaxIterations --debug_interval -1 --eval_listfile $eval_output_dir/$Lang.training_files.txt --train_listfile $train_output_dir/$Lang.training_files.txt
>>>>
>>>> #### Building final trained file ./trained_plus_chars/eng_NEW.traineddata d####
>>>> finetune.sh: line 130: 11320 Segmentation fault lstmtraining --stop_training --continue_from $train_output_dir/pluschars_checkpoint --traineddata $train_output_dir/$Lang/$Lang.traineddata --model_output $final_trained_data_file
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7c46c196-e08d-4541-9f3b-b8a768792c9a%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.