Windows 10, Tesseract 4 alpha.
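The quoted log shows `unicharset_extractor` failing first and both `lstmtraining` calls segfaulting afterwards. Nothing in this thread confirms it, but the segfaults may just be a consequence of input files that were never written. A minimal sketch of a guard that could be placed before the `lstmtraining` call in finetune.sh follows; `require_readable` is a hypothetical helper, not part of finetune.sh or Tesseract, and the variable names in the usage comment are copied from the commands in the log.

```shell
# Hypothetical helper (not part of finetune.sh): print every path that is
# missing or unreadable to stderr and return non-zero if any check failed.
require_readable() {
  rr_status=0
  for rr_path in "$@"; do
    if [ ! -r "$rr_path" ]; then
      printf 'missing or unreadable: %s\n' "$rr_path" >&2
      rr_status=1
    fi
  done
  return "$rr_status"
}

# Possible use before the training step (paths as they appear in the log):
#   require_readable \
#     "$train_output_dir/$Lang.lstm" \
#     "$tessdata_dir/$Lang.traineddata" \
#     "$train_output_dir/$Lang/$Lang.traineddata" \
#     "$train_output_dir/$Lang.training_files.txt" \
#     "$eval_output_dir/$Lang.training_files.txt" || exit 1
```

With a check like this the script would stop with a list of the missing files instead of a bare segmentation fault, which should make it clearer whether the ICU error is the real problem.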

On Tuesday, May 15, 2018 at 1:12:20 PM UTC+4:30, shree wrote:
> What o/s are you running it on?
>
> Which version of tesseract?
>
> ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset
> does not exist or is not readable
>
> which version of icu library?
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, May 15, 2018 at 1:00 PM, reza <reza...@gmail.com> wrote:
>
>> i used this attached finetune.sh file ... but that raised error. could u
>> help me ?
>>
>> thanks
>>
>>> ###### MAKING TRAINING DATA ######
>>>
>>> === Starting training for language 'eng'
>>> [Tue, May 15, 2018 11:42:36 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --font=Arial --outputbase=/tmp/font_tmp.CpgpM0lbxD/sample_text.txt --text=/tmp/font_tmp.CpgpM0lbxD/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD
>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/font_tmp.CpgpM0lbxD/sample_text.txt.tif
>>>
>>> === Phase I: Generating training images ===
>>> Rendering using Arial
>>> Rendering using Corbel
>>> [Tue, May 15, 2018 11:42:37 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.6m4B2TUln1/eng/eng.Arial.exp0 --max_pages=3 --font=Arial --text=./langdata/eng/eng.training_text
>>> [Tue, May 15, 2018 11:42:37 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0 --max_pages=3 --font=Corbel --text=./langdata/eng/eng.training_text
>>> Stripped 2 unrenderable words
>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.tif
>>> Stripped 1 unrenderable words
>>> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.tif
>>> Stripped 2 unrenderable words
>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.tif
>>> Stripped 1 unrenderable words
>>> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.tif
>>>
>>> === Phase UP: Generating unicharset and unichar properties files ===
>>> [Tue, May 15, 2018 11:42:39 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.6m4B2TUln1/eng/eng.unicharset --norm_mode 1 /tmp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.box /tmp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.box
>>> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.box
>>> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.box
>>> ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable
>>>
>>> ###### MAKING EVAL DATA ######
>>>
>>> === Starting training for language 'eng'
>>> [Tue, May 15, 2018 11:42:40 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --font=Calibri --outputbase=/tmp/font_tmp.n0qq4iJk4q/sample_text.txt --text=/tmp/font_tmp.n0qq4iJk4q/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.n0qq4iJk4q
>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/font_tmp.n0qq4iJk4q/sample_text.txt.tif
>>>
>>> === Phase I: Generating training images ===
>>> Rendering using Calibri
>>> [Tue, May 15, 2018 11:42:40 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.n0qq4iJk4q --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0 --max_pages=3 --font=Calibri --text=./langdata/eng/eng.training_text
>>> Stripped 2 unrenderable words
>>> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.tif
>>> Stripped 1 unrenderable words
>>> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.tif
>>>
>>> === Phase UP: Generating unicharset and unichar properties files ===
>>> [Tue, May 15, 2018 11:42:42 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.h0l64TAxEq/eng/eng.unicharset --norm_mode 1 /tmp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.box
>>> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.box
>>> ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.h0l64TAxEq/eng/eng.unicharset does not exist or is not readable
>>>
>>> #### combine_tessdata to extract lstm model from previous trained set ####
>>> Extracting tessdata components from ./tessdata_best/eng.traineddata
>>> Wrote ./trained_plus_chars/eng.lstm
>>> Version string:4.00.00alpha:eng:synth20170629
>>> 17:lstm:size=401636, offset=192
>>> 18:lstm-punc-dawg:size=4322, offset=401828
>>> 19:lstm-word-dawg:size=3694794, offset=406150
>>> 20:lstm-number-dawg:size=4738, offset=4100944
>>> 21:lstm-unicharset:size=6360, offset=4105682
>>> 22:lstm-recoder:size=1012, offset=4112042
>>> 23:version:size=30, offset=4113054
>>>
>>> #### training from previous optimum #####
>>> finetune.sh: line 119: 11664 Segmentation fault lstmtraining --model_output $train_output_dir/pluschars --continue_from $train_output_dir/$Lang.lstm --old_traineddata $tessdata_dir/$Lang.traineddata --traineddata $train_output_dir/$Lang/$Lang.traineddata --max_iterations $MaxIterations --debug_interval -1 --eval_listfile $eval_output_dir/$Lang.training_files.txt --train_listfile $train_output_dir/$Lang.training_files.txt
>>>
>>> #### Building final trained file ./trained_plus_chars/eng_NEW.traineddata d####
>>> finetune.sh: line 130: 11320 Segmentation fault lstmtraining --stop_training --continue_from $train_output_dir/pluschars_checkpoint --traineddata $train_output_dir/$Lang/$Lang.traineddata --model_output $final_trained_data_file

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/3851abc9-90b5-4a09-a01f-ffbd583e6bab%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.