I want my application to be able to recognize characters such as 'Φ'.
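
One way to confirm whether a character such as 'Φ' is already covered is to look for it in the unicharset file. The following is only a sketch: it assumes the plain-text unicharset layout (an entry count on the first line, then one entry per line beginning with the character), and has_unichar is a hypothetical helper name, not a Tesseract tool.

```shell
#!/bin/sh
# has_unichar FILE CHAR (hypothetical helper)
# Succeeds (exit status 0) if CHAR is the first field of an entry line in FILE.
# Line 1 is skipped because it is assumed to hold only the entry count.
# Note: awk -v interprets backslash escapes, so this is meant for ordinary
# characters such as letters and symbols, not for a literal backslash.
has_unichar() {
    awk -v c="$2" 'NR > 1 && $1 == c { found = 1 } END { exit !found }' "$1"
}

# Example call (commented out so the file can be sourced safely):
# has_unichar unicharset 'Φ' && echo "present" || echo "missing"
```

If the character turns out to be missing, that matches the "fine tuning for ± a few characters" case, where the new unicharset has to contain the added character.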

On Saturday, October 20, 2018 at 00:56:01 UTC+7, tu tonquang wrote:
>
> Hi,
>
> *I get some errors when I follow this tutorial to retrain Tesseract:*
>
> I followed this link to retrain Tesseract with my own image dataset (I am 
> retraining with real images, not from a text file via tesstrain.sh):
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata
>
> These are my steps to retrain the Tesseract LSTM model:
>
>
> *Step 1: I created my training data (tif image + box file) from my images.*
> I generated the box files with this command: tesseract 
> [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] batch.nochop 
> makebox
>
>
> *Step 2: I edited the box files manually with Qt-box-editor.* (I followed this link: 
> https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-Make-Box-Files
> )
> So now I have these files:
> .tif file
> .box file
> .lstmf file (generated by the command: tesseract 
> [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] lstm.train)
> unicharset file
>
>
> *Step 3: I created the starter .traineddata with this command:*
> combine_lang_model --input_unicharset unicharset --script_dir langdata 
> --output_dir output --lang "eng"
> I downloaded langdata from here: 
> https://github.com/tesseract-ocr/langdata
>
>
> *Step 4: I extracted the existing model from the existing traineddata with this command:*
> combine_tessdata -e /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata 
> eng.lstm
>
>
> *Step 5: I retrained Tesseract* (Fine Tuning for ± a few characters: 
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters)
>  
> with this command:
> lstmtraining --model_output output_model --continue_from eng.lstm 
> --traineddata output_basic/eng/eng.traineddata --old_traineddata /usr/share 
> tesseract-ocr/4.00/tessdata/eng.traineddata --train_listfile 
> eng.training_files.txt --debug_interval -1 --max_iterations 400
>
>    - This is the format of my eng.training_files.txt:
>    path/to/lstmf
>
> *I get an error like the following:*
>
> [image: Screenshot from 2018-10-19 21-49-00.png]
> *This is an example of my training images:*
> [image: eng.centurygothic.exp0.png]
>
>
>
>
>
> *I am trying to retrain Tesseract from real images (not from a text file via 
> tesstrain.sh).*
>
> Please let me know if you have any idea how to fix it.
>
>
> Thank you in advance!
>
>
>

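The quoted message describes eng.training_files.txt as a list of .lstmf paths, one per line. A minimal sketch of building such a list is shown here; make_listfile is a hypothetical helper name, and it assumes all .lstmf files sit directly in one directory.

```shell
#!/bin/sh
# make_listfile DIR OUT (hypothetical helper)
# Writes the path of every .lstmf file found directly in DIR to OUT,
# one path per line, sorted so the order is reproducible.
# If DIR contains no .lstmf files, OUT is created empty.
make_listfile() {
    find "$1" -maxdepth 1 -type f -name '*.lstmf' | sort > "$2"
}

# Example call (commented out so the file can be sourced safely):
# make_listfile "$PWD" eng.training_files.txt
```

Passing an absolute directory such as "$PWD" makes the listed paths absolute, so they do not depend on the directory from which training is later started.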