I've spent many hours trying to figure out how to do this and have gone down 
many false paths. 

Going by the documentation, the apparent way to do this is the procedure it 
calls "Fine Tuning for Impact": 
https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#fine-tuning-for-impact

Since tesstrain.sh is now deprecated, that means running lstmtraining directly:

training/lstmtraining --model_output /path/to/output [--max_image_MB 6000] \
  --continue_from /path/to/existing/model \
  --traineddata /path/to/original/traineddata \
  [--perfect_sample_delay 0] [--debug_interval 0] \
  [--max_iterations 0] [--target_error_rate 0.01] \
  --train_listfile /path/to/list/of/filenames.txt

However, it's not clear where or how to include the new font's data. 
The lstmtraining manpage on GitHub 
<https://github.com/tesseract-ocr/tesseract/blob/main/doc/lstmtraining.1.asc> 
seems out of date and does not match the usage text the tool prints on 
Linux (enter "lstmtraining" in the terminal). That usage text mentions a 
--fonts_dir parameter, but I haven't tried it yet with my new font. I also 
don't know what the value of --train_listfile is supposed to be, since 
tesstrain.sh, the script that used to generate it, is deprecated. 
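
One reading of the training documentation is that --train_listfile expects a 
plain-text file naming the training data files (.lstmf files), one path per 
line. The sketch below only illustrates that reading; it assumes the .lstmf 
files for the new font already exist, and the function name and both paths 
are hypothetical placeholders, not part of any Tesseract tool.

```python
from pathlib import Path


def write_train_listfile(lstmf_dir: str, listfile: str) -> int:
    """Write one absolute .lstmf path per line; return how many were listed.

    Assumption: lstmtraining reads --train_listfile as a plain-text list of
    .lstmf paths. This helper is illustrative, not an official utility.
    """
    # Collect the .lstmf files in a stable (sorted) order.
    paths = sorted(Path(lstmf_dir).resolve().glob("*.lstmf"))
    # One path per line, with a trailing newline after each entry.
    Path(listfile).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)


# Example call (placeholder paths):
# write_train_listfile("/path/to/lstmf/files", "/path/to/list/of/filenames.txt")
```

If that reading is right, the resulting file would be the value passed to 
--train_listfile in the command above.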

Can someone from the Tesseract team please clarify this?

Thanks,

Osman




On Sunday, November 14, 2021 at 10:39:45 PM UTC-8 myne...@163.com wrote:

> Hello,
>
> I am encountering the same issue and waiting for any feedback on it.
>
> Could anybody give us some guidance on it (training for a specific font)?
>
> Many thanks!
>
> Ant
>
>
>
>
>
>
> -------- Original message --------
> From: Samruddhi Dhake <sam22...@gmail.com>
> Date: Tuesday, November 9, 2021, 19:20
> To: tesseract-ocr <tesser...@googlegroups.com>
> Subject: [tesseract-ocr] Steps to create traineddata for specific font
>
> Hello,
>
> I am working with Tesseract v4.1.1 on Windows 10.
> I am trying to create traineddata for a specific font. 
> Can anyone please list the steps to train for a specific font? 
> I know the basic steps and am able to create custom traineddata. For the 
> specific font, I am using tesstrain.sh. 
> But I am facing many issues. Can anyone please guide me?
>
> Regards,
> Samruddhi
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to tesseract-oc...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/tesseract-ocr/c0b4d9c2-c3b4-4783-8327-f970f279d07bn%40googlegroups.com
>
>
