[tesseract-ocr] What is the langdata repository provided for when retraining Tesseract?

Venkatapathy S Wed, 14 Apr 2021 21:52:33 -0700

Hi,
I want to retrain Tesseract from scratch for a particular language. (I
have read as many resources as possible, including the warnings, from the
tutorial <https://tesseract-ocr.github.io/tessdoc/>, GitHub
<https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951>
and this forum.) To begin, and to get familiar with the process, I
started with English. While going through the English files in langdata
(https://github.com/tesseract-ocr/langdata), I found that the training
text contains only 72 lines. Is the training text in the langdata
repository only a sample, or is it exactly the same set that was used to
train the default eng.traineddata model shipped with Tesseract? Can someone
help me with this, please?


Regards,
Venkat
https://sites.google.com/view/venkatapathy/home

