With open-source data alone, you will not be able to create (from scratch)
traineddata of the same quality as what Google provided.
However, some projects have fine-tuned the Google models successfully,
e.g. (UB-Mannheim/: https://madoc.bib.uni-mannheim.de/53748/ )


Zdenko


On Wed, 21 Jun 2023 at 4:38, Duy Khanh <touuk...@gmail.com> wrote:

> Hi. Is the existing eng.training_text in langdata_lstm the full text
> corpus used for training the eng.traineddata? Do we have a list of fonts
> used for generating the images?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/1051395b-f3e9-4d5d-9f06-76454291d117n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/1051395b-f3e9-4d5d-9f06-76454291d117n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

