Re: [tesseract-ocr] Original training data for eng.traineddata

Zdenko Podobny Tue, 20 Jun 2023 22:15:06 -0700

With the open-source data you will not be able to create (from scratch)
traineddata of the same quality as the one Google provided.
However, some projects have successfully fine-tuned the Google model,
e.g. (UB-Mannheim/: https://madoc.bib.uni-mannheim.de/53748/ )



Zdenko


On Wed, 21 Jun 2023 at 4:38, Duy Khanh <[email protected]> wrote:

> Hi. Is the existing eng.training_text in langdata_lstm the full text
> corpus used for training the eng.traineddata? Do we have a list of fonts
> used for generating the images?

