[tesseract-ocr] Training from Scratch

Simon Wed, 22 Nov 2023 05:46:58 -0800

Since it is not really possible to combine a traineddata file trained from 
scratch with an existing one, I have decided to train my own model on numbers as well. 
To do so, I wrote a script that synthetically generates ground-truth data 
with text2image. 
The script uses dozens of different fonts and creates numbers in the 
following formats: 
X.XXX
X.XX
X,XX
X,XXX
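
A generator for these four formats could be sketched roughly as follows. This is a minimal illustration and not the actual script: the helper names (random_number, make_lines, text2image_command), the font name and the paths are all placeholders, and the text2image call is only assembled, not executed.

```python
import random

# The four number formats from the list above; "X" stands for one random digit.
FORMATS = ["X.XXX", "X.XX", "X,XX", "X,XXX"]


def random_number(template, rng):
    """Replace every X in the template with a random digit 0-9."""
    return "".join(
        str(rng.randint(0, 9)) if ch == "X" else ch for ch in template
    )


def make_lines(count, seed=0):
    """Return `count` ground-truth strings, cycling evenly through FORMATS."""
    rng = random.Random(seed)
    return [random_number(FORMATS[i % len(FORMATS)], rng) for i in range(count)]


def text2image_command(text_file, output_base, font, fonts_dir):
    """Build (but do not run) a text2image call for one font.

    All argument values are hypothetical placeholders.
    """
    return [
        "text2image",
        f"--text={text_file}",
        f"--outputbase={output_base}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
    ]


if __name__ == "__main__":
    for line in make_lines(8):
        print(line)
    print(" ".join(text2image_command("numbers.txt", "num.Arial.exp0",
                                      "Arial", "/usr/share/fonts")))
```

Drawing every digit uniformly from 0-9 and cycling through the formats keeps the ground truth from being dominated by a single leading digit or separator.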
I generated 10,000 files to train the numbers. Unfortunately, the numbers 
are recognized quite poorly with the best model (most of the time only "0.", 
"0" or "0," is recognized). 
So I wanted to ask whether this amount of training data (ground truth) is too 
small for proper recognition when I train on several fonts. 
Thanks in advance for your help.


