[tesseract-ocr] Trying to train a new font (LCD screen style), unable to get error rate under 40%

Chaitanya Vermani Tue, 20 Feb 2024 22:11:43 -0800

I have been trying to retrain Tesseract to read characters on an LCD screen, 
such as a 0 with a slash and certain forms of V, M, N, and A.
My setup:
I am using Tesseract 5.3.2 on a Debian 12 machine for training.
I used the code found 
here 
https://github.com/astutejoe/tesseract_tutorial/blob/main/split_training_text.py
to generate my training text. The training text is based on eng.training_text. 
I also have the correct eng.traineddata in the tesseract/tessdata folder. 
The ground truth is then copied into tesstrain/data, and I first run make 
tesseract-langdata to get the langdata folder inside tesstrain.
I use this command: make training MODEL_NAME=lcd, START_MODEL=eng, TESSDATA=
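
A quick sanity check on the copied ground truth can rule out one common cause of a stuck error rate: images without a matching transcription, or glyphs that barely occur in the text. The sketch below is only an illustration under assumptions: the function name check_ground_truth is hypothetical, and it assumes line images end in .tif and transcriptions end in .gt.txt.

```python
from collections import Counter
from pathlib import Path


def check_ground_truth(folder):
    """Return (images lacking a .gt.txt, .gt.txt files lacking an image, char counts).

    Assumes each line image is NAME.tif with its transcription in NAME.gt.txt.
    """
    folder = Path(folder)
    # Strip the full suffix so "eng_0.tif" and "eng_0.gt.txt" both map to "eng_0".
    image_stems = {p.name[: -len(".tif")] for p in folder.glob("*.tif")}
    text_stems = {p.name[: -len(".gt.txt")] for p in folder.glob("*.gt.txt")}

    # Count how often each character appears across all transcriptions.
    chars = Counter()
    for p in folder.glob("*.gt.txt"):
        chars.update(p.read_text(encoding="utf-8").strip())

    missing_text = sorted(image_stems - text_stems)
    missing_image = sorted(text_stems - image_stems)
    return missing_text, missing_image, chars
```

Calling it on the ground truth folder (for example data/lcd-ground-truth, if the tesstrain default naming for MODEL_NAME=lcd is in use) and printing chars["0"], chars["V"], and so on shows whether the problem glyphs are well represented.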




For my ground truth folder, I have tried different sample sizes, from 1000 
lines to all 195k lines, and trained Tesseract for up to a few thousand 
iterations. The error almost always converges to about 45% and stays 
there. 
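
To see what the error looks like on individual lines, one option is to compare the recognized text with the ground truth directly. The sketch below computes a character error rate from the Levenshtein edit distance. It is an independent measure and may not match the figure printed in the training log, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(truth, predicted):
    """Edit distance divided by the length of the ground-truth string."""
    if not truth:
        return 0.0 if not predicted else 1.0
    return levenshtein(truth, predicted) / len(truth)
```

For instance, character_error_rate("V0LTS", "VOLTS") gives 0.2, because one of the five characters (the zero read as the letter O) is wrong. Running this over a held-out set of lines shows whether the errors concentrate on a few glyphs or are spread evenly.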


Can someone help me reduce this error rate? I have been at it for quite a 
while, going through tessdocs and other resources, and am still unable to 
find a solution. If any information is missing or required, let me know and 
I'll update as soon as possible.

