I have been trying to retrain Tesseract to read characters on an LCD screen, such as a zero with a slash and certain forms of V, M, N and A. My setup: Tesseract 5.3.2 on a Debian 12 machine for training. I used the code found here https://github.com/astutejoe/tesseract_tutorial/blob/main/split_training_text.py to generate my training text. The training text is based on eng.training_text. I also have the correct eng.traineddata in the tesseract/tessdata folder. The ground truth is then copied into tesstrain/data, and I first run make tesseract-langdata so that the langdata folder exists inside tesstrain. I then use this command: make training MODEL_NAME=lcd, START_MODEL=eng, TESSDATA=
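To rule out a broken ground-truth folder, here is a minimal sketch of a pairing check. It assumes the usual tesstrain layout, where every line image (.tif or .png) sits next to a transcription with the same base name ending in .gt.txt. The folder name data/lcd-ground-truth and the helper find_unpaired are my own illustrative names, not something from tesstrain itself.

```python
from pathlib import Path

# Image extensions this sketch looks for (an assumption; adjust if needed).
IMAGE_SUFFIXES = (".tif", ".png")
GT_SUFFIX = ".gt.txt"


def find_unpaired(gt_dir):
    """Return (texts_without_image, images_without_text) as sorted base names."""
    gt_dir = Path(gt_dir)
    # Base names of transcriptions, e.g. "eng_0" for "eng_0.gt.txt".
    text_stems = {p.name[: -len(GT_SUFFIX)] for p in gt_dir.glob("*" + GT_SUFFIX)}
    # Base names of line images, e.g. "eng_0" for "eng_0.tif".
    image_stems = {
        p.stem for p in gt_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    }
    return sorted(text_stems - image_stems), sorted(image_stems - text_stems)


if __name__ == "__main__":
    folder = Path("data/lcd-ground-truth")  # hypothetical location
    if folder.is_dir():
        no_image, no_text = find_unpaired(folder)
        print("transcriptions without an image:", len(no_image))
        print("images without a transcription:", len(no_text))
```

If either count is non-zero, some samples are probably being skipped or are mismatched, which could be worth checking before looking at the training parameters.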
For my ground truth folder, I have tried different sample sizes, from 1000 lines to all 195k lines, and trained Tesseract for up to a few thousand iterations. Almost always the error rate converges down to 45% and just stays there. Can someone help me reduce this error rate? I have been at it for quite a while, going through tessdocs and related material, and I am still unable to find a solution. If any information is missing or required, let me know and I'll update as soon as possible.
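For reference, this is roughly how a subset of the training text can be drawn and checked for the characters I care about. It is only a sketch: sample_lines and count_target_chars are hypothetical helper names, the target string "0VMNA" is just the characters mentioned above, and the file name in the usage part is an assumption.

```python
import random
from collections import Counter
from pathlib import Path


def sample_lines(lines, n, seed=0):
    """Pick up to n distinct non-empty lines, reproducibly for a given seed."""
    # Strip whitespace, drop blanks, and remove duplicates while keeping order.
    unique = list(dict.fromkeys(s.strip() for s in lines if s.strip()))
    return random.Random(seed).sample(unique, min(n, len(unique)))


def count_target_chars(lines, targets):
    """Count how often each target character occurs across the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(line)
    # Counter returns 0 for characters that never appear.
    return {ch: counts[ch] for ch in targets}


if __name__ == "__main__":
    source = Path("eng.training_text")  # assumed file name
    if source.is_file():
        text = source.read_text(encoding="utf-8").splitlines()
        subset = sample_lines(text, 1000)
        print(count_target_chars(subset, "0VMNA"))
```

A subset in which the target characters barely appear would give the model little to learn from for exactly the glyphs that differ on the LCD, so the counts are worth a look for each sample size.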