I am just writing up a small observation here for beginners like me (I would love to be corrected if I am wrong). I am training by cutting off the top layer of a best model in order to improve the existing model. I have about 400,000 lines of text, and I generated the box and image files using text2image.
As I train the model, the BCER drops very low very fast. It took less than two epochs for the BCER to reach 0.001. That might sound like a good thing to an inexperienced user like me. But when I try the output model, its accuracy is nowhere near as good as that of the default best model. So I had to lower the target_error parameter (to 0.0001) and keep on training, and the model keeps getting better and better. It looks like watching the learning iteration, which is the first of the three numbers in the iteration counter (https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#iterations-and-checkpoints), is a better approach than watching the BCER. If the learning iteration keeps growing, the model is still learning, and you need to keep on training regardless of the BCER.
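To make the idea of watching the learning iteration a bit more concrete, here is a minimal Python sketch that pulls the counters out of a training log. It assumes the log lines look like the example in the linked documentation ("At iteration learning/training/sample, ... char train=...%", with newer versions printing "BCER train" instead). The function names `parse_progress` and `still_learning`, the `window` parameter, and the sample log values are all made up for illustration; they are not part of Tesseract.

```python
import re

# Assumed log format, based on the example line in the Tesseract training docs:
#   "At iteration <learning>/<training>/<sample>, ... char train=1.882%, ..."
# Newer versions label the character error "BCER train"; both are accepted here.
LINE_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+),.*?(?:char|BCER) train=([\d.]+)%"
)


def parse_progress(log_text):
    """Return a list of (learning_iteration, training_iteration, char_error_percent)."""
    rows = []
    for match in LINE_RE.finditer(log_text):
        learning, training, _sample, char_err = match.groups()
        rows.append((int(learning), int(training), float(char_err)))
    return rows


def still_learning(rows, window=3):
    """Hypothetical heuristic: True if the learning iteration grew
    over the last `window` progress reports."""
    if len(rows) < 2:
        return True  # not enough reports to conclude anything
    recent = rows[-window:]
    return recent[-1][0] > recent[0][0]


# Made-up sample values, only to show the shape of the input.
SAMPLE_LOG = """\
At iteration 900/1500/1500, mean rms=0.120%, delta=0.010%, BCER train=0.004%, BWER train=0.020%, skip ratio=0.0%, wrote checkpoint.
At iteration 950/1600/1600, mean rms=0.118%, delta=0.008%, BCER train=0.002%, BWER train=0.015%, skip ratio=0.0%, wrote checkpoint.
At iteration 1010/1700/1700, mean rms=0.115%, delta=0.006%, BCER train=0.001%, BWER train=0.010%, skip ratio=0.0%, wrote checkpoint.
"""

rows = parse_progress(SAMPLE_LOG)
for learning, training, char_err in rows:
    print(f"learning={learning} training={training} BCER={char_err}%")
print("still learning:", still_learning(rows))
```

In this sample the BCER is already tiny, but the first counter is still climbing, which is the situation I described above: the error rate alone would suggest stopping, while the learning iteration suggests training further.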