In other words, the BCER is an unreliable measure of accuracy. At least, that is my experience training from synthetic data.
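
If it helps, here is a minimal sketch of how one could track the learning iteration from saved training output instead of eyeballing the BCER. It assumes the progress lines look like the example in the tessdoc training page ("At iteration 14615/695400/698614, Mean rms=..., char train=...%"), with newer builds printing "BCER train=" instead of "char train=". The function names `parse_progress` and `still_learning` and the `window` parameter are my own, not part of Tesseract, so adjust the pattern if your log differs.

```python
import re

# Assumed progress-line format (from the tessdoc training page), e.g.:
#   At iteration 14615/695400/698614, Mean rms=0.158%, delta=0.295%, char train=1.882%, ...
# Newer builds are assumed to print "BCER train=" in place of "char train=".
LINE_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+),.*?(?:BCER|char) train=([\d.]+)%",
    re.IGNORECASE,
)


def parse_progress(log_text):
    """Return a list of (learning_iter, training_iter, sample_iter, bcer_percent)."""
    rows = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append(
                (int(m.group(1)), int(m.group(2)), int(m.group(3)), float(m.group(4)))
            )
    return rows


def still_learning(rows, window=5):
    """True if the learning iteration grew across the last `window` progress lines."""
    if len(rows) < 2:
        return True  # too little data to call the training stalled
    recent = rows[-window:]
    return recent[-1][0] > recent[0][0]
```

Usage would be something like `rows = parse_progress(open("training.log").read())` followed by `still_learning(rows)`, where `training.log` is a hypothetical file holding the captured training output. The check is deliberately crude: it only says whether the first counter is still moving, which is the signal described in the quoted message.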
On Wednesday, October 18, 2023 at 10:10:00 AM UTC+3 Des Bw wrote:

> I am just writing a little observation here for beginners like me
> (I would love to be corrected if I am wrong).
>
> I am training by cutting off the top layer of a best model, to improve the
> existing model. I have about 400,000 lines of text, and I generated the box
> and image files using text2image.
>
> As I train the model, the BCER gets very low very fast. It took
> not even two epochs for the BCER to reach 0.001. That might sound like a
> good thing to an inexperienced user like me. But when I try the output
> model, the accuracy is nowhere near as good as the default best model. So I
> had to lower the target_error parameter (to 0.0001) and keep on training,
> and the model is getting better and better.
>
> So it looks like watching the learning iteration, which is the first
> number in the iteration count
> (https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#iterations-and-checkpoints),
> is a better approach than watching the BCER. If the learning iteration
> keeps growing, the model is still learning. You need to keep on training,
> regardless of the BCER.