[tesseract-ocr] Re: Watching the learning iteration is better method than watching the BCER

Des Bw Wed, 18 Oct 2023 00:13:22 -0700

In other words, the BCER is an unreliable measure of accuracy. At least, 
that is my experience when training on synthetic data.


On Wednesday, October 18, 2023 at 10:10:00 AM UTC+3 Des Bw wrote:

> I am just writing a little observation here for beginners like me 
> (I would love to be corrected if I am wrong). 
> I am training by cutting off the top layer of a best model, to improve the 
> existing model. I have about 400,000 lines of text, and I generated the box 
> and image files using text2image. 
>
> As I train the model, the BCER gets very low very fast. It took 
> less than two epochs for the BCER to reach 0.001. That might sound like a 
> good thing to an inexperienced user like me. But when I try the output model, 
> the accuracy is nowhere near as good as the default best model. So I had to 
> lower the target_error parameter (to 0.0001) and keep on training, 
> and the model is getting better and better. 
>
> So it looks like watching the learning iteration, which is 
> the first of the three numbers in the iteration count (
> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#iterations-and-checkpoints)
> is a better approach than watching the BCER. If the learning iteration 
> keeps on growing, that means the model is still learning. You need to keep 
> on training, regardless of the BCER. 
>
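
For anyone who wants to track this automatically, here is a minimal Python 
sketch that pulls the learning iteration and the BCER out of training log 
lines. The log line format in the regex is my assumption based on what 
Tesseract 5 prints (older versions print "char train" instead of "BCER 
train"), the sample values are made up for illustration, and the helper names 
parse_progress and still_learning are hypothetical, so adjust to your own log.

```python
import re

# Assumed log format (Tesseract 5 style); adjust the pattern if yours differs.
LOG_PATTERN = re.compile(
    r"At iteration (?P<learning>\d+)/(?P<training>\d+)/(?P<sample>\d+),"
    r".*?BCER train=(?P<bcer>[\d.]+)%"
)


def parse_progress(line):
    """Return (learning_iteration, training_iteration, bcer_percent) or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return int(match["learning"]), int(match["training"]), float(match["bcer"])


def still_learning(lines):
    """True if the learning iteration grew between the last two progress lines."""
    points = [p for p in map(parse_progress, lines) if p is not None]
    if len(points) < 2:
        return False
    return points[-1][0] > points[-2][0]


# Illustrative lines only, not real training output.
sample_log = [
    "At iteration 509/10000/10000, mean rms=0.188%, delta=0.026%, "
    "BCER train=0.188%, BWER train=0.744%, skip ratio=0.000%",
    "At iteration 640/10100/10100, mean rms=0.185%, delta=0.024%, "
    "BCER train=0.180%, BWER train=0.730%, skip ratio=0.000%",
]

if __name__ == "__main__":
    for line in sample_log:
        print(parse_progress(line))
    print("still learning:", still_learning(sample_log))
```

This only compares the last two progress lines, which is a crude check. It is 
just meant to show the idea of watching the first number rather than the BCER.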
