The default traineddata for Amharic is pretty accurate, except that it misses a handful of characters. I have been emulating what Shree did to add the Norwegian Æ to the dataset, and it actually worked like a charm.
The problem is that I cannot get anywhere near the accuracy of the original best model. I used 65,000 lines of text and trained for up to 400,000 iterations. What do you think is going on? Does cutting the top layer and training over it degrade the original model?

The interesting part is that the original accuracy stayed almost intact up to about 8,000 iterations, but the new character only starts being recognized after around 15,000 iterations. (Increasing the frequency of the target character in the training text doesn't seem to help much.) Any suggestions, please?
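For reference, the layer-replacement fine-tune I'm describing looks roughly like this. This is only a sketch of my setup: the paths, the --append_index value, and the net_spec layer sizes are placeholders and would need to match your own model and the size of your extended unicharset.

```shell
# Extract the LSTM model from the existing best traineddata
# (amh.traineddata and the output paths are placeholders).
combine_tessdata -e amh.traineddata amh.lstm

# Cut the network at the given index and append a fresh top layer,
# then fine-tune on the new training data that includes the
# previously missing characters. Layer spec values are illustrative.
lstmtraining \
  --continue_from amh.lstm \
  --old_traineddata amh.traineddata \
  --traineddata amh_new/amh.traineddata \
  --append_index 5 \
  --net_spec '[Lfx192 O1c1]' \
  --model_output amh_output/amh_new \
  --train_listfile amh.training_files.txt \
  --max_iterations 400000
```

Because --append_index discards the original output layer, the new top layer starts from random weights, which is presumably why the target character only appears after many iterations.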