The default traineddata for Amharic is pretty accurate, except that it misses a handful of characters. I have been emulating what Shree did to add the Norwegian Æ to the dataset, and it actually worked like a charm.
The problem is that I cannot get anywhere near the accuracy of the original best model. I used 65,000 lines of text and trained for up to 400,000 iterations. What do you think is going on? Does cutting the top layer and training over it degrade the original model?

The interesting part is that the original accuracy stayed almost intact up to about 8,000 iterations, but the new character only starts being recognized after around 15,000 iterations. (Increasing the frequency of the target character in the training text doesn't seem to help much.) Any suggestions, please?
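For reference, the layer-replacement fine-tune I'm describing looks roughly like this. This is only a sketch of my setup: the paths, the --append_index value, and the net_spec layer sizes are placeholders and would need to match your own model and the size of your extended unicharset.

```shell
# Extract the LSTM model from the existing best traineddata
# (amh.traineddata and the output paths are placeholders).
combine_tessdata -e amh.traineddata amh.lstm

# Cut the network at the given index and append a fresh top layer,
# then fine-tune on the new training data that includes the
# previously missing characters. Layer spec values are illustrative.
lstmtraining \
  --continue_from amh.lstm \
  --old_traineddata amh.traineddata \
  --traineddata amh_new/amh.traineddata \
  --append_index 5 \
  --net_spec '[Lfx192 O1c1]' \
  --model_output amh_output/amh_new \
  --train_listfile amh.training_files.txt \
  --max_iterations 400000
```

Because --append_index discards the original output layer, the new top layer starts from random weights, which is presumably why the target character only appears after many iterations.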