I recently retrained the chi_tra model with a new font. The existing model
confused certain characters. In addition, the source images (I'm decoding
TV subtitles) had an oddly shaped question mark. In the sample below, the
last two characters are output as the number "7".
[image: chi_tra_7_0_Q
I had many similar issues, especially with input set in Yuan (rounded)
fonts. In the end I found the exact font that was used and ran additional
training with it.
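A minimal sketch of how line-level training data for that kind of extra training could be produced, assuming the tesstrain Makefile workflow (one image plus one `<name>.gt.txt` transcription per line) and Tesseract's `text2image` tool. The post does not say which tooling was used; the helper name, file layout, font name and fonts directory are hypothetical placeholders. `text2image` also writes its own `.box` file next to each image, so check how that fits with the box files tesstrain generates before relying on this.

```python
from pathlib import Path


def write_ground_truth(lines, gt_dir, font, fonts_dir):
    """Write <name>.gt.txt files and return text2image commands for them.

    Hypothetical helper: each returned command renders one transcription
    with `font` to <name>.tif (text2image also emits a .box file), giving
    the image/transcription pairs tesstrain looks for in its ground-truth
    folder. Executing the commands needs text2image and the font installed.
    """
    gt = Path(gt_dir)
    gt.mkdir(parents=True, exist_ok=True)
    commands = []
    for i, text in enumerate(lines):
        name = f"line_{i:05d}"
        gt_file = gt / f"{name}.gt.txt"
        # The transcription doubles as the text that text2image renders.
        gt_file.write_text(text + "\n", encoding="utf-8")
        commands.append([
            "text2image",
            f"--text={gt_file}",
            f"--outputbase={gt / name}",
            f"--font={font}",
            f"--fonts_dir={fonts_dir}",
        ])
    return commands
```

Each command can then be run with `subprocess.run(cmd, check=True)`. Lines containing the troublesome full-width question mark (for example `你在哪裡？`) are the obvious thing to include. Fine-tuning could then start from the stock model with something like `make training MODEL_NAME=chi_tra_subs START_MODEL=chi_tra TESSDATA=/path/to/tessdata`, where `chi_tra_subs` and the path are placeholders; for that model name tesstrain's default ground-truth folder is `data/chi_tra_subs-ground-truth`.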
Even after retraining, some characters would be confused with others (as
in your case). To strengthen those, I included many instan
Can anyone suggest some debug settings I can activate to help track down
why I'm getting no output?
Thanks
Danny
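A possible starting point, offered as a sketch rather than a known fix: run one subtitle image through the `tesseract` CLI with single-line segmentation (`--psm 7`), ask it to save the image it actually recognised (`-c tessedit_write_images=1`, which I believe is written as `tessinput.tif` in the current directory), and request the `tsv` output config so per-word confidences are visible even when the plain-text file would be empty. The wrapper function names, file names and the choice of `--psm 7` are assumptions; `tesseract --print-parameters` lists the `-c` variables your build supports.

```python
import shlex
import subprocess


def build_debug_command(image, out_base, lang="chi_tra", psm=7):
    """Return a tesseract CLI call intended to help explain an empty result.

    - "--psm 7" treats the image as a single text line, which may suit one
      subtitle row better than automatic page segmentation.
    - "-c tessedit_write_images=1" asks Tesseract to dump the binarised image
      it recognised (assumed to be tessinput.tif), so you can see whether the
      text survived thresholding.
    - The trailing "tsv" config writes <out_base>.tsv with per-word boxes and
      confidences instead of plain text.
    """
    return [
        "tesseract", image, out_base,
        "-l", lang,
        "--psm", str(psm),
        "-c", "tessedit_write_images=1",
        "tsv",
    ]


def run_debug(image, out_base):
    """Hypothetical helper: run the command and return Tesseract's stderr."""
    cmd = build_debug_command(image, out_base)
    print("running:", shlex.join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    # As far as I know, diagnostics (such as an "Empty page!!" warning)
    # go to stderr rather than into the output file.
    return result.stderr
```

For example, `print(run_debug("subtitle.png", "subtitle_debug"))` (placeholder names) shows any warnings, after which `subtitle_debug.tsv` and the dumped image can be inspected. If the dumped image is blank or the text is broken up, the problem is likely in preprocessing rather than in the chi_tra model.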
On Tuesday, July 30, 2024 at 8:23:38 PM UTC+8 Danny wrote:
> I have a problem where tesseract produces no output (zero byte output
> file) when presented with Chinese characters f