Re: [tesseract-ocr] Re: why are there no new trained models since 2018?

2024-08-02 Thread 'Danny' via tesseract-ocr
I recently retrained the chi_tra model with a new font. The existing model would confuse certain characters. In addition, the source images (I'm decoding TV subtitles) had a weirdly shaped question mark. In the sample below, the last two characters are output as the number "7". [image: chi_tra_7_0_Q

[tesseract-ocr] Re: Chinise characters.

2024-08-02 Thread 'Danny' via tesseract-ocr
I had many similar issues, especially with input in Yuan (rounded) fonts. In the end I found the exact font used and ran additional training with the new font. Even after retraining, some characters would still be confused with others (as in your case). To strengthen those, I included many instan

[tesseract-ocr] Re: No output when Chinese Traditional followed by dots or ellipsis

2024-08-02 Thread 'Danny' via tesseract-ocr
Can anyone suggest some debug settings I can activate to track down why I'm getting no output? Thanks, Danny. On Tuesday, July 30, 2024 at 8:23:38 PM UTC+8 Danny wrote: > I have a problem where tesseract produces no output (zero-byte output file) when presented with Chinese characters f