[tesseract-ocr] Tesseract improve prediction accuracy

Kehinde Adeoya Fri, 02 Dec 2022 01:27:13 -0800

Environment
   
   - *Tesseract Version*: 5.0.1-1.5.7, Tessdata: 3.04, Langdata: 3.04
   - *Platform*: 21.5.0 Darwin Kernel Version 21.5.0: root:xnu-8020.121.3~4/RELEASE_X86_64 x86_64 i386 Darwin


Current Behavior:

After training a font, Tesseract is unable to differentiate between font 
weights. The project uses a range of font weights, from 100 and 200 up to 
900. Is there a way to get the font weight as an attribute? Tesseract only 
reports whether text is bold, so there is no way to check the actual weight.
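
To make the information loss concrete: weights from 100 to 900 follow the 
CSS numeric scale, where 400 is normal and 700 is bold. A single bold flag 
can only split them into two groups. The sketch below is a hypothetical 
helper, not part of Tesseract's API, and the cutoff of 600 is an assumption. 
It shows how all nine weights collapse into two values, so the original 
weight cannot be recovered from the flag alone.

```python
# Hypothetical helper: NOT a Tesseract API. Collapses a CSS-style numeric
# font weight (100-900) into the single bold/not-bold flag that an engine
# reporting only "bold" can express.
BOLD_CUTOFF = 600  # assumption: weights at or above this count as bold


def weight_to_bold_flag(weight: int) -> bool:
    """Map a numeric weight (100..900, steps of 100) to a boolean bold flag."""
    if weight not in range(100, 1000, 100):
        raise ValueError(f"expected 100..900 in steps of 100, got {weight}")
    return weight >= BOLD_CUTOFF


def collapse_weights(weights):
    """Group numeric weights by the bold flag they map to."""
    groups = {True: [], False: []}
    for w in weights:
        groups[weight_to_bold_flag(w)].append(w)
    return groups


if __name__ == "__main__":
    groups = collapse_weights(range(100, 1000, 100))
    print("bold:", groups[True])       # bold: [600, 700, 800, 900]
    print("not bold:", groups[False])  # not bold: [100, 200, 300, 400, 500]
```

Any weight inside the same group (for example 600 and 900) produces an 
identical flag, which is why a separate weight estimate would be needed to 
tell them apart.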

Passport 
<https://user-images.githubusercontent.com/1056293/205259076-50d3e944-97ea-4370-b0e0-1c17e4ae75f3.png>

Secondly, Tesseract's predictions seem unstable. I have followed every 
recommendation for improving accuracy, yet the predictions remain 
inconsistent. The image above is a prime example: sometimes Tesseract 
detects the text as bold, which is correct, and on the next run it may 
detect it as normal weight. The font weight is 700, which corresponds to 
bold. I have run the same test case more than 10 times, and the results can 
be split, for example bold = 6, normal = 4.
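
One way to quantify this instability is to record the weight label from each 
repeated run on the same image and tally the results. The sketch below is a 
hypothetical helper; the labels "bold" and "normal" and the per-run list are 
assumptions about how results were recorded, and the example reuses the 
6/4 split described above. An agreement rate of 1.0 would mean every run 
returned the same label.

```python
from collections import Counter


def summarize_runs(labels):
    """Tally per-run weight labels (hypothetical format, e.g. "bold"/"normal")
    and return the counts, the majority label, and the fraction of runs that
    agreed with it (1.0 means fully consistent)."""
    if not labels:
        raise ValueError("need at least one run")
    counts = Counter(labels)
    majority, majority_count = counts.most_common(1)[0]
    return counts, majority, majority_count / len(labels)


if __name__ == "__main__":
    # Example: the 6 bold / 4 normal split from 10 runs described above.
    runs = ["bold"] * 6 + ["normal"] * 4
    counts, majority, agreement = summarize_runs(runs)
    print(dict(counts), majority, agreement)  # {'bold': 6, 'normal': 4} bold 0.6
```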

Expected Behavior:

Predictions should be consistent across runs, and Tesseract should 
differentiate between font weights.
