It seems that you do not use Tesseract directly (Page.getText()), so it
would be good to describe what you do and how you do it.
It could be useful to post the original images; maybe there is a better way
to preprocess them.
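Camera pictures often have uneven lighting. With a single global threshold, darker regions of the background can turn black along with the text. One common alternative is adaptive (local mean) thresholding. Below is a minimal pure-Python sketch. It assumes the image has already been loaded as a grayscale 2-D list of values from 0 to 255 (loading and saving are not shown). `adaptive_threshold`, `radius` and `offset` are illustrative names, not part of Tesseract or any wrapper API.

```python
def integral_image(gray):
    """Summed-area table with a zero border: ii[y][x] = sum of gray rows < y, cols < x."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def adaptive_threshold(gray, radius=7, offset=10):
    """Binarize a grayscale image (0 = black text, 255 = white background).

    A pixel becomes black when it is darker than the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood minus `offset`.
    Windows are clipped at the image borders.
    """
    h, w = len(gray), len(gray[0])
    ii = integral_image(gray)
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Sum of the clipped window in O(1) using the integral image.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            if gray[y][x] < mean - offset:
                out[y][x] = 0
    return out
```

`radius` should be clearly larger than the stroke width of the characters. The values here are starting points to tune on the real images, not recommended settings.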


On Thu, 16 Sep 2021 at 7:59, vis li <> wrote:

> Tesseract version: 4.1.1
> Platform: Windows 10
> [image: testa.png]
> [image: testb.png]
> Page.getText():
> RQSUTV¥WYaZbdcef
> 1ppp000012121010
> &*(O+-,.:; O=%/
> As you can see, the result has some errors.
> I know that my images have some defects, but how can I improve this?
> I have already binarized the pictures and tried increasing the DPI to 300.
> Because the pictures are captured by a camera, I am worried whether they
> meet the standard for web pictures.
> I have used LSTM mode, and my recognition model (traineddata) file was
> trained with LSTM on the Microsoft YaHei standard font.
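The reply asks how Tesseract is being called. One way to make the setup reproducible is to run the same images through the tesseract command line. That way the LSTM engine, page segmentation mode and DPI are stated explicitly. Below is a minimal Python sketch. It assumes tesseract 4.1.1 is on PATH and that the custom model is installed as `yahei.traineddata`, which is a hypothetical name. The helpers `build_tesseract_cmd` and `run_tesseract` are illustrative, not part of any wrapper's API.

```python
import subprocess


def build_tesseract_cmd(image, lang, psm=6, dpi=300, tessdata_dir=None):
    """Build an argument list for the tesseract 4.x command line.

    --oem 1 selects the LSTM engine only; --psm sets the page segmentation
    mode (6 = single uniform block of text, 7 = single text line); --dpi
    tells Tesseract the resolution when the image metadata lacks it.
    Recognized text is written to stdout.
    """
    cmd = ["tesseract", str(image), "stdout", "-l", lang,
           "--oem", "1", "--psm", str(psm), "--dpi", str(dpi)]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def run_tesseract(image, lang, **kwargs):
    """Run tesseract (must be on PATH) and return the recognized text."""
    result = subprocess.run(build_tesseract_cmd(image, lang, **kwargs),
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

For example, `run_tesseract("testa.png", "yahei", psm=7)` would treat the image as a single text line. Comparing that output with Page.getText() on the same image shows whether the wrapper or the images are the problem.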
