It seems that you are not using tesseract directly (Page.getText()), so it would help to describe what you are doing and how. It would also be useful to post the original images; there may be a better way to preprocess them.
Zdenko

On Thu, 16 Sep 2021 at 7:59, vis li <liwei9...@gmail.com> wrote:

> Tesseract Version: 4.1.1
> Platform: Windows 10
>
> <https://user-images.githubusercontent.com/51877381/133545017-12e2b715-be45-4198-8035-9838c5375ea9.png> [image: testa.png]
>
> <https://user-images.githubusercontent.com/51877381/133545026-66cdd822-6885-4561-aa8c-d13496573a62.png> [image: testb.png]
>
> Page.getText():
>
> ACBEDFHGIKJLNHOP
> RQSUTV¥WYaZbdcef
>
> 1ppp000012121010
> &*(O+-,.:; O=%/
>
> As shown above, the result has some errors.
> I know that my images have some defects, but how can I improve this situation?
> I have binarized the pictures and tried raising the DPI to 300.
> Because the pictures were captured by a camera, I am worried about whether they
> meet the standard for web pictures.
>
> I have used LSTM mode, and my recognition library file was trained with
> LSTM on the Microsoft YaHei Standard font.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/96ce0479-bc22-477d-9d5b-a6408509121fn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/96ce0479-bc22-477d-9d5b-a6408509121fn%40googlegroups.com?utm_medium=email&utm_source=footer>.
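
The question mentions binarizing the images before OCR but does not say which method was used. As a rough sketch only, and assuming the image is available as a flat list of 8-bit grayscale values, Otsu's method is one common way to choose a global threshold. The function names below are illustrative and are not part of Tesseract or of the poster's code.

```python
def otsu_threshold(pixels):
    """Pick the gray level (0..255) that maximizes between-class variance.

    Pixels with value <= threshold form one class, the rest the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break             # every pixel is already in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (at or below threshold) or 255 (above it)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Hypothetical sample: three dark pixels and three bright pixels.
    sample = [10, 12, 11, 200, 210, 205]
    t = otsu_threshold(sample)
    print(t, binarize(sample, t))
```

A single global threshold like this tends to struggle with uneven lighting, which is common in camera captures, so posting the original unprocessed images would still help others suggest a better approach.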