Hi Vli, I think you should test this on something similar to your actual text, not on the alphabet or random strings. With real text you are not going to see () or <> that may be mistaken for an O.
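If stray symbols such as () or <> still turn up on real text, Tesseract can be told never to output them. Here is a minimal sketch, assuming the tesseract command-line tool rather than whatever wrapper provides Page.getText(). The helper name build_tesseract_cmd is made up for illustration, and "()<>" is only an example set of characters. tessedit_char_blacklist is a Tesseract config variable; how well the LSTM engine honours it may depend on the version, so treat this as something to try, not a guaranteed fix.

```python
import shlex


def build_tesseract_cmd(image_path, lang="eng", blacklist="()<>"):
    """Build the argument list for a tesseract CLI call.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # "stdout" as the output base makes tesseract print the text instead of
    # writing a .txt file; -l selects the language model.
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if blacklist:
        # -c sets a config variable; this one lists characters to exclude.
        cmd += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(build_tesseract_cmd("testa.png")))
```

With Tesseract installed, the returned list can be passed to subprocess.run(cmd, capture_output=True, text=True). A wrapper library would need the same variable set through its own API, which is not shown here.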
The sequence of characters may influence the output; in other words, try it on real text. You can also blacklist the characters you do not need.

To be honest, the result does not seem bad to me. Special characters are the most difficult ones. Also, this font is not easy to read; look at the letter M, for example. If you can, change the font or try to capture the image at a higher resolution before cleaning it.

What language is zth? This looks like Latin text; did you try eng?

Lorenzo

On Thu, 16 Sep 2021 at 07:59, vis li <liwei9...@gmail.com> wrote:

> Tesseract Version:4.1.1
> Platform:Window10
>
> <https://user-images.githubusercontent.com/51877381/133545017-12e2b715-be45-4198-8035-9838c5375ea9.png>[image: testa.png]
>
> <https://user-images.githubusercontent.com/51877381/133545026-66cdd822-6885-4561-aa8c-d13496573a62.png>[image: testb.png]
> Page.getText():
>
> ACBEDFHGIKJLNHOP
> RQSUTV¥WYaZbdcef
>
> 1ppp000012121010
> &*(O+-,.:; O=%/
>
> like this,the result has some faults.
> I know that my image has some defects,but how can i improve this situation?
> I have done the binarization of the picture,and try to improve dpi to 300
> Because the pictures captured by the camera,I am worried if they can meet
> the standard for web pictures
>
> I have used LTSM mode ,and my Identified word library file is trained by
> LTSM and Microsoft Yahei Standard font
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/96ce0479-bc22-477d-9d5b-a6408509121fn%40googlegroups.com