Hi Vli,
I think you should test this on something similar to your actual text, not
on the alphabet or random strings. With real text you are unlikely to see
() or <> in places where they could be mistaken for an O.

The sequence of characters can influence the output, so results on real
text may differ from results on isolated symbols. You can also blacklist
the characters you do not need.
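
A minimal sketch of the blacklist idea, assuming the stock tesseract
command-line tool rather than the wrapper you call from your code.
tessedit_char_blacklist is a real Tesseract config variable (as far as I
know, with the LSTM engine it needs 4.1 or later). The helper name
build_tesseract_command, the file name testa.png and the eng language are
only placeholders for illustration. The function just builds the argument
list; it does not run OCR.

```python
import shlex


def build_tesseract_command(image_path, lang="eng", blacklist=""):
    """Return an argv list for the tesseract CLI (hypothetical helper)."""
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing it to a file.
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if blacklist:
        # -c sets a config variable; the listed characters are excluded
        # from the recognized output.
        cmd += ["-c", "tessedit_char_blacklist=" + blacklist]
    return cmd


if __name__ == "__main__":
    # Placeholder image name and blacklist; adjust both to your own data.
    argv = build_tesseract_command("testa.png", lang="eng", blacklist="()<>")
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(argv))
```

Pass the list to subprocess.run if you want to execute it from Python, or
set the same variable through whatever mechanism your wrapper offers.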

To be honest, the result does not seem bad to me. Special characters are
the most difficult ones.
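
To put a number on "not bad", one option is to compare the OCR output with
a hand-typed transcription of the same image and compute a character error
rate. This is only a sketch using plain edit distance; the two strings in
the example are made up, not taken from your images, and the function names
are my own.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(expected, recognized):
    """Edit distance divided by the length of the expected text."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return edit_distance(expected, recognized) / len(expected)


if __name__ == "__main__":
    expected = "The quick brown fox"     # made-up ground truth
    recognized = "The qu1ck brovvn fox"  # made-up OCR output
    print(f"CER: {char_error_rate(expected, recognized):.3f}")
```

Running this on a page of real text before and after each preprocessing
change shows whether the change actually helps.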

Also, this font is not easy to read; look at the letter M, for example. If
you can, change the font or capture the image at a higher resolution
before cleaning it.

What language is zth? This looks like Latin-script text. Did you try eng?


Lorenzo

On Thu, 16 Sep 2021 at 07:59, vis li <liwei9...@gmail.com> wrote:

> Tesseract Version: 4.1.1
> Platform: Windows 10
>
> <https://user-images.githubusercontent.com/51877381/133545017-12e2b715-be45-4198-8035-9838c5375ea9.png>[image:
> testa.png]
>
> <https://user-images.githubusercontent.com/51877381/133545026-66cdd822-6885-4561-aa8c-d13496573a62.png>[image:
> testb.png]
> Page.getText():
>
> ACBEDFHGIKJLNHOP
> RQSUTV¥WYaZbdcef
>
> 1ppp000012121010
> &*(O+-,.:; O=%/
>
> As shown above, the result has some errors.
> I know that my image has some defects, but how can I improve the result?
> I have binarized the picture and tried to raise the DPI to 300.
> Because the pictures are captured by a camera, I am worried about whether
> they can meet the standard for web pictures.
>
> I have used LSTM mode, and my recognition data file was trained with
> LSTM on the Microsoft YaHei Standard font.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/96ce0479-bc22-477d-9d5b-a6408509121fn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/96ce0479-bc22-477d-9d5b-a6408509121fn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
