1. If you have quality problems, it is good to experiment with the tesseract executable instead of the API ;-)
2. It is known that passing text (in your case just one letter) with no margin around it is not the best idea. Please try adding a small white border, e.g. 10 px.
3. Please set the DPI for the image after SetImage.
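As a minimal sketch of point 2 (not part of the original reply), the library-free C++ below pads a grayscale bitmap with a white border. The GrayImage struct and the addWhiteBorder function are hypothetical names introduced only for illustration. With the Leptonica Pix used in the quoted code, the same effect should be obtainable with pixAddBorder before calling SetImage, and point 3 should correspond to TessBaseAPI::SetSourceResolution called after SetImage. Please verify both against your library version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: an 8-bit grayscale bitmap in row-major order
// (0 = black ink, 255 = white background).
struct GrayImage {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;  // width * height bytes
};

// Return a copy of `src` surrounded by `border` white pixels on every side.
GrayImage addWhiteBorder(const GrayImage& src, std::size_t border) {
    // Precondition: the buffer size must match the stated dimensions.
    assert(src.pixels.size() == src.width * src.height);

    GrayImage dst;
    dst.width = src.width + 2 * border;
    dst.height = src.height + 2 * border;
    dst.pixels.assign(dst.width * dst.height, 255);  // start fully white

    // Copy the original pixels into the centre of the enlarged canvas.
    for (std::size_t y = 0; y < src.height; ++y) {
        for (std::size_t x = 0; x < src.width; ++x) {
            dst.pixels[(y + border) * dst.width + (x + border)] =
                src.pixels[y * src.width + x];
        }
    }
    return dst;
}
```

For example, a 2x2 glyph passed to addWhiteBorder(glyph, 10) becomes a 22x22 image, with the original pixels starting at row 10, column 10.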
See attachment for improved images.

$ tesseract char_4_b.png - --psm 10 -c page_separator=""
4

For single-character recognition the legacy engine is better, and it can process your images without modification (but the rules above are generally good to follow!):

$ tesseract char_0.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
0
$ tesseract char_4.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
4
$ tesseract char_h.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
h

Zdenko

On Mon, 15 Oct 2018 at 22:44, 'Yuliana Zigangirova' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote:

> Hi everyone,
>
> I am trying to use Tesseract for single-character recognition and the results are awful.
> "h" is recognized as "n", "4" as "/i", "O" as "()".
>
> [image: 1testtiff.png]
>
> [image: 6testtiff.png]
>
> [image: 2testtiff.png]
>
> Single character mode seems to have no effect, as many characters are recognized as two characters, not just one.
> My images are simple bilevel black and white TIFF images, Latin characters.
> This is a bitmap font, not scanned images; they are absolutely clean and need no improvement.
> Only about half of the characters are correctly recognized, which seems to be a very low percentage for such a simple task.
>
> The Tesseract library version I am using is "4.0.0-beta.3".
> This is how I call Tesseract.
>
> int CharRecognizer::recognizeTIFFData(char* data, int datalength){
>     char *outText;
>     TessBaseAPI* api = new TessBaseAPI();
>     // Initialize tesseract-ocr with English, without specifying tessdata path
>     if (api->Init(NULL, "deu")) {
>         fprintf(stderr, "Could not initialize tesseract.\n");
>         exit(1);
>     }
>     api->SetPageSegMode(tesseract::PSM_SINGLE_CHAR);
>     Pix *image = pixReadMem(data,datalength);
>     api->SetImage(image);
>     // Get OCR result
>     outText = api->GetUTF8Text();
>     printf("\nOCR output:\n%s", outText);
>     // Destroy used object and release memory
>     int utf8 = outText[0];
>     api->End();
>     delete[] outText;
>     pixDestroy(&image);
>     return utf8;
> }
>
> I am new to Tesseract, so I am probably missing something. Do I have to somehow train the library first?
> Maybe I should set another OcrEngineMode? I expected no problems recognizing a simple bitmap font and am quite at a loss now.
> Thank you very much in advance,
> Yuliana
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f3cbddee-f620-4479-a967-97b52c98c64c%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.