Thank you very much, I'll try all the suggested changes. I have already tried adding borders, and that seems to work!

Yuliana
On Tuesday, October 16, 2018 at 9:04:04 AM UTC+3, zdenop wrote:
>
> 1. If you have a quality problem, it is good to experiment with the
> tesseract executable instead of the API ;-)
> 2. It is known that passing text with no margin around it (in your case
> just one letter) is not the best idea; please try adding a small white
> border, e.g. 10 px.
> 3. Please set the DPI for the image after SetImage.
>
> See the attachment for the improved images.
> $ tesseract char_4_b.png - --psm 10 -c page_separator=""
> 4
>
> For single-character recognition the legacy engine is better, and it can
> process your images without modification (but the rules above are
> generally good to follow!):
> $ tesseract char_0.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
> 0
>
> $ tesseract char_4.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
> 4
>
> $ tesseract char_h.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
> h
>
> Zdenko
>
>
> On Mon, Oct 15, 2018 at 22:44 'Yuliana Zigangirova' via tesseract-ocr <
> tesser...@googlegroups.com> wrote:
>
>> Hi everyone,
>>
>> I am trying to use Tesseract for single-character recognition and the
>> results are awful:
>> "h" is recognized as "n", "4" as "/i", "O" as "()".
>>
>> [image: 1testtiff.png]
>>
>> [image: 6testtiff.png]
>>
>> [image: 2testtiff.png]
>>
>> Single-character mode does not seem to take effect, as many characters
>> are recognized as two characters, not just one. My images are simple
>> bilevel black-and-white TIFF images of Latin characters. This is a bitmap
>> font, not scanned images; they are absolutely clean and need no
>> improvement.
>> Only about half of the characters are recognized correctly, which seems
>> to be a very low rate for such a simple task.
>>
>> The Tesseract library version I am using is "4.0.0-beta.3".
>> This is how I call Tesseract.
>>
>> #include <cstdio>
>> #include <cstdlib>
>> #include <leptonica/allheaders.h>
>> #include <tesseract/baseapi.h>
>>
>> int CharRecognizer::recognizeTIFFData(char* data, int datalength){
>>     char *outText;
>>     tesseract::TessBaseAPI* api = new tesseract::TessBaseAPI();
>>     // Initialize tesseract-ocr with German ("deu"), without specifying
>>     // the tessdata path
>>     if (api->Init(NULL, "deu")) {
>>         fprintf(stderr, "Could not initialize tesseract.\n");
>>         delete api;
>>         exit(1);
>>     }
>>     api->SetPageSegMode(tesseract::PSM_SINGLE_CHAR);
>>     // pixReadMem() takes const l_uint8*, so the char buffer is cast
>>     Pix *image = pixReadMem(reinterpret_cast<const l_uint8*>(data),
>>                             datalength);
>>     api->SetImage(image);
>>     // Get OCR result
>>     outText = api->GetUTF8Text();
>>     printf("\nOCR output:\n%s", outText);
>>     // First byte of the UTF-8 result (a whole character only for ASCII)
>>     int utf8 = outText[0];
>>     // Destroy used objects and release memory
>>     api->End();
>>     delete api;
>>     delete[] outText;
>>     pixDestroy(&image);
>>     return utf8;
>> }
>>
>> I am new to Tesseract, so I am probably missing something. Do I have to
>> train the library somehow first? Maybe I should set a different
>> OcrEngineMode? I expected no problems recognizing a simple bitmap font
>> and am quite at a loss now.
>> Thank you very much in advance,
>> Yuliana
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/f3cbddee-f620-4479-a967-97b52c98c64c%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
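zdenop's second suggestion, a small white border (e.g. 10 px) around a lone glyph, can be sketched in plain C++ on a hypothetical 8-bit grayscale buffer, assuming 0 = black and 255 = white. With the code from the question the image is a Leptonica `PIX`, where `pixAddBorder()` does the same job (the white value is 0 for a 1 bpp image and 255 for 8 bpp), and the DPI from suggestion 3 would be set with `TessBaseAPI::SetSourceResolution()` after `SetImage()`. The `GrayImage` type and `addBorder()` function below are illustrative names only, not part of Tesseract or Leptonica.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type for this sketch: row-major,
// one byte per pixel, 0 = black, 255 = white.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // size == width * height
};

// Returns a copy of `src` surrounded by `border` pixels on every side,
// filled with `fill` (255 = white by default).
GrayImage addBorder(const GrayImage& src, int border, std::uint8_t fill = 255) {
    assert(border >= 0);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * src.height);

    GrayImage dst;
    dst.width = src.width + 2 * border;
    dst.height = src.height + 2 * border;
    // Start with a canvas filled entirely with the background value...
    dst.pixels.assign(static_cast<std::size_t>(dst.width) * dst.height, fill);

    // ...then copy every source pixel into the interior, offset by `border`.
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            dst.pixels[static_cast<std::size_t>(y + border) * dst.width +
                       (x + border)] =
                src.pixels[static_cast<std::size_t>(y) * src.width + x];
        }
    }
    return dst;
}
```

For example, a 2×1 black glyph padded with `addBorder(glyph, 10)` becomes a 22×21 image with the glyph at offset (10, 10) and white everywhere else.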