Hi everyone, I am trying to use Tesseract for single-character recognition and the results are awful: "h" is recognized as "n", "4" as "/i", and "O" as "()".
[image: 1testtiff.png] [image: 6testtiff.png] [image: 2testtiff.png]

Single character mode seems to have no effect, as many characters are recognized as two characters, not just one. My images are simple bilevel black-and-white TIFF images of Latin characters. This is a bitmap font, not scanned images, so they are absolutely clean and need no improvement. Only about half of the characters are recognized correctly, which seems a very low rate for such a simple task.

The Tesseract library version I am using is "4.0.0-beta.3". This is how I call Tesseract:

    #include <cstdio>
    #include <cstdlib>
    #include <tesseract/baseapi.h>
    #include <leptonica/allheaders.h>

    using tesseract::TessBaseAPI;

    int CharRecognizer::recognizeTIFFData(char* data, int datalength){
        char *outText;
        TessBaseAPI* api = new TessBaseAPI();
        // Initialize tesseract-ocr with German ("deu"), without specifying tessdata path
        if (api->Init(NULL, "deu")) {
            fprintf(stderr, "Could not initialize tesseract.\n");
            exit(1);
        }
        api->SetPageSegMode(tesseract::PSM_SINGLE_CHAR);
        // pixReadMem expects a pointer to unsigned bytes
        Pix *image = pixReadMem(reinterpret_cast<const l_uint8*>(data), datalength);
        api->SetImage(image);
        // Get OCR result
        outText = api->GetUTF8Text();
        printf("\nOCR output:\n%s", outText);
        // Only the first byte of the UTF-8 result is returned
        int utf8 = outText[0];
        // Destroy used objects and release memory
        api->End();
        delete api;
        delete[] outText;
        pixDestroy(&image);
        return utf8;
    }

I am new to Tesseract, so I am probably missing something. Do I have to train the library somehow first? Maybe I should set another OcrEngineMode? I expected no problems recognizing a simple bitmap font and am quite at a loss now.

Thank you very much in advance,
Yuliana
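A side note on the return value of the function above: it returns only outText[0], the first byte of the UTF-8 string, so a multi-byte character such as "ä" (possible with the "deu" model) would come back truncated, and a two-character result such as "/i" would silently look like a one-character result. The following is a minimal sketch in standard C++ of how the result could be checked. The names firstCodePoint and countVisibleCodePoints are hypothetical helpers, not part of the Tesseract API, and the sketch assumes the returned text may carry trailing whitespace such as a newline. It does not reject overlong encodings.

```cpp
#include <cstddef>

// Hypothetical helper (not a Tesseract function): decodes the first UTF-8
// code point of a NUL-terminated string. Returns -1 for empty or malformed
// input. If `length` is non-null, it receives the number of bytes consumed.
long firstCodePoint(const char* text, std::size_t* length = nullptr) {
    if (length) *length = 0;
    if (text == nullptr || text[0] == '\0') return -1;

    const unsigned char lead = static_cast<unsigned char>(text[0]);
    std::size_t n = 0;
    long cp = 0;
    if (lead < 0x80)                { n = 1; cp = lead; }         // ASCII
    else if ((lead & 0xE0) == 0xC0) { n = 2; cp = lead & 0x1F; }  // 2-byte sequence
    else if ((lead & 0xF0) == 0xE0) { n = 3; cp = lead & 0x0F; }  // 3-byte sequence
    else if ((lead & 0xF8) == 0xF0) { n = 4; cp = lead & 0x07; }  // 4-byte sequence
    else return -1;                                               // stray continuation byte

    for (std::size_t i = 1; i < n; ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if ((c & 0xC0) != 0x80) return -1;  // also catches a premature NUL
        cp = (cp << 6) | (c & 0x3F);
    }
    if (length) *length = n;
    return cp;
}

// Hypothetical helper: counts the code points in `text`, ignoring ASCII
// whitespace. Malformed bytes are skipped and not counted.
std::size_t countVisibleCodePoints(const char* text) {
    std::size_t count = 0;
    while (text != nullptr && *text != '\0') {
        std::size_t n = 0;
        const long cp = firstCodePoint(text, &n);
        if (cp < 0) { ++text; continue; }
        if (cp != ' ' && cp != '\n' && cp != '\r' && cp != '\t' && cp != '\f') ++count;
        text += n;
    }
    return count;
}
```

In recognizeTIFFData, firstCodePoint(outText) could replace outText[0], and a countVisibleCodePoints(outText) value greater than 1 would flag the cases where single character mode produced more than one character.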