Hi everyone,

I am trying to use Tesseract for single-character recognition and the
results are awful.
"h" is recognized as "n", "4" as "/i", and "O" as "()".

[image: 1testtiff.png]

[image: 6testtiff.png]


[image: 2testtiff.png]



Single-character mode does not seem to take effect, since many characters
are recognized as two characters, not just one. My images are simple bilevel
black-and-white TIFF images of Latin characters. They are rendered from a
bitmap font, not scanned, so they are absolutely clean and need no
improvement.
Only about half of the characters are recognized correctly, which seems a
very low rate for such a simple task.

The Tesseract library version I am using is "4.0.0-beta.3".
This is how I call Tesseract:

 int CharRecognizer::recognizeTIFFData(char* data, int datalength){
             TessBaseAPI* api = new TessBaseAPI();
             // Initialize tesseract-ocr with the "deu" (German) language data,
             // without specifying a tessdata path
             if (api->Init(NULL, "deu")) {
                     fprintf(stderr, "Could not initialize tesseract.\n");
                     exit(1);
             }
             // Treat the image as a single character
             api->SetPageSegMode(tesseract::PSM_SINGLE_CHAR);
             // Decode the in-memory TIFF data into a Leptonica Pix
             Pix *image = pixReadMem(reinterpret_cast<const l_uint8*>(data), datalength);
             api->SetImage(image);
             // Get OCR result
             char *outText = api->GetUTF8Text();
             printf("\nOCR output:\n%s", outText);
             // Note: only the first byte of the UTF-8 result is returned
             int utf8 = outText[0];
             // Destroy used objects and release memory
             api->End();
             delete api;
             delete[] outText;
             pixDestroy(&image);
             return utf8;
 }
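
A side note on the return value: the function above returns `outText[0]`, which is only the first byte of the UTF-8 string. That hides two things, namely results made of two characters (such as "/i" or "()") and non-ASCII characters, which the "deu" model can produce and which take more than one byte. The sketch below shows one possible way to turn the OCR string into a single code point, or -1 if the text is not exactly one character. The helper names `decodeUtf8` and `singleCodePoint` are hypothetical and not part of the Tesseract API. The sketch assumes the OCR text may end in whitespace such as newlines, and it does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Tesseract): decode the UTF-8 sequence that
// starts at s[pos]. On success, advance pos past it and return the code point.
// On malformed or truncated input, return -1 and leave pos unchanged.
static std::int32_t decodeUtf8(const std::string& s, std::size_t& pos) {
    if (pos >= s.size()) return -1;
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;
    std::int32_t cp = 0;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else return -1;                       // stray continuation or invalid lead byte
    if (pos + len > s.size()) return -1;  // sequence is cut off
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) return -1;  // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    pos += len;
    return cp;
}

// Hypothetical helper: return the recognized character as a Unicode code
// point, or -1 if the text (ignoring trailing whitespace) is empty, malformed,
// or contains more than one character.
std::int32_t singleCodePoint(const char* text) {
    if (text == nullptr) return -1;
    std::string s(text);
    // Assumption: OCR output may end with newlines or other whitespace.
    while (!s.empty() && (s.back() == '\n' || s.back() == '\r' ||
                          s.back() == '\f' || s.back() == '\t' ||
                          s.back() == ' ')) {
        s.pop_back();
    }
    std::size_t pos = 0;
    const std::int32_t cp = decodeUtf8(s, pos);
    if (cp < 0 || pos != s.size()) return -1;  // malformed or extra characters
    return cp;
}
```

With something like this, `recognizeTIFFData` could return `singleCodePoint(outText)` instead of `outText[0]`, so that a two-character result is reported as a failure rather than silently truncated.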

I am new to Tesseract, so I am probably missing something. Do I have to
train the library first? Or should I set a different OcrEngineMode? I
expected no problems recognizing a simple bitmap font and am quite at a
loss now.

Thank you very much in advance,
Yuliana
