Thank you very much, I'll try all the suggested changes. I have already tried 
adding borders, and they seem to work!
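
In case it helps anyone reading later, here is a minimal sketch of the border step, not the exact code from my project. It assumes the glyph is already available as a plain in-memory bilevel bitmap with one byte per pixel, where 0 is white and 1 is black. The Bitmap struct and addWhiteBorder function are hypothetical names for illustration only; they are not part of the Tesseract or Leptonica API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container for a bilevel image: one byte per pixel,
// 0 = white (background), 1 = black (ink), stored row by row.
struct Bitmap {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // size == width * height
};

// Returns a copy of `src` surrounded by `border` white pixels on every side.
Bitmap addWhiteBorder(const Bitmap& src, int border) {
    if (border < 0 || src.width < 0 || src.height < 0 ||
        src.pixels.size() != static_cast<std::size_t>(src.width) * src.height) {
        throw std::invalid_argument("inconsistent bitmap or negative border");
    }
    Bitmap dst;
    dst.width = src.width + 2 * border;
    dst.height = src.height + 2 * border;
    // Start from an all-white canvas, then copy the glyph into the middle.
    dst.pixels.assign(static_cast<std::size_t>(dst.width) * dst.height, 0);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            dst.pixels[static_cast<std::size_t>(y + border) * dst.width + (x + border)] =
                src.pixels[static_cast<std::size_t>(y) * src.width + x];
        }
    }
    return dst;
}
```

With border = 10 (the value suggested in the reply below), a 2x1 glyph becomes a 22x21 image with the original pixels starting at row 10, column 10.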
Yuliana

On Tuesday, October 16, 2018 at 9:04:04 AM UTC+3, zdenop wrote:
>
>
>    1. If you have a quality problem, it is good to experiment with the 
>    tesseract executable instead of the API ;-)
>    2. It is known that passing just the text (in your case a single letter) 
>    is not the best idea; please try adding a small white border, e.g. 10 px.
>    3. Please set the DPI for the image after SetImage.
>
> See the attachment for improved images. 
> $ tesseract char_4_b.png - --psm 10 -c page_separator=""
> 4
>
> For single-character recognition the legacy engine is better, and it can 
> process your images without modification (but the rules above are generally 
> good to follow!):
> $ tesseract char_0.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
> 0
>
> $ tesseract char_4.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
> 4
>
> $ tesseract char_h.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
> h
>
> Zdenko
>
>
> On Mon, 15 Oct 2018 at 22:44, 'Yuliana Zigangirova' via tesseract-ocr <
> tesser...@googlegroups.com> wrote:
>
>> Hi everyone,
>>
>> I am trying to use Tesseract for single-character recognition and the 
>> results are awful:
>> "h" is recognized as "n", "4" as "/i", and "O" as "()".
>>
>> [image: 1testtiff.png]
>>
>> [image: 6testtiff.png]
>>
>>
>> [image: 2testtiff.png]
>>
>>
>>
>> Single-character mode does not seem to take effect, as many characters are 
>> recognized as two characters, not just one. My images are simple bilevel 
>> black-and-white TIFF images of Latin characters. This is a bitmap font, not 
>> scanned images; they are absolutely clean and need no improvement.
>> Only about half of the characters are recognized correctly, which seems to 
>> be a very low rate for such a simple task.
>>
>>  The Tesseract library version I am using is "4.0.0-beta.3".
>> This is how I call Tesseract:
>>
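>>  #include <cstdio>
>>  #include <cstdlib>
>>  #include <tesseract/baseapi.h>
>>  #include <leptonica/allheaders.h>
>>  using tesseract::TessBaseAPI;
>>
>>  int CharRecognizer::recognizeTIFFData(char* data, int datalength){
>>             char *outText;
>>             TessBaseAPI* api = new TessBaseAPI();
>>             // Initialize tesseract-ocr with the "deu" (German) language data,
>>             // without specifying the tessdata path
>>             if (api->Init(NULL, "deu")) {
>>                     fprintf(stderr, "Could not initialize tesseract.\n");
>>                     exit(1);
>>             }
>>             api->SetPageSegMode(tesseract::PSM_SINGLE_CHAR);
>>             // Decode the in-memory TIFF data into a Leptonica image
>>             Pix *image = pixReadMem((const l_uint8*)data, datalength);
>>             api->SetImage(image);
>>             // Get OCR result
>>             outText = api->GetUTF8Text();
>>             printf("\nOCR output:\n%s", outText ? outText : "");
>>             // Keep only the first byte of the UTF-8 result (0 if there is none)
>>             int utf8 = outText ? outText[0] : 0;
>>             // Destroy used objects and release memory
>>             api->End();
>>             delete api;
>>             delete[] outText;
>>             pixDestroy(&image);
>>             return utf8;
>>  }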
>>
>>  I am new to Tesseract, so I am probably missing something. Do I have to 
>>  train the library first somehow? Maybe I should set another OcrEngineMode? 
>>  I expected no problems recognizing a simple bitmap font and am quite at a 
>>  loss now.
>> Thank you very much in advance, 
>> Yuliana  
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/f3cbddee-f620-4479-a967-97b52c98c64c%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/f3cbddee-f620-4479-a967-97b52c98c64c%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
