Re: [tesseract-ocr] Why do I get such poor results from Tesseract for simple single character recognizing?

Zdenko Podobny Mon, 15 Oct 2018 23:04:09 -0700

   1. If you have a quality problem, it is good to experiment with the
   tesseract executable instead of the API ;-)
   2. It is known that passing tightly cropped text (in your case just one
   letter) is not the best idea - please try adding a small white border,
   e.g. 10 px
   3. Please set the DPI for the image after SetImage
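
The border from point 2 can also be added in code before the image is
handed to Tesseract. What follows is only a minimal sketch, assuming an
8-bit grayscale, row-major buffer in which 255 is white; GrayImage and
addWhiteBorder are hypothetical names used for illustration and are not
part of Tesseract or Leptonica. If the image is already a Leptonica Pix,
the library's pixAddBorder should serve the same purpose.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper type: an 8-bit grayscale raster, row-major, 255 = white.
struct GrayImage {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<std::uint8_t> pixels;  // exactly width * height bytes
};

// Return a copy of `src` surrounded by `border` white pixels on every side.
GrayImage addWhiteBorder(const GrayImage& src, std::size_t border) {
    if (src.pixels.size() != src.width * src.height) {
        throw std::invalid_argument("pixel buffer does not match width * height");
    }
    GrayImage dst;
    dst.width = src.width + 2 * border;
    dst.height = src.height + 2 * border;
    dst.pixels.assign(dst.width * dst.height, 255);  // start with an all-white canvas

    // Copy the original pixels into the centre of the enlarged canvas.
    for (std::size_t y = 0; y < src.height; ++y) {
        for (std::size_t x = 0; x < src.width; ++x) {
            dst.pixels[(y + border) * dst.width + (x + border)] =
                src.pixels[y * src.width + x];
        }
    }
    return dst;
}
```

A padded buffer like this could then be passed to the raw-buffer overload
of SetImage (width, height, bytes per pixel, bytes per line), followed by
SetSourceResolution to set the DPI as in point 3.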


See attachment for improved images.
$ tesseract char_4_b.png - --psm 10 -c page_separator=""
4

For single-character recognition the legacy engine is better, and it can
process your images without modification (but the rules above are generally
good to follow!):
$ tesseract char_0.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
0

$ tesseract char_4.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
4

$ tesseract char_h.png - --psm 10 --oem 0 --dpi 800 -c page_separator=""
h

Zdenko


On Mon, 15 Oct 2018 at 22:44, 'Yuliana Zigangirova' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

> Hi everyone,
>
> I am trying to use Tesseract for single-character recognition and the
> results are awful.
> "h" is recognized as "n", "4" as "/i", "O" as "()".
>
> [image: 1testtiff.png]
>
> [image: 6testtiff.png]
>
>
> [image: 2testtiff.png]
>
>
>
> Single character mode seems to have no effect, as many characters are
> recognized as two characters, not just one. My images are simple bilevel
> black and white TIFF images of Latin characters. This is a bitmap font,
> not scanned images; they are absolutely clean and need no improvement.
> Only about half of the characters are correctly recognized, which seems to
> be a very low percentage for such a simple task.
>
>  The Tesseract library version I am using is "4.0.0-beta.3".
> This is how I call Tesseract:
>
>  int CharRecognizer::recognizeTIFFData(char* data, int datalength){
>             char *outText;
>             TessBaseAPI* api = new TessBaseAPI();
>             // Initialize tesseract-ocr with the "deu" language data,
>             // without specifying the tessdata path
>             if (api->Init(NULL, "deu")) {
>                     fprintf(stderr, "Could not initialize tesseract.\n");
>                     exit(1);
>             }
>             api->SetPageSegMode(tesseract::PSM_SINGLE_CHAR);
>             Pix *image = pixReadMem(data,datalength);
>             api->SetImage(image);
>             // Get OCR result
>             outText = api->GetUTF8Text();
>             printf("\nOCR output:\n%s", outText);
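>             // Keep only the first byte of the result (a multi-byte UTF-8
>             // character is not decoded here)
>             int utf8 = outText[0];
>             // Destroy used objects and release memory
>             api->End();
>             delete api;
>             delete[] outText;
>             pixDestroy(&image);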
>             return utf8;
>  }
>
>  I am new to Tesseract, so I am probably missing something. Do I have to
> train the library first somehow? Maybe I should set another OcrEngineMode?
> I expected no problems recognizing a simple bitmap font and am quite at a
> loss now.
> Thank you very much in advance,
> Yuliana
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f3cbddee-f620-4479-a967-97b52c98c64c%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8w4tRdNx_TVJmpe%2BuOG6NER5k0USkiP5Dzx_cd05vfd0A%40mail.gmail.com.
