Here is my code with tesseract4:

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
  Pix *image = pixRead("image-001.ppm");
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
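  // Init() returns non-zero on failure (e.g. missing "spa" traineddata).
  if (api->Init(NULL, "spa")) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    return 1;
  }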
  api->SetImage(image);
  Boxa* boxes = api->GetComponentImages(tesseract::RIL_PARA, true, NULL, NULL);
  printf("Found %d para image components.\n", boxes->n);
  for (int i = 0; i < boxes->n; i++) {
    BOX* box = boxaGetBox(boxes, i, L_CLONE);
    api->SetRectangle(box->x, box->y, box->w, box->h);
    char* ocrResult = api->GetUTF8Text();
    int conf = api->MeanTextConf();
    fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
                    i, box->x, box->y, box->w, box->h, conf, ocrResult);
    delete[] ocrResult;  // GetUTF8Text() returns a buffer the caller must free
    boxDestroy(&box);
  }
  boxaDestroy(&boxes);
  api->End();  // release the engine's resources before deleting it
  delete api;
  pixDestroy(&image);
}


The file I'm using is image-001.ppm:
https://drive.google.com/file/d/1SVBet9sp0nnxhN0be6_byZMeH_HiEhP6/view?usp=sharing
 
If you want to view it, it's the second page of this pdf: 
https://drive.google.com/file/d/1nXEzreb3kQnamgadQAFe0ri8qRW94aCU/view

On Friday, 3 January 2020 16:18:02 UTC, zdenop wrote:
>
> Seems like you forgot to attach your code, image, and tesseract version 
> details...
>
> Zdenko
>
>
> On Fri, 3 Jan 2020 at 17:13, Nils André <[email protected]> wrote:
>
>> I'm trying to extract paragraphs from an image so I tried 
>> GetComponentImages using tesseract::RIL_PARA but I just get the whole image.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/f6ca7c70-17c0-4ba3-aeba-e4a508e88fde%40googlegroups.com.
>>
>

