Here is my code, using Tesseract 4:
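#include <cstdio>
// Standard Tesseract and Leptonica headers (install paths may differ).
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
    Pix *image = pixRead("image-001.ppm");
    if (!image) {
        fprintf(stderr, "Could not read image.\n");
        return 1;
    }
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Init() returns non-zero on failure, e.g. if the "spa" data is missing.
    if (api->Init(NULL, "spa") != 0) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        pixDestroy(&image);
        delete api;
        return 1;
    }
    api->SetImage(image);
    // Request paragraph-level components (text only) as bounding boxes.
    Boxa *boxes = api->GetComponentImages(tesseract::RIL_PARA, true, NULL, NULL);
    if (boxes) {
        printf("Found %d para image components.\n", boxes->n);
        for (int i = 0; i < boxes->n; i++) {
            BOX *box = boxaGetBox(boxes, i, L_CLONE);
            // Restrict recognition to this box, then OCR it.
            api->SetRectangle(box->x, box->y, box->w, box->h);
            char *ocrResult = api->GetUTF8Text();
            int conf = api->MeanTextConf();
            fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
                    i, box->x, box->y, box->w, box->h, conf, ocrResult);
            delete[] ocrResult;  // GetUTF8Text() output must be freed with delete[]
            boxDestroy(&box);
        }
        boxaDestroy(&boxes);
    }
    api->End();
    delete api;
    pixDestroy(&image);
    return 0;
}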
The file I'm using is image-001.ppm:
https://drive.google.com/file/d/1SVBet9sp0nnxhN0be6_byZMeH_HiEhP6/view?usp=sharing
It is the second page of this PDF, if you want to view it in context:
https://drive.google.com/file/d/1nXEzreb3kQnamgadQAFe0ri8qRW94aCU/view
On Friday, 3 January 2020 16:18:02 UTC, zdenop wrote:
>
> It seems you forgot to attach your code, image, and Tesseract version
> details.
>
> Zdenko
>
>
> On Fri, 3 Jan 2020 at 17:13, Nils André <[email protected]> wrote:
>
>> I'm trying to extract paragraphs from an image so I tried
>> GetComponentImages using tesseract::RIL_PARA but I just get the whole image.
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/379b1861-35f0-4a86-8b97-93bcb8230468%40googlegroups.com.