Use the TSV output, but you will still need to parse it to get the line information.
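A rough sketch of that parsing step, in case it helps. It assumes the usual 12-column TSV layout (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), where level 4 rows carry the line bounding box and level 5 rows carry the words. The TSV itself can come from `tesseract image.png out tsv` or from `api->GetTSVText(0)`. The names `TsvLine`, `SplitTabs` and `LinesFromTsv` are made up for this example and are not part of Tesseract.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// One reconstructed text line: bounding box plus its words joined by spaces.
// (Hypothetical helper type, not a Tesseract structure.)
struct TsvLine {
    int left = 0, top = 0, width = 0, height = 0;
    std::string text;
};

// Split a row on tabs, keeping empty fields (non-word rows have empty text).
static std::vector<std::string> SplitTabs(const std::string& row) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type tab = row.find('\t', start);
        if (tab == std::string::npos) {
            fields.push_back(row.substr(start));
            break;
        }
        fields.push_back(row.substr(start, tab - start));
        start = tab + 1;
    }
    return fields;
}

// Group TSV rows into lines keyed by (page, block, paragraph, line).
// Assumes well-formed numeric columns; std::stoi throws on malformed input.
std::vector<TsvLine> LinesFromTsv(const std::string& tsv) {
    using Key = std::tuple<int, int, int, int>;
    std::map<Key, TsvLine> lines;
    std::istringstream in(tsv);
    std::string row;
    while (std::getline(in, row)) {
        if (!row.empty() && row.back() == '\r') row.pop_back();
        const std::vector<std::string> f = SplitTabs(row);
        if (f.size() < 11 || f[0] == "level") continue;  // header or short row
        const int level = std::stoi(f[0]);
        const Key key{std::stoi(f[1]), std::stoi(f[2]),
                      std::stoi(f[3]), std::stoi(f[4])};
        if (level == 4) {
            // Line row: take the bounding box of the whole line.
            TsvLine& line = lines[key];
            line.left = std::stoi(f[6]);
            line.top = std::stoi(f[7]);
            line.width = std::stoi(f[8]);
            line.height = std::stoi(f[9]);
        } else if (level == 5 && f.size() >= 12 && !f[11].empty()) {
            // Word row: append the word to the text of its line.
            TsvLine& line = lines[key];
            if (!line.text.empty()) line.text += ' ';
            line.text += f[11];
        }
    }
    std::vector<TsvLine> result;
    for (const auto& entry : lines) result.push_back(entry.second);
    return result;
}
```

Each returned `TsvLine` can then be printed the same way as the `Box[%d]` lines in the API loop. The word rows also have their own conf column (index 10), so a per-line mean confidence could be accumulated in the same pass if needed.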
Zdenko

On Wed, 21 Apr 2021 at 16:38, Baris Unsal <[email protected]> wrote:

> I want the opposite way. Getting ril_textline like output from passing
> argument to tesseract.
>
> On Wednesday, 21 April 2021 at 17:36:35 UTC+3 Quan Nguyen wrote:
>
>> I think it would need to operate at RIL_SYMBOL level, not RIL_TEXTLINE.
>>
>> On Wednesday, April 21, 2021 at 7:17:04 AM UTC-5 [email protected]
>> wrote:
>>
>>> Hi, when I pass tessedit_create_boxfile 1 argument to tesseract it
>>> outputs individual chars' location. But when I use api like this:
>>>
>>> ```
>>> Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
>>> for (int i = 0; i < boxes->n; i++) {
>>>   BOX* box = boxaGetBox(boxes, i, L_CLONE);
>>>   api->SetRectangle(box->x, box->y, box->w, box->h);
>>>   char* outText = api->GetUTF8Text();
>>>   int conf = api->MeanTextConf();
>>>   fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
>>>           i, box->x, box->y, box->w, box->h, conf, outText);
>>>   boxDestroy(&box);
>>>   delete[] outText;
>>> }
>>> ```
>>> it outputs whole line like this:
>>> Box[1]: x=36, y=84, w=246, h=14, confidence: 44, text: #Spor #siyaset
>>> Fanket FIliskiler
>>>
>>> Is there any way to combine individual boxes to print like API? Thanks
>>> in advance.
>>>
>>> ### Environment
>>>
>>> * **Tesseract Version**:
>>> tesseract 4.1.1-rc2-25-g9707
>>> leptonica-1.78.0
>>> libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 :
>>> libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>> Found AVX2
>>> Found AVX
>>> Found FMA
>>> Found SSE
>>> Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>> liblz4/1.8.3 libzstd/1.3.8
>>>
>>> * **Platform**:
>>> Linux pardus 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28)
>>> x86_64 GNU/Linux

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wS8XdwKW1eG%2BBW2L2ieVMYt%2B4GjAP59tyf%2BQpcWVOkwA%40mail.gmail.com.

