Hi all,

I am currently running OCR line by line and reading the word details from a ResultIterator, as shown below:

    tessAPI->SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_LINE);
    // These line boxes are calculated by our pre-processing and segmentation code.
    tessAPI->SetRectangle(iXmin, iYmin, iW, iH);
    tessAPI->Recognize(nullptr);

    tesseract::ResultIterator* rst_iter = tessAPI->GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
    bool is_bold, is_italic, is_underlined, is_monospace, is_serif, is_smallcaps;
    int pointsize, font_id;
    if (nullptr != rst_iter) {
        do {
            // GetUTF8Text returns a new[]-allocated buffer that the caller must free.
            char* text = rst_iter->GetUTF8Text(level);
            rst_iter->WordFontAttributes(&is_bold, &is_italic, &is_underlined,
                                         &is_monospace, &is_serif, &is_smallcaps,
                                         &pointsize, &font_id);
            // Here I want to get the line and paragraph that the current word
            // belongs to from the Tesseract API.
            delete[] text;
        } while (rst_iter->Next(level));
        delete rst_iter;
    }

I can get paragraphs, lines, and words using the tessAPI->GetComponentImages() function, but for words it returns only block and paragraph IDs, not line IDs. I am currently mapping those words to lines myself, but the results still contain some garbage.

Is there any way to get the line and paragraph that the current word belongs to?

Thanks in advance,
Lakshman
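
For reference, one possible way to derive line and paragraph indices while iterating at word level is to keep running counters and bump them whenever the iterator reports that the current word starts a new line or paragraph. The sketch below is only an illustration under assumptions: LayoutCounter and WordPosition are hypothetical helper names, not part of Tesseract, and the two flags are assumed to come from PageIterator::IsAtBeginningOf() called with RIL_PARA and RIL_TEXTLINE. It is only useful when a whole page or block is recognized in one call; with PSM_SINGLE_LINE each Recognize() call covers a single line, so the caller's own line index would already be the answer.

```cpp
#include <cassert>

// Hypothetical helper type: 0-based position of a word in the layout.
struct WordPosition {
    int para = -1;          // paragraph index within the recognized region
    int line = -1;          // line index within the recognized region
    int word_in_line = -1;  // word index within its line
};

// Hypothetical helper: tracks paragraph/line indices while words are
// visited in reading order.
class LayoutCounter {
public:
    // Call once per word. With Tesseract, the flags are assumed to be
    //   rst_iter->IsAtBeginningOf(tesseract::RIL_PARA)
    //   rst_iter->IsAtBeginningOf(tesseract::RIL_TEXTLINE)
    // evaluated before calling rst_iter->Next(tesseract::RIL_WORD).
    WordPosition Advance(bool starts_para, bool starts_line) {
        if (starts_para) {
            ++pos_.para;
        }
        if (starts_line) {
            ++pos_.line;
            pos_.word_in_line = 0;
        } else {
            ++pos_.word_in_line;
        }
        return pos_;
    }

private:
    WordPosition pos_;
};
```

Inside the do/while loop above, the call would look roughly like `WordPosition p = counter.Advance(rst_iter->IsAtBeginningOf(tesseract::RIL_PARA), rst_iter->IsAtBeginningOf(tesseract::RIL_TEXTLINE));`, with `counter` declared before the loop.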