Hi, does anyone have an update on this?
Thanks

On Monday, November 23, 2020 at 6:31:31 PM UTC+5:30 mit wrote:

> I am trying to get character level details of a file using
> hocr_char_boxes=1 option.
> But the output it generates seems to be overlapping between the characters.
>
> <div class='ocr_page' id='page_1' title='image "file-0.png"; bbox 0 0 1653 2336; ppageno 0'>
> <div class='ocr_carea' id='block_1_1' title="bbox 111 203 930 219">
> <p class='ocr_par' id='par_1_1' lang='eng' title="bbox 111 203 930 219">
> <span class='ocr_line' id='line_1_1' title="bbox 111 203 930 219; baseline -0.001 -3; x_size 20; x_descenders 5; x_ascenders 5">
> <span class='ocrx_word' id='word_1_1' title='bbox 111 204 135 216; x_wconf 96'>
> <span class='ocrx_cinfo' title='x_bboxes 111 204 117 216; x_conf 99.56546'>S</span>
> <span class='ocrx_cinfo' title='x_bboxes 111 204 119 216; x_conf 99.574463'>e</span>
> <span class='ocrx_cinfo' title='x_bboxes 120 207 135 216; x_conf 99.543205'>e</span></span>
>
> How can two characters have the same starting point (for S: x_bboxes 111
> 204 117 216, and for e: x_bboxes 111 204 119 216)?
>
> Tesseract details:
>
> tesseract v4.1.0-elag2019
> leptonica-1.78.0
> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
> Found AVX2
> Found AVX
> Found SSE
>
> Attached the image file.
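
In case it helps anyone reproduce this, here is a minimal sketch for listing the overlapping character boxes in an hOCR file. It assumes the `ocrx_cinfo` spans look exactly like the output quoted in this thread (single-quoted attributes, `x_bboxes` followed by `x_conf`). The helper names `char_boxes` and `horizontal_overlaps` are made up for illustration, and a real HTML parser would be more robust than this regex.

```python
import re

# Assumption: spans are formatted exactly as in the hOCR output quoted in this thread.
CINFO_RE = re.compile(
    r"<span class='ocrx_cinfo' title='x_bboxes (\d+) (\d+) (\d+) (\d+);[^']*'>(.*?)</span>"
)


def char_boxes(hocr):
    """Return (char, x0, y0, x1, y1) tuples in document order."""
    return [
        (m.group(5), *map(int, m.group(1, 2, 3, 4)))
        for m in CINFO_RE.finditer(hocr)
    ]


def horizontal_overlaps(boxes):
    """Return adjacent pairs where the next box starts before the previous one ends."""
    pairs = []
    for prev, nxt in zip(boxes, boxes[1:]):
        if nxt[1] < prev[3]:  # next x0 is left of previous x1
            pairs.append((prev, nxt))
    return pairs


# The three character spans from the quoted message.
sample = """
<span class='ocrx_cinfo' title='x_bboxes 111 204 117 216; x_conf 99.56546'>S</span>
<span class='ocrx_cinfo' title='x_bboxes 111 204 119 216; x_conf 99.574463'>e</span>
<span class='ocrx_cinfo' title='x_bboxes 120 207 135 216; x_conf 99.543205'>e</span>
"""

for prev, nxt in horizontal_overlaps(char_boxes(sample)):
    print("overlap:", prev, nxt)
```

On the sample above this flags only the first pair (S and the first e), which matches the overlap described in the original message. It only compares neighbouring characters and does not separate words or lines, so treat it as a quick diagnostic rather than a complete check.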