Hi,
I am new to this program and not sure how or where to start fixing this.
I have attached an image that is used for testing Tesseract, along with the
hOCR file as a trace for the missing character. Can someone interpret it and
guide me to the fix?

TIA,
Ravi Kumar. 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title></title>
  <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
  <meta name='ocr-system' content='tesseract v5.3.0.20221222' />
  <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word ocrp_wconf'/>
 </head>
 <body>
  <div class='ocr_page' id='page_1' title='image "E:\@@@@0000\IEA_TEL\untitled1.jpg"; bbox 0 0 296 264; ppageno 0; scan_res 96 96'>
   <div class='ocr_carea' id='block_1_1' title="bbox 26 7 131 239">
    <p class='ocr_par' id='par_1_1' lang='eng' title="bbox 28 7 59 24">
     <span class='ocr_line' id='line_1_1' title="bbox 28 7 59 24; baseline 0 -7; x_size 13.255814; x_descenders 3.2558138; x_ascenders 3.2558138">
      <span class='ocrx_word' id='word_1_1' title='bbox 28 7 59 24; x_wconf 68'>
       <span class='ocrx_cinfo' title='x_bboxes 28 10 35 24; x_conf 98.651524'>(</span>
       <span class='ocrx_cinfo' title='x_bboxes 35 7 43 17; x_conf 98.812364'>3</span>
       <span class='ocrx_cinfo' title='x_bboxes 44 7 59 24; x_conf 95.516244'>%</span>
      </span>
     </span>
    </p>

    <p class='ocr_par' id='par_1_2' lang='tel' title="bbox 26 33 131 239">
     <span class='ocr_line' id='line_1_2' title="bbox 32 33 73 47; baseline 0 0; x_size 24.758621; x_descenders 4; x_ascenders 6.7586203">
      <span class='ocrx_word' id='word_1_2' title='bbox 32 36 42 43; x_wconf 96'>
       <span class='ocrx_cinfo' title='x_bboxes 32 36 42 43; x_conf 99.501708'>ఆ</span>
      </span>
      <span class='ocrx_word' id='word_1_3' title='bbox 51 33 73 47; x_wconf 81'>
       <span class='ocrx_cinfo' title='x_bboxes 51 33 56 47; x_conf 97.423563'>ప</span>
       <span class='ocrx_cinfo' title='x_bboxes 51 33 73 47; x_conf 98.206584'>్ర</span>
       <span class='ocrx_cinfo' title='x_bboxes 64 33 68 47; x_conf 99.57299'>త</span>
       <span class='ocrx_cinfo' title='x_bboxes 68 33 73 47; x_conf 99.524428'>ి</span>
      </span>
     </span>
     <span class='ocr_line' id='line_1_3' title="bbox 32 53 126 73; baseline -0.011 -8; x_size 22.799999; x_descenders 5.5999999; x_ascenders 5.5999999">
      <span class='ocrx_word' id='word_1_4' title='bbox 32 53 126 73; x_wconf 68'>
       <span class='ocrx_cinfo' title='x_bboxes 32 57 38 73; x_conf 95.53849'>(</span>
       <span class='ocrx_cinfo' title='x_bboxes 39 56 53 65; x_conf 98.308647'>వ</span>
       <span class='ocrx_cinfo' title='x_bboxes 48 53 57 73; x_conf 98.912359'>ా</span>
       <span class='ocrx_cinfo' title='x_bboxes 54 55 64 65; x_conf 99.029296'>త</span>
       <span class='ocrx_cinfo' title='x_bboxes 67 53 78 73; x_conf 98.989148'>మ</span>
       <span class='ocrx_cinfo' title='x_bboxes 65 54 92 65; x_conf 98.943636'>ూ</span>
       <span class='ocrx_cinfo' title='x_bboxes 93 56 102 64; x_conf 99.02871'>ల</span>
       <span class='ocrx_cinfo' title='x_bboxes 103 54 110 64; x_conf 99.015533'>క</span>
       <span class='ocrx_cinfo' title='x_bboxes 108 53 117 73; x_conf 99.011991'>మ</span>
       <span class='ocrx_cinfo' title='x_bboxes 111 53 126 64; x_conf 98.967928'>ె</span>
      </span>
     </span>
     <span class='ocr_line' id='line_1_4' title="bbox 32 83 82 102; baseline 0 -8; x_size 22.799999; x_descenders 5.5999999; x_ascenders 5.5999999">
      <span class='ocrx_word' id='word_1_5' title='bbox 32 83 82 102; x_wconf 48' lang='eng'>
       <span class='ocrx_cinfo' title='x_bboxes 32 83 43 94; x_conf 96.726459'>E</span>
       <span class='ocrx_cinfo' title='x_bboxes 44 83 82 102; x_conf 92.649587'>T</span>
      </span>
     </span>
     <span class='ocr_line' id='line_1_5' title="bbox 35 115 88 134; baseline -0.019 -7; x_size 22.799999; x_descenders 5.5999999; x_ascenders 5.5999999">
      <span class='ocrx_word' id='word_1_6' title='bbox 35 115 88 134; x_wconf 37'>
       <span class='ocrx_cinfo' title='x_bboxes 35 116 48 127; x_conf 99.571671'>మ</span>
       <span class='ocrx_cinfo' title='x_bboxes 49 118 63 134; x_conf 91.091765'>ు</span>
       <span class='ocrx_cinfo' title='x_bboxes 59 115 67 134; x_conf 99.562924'>ద</span>
       <span class='ocrx_cinfo' title='x_bboxes 64 115 73 126; x_conf 99.571462'>ి</span>
       <span class='ocrx_cinfo' title='x_bboxes 74 115 84 126; x_conf 99.535323'>త</span>
       <span class='ocrx_cinfo' title='x_bboxes 85 123 88 129; x_conf 99.552063'>,</span>
      </span>
     </span>
     <span class='ocr_line' id='line_1_6' title="bbox 32 141 109 158; baseline -0.013 -5; x_size 22.799999; x_descenders 5.5999999; x_ascenders 5.5999999">
      <span class='ocrx_word' id='word_1_7' title='bbox 32 142 80 158; x_wconf 12'>
       <span class='ocrx_cinfo' title='x_bboxes 32 142 38 158; x_conf 98.87298'>శ</span>
       <span class='ocrx_cinfo' title='x_bboxes 37 142 44 158; x_conf 99.000684'>ి</span>
       <span class='ocrx_cinfo' title='x_bboxes 32 142 54 153; x_conf 99.0245'>ల</span>
       <span class='ocrx_cinfo' title='x_bboxes 51 142 58 158; x_conf 98.408999'>ా</span>
       <span class='ocrx_cinfo' title='x_bboxes 55 142 63 153; x_conf 99.038249'>మ</span>
       <span class='ocrx_cinfo' title='x_bboxes 64 144 75 153; x_conf 87.548984'>ు</span>
       <span class='ocrx_cinfo' title='x_bboxes 76 145 80 158; x_conf 95.454409'>ు</span>
      </span>
      <span class='ocrx_word' id='word_1_8' title='bbox 84 141 109 155; x_wconf 82'>
       <span class='ocrx_cinfo' title='x_bboxes 84 141 93 152; x_conf 98.986838'>ద</span>
       <span class='ocrx_cinfo' title='x_bboxes 91 141 98 155; x_conf 99.0367'>ి</span>
       <span class='ocrx_cinfo' title='x_bboxes 94 141 104 152; x_conf 98.984118'>త</span>
       <span class='ocrx_cinfo' title='x_bboxes 106 149 109 155; x_conf 98.914753'>,</span>
      </span>
     </span>
     <span class='ocr_line' id='line_1_7' title="bbox 34 164 93 186; baseline 0 -8; x_size 19.310345; x_descenders 5.3103447; x_ascenders 3">
      <span class='ocrx_word' id='word_1_9' title='bbox 34 164 93 186; x_wconf 40'>
       <span class='ocrx_cinfo' title='x_bboxes 34 164 44 178; x_conf 98.449933'>త</span>
       <span class='ocrx_cinfo' title='x_bboxes 41 164 47 186; x_conf 91.569757'>ీ</span>
       <span class='ocrx_cinfo' title='x_bboxes 45 171 53 186; x_conf 91.914705'>ీ</span>
       <span class='ocrx_cinfo' title='x_bboxes 53 168 62 178; x_conf 98.620529'>వ</span>
       <span class='ocrx_cinfo' title='x_bboxes 63 164 74 186; x_conf 99.026759'>మ</span>
       <span class='ocrx_cinfo' title='x_bboxes 63 167 83 185; x_conf 98.844275'>ై</span>
       <span class='ocrx_cinfo' title='x_bboxes 84 167 93 178; x_conf 98.731531'>న</span>
      </span>
     </span>
     <span class='ocr_line' id='line_1_8' title="bbox 31 189 116 211; baseline 0 -8; x_size 19.310345; x_descenders 5.3103447; x_ascenders 3">
      <span class='ocrx_word' id='word_1_10' title='bbox 31 189 116 211; x_wconf 68'>
       <span class='ocrx_cinfo' title='x_bboxes 31 196 39 211; x_conf 95.94613'>(</span>
       <span class='ocrx_cinfo' title='x_bboxes 39 192 48 203; x_conf 95.54386'>ప</span>
       <span class='ocrx_cinfo' title='x_bboxes 49 189 59 211; x_conf 97.902919'>క</span>
       <span class='ocrx_cinfo' title='x_bboxes 58 189 67 211; x_conf 98.749993'>ో</span>
       <span class='ocrx_cinfo' title='x_bboxes 66 189 76 211; x_conf 98.982082'>ప</span>
       <span class='ocrx_cinfo' title='x_bboxes 75 189 84 211; x_conf 98.788698'>మ</span>
       <span class='ocrx_cinfo' title='x_bboxes 51 189 116 204; x_conf 99.042411'>ు</span>
       <span class='ocrx_cinfo' title='x_bboxes 92 189 101 211; x_conf 98.871723'>వ</span>
       <span class='ocrx_cinfo' title='x_bboxes 100 189 108 211; x_conf 99.037691'>క</span>
       <span class='ocrx_cinfo' title='x_bboxes 107 189 116 211; x_conf 99.033182'>ు</span>
      </span>
     </span>
     <span class='ocr_line' id='line_1_9' title="bbox 26 215 131 239; baseline 0.01 -1; x_size 31; x_descenders 8; x_ascenders 6">
      <span class='ocrx_word' id='word_1_11' title='bbox 26 215 54 239; x_wconf 94'>
       <span class='ocrx_cinfo' title='x_bboxes 26 215 34 238; x_conf 99.432977'>(</span>
       <span class='ocrx_cinfo' title='x_bboxes 36 217 44 232; x_conf 99.447969'>1</span>
       <span class='ocrx_cinfo' title='x_bboxes 47 216 54 239; x_conf 99.284047'>)</span>
      </span>
      <span class='ocrx_word' id='word_1_12' title='bbox 63 218 131 239; x_wconf 6'>
       <span class='ocrx_cinfo' title='x_bboxes 64 218 72 239; x_conf 97.829305'>న</span>
       <span class='ocrx_cinfo' title='x_bboxes 63 222 82 239; x_conf 99.573861'>్య</span>
       <span class='ocrx_cinfo' title='x_bboxes 79 218 90 239; x_conf 99.565983'>ా</span>
       <span class='ocrx_cinfo' title='x_bboxes 89 218 97 239; x_conf 99.570838'>య</span>
       <span class='ocrx_cinfo' title='x_bboxes 83 218 122 239; x_conf 99.525794'>స</span>
       <span class='ocrx_cinfo' title='x_bboxes 109 218 115 239; x_conf 92.151932'>్థ</span>
       <span class='ocrx_cinfo' title='x_bboxes 114 218 125 239; x_conf 99.482669'>ా</span>
       <span class='ocrx_cinfo' title='x_bboxes 122 220 131 231; x_conf 86.597842'>న</span>
      </span>
     </span>
    </p>
   </div>
  </div>
 </body>
</html>
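
One possible way to read a trace like the one above is to list each ocrx_word with its x_wconf value, so that words with low confidence stand out as places to compare against the image. The following is only a minimal sketch using the Python standard library. The names HocrWords and low_confidence_words and the threshold of 50 are hypothetical choices for illustration, not part of Tesseract. The sample string is copied from two words in the attached file. A low x_wconf only marks a word as suspicious; it does not by itself say which character is missing.

```python
import re
from html.parser import HTMLParser


class HocrWords(HTMLParser):
    """Collect (word id, text, x_wconf) for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []       # finished (id, text, confidence) tuples
        self._current = None  # [id, confidence, text parts] while inside a word
        self._depth = 0       # depth of <span> nesting inside the current word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        if self._current is not None:
            # A nested span (for example ocrx_cinfo) inside the current word.
            self._depth += 1
        elif attrs.get("class") == "ocrx_word":
            match = re.search(r"x_wconf\s+(\d+)", attrs.get("title", ""))
            conf = int(match.group(1)) if match else None
            self._current = [attrs.get("id"), conf, []]
            self._depth = 1

    def handle_endtag(self, tag):
        if tag != "span" or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # The ocrx_word span itself has closed; store the joined text.
            word_id, conf, parts = self._current
            self.words.append((word_id, "".join(parts), conf))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data.strip())


def low_confidence_words(hocr_text, threshold=50):
    """Return words whose x_wconf is below threshold (threshold is an assumption)."""
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return [w for w in parser.words if w[2] is not None and w[2] < threshold]


# Two words copied from the attached hOCR file.
SAMPLE = """
<span class='ocrx_word' id='word_1_5' title='bbox 32 83 82 102; x_wconf 48' lang='eng'>
 <span class='ocrx_cinfo' title='x_bboxes 32 83 43 94; x_conf 96.726459'>E</span>
 <span class='ocrx_cinfo' title='x_bboxes 44 83 82 102; x_conf 92.649587'>T</span>
</span>
<span class='ocrx_word' id='word_1_11' title='bbox 26 215 54 239; x_wconf 94'>
 <span class='ocrx_cinfo' title='x_bboxes 26 215 34 238; x_conf 99.432977'>(</span>
 <span class='ocrx_cinfo' title='x_bboxes 36 217 44 232; x_conf 99.447969'>1</span>
 <span class='ocrx_cinfo' title='x_bboxes 47 216 54 239; x_conf 99.284047'>)</span>
</span>
"""

for word_id, text, conf in low_confidence_words(SAMPLE):
    print(word_id, text, conf)
```

To check a whole file, the text of the saved hOCR output can be passed to low_confidence_words in place of SAMPLE, and the bbox values of the reported words can then be located in the image.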
