Depending on your skills:
a) You can analyze the space between boxes to identify words (if you want to
use the box file).
b) You can parse tesseract's hOCR output (if you don't know what hOCR is,
search this forum).
c) You can use the C++/C API of tesseract to create your own output; have a
look at the hOCR implementation.
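
For option a), here is a minimal sketch in Python of what gap analysis could look like. It assumes the usual box file layout (character, left, bottom, right, top, page) and that all boxes belong to a single text line. The function names and the gap_ratio threshold of 0.5 times the median character width are my own assumptions for illustration, not anything tesseract provides, so tune them for your images. Note that in the box lines quoted below every gap is 4 pixels or less (e and l even overlap), so this approach would return them as one word; it only helps when the image really has a wider gap between words.

```python
from statistics import median


def parse_box_lines(lines):
    """Parse box file lines of the form 'char left bottom right top page'."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        left, bottom, right, top = map(int, parts[1:5])
        boxes.append((parts[0], left, bottom, right, top))
    return boxes


def group_words(boxes, gap_ratio=0.5):
    """Group character boxes from ONE text line into words.

    A new word starts when the horizontal gap to the previous box is
    larger than gap_ratio * median character width (an assumed heuristic).
    Returns a list of (text, left, bottom, right, top) tuples.
    """
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[1])  # left to right
    threshold = gap_ratio * median(b[3] - b[1] for b in boxes)

    words = []
    current = [boxes[0]]
    for prev, box in zip(boxes, boxes[1:]):
        if box[1] - prev[3] > threshold:  # gap = this left - previous right
            words.append(current)
            current = []
        current.append(box)
    words.append(current)

    # The word box is the union of its character boxes.
    return [
        (
            "".join(b[0] for b in w),
            min(b[1] for b in w),
            min(b[2] for b in w),
            max(b[3] for b in w),
            max(b[4] for b in w),
        )
        for w in words
    ]
```

You would read the .box file, pass its lines to parse_box_lines(), and then call group_words() on the result. For a page with several text lines you would first have to split the boxes into lines (for example by vertical overlap), which this sketch does not do.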
--
Zdenko
On 04.03.2013 12:21, SUBHADIP SINHA wrote:
Please help me if anybody knows the solution!
Thank you.
On Sunday, March 3, 2013 12:32:32 PM UTC+5:30, SUBHADIP SINHA wrote:
Hi all,
I finally got the .box file with the coordinates of every character from the
.png file. Now I want to group the characters from the .box file into words,
and I need the coordinates of the words.
I am using tesseract 3.02 on a Windows machine.
I ran the command
tesseract image.png image batch.nochop makebox
on image.png and got the result below in the box file:
t 45 16 90 91 0
h 94 16 151 102 0
e 155 16 211 79 0
l 208 16 238 102 0
o 241 16 304 79 0
n 308 16 366 79 0
d 369 16 430 102 0
o 433 16 496 79 0
n 500 16 557 79 0
From the above box file I want to find the coordinates of the two words,
"the" and "london".
I have not configured any other files in the tesseract folder.
Please tell me which steps I need to run after the steps I have already done.
Thank you.
--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en