Re: Tesseract ocr to XML with text positions (X and Y)

Benito2313 Mon, 03 Dec 2012 01:55:30 -0800

On Monday, 3 December 2012 09:22:13 UTC+1, Nick White wrote:
>
> On Sun, Dec 02, 2012 at 08:04:50AM -0800, Benito2313 wrote: 
> > I got HTML output, so it's getting there. But is it possible to get
> > hOCR to give XML output?
>
> What is it that you're trying to do? HTML is an XML dialect, after 
> all (or can be, if XHTML). You should be able to parse it with all 
> XML tools. 
>
> The only way to get a different XML representation would be to 
> either delve into the API, or convert the hOCR to something more to 
> your liking. But hOCR is *the* XMLish OCR output standard; I don't 
> see why you'd want anything else. 
>
> Nick 
>
 
My program works with XML files.
I can see the HTML source when I open the file in Notepad. How can I
tell whether it is XHTML?
 
Thank you in advance.
 
Benito2313
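
One way to answer the XHTML question in practice is to hand the hOCR file to an ordinary XML parser: if it parses without error, it is well-formed and XML tools can read it. The sketch below is only an illustration, not something from this thread. It assumes Python's standard xml.etree.ElementTree module, a shortened hand-written sample instead of real Tesseract output, and the usual hOCR convention that word elements carry class="ocrx_word" with a title attribute of the form "bbox x0 y0 x1 y1". The names SAMPLE_HOCR, is_well_formed_xml and word_boxes are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, shortened hOCR fragment for illustration only. Real
# Tesseract output has more nesting (areas, paragraphs, lines).
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml">
 <body>
  <div class="ocr_page" title="bbox 0 0 640 480">
   <span class="ocrx_word" title="bbox 36 92 96 116">Hello</span>
   <span class="ocrx_word" title="bbox 104 92 190 116; x_wconf 93">world</span>
  </div>
 </body>
</html>"""

# hOCR stores coordinates in the title attribute as "bbox x0 y0 x1 y1".
BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text without error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def word_boxes(text):
    """Return (word, x0, y0, x1, y1) for every ocrx_word element."""
    root = ET.fromstring(text)
    words = []
    for el in root.iter():
        # The class attribute is not namespaced, so this works whether
        # or not the document declares the XHTML namespace.
        if el.get("class") != "ocrx_word":
            continue
        match = BBOX.search(el.get("title", ""))
        if match:
            x0, y0, x1, y1 = map(int, match.groups())
            words.append(("".join(el.itertext()).strip(), x0, y0, x1, y1))
    return words


if __name__ == "__main__":
    print("well-formed XML:", is_well_formed_xml(SAMPLE_HOCR))
    for word, x0, y0, x1, y1 in word_boxes(SAMPLE_HOCR):
        print(word, "x =", x0, "y =", y0)
```

To try it on a real file, read the hOCR output into a string and pass it to the same two functions. If the parser rejects the file, the error message usually points at the line and column where the markup stops being well-formed.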


