On Mon, Dec 03, 2012 at 01:49:08AM -0800, Benito2313 wrote:
> What is it that you're trying to do? HTML is an XML dialect, after
> all (or can be, if XHTML). You should be able to parse it with all
> XML tools.
>
> My program handles with Xml's.
> I can see the script code of the HTML when i open it noteblock. how can i see
> if it is XHTML?
I just checked the HTML output from Tesseract. It is XHTML, so it is a
proper XML dialect. You can tell from the <?xml declaration on the first
line, plus the doctype and the xmlns attribute on the following lines.

Nick
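
The same check can be done programmatically instead of by eye. What follows is only a minimal sketch in Python, using the standard library's xml.etree.ElementTree: it treats a document as XHTML if it parses as well-formed XML and its root element is <html> in the XHTML namespace. The function name looks_like_xhtml and the sample document are made up for illustration; the sample is hand-written and is not actual Tesseract output.

```python
import xml.etree.ElementTree as ET

# Namespace URI that XHTML documents declare via xmlns on <html>.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def looks_like_xhtml(data):
    """Return True if `data` (bytes) is well-formed XML whose root
    element is <html> in the XHTML namespace.

    Hypothetical helper for illustration only.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed XML, so it cannot be XHTML.
        return False
    # ElementTree spells namespaced tags as "{namespace}localname".
    return root.tag == "{%s}html" % XHTML_NS


# Hand-written sample (an assumption, not real Tesseract output).
sample_xhtml = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title></head>
<body><p>hello</p></body>
</html>"""

# Ordinary "tag soup" HTML: <br> is never closed, so XML parsing fails.
sample_html = b"<html><body><p>hello<br></p></body></html>"

print(looks_like_xhtml(sample_xhtml))
print(looks_like_xhtml(sample_html))
```

To check a file on disk, read it in binary mode and pass the bytes to the function. If it returns True, ordinary XML tools should be able to load the file as well.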

