HTMLDocument and Xpath
Hi,

I want to use xpath to scrape info from a website using pyXML but I keep getting no results.

For example, in the following, I want to return the text "Element1". I can't get xpath to return anything at all. What's wrong with this code?

from xml.dom.ext.reader import HtmlLib
from xml.xpath import Evaluate

reader = HtmlLib.Reader()
doc_node = reader.fromString(""" Python Programming Language element1 """)
test = Evaluate('td', doc_node.documentElement)
print "test =", test

All I get is an empty list for output.

Thx in advance
Shawn

--
http://mail.python.org/mailman/listinfo/python-list
Re: HTMLDocument and Xpath
Alan Kennedy wrote:
> [EMAIL PROTECTED]
> > Hi, I want to use xpath to scrape info from a website using pyXML but I
> > keep getting no results.
> >
> > For example, in the following, I want to return the text "Element1" I
> > can't get xpath to return anything at all. What's wrong with this
> > code?
>
> Your xpath expression is wrong.
>
> > test = Evaluate('td', doc_node.documentElement)
>
> Try one of the following alternatives, all of which should work.
>
> test = Evaluate('//td', doc_node.documentElement)
> test = Evaluate('/html/body/table/tr/td', doc_node.documentElement)
> test = Evaluate('/html/body/table/tr/td[1]', doc_node.documentElement)
>
> HTH,
>
> Alan.

I tried all of those and in every case, test returns "[]". Does Evaluate only work with XML documents?

Shawn
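For reference, the difference between a bare 'td' step and '//td' can be sketched with the standard library's xml.etree.ElementTree instead of pyXML. This is only an illustration under assumptions: the markup below is a hypothetical well-formed stand-in for the page, and ElementTree supports only a subset of XPath with paths written relative to the element, so 'body/table/tr/td[1]' plays the role of '/html/body/table/tr/td[1]'.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed stand-in for the scraped page (not the original markup).
DOC = "<html><body><table><tr><td>element1</td></tr></table></body></html>"

root = ET.fromstring(DOC)  # root is the <html> element

# A bare 'td' step only looks at direct children of <html>, so it finds nothing.
direct = root.findall("td")

# './/td' searches all descendants, like '//td' in full XPath.
anywhere = root.findall(".//td")

# An explicit path from the root element down to the first <td> in the row.
by_path = root.findall("body/table/tr/td[1]")

print("direct children:", [e.text for e in direct])
print("descendants:", [e.text for e in anywhere])
print("explicit path:", [e.text for e in by_path])
```

With lowercase element names in the tree, the descendant search and the explicit path both reach the cell, while the bare child step does not.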
Re: HTMLDocument and Xpath
Got the answer - it looks like a case-sensitivity problem. I think the HTML parser converts all the tag names (but not the attributes) to uppercase, and XPath name tests are case-sensitive. Xpath definitely does not like my lowercase expressions, but these work fine:

test = Evaluate('//TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD[1]', doc_node.documentElement)

Shawn
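The case mismatch can be reproduced without pyXML. The sketch below uses the standard library's xml.etree.ElementTree and a hand-written document with uppercase tag names as a hypothetical stand-in for the tree the HTML reader builds; it is not the pyXML API, only an illustration that name tests compare tag names exactly. The helper `find_ignoring_case` is an illustrative name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the parsed tree: tag names stored in uppercase.
DOC = "<HTML><BODY><TABLE><TR><TD>element1</TD></TR></TABLE></BODY></HTML>"

root = ET.fromstring(DOC)

# Name tests are case-sensitive, so the lowercase query matches nothing.
lower = root.findall(".//td")
upper = root.findall(".//TD")


def find_ignoring_case(node, name):
    """Return every element under node whose tag equals name, ignoring case."""
    wanted = name.lower()
    return [e for e in node.iter() if e.tag.lower() == wanted]


either = find_ignoring_case(root, "td")

print("lowercase query:", [e.text for e in lower])
print("uppercase query:", [e.text for e in upper])
print("case-insensitive helper:", [e.text for e in either])
```

If the parser's casing is not known in advance, walking the tree and comparing lowercased tag names, as the helper does, avoids having to guess which form the expression should use.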