
HTMLDocument and Xpath

2006-02-02 Thread swilson

Hi, I want to use xpath to scrape info from a website using pyXML but I
keep getting no results.

For example, in the following I want to return the text "element1", but
I can't get xpath to return anything at all.  What's wrong with this
code?


from xml.dom.ext.reader import HtmlLib
from xml.xpath import Evaluate

reader = HtmlLib.Reader()
doc_node = reader.fromString("""
<html>
<head>
<title>Python Programming Language</title>
</head>
<body>
<table><tr><td>element1</td></tr></table>
</body>
</html>
""")

test = Evaluate('td', doc_node.documentElement)
print "test =", test


All I get is an empty list for output.

Thx in advance

Shawn

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: HTMLDocument and Xpath

2006-02-03 Thread swilson


Alan Kennedy wrote:
> [EMAIL PROTECTED]
> > Hi, I want to use xpath to scrape info from a website using pyXML but I
> > keep getting no results.
> >
> > For example, in the following, I want to return the text "Element1" I
> > can't get xpath to return anything at all.  What's wrong with this
> > code?
>
> Your xpath expression is wrong.
>
> > test = Evaluate('td', doc_node.documentElement)
>
> Try one of the following alternatives, all of which should work.
>
> test = Evaluate('//td', doc_node.documentElement)
> test = Evaluate('/html/body/table/tr/td', doc_node.documentElement)
> test = Evaluate('/html/body/table/tr/td[1]', doc_node.documentElement)
>
> HTH,
>
> Alan.

I tried all of those and in every case, test returns "[]".  Does
Evaluate only work with XML documents?

Shawn
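
For later readers: the bare 'td' expression from the first post would not match in any case, because an unprefixed name is a child step and the td is not a direct child of the html element. Below is a minimal sketch of that difference. It does not use pyXML (which, as far as I know, is no longer maintained); it assumes the standard library's xml.etree.ElementTree as a stand-in, whose XPath subset only accepts relative paths on an element, and the PAGE markup is an assumed well-formed version of the page in the question.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed stand-in for the page in the question.
PAGE = """<html>
<head><title>Python Programming Language</title></head>
<body><table><tr><td>element1</td></tr></table></body>
</html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# 'td' is a child step: it only matches <td> elements directly under <html>.
direct_children = root.findall("td")

# './/td' searches every descendant of the context node.
descendants = root.findall(".//td")

# A full relative path from <html> reaches the same cell.
by_path = root.findall("body/table/tr/td")

print(direct_children)                 # []
print([td.text for td in descendants]) # ['element1']
print([td.text for td in by_path])     # ['element1']
```

This only explains the first expression. It does not explain why '//td' also came back empty with HtmlLib.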


Re: HTMLDocument and Xpath

2006-02-07 Thread swilson

Got the answer - there's a bug in xpath.  I think the HTML parser
converts all the tag names (but not the attributes) to uppercase, so the
lowercase expressions never match.  Xpath definitely does not like my
first string, but these work fine:

test = Evaluate('//TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD[1]', doc_node.documentElement)

Shawn
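
A small sketch of the case issue, for anyone who hits the same thing. XPath name tests are case-sensitive, so a tree whose elements are named in uppercase will not match lowercase names. The example assumes the standard library's xml.etree.ElementTree rather than pyXML, the uppercase markup only mimics what is reported above for HtmlLib, and find_ignoring_case is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Uppercase tags mimic the tree the thread reports HtmlLib building (assumption).
UPPER = "<HTML><BODY><TABLE><TR><TD>element1</TD></TR></TABLE></BODY></HTML>"
root = ET.fromstring(UPPER)

# Name tests are case-sensitive: the lowercase name finds nothing.
lower_hits = root.findall(".//td")
upper_hits = root.findall(".//TD")


def find_ignoring_case(node, name):
    """Hypothetical helper: return node and its descendants whose tag
    equals name when compared case-insensitively."""
    wanted = name.lower()
    return [
        el for el in node.iter()
        if isinstance(el.tag, str) and el.tag.lower() == wanted
    ]


print(lower_hits)                                           # []
print([el.text for el in upper_hits])                       # ['element1']
print([el.text for el in find_ignoring_case(root, "td")])   # ['element1']
```

Walking the tree and comparing lowercased names avoids having to know which case the parser chose, at the cost of giving up path expressions.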

