Hi Paul, thanks for the reply. I came to the same conclusion a few minutes before I saw your email.
Another question: tr = d.xpath(foo) gets me an array of nodes. Is there a way for me to then iterate through the node tr[x] to see if a child node exists? "d" is a document object, while "tr[x]" would be a node object? Or would I convert tr[x] to a string and then feed that into libxml2dom.parseString()?

Thanks

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Paul Boddie
Sent: Friday, June 13, 2008 12:49 PM
To: python-list@python.org
Subject: Re: python screen scraping/parsing

On 13 Jun, 20:10, "bruce" <[EMAIL PROTECTED]> wrote:
>
> url ="http://www.pricegrabber.com/rating_summary.php/page=1"

[...]

> tr =
> "/html/body/[EMAIL PROTECTED]'pgSiteContainer']/[EMAIL PROTECTED]'pgPageContent']/table[2]/tbody/tr[4]"
>
> tr_=d.xpath(tr)

[...]

> my issue appears to be related to the last "tbody", or tbody/tr[4]...
>
> if i leave off the tbody, i can display data, as the tr_ is an array with
> data...

Yes, I can confirm this.

> with the "tbody" it appears that the tr_ array is not defined, or it has no
> data... however, i can use the DOM tool with firefox to observe the fact
> that the "tbody" is there...

Yes, but the DOM tool in Firefox probably inserts virtual nodes for
its own purposes. Remember that it has to do a lot of other stuff like
implement CSS rendering and DOM event models. You can confirm that
there really is no tbody by printing the result of this...

d.xpath("/html/body/[EMAIL PROTECTED]'pgSiteContainer']/[EMAIL PROTECTED]'pgPageContent']/table[2]")[0].toString()

This should fetch the second table in a single element list and then
obviously give you the only element of that list. You'll see that the
raw HTML doesn't have any tbody tags at all.

Paul
--
http://mail.python.org/mailman/listinfo/python-list
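
On the question of checking each node returned by xpath for a child: a rough sketch of the idea, using the standard library's xml.etree.ElementTree instead of libxml2dom so that it runs without extra installs. The SAMPLE markup and the rows_with_child helper are made up for illustration and are not the real pricegrabber page. The point is that each item in the result list is already a node you can query again with a relative path, so there should be no need to convert it to a string and re-parse it. I'd expect the same pattern to apply to the nodes libxml2dom returns, but that is an assumption, not something verified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the scraped page (not the real HTML).
SAMPLE = """
<html><body>
  <table>
    <tr><td>Store A</td><td>4.5</td></tr>
    <tr></tr>
    <tr><th>Header only</th></tr>
  </table>
</body></html>
"""


def rows_with_child(root, row_path, child_tag):
    """Return (row, has_child) pairs for every element matching row_path."""
    results = []
    for row in root.findall(row_path):
        # find() returns None when the row has no direct child with that tag.
        results.append((row, row.find(child_tag) is not None))
    return results


root = ET.fromstring(SAMPLE)

# One boolean per <tr>: does it contain at least one <td>?
flags = [has_td for _, has_td in rows_with_child(root, "./body/table/tr", "td")]
print(flags)

# A path naming an element that is absent from the markup matches nothing,
# which is the same symptom as the "tbody" step discussed in the thread.
missing = root.findall("./body/table/tbody/tr")
print(missing)
```

Note that ElementTree only handles well-formed markup and a limited XPath subset, so for real-world HTML you would still want an HTML-aware parser; the loop itself is the part that carries over.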