alias, 16.09.2011 08:39:
code1:

    import lxml.html
    import urllib

    down='http://finance.yahoo.com/q/op?s=C+Options'
    content=urllib.urlopen(down).read()
    root=lxml.html.document_fromstring(content)

I see this quite often, but many people don't know that this can be simplified to

    import lxml.html

    url = 'http://finance.yahoo.com/q/op?s=C+Options'
    root = lxml.html.parse(url).getroot()

which is not only less code, but also substantially more efficient.

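To try the parse()/getroot() pattern without touching the network, here is a minimal sketch that feeds parse() a file-like object instead of a URL. The page content is a made-up stand-in, not the real Yahoo page, and lxml is assumed to be installed.

```python
import io

import lxml.html

# Hypothetical stand-in for a downloaded page.
page = io.BytesIO(
    b"<html><head><title>Options</title></head>"
    b"<body><p>Call Options</p></body></html>"
)

# parse() accepts a file name, a URL or a file-like object
# and returns an ElementTree; getroot() gives the <html> element.
tree = lxml.html.parse(page)
root = tree.getroot()

title = root.findtext(".//title")
paragraph = root.xpath("//p")[0].text_content()

print(title)
print(paragraph)
```

The same two calls work unchanged when a URL string is passed instead of the BytesIO object.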
    table = root.xpath("//table[@class='yfnc_mod_table_title1']")[0]
    tds=table.xpath("tr[@valign='top']//td")
    for td in tds:
        print td.text_content()

What I get is:

    Call Options Expire at close Friday, September 16, 2011

These are what I want.

code2:

    import lxml.html
    import urllib

    down='http://finance.yahoo.com/q/op?s=C+Options'
    content=urllib.urlopen(down).read()
    root=lxml.html.document_fromstring(content)
    table = root.xpath("//table[@class='yfnc_mod_table_title1']")[0]
    tds=table.xpath("//tr[@valign='top']//td")

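Here, you are looking for all "tr" tags recursively, instead of taking just the ones that are direct children of the "table" tag. Because the expression starts with "//", the search actually begins at the document root rather than at the table, which is why you get cells from the other tables on the page.
That's what "//" is there for: it's a recursive subtree selector. To search only below the table, use ".//" instead. You might want to read up on XPath expressions.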
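The difference between the two expressions can be reproduced on a tiny document. This is only a sketch: the HTML and the class names "title" and "data" are invented for illustration, and lxml is assumed to be installed.

```python
import lxml.html

# Hypothetical miniature page: a title table followed by a data table.
HTML = """
<html><body>
<table class="title">
  <tr valign="top"><td>Call Options</td><td>Expire at close Friday</td></tr>
</table>
<table class="data">
  <tr valign="top"><td>48.00</td><td>16.75</td></tr>
</table>
</body></html>
"""

root = lxml.html.document_fromstring(HTML)
table = root.xpath("//table[@class='title']")[0]


def cell_texts(expression):
    """Evaluate an XPath expression with the title table as context node."""
    return [td.text_content() for td in table.xpath(expression)]


# Relative path: only "tr" elements that are direct children of the table.
children = cell_texts("tr[@valign='top']//td")

# ".//" searches the subtree below the context node (the table itself).
subtree = cell_texts(".//tr[@valign='top']//td")

# A leading "//" is absolute: it starts at the document root,
# so the cells of the second table are matched as well.
everywhere = cell_texts("//tr[@valign='top']//td")

print(children)
print(subtree)
print(everywhere)
```

Only the last expression returns cells from the second table, which matches the unexpected output of code2.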
what i get is : N/A N/A 2 114 48.00 C110917P00048000 16.75 0.00 N/A N/A 0 23 50.00 C110917P00050000 23.16 0.00 N/A N/A 115 2,411 Highlighted options are in-the-money.
I don't see any highlighting in your text above, and I don't know what you mean by "in-the-money".
Stefan