Here is my code:

    import urllib
    import lxml.html

    down = "http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
    file = urllib.urlopen(down).read()
    root = lxml.html.document_fromstring(file)

    # rows of class "tr_normal" that contain an <img> somewhere inside
    data1 = root.xpath('//tr[@class="tr_normal" and .//img]')
    print "the row which contains img :"
    for u in data1:
        print u.text_content()

    # rows of class "tr_normal" that contain no <img>
    data2 = root.xpath('//tr[@class="tr_normal" and not(.//img)]')
    print "the row which do not contain img :"
    for u in data2:
        print u.text_content()

The output is (I have omitted many lines):

    the row which contains img :
    00329
    the row which do not contain img :
    00001长江实业1,000#HOF
    ................many lines omitted
    00327百富环球1,000#H
    00328ALCO HOLDINGS2,000#

I wonder why there are so many rows that I can't get. For example, the following rows are missing (you can see them on the web page http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm ):

    00330思捷环球<http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/invest/company/profile_page_c.asp?WidCoID=00330&WidCoAbbName=&Month=&langcode=c> 100#HOF
    00331春天百货<http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/invest/company/profile_page_c.asp?WidCoID=00331&WidCoAbbName=&Month=&langcode=c> 2,000#H
    00332NGAI LIK IND<http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/invest/company/profile_page_c.asp?WidCoID=00332&WidCoAbbName=&Month=&langcode=c> 4,000#
    ...................many lines omitted

How can I get these rows?