python <mailtoman...@163.com> writes:

> import urllib
> import lxml.html
> down='http://v.163.com/special/visualizingdata/'
> file=urllib.urlopen(down).read()
> root=lxml.html.document_fromstring(file)
> urllist=root.xpath('//div[@class="down s-fc3 f-fl"]//a')
> for url in urllist:
>     print url.get("href")
>
> i get the output ,
> http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4
>
> when i change
>
> xpath('//div[@class="down s-fc3 f-fl"]//a')
>
> into
>
> xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
>
> that is to say ,
>
> urllist=root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
>
> why i can't get nothing?
There is only one <div class="col f-cb"> in the document, and it contains
only a single <div class="down s-fc3 f-fl">. That inner div does not contain
any <a>, so your second expression matches nothing.

The URLs that your first snippet prints are not inside a
<div class="col f-cb">. They are, however, inside a <div class="m-tdli">,
so this works:

xpath('//div[@class="m-tdli"]//div[@class="down s-fc3 f-fl"]//a')

-- 
Piet van Oostrum <p...@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
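
The explanation above can be checked on a small stand-in document. The sketch below is only an illustration under stated assumptions: the markup is a hypothetical, heavily simplified version of the page (the real one is much larger), the example.com URLs are placeholders, and the standard library's xml.etree.ElementTree is swapped in for lxml so that it runs without extra packages. ElementTree paths must start with `.//` rather than `//`; with lxml the original `root.xpath('//div[...]')` calls should behave the same way on this markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout modeled on the description above.
# The real page is larger; the example.com URLs are placeholders.
PAGE = """
<html><body>
  <div class="col f-cb">
    <div class="down s-fc3 f-fl">no link in here</div>
  </div>
  <div class="m-tdli">
    <div class="down s-fc3 f-fl"><a href="http://example.com/1.mp4">1</a></div>
    <div class="down s-fc3 f-fl"><a href="http://example.com/2.mp4">2</a></div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)


def hrefs(path):
    """Return the href of every element matched by an ElementTree path."""
    return [a.get("href") for a in root.findall(path)]


# The inner div that both expressions share.
DOWN = "div[@class='down s-fc3 f-fl']"

# First expression: any matching div, wherever it sits in the tree.
all_links = hrefs(".//" + DOWN + "//a")

# Second expression: only matching divs below <div class="col f-cb">.
col_links = hrefs(".//div[@class='col f-cb']//" + DOWN + "//a")

# Suggested fix: only matching divs below <div class="m-tdli">.
tdli_links = hrefs(".//div[@class='m-tdli']//" + DOWN + "//a")

print(all_links)
print(col_links)
print(tdli_links)
```

In this stand-in document the "col f-cb" query comes back empty because the only matching inner div under that ancestor holds no <a>, while the "m-tdli" query returns the same links as the unrestricted one.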