Hi,

I'm trying to parse https://matchup.io/players/rocteur/friends

The body source I'm interested in contains blocks exactly like this:

    <tr class='friend'>
    <td class='text--left'>
    <a href="/players/mizucci0"><img alt="mizucci0" class="media__avatar" src="https://matchup-io.s3.amazonaws.com/uploads/player/avatar/7651/7651_profile_150_square.jpeg" />
    <div class='friend__info'>
    <span>mizucci0</span>
    <span>Mizuho</span>
    </div>
    </a></td>
    <td class='delta-alt'> 29,646 <br> steps </td>
    <td class='delta-alt'> 35,315 <br> steps </td>
    <td class='delta-alt'> 818.7 <br> Miles </td>
    </tr>

I wanted to do it in Python as I'm learning, and I looked at the different modules, but it isn't easy for me to work out the best way to do this: most tutorials I see use complicated classes, and I just want to parse this one paragraph at a time (as I would do in Perl) and print

    1 mizuho 29646 35315
    2 xxxxxx 99999 99999
    3 xxxxxx 99999 99999

etc. (In the above case I'm ignoring 818.7 and Miles.)

The best way I found so far is this:

    from lxml import html
    import requests

    page = requests.get("https://matchup.io/players/rocteur/friends/week/")
    tree = html.fromstring(page.text)
    a = tree.xpath('//span/text()')  # every <span> text node on the page
    b = tree.xpath('//td/text()')    # every <td> text node on the page

and then manipulating indices, e.g.

    print("%s %s %s %s" % (a[usern], a[users], b[tots], b[weekb]))
    tots += 4
    weekb += 4
    usern += 2
    users += 2

But it isn't very scientific ;-)

Which module would you use, and what would you suggest is the best way to do it?

Thanks very much in advance. I haven't done a lot of HTML parsing. I would much prefer using web services and an API, but unfortunately they don't have one.

-- 
Jerry Rocteur

-- 
https://mail.python.org/mailman/listinfo/python-list
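
For what it's worth, here is a minimal sketch of the "one block at a time" idea using only the standard library's html.parser, so the fields of a row can never drift out of step the way page-wide index arithmetic can. It assumes every row on the real page has the same shape as the sample block above. The names FriendRowCollector, parse_friends and SAMPLE are made up for this illustration, and the sketch parses the pasted sample rather than fetching the live page. The one class is only there because html.parser requires a subclass.

```python
from html.parser import HTMLParser

# Sample row copied from the post; the real page is assumed to repeat this shape.
SAMPLE = """
<tr class='friend'>
<td class='text--left'>
<a href="/players/mizucci0"><img alt="mizucci0" class="media__avatar" src="https://matchup-io.s3.amazonaws.com/uploads/player/avatar/7651/7651_profile_150_square.jpeg" />
<div class='friend__info'> <span>mizucci0</span> <span>Mizuho</span> </div>
</a></td>
<td class='delta-alt'> 29,646 <br> steps </td>
<td class='delta-alt'> 35,315 <br> steps </td>
<td class='delta-alt'> 818.7 <br> Miles </td>
</tr>
"""


class FriendRowCollector(HTMLParser):
    """Collect the non-blank text chunks of each <tr class='friend'> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of text chunks per friend row
        self._in_row = False    # True while inside a <tr class='friend'>

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and dict(attrs).get("class") == "friend":
            self._in_row = True
            self.rows.append([])

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_row and text:
            self.rows[-1].append(text)


def parse_friends(html_text):
    """Return (display name, first step count, second step count) per row."""
    collector = FriendRowCollector()
    collector.feed(html_text)
    collector.close()
    friends = []
    for chunks in collector.rows:
        # Assumed chunk order, taken from the sample block:
        # username, display name, count, "steps", count, "steps", distance, "Miles"
        name = chunks[1]
        first_steps = int(chunks[2].replace(",", ""))
        second_steps = int(chunks[4].replace(",", ""))
        friends.append((name, first_steps, second_steps))
    return friends


for rank, (name, first_steps, second_steps) in enumerate(parse_friends(SAMPLE), start=1):
    print(rank, name, first_steps, second_steps)
```

The same per-row idea should carry over to lxml: loop over tree.xpath("//tr[@class='friend']") and run relative queries such as row.xpath(".//span/text()") inside each row, instead of stepping indices through two page-wide lists.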