Hi. The web page you need to parse is not very well-formed (I think), but that is no problem. Perhaps the best option is to locate the portion of HTML you want, in this case from "<h3 class="cardsect">Actual Pitching Statistics </h3><pre>" to "</pre>". Between these two markers you have a few entries like this one: " 19 <a href=http://www.baseballprospectus.com/dt//1914BOS-A.shtml>1914 BOS-A</a> 2 1 0 3.91 4 3 96 23.0 21 12 10 1 7 3 0 0 0 0 1 0".
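As a rough sketch of that "locate the portion" step, something like the following might do. The names `page` and `extract_block` are made up for illustration, and `page` is only a tiny stand-in for the real downloaded HTML (the second section in it is invented so there is something to skip).

```python
# Stand-in for the downloaded page (assumption: the real page is much longer).
page = (
    '<html><body>'
    '<h3 class="cardsect">Actual Pitching Statistics </h3><pre>\n'
    ' 19 <a href=http://www.baseballprospectus.com/dt//1914BOS-A.shtml>1914 BOS-A</a>'
    ' 2 1 0 3.91 4 3 96 23.0 21 12 10 1 7 3 0 0 0 0 1 0\n'
    '</pre>'
    '<h3 class="cardsect">Some other section</h3><pre>\nnot wanted\n</pre>'
    '</body></html>'
)

START = '<h3 class="cardsect">Actual Pitching Statistics </h3><pre>'
END = '</pre>'


def extract_block(html, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the opening marker itself
    end = html.find(end_marker, start)  # first closing marker after the start
    if end == -1:
        return None
    return html[start:end]


block = extract_block(page, START, END)

# Keep only the lines that look like season entries (they contain a link).
entries = [line for line in block.splitlines() if "<a href=" in line]
```

Plain `str.find` is used here instead of a regular expression because the two markers are fixed strings; each item in `entries` can then be cleaned up one at a time.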
I'll put here a small piece of code using regular expressions (the re module) that I think will help you develop the rest of the app.

    import re

    data = " 19 <a href=http://www.baseballprospectus.com/dt//1914BOS-A.shtml>1914 BOS-A</a> 2 1 0 3.91 4 3 96 23.0 21 12 10 1 7 3 0 0 0 0 1 0"

    # This line and the next delete the HTML tags
    pt = re.compile("(<a.*?>|</a>)")
    data1 = pt.sub("", data)  # now data1 doesn't contain any HTML tags

    # This line and the next replace each run of spaces with "-"
    pt = re.compile(" +")
    data2 = pt.sub("-", data1)

    # This makes a list of the fields
    arrange_data = data2.split("-")

After these few statements you'll have a list with the data you need, like this:

    ['', '19', '1914', 'BOS', 'A', '2', '1', '0', '3.91', '4', '3', '96', '23.0', '21', '12', '10', '1', '7', '3', '0', '0', '0', '0', '1', '0']

(The empty first item comes from the leading space, and "BOS-A" is split in two because it already contains a "-".)

I think this is a good start for you. Tell me if you can solve the problem with this or if you need more help.

Bye
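
P.S. If you would rather keep the team code "BOS-A" in one piece, a possible variant (only a sketch, not tested against the real page) is to skip the "-" step and split on whitespace directly. The names `no_tags` and `fields` are just illustrative.

```python
import re

data = " 19 <a href=http://www.baseballprospectus.com/dt//1914BOS-A.shtml>1914 BOS-A</a> 2 1 0 3.91 4 3 96 23.0 21 12 10 1 7 3 0 0 0 0 1 0"

# Remove the opening <a ...> tag and the closing </a> tag.
no_tags = re.sub(r"</?a\b[^>]*>", "", data)

# split() with no argument splits on any run of whitespace and
# ignores leading/trailing whitespace, so there is no empty first item.
fields = no_tags.split()
```

With this, `fields` starts with '19', '1914', 'BOS-A', and a value that legitimately contains a "-" is not broken apart.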