On Tue, 12 Aug 2014 13:00:30 -0700, Simon Evans wrote:

> in accessing from the 'Racing Post' on a daily basis. Anyhow, the code

Following is some starter code. You will have to look at the output, compare it with the web page, and work out how you want to process it further.

Note that I use beautifulsoup and requests. The output is the HTML for each cell in the table, with a line of "+" characters at the table row breaks.

I suggest you look at the beautifulsoup documentation at

http://www.crummy.com/software/BeautifulSoup/bs4/doc/

to work out how to select the table cells that contain the data you are interested in, and how to extract it.

#!/usr/bin/python
"""
Program to extract data from racingpost.
"""

from bs4 import BeautifulSoup
import requests

# The URL must be one unbroken string, with no space after the "?".
r = requests.get(
    "http://www.racingpost.com/horses2/cards/card.sd?race_id=607466&r_date=2014-08-13#raceTabs=sc_"
)

if r.status_code == 200:
    # Name the parser explicitly so the result does not depend on
    # which parsers happen to be installed.
    soup = BeautifulSoup( r.content, "html.parser" )
    table = soup.find( "table", id="sc_horseCard" )
    if table is None:
        print( "No table with id sc_horseCard found" )
    else:
        for row in table.find_all( "tr" ):
            for cell in row.find_all( "td" ):
                # print( x ) with one argument works in Python 2 and 3.
                print( cell )
            # Marks the end of each table row.
            print( "+++++++++++++++++++++++++++++++++++++" )
else:
    print( "HTTP Status %s" % r.status_code )

-- 
Denis McMahon, denismfmcma...@gmail.com
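
As one possible next step from the starter code, here is a minimal sketch of pulling just the text out of each cell instead of printing the raw HTML. It runs against a small made-up table rather than the live page, so SAMPLE_HTML, its two columns, and the helper name table_rows are assumptions for illustration only; the real Racing Post markup will differ and the selection logic will need adjusting to match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page. Only the
# table id is taken from the starter code; the columns are invented.
SAMPLE_HTML = """
<table id="sc_horseCard">
  <tr><th>No.</th><th>Horse</th></tr>
  <tr><td>1</td><td><a href="#">Example Horse</a></td></tr>
  <tr><td>2</td><td><a href="#">Another Horse</a></td></tr>
</table>
"""


def table_rows(html, table_id="sc_horseCard"):
    """Return a list of rows, each a list of the text in its <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # The page layout changed or the id is wrong: nothing to extract.
        return []
    rows = []
    for row in table.find_all("tr"):
        # get_text() drops the tags; strip=True trims surrounding whitespace.
        cells = [cell.get_text(" ", strip=True) for cell in row.find_all("td")]
        if cells:  # header rows hold <th> cells only, so they are skipped
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
for cells in rows:
    print(cells)
```

To use it on the real page you would pass r.content from the requests call in place of SAMPLE_HTML, then decide which list positions hold the fields you care about by comparing the output with the page.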