Re: Suitable Python code to scrape specific details from web pages.

Simon Evans Tue, 12 Aug 2014 15:47:32 -0700
On Tuesday, August 12, 2014 9:00:30 PM UTC+1, Simon Evans wrote:
> Dear Programmers,
> 
> I have been looking at the YouTube 'Web Scraping Tutorials' of Chris Reeves. 
> I have tried a few of his Python programs in the Python27 command prompt, but 
> altered them from accessing data via links from, say, the Dow Jones index to 
> accessing the details I would be interested in from the 'Racing Post' on a 
> daily basis. However, the output of the example I am going to give is not 
> the information I am seeking: instead of returning the given odds on a 
> horse, it only returns a [], which isn't much use. 
> 
> I would be glad if you could tell me where I am going wrong. 
> 
> Yours faithfully
> 
> Simon Evans.
> 
> --------------------------------------------------------------------------------
> 
> >>> import urllib
> >>> import re
> >>> htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?race_id=600048r_date=2014-05-08#raceTabs=sc_")
> >>> htmltext = htmlfile.read()
> >>> regex = '<strong>1<a href="http://www.racingpost.com/horses/horse_home.sd?horse_id=758752"onclick="scorecards.send(&quot;horse_name&quot:):return Html.popup(this,{width:695,height:800})"title="Full details about this HORSE">Lively Baron</a>9/4F</strong><br/>'
> >>> pattern = re.compile(regex)
> >>> odds=re.findall(pattern,htmltext)
> >>> print odds
> []
> >>>
> 
> --------------------------------------------------------------------------------
> 
> >>> import urllib
> >>> import re
> >>> htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?race_id=600048r_date=2014-05-08#raceTabs=sc_")
> >>> htmltext = htmlfile.read()
> >>> regex = '<a></a>'
> >>> pattern = re.compile(regex)
> >>> odds=re.findall(pattern,htmltext)
> >>> print odds
> []
> >>>
> 
> -------------------------------------------------------------------------------
Dear Programmers,

Thank you for your responses. I have installed 'Beautiful Soup' and I have 
the 'Getting Started in Beautiful Soup' book, but I can't seem to make any 
progress with it; I am too thick to make much use of it. I was hoping I could 
scrape specific items off Web pages without using it. I have also installed 
'Requests'. Is there any code you can suggest that can access the sort of Web 
page values I have referred to, such as odds and names of runners, as they 
appear in the 'inspect element' or 'view source' HTML of pages on 
www.Racingpost.com?
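
For illustration, here is a minimal sketch of the kind of regex-only extraction being asked about. It is not a tested scraper for the Racing Post site. SAMPLE_HTML is a made-up fragment modelled on the markup quoted in the regex above, and the names RUNNER_RE and extract_runners are hypothetical; the real page markup may well differ. It is written to run under Python 3 (the transcripts above use Python 2.7). The sketch also shows one likely reason the quoted pattern returns []: characters such as '?' and '(' are regex metacharacters, so a pasted chunk of HTML does not match itself unless they are escaped.

```python
import re

# Hypothetical sample modelled on the fragment quoted earlier in the thread;
# the real Racing Post markup may be different.
SAMPLE_HTML = (
    '<strong>1 <a href="http://www.racingpost.com/horses/horse_home.sd'
    '?horse_id=758752" title="Full details about this HORSE">'
    'Lively Baron</a> 9/4F</strong><br/>'
)

# An unescaped '?' makes the preceding 'd' optional instead of matching a
# literal question mark, so this pattern cannot match its own text.
assert re.findall('sd?horse', 'sd?horse') == []
# re.escape() turns every metacharacter into a literal.
assert re.findall(re.escape('sd?horse'), 'sd?horse') == ['sd?horse']

# Match only the stable parts of the tag and capture what is wanted,
# rather than pasting the whole tag in as the pattern.
RUNNER_RE = re.compile(
    r'<a\s+href="[^"]*horse_home\.sd\?horse_id=(\d+)"[^>]*>'  # horse id
    r'\s*([^<]+?)\s*</a>'                                      # horse name
    r'\s*(\d+/\d+F?|Evens)?'                                   # odds, if present
)


def extract_runners(htmltext):
    """Return a list of (horse_id, name, odds) tuples found in htmltext."""
    return RUNNER_RE.findall(htmltext)


print(extract_runners(SAMPLE_HTML))
```

The same function can be given the text of a fetched page, for example requests.get(url).text, in place of SAMPLE_HTML. Whether the odds actually appear in the downloaded HTML, rather than being filled in later by JavaScript, would need checking against the page source.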
-- 
https://mail.python.org/mailman/listinfo/python-list