beautiful soup get class info
I am using Beautiful Soup to get the title and date from a website. The title is working fine, but I am not able to pull the date. Here is the code in the URL:

    October 22, 2011

In Python, I am using the following code:

    date1 = soup.span.text
    data = soup.find_all(date="value")

This results in:

    []
    March 5, 2014

What is the proper way to get this info? Thanks.

-- 
https://mail.python.org/mailman/listinfo/python-list
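Both results follow from how those two lookups behave. Below is a Python 3 sketch. It assumes Beautiful Soup 4 (bs4) with the built-in "html.parser". The markup is hypothetical, because the page's real HTML was not posted. In it, an earlier <span> holds today's date and a later <span class="date"> holds the post date:

```python
from bs4 import BeautifulSoup

# Hypothetical markup (the real page was not posted): an earlier <span>
# with today's date and a later <span class="date"> with the post date
html = """
<html><body>
  <span>March 5, 2014</span>
  <h1>Some title</h1>
  <span class="date">October 22, 2011</span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.span is shorthand for soup.find("span"): the FIRST <span> in the document
date1 = soup.span.text
print(date1)

# Keyword arguments to find_all() are attribute filters: this looks for tags
# with an attribute named "date" whose value is "value", so nothing matches
data = soup.find_all(date="value")
print(data)
```

In other words, date="value" would only match a tag written like <span date="value">. To target the date you need to filter on the span's class attribute instead.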
Re: beautiful soup get class info
On Thursday, March 6, 2014 2:58:12 PM UTC-6, John Gordon wrote:
> In teddy writes:
>
> > October 22, 2011
> >
> > date1 = soup.span.text
> > data=soup.find_all(date="value")
>
> Try this:
>
>     soup.find_all(name="span", class="date")
>
> --
> John Gordon   Imagine what it must be like for a real medical doctor to
>               watch 'House', or a real serial killer to watch 'Dexter'.

I have Python 2.7.2 and it does not like 'class' in the code you provided. When I take out class="date", this is returned:

    [March 5, 2014, March 5, 2014]

This is the code I am using:

    data = soup.find_all(name="span")
    print(data)

Two problems:

1. It returns today's date instead of the actual date.
2. It returns it twice.
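The complaint about 'class' comes from Python's own parser, not from Beautiful Soup. 'class' is a reserved keyword, so it can never be used as a keyword-argument name, whatever library you are calling. This small sketch uses only the standard library. It compiles the suggested call without running it, which is enough to trigger the error:

```python
import keyword

# 'class' is one of Python's reserved keywords
print(keyword.iskeyword("class"))

# Compiling (not executing) the suggested call shows the parser rejects it
# before any Beautiful Soup code is ever reached
suggested = 'soup.find_all(name="span", class="date")'
try:
    compile(suggested, "<suggestion>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print("rejected:", rejected)
```

Once class="date" is removed, find_all(name="span") has no filter except the tag name, so it returns every <span> on the page.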
Re: beautiful soup get class info
On Thursday, March 6, 2014 4:28:06 PM UTC-6, John Gordon wrote:
> In writes:
>
> > > soup.find_all(name="span", class="date")
>
> > I have python 2.7.2 and it does not like class in the code you provided.
>
> Oh right, 'class' is a reserved word. I imagine beautifulsoup has
> a workaround for that.
>
> > Now when I take out [ class="date"], this is returned:
> > [March 5, 2014, March 5, 2014]
> >
> > This is the code I am using: "data = soup.find_all(name="span")
> > print (data)"
> > 1. it returns today's date instead of the actual date
> > 2. returns it twice
>
> Are there two occurrences of 'March 5, 2014' in the HTML? If so, then
> beautifulsoup is doing its job correctly.
>
> It might help if you posted the sample HTML data you're working with.

OK, I got this working. Now on to the next problem. Thanks.
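The workaround John anticipated does exist. In Beautiful Soup 4.1.2 and later, you can use the keyword class_ with a trailing underscore. In any bs4 version, you can instead pass the filter as an attrs dictionary, where "class" is just a string key. Below is a Python 3 sketch using hypothetical markup, since the real HTML was not posted. It shows both forms, plus the unfiltered call that returns every span:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's HTML was not posted in the thread
html = """
<span>March 5, 2014</span>
<span class="date">October 22, 2011</span>
<span>March 5, 2014</span>
"""
soup = BeautifulSoup(html, "html.parser")

# No class filter: every <span> matches, including both copies of today's date
all_spans = soup.find_all("span")

# Option 1: bs4's trailing-underscore keyword for the class attribute
by_keyword = soup.find_all("span", class_="date")

# Option 2: the same filter as a dictionary, which sidesteps the keyword issue
by_attrs = soup.find_all("span", attrs={"class": "date"})

for tag in by_keyword:
    print(tag.get_text(strip=True))
```

The attrs form is handy if you are unsure which bs4 release is installed alongside Python 2.7.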
extract from json
I can't find any example of how to do this. I have a JSON file like this:

    {"bostock":[{"url":"http://bl.ocks.org/mbostock/9360565","title":"titleplaceholder","date":"dateplaceholder"},
    {"url":"http://bl.ocks.org/mbostock/9265674","title":"titleplaceholder","date":"dateplaceholder"},
    {"url":"http://bl.ocks.org/mbostock/9265467","title":"titleplaceholder","date":"dateplaceholder"},
    {"url":"http://bl.ocks.org/mbostock/9234731","title":"titleplaceholder","date":"dateplaceholder"},
    {"url":"http://bl.ocks.org/mbostock/9232962","title":"titleplaceholder","date":"dateplaceholder"},

This goes on for more than 700 entries. The only unique thing is the number at the end of the URL. I am going to load each URL in Python, get the date and title, and write them into the JSON itself. Right now I am stuck on just reading the URLs from the JSON. Here is my code:

    import json

    with open("bostock.json") as json_file:
        json_data = json.load(json_file)
        print(json_data)

I have tried json_data[0], json_data.url and a few others I forget right now, and none of them seem to work. I have already figured out how to get the title and date.

First things first: how can I get just the URL from each entry of the above JSON file?
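You can see why json_data[0] and json_data.url fail by checking what json.load actually returns. This Python 3 sketch uses json.loads on a hypothetical two-entry excerpt shaped like bostock.json, because the full file was not posted. The top level is a dict with a single key. Integer indexing therefore raises KeyError, and attribute access raises AttributeError:

```python
import json

# Two-entry excerpt shaped like bostock.json (the real file has 700+ entries)
text = """{"bostock":[
 {"url":"http://bl.ocks.org/mbostock/9360565","title":"titleplaceholder","date":"dateplaceholder"},
 {"url":"http://bl.ocks.org/mbostock/9265674","title":"titleplaceholder","date":"dateplaceholder"}
]}"""

json_data = json.loads(text)

# The top level is a dict (a JSON object), not a list
print(type(json_data).__name__)   # dict
print(list(json_data))            # ['bostock']

# json_data[0] asks the dict for a key equal to the integer 0, which is absent
try:
    json_data[0]
except KeyError as exc:
    print("KeyError:", exc)

# Dicts expose their contents through keys, not attributes
try:
    json_data.url
except AttributeError as exc:
    print("AttributeError:", exc)

# The entries live in the list stored under the "bostock" key
print(json_data["bostock"][0]["url"])
```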
Re: extract from json
On Friday, March 7, 2014 3:05:15 PM UTC-6, Kev Dwyer wrote:
> wrote:
> > I can't find any example on how to do this.
> > [...]
> > First things first: How can i just get the url for each line of the above
> > json file?
>
> Hello
>
> Try:
>
> Python 2.7.2 (default, Aug 19 2011, 20:41:43) [GCC] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import json
> >>> with open('/tmp/bostock.json') as f:
> ...     json_data = json.load(f)
> >>> json_data
> {u'bostock': [{u'url': u'http://bl.ocks.org/mbostock/9360565', u'date':
> u'dateplaceholder', u'title': u'titleplaceholder'}, {u'url':
> u'http://bl.ocks.org/mbostock/9265674', u'date': u'dateplaceholder',
> u'title': u'titleplaceholder'}, {u'url':
> u'http://bl.ocks.org/mbostock/9265467', u'date': u'dateplaceholder',
> u'title': u'titleplaceholder'}, {u'url':
> u'http://bl.ocks.org/mbostock/9234731', u'date': u'dateplaceholder',
> u'title': u'titleplaceholder'}, {u'url':
> u'http://bl.ocks.org/mbostock/9232962', u'date': u'dateplaceholder',
> u'title': u'titleplaceholder'}]}
> >>> urls = [x['url'] for x in json_data['bostock']]
> >>> urls
> [u'http://bl.ocks.org/mbostock/9360565',
>  u'http://bl.ocks.org/mbostock/9265674',
>  u'http://bl.ocks.org/mbostock/9265467',
>  u'http://bl.ocks.org/mbostock/9234731',
>  u'http://bl.ocks.org/mbostock/9232962']
>
> Python loads the JSON in the file into a dictionary. In this case, the
> dictionary has a single key, 'bostock', and the value for that key is a
> list (of dictionaries).
>
> To get the URLs, you need to get the list
>
>     json_data['bostock']
>
> and then iterate over its elements, getting the value for the key 'url'
> for each one. This is what the list comprehension
>
>     [x['url'] for x in json_data['bostock']]
>
> does.
>
> I hope that helps,
>
> Kev

Kev, you're the man. Thanks!
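The original post says the goal is to fetch each URL, get its title and date, and write them back into the JSON. Each entry in json_data['bostock'] is a plain dict, so the loop Kev describes can update the entries in place, and json.dump can then write the result back. Below is a minimal Python 3 sketch. It assumes a hypothetical get_title_and_date() helper that stands in for the scraping code the poster said they already had (that code was not posted). It also uses a small temporary file in place of the real bostock.json:

```python
import json
import os
import tempfile


def get_title_and_date(url):
    """Hypothetical placeholder for the poster's (unposted) scraping code."""
    return "title for " + url, "date for " + url


def fill_in_entries(path):
    with open(path) as f:
        json_data = json.load(f)
    # Each entry is a dict, so assigning to its keys updates it in place
    for entry in json_data["bostock"]:
        entry["title"], entry["date"] = get_title_and_date(entry["url"])
    # Write the updated structure back over the same file
    with open(path, "w") as f:
        json.dump(json_data, f, indent=2)
    return json_data


# Usage on a small temporary file shaped like bostock.json
sample = {"bostock": [
    {"url": "http://bl.ocks.org/mbostock/9360565", "title": "titleplaceholder", "date": "dateplaceholder"},
    {"url": "http://bl.ocks.org/mbostock/9265674", "title": "titleplaceholder", "date": "dateplaceholder"},
]}
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(sample, f)

updated = fill_in_entries(path)
with open(path) as f:
    reloaded = json.load(f)  # confirm the file on disk holds the new values
os.remove(path)

print(updated["bostock"][0]["title"])
```

With 700+ entries, it may be worth writing to a new file rather than overwriting the original, in case a fetch fails partway through.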
help with for loop----python 2.7.2
I am trying to get all the element data from the RSS feed below. The only thing I am pulling is the first element, and I don't understand why the for loop does not go through the entire feed. Here is my code:

    try:
        from urllib2 import urlopen
    except ImportError:
        from urllib.request import urlopen

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
    #print soup.find_all('item')
    #print (soup)

    #for item in soup:
    for item in soup.find_all('item'):
        title = soup.find('title').text
        link = soup.find('link').text
        item = soup.find('item').text
        print item
        print title
        print link
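The loop does iterate over every <item>, but the body never uses the current item. Every lookup goes through soup, so each pass returns the first match in the whole document. In a feed, that first <title> is typically the channel's own title. Reassigning item also throws away the loop variable. The fix is to search from item instead, for example item.find('title').

Below is a sketch of the corrected loop shape. It swaps in the standard library's xml.etree.ElementTree, which the original post did not use, because RSS is XML. HTML parsers may treat <link> as an empty element, so its text can come back blank. The feed content is a hypothetical two-item sample, not the real bl.ocks.org feed:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed standing in for http://bl.ocks.org/mbostock.rss
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Feed title</title>
  <link>http://bl.ocks.org/mbostock</link>
  <item><title>First</title><link>http://bl.ocks.org/mbostock/1</link></item>
  <item><title>Second</title><link>http://bl.ocks.org/mbostock/2</link></item>
</channel></rss>"""


def extract_items(rss_text):
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        # Search inside the current <item>, not from the document root
        title = item.findtext("title")
        link = item.findtext("link")
        results.append((title, link))
    return results


# The buggy pattern: searching from the root always finds the first <title>,
# which here belongs to the channel rather than to any item
first_title = ET.fromstring(SAMPLE_RSS).find(".//title").text

for title, link in extract_items(SAMPLE_RSS):
    print(title)
    print(link)
```

If you stay with Beautiful Soup, the same change applies. Inside the loop, call item.find(...) rather than soup.find(...), and don't reassign item.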