beautiful soup get class info

2014-03-06 Thread teddybubu
I am using BeautifulSoup to get the title and date of a website.
The title works fine, but I am not able to pull the date. Here is the relevant 
HTML from the page:

 October 22, 2011

In Python, I am using the following code:
date1 = soup.span.text
data=soup.find_all(date="value") 

Results in:

[]
March 5, 2014

What is the proper way to get this info?
Thanks.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: beautiful soup get class info

2014-03-06 Thread teddybubu
On Thursday, March 6, 2014 2:58:12 PM UTC-6, John Gordon wrote:
> In  teddy writes:
> 
> >  October 22, 2011
> 
> > date1 = soup.span.text
> > data=soup.find_all(date="value") 
> 
> Try this:
> 
> soup.find_all(name="span", class="date")
> 
> -- 
> John Gordon Imagine what it must be like for a real medical doctor to
> watch 'House', or a real serial killer to watch 'Dexter'.

I have Python 2.7.2 and it does not like class in the code you provided. Now 
when I take out [class="date"], this is returned:
   [March 5, 2014, March 5, 2014]

This is the code I am using:
data = soup.find_all(name="span")
print (data)

1. It returns today's date instead of the actual date.
2. It returns it twice.


Re: beautiful soup get class info

2014-03-06 Thread teddybubu
On Thursday, March 6, 2014 4:28:06 PM UTC-6, John Gordon wrote:
> In   writes:
> 
> > > soup.find_all(name="span", class="date")
> 
> > I have python 2.7.2 and it does not like class in the code you provided.
> 
> Oh right, 'class' is a reserved word.  I imagine beautifulsoup has
> a workaround for that.
> 
> > Now when I take out [ class="date"], this is returned:
> > [March 5, 2014, March 5, 2014]
> 
> > This is the code I am using: "data = soup.find_all(name="span") 
> > print (data)"
> > 1. it returns today's date instead of the actual date
> > 2. returns it twice
> 
> Are there two occurrences of 'March 5, 2014'
> in the HTML?  If so, then beautifulsoup is doing its job correctly.
> 
> It might help if you posted the sample HTML data you're working with.
> 
> -- 
> John Gordon Imagine what it must be like for a real medical doctor to
> watch 'House', or a real serial killer to watch 'Dexter'.

OK, I got this working. Now on to the next problem. Thanks.
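
The thread does not record which fix was used. Because 'class' is a reserved word in Python, Beautiful Soup 4 accepts the keyword argument class_ instead, or an attrs dictionary. The following is a minimal sketch only: the markup is hypothetical (the original HTML was not preserved in the post), and the class names "date" and "updated" are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the page's real HTML was not preserved in the post.
html = """
<html><body>
<span class="updated">March 5, 2014</span>
<span class="date">October 22, 2011</span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# 'class' is reserved in Python, so Beautiful Soup 4 takes class_ instead.
by_keyword = [tag.get_text(strip=True)
              for tag in soup.find_all("span", class_="date")]

# The same lookup written with an attrs dictionary.
by_attrs = [tag.get_text(strip=True)
            for tag in soup.find_all("span", attrs={"class": "date"})]

print(by_keyword)
```

Searching for every span, as in the attempt above, would also pick up the span carrying today's date, which is why filtering on the class matters here.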


extract from json

2014-03-07 Thread teddybubu
I can't find any example on how to do this.
I have a json file like so:
{"bostock":[{"url":"http://bl.ocks.org/mbostock/9360565","title":"titleplaceholder","date":"dateplaceholder"},
{"url":"http://bl.ocks.org/mbostock/9265674","title":"titleplaceholder","date":"dateplaceholder"},
{"url":"http://bl.ocks.org/mbostock/9265467","title":"titleplaceholder","date":"dateplaceholder"},
{"url":"http://bl.ocks.org/mbostock/9234731","title":"titleplaceholder","date":"dateplaceholder"},
{"url":"http://bl.ocks.org/mbostock/9232962","title":"titleplaceholder","date":"dateplaceholder"},

This goes on for more than 700 entries. The only unique part is the number at 
the end of the url. I am going to load each url in Python, get the date and 
title, and write them into the json itself. 
Right now I am stuck on just reading the url in the json. Here is my code:

import json

with open("bostock.json") as json_file:
    json_data = json.load(json_file)
print(json_data)

I have tried json_data[0], json_data.url and a few others I forget right now, 
and none of them seem to work.

I have already figured out how to get the title and date.
First things first: how can I get just the url for each line of the above json 
file? 


Re: extract from json

2014-03-07 Thread teddybubu
On Friday, March 7, 2014 3:05:15 PM UTC-6, Kev Dwyer wrote:
>  wrote:
> > I can't find any example on how to do this.
> > I have a json file like so:
> > {"bostock":[{"url":"http://bl.ocks.org/mbostock/9360565","title":"titleplaceholder","date":"dateplaceholder"},{"url":"http://bl.ocks.org/mbostock/9265674","title":"titleplaceholder","date":"dateplaceholder"},{"url":"http://bl.ocks.org/mbostock/9265467","title":"titleplaceholder","date":"dateplaceholder"},{"url":"http://bl.ocks.org/mbostock/9234731","title":"titleplaceholder","date":"dateplaceholder"},{"url":"http://bl.ocks.org/mbostock/9232962","title":"titleplaceholder","date":"dateplaceholder"},
> > this goes on for more than 700 entries. only thing unique is the number at
> > the end of the url. I am going to load the url in python, get the date and
> > title and write it in the json itself. Right now I am stuck on just
> > reading the url in the json. Here is my code:
> > import json
> > with open("bostock.json") as json_file:
> >     json_data = json.load(json_file)
> > print(json_data)
> > I have tried json_data[0], json_data.url and a few others I forget right
> > now and it does not seem to work.
> > I have already figured out how to get the title and date.
> > First things first: How can i just get the url for each line of the above
> > json file?
> Hello 
> Try:
> 
> Python 2.7.2 (default, Aug 19 2011, 20:41:43) [GCC] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import json
> >>> with open('/tmp/bostock.json') as f:
> ...     json_data = json.load(f)
> >>> json_data
> {u'bostock': [{u'url': u'http://bl.ocks.org/mbostock/9360565', u'date': 
> u'dateplaceholder', u'title': u'titleplaceholder'}, {u'url': 
> u'http://bl.ocks.org/mbostock/9265674', u'date': u'dateplaceholder', 
> u'title': u'titleplaceholder'}, {u'url': 
> u'http://bl.ocks.org/mbostock/9265467', u'date': u'dateplaceholder', 
> u'title': u'titleplaceholder'}, {u'url': 
> u'http://bl.ocks.org/mbostock/9234731', u'date': u'dateplaceholder', 
> u'title': u'titleplaceholder'}, {u'url': 
> u'http://bl.ocks.org/mbostock/9232962', u'date': u'dateplaceholder', 
> u'title': u'titleplaceholder'}]} 
> >>> urls = [x['url'] for x in json_data['bostock']]
> >>> urls
> [u'http://bl.ocks.org/mbostock/9360565', 
> u'http://bl.ocks.org/mbostock/9265674', 
> u'http://bl.ocks.org/mbostock/9265467', 
> u'http://bl.ocks.org/mbostock/9234731', 
> u'http://bl.ocks.org/mbostock/9232962']
> 
> Python loads the json in the file into a dictionary.  In this case, the 
> dictionary has a single key, 'bostock', and the value in the dictionary for 
> that key is a list (of dictionaries).  
> 
> To get the urls, you need to get the list 
> json_data['bostock']
> and then iterate over its elements, getting the value for the key 'url' for 
> each one.  
> This is what the list comprehension 
> [x['url'] for x in json_data['bostock']]
> does.
> I hope that helps, 
> Kev

Kev, you're the man. Thanks.
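
The same structure (a dictionary whose 'bostock' key holds a list of dictionaries) also covers the stated goal of writing the title and date back into the json. The following is a rough sketch under assumptions: the data is a two-entry excerpt held in a string rather than the real file, and lookup_title_and_date is a hypothetical stand-in for the scraping step that only returns dummy values.

```python
import json

# Two entries in the same shape as the file shown in the question.
raw = (
    '{"bostock":['
    '{"url":"http://bl.ocks.org/mbostock/9360565",'
    '"title":"titleplaceholder","date":"dateplaceholder"},'
    '{"url":"http://bl.ocks.org/mbostock/9265674",'
    '"title":"titleplaceholder","date":"dateplaceholder"}]}'
)

json_data = json.loads(raw)


def lookup_title_and_date(url):
    """Hypothetical stand-in for the scraping step; returns dummy values."""
    return "title for " + url.rsplit("/", 1)[-1], "date unknown"


# Each element of the list is a dict, so it can be updated in place.
for entry in json_data["bostock"]:
    title, date = lookup_title_and_date(entry["url"])
    entry["title"] = title
    entry["date"] = date

urls = [entry["url"] for entry in json_data["bostock"]]

# Serialise the updated structure; json.dump(json_data, f) would write a file.
updated = json.dumps(json_data, indent=2)
print(urls)
```

Updating the dictionaries in place keeps the url, title and date of each entry together, so nothing has to be matched up again before saving.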


help with for loop----python 2.7.2

2014-03-22 Thread teddybubu
I am trying to get all the element data from the rss below.
The only thing I am pulling is the first element.
I don't understand why the for loop does not go through the entire rss.
Here is my code


try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)

for item in soup.find_all('item'):
#for item in soup:
    title = soup.find('title').text
    link = soup.find('link').text
    item = soup.find('item').text
    print item
    print title
    print link
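
In the loop above, every lookup is made on soup rather than on item, so each pass finds the first match in the whole document, and item itself is overwritten with the first item. With Beautiful Soup the likely change is to call item.find(...) inside the loop. The sketch below illustrates the same per-item scoping, but note that it swaps in the standard library's xml.etree.ElementTree instead of Beautiful Soup and uses a made-up feed string; both are assumptions for illustration, not the poster's setup.

```python
import xml.etree.ElementTree as ET

# Made-up feed in the general shape of an RSS 2.0 document.
rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
"""

root = ET.fromstring(rss)

entries = []
for item in root.iter("item"):
    # Search inside the current item, not the whole document.
    title = item.findtext("title")
    link = item.findtext("link")
    entries.append((title, link))

for title, link in entries:
    print(title, link)
```

Looking up title on the document root instead would return the channel title, "Example feed", on every pass, which mirrors the symptom described in the question.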