Hi, all,
I want to extract each professor's name and title from the
following page:
"http://www.economics.utoronto.ca/index.php/index/person/faculty/"
Ideally, I'd like an output file where each line is one professor,
with his or her name and title. I use the csv module to write it.
The following is my program:
--------------- Program ----------------------------------------------------
import urllib, re, csv
url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"
sock = urllib.urlopen(url)
htmlSource = sock.read()
sock.close()
namePattern = re.compile(r'class="name">(.*)</a>')
titlePattern = re.compile(r'</a>, (.*)\s*</td>')
name = namePattern.findall(htmlSource)
title_temp = titlePattern.findall(htmlSource)
title = []
for item in title_temp:
    item_new = " ".join(item.split())  # Strip the spaces between the title and </td>
    title.extend([item_new])
output = []
for i in range(len(name)):
    output.insert(i, [name[i], title[i]])  # Generate a list of [name, title]
writer = csv.writer(open("professor.csv", "wb"))
writer.writerows(output)  # Output CSV file
-------------- End of Program ----------------------------------------------
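A minimal sketch of one way to keep each name paired with its title, written for Python 3 (where `urllib.urlopen` became `urllib.request.urlopen`). It assumes, based only on the two patterns in the program above, that a row with a title looks like `class="name">NAME</a>, TITLE </td>` and a row without one looks like `class="name">NAME</a></td>`; the real page markup may differ. Matching the name and an optional title with a single pattern means a missing title becomes an empty string instead of shifting every later title onto the wrong name. The helper names `extract_professors` and `fetch_page` are hypothetical.

```python
import re
import urllib.request

# One match per professor: group 1 is the name, group 2 the optional title.
# ASSUMPTION: the markup has the shapes implied by the original two patterns.
ROW_PATTERN = re.compile(r'class="name">(.*?)</a>(?:,\s*(.*?))?\s*</td>')


def extract_professors(html):
    """Return a list of (name, title) pairs; title is '' when absent."""
    rows = []
    for name, title in ROW_PATTERN.findall(html):
        # findall() gives '' for a group that did not take part in the match.
        # " ".join(split()) collapses whitespace, as the original loop did.
        rows.append((name.strip(), " ".join(title.split())))
    return rows


def fetch_page(url):
    # ASSUMPTION: the page is UTF-8; adjust if it declares another encoding.
    with urllib.request.urlopen(url) as sock:
        return sock.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = (
        '<td><a href="#" class="name">John Smith</a>, Professor\n</td>\n'
        '<td><a href="#" class="name">Jane Roe</a></td>\n'
        '<td><a href="#" class="name">Ann Lee</a>, Associate   Professor </td>\n'
    )
    for row in extract_professors(sample):
        print(row)
```

Against the live page this would be called as `extract_professors(fetch_page(url))`, with `url` as in the program above.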
My questions are:
1. The code above assumes that every professor has a title. If any one of
them does not, names and titles will be mismatched. How can I make the
program allow an empty title?
2. Is there an easier way to get the data I want than building lists?
3. Should I close the opened CSV file ("professor.csv")? If so, how?
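On the second and third points, a minimal Python 3 sketch: `writer.writerows()` accepts any iterable of rows, so (name, title) pairs can be passed straight in without building `output` index by index, and a `with` block closes the file automatically when it exits, even if an exception is raised. When names and titles already sit in two aligned lists, `zip(names, titles)` pairs them by position; note that `zip` does not cure the mismatch problem, it only replaces the manual index loop. The helper `write_professors` is hypothetical. In Python 3 the csv documentation recommends opening the file in text mode with `newline=""`; the `"wb"` mode in the program above is the Python 2 convention.

```python
import csv


def write_professors(rows, path="professor.csv"):
    """Write (name, title) pairs to a CSV file, one professor per line.

    HYPOTHETICAL helper for illustration; `rows` may be any iterable of pairs.
    """
    # newline="" lets the csv module control line endings (Python 3).
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(rows)
    # The file is already closed here: leaving the with block closed it.


if __name__ == "__main__":
    names = ["John Smith", "Jane Roe"]
    titles = ["Professor", ""]
    # zip() pairs items by position; it assumes the lists are already aligned.
    write_professors(zip(names, titles))
```

With the original Python 2 code, the equivalent without `with` is to keep a reference to the file object (`f = open("professor.csv", "wb")`), pass `f` to `csv.writer`, and call `f.close()` after `writerows`.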
Thanks!
Jackie
--
http://mail.python.org/mailman/listinfo/python-list