Hi, all,
   
  I want to extract information about the professors (name, title) from the
following page:
   
  "http://www.economics.utoronto.ca/index.php/index/person/faculty/"
   
  Ideally, I'd like to have an output file where each line is one professor,
with his or her name and title. In practice, I write it with the csv module.
   
  The following is my program:
  
--------------- Program ----------------------------------------------------
import urllib, re, csv

url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"

sock = urllib.urlopen(url)
htmlSource = sock.read()
sock.close()

namePattern = re.compile(r'class="name">(.*)</a>')
titlePattern = re.compile(r'</a>,&nbsp;(.*)\s*</td>')
name = namePattern.findall(htmlSource)
title_temp = titlePattern.findall(htmlSource)
title = []
for item in title_temp:
    # Suppress the spaces between 'title' and </td>
    item_new = " ".join(item.split())
    title.extend([item_new])

output = []
for i in range(len(name)):
    # Generate a list of [name, title]
    output.insert(i, [name[i], title[i]])

writer = csv.writer(open("professor.csv", "wb"))
# Output CSV file
writer.writerows(output)
  -------------- End of Program ----------------------------------------------
   
  My questions are:
   
  1. The code above assumes that each professor has a title. If any one of
them does not, the names and titles will be mismatched. How can I allow the
title to be empty?

  2. Is there an easier way to get the data I want than building lists?

  3. Should I close the opened CSV file ("professor.csv")? If so, how do I
close it?
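
  To make questions 1 and 3 concrete, here is a rough sketch of the kind of
thing I have in mind. I am not sure it is the right approach. It matches each
table cell once, with the title part optional, so a name can never pair up
with somebody else's title. The sample markup is only my guess at the page
structure based on my two patterns, and the names SAMPLE_HTML, ROW_PATTERN,
extract_rows and write_rows are made up for this example. It is written in
Python 3 syntax, unlike my program above.

```python
import csv
import io
import re

# Hypothetical markup, guessed from the two patterns in the program above;
# the real page may be structured differently.
SAMPLE_HTML = """
<tr><td><a href="/a" class="name">Alice Smith</a>,&nbsp;Professor
  </td></tr>
<tr><td><a href="/b" class="name">Bob Jones</a></td></tr>
<tr><td><a href="/c" class="name">Carol Wu</a>,&nbsp;Associate   Professor </td></tr>
"""

# One pattern per cell: the name is required, the ",&nbsp;title" part is optional.
ROW_PATTERN = re.compile(
    r'class="name">(?P<name>[^<]*)</a>(?:,&nbsp;(?P<title>[^<]*))?\s*</td>'
)


def extract_rows(html):
    """Return [name, title] pairs; title is '' when the cell has none."""
    rows = []
    for match in ROW_PATTERN.finditer(html):
        name = " ".join(match.group("name").split())
        # group("title") is None when the optional part did not match.
        title = " ".join((match.group("title") or "").split())
        rows.append([name, title])
    return rows


def write_rows(rows, out):
    """Write rows as CSV to an already-open file-like object."""
    csv.writer(out).writerows(rows)


rows = extract_rows(SAMPLE_HTML)

# An in-memory buffer stands in for the file here. With a real file on
# Python 3 this would be:
#     with open("professor.csv", "w", newline="") as f:
#         write_rows(rows, f)
# and the with-block closes the file when it exits.
buffer = io.StringIO()
write_rows(rows, buffer)
```

  With the sample markup, the second professor gets an empty title instead
of shifting everyone else's title up by one row.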
   
  Thanks!
   
  Jackie
  
 

       