Output of html parsing

2007-06-16 Thread Jackie Wang
Hi, all,
   
  I want to get information about the professors (name, title) from the 
following link:
   
  "http://www.economics.utoronto.ca/index.php/index/person/faculty/"
   
  Ideally, I'd like to have an output file where each line is one Prof, 
including his name and title. In practice, I use the csv module.
   
  The following is my program:
  
--- Program ---
import urllib, re, csv

url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"

sock = urllib.urlopen(url)
htmlSource = sock.read()
sock.close()

namePattern = re.compile(r'class="name">(.*)')
titlePattern = re.compile(r', (.*)\s*')
name = namePattern.findall(htmlSource)
title_temp = titlePattern.findall(htmlSource)
title = []
for item in title_temp:
    # Collapse runs of whitespace inside each title
    item_new = " ".join(item.split())
    title.append(item_new)

output = []
for i in range(len(name)):
    # Build a list of [name, title] rows
    output.append([name[i], title[i]])

writer = csv.writer(open("professor.csv", "wb"))
writer.writerows(output)   # output CSV file
--- End of Program ---
   
  My questions are:
   
  1. The code above assumes that each Prof has a title. If any one of them does 
not, the names and titles will be mismatched. How can I program it so that the 
title can be empty?
   
  2. Is there an easier way to get the data I want other than using lists?
   
  3. Should I close the opened csv file ("professor.csv")? How do I close it?
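
One possible way to handle questions 1 and 3, as a minimal sketch rather than a definitive answer: match each name and its optional title with a single pattern, so a missing title yields an empty string instead of shifting the lists, and open the CSV file in a with-block so it is closed automatically. The sample markup, the RECORD pattern, and the helper names parse_faculty and write_csv are assumptions for illustration (the real page structure is not known here), and the code is written for Python 3.

```python
import csv
import re

# Hypothetical markup: the real page structure is an assumption.
SAMPLE_HTML = """
<li><a class="name" href="/a">Alice Example</a>, Professor   of
 Economics</li>
<li><a class="name" href="/b">Bob Example</a></li>
"""

# One match per person; the title group is optional, so a person
# without a title still produces a row (with an empty title).
RECORD = re.compile(r'class="name"[^>]*>([^<]*)</a>(?:,\s*([^<]*))?')


def parse_faculty(html):
    """Return a list of [name, title] rows; title may be ''."""
    rows = []
    for name, title in RECORD.findall(html):
        # Collapse runs of whitespace in both fields.
        rows.append([" ".join(name.split()), " ".join(title.split())])
    return rows


def write_csv(rows, path):
    # The with-block closes the file even if writing raises an error.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    write_csv(parse_faculty(SAMPLE_HTML), "professor.csv")
```

Because each row is built from one match, names and titles cannot drift out of step. For pages with less regular markup, an HTML parser is usually more robust than a regular expression.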
   
  Thanks!
   
  Jackie
  
 

   
-- 
http://mail.python.org/mailman/listinfo/python-list

dealing with emf/wmf files

2007-06-24 Thread Jackie Wang
I'd like to put some emf/wmf pictures into a pdf file
using 'reportlab', but the Python Imaging Library
cannot recognize emf files. The wmf files are said to
be 'identified only'.

Therefore, the following code does not work:

from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

width, height = letter  # (612.0, 792.0) points

def hello(c):
    # Landscape page, so the image spans height x width
    c.drawImage(r'D:\01.wmf', 1, 1, height, width)

c = canvas.Canvas('hello.pdf', pagesize=(height, width))
hello(c)
c.showPage()
c.save()

I do not want to convert the pictures into other
formats, e.g. jpg, which would lower the quality.

Is there any way to get around this problem?

Thanks!




automatically pdf files generating

2007-06-25 Thread Jackie Wang
Hi, all,
   
  There are 50 folders on my hard drive C:
C:\01, C:\02, ..., C:\50
  There are 4 pictures in each folder:
1.jpg, 2.jpg, 3.jpg, 4.jpg
   
  For each folder, I want to print the 4 pictures into a single-page pdf file 
(letter sized; landscape orientation). All together, I want to get 50 pdf files 
with names: 01.pdf, 02.pdf, ..., 50.pdf.
   
  Is it possible to use Python to carry out the above process? I know there is a 
module named "reportlab". Is there any easy command in the module to do my job?
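
A rough sketch of how this might be done with reportlab, under some assumptions: the four pictures are placed in a 2 x 2 grid on a landscape letter page, the PDFs are written next to the folders, and the helper names grid_cells, folder_to_pdf and all_folders as well as the 18-point margin are made up for illustration. The layout arithmetic is kept in a separate function so it can be checked without reportlab installed.

```python
import os


def grid_cells(page_w, page_h, rows=2, cols=2, margin=18):
    """Return (x, y, w, h) boxes in PDF points, left to right, top to bottom."""
    cell_w = (page_w - margin * (cols + 1)) / cols
    cell_h = (page_h - margin * (rows + 1)) / rows
    cells = []
    for r in range(rows):
        for c in range(cols):
            x = margin + c * (cell_w + margin)
            # The PDF origin is the bottom-left corner, so row 0 goes on top.
            y = page_h - (r + 1) * (cell_h + margin)
            cells.append((x, y, cell_w, cell_h))
    return cells


def folder_to_pdf(folder, pdf_path, names=("1.jpg", "2.jpg", "3.jpg", "4.jpg")):
    # Imported here so grid_cells can be used without reportlab.
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import letter, landscape

    page_w, page_h = landscape(letter)
    c = canvas.Canvas(pdf_path, pagesize=(page_w, page_h))
    for name, (x, y, w, h) in zip(names, grid_cells(page_w, page_h)):
        # Scale each picture into its cell without distorting it.
        c.drawImage(os.path.join(folder, name), x, y, width=w, height=h,
                    preserveAspectRatio=True, anchor="c")
    c.showPage()
    c.save()


def all_folders(root="C:\\", count=50):
    # Folders are assumed to be named 01 .. 50 directly under root.
    for i in range(1, count + 1):
        folder = os.path.join(root, "%02d" % i)
        folder_to_pdf(folder, os.path.join(root, "%02d.pdf" % i))
```

Calling all_folders() would then produce 01.pdf through 50.pdf, one page each.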
   
  Thanks
   
  Jackie 



Automatically fill in forms on line

2008-03-31 Thread Jackie Wang
Dear all,

I want to automatically complete the following task:

1. Go to http://www.ffiec.gov/Geocode/default.aspx;
2. Fill in an address in the "Street Address:" field, e.g. "1316 State
Highway 102";
3. Fill in a ZIP code in the "Zip Code:" field, e.g. "04609";
4. Click the "search" button;
5. In the opened page, extract and save the number after "Tract Code".
In the example, it will be "9659".
6. Repeat Step 1 with a new address.

Can Python carry out these steps? Can these steps be done without
opening an IE window? In particular, I don't know how to write code for
steps 2, 4 and 5.
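
A minimal sketch of steps 2, 4 and 5 without a browser, assuming the form can be submitted as a plain HTTP POST. The field names "street" and "zip", the helper names, and the shape of the result page are all hypothetical; the real field names would have to be read from the page source, and .aspx pages often also require hidden fields such as __VIEWSTATE to be sent back.

```python
import re
import urllib.parse
import urllib.request

URL = "http://www.ffiec.gov/Geocode/default.aspx"


def build_form_data(street, zip_code):
    # Steps 2 and 3: encode the form fields (field names are hypothetical).
    fields = {"street": street, "zip": zip_code}
    return urllib.parse.urlencode(fields).encode("ascii")


def extract_tract_code(html):
    # Step 5: drop the tags, then take the number after "Tract Code".
    text = re.sub(r"<[^>]+>", " ", html)
    match = re.search(r"Tract\s+Code\s*:?\s*([\d.]+)", text)
    return match.group(1) if match else None


def lookup(street, zip_code):
    # Step 4: submitting the form is an HTTP POST; no browser window opens.
    data = build_form_data(street, zip_code)
    with urllib.request.urlopen(URL, data) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_tract_code(html)
```

Step 6 is then a loop calling lookup() once per address.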

Thank you!


Extract Information from Tables in html

2008-09-05 Thread Jackie Wang
Dear all,

Here is some HTML code:



 Premier Community Bank of Southwest Florida
 
 Fort Myers, FL



My question is how I can extract the strings and get the results:
Premier Community Bank of Southwest Florida; Fort Myers, FL
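
One way this could be approached, sketched with the standard library's HTMLParser: collect every non-empty piece of text between tags and join the pieces with "; ". The tags in SAMPLE are an assumption, since the original markup is not visible here, and TextCollector and extract_strings are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical markup around the two strings from the post.
SAMPLE = """<tr><td><a href="#">Premier Community Bank of Southwest Florida</a>
<br>Fort Myers, FL</td></tr>"""


class TextCollector(HTMLParser):
    """Gather the text found between tags, ignoring pure whitespace."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.chunks.append(text)


def extract_strings(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return "; ".join(parser.chunks)


if __name__ == "__main__":
    print(extract_strings(SAMPLE))
```

For the sample above this prints the two strings separated by "; ".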

Thanks a lot

Jackie


Use BeautifulSoup to delete certain tag while keeping its content

2008-09-06 Thread Jackie Wang
Dear all,

I have the following html code:


 
  Center Bank
  
  Los Angeles, CA
 



 
  Salisbury
Bank and Trust Company
  
   
   Lakeville, CT
  
 


How should I delete the 'font' tags while keeping the content inside?
Ideally I want to get:


  Center Bank
  
  Los Angeles, CA



  Salisbury
Bank and Trust Company
   
   Lakeville, CT
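
A small sketch of the idea, using a regular expression from the standard library in place of BeautifulSoup so that it runs without extra packages: only the opening and closing 'font' tags are removed, and everything between them is left alone. The sample markup and the name strip_font_tags are assumptions. In BeautifulSoup 4 the same effect is available by calling unwrap() on each 'font' tag.

```python
import re

# Matches <font ...> and </font>, whatever attributes they carry.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(html):
    """Remove font tags but keep the content inside them."""
    return FONT_TAG.sub("", html)


if __name__ == "__main__":
    sample = '<td><font size="2" face="Arial">Center Bank<br>Los Angeles, CA</font></td>'
    print(strip_font_tags(sample))
```

A regular expression is enough here because the tag is dropped wherever it appears; restructuring nested markup would call for a real parser.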


Thank you.

Jackie


pause between the loops

2008-09-21 Thread Jackie Wang
Hi all,

For a loop like:

for i in range(0, 10):

can I ask Python to pause for, say, 5 minutes after it finishes iteration i=0
and before it starts iteration i=1?
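
A short sketch of one way to do this with time.sleep from the standard library, which takes a duration in seconds. The helper name run_with_pauses and its sleep parameter are illustrative; the parameter exists only so the pause can be swapped out when trying the code.

```python
import time


def run_with_pauses(items, work, pause_seconds=300, sleep=time.sleep):
    """Call work(item) for each item, pausing between iterations."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # Wait before every iteration except the first.
            sleep(pause_seconds)
        results.append(work(item))
    return results


if __name__ == "__main__":
    # Pauses 5 minutes (300 seconds) between each of the 10 iterations.
    run_with_pauses(range(0, 10), print)
```

In a plain loop the same thing is a single time.sleep(300) call at the end of the loop body.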

Thank you very much!

Jackie