Output of html parsing
Hi all,

I want to get information about the professors (name, title) from the following page: http://www.economics.utoronto.ca/index.php/index/person/faculty/

Ideally, I'd like an output file in which each line is one professor, with name and title. In practice, I use the csv module. The following is my program:

--- Program
    import urllib, re, csv

    url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"
    sock = urllib.urlopen(url)
    htmlSource = sock.read()
    sock.close()

    namePattern = re.compile(r'class="name">(.*)')
    titlePattern = re.compile(r', (.*)\s*')
    name = namePattern.findall(htmlSource)
    title_temp = titlePattern.findall(htmlSource)

    title = []
    for item in title_temp:
        item_new = " ".join(item.split())  # Suppress the spaces between 'title' and
        title.extend([item_new])

    output = []
    for i in range(len(name)):
        output.insert(i, [name[i], title[i]])  # Generate a list of [name, title]

    writer = csv.writer(open("professor.csv", "wb"))
    writer.writerows(output)  # output CSV file
-- End of Program --

My questions are:

1. The code above assumes that each professor has a title. If any of them does not, the names and titles will be mismatched. How can I allow the title to be empty?
2. Is there an easier way to get the data I want than using lists?
3. Should I close the opened CSV file ("professor.csv")? How do I close it?

Thanks!
Jackie

-- http://mail.python.org/mailman/listinfo/python-list
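One possible way to approach questions 1 and 3, as a minimal sketch in Python 3 rather than a definitive answer. It assumes (this is not confirmed by the post) that each faculty entry looks roughly like `<a class="name" href="...">Name</a>, Title</td>`. The names `ENTRY`, `extract_people` and `write_csv` are made up for illustration. The idea is to match name and title together, one entry at a time, so a missing title becomes an empty string instead of shifting the later titles.

```python
import csv
import re

# ASSUMED markup (hypothetical): <a class="name" href="...">Name</a>, Title</td>
# "name" is the link text; "rest" is whatever plain text follows the link.
ENTRY = re.compile(r'class="name"[^>]*>(?P<name>[^<]*)</a>(?P<rest>[^<]*)')


def extract_people(html):
    """Return a list of [name, title]; title is "" when none is present."""
    rows = []
    for m in ENTRY.finditer(html):
        name = " ".join(m.group("name").split())
        # Drop the leading comma, then collapse runs of whitespace.
        rest = m.group("rest").strip().lstrip(",")
        title = " ".join(rest.split())
        rows.append([name, title])
    return rows


def write_csv(rows, path):
    """Write the rows; the with-block closes the file automatically."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Because each match carries its own title, the two lists can never get out of step. For question 3: without a `with` block, keep a reference to the file object and call its `close()` method once `writerows` has finished.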
dealing with emf/wmf files
I'd like to put some EMF/WMF pictures into a PDF file using reportlab, but the Python Imaging Library cannot recognize EMF files, and WMF files are said to be "identified only". Therefore, the following code does not work:

    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import letter

    width, height = letter  # (612.0, 792.0)

    def hello(c):
        c.drawImage(r'D:\01.wmf', 1, 1, height, width)

    c = canvas.Canvas('hello.pdf', pagesize=(height, width))
    hello(c)
    c.showPage()
    c.save()

I do not want to convert the pictures into other formats such as JPEG, which would lower the quality. Is there any way to get around this problem?

Thanks!
automatically pdf files generating
Hi all,

There are 50 folders on my hard drive C: C:\01, C:\02, ..., C:\50. Each folder contains 4 pictures: 1.jpg, 2.jpg, 3.jpg, 4.jpg.

For each folder, I want to print the 4 pictures onto a single-page PDF file (letter size, landscape orientation). Altogether, I want to get 50 PDF files named 01.pdf, 02.pdf, ..., 50.pdf.

Is it possible to do this with Python? I know there is a module named "reportlab". Is there an easy command in that module to do the job?

Thanks,
Jackie
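A rough sketch of how this could be done with reportlab (Python 3). The 2 x 2 layout, the 18-point margin and the helper names `grid_cells`, `make_pdf` and `make_all` are assumptions for illustration; the post does not say how the four pictures should be arranged. reportlab is a third-party package and is imported only inside `make_pdf`.

```python
import os

# Letter paper in landscape orientation, in points (1 point = 1/72 inch).
LETTER_LANDSCAPE = (792.0, 612.0)


def grid_cells(page_w, page_h, margin=18.0):
    """Return (x, y, w, h) for a 2 x 2 grid, top-left cell first."""
    cell_w = (page_w - 3 * margin) / 2
    cell_h = (page_h - 3 * margin) / 2
    cells = []
    for row in (1, 0):  # PDF origin is bottom-left, so row 1 is the top row
        for col in (0, 1):
            x = margin + col * (cell_w + margin)
            y = margin + row * (cell_h + margin)
            cells.append((x, y, cell_w, cell_h))
    return cells


def make_pdf(folder, pdf_path):
    """Draw 1.jpg .. 4.jpg from `folder` onto one landscape letter page."""
    from reportlab.pdfgen import canvas  # third-party: pip install reportlab

    c = canvas.Canvas(pdf_path, pagesize=LETTER_LANDSCAPE)
    for n, (x, y, w, h) in enumerate(grid_cells(*LETTER_LANDSCAPE), start=1):
        image_path = os.path.join(folder, "%d.jpg" % n)
        # Scale each picture to fit its cell without distorting it.
        c.drawImage(image_path, x, y, width=w, height=h,
                    preserveAspectRatio=True)
    c.showPage()
    c.save()


def make_all(root="C:\\", out_dir="."):
    """Create 01.pdf .. 50.pdf from the folders 01 .. 50 under `root`."""
    for i in range(1, 51):
        name = "%02d" % i
        make_pdf(os.path.join(root, name), os.path.join(out_dir, name + ".pdf"))
```

Calling `make_all()` would then write the 50 files into the current directory. Keeping the layout arithmetic in `grid_cells` makes it easy to change the margin or the arrangement without touching the drawing code.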
Automatically fill in forms on line
Dear all,

I want to automatically complete the following task:

1. Go to http://www.ffiec.gov/Geocode/default.aspx;
2. Fill in an address in the "Street Address:" field, e.g. "1316 State Highway 102";
3. Fill in a ZIP code in the "Zip Code:" field, e.g. "04609";
4. Click the "search" button;
5. In the page that opens, extract and save the number after "Tract Code". In the example, it will be "9659".
6. Repeat from step 1 with a new address.

Can Python carry out these steps? Can they be done without opening an IE window? In particular, I don't know how to write the code for steps 2, 4 and 5.

Thank you!
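The steps above can be sketched with the standard library alone, with no browser window involved (Python 3). This is only an outline under assumptions: the form field names `street_address` and `zip_code` are hypothetical placeholders, and the real names have to be read from the page's HTML source. An .aspx page may also expect hidden fields such as `__VIEWSTATE` to be posted back, which this sketch does not handle.

```python
import re
import urllib.parse
import urllib.request

FORM_URL = "http://www.ffiec.gov/Geocode/default.aspx"


def build_form_data(street, zip_code):
    """Steps 2-3: encode the form fields as a POST body."""
    # HYPOTHETICAL field names; replace them with the ones in the real form.
    fields = {"street_address": street, "zip_code": zip_code}
    return urllib.parse.urlencode(fields).encode("ascii")


def extract_tract_code(html):
    """Step 5: return the number following 'Tract Code', or None."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag removal
    m = re.search(r"Tract Code\s*:?\s*(\d+(?:\.\d+)?)", text)
    return m.group(1) if m else None


def lookup(street, zip_code):
    """Step 4: submit the form; passing data= makes urlopen send a POST."""
    body = build_form_data(street, zip_code)
    with urllib.request.urlopen(FORM_URL, data=body) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_tract_code(html)
```

Step 6 is then a plain loop over the addresses that calls `lookup` for each one and stores the result.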
Extract Information from Tables in html
Dear all,

Here is some HTML code:

    Premier Community Bank of Southwest Florida Fort Myers, FL

My question is how I can extract the strings to get the result:

    Premier Community Bank of Southwest Florida; Fort Myers, FL

Thanks a lot,
Jackie
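One way this might be done with the standard library's `html.parser` (Python 3). The markup of the original snippet did not survive in the post, so the sketch assumes each string sits in its own `<td>` cell; `CellText` and `cell_texts` are illustrative names.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the visible text of every <td> cell (assumed structure)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buf = None  # list of text pieces while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buf).split())
            if text:
                self.cells.append(text)
            self._buf = None


def cell_texts(html):
    parser = CellText()
    parser.feed(html)
    parser.close()
    return parser.cells
```

Joining the result with `"; ".join(cell_texts(html))` gives a single line in the form asked for. Tags nested inside a cell, such as `<font>`, are skipped because only the text pieces are collected.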
Use BeautifulSoup to delete certain tag while keeping its content
Dear all,

I have the following HTML code:

    Center Bank Los Angeles, CA Salisbury Bank and Trust Company Lakeville, CT

How should I delete the 'font' tags while keeping the content inside? Ideally I want to get:

    Center Bank Los Angeles, CA Salisbury Bank and Trust Company Lakeville, CT

Thank you.
Jackie
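In BeautifulSoup 4, calling `unwrap()` on a tag replaces it with its contents, which is the operation asked for here. As a dependency-free alternative, the sketch below uses a regular expression substitution instead of BeautifulSoup (Python 3). `FONT_TAG` and `strip_font_tags` are illustrative names, and the markup in the usage note is assumed because the original tags did not survive in the post.

```python
import re

# Matches opening and closing font tags, with or without attributes.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(html):
    """Remove <font ...> and </font> but leave everything between them."""
    return FONT_TAG.sub("", html)
```

For example, `strip_font_tags('<td><font size="2">Center Bank</font></td>')` leaves the `<td>` element and its text in place and drops only the font markup. A regular expression is adequate for removing one simple tag like this; for anything more structural, a real parser is the safer choice.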
pause between the loops
Hi all,

For a loop like:

    for i in range(0, 10):

can I ask Python to pause for, say, 5 minutes after it finishes the iteration for i=0 and before it starts the iteration for i=1?

Thank you very much!
Jackie
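A minimal sketch using `time.sleep`, which suspends the program for the given number of seconds (Python 3). The helper name `run_with_pause` and its parameters are made up for illustration; `work` stands for whatever the loop body does.

```python
import time


def run_with_pause(n, pause_seconds, work):
    """Call work(i) for i in 0..n-1, sleeping between iterations."""
    for i in range(n):
        work(i)
        if i < n - 1:  # no need to wait after the last iteration
            time.sleep(pause_seconds)
```

For the case in the question, `run_with_pause(10, 5 * 60, print)` would print each value of i and wait five minutes between them. In a plain loop, the same effect comes from putting `time.sleep(300)` at the end of the loop body.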