On Apr 26, 2:19 pm, Kiuhnm <kiuhnm03.4t.yahoo.it> wrote:
> On 4/26/2012 19:54, smac2...@comcast.net wrote:
> > Hello,
> >
> > I am having some difficulty generating the output I want from web
> > scraping. Specifically, the script I wrote, while it runs without any
> > errors, is not writing to the output file correctly. It runs, and
> > creates the output .txt file; however, the file is blank (ideally it
> > should be populated with a list of names).
> >
> > I took the base of a program that I had before for a different data
> > gathering task, which worked beautifully, and edited it for my
> > purposes here. Any insight as to what I might be doing wrong would be
> > highly appreciated. Code is included below. Thanks!
> >
> > import os
> > import re
> > import urllib2
> >
> > outfile = open("Skadden.txt","w")
> >
> > A = 1
> > Z = 26
> >
> > for letter in range(A,Z):
> >     for line in urllib2.urlopen("http://www.skadden.com/Index.cfm?contentID=44&alphaSearch="+str(letter)):
>
> You need
>   alphaSearch=a
> but you're using
>   alphaSearch=1
>
> >         x = line
> >         if '"><B>' in line:
>
> You should search for ' ><B>'.
>
> >             start=x.find('"><B>"')
>
> Ditto.
>
> >             end= x.find('</B></A></nobr></td>',start)
> >             name=x[start:end]
>
> You should use start+5 to skip ' ><B>'.
>
> >             outfile.write(name+"\n")
> >             print name
>
> Your code is bound to break over and over (you should do some smarter
> parsing), but here's a working version:
>
> --->
> import os
> import re
> import urllib2
>
> outfile = open("Skadden.txt","w")
>
> A = ord('a')
> Z = ord('z')
>
> # range() excludes its upper bound, so use Z + 1 to include 'z'
> for letter in range(A, Z + 1):
>     for line in urllib2.urlopen("http://www.skadden.com/Index.cfm?contentID=44&alphaSearch="+chr(letter)):
>         x = line
>         if ' ><B>' in line:
>             start=x.find(' ><B>')
>             end= x.find('</B></A></nobr></td>',start)
>             # start+5 skips over the 5-character marker ' ><B>'
>             name=x[start+5:end]
>             outfile.write(name+"\n")
>             print name
> <---
>
> Kiuhnm

Great, thanks so much for your help!
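
On the "smarter parsing" remark in the quoted reply: here is a minimal sketch of one way to make the extraction a little less fragile, by matching the start and end markers with a single regular expression instead of two find() calls. It is written for Python 3 rather than the Python 2 used in the thread. The names NAME_RE, extract_names and letter_urls are hypothetical, and the sample markup is only a guess built from the two substrings quoted above (' ><B>' and '</B></A></nobr></td>'); the real page may well look different.

```python
import re
import string

# Assumed markup, pieced together from the substrings quoted in the thread:
#   ... ><B>Some Name</B></A></nobr></td>
NAME_RE = re.compile(r' ><B>(.*?)</B></A></nobr></td>')


def extract_names(html):
    """Return the text found between ' ><B>' and '</B></A></nobr></td>'."""
    return [name.strip() for name in NAME_RE.findall(html)]


def letter_urls(base="http://www.skadden.com/Index.cfm?contentID=44&alphaSearch="):
    """Build one URL per letter, 'a' through 'z' inclusive."""
    return [base + letter for letter in string.ascii_lowercase]


# Made-up sample line, used only to exercise the pattern offline.
sample = '<td><nobr><A href="x.cfm" ><B>Jane Doe</B></A></nobr></td>'
print(extract_names(sample))
```

Iterating over string.ascii_lowercase avoids the off-by-one that range() invites, and keeping the matching in a function that takes a string means it can be checked against saved pages without hitting the site each time.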