Ellerbee, Edward wrote: > I've been working on this for 3 weeks as a project while I'm learning > python. It's ugly, only function in there is from a fellow lister, but > it works perfectly and does what it is intended to do. > > I'd love to hear comments on how I could improve this code, what would > be good to turn into a function/class etc. > > # The function of this script is to gather user data from the user to > # build appropriate local dial-peers in a cisco voice gateway. > # Data gathered from the user: > # name of file to write dial-peers to > # NPA > # NXX > # outbound port to send calls > # question if home npa local calls are 7 digit > # script navigates to nanpa.com, gathers the local calling area > # and returns the data > # the data is converted to beautiful soup (I had a problem with the xml > format) > # the data is parsed for the proper NPA/NXXs > # list is sorted, duplicates removed > # data from list is formatted and printed to the file specified > > > #--------Import dependencies----------- > import urllib2 > from BeautifulSoup import BeautifulSoup > import re > > #--------Define variables------------ > count = 0 > count2 = 0 > count3 = 0 > npalist = [] > nxxlist = [] > sortlist = [] > sortedlist = [] > npaReg = re.compile('(?<=<npa>)(...)(?=</npa>)') > nxxReg = re.compile('(?<=<nxx>)(...)(?=</nxx>)') > proxy = urllib2.ProxyHandler({'http': 'proxy.com:8080'}) > # --actual proxy was removed for confidentiality-- > opener = urllib2.build_opener(proxy) > > #----- function to concatenate numbers - thanks to Chris Angelico ----- > def combine(list_of_numbers, position): > lastprefix=tails=lastsuffix=None > result=[] > for cur in list_of_numbers: > prefix=cur[:position]; tail=cur[position]; > suffix=cur[position+1:] > if prefix!=lastprefix or suffix!=lastsuffix: > if lastprefix!=None: > if len(tails)>1: > result.append("%s[%s]%s"%(lastprefix,tails,lastsuffix)) > else: result.append(lastprefix+tails+lastsuffix) > lastprefix,tails,lastsuffix=prefix,"",suffix > tails+=tail > if lastprefix!=None: > if len(tails)>1: > result.append("%s[%s]%s"%(lastprefix,tails,lastsuffix)) > else: result.append(lastprefix+tails+lastsuffix) > return result > > > #-------Gather info from user-------- > x = raw_input("please enter a filename: ") > y = raw_input("please enter the npa: ") > z = raw_input("please enter the nxx: ") > p = raw_input("please enter the port: ") > q = raw_input("Is home npa local dialing 7 digit?(y/n) ") > #print x > > #-------Modify user data------------- > o = open(x, 'w') > y = str(y) > z = str(z) > p = str(p) > pagedef = ("http://www.localcallingguide.com/xmllocalprefix.php?npa=" + > y + "&nxx=" + z) > print "Querying", pagedef > > #------Get info from NANPA.com ---------- > urllib2.install_opener(opener) > page = urllib2.urlopen(pagedef) > soup = BeautifulSoup(page) > soup = str(soup) > > #------Parse Gathered Data---------- > for line in npaReg.findall(soup): > npalist.insert(count,line) > count = count + 1 > > for line2 in nxxReg.findall(soup): > nxxlist.insert(count2,line2) > count2 = count2 + 1 > > #-----Sort, remove duplicates, concatenate the last digits for similiar > NPA/NXX ------ > for makenewlist in range(0,count): > sortlist.append(npalist.pop(0) + nxxlist.pop(0)) > > sortlist.sort() > > for sortednumber in sortlist: > if sortednumber not in sortedlist: > sortedlist.append(sortednumber) > > catlist = combine(sortedlist,5) > catlist2 = combine(catlist,4) > > #------Print the dial-peers to file---- > for line in catlist2: > figureDpn = count3 + 1000 > dpn = str(figureDpn) > label = "dial-peer voice " + dpn > o.write(label) > o.write('\n') > o.write("description *** local outbound dialpeer ***") > o.write('\n') > destpatt = "destination-pattern %s...." % line.rstrip() > o.write(destpatt) > o.write('\n') > port = "port " + p > o.write(port) > o.write('\n') > if line[0:3] == y and q == "y": > o.write("forward-digits 7") > o.write('\n') > o.write('\n') > count3 = count3 + 1 > > o.close()
- Most important: read or reread the tutorial. It should give you an idea about the basics like adding items to a list. - Use self-evident variable names - Avoid global variables - Prefer docstrings over comments - Split your code into small chunks that do one thing well, i. e. functions. At the top-level your script could look like this def main(): args = read_arguments() url = make_url(args.npa, args.nxx) page = download_page(url) npa_nxx_pairs = scrape_page(page) clusters = combine_pairs(npa_nxx_pairs) with open(args.destination_file) as out: write_data(out, clusters, args.npa, args.port, args.seven_digit) if __name__ == "__main__" main() This makes dependencies clear, avoids globals, and reduces the need for comments. You can test separate functionality separately. NB: As I have no expertise in the problem domain the variable names I used above are a bit too abstract. - Convert flags to boolean early. Bad: seven_digit = raw_input(...) ... if seven_digit == "y": ... Better: seven_digit = raw_input(...) == "y" ... if seven_digit: ... - Relentlessly eliminate dead code. Example: page = str(BeautifulSoup(page)) is almost a no-op. You don't need BeautifulSoup at all! - Don't comment out code, remove it. Rely on version control if you think that you may need it again later. - The page you are scraping is xml. Regular expressions are not the best tool to deal with that. Here's a sample implementation of scrape_page() using lxml and xpath: from lxml import etree def scrape_page(page): root = etree.fromstring(page) npas = root.xpath("//prefix/npa/text()") nxxs = root.xpath("//prefix/nxx/text()") pairs = (npa + nxx for npa, nxx in zip(npas, nxxs)) return sorted(set(pairs)) An xpath expert might be able to simplify that. - I prefer to pass script arguments on the commandline rather than answering a sequence of questions interactively. With that approach get_arguments() becomes something like import argparse def get_arguments(): parser = argparse.ArgumentParser() parser.add_argument("npa") parser.add_argument("nxx") parser.add_argument("destfile", nargs="?") parser.add_argument("--port", default="1234") parser.add_argument("-7", "--seven-digit", action="store_true") return parser.parse_args() Sadly the instructive help="..." parameters are all missing... -- http://mail.python.org/mailman/listinfo/python-list