"Phillip B Oldham" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > I'd like the community's thoughts/comments on what I've done; > improvements I can make, "don'ts" I should be avoiding, etc. I'm not > so much bothered about the resulting data - for the moment it meets my > needs. But any comment is welcome!
I'm not expert, but here are a few thoughts. I hope they help. > #!/usr/bin/env python > ## Open a file containing a list of domains (1 per line), > ## request and parse it's whois record and push to a csv > ## file. You might want to look into doc strings as a method of providing longer documentation like this about what your program does. > dest = open('./whois.csv', 'w'); Semicolon!!!! :) > def trim( txt ): > x = [] > for line in txt.split("\n"): > if line.strip() == "": > continue > if line.strip().startswith('WHOIS'): > continue > if line.strip().startswith('>>>'): > continue > if line.strip().startswith('%'): > continue > if line.startswith("--"): > return ''.join(x) Is all this properly indented? One thing you can do is put each of these on one line, since they are fairly simple: if line.strip().startswith('WHOIS'): continue although I still like proper indentation. But you have a lot of them so it might save a good amount of space to do it this way. Also, just my personal preference, I like to be consistent with the type of quotes I use for strings. Here, you mix both single and double quotes on different lines. > return "\n".join(x); Semicolon!!!! :) :) > details = ['','','','','','','','',''] I don't have Python available to me right now, but I think you can do this instead: details = [''] * 9 > except: > continue Non-specific except clauses usually aren't preferred since they catch everything, even something you might not want to catch. > if domain == '': > continue You can say: if not domain instead of that equivalence test. But what does this if statement do? > if rec.startswith("No whois server") == True: > continue > > if rec.startswith("This TLD has no whois server") == True: > continue Like above, you don't need "== True" here. > if domain.endswith(".net"): > rec = clean_net(rec) > > if domain.endswith(".com"): > rec = clean_net(rec) > > if domain.endswith(".tv"): > rec = clean_net(rec) > > if domain.endswith(".co.uk"): > rec = clean_co_uk(rec) > > if domain.endswith(".info"): > rec = clean_info(rec) Hmm, my first thought is to do something like this with all these if tests: for extension in [<list all the extensions as strings here>]: rec = clean_net(extension) But for that to work, you may need to generalize the clean_net function so it works for all of them, instead of having to call different functions depending on the extension. Anyway, I hope some of that helps! -- http://mail.python.org/mailman/listinfo/python-list