Sebastian Bassi, this is a piece from #5:

    ProtSeq = raw_input("Protein sequence: ").upper()
    ProtDeg = {"A":4,"C":2,"D":2,"E":2,"F":2,"G":4,"H":2,
               "I":3,"K":2,"L":6,"M":1,"N":2,"P":4,"Q":2,
               "R":6,"S":6,"T":4,"V":4,"W":1,"Y":2}
    SegsValues = []
    for aa in range(len(ProtSeq)):
A more Pythonic version is:

    prot_seq = raw_input("Protein sequence: ").upper()
    prot_deg = {...}
    segs_values = []
    for aa in xrange(len(prot_seq)):

Note the use of xrange and names_with_underscores. In Python, names are usually lowercase, and their parts are separated by underscores.

From #6:

    segsvalues=[]; segsseqs=[]; segment=protseq[:15]; a=0

==>

    segs_values = []
    segs_seqs = []
    segment = prot_seq[:15]
    a = 0

If you want to save space in the book, you can pack those lines into a single line, but it's better to keep the underscores.

From #18:

    prop = 100.*cp/len(AAseq)
    return (charge,prop)

==>

    prop = 100.0 * cp / len(aa_seq)
    return (charge, prop)

Adding spaces around operators and after commas, and a zero after the decimal point, improves readability.

From #35:

    import re
    pattern = "[LIVM]{2}.RL[DE].{4}RLE"
    ...
    rgx = re.compile(pattern)

When the pattern gets more complex, it's better to show readers how to use a re.VERBOSE pattern: split it over several lines, indent those lines like a program, and add #comments to them.

#51 is missing.

I like Python, and I think Python is fit for bioinformatics purposes, but 3/4 of the purpose of a book like this is to teach bioinformatics first, and computer science and Python second. And sometimes a dynamic language isn't fast enough for bioinformatics purposes, so a book on this topic probably has to contain some pieces of C/D/Java code too, to show and discuss implementations of algorithms that require heavier computations (these are often already implemented inside Biopython, etc., but someone has to write those libraries too). The purpose here is not to teach how to write industrial-strength C libraries for those heavier computations, but to give the reader an idea (in a real lower-level language) of how those libraries are actually implemented.
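Going back to #35, here is a minimal sketch of how the same pattern could be shown with re.VERBOSE. The per-line comments are only my reading of the pattern (assuming it matches protein residues by their one-letter codes), and the test sequence is made up, it's not from the book. The names compact and seq are mine too.

```python
import re

# The one-line pattern from #35.
compact = re.compile("[LIVM]{2}.RL[DE].{4}RLE")

# The same pattern in verbose form: with re.VERBOSE, whitespace outside
# character classes is ignored and "#" starts a comment, so each element
# can sit on its own line with a note about what it matches.
pattern = r"""
    [LIVM]{2}   # two residues, each one of L, I, V, M
    .           # any single residue
    RL          # R followed by L
    [DE]        # D or E
    .{4}        # any four residues
    RLE         # R, L, E in this order
"""
rgx = re.compile(pattern, re.VERBOSE)

seq = "MKTLIARLDAAAARLEGG"  # made-up test sequence
match = rgx.search(seq)
print(match.group())  # LIARLDAAAARLE
print(match.start())  # 3
```

Both forms match exactly the same strings; the verbose one is just easier to read, and to extend when the motif grows.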
Because science abhors black boxes, a scientist must have an idea of how all the subsystems she/he/hir is using work inside (that's why using Mathematica can be bad for a scientist: such a person has to write "and here magic happens" in the resulting paper).

Bye,
bearophile
-- 
http://mail.python.org/mailman/listinfo/python-list