On Mar 3, 9:44 am, "Shawn Milo" <[EMAIL PROTECTED]> wrote: > I'm new to Python and fairly experienced in Perl, although that > experience is limited to the things I use daily. > > I wrote the same script in both Perl and Python, and the output is > identical. The run speed is similar (very fast) and the line count is > similar. > > Now that they're both working, I was looking at the code and wondering > what Perl-specific and Python-specific improvements to the code would > look like, as judged by others more knowledgeable in the individual > languages. > > I am not looking for the smallest number of lines, or anything else > that would make the code more difficult to read in six months. Just > any instances where I'm doing something inefficiently or in a "bad" > way. > > I'm attaching both the Perl and Python versions, and I'm open to > comments on either. The script reads a file from standard input and > finds the best record for each unique ID (piid). The best is defined > as follows: The newest expiration date (field 5) for the record with > the state (field 1) which matches the desired state (field 6). If > there is no record matching the desired state, then just take the > newest expiration date. >
[big snip] Here is my rewrite in what I regard as idiomatic reasonably-modern Python (OMMV of course). A few of the comments are applicable irrespective of the language used. HTH, John 8<------ #! /usr/bin/env python ### Layout: Use 4-space indent. Don't use tabs. Don't exceed 79 chars per line. import sys def process_file(opened_file=sys.stdin): ### Local variable access is faster than global ### input = sys.stdin ### shadowing built_in function "input" ### Use names to access elements in rows PIID = 0 ACTUAL_STATE = 1 DESIRED_STATE = 6 EXPIRY_DATE = 5 recs = {} for row in opened_file: row = row.rstrip('\n').split('\t') ### Do the split('\t') *once* per row piid = row[PIID] ### if recs.has_key(piid) is False: ### has_key() is ancient ### "if boolean_expression is False" is megabletchworthy; ### use "if not boolean_expression" if piid not in recs: recs[piid] = [] recs[piid].append(row) ### for piid in recs.keys(): for piid in recs: best = None ### use out-of-band sentinel for current in recs[piid]: if best is None: best = current ### had cockroach crap (";") at EOL else: #If the current record is the correct state ### Clear code (like the next line) doesn't need comments ### like the above line if current[ACTUAL_STATE] == current[DESIRED_STATE]: if best[ACTUAL_STATE] == best[DESIRED_STATE]: if current[EXPIRY_DATE] > best[EXPIRY_DATE]: best = current else: best = current else: if (best[ACTUAL_STATE] != best[ACTUAL_STATE] and current[EXPIRY_DATE] > best[EXPIRY_DATE]): best = current print "\t".join(best) if __name__ == "__main__": ### Standard idiom to avoid executing script content ### when/if file is imported process_file() 8<---- -- http://mail.python.org/mailman/listinfo/python-list