Shawn Milo wrote:
> <snip>
> I am not looking for the smallest number of lines, or anything else
> that would make the code more difficult to read in six months. Just
> any instances where I'm doing something inefficiently or in a "bad"
> way.
>
> I'm attaching both the Perl and Python versions, and I'm open to
> comments on either. The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.
>

I don't know if this attempt satisfies your criteria, but here goes!
This is not a rewrite of your program; I wrote it from your problem
description above. I've left out the reading of the data because it
has little to do with the problem per se.

#============================================================
# Sample data; "lines" rather than "input" so the built-in input()
# is not shadowed.
lines = [
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070612\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]

# Strip the trailing newline and split each line into its fields.
lines = [x[:-1].split('\t') for x in lines]

# Group the rows by ID (field 0).
recs = {}
for row in lines:
    recs.setdefault(row[0], []).append(row)

# sorted() makes the output order independent of dict ordering.
for key in sorted(recs):
    rows = recs[key]
    # Newest expiration date (field 5) first; YYYYMMDD strings sort
    # correctly as plain text.
    rows.sort(key=lambda x: x[5], reverse=True)
    for current in rows:
        # Newest row whose state (field 1) is the desired state (field 6).
        if current[1] == current[6]:
            break
    else:
        # No row matched the desired state: take the newest one.
        current = rows[0]
    print('\t'.join(current))
#============================================================

The output is:

aaa AAA ... ... ... 20070120 AAA
bbb AAA ... ... ... 20071212 AAA
ccc AAA ... ... ... 20071212 AAA

and it is the same as the output of your original code on this data.
Further testing would naturally be beneficial.

Cheers,
Jussi
--
http://mail.python.org/mailman/listinfo/python-list
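
A follow-up sketch, not part of the code posted above: the reading step that was left out, plus a variant of the selection that avoids sorting by taking max() over a (state matches, date) tuple. The function names read_rows and best_records are made up for illustration, and the sketch assumes Python 3, tab-separated input with at least seven fields per row, and dates written as YYYYMMDD so that string comparison orders them correctly.

```python
from collections import defaultdict


def read_rows(stream):
    """Split every non-blank, tab-separated line of `stream` into fields."""
    return [line.rstrip('\n').split('\t') for line in stream if line.strip()]


def best_records(rows):
    """Return a dict mapping each ID (field 0) to its best row.

    A row whose state (field 1) equals the desired state (field 6) beats
    any row that does not match; among equals, the later date (field 5)
    wins. Assumes every row has at least seven fields.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    # Tuples compare element by element, and True > False, so a state
    # match is decided first and the date second. max() returns the
    # first row among exact ties.
    return {
        piid: max(group, key=lambda r: (r[1] == r[6], r[5]))
        for piid, group in groups.items()
    }
```

To use it on real data, something like best = best_records(read_rows(sys.stdin)) followed by printing '\t'.join(best[piid]) for each piid in sorted(best) should give the same three lines as above on the sample data, but it has not been run against the original Perl or Python scripts.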