Shawn Milo wrote:
(snip)
> The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.
>

Here's a fixed (w.r.t. the test data) version with a somewhat better (and
faster) algorithm using Decorate/Sort/Undecorate (aka the Schwartzian
transform):

import sys
output = sys.stdout

input = [
    #ID   STATE  ...  ...  ...  DATE      TARGET
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20070612\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
    ]

def find_best_match(input=input, output=output):
    # 0-based column indexes into each tab-separated row
    PIID = 0
    STATE = 1
    EXP_DATE = 5
    DESIRED_STATE = 6

    # piid -> list of (sort_key, line) pairs
    recs = {}
    for line in input:
        line = line.rstrip('\n')
        row = line.split('\t')
        # decorate: matching state first, then expiration date
        sort_key = (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])
        recs.setdefault(row[PIID], []).append((sort_key, line))

    for decorated_lines in recs.itervalues():
        # sort descending, take the first pair, undecorate
        print >> output, sorted(decorated_lines, reverse=True)[0][1]

Lines are sorted first on whether the state matches the desired state,
then on the expiration date. Since it's a reverse sort, we first have the
lines that match (if any) sorted by date descending, then the lines that
don't match sorted by date descending. So in both cases, the 'best match'
is the first item in the list. Then we just have to get rid of the sort
key, et voilà !-)

HTH

-- 
http://mail.python.org/mailman/listinfo/python-list
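
For readers on Python 3, where `print >>` and `dict.itervalues()` no longer exist, here is a minimal sketch of the same selection rule. It is not the code from the post: it keeps the same (state matches, expiration date) key, but instead of sorting each group it keeps one running winner per ID. The function name `best_records` and the skipping of blank lines are assumptions made for illustration.

```python
# Column positions taken from the post (0-based).
PIID, STATE, EXP_DATE, DESIRED_STATE = 0, 1, 5, 6


def best_records(lines):
    """Return {piid: best line}, keeping one running winner per ID.

    Hypothetical helper, not part of the original script.
    """
    best = {}  # piid -> (sort_key, line)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: blank lines carry no record
        row = line.split("\t")
        # True compares greater than False, and YYYYMMDD strings
        # compare in chronological order.
        key = (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])
        # Comparing (key, line) pairs breaks ties on the line text,
        # the same way the reverse sort on decorated pairs does.
        candidate = (key, line)
        if row[PIID] not in best or candidate > best[row[PIID]]:
            best[row[PIID]] = candidate
    # Undecorate: drop the key, keep the line.
    return {piid: line for piid, (key, line) in best.items()}
```

To use it the way the original script was described, something like `for line in best_records(sys.stdin).values(): print(line)` should do. Keeping only the current best avoids holding and sorting every line for an ID, at the cost of no longer having the full ordered list if you ever want the runners-up.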