On Wed, 29 Apr 2009 01:53:24 +0100, VP <vadim.pestovni...@gmail.com> wrote:
> Hi, I have a csv file:
>
> 'aaa.111', 'T100', 'pn123', 'sn111'
> 'aaa.111', 'T200', 'pn123', 'sn222'
> 'bbb.333', 'T300', 'pn123', 'sn333'
> 'ccc.444', 'T400', 'pn123', 'sn444'
> 'ddd', 'T500', 'pn123', 'sn555'
> 'eee.666', 'T600', 'pn123', 'sn444'
> 'fff.777', 'T700', 'pn123', 'sn777'
>
> How can I extract duplicates, checking each row by field1 and field4?

Untested:

    import csv

    seen_in_field0 = set()
    seen_in_field3 = set()

    # "rb" is the Python 2 idiom; on Python 3 use open("myfile.csv", newline="")
    reader = csv.reader(open("myfile.csv", "rb"))
    for row in reader:
        # Reject a row if either key field already appeared in an accepted row
        if row[0] in seen_in_field0 or row[3] in seen_in_field3:
            reject_this(row)    # placeholder: your own handler for duplicates
        else:
            seen_in_field0.add(row[0])
            seen_in_field3.add(row[3])
            accept_this(row)    # placeholder: your own handler for unique rows

This assumes that you don't record fields 0 and 3 for lines that are
rejected, i.e. if the file is:

    'aaa.111', 'T100', 'pn123', 'sn111'
    'aaa.111', 'T200', 'pn123', 'sn222'
    'aaa.222', 'T300', 'pn123', 'sn222'

you want to keep:

    'aaa.111', 'T100', 'pn123', 'sn111'
    'aaa.222', 'T300', 'pn123', 'sn222'

-- 
Rhodri James *-* Wildebeeste Herder to the Masses
-- 
http://mail.python.org/mailman/listinfo/python-list
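
For anyone trying the approach above on Python 3, here is a minimal sketch of the same idea run against the three-line example. The function name split_duplicates and its key_columns parameter are made up for illustration and are not part of the original reply. It also assumes the file really uses single quotes and a space after each comma, as shown in the question, which is why the reader is given quotechar and skipinitialspace.

```python
import csv
import io


def split_duplicates(rows, key_columns=(0, 3)):
    """Split rows into (kept, rejected).

    A row is rejected when any of its key columns repeats a value already
    seen in a kept row. Values from rejected rows are not recorded.
    """
    seen = {col: set() for col in key_columns}
    kept, rejected = [], []
    for row in rows:
        if any(row[col] in seen[col] for col in key_columns):
            rejected.append(row)
        else:
            for col in key_columns:
                seen[col].add(row[col])
            kept.append(row)
    return kept, rejected


# The three-line example from the reply, held in memory instead of a file.
# For a real file, open("myfile.csv", newline="") would replace io.StringIO.
sample = (
    "'aaa.111', 'T100', 'pn123', 'sn111'\n"
    "'aaa.111', 'T200', 'pn123', 'sn222'\n"
    "'aaa.222', 'T300', 'pn123', 'sn222'\n"
)

reader = csv.reader(io.StringIO(sample), quotechar="'", skipinitialspace=True)
kept, rejected = split_duplicates(reader)

for row in kept:
    print("keep:  ", row)
for row in rejected:
    print("reject:", row)
```

The second line is rejected because 'aaa.111' was already seen, and since its 'sn222' is never recorded, the third line is kept. If rejected lines should also block later matches, the two add() calls would have to run for every row, not only for kept ones.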