On 10/10/2012 09:51, Joon Ki Choi wrote:
Hello Pythonistas,

I have a very large text file with contents like:

@INBOOK{Ackermann1999-b,
author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann},
year = {1980},
timestamp = {1995-12-02}
}

I want to delete the duplicate rows, except for rows containing the brackets { or }. The result should look like:

@INBOOK{Ackermann1999-b,
author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann},
year = {1980},
timestamp = {1995-12-02}
}

I came across this Python script:

lines_seen = set()  # holds lines already seen
outfile = open("literatur_clean.txt", "w")
As a slight aside, you could use a with statement like the following, so there's no need to close the file explicitly.
with open("literatur_dupl.txt", "r") as infile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)
Something like:

        if "{" in line or "}" in line or line not in lines_seen:
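To see the effect of that condition without touching any files, it can be wrapped in a small generator and run on a made-up list of lines. This is only a sketch: the name `unique_lines_keeping_braces` and the sample lines are hypothetical, and lines are compared exactly as read, trailing newline included.

```python
def unique_lines_keeping_braces(lines):
    """Yield each line the first time it appears, but always yield
    lines that contain '{' or '}' (hypothetical helper name)."""
    lines_seen = set()  # holds lines already seen
    for line in lines:
        # Brace lines bypass the duplicate check entirely.
        if "{" in line or "}" in line or line not in lines_seen:
            yield line
            lines_seen.add(line)


# Made-up two-entry sample, for illustration only.
sample = [
    "@INBOOK{Ackermann1999-b,\n",
    "Ackermann, K.-F. and\n",
    "Ackermann, K.-F. and\n",
    "year = {1980},\n",
    "}\n",
    "@INBOOK{Ackermann1999-c,\n",
    "year = {1980},\n",
    "}\n",
]
print("".join(unique_lines_keeping_braces(sample)), end="")
```

Only the repeated `Ackermann, K.-F. and` line is dropped; the second `year` line and the second closing `}` are kept because they contain a brace.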
outfile.close()

But it also deletes the lines with a closing bracket } and the lines with the same author data. Therefore I need the condition on the brackets. Could someone show me how to add this condition?

Thanks in advance,
Joon
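Combining the brace condition with the with statement suggested above gives a script along these lines. It is a sketch assuming the input and output file names from the original post; `clean_bib_file` is a hypothetical name.

```python
def clean_bib_file(src="literatur_dupl.txt", dst="literatur_clean.txt"):
    """Copy src to dst, dropping repeated lines unless they contain a brace."""
    lines_seen = set()  # holds lines already seen
    # Both files are closed automatically when the with block exits,
    # even if an exception is raised part-way through.
    with open(src, "r") as infile, open(dst, "w") as outfile:
        for line in infile:
            if "{" in line or "}" in line or line not in lines_seen:
                outfile.write(line)
                lines_seen.add(line)
```

Calling `clean_bib_file()` from the directory that holds `literatur_dupl.txt` writes the filtered copy to `literatur_clean.txt`. No explicit `close()` is needed, because the with block closes both files on exit.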
--
Cheers.

Mark Lawrence.

--
http://mail.python.org/mailman/listinfo/python-list