On 10/10/2012 09:51, Joon Ki Choi wrote:

Hello Pythonistas,

I have a very large text file with contents like:

@INBOOK{Ackermann1999-b,
   author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann,
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann,
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann,
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann},
   year = {1980},
   timestamp = {1995-12-02}
}       

And I want to delete the duplicate rows, except for the rows containing the 
brackets { or }.
The result should look like:

@INBOOK{Ackermann1999-b,
   author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann,
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann},
   year = {1980},
   timestamp = {1995-12-02}
}

I came across this Python script:

lines_seen = set() # holds lines already seen
outfile = open("literatur_clean.txt", "w")

Slight aside, you could use this so there's no need to explicitly close the file.

with open("literatur_dupl.txt", "r") as infile:

for line in infile:
     if line not in lines_seen: # not a duplicate
         outfile.write(line)
         lines_seen.add(line)

Something like:-

if "{" in line or "}" in line or line not in lines_seen:

outfile.close()
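
Putting the two suggestions above together, a minimal sketch might look like the following. The function names dedupe_keep_braces and clean_file are made up for illustration; the file names are the ones from the original script. It assumes that "duplicate" means an exact match of the whole line, including trailing whitespace and the newline.

```python
def dedupe_keep_braces(lines):
    """Yield lines, dropping exact repeats unless the line contains { or }."""
    lines_seen = set()  # holds lines already seen
    for line in lines:
        # Lines with a bracket are always kept; others only on first sight.
        if "{" in line or "}" in line or line not in lines_seen:
            yield line
            lines_seen.add(line)


def clean_file(src="literatur_dupl.txt", dst="literatur_clean.txt"):
    # Both files are closed automatically when the with block ends.
    with open(src, "r") as infile, open(dst, "w") as outfile:
        outfile.writelines(dedupe_keep_braces(infile))
```

Note that lines_seen covers the whole file, exactly as in the original script, so a bracket-free line that repeats in a later entry is dropped there too. If that is not wanted, the set would have to be reset at the start of each entry.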

But it also deletes the lines with a closing bracket } and the lines with the 
same author data.
Therefore I need the condition on the brackets.

Could someone point me to how to add this condition?

Thanks in advance,
Joon


--
Cheers.

Mark Lawrence.

--
http://mail.python.org/mailman/listinfo/python-list
