Hello everyone, I'm reading rows from a CSV file; csv.DictReader turns each row into a dictionary.
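For readers unfamiliar with csv.DictReader, here is a minimal sketch of what it produces. It is written for Python 3 (the script in this post is Python 2) and uses io.StringIO in place of a real file. Apart from the row quoted in the post, the sample data is made up for illustration:

```python
import csv
import io

# Hypothetical sample in the post's semicolon-separated layout; only the
# first data row comes from the post, the second is invented.
SAMPLE = (
    "TermID;English;Polish\r\n"
    "4;System Administration;Zarzadzanie systemem\r\n"
    "5;User Management;Zarzadzanie uzytkownikami\r\n"
)


def read_rows(text, delimiter=";"):
    """Return one dict per data row; the keys come from the header row."""
    return list(csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter))


rows = read_rows(SAMPLE)
```

In Python 3 a real file would be opened in text mode with newline='' rather than in 'rb' mode.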
The actual files contain old and new translations of software strings. The dictionary holding one row's data looks like this:

    o = {'TermID': '4', 'English': 'System Administration', 'Polish': 'Zarzadzanie systemem'}

I put those dictionaries into lists:

    oldl = [x for x in orig]   # where orig = csv.DictReader(ofile ...

...and then search for matching source terms in two nested loops:

    for o in oldl:
        for n in newl:
            if n['English'] == o['English']:
                ...

Now, this works. However, not only is this very un-Pythonic, it is also very inefficient: every old row is compared with every new row, so the complexity is O(n**2) (strictly O(n*m) for n old and m new rows), and it scales very badly.

What I want to know is whether there is some elegant and efficient way of doing this, i.e. finding all the dictionaries dx_1 ... dx_n contained in a list (or a dictionary) dy, where dx_i contains a specific value. Or possibly just the first such dictionary, dx_1.

I HAVE to search on the values of the key 'English', since there are big gaps in both files (i.e. there are a lot of rows in the old file that do not correspond to any row in the new file, and vice versa). I don't want to do ugly things like converting a dictionary to a string so that I could use the string.find() method.

Obviously it does not have to be implemented this way. If the data structures here could be designed in a proper (Pythonesque ;-) way, great. I do realize that this resembles doing some operation on matrices, but I have never tried doing something like this in Python.
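One common way to avoid the nested scan, offered here as a sketch rather than the only answer, is to index one list by its 'English' value once: a dict mapping each English string to the list of rows that carry it. Each lookup is then an average O(1) dict access, so matching everything costs roughly O(n + m) plus the number of matches. The helper name index_by and the sample rows are hypothetical; Python 3:

```python
from collections import defaultdict


def index_by(rows, key):
    """Map each value of row[key] to the list of rows that have that value.

    One pass over rows builds the index; afterwards each lookup is a dict
    access instead of a scan over the whole list.
    """
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


# Hypothetical rows shaped like the dictionaries in the post
oldl = [
    {'TermID': '4', 'English': 'System Administration', 'Polish': 'Zarzadzanie systemem'},
    {'TermID': '7', 'English': 'Log out', 'Polish': 'Wyloguj'},
]
newl = [
    {'TermID': '4', 'English': 'System Administration', 'Polish': 'Administracja systemem'},
    {'TermID': '9', 'English': 'Help', 'Polish': 'Pomoc'},
]

new_by_english = index_by(newl, 'English')

# Every (old, new) pair with the same 'English' text. .get() returns [] for
# misses and, unlike indexing, does not insert empty lists into the defaultdict.
matches = [(o, n) for o in oldl for n in new_by_english.get(o['English'], [])]

# Only the first matching dictionary (or None), with a plain linear scan
first = next((n for n in newl if n['English'] == 'System Administration'), None)
```

If every 'English' value is known to be unique, a plain dict such as {n['English']: n for n in newl} would do; a later duplicate silently replaces an earlier one, and dict comprehensions need Python 2.7 or later.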
#---------- Code follows ---------
import sys
import csv

# Semicolon-separated dialect used by the Polish-locale files
class excelpoldialect(csv.Dialect):
    delimiter = ';'
    doublequote = True
    lineterminator = '\r\n'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL   # same value as the original quoting=0
    skipinitialspace = False

epdialect = excelpoldialect()
csv.register_dialect('excelpol', epdialect)

try:
    ofile = open(sys.argv[1], 'rb')
except IOError:
    print "Old file %s could not be opened" % (sys.argv[1])
    sys.exit(1)
try:
    tfile = open(sys.argv[2], 'rb')
except IOError:
    print "New file %s could not be opened" % (sys.argv[2])
    sys.exit(1)

# The first row of the old file holds the column titles
titles = csv.reader(ofile, dialect='excelpol').next()
orig = csv.DictReader(ofile, titles, dialect='excelpol')
transl = csv.DictReader(tfile, titles, dialect='excelpol')

# Build a separate list for the output columns: the DictReaders keep a
# reference to 'titles', so appending to it would add empty 'New' and
# 'RowChanged' keys to every row they read afterwards
outtitles = titles + ['New', 'RowChanged']
cfile = open('cmpfile.csv', 'wb')
cm = csv.DictWriter(cfile, outtitles, dialect='excelpol')
cm.writerow(dict(zip(outtitles, outtitles)))   # header row
print outtitles
print "-------------"

oldl = [x for x in orig]
newl = [x for x in transl]
allrows = []   # renamed from 'all', which shadows the built-in
# Nested loops: every old row is compared with every new row
for o in oldl:
    for n in newl:
        if n['English'] == o['English']:
            if n['Polish'] == o['Polish']:
                status = ''
            else:
                status = 'CHANGED'
            combined = {'TermID': o['TermID'], 'English': o['English'],
                        'Polish': o['Polish'], 'New': n['Polish'],
                        'RowChanged': status}
            cm.writerow(combined)
            allrows.append(combined)

# Duplicates
dfile = open('dupes.csv', 'wb')
dupes = csv.DictWriter(dfile, outtitles, dialect='excelpol')
dupes.writerow(dict(zip(outtitles, outtitles)))
# Disabled pairwise check; ranges fixed so the last row is compared too
"""for i in xrange(len(allrows) - 1):
    for j in xrange(i + 1, len(allrows)):
        if (allrows[i]['English'] == allrows[j]['English']) and allrows[i]['RowChanged'] == 'CHANGED':
            dupes.writerow(allrows[i])
            dupes.writerow(allrows[j])"""

cfile.close()
ofile.close()
tfile.close()
dfile.close()
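Applying the indexing idea to the comparison loop in the script might look like the sketch below. It assumes Python 3 and the column names from the post. It keeps the output order of the nested loops (old-file order, then new-file order within each match), and it uses DictWriter.writeheader() (available since Python 2.7) instead of writing dict(zip(titles, titles)). The function names compare and write_rows and the sample data are hypothetical:

```python
import csv
import io
from collections import defaultdict

# Column order of the comparison file, as in the post's script
OUT_FIELDS = ['TermID', 'English', 'Polish', 'New', 'RowChanged']


def compare(oldl, newl):
    """Yield one combined row per (old, new) pair with the same 'English' text.

    The new rows are grouped by 'English' once, so each old row needs a single
    dict lookup instead of a scan over the whole new list.
    """
    new_by_english = defaultdict(list)
    for n in newl:
        new_by_english[n['English']].append(n)
    for o in oldl:
        # .get() returns [] for old rows with no counterpart in the new file
        for n in new_by_english.get(o['English'], []):
            yield {
                'TermID': o['TermID'],
                'English': o['English'],
                'Polish': o['Polish'],
                'New': n['Polish'],
                'RowChanged': '' if n['Polish'] == o['Polish'] else 'CHANGED',
            }


def write_rows(rows, fileobj):
    """Write rows as semicolon-separated CSV preceded by a header line."""
    writer = csv.DictWriter(fileobj, OUT_FIELDS, delimiter=';',
                            lineterminator='\r\n')
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample rows; 'Help' exists only in the old file
oldl = [
    {'TermID': '4', 'English': 'System Administration', 'Polish': 'Zarzadzanie systemem'},
    {'TermID': '7', 'English': 'Log out', 'Polish': 'Wyloguj'},
    {'TermID': '9', 'English': 'Help', 'Polish': 'Pomoc'},
]
newl = [
    {'TermID': '4', 'English': 'System Administration', 'Polish': 'Administracja systemem'},
    {'TermID': '7', 'English': 'Log out', 'Polish': 'Wyloguj'},
]

combined = list(compare(oldl, newl))
out = io.StringIO()
write_rows(combined, out)
```

Grouping the combined rows by 'English' in the same way would also replace the pairwise duplicate check, since any key whose list holds more than one row is a duplicate.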