Hello everyone,
I'm reading the rows from a CSV file. csv.DictReader puts those rows into dictionaries.
The actual files contain old and new translations of software strings. The dictionary containing the row data looks like this:
o={'TermID':'4', 'English':'System Administration', 'Polish':'Zarzadzanie systemem'}
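For reference, here is a minimal Python 3 sketch of how csv.DictReader produces a row dictionary like the one above. The ';'-separated sample text is hypothetical and stands in for the real file. When no field names are passed, DictReader takes them from the first row.

```python
import csv
import io

# Hypothetical sample in the same layout as the poster's files (';' as delimiter).
sample = "TermID;English;Polish\n4;System Administration;Zarzadzanie systemem\n"

# With no fieldnames argument, DictReader uses the first row as the keys.
reader = csv.DictReader(io.StringIO(sample), delimiter=';')
rows = list(reader)

print(rows[0]['English'])
```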
I put those dictionaries into a list:
oldl=[x for x in orig] # where orig=csv.DictReader(ofile ...
...and then search for matching source terms in two nested loops:
for o in oldl:
    for n in newl:
        if n['English'] == o['English']:
            ...
Now, this works. However, not only is this very un-Pythonic, it is also very inefficient: the complexity is O(n**2) (more precisely O(n*m) for n old and m new rows), so it scales very badly.
What I want to know is whether there is some elegant and efficient way of doing this, i.e. finding all the dictionaries dx_1 ... dx_n contained in a list (or a dictionary) dy, where each dx_i contains a specific value. Or possibly just the first such dictionary, dx_1.
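For a single lookup, a plain scan is enough. The sketch below (Python 3; the helper names find_all and find_first and the sample rows are hypothetical) shows both variants. Note that each call is still a linear scan, so calling it once per row of the other list keeps the O(n*m) cost; for the bulk comparison an index (a dict keyed on 'English') is what actually helps.

```python
# Hypothetical rows standing in for the poster's oldl list.
rows = [
    {'TermID': '4', 'English': 'System Administration', 'Polish': 'Zarzadzanie systemem'},
    {'TermID': '7', 'English': 'Log Out', 'Polish': 'Wyloguj'},
    {'TermID': '9', 'English': 'System Administration', 'Polish': 'Administracja systemu'},
]


def find_all(rows, key, value):
    """Return every dict whose `key` equals `value` (one full linear scan)."""
    return [d for d in rows if d.get(key) == value]


def find_first(rows, key, value):
    """Return the first matching dict, or None; the scan stops at the first hit."""
    return next((d for d in rows if d.get(key) == value), None)
```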
Sure, just do a little preprocessing. Something like (untested):
def make_map(rows):
    # This assumes that each English key is unique in a given list;
    # if it's not, you'll have to store a list of dicts instead of the dict itself.
    term_map = {}
    for d in rows:
        if 'English' in d:
            key = d['English']
            term_map[key] = d
    return term_map
old_map = make_map(oldl)
new_map = make_map(newl)
for engphrase in old_map:
    if engphrase in new_map:
        o = old_map[engphrase]
        n = new_map[engphrase]
        if n['Polish'] == o['Polish']:
            status = ''
        else:
            status = 'CHANGED'
        # process....
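A slight variation on the same idea, sketched in Python 3 under the same uniqueness assumption: index only the new list, then walk the old list once, so the output keeps the old file's order and the total work is O(n + m) on average. The function names index_by and compare_rows are hypothetical; the output keys mirror the poster's 'combined' dict.

```python
def index_by(rows, key='English'):
    """Map key value -> row; assumes values are unique (a later row overwrites an earlier one)."""
    return {d[key]: d for d in rows if key in d}


def compare_rows(oldl, newl, key='English'):
    """Pair old and new rows sharing the same English term, in old-file order."""
    new_map = index_by(newl, key)     # one pass over newl
    combined = []
    for o in oldl:                    # one pass over oldl
        n = new_map.get(o.get(key))   # average O(1) dict lookup replaces the inner loop
        if n is None:
            continue                  # term exists only in the old file
        combined.append({
            'TermID': o['TermID'],
            'English': o[key],
            'Polish': o['Polish'],
            'New': n['Polish'],
            'RowChanged': '' if n['Polish'] == o['Polish'] else 'CHANGED',
        })
    return combined
```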
I've assumed that the English key is unique in both the old and new lists. If it's not, this will need some adjustment. However, your original algorithm is going to behave weirdly in that case anyway (spitting out multiple lines with the same id, but potentially different new terms and update status).
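If the English key is not unique, the "list of dicts instead of the dict itself" adjustment can be sketched in Python 3 with collections.defaultdict. The helper name make_multimap is hypothetical; each key maps to all rows carrying it, in input order, and any key with more than one row is a duplicate.

```python
from collections import defaultdict


def make_multimap(rows, key='English'):
    """Map each key value to the list of all rows that carry it, in input order."""
    index = defaultdict(list)
    for d in rows:
        if key in d:
            index[d[key]].append(d)
    return index
```

Use `term in index` to test membership; indexing a defaultdict with a missing key silently inserts an empty list.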
Hope that's useful.
I HAVE to search for values corresponding to the key 'English', since there are big gaps in both files (i.e. there are a lot of rows in the old file that do not correspond to rows in the new file, and vice versa). I don't want to do ugly things like converting a dictionary to a string so that I can use the string.find() method.
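Because of those gaps, it can help to split the English terms into three groups first. A minimal Python 3 sketch using set operations (the function name split_by_presence is hypothetical): terms in both files, terms only in the old file, and terms only in the new file.

```python
def split_by_presence(oldl, newl, key='English'):
    """Return (in_both, only_old, only_new) as sets of key values."""
    old_keys = {d[key] for d in oldl if key in d}
    new_keys = {d[key] for d in newl if key in d}
    return old_keys & new_keys, old_keys - new_keys, new_keys - old_keys
```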
Obviously it does not have to be implemented this way. If the data structures here can be designed in a proper (Pythonesque ;-) way, great.
I do realize that this resembles doing some operation on matrices, but I have never tried doing something like this in Python.
#---------- Code follows ---------
import sys
import csv
class excelpoldialect(csv.Dialect):
    delimiter = ';'
    doublequote = True
    lineterminator = '\r\n'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL   # same value as the original 0
    skipinitialspace = False
epdialect = excelpoldialect()
csv.register_dialect('excelpol', epdialect)
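In Python 3 the same dialect can also be registered without a subclass, by passing the format parameters to csv.register_dialect directly. A sketch, reading a hypothetical in-memory sample; for real files, the csv documentation recommends opening them with newline='' (the encoding to use for these Polish files is an assumption left to the reader).

```python
import csv
import io

# Same settings as the excelpoldialect class, given as keyword arguments.
csv.register_dialect(
    'excelpol',
    delimiter=';',
    quotechar='"',
    doublequote=True,
    lineterminator='\r\n',
    quoting=csv.QUOTE_MINIMAL,
    skipinitialspace=False,
)

# Hypothetical sample: quoted fields may contain the ';' delimiter.
sample = 'TermID;English;Polish\n5;"Save; close";"Zapisz; zamknij"\n'
rows = list(csv.DictReader(io.StringIO(sample), dialect='excelpol'))
```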
try:
    ofile = open(sys.argv[1], 'rb')
except IOError:
    print "Old file %s could not be opened" % (sys.argv[1])
    sys.exit(1)
try:
    tfile = open(sys.argv[2], 'rb')
except IOError:
    print "New file %s could not be opened" % (sys.argv[2])
    sys.exit(1)
titles=csv.reader(ofile, dialect='excelpol').next()
orig=csv.DictReader(ofile, titles, dialect='excelpol')
transl=csv.DictReader(tfile, titles, dialect='excelpol')
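cfile = open('cmpfile.csv', 'wb')
# Build a new list instead of appending in place: the DictReaders above keep a
# reference to the same titles list, so titles.append() would change their fieldnames too.
titles = titles + ['New', 'RowChanged']
cm = csv.DictWriter(cfile, titles, dialect='excelpol')
cm.writerow(dict(zip(titles, titles)))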
print titles
print "-------------"
oldl = list(orig)
newl = list(transl)
all = []   # collects combined rows for the duplicate check below (shadows the built-in all())
for o in oldl:
    for n in newl:
        if n['English'] == o['English']:
            if n['Polish'] == o['Polish']:
                status = ''
            else:
                status = 'CHANGED'
            combined = {'TermID': o['TermID'], 'English': o['English'],
                        'Polish': o['Polish'], 'New': n['Polish'],
                        'RowChanged': status}
            cm.writerow(combined)
            all.append(combined)
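On the output side, csv.DictWriter has a writeheader() method (Python 2.7 and later) that replaces the writerow(dict(zip(titles, titles))) idiom. A Python 3 sketch; the function name write_comparison and the FIELDS constant are hypothetical, and the delimiter and line terminator are passed directly rather than through the registered dialect.

```python
import csv
import io

# Hypothetical constant: the output columns used by the 'combined' dicts.
FIELDS = ['TermID', 'English', 'Polish', 'New', 'RowChanged']


def write_comparison(rows, out):
    """Write a header line followed by one ';'-separated line per combined row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter=';', lineterminator='\r\n')
    writer.writeheader()   # header row built from FIELDS
    writer.writerows(rows)
```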
# duplicates
dfile = open('dupes.csv', 'wb')
dupes = csv.DictWriter(dfile, titles, dialect='excelpol')
dupes.writerow(dict(zip(titles, titles)))
"""for i in xrange(0,len(all)-2):
for j in xrange(i+1, len(all)-1):
if (all[i]['English']==all[j]['English']) and
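The commented-out pairwise loop above is cut off after its first condition, so only the "same English term" part can be addressed here. As a sketch under that assumption, collections.Counter finds every row whose English term occurs more than once in a single O(n) pass instead of comparing all pairs. The function name duplicated_terms is hypothetical.

```python
from collections import Counter


def duplicated_terms(rows, key='English'):
    """Return the rows whose `key` value occurs more than once, in input order."""
    counts = Counter(d[key] for d in rows if key in d)
    # Counter returns 0 for values it has never seen, including None.
    return [d for d in rows if counts[d.get(key)] > 1]
```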