Re: Pythonic search of list of dictionaries

Tim Hochberg Tue, 04 Jan 2005 09:10:46 -0800

Bulba! wrote:

Hello everyone,

I'm reading the rows from a CSV file. csv.DictReader puts
those rows into dictionaries.

The actual files contain old and new translations of software
strings. The dictionary containing the row data looks like this:

    o={'TermID':'4', 'English':'System Administration',
'Polish':'Zarzadzanie systemem'}

I put those dictionaries into the list:

   oldl=[x for x in orig]  # where orig=csv.DictReader(ofile ...

..and then search for matching source terms in two loops:

   for o in oldl:
       for n in newl:
           if n['English'] == o['English']:
           ...

Now, this works. However, not only is this very un-Pythonic, it is also
very inefficient: the complexity is O(n**2), so it scales up very
badly.

What I want to know is if there is some elegant and efficient
way of doing this, i.e. finding all the dictionaries dx_1 ... dx_n,
contained in a list (or a dictionary) dy, where dx_i  contains
a specific value. Or possibly just the first dx_1 dictionary.
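
For a single lookup, a plain scan is enough: a list comprehension collects every matching row, and next() on a generator expression stops at the first one. A small Python 3 sketch with made-up rows follows; each lookup is still O(n), so on its own this does not cure the nested-loop cost.

```python
# Hypothetical sample rows, shaped like the dictionaries from csv.DictReader
rows = [
    {'TermID': '4', 'English': 'System Administration', 'Polish': 'Zarzadzanie systemem'},
    {'TermID': '5', 'English': 'File', 'Polish': 'Plik'},
    {'TermID': '8', 'English': 'File', 'Polish': 'Zbior'},
]

wanted = 'File'

# All rows whose 'English' field equals the wanted phrase
matches = [d for d in rows if d.get('English') == wanted]

# Only the first such row, or None when nothing matches
first = next((d for d in rows if d.get('English') == wanted), None)

# A phrase that is not present yields an empty list and None
missing = next((d for d in rows if d.get('English') == 'Missing'), None)
```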


Sure, just do a little preprocessing. Something like (untested):

####

def make_map(l):
    # This assumes that each English key is unique in a given l;
    # if it's not, you'll have to store a list of dicts per key
    # instead of a single dict.
    mapping = {}  # named 'mapping' to avoid shadowing the builtin map()
    for d in l:
        if 'English' in d:
            key = d['English']
            mapping[key] = d
    return mapping

old_map = make_map(oldl)
new_map = make_map(newl)

for engphrase in old_map:
    if engphrase in new_map:
        o = old_map[engphrase]
        n = new_map[engphrase]
        if n['Polish'] == o['Polish']:
            status=''
        else:
            status='CHANGED'
        # process....

####

I've assumed that the English key is unique in both the old and new lists. If it's not, this will need some adjustment. However, your original algorithm is going to behave weirdly in that case anyway (spitting out multiple lines with the same id, but potentially different new terms and update status).
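
If the English key turns out not to be unique, one possible adjustment is to map each phrase to a list of rows rather than to a single row. What follows is only a sketch, written for Python 3; the name make_multimap and the sample rows are made up for illustration.

```python
from collections import defaultdict

def make_multimap(rows, key='English'):
    """Group rows by their value for `key`; rows lacking the key are skipped."""
    groups = defaultdict(list)
    for d in rows:
        if key in d:
            groups[d[key]].append(d)
    return dict(groups)

# Hypothetical sample data with one repeated English phrase
oldl = [
    {'TermID': '4', 'English': 'System Administration', 'Polish': 'Zarzadzanie systemem'},
    {'TermID': '9', 'English': 'System Administration', 'Polish': 'Administracja systemem'},
    {'TermID': '5', 'English': 'File', 'Polish': 'Plik'},
    {'TermID': '7', 'Polish': 'Edycja'},  # no 'English' field, so it is skipped
]

old_groups = make_multimap(oldl)

# Phrases that occur more than once and therefore need special handling
duplicated = sorted(k for k, v in old_groups.items() if len(v) > 1)
```

Each value in old_groups is a list, so the comparison loop would have to decide what to do when a phrase has more than one row on either side.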

Hope that's useful.

-tim

I HAVE to search for values corresponding to key 'English', since there are big gaps in both files (i.e. there are a lot of rows in the old file that do not correspond to the rows in the new file and vice versa). I don't want to do ugly things like converting a dictionary to a string so I could use the string.find() method.
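
The gaps between the two files can be read straight off the keys of two lookup dictionaries, with no string searching. A rough Python 3 sketch, assuming old_map and new_map are dicts keyed by the English phrase (as make_map above is meant to build); the contents here are invented:

```python
# Hypothetical lookup tables keyed by the English phrase
old_map = {
    'System Administration': {'TermID': '4', 'Polish': 'Zarzadzanie systemem'},
    'File': {'TermID': '5', 'Polish': 'Plik'},
}
new_map = {
    'System Administration': {'TermID': '4', 'Polish': 'Administracja systemem'},
    'Edit': {'TermID': '6', 'Polish': 'Edycja'},
}

old_keys = set(old_map)
new_keys = set(new_map)

only_old = sorted(old_keys - new_keys)  # phrases with no counterpart in the new file
only_new = sorted(new_keys - old_keys)  # phrases that appear only in the new file
in_both = sorted(old_keys & new_keys)   # phrases that can actually be compared

# Of the shared phrases, those whose Polish translation differs
changed = [k for k in in_both if old_map[k]['Polish'] != new_map[k]['Polish']]
```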

Obviously it does not have to be implemented this way. If data structures here could be designed in a proper (Pythonesque ;-) way, great.

I do realize that this resembles doing some operation on matrices. But I have never tried doing something like this in Python.
#---------- Code follows ---------
import sys
import csv
class excelpoldialect(csv.Dialect):
    delimiter=';'
    doublequote=True
    lineterminator='\r\n'
    quotechar='"'
    quoting=0
    skipinitialspace=False
epdialect=excelpoldialect()
csv.register_dialect('excelpol',epdialect)
try:
    ofile=open(sys.argv[1],'rb')
except IOError:
    print "Old file %s could not be opened" % (sys.argv[1])
    sys.exit(1)
try:
    tfile=open(sys.argv[2],'rb')
except IOError:
    print "New file %s could not be opened" % (sys.argv[2])
    sys.exit(1)
titles=csv.reader(ofile, dialect='excelpol').next()
orig=csv.DictReader(ofile, titles, dialect='excelpol')
transl=csv.DictReader(tfile, titles, dialect='excelpol')
cfile=open('cmpfile.csv','wb')
titles.append('New')
titles.append('RowChanged')
cm=csv.DictWriter(cfile,titles, dialect='excelpol')
cm.writerow(dict(zip(titles,titles)))
print titles
print "-------------"
oldl=[x for x in orig]
newl=[x for x in transl]
all=[]
for o in oldl:
    for n in newl:
        if n['English'] == o['English']:
            if n['Polish'] == o['Polish']:
                status=''
            else:
                status='CHANGED'
            combined={'TermID': o['TermID'], 'English': o['English'],
'Polish': o['Polish'], 'New': n['Polish'], 'RowChanged': status}
            cm.writerow(combined)
            all.append(combined)
# duplicates
dfile=open('dupes.csv','wb')
dupes=csv.DictWriter(dfile,titles,dialect='excelpol')
dupes.writerow(dict(zip(titles,titles)))
"""for i in xrange(0,len(all)-2):
    for j in xrange(i+1, len(all)-1):
        if (all[i]['English']==all[j]['English']) and all[i]['RowChanged']=='CHANGED':
            dupes.writerow(all[i])
            dupes.writerow(all[j])"""
cfile.close()
ofile.close()
tfile.close()
dfile.close()
--
Real world is perfectly indifferent to lies that are the foundation of leftist "thinking".


--
http://mail.python.org/mailman/listinfo/python-list
