[snip]Following advice of two posters here (thanks) I have written two versions of the same program, and both of them work, but the difference in speed is drastic, about 6 seconds vs 190 seconds for about 15000 of processed records, taken from 2 lists of dictionaries.
I've read "Python Performance Tips" at
http://manatee.mojam.com/~skip/python/fastpython.html
..but still don't understand why the difference is so big.
# snippet 1, this runs in about 6 seconds !def prepend(l): ! map = {} ! for d in l: ! key = d['English'] ! map[key] = d ! return map ! !old_map = prepend(oldl) !new_map = prepend(newl) ! !for engphrase in old_map: ! if engphrase in new_map: ! o = old_map[engphrase] ! n = new_map[engphrase] ! cm.writerow(matchpol(o,n))
# snippet 2, this needs 190 seconds
!while 1:
! if len(oldl) == 0 or len(newl) == 0:
! break
! if oldl[o]['English'] == newl[n]['English']:
! cm.writerow(matchpol(oldl[o], newl[n]))
! del oldl[o]
! del newl[n]
! o, n = 0, 0
! continue
! elif cmp(oldl[o]['English'], newl[n]['English']) < 0:
! if o == len(oldl):
! cm.writerow(newl[0])
! del(newl[0])
! o, n = 0, 0
! continue
! o+=1
! elif cmp(oldl[o]['English'], newl[n]['English']) > 0:
! if n == len(newl):
! cm.writerow(newl[0])
! del(oldl[0])
! o, n = 0, 0
! continue
! n+=1
I believe you're running into the fact that deleting from anywhere but the end of a list in Python is O(n), where n is the number of items in the list. Consider:
---------- test.py ---------- def delfromstart(lst): while lst: del lst[0]
def delfromend(lst): for i in range(len(lst)-1, -1, -1): del lst[i] -----------------------------
[D:\Steve]$ python -m timeit -s "import test" "test.delfromstart(range(1000))"
1000 loops, best of 3: 1.09 msec per loop
[D:\Steve]$ python -m timeit -s "import test" "test.delfromend(range(1000))" 1000 loops, best of 3: 301 usec per loop
Note that Python lists are implemented basically as arrays, which means that deleting an item from anywhere but the end of the list is O(n) because all items in the list must be moved down to fill the hole.
Repeated deletes from a list are generally not the way to go, as your example shows. =)
Steve -- http://mail.python.org/mailman/listinfo/python-list