On 4 Jan 2005 14:33:34 -0800, "John Machin" <[EMAIL PROTECTED]> wrote:
>(b) Fast forwarding 30+ years, let's look at the dictionary method, >assuming you have enough memory to hold all your data at once: > >Step 1: read the "left" table; for each row, if english not in mydict, >then do mydict[english] = MyObject(). In any case, do >mydict[english].left_list.append(row) >Step 2: same for the "right" table. >Step 3: for english, obj in mydict.iteritems(): process(english, obj) >As your datasets are stored in MS Excel spreadsheets, N < 64K so >whether your solution is O(N) or O(N*log(N)) doesn't matter too much. >You are however correct to avoid O(N**2) solutions. Following advice of two posters here (thanks) I have written two versions of the same program, and both of them work, but the difference in speed is drastic, about 6 seconds vs 190 seconds for about 15000 of processed records, taken from 2 lists of dictionaries. I've read "Python Performance Tips" at http://manatee.mojam.com/~skip/python/fastpython.html ..but still don't understand why the difference is so big. Both versions use local variables, etc. Both have their lists initially sorted. Both essentially use a loop with conditional for comparison, then process the record in the same way. The overhead of second version is that it also uses cmp() function and two additional integer variables - that should not slow the program _so much_. I have measured the run of snippet 2 with time checkpoints written to a log (write time delta to log every 200 records), even made a graph of time deltas in spreadsheet and in fact snippet 2 seems after initial slowdown looks exactly linear, like that: ^ (time) | | /----------- | / |/ ---------------> (# of records written) So yes, it would scale to big files. However, why is it so frigging slow?! # snippet 1, this runs in about 6 seconds !def prepend(l): ! map = {} ! for d in l: ! key = d['English'] ! map[key] = d ! return map ! !old_map = prepend(oldl) !new_map = prepend(newl) ! !for engphrase in old_map: ! if engphrase in new_map: ! o = old_map[engphrase] ! n = new_map[engphrase] ! cm.writerow(matchpol(o,n)) # snippet 2, this needs 190 seconds !while 1: ! if len(oldl) == 0 or len(newl) == 0: ! break ! if oldl[o]['English'] == newl[n]['English']: ! cm.writerow(matchpol(oldl[o], newl[n])) ! del oldl[o] ! del newl[n] ! o, n = 0, 0 ! continue ! elif cmp(oldl[o]['English'], newl[n]['English']) < 0: ! if o == len(oldl): ! cm.writerow(newl[0]) ! del(newl[0]) ! o, n = 0, 0 ! continue ! o+=1 ! elif cmp(oldl[o]['English'], newl[n]['English']) > 0: ! if n == len(newl): ! cm.writerow(newl[0]) ! del(oldl[0]) ! o, n = 0, 0 ! continue ! n+=1 -- It's a man's life in a Python Programming Association. -- http://mail.python.org/mailman/listinfo/python-list