On 23 loka, 15:24, Peter Otten <[EMAIL PROTECTED]> wrote: > Jani Tiainen wrote: > > I have rather simple 'Address' object that contains streetname, > > number, my own status and x,y coordinates for it. I have two lists > > both containing approximately 30000 addresses. > > > I've defined __eq__ method in my class like this: > > > def __eq__(self, other): > > return self.xcoord == other.xcoord and \ > > self.ycoord == other.ycoord and \ > > self.streetname == other.streetname and \ > > self.streetno == other.streetno > > > But it turns out to be very, very slow. > > > Then I setup two lists: > > > list_external = getexternal() > > list_internal = getinternal() > > > Now I need get all all addresses from 'list_external' that are not in > > 'list_internal', and mark them as "new". > > > I did it like this: > > > for addr in list_external: > > if addr not in list_internal: > > addr.status = 1 # New address > > > But in my case running that loop takes about 10 minutes. What I am > > doing wrong? > > Even if list_external and list_internal contain the same items you need > about len(list_external)*(len(list_internal)/2), or 45 million comparisons. > You can bring that down to 30000*some_small_factor if you make your > addresses hashable and put the internal ones into a dict or set > > def __hash__(self): > return hash((self.xcoord, self.yccord, self.streetname, self.streetno)) > def __eq__(self, other): > # as above > > Then > > set_internal = set(list_internal) > for addr in list_external: > if addr not in set_internal: > addr.status = 1 > > Note that the attributes relevant for hash and equality must not change > during this process.
Very complete answer, thank you very much. I tried that hash approach and sets but it seemed to get wrong results first time and it was all due my hash method. Now it takes like 2-3 seconds to do all that stuff and result seem to be correct. Apparently I have lot to learn about Python... :) -- Jani Tiainen -- http://mail.python.org/mailman/listinfo/python-list