Maric Michaud spiegò: > Le Mercredi 14 Juin 2006 22:57, Emanuele Aina a écrit : > > Here you can see that even with only the __lt__ method it goes 10x > > slower, but __lt__ is never called as "Foo" is not printed. > > No, that is not what it shows. The only thing it shows is that in operator is > slow for lists, and as it iterates over the entire list untill it find an > element, it is much slower when it fails a lot.
Yes, I now that "in" is slow for lists and that a dict is preferable, but what I'm pointing out is that in one case is much slower without a reason. In my test, the one without __lt__ runs in ~500ms, while the one with the __lt__ method runs in ~5000ms. Just for comparision I've also tested with an overridden __eq__ and it shows timings similar to the ones in the second test, but this time it is expected, as "in" has to call __eq__ for every element in the list. In the first test python uses the native equality (comparing memory pointer) and it is fast because it is implemented in C in the interpreter (no python function call). In the last case python has to call __eq__ for every element, and a python function call is obviuosly more expensive than the native comparision. But the __lt__ case leaves me wondering why it is slow as the __eq__ case, while using the native comparision. What I asked is what is the difference between the first and the second case? > Use dict in operator instead. Yes, I know that a dict is the right datastructure for a fast "in", but I need to keep track of the ordering, so I cannot use a dict. > Here is a program to show this in details : > > import timing > > class State(object): > def __init__(self, value): > self.v = value > > class StateEQ(State): > def __eq__ (self, other): > if isinstance(other, State) : > return self.v == other.v > else : > return self.v == other > > class StateLT(State) : > def __lt__ (self, other): > if isinstance(other, State) : > return self.v < other.v > else : > return self.v < other > > class StateLTEQ(StateEQ, StateLT) : pass > > def test(state_cls): > num_elem = 3*10**4 > print > print state_cls > > l = [state_cls(0)] > for i in xrange(num_elem): l.append(state_cls(i+1)) > print > print "in operator at the beginning of list:", > timing.start() > for i in xrange(300): i in l > timing.finish() > print timing.milli() > print "in operator at the end of list:", > timing.start() > for i in xrange(num_elem-300, num_elem): i in l > timing.finish() > print timing.milli() Sorry, but this works only when you override __eq__. Without it it will alway scan the entire list, so you are comparing the timings of two different things. > print > print "converting to dict :", > timing.start() > d = {}.fromkeys(l) > timing.finish() > print timing.milli() > print "in operator for a dict for %s elements:" % (num_elem*2), > timing.start() > for i in xrange(num_elem, num_elem*2): i in d > timing.finish() > print timing.milli() As I said, that was only a test demo exploiting my observation, in my real program I need to keep track of the order in which I add items in the list, so I can't use a dict. > for which the output is : > > <class '__main__.State'> > > in operator at the beginning of list: 1983 > in operator at the end of list: 2011 > > converting to dict : 82 > in operator for a dict for 60000 elements: 12 > > <class '__main__.StateLT'> > > in operator at the beginning of list: 3866 > in operator at the end of list: 3871 Here python is using the equality comparator from "int", giving those fast results. I've substituted the ints with instances of state_cls: - for i in xrange(300): i in l + for i in xrange(300): state_cls(i) in l - for i in xrange(num_elem-300, num_elem): i in l + for i in xrange(num_elem-300, num_elem): state_cls(i) in l Running these test, in the case of StateLT, now takes ~10000ms and ~9000ms. [...] > Every classes that define the __eq__ operator will find quickly the elements > if they are at the beginning of the list. > If they are at the end, the in oprerator is slower in these classes because of > the overhead of function calls (in StateLt there is also an overhead due to ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > internal lookup, IMO). ^^^^^^^^^^^^^^^ Yes, that was my question! :) But I hoped in a more exaustive answer: why python has to do this lookup when the __lt__ method is not involved at all? -- http://mail.python.org/mailman/listinfo/python-list