__lt__ slowing the "in" operator even if not called

2006-06-14 Thread Emanuele Aina
I have some code which does a lot of "in" on lists containing objects
with no __eq__ defined.

It all goes fast until I add the __lt__() method: then I have a
slowdown comparable to the one I get using the overridden __eq__, while
the __lt__ method is never called.

Someone can explain me why?


I have here a simple program which shows the behaviour:

*---*

#!/usr/bin/env python

import timing

class State(object):
def __init__(self, value):
self.v = value

def generate(self):
return [self.__class__(self.v+1)]

class StateLT(State):
def __lt__ (self, other):
print 'Foo'
return self.v < other.v

class StateEQ(State):
def __eq__ (self, other):
return self.v == other.v


def test(state_cls):
print state_cls

timing.start()
initial = state_cls(0)
l = [initial]
for i in xrange(3000):

successors = l[i].generate()
successors = [s for s in successors if s not in l]

l += successors
timing.finish()
print timing.milli()

if __name__ == '__main__':
test(State)
print ' '
test(StateLT)
print ' '
test(StateEQ)

*---*

On my machine the output is:


406


4982


5395

Here you can see that even with only the __lt__ method it goes 10x
slower, but __lt__ is never called as "Foo" is not printed.

I use the python 2.3.5-8 package on a debian unstable, but even the
python 2.4.3-5 package shows the same behaviour.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: __lt__ slowing the "in" operator even if not called

2006-06-15 Thread Emanuele Aina
Maric Michaud spiegò:

> Le Mercredi 14 Juin 2006 22:57, Emanuele Aina a écrit :
> > Here you can see that even with only the __lt__ method it goes 10x
> > slower, but __lt__ is never called as "Foo" is not printed.
>
> No, that is not what it shows. The only thing it shows is that in operator is
> slow for lists, and as it iterates over the entire list untill it find an
> element, it is much slower when it fails a lot.

Yes, I now that "in" is slow for lists and that a dict is preferable,
but what I'm pointing out is that in one case is much slower without a
reason.

In my test, the one without __lt__ runs in ~500ms, while the one with
the __lt__ method runs in ~5000ms.

Just for comparision I've also tested with an overridden __eq__ and it
shows timings similar to the ones in the second test, but this time it
is expected, as "in" has to call __eq__ for every element in the list.

In the first test python uses the native equality (comparing memory
pointer) and it is fast because it is implemented in C in the
interpreter (no python function call).

In the last case python has to call __eq__ for every element, and a
python function call is obviuosly more expensive than the native
comparision.

But the __lt__ case leaves me wondering why it is slow as the __eq__
case, while using the native comparision.

What I asked is what is the difference between the first and the second
case?

> Use dict in operator instead.

Yes, I know that a dict is the right datastructure for a fast "in", but
I need to keep track of the ordering, so I cannot use a dict.


> Here is a program to show this in details :
>
> import timing
>
> class State(object):
> def __init__(self, value):
> self.v = value
>
> class StateEQ(State):
> def __eq__ (self, other):
> if isinstance(other, State) :
> return self.v == other.v
> else :
> return self.v == other
>
> class StateLT(State) :
> def __lt__ (self, other):
> if isinstance(other, State) :
> return self.v < other.v
> else :
> return self.v < other
>
> class StateLTEQ(StateEQ, StateLT) : pass
>
> def test(state_cls):
> num_elem = 3*10**4
> print
> print state_cls
>
> l = [state_cls(0)]
> for i in xrange(num_elem): l.append(state_cls(i+1))
> print
> print "in operator at the beginning of list:",
> timing.start()
> for i in xrange(300): i in l
> timing.finish()
> print timing.milli()
> print "in operator at the end of list:",
> timing.start()
> for i in xrange(num_elem-300, num_elem): i in l
> timing.finish()
> print timing.milli()

Sorry, but this works only when you override __eq__.

Without it it will alway scan the entire list, so you are comparing the
timings of two different things.

> print
> print "converting to dict :",
> timing.start()
> d = {}.fromkeys(l)
> timing.finish()
> print timing.milli()
> print "in operator for a dict for %s elements:" % (num_elem*2),
> timing.start()
> for i in xrange(num_elem, num_elem*2): i in d
> timing.finish()
> print timing.milli()

As I said, that was only a test demo exploiting my observation, in my
real program I need to keep track of the order in which I add items in
the list, so I can't use a dict.

> for which the output is :
>
> 
>
> in operator at the beginning of list: 1983
> in operator at the end of list: 2011
>
> converting to dict : 82
> in operator for a dict for 6 elements: 12
>
> 
>
> in operator at the beginning of list: 3866
> in operator at the end of list: 3871

Here python is using the equality comparator from "int", giving those
fast results.
I've substituted the ints with instances of state_cls:
-for i in xrange(300): i in l
+for i in xrange(300): state_cls(i) in l

-for i in xrange(num_elem-300, num_elem): i in l
+for i in xrange(num_elem-300, num_elem): state_cls(i) in l

Running these test, in the case of StateLT, now takes ~1ms and
~9000ms.

[...]

> Every classes that define the __eq__ operator will find quickly the elements
> if they are at the beginning of the list.
> If they are at the end, the in oprerator is slower in these classes because of
> the overhead of function calls (in StateLt there is also an overhead due to
  ^
> internal lookup, IMO).
  ^^^

Yes, that was my question! :)

But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: __lt__ slowing the "in" operator even if not called

2006-06-15 Thread Emanuele Aina
Maric Michaud continuò:

> > But I hoped in a more exaustive answer: why python has to do this
> > lookup when the __lt__ method is not involved at all?
>
> It is not the case, that's what my program shows :
>
> 
>
> in operator at the beginning of list: 173
> in operator at the end of list: 28249 <- here
>
> converting to dict : 79
> in operator for a dict for 6 elements: 14
>
> 
>
> in operator at the beginning of list: 202
> in operator at the end of list: 30472 <- and here

It is very obvious that these two have similiar timings, as both call
__eq__.

I asked why the State and StateLT don't give similar results, but
StateLT is noticeably slower.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: __lt__ slowing the "in" operator even if not called

2006-06-15 Thread Emanuele Aina
[EMAIL PROTECTED] dettagliò:

> > Someone can explain me why?
>
> The list's __contains__ method is very simple

[...]

> So if you define "__lt__" in your object then the type gets a richcmp
> function and your == test implicit in the 'in' search always incurs the
> cost of figuring out that "__eq__" is not defined.

Thank you for the detailed explanation! :)

Do you think I should report this as a performance bug, maybe with the
'wishlist' priority, or I should accept the truth and hope for better
luck next time? ;)

-- 
http://mail.python.org/mailman/listinfo/python-list