Re: Which type should be used when testing static structure appartenance

Oscar Benjamin Tue, 17 Nov 2015 08:18:19 -0800

On 17 November 2015 at 14:27, Nicolas Évrard <ni...@altern.org> wrote:
> Hello,
>
> I saw the following retweet by Raymond Hettinger in this morning:
>
>    https://twitter.com/sanityinc/status/666485814214287360
>
>    Programming tip: many of those arrays and hashes in your code
>    should actually be sets. Match data structures to data
>    constraints!
>
> I saw just in time because in a review I wrote something like this:
>
>    if operator not in ('where', 'not where')
>
> and my colleague proposed that I should use a list instead of a tuple.
> But reading the mentioned tweet I tend to think that a set would be a
> better choice.
>
> What are your opinion on this issue (I realize it's not something of
> the utmost importance but rather a "philosophical" question).


Conceptually it should be a set. Really it makes little difference though.

I think that what the tip is really referring to is a situation where
you have a larger data structure. For example asking if x in y has
different algorithmic performance when y is a large tuple/list than a
set. This is because a set is a hash-table with O(1) average lookup
cost and a tuple/list needs to be scanned linearly having O(N) average
cost. This can be very costly and can often be seen in beginner code
where someone does something like:

   for x in stuff:
       if x in alist:
           # do stuff

Here the "in" operator loops over alist once for each item in stuff
(until it finds a match). If there are M things in stuff and N things
in alist then this is O(M*N) items to compare. If we convert alist to
a set at the start of the loop then we have:

    aset = set(alist)
    for xin stuff:
        if x in aset:
            # do stuff

Here the set constructor scans alist which is O(N) and then for each
of M things in stuff we do an O(1) lookup in aset. So the result is
O(M+N) meaning linear rather than quadratic performance. This can be a
significant difference especially for something so trivially easy to
do.

--
Oscar
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Which type should be used when testing static structure appartenance

Reply via email to