On 17 November 2015 at 14:27, Nicolas Évrard <ni...@altern.org> wrote: > Hello, > > I saw the following retweet by Raymond Hettinger in this morning: > > https://twitter.com/sanityinc/status/666485814214287360 > > Programming tip: many of those arrays and hashes in your code > should actually be sets. Match data structures to data > constraints! > > I saw just in time because in a review I wrote something like this: > > if operator not in ('where', 'not where') > > and my colleague proposed that I should use a list instead of a tuple. > But reading the mentioned tweet I tend to think that a set would be a > better choice. > > What are your opinion on this issue (I realize it's not something of > the utmost importance but rather a "philosophical" question).
Conceptually it should be a set. Really it makes little difference though. I think that what the tip is really referring to is a situation where you have a larger data structure. For example asking if x in y has different algorithmic performance when y is a large tuple/list than a set. This is because a set is a hash-table with O(1) average lookup cost and a tuple/list needs to be scanned linearly having O(N) average cost. This can be very costly and can often be seen in beginner code where someone does something like: for x in stuff: if x in alist: # do stuff Here the "in" operator loops over alist once for each item in stuff (until it finds a match). If there are M things in stuff and N things in alist then this is O(M*N) items to compare. If we convert alist to a set at the start of the loop then we have: aset = set(alist) for xin stuff: if x in aset: # do stuff Here the set constructor scans alist which is O(N) and then for each of M things in stuff we do an O(1) lookup in aset. So the result is O(M+N) meaning linear rather than quadratic performance. This can be a significant difference especially for something so trivially easy to do. -- Oscar -- https://mail.python.org/mailman/listinfo/python-list