Andrew Dalke <da...@dalkescientific.com> added the comment: My belief is that the people who use set.intersection with more than two terms are 1) going to pass in a list of sets, and 2) don't care about the specific order.
To check the validity of my belief, I did a Google Code Search to find cases of people using set intersection in Python. I searched for "set\.intersection\(\*" and "\.intersection\(.*\, lang:^python$", among others. I am sad to report that the most common way to compute set.intersection(*list) is by using reduce, like: possible = (set(index[c]) for c in set(otp)) possible = reduce(lambda a, b: a.intersection(b), possible) That comes from: git://github.com/Kami/python-yubico-client.git /yubico/modhex.py and similar uses are in: git://github.com/sburns/PyCap.git /redcap/rc.py http://hltdi-l3.googlecode.com/hg//xdg/languages/morpho/fst.py http://dsniff.googlecode.com/svn/trunk/dsniff/lib/fcap.py As well as in the Rosetta Code example for a simple inverted index, at: http://rosettacode.org/wiki/Inverted_index#Python This was also implemented more verbosely in: http://eats.googlecode.com/svn/trunk/server/eats/views/main.py intersected_set = sets[0] for i in range(1, len(sets)): intersected_set = intersected_set.intersection(sets[i]) and http://iocbio.googlecode.com/svn/trunk/iocbio/microscope/cluster_tools.py s = set (range (len (data[0]))) for d in zip(*data): s = s.intersection(set(find_outliers(d, zoffset=zoffset))) return sorted(s) In other words, 7 codebases use manual pairwise reduction rather than use the equivalent code in Python. (I have not checked for which are due to backwards compatibility requirements.) On the other hand, if someone really wants to have a specific intersection order, this shows that it's very easy to write. I found 4 other code bases where set intersection was used for something other than binary intersection, and used the built-in intersection(). git://github.com/valda/wryebash.git/experimental/bait/bait/presenter/impl/filters.py def get_visible_node_ids(self, filterId): if filterId in self.idMask: visibleNodeIdSets = [f.get_visible_node_ids(filterId) for f in self._filters] return set.intersection(*[v for v in visibleNodeIdSets if v is not None]) return None http://wallproxy.googlecode.com/svn/trunk/local/proxy.py if threads[ct].intersection(*threads.itervalues()): raise ValueError('All threads failed') (here, threads' values contain sets) git://github.com/argriffing/xgcode.git/20100623a.py header_sets = [set(x) for x in header_list] header_intersection = set.intersection(*header_sets) http://pyvenn.googlecode.com/hg//venn.py to_exclude = set() for ii in xrange(0, len(self.sets)): if (i & (2**ii)): sets_to_intersect.append(sets_by_power_of_two[i & (2**ii)]) else: to_exclude = to_exclude.union(sets_by_power_of_two[(2**ii)]) final = set.intersection(*sets_to_intersect) - to_exclude These all find the intersection of sets (not iterators), and the order of evaluation does not appear like it will affect the result. I do not know though if there will be a performance advantage in these cases to reordering. I do know that in my code, and any inverted index, there is an advantage. And I do know that the current CPython implementation has bad worst-case performance. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue13653> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com