Some idioms are so common that I think they deserve to be written in C into the itertools module.
1) leniter(iterator) It returns the length of a given iterator, consuming it, without creating a list. I have discussed this twice in the past. Like itertools.izip_longest don't use it with infinite iterators. The following code is faster than sum(1 for _ in iterator) A Python implementation: def leniter(iterator): if hasattr(iterator, "__len__"): return len(iterator) nelements = 0 for _ in iterator: nelements += 1 return nelements I use it to count the grup items that groupby() gives: [(h, leniter(g)) for h,g in groupby(iterable)] And I use it various other situations, like when I want to test the speed of a lazy generator: timeit(leniter, xpairwise(xrange(20))) (Where "timeit" is a function of mine similar to a Mathematica function that given a function and some data, applies it and returns the result and the time taken to perform it). ---------------- 2) xpairwise(iterable) This is at the the bottom of the itertools docs: def pairwise(iterable): a, b = tee(iterable) for elem in b: break return izip(a, b) The following of mine is faster: def xpairwise(iterable): return izip(iterable, islice(iterable, 1, None)) But a C implementation is better (also avoiding copy&paste from the bottom of the docs page). This function can be generalized a lot (I also have written a much more flexible version), but in practice I have seen this basic version covers most of the common situations. ---------------- 3) xpairs(seq) This is a Python implementation: def xpairs(seq): """xpairs(seq): yields all the distinct pairs of the given seq, inside a pair the order doesn't matter. No pairs have the same element twice. This means it yields just the upper triangular half of the pair matrix. >>> list( xpairs([]) ) [] >>> list(xpairs([1,2])) [(1, 2)] >>> list( xpairs("abc") ) [('a', 'b'), ('a', 'c'), ('b', 'c')] >>> list(xpairs(range(5))) [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)] """ len_seq = len(seq) for i, e1 in enumerate(seq): for j in xrange(i+1, len_seq): yield e1, seq[j] A more general implementation can also be written when seq isn't randomly-addressable, but I think that may become not efficient. IHMO xpairs and xpairwise may even deserve an interpreter support, to speed up them when possible. In a modern compiled language (like Chapel, http://chapel.cs.washington.edu/ ) such idioms (in theory) produce no slowdown compared to normal loops. I have written generic xpairs() and xpairwise() for the D language too, but the compiler isn't able to fully optimize away such constructs. ---------------- 4) xsubsets(seq) This is the recipe at the bottom of the docs page (a bit modified, now it returns lists because in most situations I don't want the results to be sets): def subsets(iterable): # Recipe credited to Eric Raymond pairs = [(2 ** i, x) for i, x in enumerate(iterable)] for n in xrange(2 ** len(pairs)): yield [x for m, x in pairs if m & n] After starting to use Python 2.6 I now use the following version (this replaces a longer version I used in the past): def xsubsets(seq): seq = list(seq) yield [] for nitems in xrange(1, len(seq)+1): for subset in combinations(seq, nitems): yield subset Thanks to the very fast combinations() this xsubsets code is faster than that recipe by Eric Raymond, but it has the disadvantage that it doesn't yield subsets in the natural "binary" order. So a C implementation can be done to respect that natural order. Such "binary" order has a disadvantage, because the lists/tuples it yields have a different length, so the C code may have troubles in using the the nice trick it currently use, re-using the same memory and avoiding a new memory allocation every loop: /* Copy the previous result tuple or re-use it if available */ if (Py_REFCNT(result) > 1) { PyObject *old_result = result; result = PyTuple_New(r); if (result == NULL) goto empty; co->result = result; for (i=0; i<r ; i++) { elem = PyTuple_GET_ITEM(old_result, i); Py_INCREF(elem); PyTuple_SET_ITEM(result, i, elem); } Py_DECREF(old_result); } /* Now, we've got the only copy so we can update it in-place * CPython's empty tuple is a singleton and cached in * PyTuple's freelist. */ assert(r == 0 || Py_REFCNT(result) == 1); Maybe there are ways to avoid that in the C implementation of xsubsets too, I don't know Python implementation enough to tell... Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list