The documentation for itertools has this nice implementation for a fast bigram function:

from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

https://docs.python.org/3/library/itertools.html#itertools-recipes

Which gives us an obvious trigram and 4-gram implementation:

def trigram(iterable):
    a, b, c = tee(iterable, 3)
    next(b, None)
    next(c, None); next(c, None)
    return zip(a, b, c)

def four_gram(iterable):
    a, b, c, d = tee(iterable, 4)
    next(b, None)
    next(c, None); next(c, None)
    next(d, None); next(d, None); next(d, None)
    return zip(a, b, c, d)

And here's an implementation for arbitrary n-grams:

def ngrams(iterable, n=2):
    if n < 1:
        raise ValueError
    t = tee(iterable, n)
    for i, x in enumerate(t):
        for j in range(i):
            next(x, None)
    return zip(*t)

Can we do better, or is that optimal (for any definition of optimal that you like)?

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
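
One possible answer to the "can we do better" question, offered as a sketch rather than a definitive improvement: instead of n tee'd iterators, keep a single bounded deque as a sliding window. The name ngrams_deque is made up for illustration, and whether this is actually faster than the tee/zip version is an assumption that would need timing on real data. It needs only one pass and a window of n items.

```python
from collections import deque
from itertools import islice

def ngrams_deque(iterable, n=2):
    # Hypothetical alternative to the tee-based ngrams() above.
    # Note: this is a generator function, so the check below runs
    # on first iteration, not at call time.
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    # Pre-fill the window with the first n-1 items; maxlen=n makes
    # each later append discard the oldest item automatically.
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)

print(list(ngrams_deque("abcd", 3)))
# [('a', 'b', 'c'), ('b', 'c', 'd')]
```

Like the zip-based version, it yields nothing when the input has fewer than n items. Unlike that version, it builds a new tuple from the deque on every step, so for small n the tee/zip approach may well still win.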