Paul Rubin wrote:

> This can probably be cleaned up some:
>
>     from itertools import islice
>     from collections import deque
>
>     def ngram(n, seq):
>         it = iter(seq)
>         d = deque(islice(it, n))
>         if len(d) != n:
>             return
>         for s in it:
>             yield tuple(d)
>             d.popleft()
>             d.append(s)
>         if len(d) == n:
>             yield tuple(d)
>
>     def test():
>         xs = range(20)
>         for a in ngram(5, xs):
>             print a
>
>     test()
I started with

    from collections import deque
    from itertools import islice, tee  # tee is only needed for ngrams1 below

    def ngrams2(items, n):
        items = iter(items)
        # Pre-fill with n-1 items; maxlen=n drops the oldest item on append.
        d = deque(islice(items, n-1), maxlen=n)
        for item in items:
            d.append(item)
            yield tuple(d)

and then tried a few dirty tricks, but nothing except omitting tuple(d)
brought performance near Steven's version.

Just for fun, here's the obligatory one-liner:

    def ngrams1(items, n):
        return zip(*(islice(it, i, None) for i, it in enumerate(tee(items, n))))

Be aware that the islice() overhead is significant (I wonder if the
islice() implementation could be tweaked to reduce that).
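For anyone who wants to check these claims on their own machine, here is a minimal sketch that confirms the two versions agree and times them. It assumes Python 3 (where zip() is lazy). The helper name compare(), the input size, and the repeat count are my own choices for illustration and are not from the thread. Timings will vary by interpreter and hardware, so none are quoted here.

```python
from collections import deque
from itertools import islice, tee
from timeit import timeit


def ngrams2(items, n):
    # Sliding window: the deque's maxlen drops the oldest item on append.
    items = iter(items)
    d = deque(islice(items, n - 1), maxlen=n)
    for item in items:
        d.append(item)
        yield tuple(d)


def ngrams1(items, n):
    # n independent iterators, the i-th one skipping i items, zipped together.
    return zip(*(islice(it, i, None) for i, it in enumerate(tee(items, n))))


def compare(n=5, size=20000, repeat=3):
    # Hypothetical helper: total seconds per implementation over `repeat` runs.
    data = list(range(size))
    results = {}
    for func in (ngrams1, ngrams2):
        # deque(..., maxlen=0) exhausts an iterator without storing its items.
        results[func.__name__] = timeit(
            lambda: deque(func(data, n), maxlen=0), number=repeat
        )
    return results


# Both versions should produce the same n-grams on a small input.
assert list(ngrams1(range(6), 3)) == list(ngrams2(range(6), 3))

if __name__ == "__main__":
    for name, seconds in compare().items():
        print(name, round(seconds, 4))
```

Draining the generators through a zero-length deque keeps the measurement focused on n-gram production rather than on building a result list.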