On 10.11.2016 at 03:06, Paul Rubin wrote:
This can probably be cleaned up some:

from itertools import islice
from collections import deque

def ngram(n, seq):
    it = iter(seq)
    d = deque(islice(it, n))
    if len(d) != n:
        return
    for s in it:
        yield tuple(d)
        d.popleft()
        d.append(s)
    if len(d) == n:
        yield tuple(d)

def test():
    xs = range(20)
    for a in ngram(5, xs):
        print a

test()
That's along the lines of what I thought. Steven's version has two areas
that could be improved:

1. The startup looks slightly ugly to me.
2. If n is large, tee has to maintain a lot of unnecessary state.

collections.deque manages all the state we need. Here are Steven's
version (ngrams) and mine (ngrams_2):
from itertools import tee, islice
from collections import deque

def ngrams(iterable, n=2):
    if n < 1:
        raise ValueError
    t = tee(iterable, n)
    for i, x in enumerate(t):
        for j in range(i):
            next(x, None)  # advance the i-th iterator by i items
    return zip(*t)

def ngrams_2(iterable, n=2):
    if n < 1:
        raise ValueError
    it = iter(iterable)
    d = deque(islice(it, n-1), maxlen=n)  # prime the window with n-1 items
    for elem in it:
        d.append(elem)  # maxlen=n drops the oldest item automatically
        yield tuple(d)
print(list(ngrams(range(1000), 4)) == list(ngrams_2("abcdefg", 4)))
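To make the sliding window concrete, a quick interactive check along
these lines (under Python 3) shows what both functions produce:

>>> list(ngrams_2("abcde", 3))
[('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
>>> list(ngrams("abcde", 3)) == list(ngrams_2("abcde", 3))
True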
One problem with my version is that it does the iteration over the
iterable itself, so more of the work probably happens in Python code
rather than in C (as it does in Steven's version). For large n the
reduced number of iterators does pay off, though:
%timeit list(ngrams(range(1000), n=500))
10 loops, best of 3: 26.5 ms per loop
%timeit list(ngrams_2(range(1000), n=500))
100 loops, best of 3: 4.07 ms per loop
For small n, it's slower, as expected:
%timeit list(ngrams(range(1000), n=3))
10000 loops, best of 3: 120 µs per loop
%timeit list(ngrams_2(range(1000), n=3))
1000 loops, best of 3: 603 µs per loop
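The numbers above come from IPython's %timeit; if you want to reproduce
the comparison without IPython, a rough stand-alone script along these
lines should do (absolute timings will of course differ on other
machines):

# Rough timing sketch using the stdlib timeit module.
# Assumes ngrams and ngrams_2 from above are defined in the same module.
from timeit import timeit

for n in (3, 500):
    for func in ("ngrams", "ngrams_2"):
        t = timeit("list(%s(range(1000), n=%d))" % (func, n),
                   globals=globals(), number=10)
        print("%-9s n=%-4d %.2f ms per call" % (func, n, t / 10 * 1000))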