On 10.11.2016 at 03:06, Paul Rubin wrote:
This can probably be cleaned up some:

     from itertools import islice
     from collections import deque

     def ngram(n, seq):
         it = iter(seq)
         d = deque(islice(it, n))
         if len(d) != n:
             return
         for s in it:
             yield tuple(d)
             d.popleft()
             d.append(s)
         if len(d) == n:
             yield tuple(d)

     def test():
         xs = range(20)
         for a in ngram(5, xs):
             print(a)

     test()

That's along the lines of what I had in mind. Steven's version has two areas that could be improved:

1. The startup looks slightly ugly to me.
2. If n is large, tee has to maintain a lot of unnecessary state.

collections.deque manages all the state we need. Here are Steven's version (ngrams) and mine (ngrams_2):

from itertools import tee, islice
from collections import deque

def ngrams(iterable, n=2):
    if n < 1:
        raise ValueError
    t = tee(iterable, n)
    for i, x in enumerate(t):
        for j in range(i):
            next(x, None)
    return zip(*t)

def ngrams_2(iterable, n=2):
    if n < 1:
        raise ValueError
    it = iter(iterable)
    d = deque(islice(it, n-1), maxlen=n)
    for elem in it:
        d.append(elem)
        yield tuple(d)

print(list(ngrams(range(1000), 4)) == list(ngrams_2(range(1000), 4)))
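
To illustrate why ngrams_2 needs no explicit popleft(), here is a minimal sketch of how a deque with maxlen behaves as a sliding window. The names window and snapshots are only for illustration and are not part of either version above.

```python
from collections import deque

# A deque with maxlen=3 behaves like a fixed-size sliding window.
# It is pre-filled with n-1 = 2 items, mirroring the startup in ngrams_2.
window = deque([0, 1], maxlen=3)

snapshots = []
for elem in [2, 3, 4]:
    # Once the deque is full, append() silently drops the leftmost item.
    window.append(elem)
    snapshots.append(tuple(window))

print(snapshots)  # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```

Pre-filling with n-1 items means the first append inside the loop completes the first n-gram, so the loop body needs no special case for startup.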


One drawback of my version is that it iterates over the iterable itself, so it probably runs more Python code (where Steven's version runs C code). For large n, the reduced number of iterators does pay off, though:

%timeit list(ngrams(range(1000), n=500))
10 loops, best of 3: 26.5 ms per loop

%timeit list(ngrams_2(range(1000), n=500))
100 loops, best of 3: 4.07 ms per loop

For small n, it's slower, as expected:

%timeit list(ngrams(range(1000), n=3))
10000 loops, best of 3: 120 µs per loop

%timeit list(ngrams_2(range(1000), n=3))
1000 loops, best of 3: 603 µs per loop
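
The %timeit lines above are IPython magics. For anyone who wants to repeat the comparison in plain Python, the following is a rough sketch using the standard timeit module. The helper bench and its size and number parameters are my own assumptions, not what was used for the figures above, and absolute numbers will differ by machine and Python version.

```python
from collections import deque
from itertools import islice, tee
from timeit import timeit

def ngrams(iterable, n=2):
    # tee-based version: n independent iterators, each advanced by its index
    if n < 1:
        raise ValueError
    t = tee(iterable, n)
    for i, x in enumerate(t):
        for j in range(i):
            next(x, None)
    return zip(*t)

def ngrams_2(iterable, n=2):
    # deque-based version: one iterator plus a sliding window of size n
    if n < 1:
        raise ValueError
    it = iter(iterable)
    d = deque(islice(it, n - 1), maxlen=n)
    for elem in it:
        d.append(elem)
        yield tuple(d)

def bench(func, n, size=1000, number=20):
    # hypothetical helper: average seconds per call over `number` runs
    return timeit(lambda: list(func(range(size), n)), number=number) / number

for n in (3, 500):
    print(n, bench(ngrams, n), bench(ngrams_2, n))
```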


--
https://mail.python.org/mailman/listinfo/python-list