Re: N-grams

2016-11-10 Thread srinivas devaki
On Thu, Nov 10, 2016 at 2:26 PM, Peter Otten <__pete...@web.de> wrote: > > I don't think I've seen tee(iterable, 1) before. Did you do this for > aesthetic reasons or is there an advantage over > > t = [iter(iterable)] Yeah just to be aesthetic, there's no extra advantage over that as with n

Re: N-grams

2016-11-10 Thread Peter Otten
Paul Rubin wrote: > This can probably be cleaned up some: > > from itertools import islice > from collections import deque > > def ngram(n, seq): > it = iter(seq) > d = deque(islice(it, n)) > if len(d) != n: > return > for s in it: >

Re: N-grams

2016-11-10 Thread Peter Otten
srinivas devaki wrote: Interesting approach. > def myngrams(iterable, n=2): > t = list(tee(iterable, 1)) I don't think I've seen tee(iterable, 1) before. Did you do this for aesthetic reasons or is there an advantage over t = [iter(iterable)] ? > for _ in range(n - 1): >

Re: N-grams

2016-11-10 Thread Steven D'Aprano
On Thursday 10 November 2016 17:53, Wolfram Hinderer wrote: [...] > 1. The startup looks slightly ugly to me. > 2. If n is large, tee has to maintain a lot of unnecessary state. But n should never be large. In practice, n-grams are rarely larger than n=3. Occasionally you might use n=4 o

Re: N-grams

2016-11-09 Thread srinivas devaki
On Thu, Nov 10, 2016 at 12:43 PM, srinivas devaki wrote: > complexity wise it's O(N), but space complexity is O(N**2) to execute > this function, I'm sorry, that is a mistake. I just skimmed through the itertoolsmodule.c, and it seems like the space complexity is just O(N), as when tee objects ar

Re: N-grams

2016-11-09 Thread Paul Rubin
Ian Kelly writes: > I'd use the maxlen argument to deque here. Oh that's cool, it's a Python 3 thing though. > Better to move the extra yield above the loop and reorder the loop > body so that the yielded tuple includes the element just read. Thanks, I'll give that a try. >> if len(d)
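For readers following the two suggestions quoted above (the maxlen argument to deque, and yielding once before the loop so each yielded tuple includes the element just read), here is a minimal sketch of how the deque-based generator might look with both applied. It is not the code posted in the thread, whose body is truncated in this listing; the n < 1 check and the parameter name "iterable" are assumptions added for illustration.

```python
from collections import deque
from itertools import islice


def ngram(n, iterable):
    """Yield successive n-grams of any iterable as tuples (sketch)."""
    if n < 1:
        # Assumption: reject non-positive n, as the tee-based version does.
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    # maxlen=n makes the deque drop its oldest item on every append.
    d = deque(islice(it, n), maxlen=n)
    if len(d) != n:
        return  # fewer than n items: there are no n-grams
    yield tuple(d)  # the first window, yielded before the loop
    for s in it:
        d.append(s)  # pushes s in and silently discards the oldest item
        yield tuple(d)


print(list(ngram(3, "abcde")))
```

With maxlen set, the explicit popleft() is no longer needed, and the window holds only n items however long the input is.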

Re: N-grams

2016-11-09 Thread srinivas devaki
> def ngrams(iterable, n=2): > if n < 1: > raise ValueError > t = tee(iterable, n) > for i, x in enumerate(t): > for j in range(i): > next(x, None) > return zip(*t) def myngrams(iterable, n=2): t = list(tee(iterable, 1)) for _ in range(n - 1):

Re: N-grams

2016-11-09 Thread Wolfram Hinderer
Am 10.11.2016 um 03:06 schrieb Paul Rubin: This can probably be cleaned up some: from itertools import islice from collections import deque def ngram(n, seq): it = iter(seq) d = deque(islice(it, n)) if len(d) != n: return for s in

Re: N-grams

2016-11-09 Thread Ian Kelly
On Wed, Nov 9, 2016 at 7:06 PM, Paul Rubin wrote: > This can probably be cleaned up some: Okay. :-) > from itertools import islice > from collections import deque > > def ngram(n, seq): Looks like "seq" can be any iterable, not just a sequence. > it = iter(seq) > d

Re: N-grams

2016-11-09 Thread Paul Rubin
This can probably be cleaned up some: from itertools import islice from collections import deque def ngram(n, seq): it = iter(seq) d = deque(islice(it, n)) if len(d) != n: return for s in it: yield tuple(d) d.popleft(

Re: N-grams

2016-11-09 Thread Ian Kelly
On Wed, Nov 9, 2016 at 6:38 AM, Steve D'Aprano wrote: > And here's an implementation for arbitrary n-grams: > > > def ngrams(iterable, n=2): > if n < 1: > raise ValueError > t = tee(iterable, n) > for i, x in enumerate(t): >

N-grams

2016-11-09 Thread Steve D'Aprano
next(b, None) next(c, None); next(c, None) next(d, None); next(d, None); next(d, None) return zip(a, b, c, d) And here's an implementation for arbitrary n-grams: def ngrams(iterable, n=2): if n < 1: raise ValueError t = tee(iterable, n) for i, x in enumerate
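The arbitrary-n version is cut off in this listing, but replies elsewhere in the thread quote it in full. Laid out with its import, it would look roughly like the sketch below, assuming Python 3 (where zip is lazy); the comments are added here and are not from the original post.

```python
from itertools import tee


def ngrams(iterable, n=2):
    """Return an iterator of n-grams (as tuples) over any iterable."""
    if n < 1:
        raise ValueError
    # n independent iterators over the same underlying data.
    t = tee(iterable, n)
    # Advance the i-th iterator by i items so the iterators are staggered.
    for i, x in enumerate(t):
        for j in range(i):
            next(x, None)
    # zip stops as soon as the most advanced iterator is exhausted.
    return zip(*t)


print(list(ngrams("abcd", 2)))
```

If the input has fewer than n items, the last iterator is already exhausted before zip starts, so the result is empty.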

Language detector using N-grams

2009-05-12 Thread Cesar D. Rodas
Hello Pythoners! I just finished my first useful project in Python. It is a language detector using N-grams. I hope this can be useful for someone: http://github.com/crodas/py-languess/tree/master The license of the project is BSD. Best regards -- Cesar D. Rodas http://cesar.la/ Phone: +595