On Nov 24, 5:44 am, Licheng Fang <[EMAIL PROTECTED]> wrote:
> Yes, millions. In my natural language processing tasks, I almost
> always need to define patterns, identify their occurrences in a huge
> dataset, and count them. Say, I have a big text file, consisting of
> millions of words, and I want to count the frequency of trigrams:
>
> trigrams([1,2,3,4,5]) == [(1,2,3),(2,3,4),(3,4,5)]
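For the windowing itself, zip over shifted slices reproduces the quoted
example (a minimal sketch, not from the original message; note that in
Python 2 zip returns a list, so the output matches directly):

>>> # pair each element with its two successors
>>> def trigrams(seq): return zip(seq, seq[1:], seq[2:])
>>> trigrams([1,2,3,4,5])
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]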
BTW, if the components of your trigrams are never larger than a byte,
then encode the tuples as integers and don't worry about pointer
comparisons.

>>> # pack three byte-sized components into one int (base-256 digits)
>>> def encode(s): return (ord(s[0])*256 + ord(s[1]))*256 + ord(s[2])
>>> def trigram(s): return [encode(s[i:i+3]) for i in range(len(s)-2)]
>>> trigram('abcde')
[6382179, 6447972, 6513765]
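Counting the frequencies is then just a dict tally over those ints. A
minimal sketch of that step (my illustration, not part of the reply
above; collections.defaultdict needs Python 2.5+, and 'abcabc' is an
arbitrary example string):

>>> from collections import defaultdict
>>> counts = defaultdict(int)
>>> for t in trigram('abcabc'): counts[t] += 1
...
>>> sorted(counts.items())
[(6382179, 2), (6447969, 1), (6512994, 1)]

Since each trigram is a small int rather than a tuple, hashing and
equality checks stay cheap even with millions of occurrences.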