Hi, Does anyone know how to tokenize a string in Python so that it returns each token together with its byte offset? I also need a sentence splitter that returns the sentences with their byte offsets, and finally n-grams returned with byte offsets.
Input: This is a string.
Output: This 0 is 5 a 8 string. 10

Thanks
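To make the request concrete, here is a rough sketch of the behaviour I have in mind, using only the standard `re` module. The function names (`tokenize`, `split_sentences`, `ngrams`) are made up for illustration and do not come from any library. It assumes UTF-8 text, treats a token as any run of non-whitespace, and uses a naive sentence rule (`.`, `!` or `?` followed by whitespace), so abbreviations such as "e.g." would be split incorrectly.

```python
import re

# Hypothetical helpers; names and rules are assumptions, not a library API.

def tokenize(text):
    """Return (token, byte_offset) pairs for whitespace-separated tokens."""
    data = text.encode("utf-8")
    # Matching on bytes makes m.start() a byte offset, not a character offset.
    return [(m.group().decode("utf-8"), m.start())
            for m in re.finditer(rb"\S+", data)]

def split_sentences(text):
    """Return (sentence, byte_offset) pairs using a naive punctuation rule."""
    data = text.encode("utf-8")
    # A sentence runs from a non-space byte to ., ! or ? followed by
    # whitespace or end of input (or to the end if no punctuation is found).
    pattern = re.compile(rb"\S.*?(?:[.!?]+(?=\s|$)|$)", re.DOTALL)
    return [(m.group().decode("utf-8"), m.start())
            for m in pattern.finditer(data)]

def ngrams(tokens, n):
    """Return (ngram, byte_offset) pairs; the offset is that of the first token."""
    return [(" ".join(tok for tok, _ in tokens[i:i + n]), tokens[i][1])
            for i in range(len(tokens) - n + 1)]

tokens = tokenize("This is a string.")
print(tokens)             # [('This', 0), ('is', 5), ('a', 8), ('string.', 10)]
print(ngrams(tokens, 2))  # [('This is', 0), ('is a', 5), ('a string.', 8)]
```

For pure ASCII input the byte offsets equal the character offsets. They differ once the text contains multi-byte characters, which is why the matching is done on the encoded bytes rather than on the `str`.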