Hi,

Does anyone know how to tokenize a string in Python so that it returns
both the tokens and their byte offsets? I am also looking for a sentence
splitter that returns the sentences with their byte offsets, and for a
way to get n-grams with their byte offsets.

Input:
This is a string.

Output:
This     0
is       5
a        8
string.  10
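
One possible approach, as a rough sketch only: run a regular expression
over the UTF-8 encoded bytes, so that match positions are byte offsets
rather than character offsets. The function names below are made up for
illustration and do not come from any library. Tokens are assumed to be
whitespace-separated, the sentence rule is a naive "ends in . ! or ?"
heuristic, and each n-gram is given the offset of its first token.

```python
import re

def tokenize_with_offsets(text):
    """Return (token, byte_offset) pairs for whitespace-separated tokens.

    Assumption: offsets are counted in UTF-8 bytes.
    """
    data = text.encode("utf-8")
    # On a bytes pattern, \S+ matches runs of bytes that are not ASCII
    # whitespace, so multi-byte UTF-8 characters are never split.
    return [(m.group().decode("utf-8"), m.start())
            for m in re.finditer(rb"\S+", data)]

def split_sentences_with_offsets(text):
    """Return (sentence, byte_offset) pairs using a naive heuristic.

    Assumption: a sentence ends at a run of . ! or ? that is followed by
    whitespace or by the end of the input. Abbreviations such as "e.g."
    will be split incorrectly.
    """
    data = text.encode("utf-8")
    pattern = re.compile(rb"\S.*?(?:[.!?]+(?=\s|$)|$)", re.DOTALL)
    return [(m.group().decode("utf-8"), m.start())
            for m in pattern.finditer(data)]

def ngrams_with_offsets(tokens, n):
    """Build n-grams from (token, offset) pairs.

    Each n-gram is joined with single spaces and carries the byte offset
    of its first token.
    """
    return [(" ".join(tok for tok, _ in tokens[i:i + n]), tokens[i][1])
            for i in range(len(tokens) - n + 1)]

if __name__ == "__main__":
    tokens = tokenize_with_offsets("This is a string.")
    for tok, off in tokens:
        print(tok, off)
    print(ngrams_with_offsets(tokens, 2))
    print(split_sentences_with_offsets("This is a string. Another one!"))
```

For pure ASCII input the byte offsets equal the character offsets; they
differ only when the text contains multi-byte characters.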


Thanks