John Machin wrote:
> Not picking on Tim in particular; try the following with *all*
> suggestions so far:
>
> textbox = "He was wont to be alarmed/amused by answers that won't work"

Not perfect, but would work for many cases:

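import re
s = "He was wont to be alarmed/amused by answers that won't work"
# '-' escaped (else '!-:' is a range swallowing digits and "'"); '\b' dropped (backspace in a class)
r = r'[()\[\]<>{}.,@#$%^&*?!\-:;\\/_"\s]+'
l = [tok for tok in re.split(r, s) if tok]  # drop empty strings from edge separators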
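Another common approach, which is not what I posted above, is to match the tokens themselves rather than enumerate every separator. Here is a rough sketch. `WORD` and `tokenize` are made-up names, and the assumption is that a word is a run of `\w` characters, optionally joined by internal apostrophes:

```python
import re

# Hypothetical helper: a token is a run of word characters, optionally
# followed by "'" + more word characters, so "won't" stays whole while
# "/" (not a word character) splits "alarmed/amused".
WORD = re.compile(r"\w+(?:'\w+)*")


def tokenize(text):
    """Return the word tokens of text, keeping internal apostrophes."""
    # The group is non-capturing, so findall returns the whole matches.
    return WORD.findall(text)


s = "He was wont to be alarmed/amused by answers that won't work"
print(tokenize(s))
# ['He', 'was', 'wont', 'to', 'be', 'alarmed', 'amused', 'by', 'answers', 'that', "won't", 'work']
```

This is still not perfect. A leading or trailing apostrophe ('tis, dogs') is dropped. Because `\w` includes digits and underscores, `foo_bar` stays one token, whereas the split version above breaks it apart.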

Check out this short paper from the Natural Language Toolkit folks on
some problems and strategies for tokenization:
http://nltk.sourceforge.net/lite/doc/en/tokenize.html

-- 
http://mail.python.org/mailman/listinfo/python-list
