John Machin wrote:
> Not picking on Tim in particular; try the following with *all*
> suggestions so far:
>
> textbox = "He was wont to be alarmed/amused by answers that won't work"

Not perfect, but this would work for many cases:

    import re

    s = "He was wont to be alarmed/amused by answers that won't work"
    # Delimiters to split on. The '-' is escaped so it is a literal hyphen
    # rather than the range '!' to ':', which would include the apostrophe.
    r = r'[()\[\]<>{}.,@#$%^&*?!\-:;\\/_"\s]+'
    # Drop any empty strings that re.split leaves at the edges.
    l = [w for w in re.split(r, s) if w]

Check out this short paper from the Natural Language Toolkit folks on some problems and strategies for tokenization:

http://nltk.sourceforge.net/lite/doc/en/tokenize.html
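
One thing to watch with a delimiter class like this: inside square brackets, a hyphen between two characters, as in `!-:`, is read as a range, and the range from '!' to ':' happens to contain the apostrophe (and the digits). Here is a rough sketch of the difference, plus an alternative that matches words instead of listing delimiters. The names `range_split`, `literal_split`, `WORD` and `words` are mine, and the word pattern is just an assumption about what counts as a word (ASCII letters with optional internal apostrophes), not a general tokenizer:

```python
import re

text = "He was wont to be alarmed/amused by answers that won't work"

# Unescaped, '!-:' is the range 0x21..0x3A, which contains the apostrophe.
range_split = re.split(r"[!-:]+", "won't")

# Escaped, the class holds just the three characters '!', '-' and ':'.
literal_split = re.split(r"[!\-:]+", "won't")

# Alternative sketch: match words (ASCII letters, with optional internal
# apostrophes) instead of enumerating every possible delimiter.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")
words = WORD.findall(text)

print(range_split)
print(literal_split)
print(words)
```

Matching words rather than splitting keeps "won't" in one piece and still separates "alarmed/amused", but it drops digits and non-ASCII letters, so it would need adjusting for real text.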