>> Any more crazy examples? :)
>
> 'ey, 'alf a mo, wot about when 'enry 'n' 'orace drop their aitches?

I said "crazy"...not "pathological" :)

If one really wants such a case, one has to forgo the standard practice of nesting quotes:

  John replied "Dad told me 'you can't go' but let Judy"

However, if you don't have such situations and want to make 'enry and 'orace 'appy, you can change the regexp to

  >>> import re
  >>> s="He was wont to be alarmed/amused by answers that won't work"
  >>> s2="The two-faced liar--a real joker--can't tell the truth"
  >>> s3="'ey, 'alf a mo, wot about when 'enry 'n' 'orace drop their aitches?"
  >>> r = re.compile("(?:(?:[a-zA-Z][-'])|(?:[-'][a-zA-Z])|[a-zA-Z])+")

It will also choke on double-dashes:

  >>> r.findall(s), r.findall(s2), r.findall(s3)
  (['He', 'was', 'wont', 'to', 'be', 'alarmed', 'amused', 'by', 'answers',
    'that', "won't", 'work'],
   ['The', 'two-faced', 'liar--a', 'real', "joker--can't", 'tell', 'the',
    'truth'],
   ["'ey", "'alf", 'a', 'mo', 'wot', 'about', 'when', "'enry", "'n",
    "'orace", 'drop', 'their', 'aitches'])

Or, to allow only infix dashes but still allow apostrophes anywhere in the word, including the front or back, one could use:

  >>> r = re.compile("(?:(?:[a-zA-Z]')|(?:'[a-zA-Z])|(?:[a-zA-Z]-[a-zA-Z])|[a-zA-Z])+")
  >>> r.findall(s), r.findall(s2), r.findall(s3)
  (['He', 'was', 'wont', 'to', 'be', 'alarmed', 'amused', 'by', 'answers',
    'that', "won't", 'work'],
   ['The', 'two-faced', 'liar', 'a', 'real', 'joker', "can't", 'tell', 'the',
    'truth'],
   ["'ey", "'alf", 'a', 'mo', 'wot', 'about', 'when', "'enry", "'n",
    "'orace", 'drop', 'their', 'aitches'])

Now your spell-checker has to have the "dropped initial or terminal letter" locale... :)

-tkc
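
P.S. In case the one-line pattern is hard to read, here is a rough sketch of the same combined alternation (infix dashes only, apostrophes anywhere) spelled out with re.VERBOSE so each branch carries a comment. The names WORD_RE and words() are made up for illustration and are not from the thread; the branches themselves are meant to be the same four as in the second regexp above.

```python
import re

# The combined pattern from above, one alternative per line.
# WORD_RE and words() are hypothetical names used only for this sketch.
WORD_RE = re.compile(r"""
    (?:
        [a-zA-Z]'            # a letter followed by an apostrophe (the n' in can't)
      | '[a-zA-Z]            # an apostrophe followed by a letter ('enry)
      | [a-zA-Z]-[a-zA-Z]    # a single dash between two letters (two-faced)
      | [a-zA-Z]             # a plain letter
    )+
""", re.VERBOSE)


def words(text):
    """Return the word-like tokens that WORD_RE finds in text."""
    return WORD_RE.findall(text)


print(words("The two-faced liar--a real joker--can't tell the truth"))
print(words("'ey, 'alf a mo, wot about when 'enry 'n' 'orace drop their aitches?"))
```

Since the dash branch requires a letter on both sides, a double-dash can never be swallowed into a word, which is why "liar--a" splits here.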