Tim Chase wrote:
> >> Any more crazy examples? :)
> >
> > 'ey, 'alf a mo, wot about when 'enry 'n' 'orace drop their aitches?
>
> I said "crazy"...not "pathological" :)
>
> If one really wants such a case, one has to omit the standard
> practice of nesting quotes:
>
> John replied "Dad told me 'you can't go' but let Judy"
>
> However, if you don't have such situations and do want to make
> 'enry and 'orace 'appy, you can change the regexp to
>
> >>> import re
> >>> s="He was wont to be alarmed/amused by answers that won't work"
> >>> s2="The two-faced liar--a real joker--can't tell the truth"
> >>> s3="'ey, 'alf a mo, wot about when 'enry 'n' 'orace drop their aitches?"
>
> >>> r = re.compile("(?:(?:[a-zA-Z][-'])|(?:[-'][a-zA-Z])|[a-zA-Z])+")
>
> It will also choke using double-dashes:
>
> >>> r.findall(s), r.findall(s2), r.findall(s3)
> (['He', 'was', 'wont', 'to', 'be', 'alarmed', 'amused', 'by',
> 'answers', 'that', "won't", 'work'], ['The', 'two-faced',
> 'liar--a', 'real', "joker--can't", 'tell', 'the', 'truth'],
> ["'ey", "'alf", 'a', 'mo', 'wot', 'about', 'when', "'enry", "'n",
> "'orace", 'drop', 'their', 'aitches'])
>
> Or, combining them to allow only infix dashes but apostrophes
> anywhere in the word, including the front or back, one could use:
>
> >>> r = re.compile("(?:(?:[a-zA-Z]')|(?:'[a-zA-Z])|(?:[a-zA-Z]-[a-zA-Z])|[a-zA-Z])+")
> >>> r.findall(s), r.findall(s2), r.findall(s3)
> (['He', 'was', 'wont', 'to', 'be', 'alarmed', 'amused', 'by',
> 'answers', 'that', "won't", 'work'], ['The', 'two-faced', 'liar',
> 'a', 'real', 'joker', "can't", 'tell', 'the', 'truth'], ["'ey",
> "'alf", 'a', 'mo', 'wot', 'about', 'when', "'enry", "'n",
> "'orace", 'drop', 'their', 'aitches'])
>
> Now your spell-checker has to have the "dropped initial or
> terminal letter" locale... :)
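For anyone who wants to play with Tim's final pattern outside the interpreter, here's a minimal sketch (assuming Python 3's standard re module) that spells the same regex out with re.VERBOSE so each alternative can carry a comment. The names WORD and words() are just my own, for illustration; they aren't from Tim's post.

```python
import re

# Same alternatives as Tim's final one-liner, spelled out with re.VERBOSE.
# Verbose mode ignores the layout whitespace and the # comments below.
WORD = re.compile(r"""
    (?:
        [a-zA-Z]'             # letter then apostrophe: the n' in won't
      | '[a-zA-Z]             # apostrophe then letter: 'ey, 'enry
      | [a-zA-Z]-[a-zA-Z]     # one hyphen between two letters: two-faced
      | [a-zA-Z]              # a plain letter
    )+
    """, re.VERBOSE)


def words(text):
    """Return the word-like runs in text (hypothetical helper name)."""
    return WORD.findall(text)


print(words("The two-faced liar--a real joker--can't tell the truth"))
# ['The', 'two-faced', 'liar', 'a', 'real', 'joker', "can't", 'tell', 'the', 'truth']
```

A double dash splits the run because the hyphen alternative only admits a single dash with a letter on each side.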
Too complicated for string.bleedin'_split(), innit?

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list