And hey, you could probably use a regex to modify a regex, if you were really twisted. ;-)
Sorry. I really shouldn't have said that. Somebody's going to do it now. :-P
Sure, but only 'cause you asked so nicely. =)
>>> import re
>>> def internationalize(expr,
...                      letter_matcher=re.compile(r'\[A-(?:Za-)?z\]')):
...     return letter_matcher.sub(r'[^\W_\d]', expr)
...
>>> def compare(expr, text):
...     def item_str(matcher):
...         return ' '.join(matcher.findall(text))
...     print 'reg: ', item_str(re.compile(expr))
...     print 'intl:', item_str(re.compile(internationalize(expr),
...                                         re.UNICODE))
...
>>> compare(r'\d+\s+([A-z]+)', '1 viola. 2 voilą')
reg:  viola voil
intl: viola voilą
>>> compare(r'\d+\s+([A-Za-z]+)', '1 viola. 2 voilą')
reg:  viola voil
intl: viola voilą
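This code rewrites [A-z] and [A-Za-z] character classes in a regexp into [^\W_\d] (a word character that is neither an underscore nor a digit, i.e. a letter), which, when compiled with re.UNICODE, also matches non-ASCII letters. Note that without the conversion, characters like 'ą' are not matched.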
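The session above is Python 2. For anyone trying it on Python 3, here is a minimal sketch of an adaptation, assuming Python 3.7 or later. It is not the original code: the names LETTER_CLASS and UNICODE_LETTER are hypothetical helpers introduced here, and compare() returns the matches instead of printing them. Two differences matter: str patterns are Unicode-aware by default in Python 3, so re.UNICODE is no longer needed, and since 3.7 a replacement *string* such as r'[^\W_\d]' raises re.error ("bad escape"), so the sketch passes a function to sub() instead.

```python
import re

# Hypothetical helper: matches the literal classes "[A-z]" and "[A-Za-z]"
# inside a pattern string (same regex as in the session above).
LETTER_CLASS = re.compile(r'\[A-(?:Za-)?z\]')

# Hypothetical helper: a word character that is not "_" and not a digit,
# i.e. a letter; with str patterns this covers non-ASCII letters too.
UNICODE_LETTER = r'[^\W_\d]'


def internationalize(expr, letter_matcher=LETTER_CLASS):
    # A function replacement is inserted verbatim; a replacement *string*
    # containing \W or \d would be rejected as a bad escape on Python 3.7+.
    return letter_matcher.sub(lambda m: UNICODE_LETTER, expr)


def compare(expr, text):
    """Return (matches with expr, matches with the internationalized expr)."""
    ascii_matches = re.findall(expr, text)
    # No re.UNICODE flag needed: str patterns use Unicode matching by default.
    intl_matches = re.findall(internationalize(expr), text)
    return ascii_matches, intl_matches
```

With this sketch, compare(r'\d+\s+([A-z]+)', '1 viola. 2 voilą') returns (['viola', 'voil'], ['viola', 'voilą']), mirroring the reg/intl lines of the original session.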
Steve
--
http://mail.python.org/mailman/listinfo/python-list