Terry Hancock wrote:
And hey, you could probably use a regex to modify a regex, if you were
really twisted. ;-)

Sorry. I really shouldn't have said that. Somebody's going to do it now. :-P

Sure, but only 'cause you asked so nicely. =)

>>> import re
>>> def internationalize(expr,
...                      letter_matcher=re.compile(r'\[A-(?:Za-)?z\]')):
...     return letter_matcher.sub(r'[^\W_\d]', expr)
...
>>> def compare(expr, text):
...     def item_str(matcher):
...         return ' '.join(matcher.findall(text))
...     print 'reg: ', item_str(re.compile(expr))
...     print 'intl:', item_str(re.compile(internationalize(expr),
...                                        re.UNICODE))
...
>>> compare(r'\d+\s+([A-z]+)', '1 viola. 2 voilą')
reg:  viola voil
intl: viola voilą
>>> compare(r'\d+\s+([A-Za-z]+)', '1 viola. 2 voilą')
reg:  viola voil
intl: viola voilą

This code rewrites [A-z] and [A-Za-z] character classes as [^\W_\d], which matches any Unicode letter, not just the ASCII ones, when the pattern is compiled with re.UNICODE. Note that without the conversion, non-ASCII letters like 'ą' are not matched.
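
For anyone trying this on Python 3, here is a minimal sketch of the same idea, not a drop-in replacement for the session above. It assumes Python 3.7 or later, where str patterns are Unicode-aware by default (so re.UNICODE is no longer needed) and where a replacement string containing an unknown escape such as \W raises re.error, which is why the replacement is passed as a callable. The names LETTER_CLASS and the example text variable are mine, not from the original post.

```python
import re

# Matches the literal text "[A-z]" or "[A-Za-z]" inside a pattern string.
LETTER_CLASS = re.compile(r'\[A-(?:Za-)?z\]')

def internationalize(expr):
    # A callable replacement is returned as-is, so the backslashes in
    # [^\W_\d] are not parsed as (invalid) replacement-template escapes.
    return LETTER_CLASS.sub(lambda match: r'[^\W_\d]', expr)

# \u0105 is the letter 'ą' used in the example above.
text = '1 viola. 2 voil\u0105'
pattern = r'\d+\s+([A-Za-z]+)'

ascii_only = re.findall(pattern, text)
unicode_aware = re.findall(internationalize(pattern), text)

print('reg: ', ' '.join(ascii_only))
print('intl:', ' '.join(unicode_aware))
```

One caveat about the conversion itself: [A-z] is not strictly a letters-only class, because the ASCII range between 'Z' and 'a' also contains [ \ ] ^ _ and the backtick. The rewritten class drops those, which is almost certainly what was intended, but it is a small change in behavior.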

Steve
--
http://mail.python.org/mailman/listinfo/python-list
