Hello all,

Can someone confirm that compiled regular expressions built from ASCII strings will always (and safely) yield unicode values when matched against unicode strings?
I've tested it and it works, but can someone confirm that this is consistent and safe? I want to be sure there are no lurking encode errors. I assume only a decode is ever done, in which case: is it still safe on a system whose default encoding isn't ASCII-compatible? On the other hand, it seems to me that such a system would break *everything*.

    >>> import re
    >>> r = re.compile('(.*)=(.*)')
    >>> s = '£££=£££'.decode('cp1252')  # a unicode string that can't be encoded as ASCII
    >>> c = r.match(s)
    >>> c.groups()  # yields two unicode strings
    (u'\xa3\xa3\xa3', u'\xa3\xa3\xa3')
    >>> print c.groups()[0].encode('cp1252')  # which encode safely
    £££

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
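A minimal sketch of the same check as the interactive session, written (as an assumption, not the original code) to run unchanged under Python 2.6+ and Python 3.3+. It uses `\xa3` escapes instead of a literal `£` so the result does not depend on the source file's encoding. The variable names `pattern`, `subject`, `key` and `value` are illustrative only.

```python
import re

# ASCII-only pattern; under Python 2 this literal is a byte str.
pattern = re.compile('(.*)=(.*)')

# Unicode subject containing U+00A3 POUND SIGN, which has no ASCII encoding.
subject = u'\xa3\xa3\xa3=\xa3\xa3\xa3'

m = pattern.match(subject)
key, value = m.groups()

# Each group is a slice of the subject, so it has the subject's (unicode) type.
assert type(key) is type(subject)
assert key == u'\xa3\xa3\xa3' and value == u'\xa3\xa3\xa3'

# Encoding the group to cp1252 succeeds (U+00A3 maps to byte 0xA3).
assert key.encode('cp1252') == b'\xa3\xa3\xa3'

# Encoding it to ASCII fails, confirming the groups really are non-ASCII text.
try:
    key.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
assert not ascii_ok
```

Nothing here encodes the subject itself. The groups come back as pieces of the unicode string that was matched, so an encode only happens when you ask for one explicitly.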