Unicode strings and ascii regular expressions

Fuzzyman Mon, 30 Jan 2006 15:15:47 -0800

Hello all,

Can someone confirm that regular expressions compiled from ASCII
strings will always (and safely) yield unicode values when matched
against unicode strings?


I've tested it and it works, but can someone confirm that this is
consistent and safe? I'm worried about lurking encode errors. I assume
only a decode is done, in which case is it safe on a system whose
default encoding is not ASCII-compatible? OTOH, it seems to me that
such a default encoding would break *everything*.

>>> import re
>>> r = re.compile('(.*)=(.*)')
>>> s = '£££=£££'.decode('cp1252') # yields a unicode string that can't be encoded as ascii
>>> c = r.match(s)
>>> c.groups()   # yields two unicode strings
(u'\xa3\xa3\xa3', u'\xa3\xa3\xa3')
>>> print c.groups()[0].encode('cp1252') # which encode safely
£££
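
The same check can be written as a small script, as a sketch only. It spells the pound signs as escaped cp1252 bytes so the result does not depend on the terminal or source-file encoding. The names `pattern`, `text` and `groups` are just illustrative, and the `b''` prefix assumes Python 2.6 or later. It shows the behaviour I observed rather than proving it for every platform or default encoding.

```python
import re

# Pattern compiled from a plain ASCII literal, as in the session above.
pattern = re.compile('(.*)=(.*)')

# Three pound signs on each side of '=', written as cp1252 bytes so the
# example does not depend on the terminal or source-file encoding.
text = b'\xa3\xa3\xa3=\xa3\xa3\xa3'.decode('cp1252')

groups = pattern.match(text).groups()

# Both groups have the same type as the subject string (unicode text).
assert all(type(g) is type(text) for g in groups)

# Encoding back to cp1252 recovers the original bytes for each side.
assert [g.encode('cp1252') for g in groups] == [b'\xa3\xa3\xa3', b'\xa3\xa3\xa3']
```

If either assertion fails on some setup, that would be exactly the kind of lurking problem I am asking about.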


All the best,


Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

-- 
http://mail.python.org/mailman/listinfo/python-list
