Re: Unicode strings and ascii regular expressions

Fredrik Lundh Mon, 30 Jan 2006 15:30:46 -0800

Fuzzyman wrote:

> Can someone confirm that compiled regular expressions from ascii
> strings will always (and safely) yield unicode values when matched
> against unicode strings ?
>
> I've tested it and it works - but can someone confirm that this is
> consistent and safe ? (No lurking encode errors - I assume it is only a
> decode that is done, in which case is it safe on a system that has a
> non-ascii compatible default encoding ? OTOH it would seem to me that
> that would break *everything*.)
>
> >>> import re
> >>> r = re.compile('(.*)=(.*)')
> >>> s = '£££=£££'.decode('cp1252') # yields a unicode string that can't be encoded as ascii
> >>> c = r.match(s)
> >>> c.groups()   # yields two unicode strings
> (u'\xa3\xa3\xa3', u'\xa3\xa3\xa3')
> >>> print c.groups()[0].encode('cp1252') # which encode safely
> £££


ascii patterns work just fine on unicode strings.  the engine doesn't care
what string type you use for the pattern, and it always returns slices of
the target string, so you get back what you pass in.
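
a minimal sketch of that behaviour, for illustration only.  the names
(pattern, target, same_type) are made up for this example, and the target
is written with \xa3 escapes so the snippet doesn't depend on the source
file encoding.  it's written to run on python 3 as well, where both
strings are simply str:

```python
import re

# ascii pattern: a byte string on Python 2, a plain str on Python 3
pattern = re.compile('(.*)=(.*)')

# unicode target containing non-ascii characters (three pound signs each side)
target = u'\xa3\xa3\xa3=\xa3\xa3\xa3'

match = pattern.match(target)
groups = match.groups()

# each group is a slice of the target, so it has the target's type
same_type = all(type(g) is type(target) for g in groups)
print(same_type)
```

since the groups are cut out of the target, no encode or decode step is
applied to them.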

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list
