On 24 November 2014 at 14:21, Sara Golemon <poll...@php.net> wrote:
> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds <a...@ajf.me> wrote:
>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>>
> I'm okay with producing UTF-8 even though our strings are technically
> binary. As you state, UTF-8 is the de-facto encoding, and recognizing
> this is pretty reasonable.

I'm also OK with this, although I do wonder if we should be respecting
the user's default_charset setting instead. (Since default_charset
defaults to "UTF-8", in practice this isn't a significant difference
for the average user.)

> You may want to make it a requirement that strings containing \u
> escapes are denoted as: u"blah blah" We set aside this format
> back in the PHP6 days (note that b"blah" is equivalent to "blah" for
> binary strings).

It seems to me that the point of \u and \U escapes is to embed Unicode
in potentially non-Unicode strings, so using u"" doesn't feel right.

> On the BMP versus SMP issue of \uXXXX styles, we addressed this in
> PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
> hexit codepoints. e.g. "\u1234" === "\U001234" I'd rather
> follow this style than making \u special and different from hex and
> octal notations by using braces.

I think I prefer the brace style, personally. Non-BMP codepoints have
become more important since PHP 6 (thanks, emoji), and having \u and \U
be case sensitive when \x isn't seems confusing.

Adam
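
To make the two notations concrete, here is a rough sketch of how each could be expanded into UTF-8 bytes. It is written in Python purely for illustration: it is not taken from the RFC or from any PHP lexer code, and the names FIXED, BRACED, utf8, expand_fixed and expand_braced are hypothetical. It assumes the fixed-width style means exactly 4 hexits after \u and exactly 6 after \U, as described in the quoted text, and that the brace style accepts any number of hexits inside \u{...}.

```python
import re

# Fixed-width style (as described for PHP 6): \uXXXX or \UXXXXXX.
FIXED = re.compile(rb'\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{6})')
# Brace style: \u{...} with a variable number of hexits.
BRACED = re.compile(rb'\\u\{([0-9A-Fa-f]+)\}')


def utf8(hexits: bytes) -> bytes:
    # Turn the hex digits of one codepoint into its UTF-8 byte sequence.
    codepoint = int(hexits.decode('ascii'), 16)
    if codepoint > 0x10FFFF:
        raise ValueError("codepoint out of Unicode range")
    # Surrogate codepoints (U+D800..U+DFFF) make encode() raise as well.
    return chr(codepoint).encode('utf-8')


def expand_fixed(raw: bytes) -> bytes:
    # Exactly one of the two groups matches; use whichever is set.
    return FIXED.sub(lambda m: utf8(m.group(1) or m.group(2)), raw)


def expand_braced(raw: bytes) -> bytes:
    return BRACED.sub(lambda m: utf8(m.group(1)), raw)


if __name__ == "__main__":
    # BMP codepoint: both fixed-width spellings give the same bytes.
    print(expand_fixed(rb'\u1234') == expand_fixed(rb'\U001234'))
    # Non-BMP codepoint (an emoji): needs \U in the fixed-width style.
    print(expand_fixed(rb'\U01F602') == expand_braced(rb'\u{1F602}'))
    # Bytes that are not valid UTF-8 (a lone 0xE9) pass through untouched.
    print(expand_braced(b'\xe9 ' + rb'\u{e9}'))
```

The input and output are byte strings on purpose, mirroring the point that the escape only inserts UTF-8 bytes into an otherwise binary string; nothing here checks or converts the surrounding bytes.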