On 24 November 2014 at 14:21, Sara Golemon <poll...@php.net> wrote:
> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds <a...@ajf.me> wrote:
>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>>
> I'm okay with producing UTF-8 even though our strings are technically
> binary.  As you state, UTF-8 is the de-facto encoding, and recognizing
> this is pretty reasonable.

I'm also OK with this, although I do wonder if we should be respecting
the user's default_charset setting instead. (Since default_charset
defaults to "UTF-8", in practice this isn't a significant difference
for the average user.)

> You may want to make it a requirement that strings containing \u
> escapes are denoted as:   u"blah blah"    We set aside this format
> back in the PHP6 days (note that b"blah" is equivalent to "blah" for
> binary strings).

It seems to me that the point of \u and \U escapes is to embed Unicode
in potentially non-Unicode strings, so using u"" doesn't feel right.
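
To make that concrete, here is a rough sketch. It assumes the brace syntax from the RFC (\u{...}) is what gets implemented and that it emits UTF-8, so it will not run on current releases. The escape just drops bytes into an ordinary binary string, next to whatever else is there:

```php
<?php
// Sketch only: assumes the RFC's proposed \u{...} escape, which emits UTF-8 bytes.
// The string is still a plain byte string; nothing marks it as "Unicode".
$mixed = "\xFF\u{2603}";     // one raw byte, then U+2603 SNOWMAN

var_dump(strlen($mixed));    // int(4): 1 raw byte + 3 UTF-8 bytes
var_dump(bin2hex($mixed));   // string(8) "ffe29883"
```

The result is not valid UTF-8 as a whole, which is why a u"" prefix would promise more than the string can guarantee.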

> On the BMP versus SMP issue of \uXXXX styles, we addressed this in
> PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
> hexit codepoints.   e.g.    "\u1234" === "\U001234"   I'd rather
> follow this style than making \u special and different from hex and
> octal notations by using braces.

I think I prefer the brace style, personally. Non-BMP codepoints have
become more important since PHP 6 (thanks, emoji), and having \u and
\U be case sensitive when \x isn't seems confusing.
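
As an illustration of why I lean that way (again a sketch that assumes the RFC's \u{...} syntax and UTF-8 output, not something that works today), one variable-width escape covers both BMP and non-BMP codepoints, so no separate \U form is needed:

```php
<?php
// Sketch only: assumes the RFC's proposed \u{...} escape with UTF-8 output.
$ascii = "\u{41}";      // U+0041, short form, no zero padding required
$bmp   = "\u{1234}";    // U+1234, a BMP codepoint
$emoji = "\u{1F602}";   // U+1F602, outside the BMP

var_dump($ascii === "A");     // bool(true)
var_dump(bin2hex($bmp));      // string(6) "e188b4"
var_dump(bin2hex($emoji));    // string(8) "f09f9882"
```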

Adam

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
