Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-27 Thread Stanislav Malyshev
Hi! > I'm not completely against it. It's just an incomplete solution. > > echo "\u{1F602}"; // won't output 😂 if the output encoding is not UTF-8 You can always use iconv/recode to bring it to every encoding you need (provided it supports full unicode range). I see this as a readability feature
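
As a rough illustration of the iconv route mentioned above, the sketch below re-encodes the UTF-8 bytes produced by the escape. It assumes PHP 7.0 or later (where the \u{...} syntax is available) and the iconv extension. UTF-16BE is picked only as an example target, and the variable names are made up for illustration.

```php
<?php
// The escape always yields UTF-8 bytes, whatever the output encoding is.
$utf8 = "\u{1F602}";

// Re-encode for a consumer that expects something other than UTF-8.
// The target encoding must be able to represent the code point,
// otherwise iconv() fails and returns false.
$utf16 = iconv('UTF-8', 'UTF-16BE', $utf8);

echo bin2hex($utf8), "\n";   // f09f9882
echo bin2hex($utf16), "\n";  // d83dde02
```

The escape itself stays encoding-agnostic in the source file; the conversion is an explicit, separate step.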

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Sara Golemon
On Tue, Nov 25, 2014 at 3:20 AM, Alain Williams wrote: > If we decide to support non-utf-8 encoding at compile time then we could > extend > the syntax a bit to allow the encoding to be specified, eg: > > \U{utf-8: arabic letter alef} > > \U{iso-8859-6: arabic letter alef} > God, that's s

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Dmitry Stogov
On Tue, Nov 25, 2014 at 2:18 PM, Andrea Faulds wrote: > > > On 25 Nov 2014, at 10:41, Dmitry Stogov wrote: > > > > u8"string" tells that the whole string is UTF-8 encoded. > > Your escape Unicode proposal assumes just UTF-8 codepoint, but the > whole string encoding is still undefined. > > True

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
> On 25 Nov 2014, at 11:48, Derick Rethans wrote: > > I think "incomplete" nails it on the head. Without "proper" Unicode > support in the parser, compiler and string function semantics, having > these escape codes doesn't really do a lot for us. How so? Why are they less useful because we do

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Yasuo Ohgaki
Hi all, On Tue, Nov 25, 2014 at 8:09 PM, Andrea Faulds wrote: > non-BMP code points are more important than ever. Yes, they are! We (Japanese) have a number of them already. \u{code point} has a huge advantage: we do not have to care whether the code point is in the BMP or not, i.e. we can do echo "\u{code point}
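
To make the BMP point concrete, here is a small sketch, assuming PHP 7.0 or later. The two code points are chosen only as examples (one hiragana letter inside the BMP, one CJK ideograph outside it), and the variable names are hypothetical. The same brace syntax covers both; only the number of UTF-8 bytes differs.

```php
<?php
// U+3042 (HIRAGANA LETTER A) is inside the BMP.
$bmp = "\u{3042}";
// U+20BB7 is a CJK ideograph outside the BMP.
$nonBmp = "\u{20BB7}";

// strlen() counts bytes, so it shows the UTF-8 length of each code point.
echo strlen($bmp), ' ', bin2hex($bmp), "\n";       // 3 e38182
echo strlen($nonBmp), ' ', bin2hex($nonBmp), "\n"; // 4 f0a0aeb7
```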

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Derick Rethans
On Tue, 25 Nov 2014, Dmitry Stogov wrote: > On Tue, Nov 25, 2014 at 1:00 PM, Andrea Faulds wrote: > > > > > > On 25 Nov 2014, at 08:33, Dmitry Stogov wrote: > > > > > > May be I misunderstood something, but why to introduce unicode escapes > > if PHP engine doesn't support Unicode. > > > > We d

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Alain Williams
On Tue, Nov 25, 2014 at 11:25:17AM +0000, Andrea Faulds wrote: > Well, we *do* already have a compile-time system for declaring encoding, the > declare() construct. I missed that. Reading the documentation, I confess that I do not really understand what declare(encoding=xxx) actually does.

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Christoph Becker
Ivan Enderlin @ Hoa wrote: > Le 24/11/2014 23:09, Andrea Faulds a écrit : >> Good evening, >> >> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape >> >> It has a rationale section explaining why certain decisions were made, >> that I’d recommend you read in full. > Excellent RFC, thank you

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
> On 25 Nov 2014, at 11:20, Alain Williams wrote: > > I think that we need to clarify what we are talking about. > > What Andrea has proposed is a way of writing string constants. These > characters > in these strings will still be 8 bits big, this means that there needs to be > some way of en

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Alain Williams
On Tue, Nov 25, 2014 at 02:41:48PM +0400, Dmitry Stogov wrote: > I'm not completely against it. It's just an incomplete solution. > > echo "\u{1F602}"; // won't output 😂 if the output encoding is not UTF-8 > > echo "Привет \u{1F602}"; // won't output anything useful if script > encoding is not U

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
> On 25 Nov 2014, at 10:41, Dmitry Stogov wrote: > > u8"string" tells that the whole string is UTF-8 encoded. > Your escape Unicode proposal assumes just UTF-8 codepoint, but the whole > string encoding is still undefined. True. There’s an assumption there that you’re using a UTF-8-compatible

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
> On 25 Nov 2014, at 10:32, Derick Rethans wrote: > > On Mon, 24 Nov 2014, Sara Golemon wrote: > >> On the BMP versus SMP issue of \u styles, we addressed this in >> PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six >> hexit codepoints. e.g."\u1234" === "\U001234"

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Dmitry Stogov
On Tue, Nov 25, 2014 at 1:00 PM, Andrea Faulds wrote: > > > On 25 Nov 2014, at 08:33, Dmitry Stogov wrote: > > > > May be I misunderstood something, but why to introduce unicode escapes > if PHP engine doesn't support Unicode. > > We don't have Unicode strings which are made of codepoints rather

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Derick Rethans
On Mon, 24 Nov 2014, Sara Golemon wrote: > On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote: > > Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape > > > I'm okay with producing UTF-8 even though our strings are technically > binary. As you state, UTF-8 is the de-facto encoding, and r

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
> On 25 Nov 2014, at 08:33, Markus Fischer wrote: > >> On 24.11.14 23:09, Andrea Faulds wrote: >> Good evening, >> >> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape > > I think the choice of \u{xx} is interesting, i.e. using '{' and '}'. > > Afaik, one of the current best practices

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
> On 25 Nov 2014, at 08:33, Dmitry Stogov wrote: > > May be I misunderstood something, but why to introduce unicode escapes if PHP > engine doesn't support Unicode. We don't have Unicode strings which are made of codepoints rather than bytes, sure. But we do usually treat these strings as UTF

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Markus Fischer
On 24.11.14 23:09, Andrea Faulds wrote: > Good evening, > > Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I think the choice of \u{xx} is interesting, i.e. using '{' and '}'. Afaik, one of the current best practices is to use json_decode(), like so: $ cat test.php http://www.php.net
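
The script in the message above is cut off in this archive, so the following is only a sketch of what a json_decode() workaround can look like, not a reconstruction of the original test.php. It assumes PHP 7.0 or later with the json extension, and the variable names are made up. Note that JSON has to spell a non-BMP code point as a UTF-16 surrogate pair, whereas the proposed syntax takes the code point directly.

```php
<?php
// Workaround: let the JSON parser decode the escapes. The single-quoted
// PHP string keeps the backslashes intact for json_decode().
$viaJson = json_decode('"\ud83d\ude02"');

// Proposed syntax: the code point is written directly.
$viaEscape = "\u{1F602}";

// Both produce the same UTF-8 byte sequence.
var_dump($viaJson === $viaEscape); // bool(true)
```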

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Dmitry Stogov
Maybe I misunderstood something, but why introduce Unicode escapes if the PHP engine doesn't support Unicode? Always converting such escapes into UTF-8 doesn't make any sense for people who use other encodings for output, databases, etc. Thanks. Dmitry. On Tue, Nov 25, 2014 at 1:09

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Ivan Enderlin @ Hoa
Le 24/11/2014 23:09, Andrea Faulds a écrit : Good evening, Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape It has a rationale section explaining why certain decisions were made, that I’d recommend you read in full. Excellent RFC, thank you for this proposal. I would suggest this tal

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote: > Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape > I've linked a provisional HHVM implementation from that page. Planning to match whatever PHP7 does, of course, but for the moment I've added named entity support since it's being dis

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Alain Williams
On Mon, Nov 24, 2014 at 11:36:28PM +0000, Andrea Faulds wrote: > > > On 24 Nov 2014, at 23:29, Alain Williams wrote: > > echo "\U{arabic letter alef}\n"; > > Ooh, that’s an interesting idea. I believe Perl actually has this already, > although it uses the \N syntax: > > http://perldoc.perl.or

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
> On 24 Nov 2014, at 23:29, Alain Williams wrote: > > There is a big difference with \u or \U and \x or \o and that is the number of > characters that follow the escape. \x has 2, \o has 3 - both are short and > easy > to count with the eye. \U012345 is quite long and it is not so visually > o

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Alain Williams
On Mon, Nov 24, 2014 at 02:21:37PM -0800, Sara Golemon wrote: > On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote: > > Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape > > > I'm okay with producing UTF-8 even though our strings are technically > binary. As you state, UTF-8 is the de-f

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
> On 24 Nov 2014, at 23:19, Sara Golemon wrote: > >> We would have to require ICU, but that might be worthwhile for PHP 7 >> anyway. Having at least one i18n API that's guaranteed to be available >> would be nice. >> > It's 2014. I think requiring ICU is reasonable at this point. I also think

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
> We would have to require ICU, but that might be worthwhile for PHP 7 > anyway. Having at least one i18n API that's guaranteed to be available > would be nice. > It's 2014. I think requiring ICU is reasonable at this point. Orthogonal to this RFC, but I'd be in favor of deprecating all the non-I

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Adam Harvey
On 24 November 2014 at 14:35, Andrea Faulds wrote: > >> On 24 Nov 2014, at 22:30, Adam Harvey wrote: >> I'm also OK with this, although I do wonder if we should be respecting >> the user's default_charset setting instead. (Since default_charset >> defaults to "UTF-8", in practice this isn't a sig

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
> On 24 Nov 2014, at 22:30, Adam Harvey wrote: > > On 24 November 2014 at 14:21, Sara Golemon wrote: >> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote: >>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape >>> >> I'm okay with producing UTF-8 even though our strings are technica

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Adam Harvey
On 24 November 2014 at 14:21, Sara Golemon wrote: > On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote: >> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape >> > I'm okay with producing UTF-8 even though our strings are technically > binary. As you state, UTF-8 is the de-facto encoding

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
> On 24 Nov 2014, at 22:21, Sara Golemon wrote: > > On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote: >> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape >> > I'm okay with producing UTF-8 even though our strings are technically > binary. As you state, UTF-8 is the de-facto encod

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote: > Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape > I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de-facto encoding, and recognizing this is pretty reasonable. You may want

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
> On 24 Nov 2014, at 22:09, Andrea Faulds wrote: > > Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape My apologies to you all, a small correction: The title of that email should’ve been “[RFC] Unicode Codepoint Escape Syntax” to match the title of the RFC, I missed out the “Codepoint