Hi!

> On 9 Dec 2014, at 02:14, ma...@include-once.org wrote:
>
> 2014-12-09 0:51 GMT+01:00 Andrea Faulds <a...@ajf.me>:
>>
>> https://wiki.php.net/rfc/unicode_escape
>
> Still leaves unmentioned that there was already an established Unicode
> escape syntax. PCRE provides \x{1F520} for codepoints in conjunction to
> plain \xFF for byte escapes.

Interesting, I was unaware of that until now, thanks for pointing this out.

> Maybe there should be more elaboration on why PHP itself should go with
> the \u{xxxx} ECMAScript representation, thus introducing a syntax disparity
> with our most major string handling extension.

Well, PCRE does what it does probably because of its name: *Perl-Compatible* Regular Expressions. Perl has the \x{xxxx} syntax. But PCRE's syntax comes from what suits Perl, not PHP, so I don't see why we should necessarily match its behaviour.

If we add \x{xxxx} syntax to PHP's string literals, then we'll break existing code which uses double-quoted strings for regular expressions. Today such a string passes \x{xxxx} through to PCRE unchanged; if PHP expanded it first, PCRE would receive the raw character instead of the escape.

I think \x{xxxx} is misleading anyway - \xXX is always a single byte, yet Unicode code points can't be represented in PHP strings as single bytes when encoded in UTF-8 (unless they're below U+0080, of course). If I saw "\x{abcd}" I'd expect it to be the same as "\xab\xcd".

Plus, while Perl has \x{xxxx} syntax, Ruby and ECMAScript 6 have the \u{xxxx} syntax, so \u{xxxx} is already the more widely used form. The 'u' in \u{xxxx} also makes it more obviously "Unicode".

Thanks!
--
Andrea Faulds
http://ajf.me/
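
P.S. A minimal sketch of the regex breakage I mean, assuming today's behaviour where "\x" in a double-quoted string is only an escape when hex digits follow it directly. The pattern and subject strings are made up for illustration, and $expanded is a hypothetical stand-in for what PCRE would receive if string literals expanded \x{2E} themselves; it is not taken from any patch.

```php
<?php
// "{" is not a hex digit, so PHP leaves "\x{2E}" alone and PCRE
// receives the six characters \x{2E}, which it reads as a literal dot.
$pattern = "/^a\x{2E}b$/";

var_dump(strlen($pattern));             // int(12): the backslash is still there
var_dump(preg_match($pattern, "a.b"));  // int(1): literal dot matches
var_dump(preg_match($pattern, "aXb"));  // int(0): "X" is not a dot

// Hypothetical: what PCRE would see if the string literal had already
// turned \x{2E} into "." before the pattern was compiled.
$expanded = "/^a.b$/";

var_dump(preg_match($expanded, "aXb")); // int(1): "." now means any character
```

The same pattern source silently changes meaning, which is why I'd rather not reuse \x{xxxx} for string literals.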