Tue, 9 Dec 2014 02:44:33 +0000 Andrea Faulds <a...@ajf.me>:

> Well, PCRE does what it does probably because of its name:
> *Perl-Compatible* Regular Expressions. Perl has the \x syntax. But
> PCRE’s syntax comes from what suits Perl, not PHP, so I don’t see why
> we should necessarily match its behaviour. If we add \x{xxxxx} syntax
> to PHP’s string literals, then we’ll break existing code which uses
> double quoted strings for regular expressions.

Actually, the opposite seems more alarming. For double-quoted strings it
would be irrelevant whether \u{} or \x{} is handled by PHP first or left
to PCRE's interpretation. (Having an alternative there is even
beneficial.)

However, single-quoted strings are more commonly and habitually used for
regexps. With \u{} going to be used regularly, it will sooner or later
end up in a regex context, unknowingly or accidentally, and that is
where it would trigger PCRE failures:

    preg_match('~\u{bad}~umixUs')

Both \u{} and \x{} are used in fringe cases only, of course. Consistently
settling on one would still benefit forward compatibility here.

> I think \x{xxxx} is misleading anyway - \xXX is always
> single-byte/character, yet Unicode code points can’t be represented
> in PHP strings as single bytes when encoded in UTF-8 (unless they’re
> below U+0100, of course). If I saw "\x{abcd}” I'd expect it to be the
> same as "\xab\xbc”. Plus, while Perl has \x{xxxx} syntax, Ruby and
> ECMAScript 6 have the \u{xxxx} syntax, so \u{xxxx} is already more
> popular. The ‘u’ in \u{xxxx} also makes it more obviously “Unicode”.

There's no question really about \u being more common and therefore more
recognizable and preferable. Taking the cue from Ruby is appreciated!

Since the RFC rightly discounts the standard \uFFFF for compatibility
reasons, there is however little visual and semantic distinction between
the {}-embellished variant \u{hhhhh} and a hypothetical \x{hhhhh}.

Not sure why anyone would misinterpret \x{abc} as multiple bytes,
really. It's well understood and works for PCRE.

The advantage of overloading \x is a much lower likelihood of ever
encountering a residual "\x{" in PHP strings. Whereas "\u" is new and
never had an implicit payload constraint, so it could run into a
preexisting "\u{xxxx}" that was meant for a later, distinct context.

Going with the Ruby theme: when piping a string there or receiving one,
it's irrelevant which side uses which syntax to preinterpret it.
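
To make the single-quoted regex concern concrete, here is a minimal
sketch. It assumes PHP 7.0 or later (where double-quoted strings expand
\u{...}) and a stock PCRE build; the U+263A test character and the
variable names are just picked for illustration.

```php
<?php
// Assumption: PHP 7.0+, where double-quoted strings understand \u{...}.
$smiley = "\xE2\x98\xBA"; // U+263A encoded as UTF-8 bytes

// Double quotes: PHP expands \u{263A} itself, so PCRE only sees raw UTF-8.
$viaPhp = preg_match("~\u{263A}~u", $smiley);

// Single quotes: PHP passes \x{263A} through and PCRE interprets it.
$viaPcre = preg_match('~\x{263A}~u', $smiley);

// Single quotes: \u{263A} reaches PCRE untouched, and PCRE rejects \u.
// The @ only hides the compile warning; the call still fails.
$unsupported = @preg_match('~\u{263A}~u', $smiley);

var_dump($viaPhp, $viaPcre, $unsupported);
```

The first two calls should report a match, while the third returns false
because the pattern does not compile. That third case is the habitual
single-quoted regex picking up the new escape by accident.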

The choice of syntax only really matters when exchanging string
literals. But the RFC and the patch don't cover stripcslashes() or
addcslashes(), for instance, so no direct string syntax interoperability
is planned. Which is why I brought forward \x{hhhhh} as an alternative,
for within-PHP consistency at least.

(Not bent on lobbying for x, as \u{…} is visually more pleasing; just
unsure about its scope.)

\u{1F44B}

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php