On Fri, Mar 25, 2016 at 1:25 PM, Scott Arciszewski <sc...@paragonie.com> wrote:
> On Fri, Mar 25, 2016 at 10:20 AM, Andrea Faulds <a...@ajf.me> wrote:
>
> > Hi everyone,
> >
> > Identifiers in PHP source code (including variable names with $) conform
> > to the regex /[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*/. Most of this
> > regex is pretty standard: it allows alphanumeric ASCII characters and
> > underscores, plus any character with the 8th bit set (presumably to allow
> > any extension of ASCII, such as Latin-1 or UTF-8, to be used).
> >
> > But there's one part of this I find rather curious: why is \x7F included?
> > It's not a high-byte/8-bit character, it's a 7-bit ASCII character, and a
> > control character at that. Unless there's some ASCII extension which
> > reuses that value as a printing character, I assume it must have been a
> > mistake to include this character. As a control character, it is
> > invisible and difficult to type, and it might do weird things in some
> > terminal emulators. I can't see the value in permitting it within an
> > identifier.
> >
> > I've done a little bit of looking around, and I can't find an important
> > ASCII extension which changes what 0x7F does. Given that, I assume it was
> > simply a mistake. But one of you might be able to enlighten me otherwise.
> >
> > I've filed a bug report, and made a patch to fix this in php-src and
> > php-langspec master:
> >
> > https://bugs.php.net/bug.php?id=71897
> >
> > Thanks!
> > --
> > Andrea Faulds
> > https://ajf.me/
> >
> > --
> > PHP Internals - PHP Runtime Development Mailing List
> > To unsubscribe, visit: http://www.php.net/unsub.php
> >
>
> Interestingly, extract() skips keys with \x7F: https://3v4l.org/ZC9ZA

Also, the keys after the \x7F were not present in HHVM and PHP 7; in
5.5-5.6, however, you get [9]=>string(1) "" along with the key that came
after it. That's very strange indeed!

> Scott Arciszewski
> Chief Development Officer
> Paragon Initiative Enterprises <https://paragonie.com/>
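
For anyone who wants to poke at the character class outside of PHP, here is a minimal sketch in Python that applies the regex quoted above to raw bytes. The second pattern is only my assumption of what the fix looks like (starting the high range at \x80 instead of \x7F); I have not checked it against the actual patch. The names QUOTED, PROPOSED and is_identifier are made up for illustration.

```python
import re

# Identifier pattern as quoted in Andrea's message, applied to raw bytes.
QUOTED = re.compile(rb"[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*")

# Assumed shape of the fix (not taken from the patch): start the high
# range at 0x80 so that DEL (0x7F) is no longer accepted.
PROPOSED = re.compile(rb"[_a-zA-Z\x80-\xFF][_0-9a-zA-Z\x80-\xFF]*")


def is_identifier(pattern, name: bytes) -> bool:
    """Return True if the whole byte string matches the pattern."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    samples = [
        b"foo",
        b"_bar9",
        "caf\u00e9".encode("utf-8"),  # high bytes, allowed by both patterns
        b"\x7f",                      # a lone DEL
        b"a\x7fb",                    # DEL in the middle of a name
        b"9lives",                    # leading digit, rejected by both
    ]
    for name in samples:
        print(name, is_identifier(QUOTED, name), is_identifier(PROPOSED, name))
```

The two patterns should only disagree on the names containing \x7F, which is the whole point of the bug report. This checks the regex alone; it says nothing about how extract() or the PHP lexer treat such names.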