On Fri, Mar 25, 2016 at 10:20 AM, Andrea Faulds <a...@ajf.me> wrote:

> Hi everyone,
>
> Identifiers in PHP source code (including variable names with $) conform
> to the regex /[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*/. Most of this regex
> is pretty standard: it allows alphanumeric ASCII characters and
> underscores, plus any character with the 8th bit set (presumably to allow
> any extension of ASCII, such as Latin-1 or UTF-8, to be used).
>
> But there's one part of this I find rather curious: why is \x7F included?
> It's not a high-byte/8-bit character, it's a 7-bit ASCII character, and a
> control character at that. Unless there's some ASCII extension which reuses
> that value as a printing character, I assume it must have been a mistake to
> include this character. As a control character, it is invisible and
> difficult to type, and it might do weird things in some terminal emulators.
> I can't see the value in permitting it within an identifier.
>
> I've done a little bit of looking around, and I can't find an important
> ASCII extension which changes what 0x7F does. Given that, I assume it was
> simply a mistake. But one of you might be able to enlighten me otherwise.
>
> I've filed a bug report, and made a patch to fix this in php-src and
> php-langspec master:
>
> https://bugs.php.net/bug.php?id=71897
>
> Thanks!
> --
> Andrea Faulds
> https://ajf.me/
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
Interestingly, extract() skips keys with \x7F: https://3v4l.org/ZC9ZA
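
For anyone who wants to poke at the byte ranges without a PHP build handy, here is a rough sketch in Python (not PHP's actual lexer) that applies the quoted regex to raw bytes. The PROPOSED pattern is only my assumption of what a fix might look like, with the high range starting at \x80 instead of \x7F; the names CURRENT, PROPOSED, and is_identifier are made up for illustration.

```python
import re

# The identifier pattern quoted in the thread, applied to raw bytes.
CURRENT = re.compile(rb"[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*")

# Hypothetical tightened variant (an assumption, not taken from the patch):
# the high range starts at 0x80, so the DEL control character is excluded.
PROPOSED = re.compile(rb"[_a-zA-Z\x80-\xFF][_0-9a-zA-Z\x80-\xFF]*")


def is_identifier(pattern, name: bytes) -> bool:
    """Return True if the whole byte string matches the identifier pattern."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    samples = [
        b"foo",
        b"_bar9",
        b"9lives",                    # starts with a digit
        "caf\u00e9".encode("utf-8"),  # UTF-8 bytes all have the 8th bit set
        b"foo\x7fbar",                # contains DEL (0x7F)
    ]
    for name in samples:
        print(name, is_identifier(CURRENT, name), is_identifier(PROPOSED, name))
```

The two patterns should disagree only on names containing the 0x7F byte; multi-byte UTF-8 names are unaffected because every byte of a non-ASCII UTF-8 sequence is 0x80 or higher.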

Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises <https://paragonie.com/>
