Hi everyone,

Identifiers in PHP source code (including variable names with $)
conform to the regex /[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*/. Most of
this regex is pretty standard: it allows alphanumeric ASCII characters
and underscores, plus any character with the 8th bit set (presumably to
allow any extension of ASCII, such as Latin-1 or UTF-8, to be used).

But there's one part of this I find rather curious: why is \x7F
included? It's not a high-byte/8-bit character; it's a 7-bit ASCII
character (DEL), and a control character at that. Unless there's some ASCII
extension which reuses that value as a printing character, I assume it
must have been a mistake to include this character. As a control
character, it is invisible and difficult to type, and it might do weird
things in some terminal emulators. I can't see the value in permitting
it within an identifier.
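
To make the difference concrete, here is a rough sketch in Python (not PHP, and not the actual patch) that applies the quoted regex to raw bytes. It sits next to a hypothetical variant whose high range starts at \x80, which is my assumption of what a fix would look like. The names CURRENT, PROPOSED and is_identifier are made up for illustration.

```python
import re

# The identifier pattern quoted above, applied to raw bytes.
CURRENT = re.compile(rb"[_a-zA-Z\x7F-\xFF][_0-9a-zA-Z\x7F-\xFF]*")

# Hypothetical corrected pattern (an assumption, not taken from the patch):
# the high range starts at 0x80, so DEL (0x7F) is excluded.
PROPOSED = re.compile(rb"[_a-zA-Z\x80-\xFF][_0-9a-zA-Z\x80-\xFF]*")


def is_identifier(pattern: re.Pattern, name: bytes) -> bool:
    """Return True if the whole byte string matches the identifier pattern."""
    return pattern.fullmatch(name) is not None


samples = [
    b"foo",                        # plain ASCII
    b"foo\x7fbar",                 # contains DEL
    "h\u00e9llo".encode("utf-8"),  # UTF-8 bytes, all high bytes >= 0x80
]

for name in samples:
    print(name, is_identifier(CURRENT, name), is_identifier(PROPOSED, name))
```

Under these assumptions, only the name containing \x7F is treated differently by the two patterns. Plain ASCII and UTF-8 names are accepted by both.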

I've looked around a little, and I can't find any widely used ASCII
extension that assigns a printing character to 0x7F. Given that, I
assume its inclusion was simply a mistake. But one of you might be able
to tell me otherwise.

I've filed a bug report and written a patch to fix this in php-src and
php-langspec master:
https://bugs.php.net/bug.php?id=71897
Thanks!
--
Andrea Faulds
https://ajf.me/
--
PHP Internals - PHP Runtime Development Mailing List