On 17 September 2017 13:18:44 BST, "Christoph M. Becker" <cmbecke...@gmx.de> wrote:
>On 17.09.2017 at 12:53, Rowan Collins wrote:
>
>> I checked the PHP lang-spec repo expecting to find a set of Unicode
>> classes, but it currently mentions "U+0080-U+00FF":
>> https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md#names
>> That seems wrong to me, unless I'm looking at the wrong definition -
>> the first part of that range is control characters, and you can have
>> variables called things like $๐ (with an emoji as the entire name).
>
>The specification in the PHP manual[1] appears to be more appropriate
>for our current implementation:
>
>| As a regular expression, it would be expressed thus:
>| '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
>
>With regard to control characters: that depends on the chosen character
>encoding; for instance in Windows-1252 the ¢ character is mapped to
>\xA2.
>
>[1] <http://php.net/manual/en/language.variables.basics.php>

Ah, so the mistake in the spec is that these aren't actually Unicode code points at all, but allowed *bytes*, which happen to allow for the UTF-8 encoding of pretty much any Unicode code point.

That makes much more sense, but it doesn't answer the other question: whether there's a working definition of what we mean by "case insensitive".

Regards,

--
Rowan Collins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
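
To illustrate the bytes-versus-code-points distinction discussed above, here is a minimal sketch in Python rather than PHP. It applies the manual's regular expression to raw byte strings. The helper name `is_valid_name` and the sample inputs are hypothetical, chosen only for illustration, and the sketch mirrors the regex quoted from the manual, not the engine's actual lexer.

```python
import re

# Byte-oriented pattern from the PHP manual, as quoted in this thread.
# Using a bytes pattern means each class matches one byte, not one code point.
NAME = re.compile(rb'[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')


def is_valid_name(raw: bytes) -> bool:
    """Return True if the whole byte string matches the manual's pattern.

    Hypothetical helper for illustration only.
    """
    return NAME.fullmatch(raw) is not None


# An emoji encodes to four UTF-8 bytes, all of them >= 0x80.
emoji = "\U0001F600".encode("utf-8")
print(emoji, is_valid_name(emoji))

# The cent sign is the single byte 0xA2 in Windows-1252.
cent = "\u00a2".encode("cp1252")
print(cent, is_valid_name(cent))

# U+0080 is a control character, but its UTF-8 form is the bytes 0xC2 0x80,
# both of which fall inside the allowed byte range.
control = "\u0080".encode("utf-8")
print(control, is_valid_name(control))

# A leading ASCII digit is not in the first character class.
print(is_valid_name(b"1abc"))
```

Because the pattern only constrains individual bytes, any multi-byte UTF-8 sequence passes, since all of its bytes are 0x80 or above. That is consistent with the reading that the range describes bytes and not the code points U+0080-U+00FF.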