On 17 September 2017 13:18:44 BST, "Christoph M. Becker" <cmbecke...@gmx.de> 
wrote:
>On 17.09.2017 at 12:53, Rowan Collins wrote:
>
>> I checked the PHP lang-spec repo expecting to find a set of Unicode
>> classes, but it currently mentions "U+0080-U+00FF":
>> https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md#names
>> That seems wrong to me, unless I'm looking at the wrong definition -
>> the first part of that range is control characters, and you can have
>> variables called things like $😁 (with an emoji as the entire name).
>
>The specification in the PHP manual[1] appears to be more appropriate
>for our current implementation:
>
>| As a regular expression, it would be expressed thus:
>| '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
>
>With regard to control characters: that depends on the chosen character
>encoding; for instance, in Windows-1252 the ¢ character is mapped to
>\xA2.
>
>[1] <http://php.net/manual/en/language.variables.basics.php>

Ah, so the mistake in the spec is that these aren't actually Unicode code 
points at all, but allowed *bytes*, which happen to allow for the UTF-8 
encoding of pretty much any Unicode code point.
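
To illustrate the byte-versus-code-point distinction, here is a rough sketch 
that applies the manual's regex to raw bytes. It is written in Python purely 
for demonstration; the helper name is_valid_name is made up for this example, 
and none of this is taken from the engine's actual lexer.

```python
import re

# The pattern quoted from the PHP manual, compiled as a *bytes* pattern so
# that \x7f-\xff matches individual bytes rather than Unicode code points.
NAME = re.compile(rb'[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')


def is_valid_name(name: str, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: encode the name, then match it byte by byte."""
    return NAME.fullmatch(name.encode(encoding)) is not None


if __name__ == "__main__":
    samples = [
        ("foo", "utf-8"),
        ("1abc", "utf-8"),
        ("\U0001F601", "utf-8"),  # an emoji: four bytes, all >= 0x80
        ("\u00a2", "cp1252"),     # the cent sign: the single byte 0xA2
        ("\u0080", "utf-8"),      # a C1 control character: bytes 0xC2 0x80
    ]
    for name, enc in samples:
        print(repr(name), enc, is_valid_name(name, enc))
```

Under this reading, every byte of a multi-byte UTF-8 sequence falls within 
0x80-0xFF, which is why an emoji passes, and so does a C1 control character 
such as U+0080.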

That makes much more sense, but doesn't answer the other question: whether 
there's a working definition of what we mean by "case insensitive".

Regards,

-- 
Rowan Collins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
