On 17.09.2017 at 12:53, Rowan Collins wrote:

> I checked the PHP lang-spec repo expecting to find a set of Unicode classes,
> but it currently mentions "U+0080-U+00FF":
> https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md#names
> That seems wrong to me, unless I'm looking at the wrong definition - the
> first part of that range is control characters, and you can have variables
> called things like $๐ (with an emoji as the entire name).

The specification in the PHP manual[1] appears to be more appropriate for
our current implementation:

| As a regular expression, it would be expressed thus:
| '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'

With regard to control characters: whether those bytes are control
characters depends on the chosen character encoding; for instance, in
Windows-1252 the ¢ character is mapped to \xA2.

[1] <http://php.net/manual/en/language.variables.basics.php>

-- 
Christoph M. Becker
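
To illustrate the byte-level rule quoted above, here is a minimal Python sketch. It is not PHP's actual lexer; it only applies the manual's regular expression to raw bytes. The helper name `is_valid_name` and the sample emoji (U+1F600) are assumptions chosen for illustration.

```python
import re

# Byte-level form of the pattern quoted from the PHP manual.
# It matches individual bytes, not Unicode code points.
NAME_RE = re.compile(rb'[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')


def is_valid_name(name: bytes) -> bool:
    """Hypothetical helper: True if the whole byte string matches the pattern."""
    return NAME_RE.fullmatch(name) is not None


# The same character yields different bytes in different encodings.
cent_cp1252 = "¢".encode("cp1252")         # single byte 0xA2
cent_utf8 = "¢".encode("utf-8")            # two bytes 0xC2 0xA2
emoji_utf8 = "\U0001F600".encode("utf-8")  # four bytes, all >= 0x80

for label, raw in [
    ("cent (Windows-1252)", cent_cp1252),
    ("cent (UTF-8)", cent_utf8),
    ("emoji (UTF-8)", emoji_utf8),
    ("byte 0x85", b"\x85"),
    ("leading digit", b"1abc"),
]:
    print(f"{label}: {raw!r} -> {is_valid_name(raw)}")
```

The byte 0x85 is accepted by the pattern either way; it decodes to a C1 control character in ISO-8859-1 but to a printable ellipsis in Windows-1252, which is why the "control characters" question depends on the encoding.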