> On 04.10.2021 at 12:08, Nikita Popov <nikita....@gmail.com> wrote:
> 
> On Thu, Sep 23, 2021 at 8:32 AM Tim Starling <tstarl...@wikimedia.org>
> wrote:
> 
>> Please consider my RFC for locale-independent case conversion.
>> 
>> https://wiki.php.net/rfc/strtolower-ascii
>> https://github.com/php/php-src/pull/7506
>> 
>> The RFC and associated PR ended up going some way beyond the original
>> scope, because for consistency, it's best if everything has the same
>> concept of case folding. I saw this as an opportunity to clean up a
>> common kind of locale-dependence in PHP which was previously inconsistent.
>> 
>> So not only will strtolower() and strtoupper() become
>> locale-independent, converting only ASCII, but also stristr, stripos,
>> strripos, lcfirst, ucfirst, ucwords, str_ireplace, the array sorting
>> functions with SORT_FLAG_CASE, and array_change_key_case.
>> 
>> Also, I changed a number of internal functions to use ASCII case
>> folding, giving rise to a range of effects in callers throughout the
>> core tree. The effects are all documented in the RFC.
>> 
>> I am proposing that locale-sensitive case conversion be provided with
>> the new names ctype_tolower() and ctype_toupper(). Those names might
>> seem odd at first glance, but they are wrappers for functions in
>> ctype.h and work in a very similar way to the rest of the ctype extension.
>> 
> 
> Hi Tim,
> 
> Thanks for creating this proposal, it looks great!
> 
> I think this is a very beneficial change, and the amount of incorrect
> locale-dependent calls we had just in php-src further convinced me of this:
> We're generally aware of the problem, and we still made this mistake. Many
> times.

I definitely agree that it's good to make these functions locale-insensitive.
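
To make the distinction concrete, here is a minimal C sketch of what
ASCII-only case folding amounts to, as opposed to tolower() from ctype.h,
which consults the current locale. The helper names ascii_tolower and
ascii_strtolower are hypothetical and are not taken from the PR; this is
only an illustration of the idea, not the actual php-src implementation.

```c
#include <stddef.h>

/* Hypothetical helper: lowercase a single byte, touching only ASCII 'A'..'Z'.
 * Unlike tolower() from <ctype.h>, it never consults the current locale. */
unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: lowercase a buffer in place, byte by byte.
 * Bytes >= 0x80 (for example UTF-8 multibyte sequences) are left unchanged. */
void ascii_strtolower(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        s[i] = (char)ascii_tolower((unsigned char)s[i]);
    }
}
```

Because only the bytes 'A' through 'Z' are mapped, the result is the same
regardless of what setlocale() was called with, and multibyte UTF-8
sequences pass through untouched.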

> The only open question I have is regarding the ctype_* functions. One might
> argue that these functions should be locale-independent as well. Certainly,
> whenever I have used ctype_digit() I only intended it to match [0-9]. It
> seems like some people try to use ctype_alpha() in a locale-sensitive way
> (https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check)
> and then fail because it doesn't support UTF-8.

On that topic, do we also want to add mb_ucfirst, mb_lcfirst and mb_ucwords?
Then there would also be proper locale-independent functions handling these
use cases.

This is only tangentially related to the current RFC, so feel free to ignore 
this.

Bob
