> On 04.10.2021 at 12:08, Nikita Popov <nikita....@gmail.com> wrote:
>
> On Thu, Sep 23, 2021 at 8:32 AM Tim Starling <tstarl...@wikimedia.org>
> wrote:
>
>> Please consider my RFC for locale-independent case conversion.
>>
>> https://wiki.php.net/rfc/strtolower-ascii
>> https://github.com/php/php-src/pull/7506
>>
>> The RFC and associated PR ended up going some way beyond the original
>> scope, because for consistency, it's best if everything has the same
>> concept of case folding. I saw this as an opportunity to clean up a
>> common kind of locale-dependence in PHP which was previously inconsistent.
>>
>> So not only will strtolower() and strtoupper() become
>> locale-independent, converting only ASCII, but also stristr, stripos,
>> strripos, lcfirst, ucfirst, ucwords, str_ireplace, the array sorting
>> functions with SORT_FLAG_CASE, and array_change_key_case.
>>
>> Also, I changed a number of internal functions to use ASCII case
>> folding, giving rise to a range of effects in callers throughout the
>> core tree. The effects are all documented in the RFC.
>>
>> I am proposing that locale-sensitive case conversion be provided with
>> the new names ctype_tolower() and ctype_toupper(). Those names might
>> seem odd at first glance, but they are wrappers for functions in
>> ctype.h and work in a very similar way to the rest of the ctype extension.
>>
>
> Hi Tim,
>
> Thanks for creating this proposal, it looks great!
>
> I think this is a very beneficial change, and the amount of incorrect
> locale-dependent calls we had just in php-src further convinced me of this:
> We're generally aware of the problem, and we still made this mistake. Many
> times.

I definitely agree that it's good to make these functions locale-insensitive.

> The only open question I have is regarding the ctype_* functions. One might
> argue that these functions should be locale-independent as well. Certainly,
> whenever I have used ctype_digit() I only intended it to match [0-9]. It
> seems like some people try to use ctype_alpha() in a locale-sensitive way (
> https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check)
> and then fail because it doesn't support UTF-8.

On that topic, do we also want to add mb_ucfirst, mb_lcfirst and mb_ucwords?
That would give us proper locale-independent functions for these use cases.

This is only tangentially related to the current RFC, so feel free to ignore
this.

Bob
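
For illustration, here is a rough userland sketch of what such helpers could look like, assuming the mbstring extension is loaded. The `sketch_` names are hypothetical and chosen so they cannot clash with any built-in function; this is not the RFC's or mbstring's implementation, just one possible shape built on the existing mb_substr(), mb_strtoupper() and mb_strtolower().

```php
<?php
// Hypothetical helpers for illustration only; not part of the RFC or of mbstring.

// Upper-case the first character (not the first byte) of $s.
function sketch_mb_ucfirst(string $s, string $encoding = 'UTF-8'): string
{
    if ($s === '') {
        return '';
    }
    $first = mb_substr($s, 0, 1, $encoding);
    $rest  = mb_substr($s, 1, null, $encoding);
    return mb_strtoupper($first, $encoding) . $rest;
}

// Lower-case the first character (not the first byte) of $s.
function sketch_mb_lcfirst(string $s, string $encoding = 'UTF-8'): string
{
    if ($s === '') {
        return '';
    }
    $first = mb_substr($s, 0, 1, $encoding);
    $rest  = mb_substr($s, 1, null, $encoding);
    return mb_strtolower($first, $encoding) . $rest;
}

echo sketch_mb_ucfirst('élan'), "\n";  // Élan
echo sketch_mb_lcfirst('Ärger'), "\n"; // ärger
```

For ucwords, the closest existing call is mb_convert_case() with MB_CASE_TITLE, but that also lower-cases the rest of each word, so it is not a drop-in equivalent.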