On 16/12/22 02:34, Derick Rethans wrote:
> Hi,
>
> I have just published an initial draft of the "Unicode Text Processing"
> RFC, a proposal to have performant unicode text processing always
> available to PHP users, by introducing a new "Text" class.

Using "collator" and "locale" interchangeably seems imprecise. If the
input is an ICU locale string, then I think you should just call it
locale. Then the user will be armed with the correct terminology when
they go looking for more information in the ICU manual. In ICU, case
conversion and BreakIterator need a locale, not a collator.
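
To illustrate the distinction, here is a minimal sketch, assuming the
intl extension is loaded. The same locale string is passed to a Collator
and to a word break iterator, and neither depends on the other. The
locale 'tr_TR' and the sample strings are only illustrative.

```php
<?php
// Illustrative only: 'tr_TR' and the sample data are assumptions.
$locale = 'tr_TR';

// A collator is one service built from a locale; it compares and sorts.
$collator = new Collator($locale);
$words = ['zebra', 'çilek', 'cam'];
$collator->sort($words);

// A break iterator is a separate service built from the same locale
// string. No collator is involved.
$breaker = IntlBreakIterator::createWordInstance($locale);
$breaker->setText('Merhaba dünya');

// Collect the text segments between consecutive word boundaries.
$parts = iterator_to_array($breaker->getPartsIterator(), false);
```

In both cases the input is a locale identifier, which is the term the
ICU manual uses.
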
I'm concerned about the time complexity of addressing text by grapheme
offsets. For example, is subString() O(N) in $offset? If the idea is to
be easy to use and performant, you don't want to have subtle
algorithmic complexity traps.
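
The concern can be sketched as follows. This is a minimal illustration,
assuming the intl extension and a UTF-8 string;
graphemeOffsetToByteOffset() is a hypothetical helper, not something from
the RFC or from intl. Without a precomputed index, finding the Nth
grapheme boundary means walking from the start of the string, so the
cost grows with the offset.

```php
<?php
// Hypothetical helper, not part of the RFC or of the intl extension.
// Maps a grapheme offset to a byte offset by walking cluster boundaries
// from the start of the string: O(N) in $graphemeOffset.
function graphemeOffsetToByteOffset(string $utf8, int $graphemeOffset): int
{
    $it = IntlBreakIterator::createCharacterInstance('root');
    $it->setText($utf8);

    $pos = $it->first(); // always 0
    for ($i = 0; $i < $graphemeOffset; $i++) {
        $pos = $it->next(); // advance one grapheme cluster
        if ($pos === IntlBreakIterator::DONE) {
            throw new OutOfRangeException('Offset is past the end of the text');
        }
    }
    return $pos;
}

// "e" followed by U+0301 (combining acute) is one grapheme but three bytes.
$byteOffset = graphemeOffsetToByteOffset("e\u{0301}x", 1);
```

If subString() had to do this on every call, a loop that calls it with
increasing offsets would be quadratic in the length of the text.
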
I'm probably not the target audience for this class, since I'm
generally looking for maximum flexibility, not minimum complexity. As
such, I'd like intl to have better documentation and more features.
The RFC has a family of locale-aware case conversion functions which
do not exist in intl. This was raised as an issue during the
discussion on my ASCII case conversion RFC. It would be great if intl
could get those functions too.
I think you should consider making this Text class a part of the intl
extension. You're adding a class which is similar to the classes in
that extension. In terms of data, it's like IntlChar, except it's for
strings not characters. Its constructor takes an ICU locale string,
just like IntlBreakIterator or MessageFormatter.
I can understand if you don't want to follow all the existing
conventions of the intl extension. But if that is the rationale for
the RFC, I'd like to see a discussion of the specific usability
problems with the intl extension.
-- Tim Starling