Re: [PHP-DEV] [RFC] Unicode Text Processing

Tim Starling Thu, 15 Dec 2022 19:21:24 -0800

On 16/12/22 02:34, Derick Rethans wrote:

> Hi,
>
> I have just published an initial draft of the "Unicode Text Processing"
> RFC, a proposal to have performant unicode text processing always
> available to PHP users, by introducing a new "Text" class.

Using "collator" and "locale" interchangeably seems imprecise. If the input is an ICU locale string, then I think you should just call it locale. Then the user will be armed with the correct terminology when they go looking for more information in the ICU manual. In ICU, case conversion and BreakIterator need a locale, not a collator.

I'm concerned about the time order of using grapheme offsets. For example, is subString() O(N) in $offset? If the idea is to be easy to use and performant, you don't want to have subtle algorithmic complexity traps.
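Here is a concrete version of the complexity question. Grapheme clusters have variable width in any Unicode encoding, so no arithmetic turns a grapheme offset into a storage offset. A naive subString() therefore has to walk the string from the start on every call. What follows is a minimal sketch in plain PHP. It assumes UTF-8 input and uses PCRE's `\X` (extended grapheme cluster) instead of ICU. `graphemeByteOffset()`, `subStringNaive()`, `graphemeBoundaries()` and `subStringIndexed()` are hypothetical names. They are not the RFC's API or its implementation.

```php
<?php
// Hypothetical helpers (not from the RFC): they illustrate why a grapheme
// offset API tends to be O(N) over a UTF-8 string. Segmentation uses PCRE's
// \X (extended grapheme cluster) from core PHP, not ICU.

/** Byte offset where grapheme number $n starts, found by scanning from byte 0: O(n). */
function graphemeByteOffset(string $s, int $n): int
{
    if ($n < 0) {
        throw new OutOfRangeException("negative grapheme offset $n");
    }
    $pos = 0;
    for ($i = 0; $i < $n; $i++) {
        // \G anchors the match at $pos, so each step consumes exactly one cluster.
        if (preg_match('/\G\X/u', $s, $m, 0, $pos) !== 1) {
            throw new OutOfRangeException("grapheme offset $n is past the end");
        }
        $pos += strlen($m[0]);
    }
    return $pos;
}

/** Naive subString(): every call rescans from the start of the string. */
function subStringNaive(string $s, int $offset, int $length): string
{
    $start = graphemeByteOffset($s, $offset);
    $end = graphemeByteOffset($s, $offset + $length);
    return substr($s, $start, $end - $start);
}

/** Alternative: build a boundary table once (O(N)), then each lookup is O(1). */
function graphemeBoundaries(string $s): array
{
    preg_match_all('/\X/u', $s, $m, PREG_OFFSET_CAPTURE);
    $bounds = array_column($m[0], 1); // byte offset where each cluster starts
    $bounds[] = strlen($s);           // sentinel: end of string
    return $bounds;
}

function subStringIndexed(string $s, array $bounds, int $offset, int $length): string
{
    if ($offset < 0 || $length < 0 || $offset + $length > count($bounds) - 1) {
        throw new OutOfRangeException("grapheme range out of bounds");
    }
    $start = $bounds[$offset];
    $end = $bounds[$offset + $length];
    return substr($s, $start, $end - $start);
}
```

Building the boundary table costs O(N) time and memory once. After that, each lookup is O(1). The RFC would need to document whether the proposed class does anything like this, or whether every offset-based call pays the linear scan.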

I'm probably not the target audience for this class, since I'm generally looking for maximum flexibility, not minimum complexity. As such, I'd like intl to have better documentation and more features. The RFC has a family of locale-aware case conversion functions which do not exist in intl. This was raised as an issue during the discussion on my ASCII case conversion RFC. It would be great if intl could get those functions too.
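Turkish and Azerbaijani show what "locale-aware case conversion" adds. In those languages, "i" uppercases to "İ" (U+0130) and "I" lowercases to dotless "ı" (U+0131). Unicode's default, locale-independent mapping does neither of these. The following is a minimal sketch built on mbstring's `mb_strtoupper()`/`mb_strtolower()`, which apply the default mappings. It layers only the tr/az dotted/dotless i rules from Unicode's SpecialCasing.txt on top. `isTurkicLocale()`, `upperForLocale()` and `lowerForLocale()` are hypothetical helpers. They are not the RFC's functions and not part of intl.

```php
<?php
// Hypothetical helpers illustrating locale-aware case conversion for the
// Turkic dotted/dotless i. Simplified: other SpecialCasing.txt context rules
// (e.g. tr/az "I" followed by U+0307, or Lithuanian rules) are ignored.
// Requires the mbstring extension.

function isTurkicLocale(string $locale): bool
{
    // Take the language subtag from locale strings such as "tr_TR" or "az-Latn".
    $lang = strtolower(preg_split('/[-_]/', $locale)[0]);
    return $lang === 'tr' || $lang === 'az';
}

function upperForLocale(string $s, string $locale): string
{
    if (isTurkicLocale($locale)) {
        // In Turkish and Azerbaijani, "i" uppercases to U+0130, not "I".
        $s = str_replace('i', "\u{0130}", $s);
    }
    return mb_strtoupper($s, 'UTF-8');
}

function lowerForLocale(string $s, string $locale): string
{
    if (isTurkicLocale($locale)) {
        // U+0130 lowercases to plain "i", and "I" to dotless U+0131.
        $s = str_replace(["\u{0130}", 'I'], ['i', "\u{0131}"], $s);
    }
    return mb_strtolower($s, 'UTF-8');
}
```

The same input gives different results depending on the locale. For example, `upperForLocale('istanbul', 'tr_TR')` yields "İSTANBUL", while `'en_US'` yields "ISTANBUL". That dependency is why such functions need a locale rather than a collator.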

I think you should consider making this Text class a part of the intl extension. You're adding a class which is similar to the classes in that extension. In terms of data, it's like IntlChar, except it's for strings, not characters. Its constructor takes an ICU locale string, just like IntlBreakIterator or MessageFormatter.

I can understand if you don't want to follow all the existing conventions of the intl extension. But if that is the rationale for the RFC, I'd like to see a discussion of the specific usability problems with the intl extension.


-- Tim Starling

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
