On Fri, 16 Dec 2022, Tim Starling wrote:

> On 16/12/22 02:34, Derick Rethans wrote:
> >
> > I have just published an initial draft of the "Unicode Text
> > Processing" RFC, a proposal to have performant unicode text
> > processing always available to PHP users, by introducing a new
> > "Text" class.
>
> Using "collator" and "locale" interchangeably seems imprecise. If the
> input is an ICU locale string, then I think you should just call it
> locale. Then the user will be armed with the correct terminology when
> they go looking for more information in the ICU manual. In ICU, case
> conversion and BreakIterator need a locale, not a collator.

Yeah, the terms are currently used interchangeably (sort of). I will
update that. Although I really would not suggest that users look at the
ICU manual, as it's really hard to find things in it :-)

> I'm concerned about the time order of using grapheme offsets. For
> example, is subString() O(N) in $offset?

Yes. It would have to scan the Text.

> I'm probably not the target audience for this class, since I'm
> generally looking for maximum flexibility, not minimum complexity. As
> such, I'd like intl to have better documentation and more features.
> The RFC has a family of locale-aware case conversion functions which
> do not exist in intl. This was raised as an issue during the
> discussion on my ASCII case conversion RFC. It would be great if intl
> could get those functions too.

AFAIK, Intl can do all of these things, but yes, its documentation is
"sparse". However, that is out of scope for this RFC.

> I think you should consider making this Text class a part of the intl
> extension. You're adding a class which is similar to the classes in
> that extension. In terms of data, it's like IntlChar, except it's for
> strings not characters. Its constructor takes an ICU locale string,
> just like IntlBreakIterator or MessageFormatter.

I did consider that, and rejected the idea. Intl, although powerful,
does not have an approachable API. It is also not installed or
available by default, and I am not suggesting we change that. That then
means it doesn't fit the design goals here (having it always
available).

cheers,
Derick

--
https://derickrethans.nl | https://xdebug.org | https://dram.io
Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support
Host of PHP Internals News: https://phpinternals.news
mastodon: @derickr@phpc.social @xdebug@phpc.social
twitter: @derickr and @xdebug