On Fri, 16 Dec 2022, Tim Starling wrote:

> On 16/12/22 02:34, Derick Rethans wrote:
> > 
> > I have just published an initial draft of the "Unicode Text 
> > Processing" RFC, a proposal to have performant unicode text 
> > processing always available to PHP users, by introducing a new 
> > "Text" class.
> 
> Using "collator" and "locale" interchangeably seems imprecise. If the 
> input is an ICU locale string, then I think you should just call it 
> locale. Then the user will be armed with the correct terminology when 
> they go looking for more information in the ICU manual. In ICU, case 
> conversion and BreakIterator need a locale, not a collator.

Yeah, the terms are currently used interchangeably (sort of). I will 
update that. Although I really would not suggest that users look at the 
ICU manual, as it's really hard to find things in it :-)

> I'm concerned about the time order of using grapheme offsets. For 
> example, is subString() O(N) in $offset?

Yes. It would have to scan the Text from the start to find the grapheme 
boundary at $offset.
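
To illustrate why a grapheme offset implies a scan, here is a rough 
sketch in Python (standard library only). It is not the RFC's 
implementation: grapheme_substring is a hypothetical helper, and 
treating every non-combining code point as the start of a new cluster 
is a deliberate simplification of the Unicode segmentation rules that 
ICU implements. The point is only that boundaries have to be counted 
one by one up to the requested offset.

```python
import unicodedata


def grapheme_substring(text, offset, length):
    """Return `length` approximate grapheme clusters starting at cluster `offset`.

    Assumption: a cluster is a base code point followed by any combining
    marks. Real grapheme segmentation (as done by ICU) has more rules.
    """
    if length <= 0:
        return ""
    cluster_index = -1  # index of the cluster currently being read
    start = None        # code point index where the result begins
    end = len(text)     # code point index where the result ends
    for i, ch in enumerate(text):
        # A non-combining code point (or the very first one) opens a new cluster.
        if i == 0 or unicodedata.combining(ch) == 0:
            cluster_index += 1
            if cluster_index == offset:
                start = i
            elif cluster_index == offset + length:
                end = i
                break
    if start is None:
        return ""  # offset is past the last cluster
    return text[start:end]
```

The loop visits every code point before the requested cluster, so the 
cost grows linearly with $offset, whatever the encoding of the 
underlying buffer.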

> I'm probably not the target audience for this class, since I'm 
> generally looking for maximum flexibility, not minimum complexity. As 
> such, I'd like intl to have better documentation and more features. 
> The RFC has a family of locale-aware case conversion functions which 
> do not exist in intl. This was raised as an issue during the 
> discussion on my ASCII case conversion RFC. It would be great if intl 
> could get those functions too.

AFAIK Intl can do all of these things, but yes, its documentation is 
"sparse". However, that's not in scope of this RFC.

> I think you should consider making this Text class a part of the intl 
> extension. You're adding a class which is similar to the classes in 
> that extension. In terms of data, it's like IntlChar, except it's for 
> strings not characters. Its constructor takes an ICU locale string, 
> just like IntlBreakIterator or MessageFormatter.

I did consider that, and rejected the idea. Intl, although powerful, 
does not have an approachable API. It is also not installed or available 
by default, and I am not suggesting we change that. That means it 
doesn't fit the design goals here (having it always available).

cheers,
Derick

-- 
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support
Host of PHP Internals News: https://phpinternals.news

mastodon: @derickr@phpc.social @xdebug@phpc.social
twitter: @derickr and @xdebug

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
