On Tue, 12 May 2026, youkidearitai wrote:

> On Fri, 16 Dec 2022 at 0:34, Derick Rethans <[email protected]> wrote:
> 
> > I have just published an initial draft of the "Unicode Text 
> > Processing" RFC, a proposal to have performant unicode text 
> > processing always available to PHP users, by introducing a new 
> > "Text" class.
> >
> > You can find it at:
> > https://wiki.php.net/rfc/unicode_text_processing
> >
> > I'm looking forwards to hearing your opinions, additions, and 
> > suggestions — the RFC specifically asks for these in places.
> 
> Is this topic still open?
> I am interested in this Text class.
> I would be glad to have control based on grapheme clusters, as in
> Swift's String type.

I am still interested in extending this to support even more things. 
Since I wrote that draft RFC, I have added a few more features:

https://github.com/derickr/php-text/commits/main/

> 
> I have some ideas.
> 
> 1. Move it to the Intl extension, such as \Intl\Text
>   * I think this keeps the implementation simple.

I don't agree with this, as although it builds on top of ICU like the 
classes in the Intl extension, it isn't following ICU's API style at 
all.

It is meant to be a much more opinionated API that does the simple 80% 
case well.

> 2. Add the Text type to the grapheme_* functions only, such as string|Text.
>    * It is somewhat complex to implement, but simple for userland.

I am not too sure about this. The grapheme_* functions closely match 
ICU's internal, and powerful, API. If you want them to accept a Text 
object too, that means these grapheme_* functions' signatures need to be 
overloaded.

For example:

    grapheme_strstr(string $haystack, string $needle,
        bool $beforeNeedle = false, string $locale = ""): string|false

would need to change into:

    grapheme_strstr(string|Text $haystack, string|Text $needle,
        bool $beforeNeedle = false, string $locale = ""): string|false

And then '$locale' makes no sense, as the locale is already part of each 
Text object.
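
As a rough illustration of the string-only form that exists today, here 
is a minimal sketch of calling grapheme_strstr(). It assumes ext/intl is 
loaded, the sample strings are made up for this example, and the 
'$locale' argument is left out on purpose:

```php
<?php
// Illustrative sample data only; requires the intl extension.
$haystack = "I like 🍕 and 🍣";
$needle   = "🍕";

// Returns the part of $haystack that starts at the first occurrence of
// $needle (needle included), or false when the needle is not found.
$fromNeedle = grapheme_strstr($haystack, $needle);

// With $beforeNeedle = true, returns the part before the needle instead.
$beforeNeedle = grapheme_strstr($haystack, $needle, true);

var_dump($fromNeedle, $beforeNeedle);
```

Both arguments are plain strings here, which is the part that a 
string|Text overload would have to change.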

Instead, the 'contains' method on the Text object already does something 
very similar:

https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt

I think the grapheme functions should stay as they are, and additional 
methods can be added on the Text class, where there is currently 
functionality missing that the grapheme_* functions already support.

The RFC document also already lists more functions than I have 
implemented so far.

> 3. If UTF-8 validation fails, throw an exception

It already does that; see this test case: 
https://github.com/derickr/php-text/blob/main/tests/text-in-out-basic.phpt#L13 
(although the exception message itself could be improved).

> A __toString method that returns the string type seems good.
> Please consider this.

This is already implemented too: 
https://github.com/derickr/php-text/blob/main/text.c#L323

cheers,
Derick

-- 
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support

mastodon: @[email protected] @[email protected]
