On Thu, Dec 15, 2022 at 9:34 AM Derick Rethans <der...@php.net> wrote:
> I have just published an initial draft of the "Unicode Text Processing"
> RFC, a proposal to have performant unicode text processing always
> available to PHP users, by introducing a new "Text" class.
>
> You can find it at:
> https://wiki.php.net/rfc/unicode_text_processing
>
> I'm looking forwards to hearing your opinions, additions, and
> suggestions — the RFC specifically asks for these in places.
>

Obviously, hurdle one is making the ICU library a requirement for building PHP. I'd almost make that its own milestone in this project, with the introduction of the Text class as a separate follow-on. A very casual (IANAL) read of the ICU license doesn't seem to make this a problem, so it may be more a question of whether we want to put this requirement on people who build PHP. ICU is pretty widely available and used, so I also don't see this as a major stumbling block.

Question 2 is that class. I know folks have been clamoring for a `String` class for some time, and this actually fills that niche quite well. A part of me wonders if we can overload it a little to provide a pseudo-locale of "binary" so that users can, optionally, treat it like a more generalized String class in specific cases, storing a normal `char*` zend_string under the hood in that case. Possibly as a specialization tree:

/* names as examples only */
interface Stringy { /* define all those APIs */ }
class Text implements Stringy { /* ... */ }
class BinaryString implements Stringy { /* ... */ }

I think you'd get a lot more buy-in from the folks who worry that UTF-16 is overhead they don't want, but who do like the idea of an OOPy string. It also provides a migration path that avoids having to rethink byte vs grapheme conversions up front, deferring that part of a migration until later.

Overall, I'm more positive on this than negative, and I eagerly await the rest of this thread.

-Sara
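
To make the specialization-tree idea a little more concrete, here is a minimal userland sketch. It is an illustration only: the method names `length()` and `reverse()` are hypothetical and not taken from the RFC, and the `Text` class below is a stand-in that segments graphemes with PCRE's `\X` rather than the ICU-backed, UTF-16 implementation the RFC proposes. It assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical shared API; the real method set would come from the RFC.
interface Stringy
{
    public function length(): int;
    public function reverse(): static;
    public function __toString(): string;
}

// Byte-oriented variant: wraps a plain PHP string, no Unicode awareness.
final class BinaryString implements Stringy
{
    public function __construct(private string $bytes) {}

    public function length(): int
    {
        return strlen($this->bytes); // counts bytes
    }

    public function reverse(): static
    {
        return new static(strrev($this->bytes)); // reverses bytes
    }

    public function __toString(): string
    {
        return $this->bytes;
    }
}

// Stand-in for the grapheme-aware class (NOT the RFC's implementation):
// splits valid UTF-8 input into extended grapheme clusters via PCRE \X.
final class Text implements Stringy
{
    /** @var list<string> */
    private array $clusters;

    public function __construct(string $utf8)
    {
        if (preg_match_all('/\X/u', $utf8, $m) === false) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        $this->clusters = $m[0];
    }

    public function length(): int
    {
        return count($this->clusters); // counts grapheme clusters
    }

    public function reverse(): static
    {
        return new static(implode('', array_reverse($this->clusters)));
    }

    public function __toString(): string
    {
        return implode('', $this->clusters);
    }
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: 3 bytes, 1 grapheme.
$raw = "e\u{0301}";

/** @var Stringy $s */
foreach ([new BinaryString($raw), new Text($raw)] as $s) {
    printf("%s length: %d\n", $s::class, $s->length());
}
```

Code written against `Stringy` would accept either class, so a codebase could start on `BinaryString` and move individual call sites to `Text` once it has decided where byte semantics and where grapheme semantics are wanted.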