On Thu, Dec 15, 2022 at 9:34 AM Derick Rethans <der...@php.net> wrote:

> I have just published an initial draft of the "Unicode Text Processing"
> RFC, a proposal to have performant unicode text processing always
> available to PHP users, by introducing a new "Text" class.
>
> You can find it at:
> https://wiki.php.net/rfc/unicode_text_processing
>
> I'm looking forwards to hearing your opinions, additions, and
> suggestions — the RFC specifically asks for these in places.
>
>
Obviously, hurdle one is making the ICU library a requirement for building
PHP.  I'd almost make that its own milestone in this project, with the
introduction of the Text class as a separate follow-on.  A very casual
(IANAL) read of the ICU license doesn't seem to make this a problem, so it
may be more a question of whether we want to put this requirement on people
building PHP.  ICU is pretty widely available and used, so I also don't see
this as a major stumbling block.

Question 2 is that class.  I know folks have been clamoring for a `String`
class for some time and this actually fills that niche quite well.  A part
of me wonders if we can overload it a little to provide a pseudo-locale of
"binary" so that users can, optionally, treat it like a more generalized
String class in specific cases, storing a normal `char*` zend_string under
the hood in that case.  Possibly as a specialization tree:

/* names as examples only */
interface Stringy { /* define all those APIs */ }
class Text implements Stringy { /* ... */ }
class BinaryString implements Stringy { /* ... */ }

I think you'd get a lot more buy-in from the folks who worry that UTF-16 is
overhead they don't want, but who do like the idea of an OOPy string.  It
also provides a migration path that avoids having to rethink byte vs grapheme
conversions up front, deferring that part of a migration until later.
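
To make the migration idea a bit more concrete, here is a minimal sketch,
assuming PHP 8.  Every name and method in it is hypothetical (like the names
above, examples only) and none of it is taken from the RFC.  `GraphemeText` is
only a stand-in for the proposed Text class: it counts grapheme clusters with
PCRE's `\X` rather than using ICU or UTF-16 storage.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: the RFC does not define these methods.
interface Stringy
{
    public function length(): int;
    public function toBytes(): string;
}

// Byte-oriented variant: wraps a plain PHP string, no Unicode awareness.
final class BinaryString implements Stringy
{
    public function __construct(private string $bytes) {}

    public function length(): int
    {
        return strlen($this->bytes); // counts bytes
    }

    public function toBytes(): string
    {
        return $this->bytes;
    }
}

// Stand-in for a Unicode-aware class: counts grapheme clusters.
final class GraphemeText implements Stringy
{
    public function __construct(private string $utf8) {}

    public function length(): int
    {
        // \X matches one extended grapheme cluster in UTF-8 mode.
        $count = preg_match_all('/\X/u', $this->utf8);
        if ($count === false) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        return $count;
    }

    public function toBytes(): string
    {
        return $this->utf8;
    }
}

// Code written against the interface does not care which variant it gets.
function describe(Stringy $s): string
{
    return sprintf('%s: length %d', $s::class, $s->length());
}
```

Code that type-hints `Stringy` could start out receiving `BinaryString` and
switch to the Unicode-aware class later without changing its signatures; only
the meaning of `length()` changes, from bytes to graphemes.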

Overall, I'm more positive on this than negative, and I eagerly await the
rest of this thread.

-Sara
