Re: [PHP-DEV] Re: [RFC] Unicode Text Processing

Derick Rethans Fri, 16 Dec 2022 04:55:52 -0800

On Thu, 15 Dec 2022, Jakub Zelenka wrote:

> On Thu, Dec 15, 2022 at 4:56 PM Christoph M. Becker <cmbecke...@gmx.de>
> wrote:
> 
> > On 15.12.2022 at 16:34, Derick Rethans wrote:
> >
> > > I have just published an initial draft of the "Unicode Text 
> > > Processing" RFC, a proposal to have performant unicode text 
> > > processing always available to PHP users, by introducing a new 
> > > "Text" class.
> > >
> > > You can find it at: 
> > > https://wiki.php.net/rfc/unicode_text_processing
> > >
> > > I'm looking forwards to hearing your opinions, additions, and 
> > > suggestions — the RFC specifically asks for these in places.
> >
> > | As the implementation requires ICU, this would also mean that PHP 
> > | depend on the ICU library.
> >
> > Our current stance is that a minimal PHP should be buildable without 
> > requiring any "non-standard" libraries; this is the reason why we 
> > bundle PCRE.  If we wanted to stick with that policy, we would need 
> > to bundle ICU, which might not be the best idea – it's generally not 
> > great to have bundled libraries which are still maintained outside 
> > of php-src, and especially for such huge libraries.
> >
> >
> I agree with this. Bundling ICU doesn't seem like a good idea. 
> Wouldn't it be better to base this on something smaller that can be 
> bundled and does the job? For example NJS and QuickJS use their own 
> implementations which seem to be fine. Especially 
> https://github.com/bellard/quickjs/blob/master/libunicode.c seems like 
> something that we could fork and maintain potentially.


I have no intention of bundling ICU. That'd be a crazy thing to do. 
Instead, the current proposal is to make PHP depend on libicu. I realise 
that this is against our current stance, but considering that 1. most 
(if not all) Linux distributions ignore our bundled libraries anyway as 
per their policies; 2. libicu is pretty much available everywhere; and 
3. I am not proposing to require the latest and greatest, I believe we 
can safely rely on it being available.

I'm not opposed to using something other than ICU. However, most of the 
other Unicode-related libraries that I had a quick look at provide only 
a small subset: either just character properties, or graphemes. None of 
them also takes care of collation/locales and transliteration. I am also 
wary about some of these libraries' development and future-proofing. 
ICU won't have these problems.

cheers,
Derick

-- 
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support
Host of PHP Internals News: https://phpinternals.news

mastodon: @derickr@phpc.social @xdebug@phpc.social
twitter: @derickr and @xdebug

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
