On 16 August 2024 19:44:22 BST, Mike Schinkel <m...@newclarity.net> wrote:
>Let me see if I understand your argument correctly? You are asserting that
>Unicode is "too complex" to be handled in the standard library so that
>complexity should instead be shouldered individually by each and every PHP
>developer who needs to work with Unicode text in PHP, which is many PHP
>developers if not eventually most. Is that your argument?

Not really, no. I'm definitely in favour of including more Unicode-based string
handling functionality, by improving and extending ext/intl, or coming up with
new convenience wrappers for common tasks.

What I'm always sceptical of is the idea that you could ever consider such
functionality "complete", or that "Unicode support" can ever be a single
deliverable, rather than an ongoing aspiration. (And consequently, I'm
sceptical of any language which says it has achieved that.)

I also think "Unicode support" is probably the wrong angle to approach from; it
leads to features like IntlChar, which technically provides access to tons of
data from the Unicode standard, but practically has no use for 99% of PHP
developers. Instead we should be talking about "internationalisation support",
of which handling different writing systems is one (fairly big) part.

For instance, I would welcome proposals like "here's some functions for
handling locale-specific case folding and normalisation-based matching",
"here's some functions for limiting the storage size of a string without
producing garbage characters", etc. As well as related things which aren't just
about text encoding, like "here's some functions for working with
locale-specific date formatting" (or even just "here's some documentation for
how you're supposed to use ext/intl's date classes").
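
To make the "limiting the storage size" case concrete, here is a rough sketch of how three existing functions behave when cutting a string to a byte budget. It assumes both ext/mbstring and ext/intl are loaded; the sample string and the 5-byte limit are made up for illustration, and this is not a proposal for what a convenience wrapper should look like.

```php
<?php
// Sketch only: assumes ext/mbstring and ext/intl are both loaded.
// "cafe" followed by U+0301 COMBINING ACUTE ACCENT: 6 bytes in UTF-8.
$text = "cafe\u{301}";
$maxBytes = 5; // illustrative limit, e.g. a fixed-width storage column

// Plain byte cut: splits the two-byte combining accent in half.
$byByte = substr($text, 0, $maxBytes);

// mbstring: never splits a code point, but drops the accent from the "e".
$byCodePoint = mb_strcut($text, 0, $maxBytes, 'UTF-8');

// intl: stops at the last grapheme cluster boundary that fits.
$byGrapheme = grapheme_extract($text, $maxBytes, GRAPHEME_EXTR_MAXBYTES);

var_dump(mb_check_encoding($byByte, 'UTF-8')); // bool(false)
var_dump($byCodePoint);                        // string(4) "cafe"
var_dump($byGrapheme);                         // string(3) "caf"
```

Which of the last two results is "right" depends on the application, which is part of why a convenience wrapper with documented behaviour would be useful.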
>> We also have the "mbstring" extension, ...
>
>Interesting historical factoid, but how is that really relevant to including
>Unicode into the standard library?

I was just summarising the current situation, to work out where we could go
next. Any attempt to extend string handling functionality is likely to build on
either ext/intl or ext/mbstring, so it's useful to understand how they differ.

Regards,
Rowan Tommins
[IMSoP]