Re: [PHP-DEV] [RFC] Unicode Text Processing

Tim Düsterhus Fri, 16 Dec 2022 07:48:51 -0800

Hi

On 12/16/22 16:27, Andreas Heigl wrote:

I'd rather not see this either, because if a 'Text' object may contain
binary data, the type safety is lost and users cannot rely on "'Text'
implies valid UTF-8" (see sibling thread).


Does Text contain valid UTF-8? Or valid Unicode? As IIRC the idea was to
internally use UTF-16 as encoding.

In the end the internal encoding should be irrelevant to the user as
long as we can assert that __toString() returns a Unicode-String in a
valid encoding. And I'm with you that UTF-8 might be the best choice for
that.

The RFC already specifies that the inputs (__construct()) and outputs
(__toString()) must/will be UTF-8 strings in
https://wiki.php.net/rfc/unicode_text_processing#basics.

So for all intents and purposes "'Text' implies valid UTF-8" is what
this guarantees, because the internal representation will not be visible
to the user.
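
To illustrate that guarantee, here is a minimal userland sketch. It is
not the RFC's implementation: the class name Utf8Text, the choice of
UTF-16LE as the hidden internal encoding, and the use of the mbstring
extension are all assumptions made purely for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the RFC's Text class; it only demonstrates the
// "UTF-8 in, UTF-8 out" contract and is not the proposed implementation.
final class Utf8Text
{
    // Internal representation (assumed to be UTF-16LE here). It is private,
    // so callers can never observe which encoding is used internally.
    private string $utf16;

    public function __construct(string $utf8)
    {
        // Reject anything that is not well-formed UTF-8, e.g. binary data.
        if (!mb_check_encoding($utf8, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        $this->utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
    }

    public function __toString(): string
    {
        // Always hand UTF-8 back, regardless of the internal encoding.
        return mb_convert_encoding($this->utf16, 'UTF-8', 'UTF-16LE');
    }
}

// Valid UTF-8 round-trips unchanged.
$text = new Utf8Text("Düsterhus");
var_dump((string) $text === "Düsterhus");

// "\xC3\x28" is not well-formed UTF-8, so construction fails.
try {
    new Utf8Text("\xC3\x28");
} catch (InvalidArgumentException $e) {
    echo "rejected: ", $e->getMessage(), "\n";
}
```

Because the constructor is the only way in and __toString() is the only
way out, swapping the internal encoding would not change anything the
user can observe.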


Best regards
Tim Düsterhus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
