Hi

On 12/16/22 16:27, Andreas Heigl wrote:
>> I'd rather not see this either, because if a 'Text' object may contain
>> binary data, the type safety is lost and users cannot rely on "'Text'
>> implies valid UTF-8" (see sibling thread).
>
> Does Text contain valid UTF-8? Or valid Unicode? IIRC the idea was to
> use UTF-16 as the internal encoding.
>
> In the end the internal encoding should be irrelevant to the user as
> long as we can assert that __toString() returns a Unicode string in a
> valid encoding. And I'm with you that UTF-8 might be the best choice for
> that.

The RFC already specifies that the inputs (__construct()) must be, and the outputs (__toString()) will be, UTF-8 strings: https://wiki.php.net/rfc/unicode_text_processing#basics

So for all intents and purposes "'Text' implies valid UTF-8" is what this guarantees, because the internal representation will not be visible to the user.
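
To illustrate that guarantee, here is a minimal sketch. It is not the RFC's implementation: the class name Utf8Text is hypothetical, the UTF-16LE storage is only an assumption standing in for "some internal encoding", and it relies on ext/mbstring being available.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration only; NOT the RFC's Text class.
final class Utf8Text
{
    // Assumed internal representation (UTF-16LE). It is private, so
    // callers can never observe it.
    private string $internal;

    public function __construct(string $utf8)
    {
        // Reject binary data / malformed input at the boundary.
        if (!mb_check_encoding($utf8, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        $this->internal = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
    }

    public function __toString(): string
    {
        // Always convert back to UTF-8 on the way out.
        return mb_convert_encoding($this->internal, 'UTF-8', 'UTF-16LE');
    }
}
```

Because validation happens in the constructor and conversion happens in __toString(), code that receives such an object only ever sees valid UTF-8, whatever the storage format is.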

Best regards
Tim Düsterhus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
