Re: [PHP-DEV] [Discussion] Scalar Object Strings and Multibyte Encodings

Rowan Collins Sat, 22 Jun 2019 06:32:17 -0700

On 20/06/2019 23:30, Mark Randall wrote:

> There does at least seem to be the starting point in that mb_string is already widely used, and my suggestion that it "work as expected" is more that it would work as the equivalent mb_string / iconv function would.

I think this is a rather short-sighted way of looking at it. If people want the API provided by the mbstring extension, they can just use those functions; the advantage of designing a new set of functions is surely that we don't need to stick to past decisions. If we start to build a new standard library, as Zeev suggested in the deprecation thread, it is a once-in-a-lifetime chance to build something better, not just copy what's gone before.

> mb_strlen returns the number of codepoints for example, I'm not immediately seeing anything about mb_string supporting Graphemes as the only reference I could find to their manipulation was the intl extension.

The mbstring extension was not built for Unicode, but for older Japanese multi-byte encodings, where the definition of "character" is much more straightforward. Its Unicode support seems to mostly see code points as mappings for characters in some other encoding. (The oldest manual page for it on archive.org [1] is from 2001, and includes the quaint remark "As Unicode is getting popular, UTF-8 is used also.") The iconv library is even more explicitly aimed at converting between character sets, rather than understanding them (the extra functions such as iconv_strlen are unique to PHP).

Unicode today is much more than a mapping of legacy encodings to a universal character set, and I can think of no useful purpose in declaring the "string length" of the British flag emoji to be 2, just because it is encoded as the sequence U+1F1EC U+1F1E7.
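
To make the difference concrete, here is a minimal sketch comparing three notions of "length" for that flag. It assumes PHP 7 or later (for the "\u{...}" escape) with mbstring loaded; grapheme_strlen() comes from the intl extension, so the sketch checks for it rather than assuming it is there. The variable names are only illustrative.

```php
<?php
// British flag: REGIONAL INDICATOR SYMBOL LETTER G + LETTER B.
$flag = "\u{1F1EC}\u{1F1E7}";

// Bytes in the UTF-8 encoding (each regional indicator takes 4 bytes).
$bytes = strlen($flag);

// Code points, which is what mb_strlen counts.
$codePoints = mb_strlen($flag, 'UTF-8');

// Grapheme clusters, i.e. what a user would call one "character".
// grapheme_strlen() is part of intl, which may not be loaded.
$graphemes = function_exists('grapheme_strlen') ? grapheme_strlen($flag) : null;

printf(
    "bytes: %d, code points: %d, graphemes: %s\n",
    $bytes,
    $codePoints,
    $graphemes ?? 'n/a (intl not loaded)'
);
```

This should report 8 bytes and 2 code points, and 1 grapheme when intl is available; only the last figure matches what someone looking at the flag would count.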

[1] http://web.archive.org/web/20010605075550/http://www.php.net/manual/en/ref.mbstring.php


Regards,

--
Rowan Collins
[IMSoP]

