There already is such a function:
strlen(unicode_encode($string, "UTF-16BE"));
I think wanting access to the internal representation of Unicode
strings is an extremely rare need in any case.
-Andrei
On Jun 23, 2006, at 12:16 AM, Ron Korving wrote:
Maybe it'd be useful if there were a function to "cast" a Unicode string
into a binary string without changing anything on the inside. That way one
could do strlen(str_to_binary($string)). That would also be useful for
storing and reading binary data (with binary_to_str).
Ron
"Andrei Zmievski" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
The only way they can get at the internal UTF-16 representation is via
unicode_encode($uni, 'UTF-16'), which will return a binary UTF-16 string.
In that case, strlen() will work just as well.
-Andrei
On Jun 22, 2006, at 11:30 PM, Andi Gutmans wrote:
I don't quite agree. I think there's a good chance people will want to
save Unicode strings in a binary format for performance reasons. Save it
the way it looks in memory, and put it back... Why convert to UTF-8 or
any other encoding if it's just about storage?
Andi
-----Original Message-----
From: Sara Golemon [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 22, 2006 9:15 PM
To: "Ron Korving"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics
Still, it's gotta be useful to know how many bytes it occupies.
Perhaps for Content-Length headers or something. There are plenty of
low-level concepts to think of where one might need this. And even if
you can't think of any reason now, you don't wanna get hit in the face
by it and have to implement such a function for PHP 6.0.1.
For this type of usage, I'd think it'd be more relevant to know
how many bytes the string will occupy in a given output
encoding than what it happens to occupy in the
underlying implementation. In the example you cited, string
contents will more typically be sent as UTF-8 rather than in the
UTF-16 of PHP's internal encoding.
$utf8str = unicode_encode($unistr, 'UTF-8');
header('Content-Type: text/html; charset=UTF-8');
header('Content-Length: ' . strlen($utf8str));
echo $utf8str;
I'm not saying it's impossible that a legitimate need will
come up to know the internal byte usage of a Unicode string,
and there's certainly no harm in adding such a function (apart
from the tired shot-foot argument). I just doubt you (or
anyone) will come up with such a case anytime soon.
-Sara
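To make the distinction between character count, output-encoding byte count, and UTF-16 byte count concrete, here is a minimal sketch. The PHP 6 Unicode functions discussed in this thread (unicode_encode() under unicode.semantics) never made it into a released PHP version, so the sketch swaps in mb_strlen() and mb_convert_encoding() from the mbstring extension and assumes PHP 7+ with a UTF-8 source string. The UTF-16BE figure only approximates the in-memory representation Andi and Andrei are talking about.

```php
<?php
// Sketch only: assumes PHP 7+ with the mbstring extension enabled.
// PHP 6's unicode_encode() is not available, so mbstring stands in for it.

// "naive cafe" with i-diaeresis and e-acute; \u{...} escapes yield UTF-8 bytes.
$text = "na\u{00EF}ve caf\u{00E9}";

// Number of characters (code points) in the string.
$chars = mb_strlen($text, 'UTF-8');

// Bytes when encoded as UTF-8: what a Content-Length header needs
// if the body is sent as UTF-8.
$utf8Bytes = strlen($text);

// Bytes when encoded as UTF-16BE (no BOM), roughly what a UTF-16
// in-memory representation of these characters would occupy.
$utf16Bytes = strlen(mb_convert_encoding($text, 'UTF-16BE', 'UTF-8'));

printf("characters: %d, UTF-8 bytes: %d, UTF-16 bytes: %d\n",
       $chars, $utf8Bytes, $utf16Bytes);
// characters: 10, UTF-8 bytes: 12, UTF-16 bytes: 20
```

The same ten characters take 12 bytes on the wire as UTF-8 but 20 bytes as UTF-16, which is why the Content-Length value has to come from the encoded output rather than from the internal representation.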
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php