There already is such a function:

strlen(unicode_encode($string, "UTF-16BE"));

I think wanting to have access to the internal representation of Unicode strings is an extremely rare operation in any case.

-Andrei


On Jun 23, 2006, at 12:16 AM, Ron Korving wrote:

Maybe it'd be useful if there was a function to "cast" a UTF string into a binary string without changing anything on the inside. That way one could do strlen(str_to_binary($string)). That would also be useful for binary storing and reading (with binary_to_str).

Ron


"Andrei Zmievski" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
The only way they can get at the internal UTF-16 representation is via unicode_encode($uni, 'UTF-16') which will return a binary UTF-16 string.
In that case, strlen() will work just as well.

-Andrei


On Jun 22, 2006, at 11:30 PM, Andi Gutmans wrote:

I don't quite agree. I think there's a good chance people will want to
save Unicode strings in a binary format for performance reasons. Save it
the way it looks in memory, and put it back... Why convert to UTF-8 or
any other encoding if it's just about storage?

Andi

-----Original Message-----
From: Sara Golemon [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 22, 2006 9:15 PM
To: "Ron Korving"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics

Still, it's gotta be useful to know how many bytes it occupies.
Perhaps for Content-length headers or something. There are plenty of
low level concepts to think of where one might need this. And even if
you can't think of any reason now, you don't wanna get hit in the face
by it and have to implement such a function for PHP 6.0.1.

For this type of usage, I'd think it'd be relevant to know
how many bytes the string will occupy in a given output
encoding more so than what it happens to occupy in the
underlying implementation.  In the example you cited, string
contents will more typically be sent as utf8 rather than the
utf16 of php's internal encoding.

$utf8str = unicode_encode($unistr, 'utf8');

header('Content-Type: text/html; charset=utf-8');
header('Content-Length: ' . strlen($utf8str));
echo $utf8str;

I'm not saying it's impossible that a legitimate use will
come up to know the internal byte-usage of a unicode string;
there's certainly no harm in adding such a function (apart
from the tired shot-foot argument).  I just doubt you (or
anyone) will come up with such a case anytime soon.

-Sara

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

