Re: [PHP-DEV] Re: strlen() under unicode.semantics

Andrei Zmievski Fri, 23 Jun 2006 00:21:01 -0700

There already is such a function:

strlen(unicode_encode($string, "UTF-16BE"));

I think wanting to have access to the internal representation of Unicode strings is an extremely rare operation in any case.


-Andrei
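
The one-liner above relies on strlen() counting bytes once the string has been encoded. A minimal sketch of that idea follows, under stated assumptions: unicode_encode() belonged to the PHP 6 development branch, so mb_convert_encoding() from the mbstring extension stands in for it here, and the helper name byte_length() is hypothetical.

```php
<?php
// Hypothetical helper: byte length of a UTF-8 string after re-encoding.
// Assumes the mbstring extension; mb_convert_encoding() is a stand-in
// for the unicode_encode() call discussed in this thread.
function byte_length(string $utf8, string $encoding): int
{
    // strlen() counts bytes, so apply it to the encoded binary string.
    return strlen(mb_convert_encoding($utf8, $encoding, 'UTF-8'));
}

// "héllo", a space, and one character outside the Basic Multilingual Plane.
$s = "h\u{E9}llo \u{1F600}";

echo mb_strlen($s, 'UTF-8'), "\n";      // number of code points
echo byte_length($s, 'UTF-8'), "\n";    // bytes when encoded as UTF-8
echo byte_length($s, 'UTF-16BE'), "\n"; // bytes when encoded as UTF-16BE
```

The three numbers differ for this input, which is why the byte count only makes sense relative to a named encoding.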


On Jun 23, 2006, at 12:16 AM, Ron Korving wrote:

Maybe it'd be useful if there was a function to "cast" a UTF string into a binary string without changing anything on the inside. That way one could do strlen(str_to_binary($string)). That would also be useful for binary storing and reading (with binary_to_str).

Ron


"Andrei Zmievski" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]

The only way they can get at the internal UTF-16 representation is via unicode_encode($uni, 'UTF-16') which will return a binary UTF-16 string.

In that case, strlen() will work just as well.

-Andrei


On Jun 22, 2006, at 11:30 PM, Andi Gutmans wrote:

I don't quite agree. I think there's a good chance people will want to
save Unicode strings in a binary format for performance reasons. Save it
the way it looks in memory, and put it back... Why convert to UTF-8 or
any other encoding if it's just about storage?

Andi
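
The store-and-restore idea in the message above can be sketched in present-day PHP. This is only an illustration under assumptions: PHP strings today are plain byte strings with no internal UTF-16 form, so mb_convert_encoding() (mbstring extension) stands in for dumping and reloading the in-memory representation, and the names to_storage() and from_storage() are hypothetical.

```php
<?php
// Hypothetical helpers: convert to and from a fixed binary storage form.
// UTF-16BE is chosen here only to mirror the UTF-16 discussed in the thread.
function to_storage(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
}

function from_storage(string $blob): string
{
    return mb_convert_encoding($blob, 'UTF-8', 'UTF-16BE');
}

$original = "na\u{EF}ve caf\u{E9}"; // "naïve café"

// Write the binary form to a temporary file and read it back.
$path = tempnam(sys_get_temp_dir(), 'u16');
file_put_contents($path, to_storage($original));
$restored = from_storage(file_get_contents($path));
unlink($path);

var_dump($restored === $original);        // did the round trip preserve the text?
echo strlen(to_storage($original)), "\n"; // stored size in bytes (UTF-16BE)
echo strlen($original), "\n";             // size in bytes as UTF-8
```

For mostly Latin text like this, the UTF-16 form is larger on disk than UTF-8, so the choice trades storage size against skipping a conversion step.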

-----Original Message-----
From: Sara Golemon [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 22, 2006 9:15 PM
To: "Ron Korving"
Cc: [email protected]
Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics

Still, it's gotta be useful to know how many bytes it occupies.
Perhaps for Content-length headers or something. There are plenty of
low level concepts to think of where one might need this. And even if
you can't think of any reason now, you don't wanna get hit in the face
by it and have to implement such a function for PHP 6.0.1.

For this type of usage, I'd think it'd be relevant to know
how many bytes the string will occupy in a given output
encoding more so than what it happens to occupy in the
underlying implementation.  In the example you cited, string
contents will more typically be sent as utf8 rather than the
utf16 of php's internal encoding.

$utf8str = unicode_encode($unistr, 'utf8');

header('Content-type: text/html; charset=utf-8');
header('Content-length: ' . strlen($utf8str));
echo $utf8str;

I'm not saying it's impossible that a legitimate need to know
the internal byte usage of a Unicode string will come up, and
there's certainly no harm in adding such a function (apart
from the tired shot-foot argument).  I just doubt you (or
anyone) will come up with such a case anytime soon.

-Sara

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php