Re: [PHP-DEV] Re: strlen() under unicode.semantics

Andrei Zmievski Fri, 23 Jun 2006 00:21:36 -0700

Really? I think it's very rare that someone'd want to get at the
internals of a Unicode string.


-Andrei



On Jun 22, 2006, at 11:44 PM, Andi Gutmans wrote:

Hmm, I was thinking we might have some binary write function which would
do that automagically. I think it'd be worth it.

-----Original Message-----
From: Andrei Zmievski [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 22, 2006 11:38 PM
To: Andi Gutmans
Cc: 'Sara Golemon'; '"Ron Korving"'; internals@lists.php.net
Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics

The only way they can get at the internal UTF-16
representation is via unicode_encode($uni, 'UTF-16') which
will return a binary UTF-16 string. In that case, strlen()
will work just as well.

-Andrei

On Jun 22, 2006, at 11:30 PM, Andi Gutmans wrote:

I don't quite agree. I think there's a good chance people will want to
save Unicode strings in a binary format for performance reasons. Save
it the way it looks in memory, and put it back... Why convert to UTF-8
or any other encoding if it's just about storage?

Andi

-----Original Message-----
From: Sara Golemon [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 22, 2006 9:15 PM
To: "Ron Korving"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics

Still, it's gotta be useful to know how many bytes it occupies.
Perhaps for Content-length headers or something. There are plenty of
low-level concepts to think of where one might need this. And even if
you can't think of any reason now, you don't wanna get hit in the face
by it and have to implement such a function for PHP 6.0.1.

For this type of usage, I'd think it'd be relevant to know how many
bytes the string will occupy in a given output encoding, more so than
what it happens to occupy in the underlying implementation. In the
example you cited, string contents will more typically be sent as
UTF-8 rather than the UTF-16 of PHP's internal encoding.

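// unicode_encode() returns a binary string in the requested encoding
$utf8str = unicode_encode($unistr, 'utf8');

header('Content-type: text/html; charset=utf-8');
// strlen() on a binary string counts bytes, which is what the header needs
header('Content-length: ' . strlen($utf8str));
echo $utf8str;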

I'm not saying it's impossible that a legitimate need will come up to
know the internal byte usage of a Unicode string; there's certainly
no harm in adding such a function (apart from the tired shot-foot
argument). I just doubt you (or anyone) will come up with such a case
anytime soon.

-Sara

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php


