Really? I think it's very rare that someone would want to get at the internals of a Unicode string.

-Andrei


On Jun 22, 2006, at 11:44 PM, Andi Gutmans wrote:

Hmm, I was thinking we might have some binary write function which would do
that automagically.  I think it'd be worth it.

-----Original Message-----
From: Andrei Zmievski [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 22, 2006 11:38 PM
To: Andi Gutmans
Cc: 'Sara Golemon'; '"Ron Korving"'; internals@lists.php.net
Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics

The only way they can get at the internal UTF-16
representation is via unicode_encode($uni, 'UTF-16') which
will return a binary UTF-16 string. In that case, strlen()
will work just as well.

-Andrei
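Andrei's point, that strlen() works on the binary result of unicode_encode($uni, 'UTF-16'), can be sketched in released PHP. The unicode_encode() call belonged to the PHP 6 development branch, which was never released. This sketch therefore assumes the mbstring extension is installed and that strings are UTF-8 byte strings. utf16_byte_length() is a hypothetical helper name.

```php
<?php
// Sketch only: mb_convert_encoding() stands in for the PHP 6 unicode_encode()
// discussed in the thread. Assumes mbstring is installed and input is UTF-8.

/**
 * Return the number of bytes $text occupies once encoded as UTF-16LE
 * (little-endian, no byte order mark).
 */
function utf16_byte_length(string $text): int
{
    $utf16 = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');
    // strlen() counts bytes, so it works directly on the binary result.
    return strlen($utf16);
}

$text = "caf\u{00E9} \u{1F600}"; // "café", a space, then an emoji

echo mb_strlen($text, 'UTF-8'), "\n";  // code points: 6
echo strlen($text), "\n";              // UTF-8 bytes: 10
echo utf16_byte_length($text), "\n";   // UTF-16 bytes: 14 (emoji is a surrogate pair)
```

The three numbers differ because a code point count, a UTF-8 byte count and a UTF-16 byte count are three separate measurements of the same text.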


On Jun 22, 2006, at 11:30 PM, Andi Gutmans wrote:

I don't quite agree. I think there's a good chance people will want to
save Unicode strings in a binary format for performance reasons. Save
it the way it looks in memory, and put it back... Why convert to UTF-8
or any other encoding if it's just about storage?

Andi
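Andi's "save it the way it looks in memory, and put it back" idea amounts to a round trip through raw UTF-16 bytes. The following is a minimal sketch in released PHP. It assumes the mbstring extension and UTF-8 strings in the application, and it pins the byte order to UTF-16LE so the stored layout is explicit. store_utf16() and load_utf16() are hypothetical helper names.

```php
<?php
// Sketch only: store_utf16() and load_utf16() are hypothetical helpers.
// Assumes the mbstring extension and UTF-8 strings in the application.

function store_utf16(string $path, string $utf8Text): int
{
    // Pin the byte order so the file layout does not depend on defaults.
    $bytes = mb_convert_encoding($utf8Text, 'UTF-16LE', 'UTF-8');
    $written = file_put_contents($path, $bytes);
    if ($written === false) {
        throw new RuntimeException("could not write $path");
    }
    return $written;
}

function load_utf16(string $path): string
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("could not read $path");
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

$path = tempnam(sys_get_temp_dir(), 'u16');
$text = "Hello, world";
$written = store_utf16($path, $text);
$restored = load_utf16($path);
unlink($path);

var_dump($restored === $text); // bool(true)
echo $written, " bytes as UTF-16LE, ", strlen($text), " bytes as UTF-8\n";
```

For mostly-ASCII text like this, the UTF-16 dump is twice the size of the UTF-8 one (24 bytes versus 12). Any gain from this approach would come from skipping a conversion step, not from smaller files.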

-----Original Message-----
From: Sara Golemon [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 22, 2006 9:15 PM
To: "Ron Korving"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics

Still, it's gotta be useful to know how many bytes it occupies.
Perhaps for Content-Length headers or something. There are plenty of
low-level situations where one might need this. And even if you can't
think of any reason now, you don't wanna get hit in the face by it and
have to implement such a function for PHP 6.0.1.

For this type of usage, I'd think it would be more relevant to know how
many bytes the string will occupy in a given output encoding than what
it happens to occupy in the underlying implementation. In the example
you cited, string contents will more typically be sent as UTF-8 rather
than as the UTF-16 of PHP's internal encoding.

$utf8str = unicode_encode($unistr, 'UTF-8');

header('Content-Type: text/html; charset=utf-8');
header('Content-Length: ' . strlen($utf8str));
echo $utf8str;

I'm not saying it's impossible that a legitimate need to know the
internal byte usage of a Unicode string will come up, and there's
certainly no harm in adding such a function (apart from the tired
shot-foot argument). I just doubt you (or anyone) will come up with
such a case anytime soon.

-Sara
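Sara's point, that Content-Length depends on the output encoding rather than the internal representation, can be checked by encoding one body several ways. This is a minimal sketch in released PHP. It assumes the mbstring extension and a UTF-8 source string, and body_for_charset() is a hypothetical helper name.

```php
<?php
// Sketch only: assumes the mbstring extension and a UTF-8 source string.
// body_for_charset() is a hypothetical helper name.

function body_for_charset(string $utf8Body, string $charset): string
{
    return mb_convert_encoding($utf8Body, $charset, 'UTF-8');
}

$body = "<p>na\u{00EF}ve</p>"; // "<p>naïve</p>", 12 characters

foreach (['UTF-8', 'UTF-16LE', 'ISO-8859-1'] as $charset) {
    $encoded = body_for_charset($body, $charset);
    // Content-Length must be the byte count of what is actually sent.
    echo $charset, ': Content-Length ', strlen($encoded), "\n";
}
// Prints:
// UTF-8: Content-Length 13
// UTF-16LE: Content-Length 24
// ISO-8859-1: Content-Length 12
```

The same 12-character body needs three different Content-Length values, and none of them is inherently "the" length of the string.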

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
