Re: [PHP-DEV] Re: strlen() under unicode.semantics

Ron Korving Fri, 23 Jun 2006 00:18:15 -0700

Maybe it'd be useful if there were a function to "cast" a Unicode string to a 
binary string without changing its internal bytes. That way one could call 
strlen(str_to_binary($string)). It would also be useful for storing and 
reading strings in binary form (with a matching binary_to_str).
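
A minimal sketch of the byte-count side of this idea, assuming a current PHP 
(7 or later) with the mbstring extension rather than the unicode.semantics 
mode discussed in this thread. str_to_binary() and binary_to_str() are only 
the names proposed above and do not exist; mb_convert_encoding() stands in 
for them here, and the sample string is made up for illustration.

```php
<?php
// Hypothetical sample: "naïve" plus one emoji outside the BMP.
$text = "na\u{00EF}ve \u{1F600}";

// Stand-in for the proposed str_to_binary(): produce the UTF-16 bytes.
$utf16 = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');

$lengths = [
    'code points'  => mb_strlen($text, 'UTF-8'), // 7
    'UTF-8 bytes'  => strlen($text),             // 11
    'UTF-16 bytes' => strlen($utf16),            // 16
];

// Stand-in for the proposed binary_to_str(): decode the bytes again.
$roundTrip = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');

foreach ($lengths as $label => $n) {
    echo $label, ': ', $n, PHP_EOL;
}
echo $roundTrip === $text ? 'round trip ok' : 'round trip failed', PHP_EOL;
```

The three counts differ for the same string, which is why a byte length only 
means something once the encoding is named.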


Ron


"Andrei Zmievski" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> The only way they can get at the internal UTF-16 representation is via
> unicode_encode($uni, 'UTF-16'), which will return a binary UTF-16 string.
> In that case, strlen() will work just as well.
>
> -Andrei
>
>
> On Jun 22, 2006, at 11:30 PM, Andi Gutmans wrote:
>
>> I don't quite agree. I think there's a good chance people will want to
>> save Unicode strings in a binary format for performance reasons. Save it
>> the way it looks in memory, and put it back... Why convert to UTF-8 or any
>> other encoding if it's just about storage?
>>
>> Andi
>>
>>> -----Original Message-----
>>> From: Sara Golemon [mailto:[EMAIL PROTECTED]
>>> Sent: Thursday, June 22, 2006 9:15 PM
>>> To: "Ron Korving"
>>> Cc: internals@lists.php.net
>>> Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics
>>>
>>>> Still, it's gotta be useful to know how many bytes it occupies,
>>>> perhaps for Content-Length headers or something. There are plenty of
>>>> low-level concepts to think of where one might need this. And even if
>>>> you can't think of any reason now, you don't wanna get hit in the face
>>>> by it and have to implement such a function for PHP 6.0.1.
>>>>
>>> For this type of usage, I'd think it'd be more relevant to know
>>> how many bytes the string will occupy in a given output
>>> encoding than what it happens to occupy in the
>>> underlying implementation.  In the example you cited, string
>>> contents will more typically be sent as UTF-8 rather than the
>>> UTF-16 of PHP's internal encoding.
>>>
>>> // Encode to the output encoding first, then count bytes in that form.
>>> $utf8str = unicode_encode($unistr, 'utf8');
>>>
>>> header('Content-Type: text/html; charset=utf-8');
>>> header('Content-Length: ' . strlen($utf8str));
>>> echo $utf8str;
>>>
>>> I'm not saying it's impossible that a legitimate use will
>>> come up to know the internal byte-usage of a unicode string,
>>> and there's certainly no harm in adding such a function (apart
>>> from the tired shot-foot argument).  I just doubt you (or
>>> anyone) will come up with such a case anytime soon.
>>>
>>> -Sara
>>>
>>> --
>>> PHP Internals - PHP Runtime Development Mailing List
>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>>
>>
